172 commits

Author SHA1 Message Date
Fredrik Roubert
0178a07a26 ICU-22793 Clang-Tidy: google-readability-casting
https://releases.llvm.org/17.0.1/tools/clang/tools/extra/docs/clang-tidy/checks/google/readability-casting.html
2024-07-04 22:32:12 +02:00
Fredrik Roubert
5401c12018 ICU-22621 Clang-Tidy: modernize-use-nullptr
https://releases.llvm.org/17.0.1/tools/clang/tools/extra/docs/clang-tidy/checks/modernize/use-nullptr.html
2024-03-15 14:31:54 +01:00
Frank Tang
02d5e71903 ICU-22342 Implement ExternalBreakEngineAPI
ICU-22342 Fix comments
2023-08-30 11:43:16 -07:00
Markus Scherer
b6dcc95d3c ICU-21833 remove redundant void parameter lists
See #2351
2023-03-02 09:31:57 -08:00
Frank Yung-Fong Tang
80414a247b ICU-22224 Enable UBSAN and fix breakage
See #2324
2023-02-27 17:31:49 -08:00
Frank Tang
638acd0c38 ICU-21374 Add a CFI build bot for ICU4C
Add the GitHub Actions bot to build with CFI.
Also fix all known issues that require changing C-style casts to
static_cast inside the i18n and common directories, where we are sure about
the object type, and use C++-style dynamic_cast for base-to-derived casts in
other code inside i18n and common, and in test code and tools.
Change to use const_cast for casting between const and non-const.
2023-02-06 15:47:14 -08:00
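The three cast styles named in the commit message above can be illustrated with a minimal sketch. The types `Base` and `Derived` and the helper functions are hypothetical and are not taken from ICU; they only show when each cast would typically apply.

```cpp
#include <cassert>

// Hypothetical types for illustration only; not ICU classes.
struct Base {
    virtual ~Base() = default;
};
struct Derived : Base {
    int value = 7;
};

// static_cast: the caller guarantees that b really points to a Derived.
inline int readKnownDerived(Base* b) {
    return static_cast<Derived*>(b)->value;
}

// dynamic_cast: the dynamic type is checked at run time; -1 signals a mismatch.
inline int readMaybeDerived(Base* b) {
    Derived* d = dynamic_cast<Derived*>(b);
    return d != nullptr ? d->value : -1;
}

// const_cast: changes only constness, never the pointed-to type.
inline int* dropConst(const int* p) {
    return const_cast<int*>(p);
}
```

A C-style cast would compile in all three places, which is why it hides the intent that the named casts make explicit.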
Andy Heninger
67a7e2caf0 ICU-21180 RuleBasedBreakIterator, refactor init.
In class RuleBasedBreakIterator, refactor how object initialization is handled
by the various constructors, taking advantage of C++11's ability to directly
initialize data members in the class declaration.

This will simplify ongoing maintenance of the code by eliminating the need
to keep initialization lists synchronized with the class data members.
This is being done now in preparation for additional changes to fix problems
with the handling of memory allocation failures.
2022-11-02 16:25:41 -07:00
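As a rough illustration of the C++11 feature this commit message refers to, the sketch below uses default member initializers so that each constructor only sets what differs from the defaults. `SketchIterator` and its fields are hypothetical and are not the real RuleBasedBreakIterator layout.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical class for illustration; not ICU's RuleBasedBreakIterator.
class SketchIterator {
public:
    // No initializer list needed: the defaults below apply.
    SketchIterator() = default;

    // Only the member that differs from its default is mentioned.
    explicit SketchIterator(int32_t start) : fPosition(start) {}

    int32_t position() const { return fPosition; }
    int32_t ruleStatusIndex() const { return fRuleStatusIndex; }
    bool done() const { return fDone; }

private:
    // Default member initializers live next to the declarations, so adding
    // a new member cannot leave it uninitialized in some constructor.
    int32_t fPosition = 0;
    int32_t fRuleStatusIndex = 0;
    bool fDone = false;
};
```

With this layout there is no per-constructor initialization list to keep in sync with the member declarations.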
Andy Heninger
866254ef12 ICU-21180 BreakIterator, change all NULL to nullptr
In the C++ break iterator code, change all use of NULL to nullptr.
This is in preparation for follow-on PRs to improve out-of-memory error handling
in Break Iterators, keeping use of nullptr consistent between old and new
or updated code.
2022-10-26 18:55:48 -07:00
Fredrik Roubert
030fa1a479 ICU-21148 Consistently use standard lowercase true/false everywhere.
This is the normal standard way in C, C++ as well as Java and there's no
longer any reason for ICU to be different. The various internal macros
providing custom boolean constants can all be deleted and code as well
as documentation can be updated to use lowercase true/false everywhere.
2022-09-07 20:56:33 +02:00
Andy Heninger
85705f04e0 ICU-21960 C++20 Warnings from ATOMIC_VAR_INIT
Remove the ICU macros ATOMIC_INT32_T_INITIALIZER and U_INITONCE_INITIALIZER,
which made use of C++ ATOMIC_VAR_INIT, which has been removed from C++20.

With modern C++ features being available, these macros no longer served
any real need.
2022-05-17 15:45:06 -07:00
allenwtsu
d0290c03db ICU-21699 Phrase based breaking(C++)
See #1936
2022-01-13 20:22:05 -08:00
Andy Heninger
081cf77330 ICU-21662 Improve UVector error handling.
- Add updated versions of UVector::addElement() and ensureCapacity() that respect
  incoming errors.
  Follow on to c26aebe, which renamed the original versions.

- Add UVector::adoptElement() as a replacement for addElement() when the vector
  has a deleter function set, meaning that it adopts ownership of its elements.

  The intent is to make the behavior clearer at the call sites when looking
  at unfamiliar code.

- Make all functions with an incoming failure, as indicated by a UErrorCode parameter,
  leave the vector unchanged.

- Change all functions that store object pointers into the vector such that,
  when the store cannot be completed for any reason _and_ the vector has a deleter function,
  then the incoming object is deleted.

  This change can simplify the error handling code around calls to the affected functions
  (addElement() and insertElementAt(), in particular)

- Add index bounds checking on functions where it was possible - that is, on functions
  with both UErrorCode and index parameters.

- Changed to more modern C++ idioms in some parts of the UVector implementation.

- Review & update as required all uses of the UVector functions
  setElementAt(), insertElementAt(), setSize(), sortedInsert(),
  these being the functions with changed behavior on error conditions
  (aside from addElement()).

This PR will be followed by more, switching call sites in various ICU services
from UVector::addElementX() (old behavior on errors)
to   UVector::addElement()  (new behavior on errors)
2021-09-02 19:15:36 -07:00
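The ownership rule described in this commit message (on failure the vector is left unchanged, and an adopted object is deleted when a deleter is set) might look roughly like the sketch below. `AdoptingVector`, `SketchErrorCode`, and `countingDelete` are hypothetical stand-ins written for this illustration; they are not the UVector or UErrorCode API, and only an incoming failure and a bad index are modeled as failure causes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical error codes; a stand-in for an ICU-style in/out status.
enum SketchErrorCode {
    SKETCH_ZERO_ERROR = 0,
    SKETCH_INDEX_OUT_OF_BOUNDS = 1,
    SKETCH_MEMORY_ALLOCATION_ERROR = 2
};
inline bool sketchFailure(SketchErrorCode c) { return c != SKETCH_ZERO_ERROR; }

using Deleter = void (*)(void*);

// Test helper: deletes an int and counts how many deletions happened.
static int gDeleted = 0;
inline void countingDelete(void* p) {
    delete static_cast<int*>(p);
    ++gDeleted;
}

class AdoptingVector {
public:
    explicit AdoptingVector(Deleter d) : fDeleter(d) {}
    ~AdoptingVector() {
        if (fDeleter != nullptr) {
            for (void* p : fElements) fDeleter(p);
        }
    }
    AdoptingVector(const AdoptingVector&) = delete;
    AdoptingVector& operator=(const AdoptingVector&) = delete;

    // On an incoming failure: vector unchanged, adopted object deleted.
    void adoptElement(void* obj, SketchErrorCode& status) {
        if (sketchFailure(status)) {
            if (fDeleter != nullptr) fDeleter(obj);
            return;
        }
        fElements.push_back(obj);
    }

    // Same rule, plus index bounds checking that reports through status.
    void insertElementAt(void* obj, std::size_t index, SketchErrorCode& status) {
        if (!sketchFailure(status) && index > fElements.size()) {
            status = SKETCH_INDEX_OUT_OF_BOUNDS;
        }
        if (sketchFailure(status)) {
            if (fDeleter != nullptr) fDeleter(obj);
            return;
        }
        fElements.insert(fElements.begin() + static_cast<std::ptrdiff_t>(index), obj);
    }

    std::size_t size() const { return fElements.size(); }

private:
    Deleter fDeleter;
    std::vector<void*> fElements;
};
```

Because the callee disposes of the object on every failure path, a call site can write `v.adoptElement(new T(...), status)` without its own cleanup branch.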
Fredrik Roubert
0a1cfa398c ICU-20973 Use standard keywords true & false to initialize type bool.
Now that all equality operators return standard bool (commit 633438f),
it no longer makes any sense to use the ICU4C constants TRUE & FALSE
or local variables of type UBool for their return value.
2021-08-26 18:53:10 +02:00
Fredrik Roubert
633438f8da ICU-20973 Change all equality operator return types from UBool to bool.
2021-08-17 00:35:00 +02:00
luz paz
73eca0a9c9 ICU-21580 Fix typos in icu4c/
Found via `codespell -q 3 -L ans,anumber,atleast,ba,bre,hace,nd,nin,ois,rsource,som,sur,tht -S icu4c/source/data/zone,icu4c/source/data/lang`
ICU-21580 Fix source (related) typos
ICU-21580 Revert extraneous auto-encoding
ICU-21580 Re-add previous reverted fix without auto-encoding
2021-07-19 13:22:38 -05:00
Erik Torres
3f043c7693 ICU-21555 Fix typos from G to L
See #1737
2021-06-07 16:09:09 -07:00
Erik Torres Aguilar
bd3b202741 ICU-21018 Fix typos across repo that start with letter A
See #1506
2021-01-06 15:15:35 -08:00
Andy Heninger
003b431540 ICU-13590 RBBI, improve handling of concurrent look-ahead rules.
Change the mapping from rule number to boundary position to use a simple array
instead of a linear search lookup map.

Look-ahead rules have a preceding context, a boundary position, and following context.
In the implementation, when the preceding context matches, the potential boundary
position is saved. Then, if the following context proves to match, the saved boundary is
returned as an actual boundary.

Look-ahead rules are numbered, and the implementation maintains a map from
rule number to the tentative saved boundary position.

In an earlier improvement to the rule builder, the rule numbering was changed from the
original sparse numbering to a contiguous sequence, in anticipation of
changing the mapping from number to position to use a simple array.
2020-07-21 14:39:15 -07:00
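Once rule numbers are contiguous, the map from rule number to saved boundary position can be a plain array indexed by rule number. The sketch below shows that idea under stated assumptions: `LookAheadSlots` is a hypothetical class, -1 is an assumed "nothing saved" marker, and rule numbers are assumed to run from 0 to ruleCount - 1. It is not the RBBI implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper for illustration; not ICU's internal data structure.
class LookAheadSlots {
public:
    // One slot per look-ahead rule; -1 (assumed marker) means nothing saved.
    explicit LookAheadSlots(int32_t ruleCount)
        : fPositions(static_cast<std::size_t>(ruleCount), -1) {}

    // Called when a rule's preceding context has matched.
    void save(int32_t ruleNumber, int32_t position) {
        fPositions[static_cast<std::size_t>(ruleNumber)] = position;
    }

    // Called when the following context has matched: constant-time lookup
    // instead of a linear search through (rule, position) pairs.
    int32_t get(int32_t ruleNumber) const {
        return fPositions[static_cast<std::size_t>(ruleNumber)];
    }

private:
    std::vector<int32_t> fPositions;
};
```

The array only works because the numbering is dense; with the earlier sparse numbering it would have needed one slot per possible rule number.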
Andy Heninger
1eef362329 ICU-13565 Break Iteration, remove the dictionary bit from the implementation.
For identifying text that needs to be handled by a word dictionary for Break Iteration,
change from using a bit in the character category to sorting all dictionary categories
together, and recording the boundary between the non-dictionary and dictionary ranges.

This is internal to the implementation. It does not affect behavior.
It does increase the number of character categories that can be handled using a
compact 8 bit Trie, from 127 to 255.
2020-06-17 12:00:14 -07:00
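The change from a flag bit to a category range can be sketched as follows. Everything here is an assumption made for illustration: the flag value `kOldDictBit`, the `CategoryTable` struct, and the choice to place dictionary categories after the non-dictionary ones are hypothetical and are not taken from the ICU data format.

```cpp
#include <cassert>
#include <cstdint>

// Old style (hypothetical flag value): one bit of the 8-bit category marks
// "dictionary", leaving only 7 bits for the category number itself.
constexpr uint8_t kOldDictBit = 0x80;
inline bool oldIsDictionary(uint8_t category) {
    return (category & kOldDictBit) != 0;
}

// New style: categories are sorted so that all dictionary categories are
// contiguous, and only the boundary between the two ranges is recorded.
// Assumption: dictionary categories come after the non-dictionary ones.
struct CategoryTable {
    uint8_t dictCategoriesStart;  // first dictionary category number
};
inline bool isDictionaryCategory(const CategoryTable& t, uint8_t category) {
    return category >= t.dictCategoriesStart;
}
```

With no bit reserved for the flag, all eight bits of the category value are available for category numbers, which is consistent with the larger limit the commit message reports.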
Andy Heninger
f0ad454691 ICU-13565 RBBI, make all state table row data be unsigned.
2020-06-01 20:05:17 -07:00
Frank Tang
c5ebb80a73 ICU-13565 Reduce size of BreakIterator brk files
See #1100
2020-05-27 14:26:10 -07:00
Frank Tang
94c9ff2089 ICU-20991 Trace BreakIterator/BreakEngine creation
See #1014
2020-03-06 14:18:43 -08:00
Andy Heninger
faa2f9f9e1 ICU-20303 Break Iterator, improve handling of look-ahead rules.
- Merge the look-ahead results slots used when multiple rules share a common accepting state.
- Sequentially number the look-ahead result slot. Will eventually allow replacing the runtime map with an array.
- Inhibit chaining out of look-ahead rules. This could never actually happen; when a hard break
  rule matches, the engine is stopped immediately, but the state table was being constructed
  as if it could happen. Reduces table size for line break rules.
- Remove incorrect handling of fAccepting and fLookAhead fields of a state table row
  when removing duplicate states. Look-ahead slot number was being mis-interpreted as a state number.
2019-12-13 13:17:21 -08:00
Markus Scherer
f02b496494 ICU-20783 C++ covariant return types: clone(), freeze() & friends
2019-08-22 16:24:41 -07:00
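For readers unfamiliar with the language feature named in this commit title, a minimal sketch of a covariant return type on clone() follows. `Shape` and `Circle` are hypothetical classes, not ICU types.

```cpp
#include <cassert>

// Hypothetical classes for illustration only.
class Shape {
public:
    virtual ~Shape() = default;
    virtual Shape* clone() const = 0;
};

class Circle : public Shape {
public:
    explicit Circle(int r) : radius(r) {}

    // Covariant return type: the override returns Circle* rather than
    // Shape*, so callers holding a Circle need no downcast.
    Circle* clone() const override { return new Circle(*this); }

    int radius;
};
```

A caller that has a `Circle` can write `Circle* copy = c.clone();` directly, while code working through `Shape*` still gets a `Shape*` from the same virtual call.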
Fredrik Roubert
5d6d29b76a ICU-20601 Remove superfluous semicolons (-Wextra-semi-stmt).
These are the same changes for the C++ code as were done for the C code
by commit 17606e0345.
2019-08-15 12:30:21 +02:00
Shane Carr
ab657778e4 ICU-20543 Fix -Wundef in library and test code.
2019-04-10 18:52:16 -07:00
Shane Carr
b596462d5a ICU-20508 Fixing -Wextra-semi in library code.
2019-03-22 15:29:45 -07:00
Jeff Genovy
5c8960e59e ICU-20074 Revise UPRV_UNREACHABLE macro to always call abort().
Moved the macro from platform.h to uassert.h.
Removed any "unreachable" code that previously occurred after the UPRV_UNREACHABLE macro is used.
Changes based on review from Andy.

Co-authored-by: Daniel Ju <daju@microsoft.com>
2019-01-24 18:50:04 -08:00
Daniel Ju
7453181fff ICU-20074 Define UPRV_UNREACHABLE macro for unreachable code
Replaced occurrences of U_ASSERT(FALSE) with new UPRV_UNREACHABLE macro.
2019-01-14 14:16:26 -08:00
Andy Heninger
6e5a5463b4 ICU-20043 Compile warning fix with improved portability. (#78)
2018-09-27 14:27:38 -07:00
Daniel Ju
b13c951348 ICU-20043 ICU-13214 ICU-13764 MSVC W3 and W4 warning cleanup (#53)
Cleaned up all of the MSVC W3 warnings and most of the W4 warnings in the common and i18n projects.
2018-09-27 14:27:38 -07:00
Andy Heninger
aead9fb553 ICU-13194 RBBI auto reverse tables: size reduction, and remove hand written rules.
X-SVN-Rev: 41163
2018-03-28 01:20:13 +00:00
Andy Heninger
b1b0be93ea ICU-13194 RBBI safe tables, all tests passing!
X-SVN-Rev: 41155
2018-03-26 23:01:16 +00:00
Andy Heninger
660d38bc7f ICU-13194 rbbi safe rule synth, work in progress.
X-SVN-Rev: 41118
2018-03-17 00:34:48 +00:00
Andy Heninger
0a41842733 ICU-13541 rbbi.cpp, try again to fix xlC build problem.
X-SVN-Rev: 41042
2018-03-01 21:00:46 +00:00
Andy Heninger
627506cfb1 ICU-13541 RBBI object layout optimizations, revert failed AIX fix.
X-SVN-Rev: 41040
2018-03-01 19:33:46 +00:00
Andy Heninger
595e9e61c4 ICU-13541 RBBI object layout optimizations, try to fix AIX build.
X-SVN-Rev: 40987
2018-02-26 22:59:42 +00:00
Andy Heninger
8640bee541 ICU-10688 Remove redundant break type logic from BreakIterators. Merge to trunk.
X-SVN-Rev: 40967
2018-02-21 23:10:10 +00:00
Markus Scherer
c9d3abe36f ICU-11955 return nullptr without dereferencing when out-of-memory
X-SVN-Rev: 40943
2018-02-16 22:32:05 +00:00
Andy Heninger
3d4a3fbaa8 ICU-13569 rbbi state table opt, work in progress.
X-SVN-Rev: 40855
2018-02-08 01:42:04 +00:00
Andy Heninger
628ec44872 ICU-13541 RBBI patch #2 from grhoten. Optimize object layout.
X-SVN-Rev: 40812
2018-01-27 01:07:26 +00:00
Andy Heninger
ac0972f12c ICU-13541 Improve RuleBasedBreakIterator construction time, patch from grhoten.
X-SVN-Rev: 40789
2018-01-19 22:30:56 +00:00
Markus Scherer
27f8d70bcd ICU-13503 declare variable-length array at end of struct with length 1 to disable bounds checkers
X-SVN-Rev: 40736
2017-12-14 21:25:46 +00:00
Andy Heninger
023e8b289f ICU-10688 Remove break iterator type logic. It's implicit from the rules.
X-SVN-Rev: 40687
2017-12-04 02:14:32 +00:00
Andy Heninger
ca7b62180e ICU-10688 branch, work in progress.
X-SVN-Rev: 40686
2017-12-03 00:36:54 +00:00
Andy Heninger
cb1f0a68f4 ICU-9954 Fix coverity warning.
X-SVN-Rev: 40436
2017-09-20 22:58:39 +00:00
Andy Heninger
4e1c4096a6 ICU-9954 Break Iteration, remove reverse rules, add boundary caching.
X-SVN-Rev: 40433
2017-09-19 18:17:22 +00:00
Andy Heninger
2b5557fce6 ICU-12519 Break Iterator assignment handles Locales.
X-SVN-Rev: 40301
2017-07-31 20:20:37 +00:00
Andy Heninger
a3a2b57516 ICU-12507 ICU4C RBBI, switch to UTrie2
X-SVN-Rev: 40105
2017-05-03 23:44:14 +00:00
Andy Heninger
b1880dfdb7 ICU-13028 Thread safe static init of default string for RuleBasedBreakIterator::getRules()
X-SVN-Rev: 40074
2017-04-23 19:35:52 +00:00