Add the GitHub Actions bot to build with CFI
Also fix all known issues that require changing a C-style cast to
static_cast inside the i18n and common directories, where we are sure of
the object's type, and use
C++-style dynamic_cast for base-to-derived casts in other code inside i18n
and common, and in test and tool code.
Use const_cast for casting between const and non-const.
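The three kinds of cast described above could look roughly like the sketch below. The classes Base, Derived and Other and the helper functions are hypothetical, chosen only for illustration; they are not ICU code.

```cpp
#include <cassert>

// Hypothetical class hierarchy, for illustration only (not ICU classes).
struct Base {
    virtual ~Base() = default;
};
struct Derived : Base {
    int value = 42;
};
struct Other : Base {};

// static_cast: only where the dynamic type of the object is known for certain.
int readKnownDerived(Base* b) {
    return static_cast<Derived*>(b)->value;
}

// dynamic_cast: base-to-derived cast where the type has to be checked.
// Returns -1 when b does not point to a Derived.
int readMaybeDerived(Base* b) {
    Derived* d = dynamic_cast<Derived*>(b);
    return d != nullptr ? d->value : -1;
}

// const_cast: only to add or remove const.
int* dropConst(const int* p) {
    return const_cast<int*>(p);
}
```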
In class RuleBasedBreakIterator, refactor how object initialization is handled
by the various constructors, taking advantage of C++11's ability to directly
initialize data members in the class declaration.
This will simplify ongoing maintenance of the code by eliminating the need
to keep initialization lists synchronized with the class data members.
This is being done now in preparation for additional changes to fix problems
with the handling of memory allocation failures.
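A minimal sketch of the idea, using a hypothetical class Iter rather than the real RuleBasedBreakIterator; the member names are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical class, for illustration only (not RuleBasedBreakIterator).
class Iter {
public:
    // No initialization list needed: the defaults below apply.
    Iter() = default;

    // Only the member that differs from its default is listed here.
    explicit Iter(int32_t position) : fPosition(position) {}

    int32_t position() const { return fPosition; }
    bool done() const { return fDone; }
    const char* text() const { return fText; }

private:
    // Default member initializers, applied by every constructor
    // unless that constructor's initialization list overrides them.
    const char* fText = nullptr;
    int32_t fPosition = 0;
    bool fDone = false;
};
```

Adding a new data member then needs only one initializer in the class declaration, instead of an update to each constructor.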
In the C++ break iterator code, change all use of NULL to nullptr.
This is in preparation for follow-on PRs to improve out-of-memory error handling
in Break Iterators, keeping use of nullptr consistent between old and new
or updated code.
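One practical difference between the two spellings, shown with hypothetical overloads (not ICU functions): nullptr has its own type and can only convert to a pointer, while a literal 0 is an int.

```cpp
#include <cassert>

// Hypothetical overloads, for illustration only.
int which(int) { return 1; }
int which(const char*) { return 2; }

// nullptr selects the pointer overload; a literal 0 selects the int overload.
int fromNullptr() { return which(nullptr); }
int fromZero() { return which(0); }
```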
This is the standard way in C, C++ and Java, and there's no
longer any reason for ICU to be different. The various internal macros
providing custom boolean constants can all be deleted and code as well
as documentation can be updated to use lowercase true/false everywhere.
Remove the ICU macros ATOMIC_INT32_T_INITIALIZER and U_INITONCE_INITIALIZER,
which made use of C++ ATOMIC_VAR_INIT, which has been deprecated in C++20.
With modern C++ features being available, these macros no longer served
any real need.
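As a rough illustration, a std::atomic can be initialized directly with braces, with no initializer macro. The names gRefCount and increment() are hypothetical and do not come from ICU.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical counter, for illustration only.
// Direct brace initialization; no ATOMIC_VAR_INIT-style macro is involved.
static std::atomic<int32_t> gRefCount{0};

// Atomically adds one and returns the new value.
int32_t increment() {
    return gRefCount.fetch_add(1) + 1;
}
```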
- Add updated versions of UVector::addElement() and ensureCapacity() that respect
incoming errors.
Follow on to c26aebe, which renamed the original versions.
- Add UVector::adoptElement() as a replacement for addElement() when the vector
has a deleter function set, meaning that it adopts ownership of its elements.
The intent is to make the behavior clearer at the call sites when looking
at unfamiliar code.
- Make all functions with an incoming failure, as indicated by a UErrorCode parameter,
leave the vector unchanged.
- Change all functions that store object pointers into the vector such that,
when the store cannot be completed for any reason _and_ the vector has a deleter function,
then the incoming object is deleted.
This change can simplify the error handling code around calls to the affected functions
(addElement() and insertElementAt(), in particular).
- Add index bounds checking on functions where it is possible, that is, on functions
with both UErrorCode and index parameters.
- Change to more modern C++ idioms in some parts of the UVector implementation.
- Review and update, as required, all uses of the UVector functions
setElementAt(), insertElementAt(), setSize(), and sortedInsert(),
these being the functions with changed behavior on error conditions
(aside from addElement()).
This PR will be followed by more, switching call sites in various ICU services
from UVector::addElementX() (old behavior on errors)
to UVector::addElement() (new behavior on errors).
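The error-handling rules listed above can be sketched with a small stand-in class. PtrVector, ErrorCode, failed() and deleteInt() are all hypothetical and greatly simplified; this is not the UVector implementation, and allocation failure inside std::vector is not handled here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins, for illustration only (not the ICU declarations).
enum ErrorCode { kZeroError = 0, kIndexOutOfBounds = 1 };
inline bool failed(ErrorCode e) { return e > kZeroError; }
using Deleter = void (*)(void*);

class PtrVector {
public:
    explicit PtrVector(Deleter d = nullptr) : fDeleter(d) {}
    ~PtrVector() {
        if (fDeleter != nullptr) {
            for (void* p : fElements) { fDeleter(p); }
        }
    }
    PtrVector(const PtrVector&) = delete;
    PtrVector& operator=(const PtrVector&) = delete;

    // Takes ownership of obj. On an incoming failure the vector is left
    // unchanged and obj is deleted, so the caller needs no cleanup code.
    void adoptElement(void* obj, ErrorCode& status) {
        if (failed(status)) {
            discard(obj);
            return;
        }
        fElements.push_back(obj);
    }

    // Bounds-checked insert with the same ownership rule.
    void insertElementAt(void* obj, int32_t index, ErrorCode& status) {
        if (!failed(status) && (index < 0 || index > size())) {
            status = kIndexOutOfBounds;
        }
        if (failed(status)) {
            discard(obj);
            return;
        }
        fElements.insert(fElements.begin() + index, obj);
    }

    int32_t size() const { return static_cast<int32_t>(fElements.size()); }

private:
    // Deletes obj only when the vector owns its elements.
    void discard(void* obj) {
        if (fDeleter != nullptr) { fDeleter(obj); }
    }
    std::vector<void*> fElements;
    Deleter fDeleter;
};

// Hypothetical deleter that counts its calls, so the behavior can be observed.
static int gDeleted = 0;
void deleteInt(void* p) {
    delete static_cast<int*>(p);
    ++gDeleted;
}
```

With this shape, a call site can write `v.adoptElement(new int(1), status);` and check status once at the end, without a separate delete on the failure path.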
Now that all equality operators return standard bool (commit 633438f),
it no longer makes any sense to use the ICU4C constants TRUE & FALSE
or local variables of type UBool for their return value.
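For example, with a hypothetical Point type (not an ICU class), the operators return bool and use the built-in true/false values directly:

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical value type, for illustration only.
struct Point {
    int x;
    int y;
};

// Equality operators return standard bool, with no custom boolean constants.
bool operator==(const Point& a, const Point& b) {
    return a.x == b.x && a.y == b.y;
}
bool operator!=(const Point& a, const Point& b) {
    return !(a == b);
}

// Compile-time check that the result type really is bool.
static_assert(
    std::is_same<decltype(std::declval<Point>() == std::declval<Point>()), bool>::value,
    "operator== returns bool");
```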
Change the mapping from rule number to boundary position to use a simple array
instead of a linear search lookup map.
Look-ahead rules have a preceding context, a boundary position, and following context.
In the implementation, when the preceding context matches, the potential boundary
position is saved. Then, if the following context proves to match, the saved boundary is
returned as an actual boundary.
Look-ahead rules are numbered, and the implementation maintains a map from
rule number to the tentative saved boundary position.
In an earlier improvement to the rule builder, the rule numbering was changed from the
original sparse numbering to a contiguous sequence, in anticipation of
changing the mapping from number to position to use a simple array.
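A minimal sketch of an array-based mapping, assuming rule numbers run contiguously from 0 and that -1 marks "no position saved". The class LookAheadSlots and its method names are hypothetical, not the actual break iterator code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch, for illustration only.
class LookAheadSlots {
public:
    // One slot per look-ahead rule; -1 means no tentative boundary is saved.
    explicit LookAheadSlots(int32_t ruleCount)
        : fPositions(static_cast<std::size_t>(ruleCount), -1) {}

    // Called when a rule's preceding context has matched.
    void setPosition(int32_t rule, int32_t pos) {
        fPositions[static_cast<std::size_t>(rule)] = pos;
    }

    // Called when the following context has matched: direct index, no search.
    int32_t getPosition(int32_t rule) const {
        return fPositions[static_cast<std::size_t>(rule)];
    }

private:
    std::vector<int32_t> fPositions;
};
```

Because the numbering is contiguous, the rule number can be used as the index and no slots are wasted.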
For identifying text that needs to be handled by a word dictionary for Break Iteration,
change from using a bit in the character category to sorting all dictionary categories
together, and recording the boundary between the non-dictionary and dictionary ranges.
This is internal to the implementation. It does not affect behavior.
It does increase the number of character categories that can be handled using a
compact 8 bit Trie, from 127 to 255.
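The difference between the two schemes can be sketched as follows. The names are hypothetical, and the choice of the top bit (0x80) as the old flag is an assumption made for the example, not taken from the source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch, for illustration only.

// Old scheme (bit position assumed): one bit of the 8-bit value flags a
// dictionary category, leaving 7 bits for the category number itself.
inline bool isDictionaryCategoryOld(uint8_t value) {
    return (value & 0x80) != 0;
}

// New scheme: dictionary categories are sorted to the end of the range, and
// only the first dictionary category number is recorded.
struct CategoryTable {
    uint8_t dictCategoriesStart;
};

// All 8 bits are available for the category number.
inline bool isDictionaryCategory(const CategoryTable& t, uint8_t category) {
    return category >= t.dictCategoriesStart;
}
```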
- Merge the look-ahead results slots used when multiple rules share a common accepting state.
- Sequentially number the look-ahead result slots. This will eventually allow replacing the runtime map with an array.
- Inhibit chaining out of look-ahead rules. This could never actually happen; when a hard break
rule matches, the engine is stopped immediately, but the state table was being constructed
as if it could happen. Reduces table size for line break rules.
- Remove incorrect handling of fAccepting and fLookAhead fields of a state table row
when removing duplicate states. Look-ahead slot number was being mis-interpreted as a state number.
Moved the macro from platform.h to uassert.h.
Removed any "unreachable" code that previously followed a use of the UPRV_UNREACHABLE macro.
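A sketch of the pattern, using a hypothetical stand-in macro SKETCH_UNREACHABLE that is assumed to expand to a call that never returns; the actual definition of UPRV_UNREACHABLE may differ.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical stand-in for an "unreachable" macro; assumed never to return.
#define SKETCH_UNREACHABLE std::abort()

enum Color { kRed, kGreen };

const char* colorName(Color c) {
    switch (c) {
        case kRed:   return "red";
        case kGreen: return "green";
    }
    SKETCH_UNREACHABLE;
    // No trailing "return nullptr;" here: it could never execute.
}
```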
Changes based on review from Andy.
Co-authored-by: Daniel Ju <daju@microsoft.com>