Modify ICU4C and ICU4J test readers to handle all tests
Add `ignoreJava` and `ignoreCpp` properties to tests where needed
Includes parser bug fixes:
ICU4J: Require a complex-body after declarations
ICU4J: Correctly parse the complex body after an unsupported statement
ICU4J: Handle date params in tests and remove default params for tests
ICU4J: Handle decimal params in tests
ICU4J: Require whitespace before variable/literal in reserved annotation
ICU4J: Require whitespace between options
ICU4J: Require a variable-expression in an .input declaration
ICU4J: Don't require space between last key and pattern in variant
ICU4J: Don't require space between selectors
ICU4J: Allow whitespace after '=' in option
ICU4J: Parse escape sequences in quoted literals according to grammar
ICU4J: Allow whitespace within markup after attributes list
Using lowercase true/false is the standard way in C, C++, and Java, and
there's no longer any reason for ICU to be different. The various internal
macros providing custom boolean constants can all be deleted, and code as
well as documentation can be updated to use lowercase true/false everywhere.
- 7 new properties: API constants & property names
- u_stringHasBinaryProperty(s, property) & UCharacter.hasBinaryProperty(s, property)
- two additional source data files
- new genprops part for writing new binary data file uemoji.icu
- data for existing emoji properties moved from uprops.icu (hardcoded in C++) to uemoji.icu (always loaded)
- new EmojiProps implementation
1. Add GA to test BreakIterator under LSTM configuration (remove Thai
and Burmese dictionary and include Thai and Burmese LSTM)
2. Add LSTMDataName for the purpose of testing.
3. Add file-based test code to check that BreakIterator results match the
test file generated by the Python code in
https://github.com/unicode-org/lstm_word_segmentation/blob/master/segment_text.py
4. Fix a LSTMBreakEngine::divideUpDictionaryRange bug: the return value
should contain only the number of words found, even when the passed-in
foundBreaks already contains some data.
5. Change the cintltest TestSwapData from testing thaidict to laodict so
it will not break while we filter out thaidict under the LSTM
configuration.
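
The return-value fix in item 4 can be illustrated with a minimal sketch.
`appendBreaks` and its parameters are hypothetical stand-ins, not the actual
LSTMBreakEngine code; the point is only that the count is taken relative to
the size of foundBreaks on entry.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a break-engine step (not ICU code): appends the
// candidate boundaries strictly inside (start, end) to foundBreaks and
// returns how many it added.
int32_t appendBreaks(const std::vector<int32_t>& candidates,
                     int32_t start, int32_t end,
                     std::vector<int32_t>& foundBreaks) {
    // Remember how many entries the caller already had.
    const auto before = foundBreaks.size();
    for (int32_t pos : candidates) {
        if (pos > start && pos < end) {
            foundBreaks.push_back(pos);
        }
    }
    // Report only the entries added by this call, not the total size.
    return static_cast<int32_t>(foundBreaks.size() - before);
}
```

Returning `foundBreaks.size()` directly would over-count whenever the caller
passes in a vector that already holds boundaries from an earlier range.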
- Still allows "1234" or "cldrbug:1234" format ticket IDs
- However, the docs recommend the "ICU-1234" or "CLDR-1234" format
going forward.
- Other ticket IDs could be used, but won't be linkified.
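
A rough sketch of how the accepted ticket ID formats could be told apart.
The names `TicketKind`, `Ticket` and `classifyTicket` are hypothetical, and
treating a bare number as an ICU ticket is an assumption based on the legacy
format; this is not the actual commit-checker code.

```cpp
#include <regex>
#include <string>

// Hypothetical classification of ticket IDs (illustration only).
enum class TicketKind { ICU, CLDR, Other };

struct Ticket {
    TicketKind kind;
    std::string number;  // digits only; empty for TicketKind::Other
};

Ticket classifyTicket(const std::string& id) {
    // Assumption: a bare number such as "1234" refers to an ICU ticket.
    static const std::regex icu(R"((?:ICU-)?(\d+))");
    static const std::regex cldr(R"((?:cldrbug:|CLDR-)(\d+))");
    std::smatch m;
    if (std::regex_match(id, m, icu)) {
        return {TicketKind::ICU, m[1].str()};
    }
    if (std::regex_match(id, m, cldr)) {
        return {TicketKind::CLDR, m[1].str()};
    }
    // Any other ID is accepted but would not be linkified.
    return {TicketKind::Other, ""};
}
```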
- Use STATIC_NEW for mutex creation, to avoid order-of-destruction problems
by avoiding destruction altogether, while avoiding memory leak reports.
- Remove UConditionVar, replace with direct use of std::condition_variable
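
The idea behind constructing a mutex that is never destroyed can be sketched
as follows. `staticNew` and `neverDestroyedMutex` are hypothetical helpers
written for illustration; they are not ICU's STATIC_NEW macro, only one way
to get the same effect with placement new into static storage.

```cpp
#include <mutex>
#include <new>

// Hypothetical helper (not ICU's STATIC_NEW): constructs one T per type in
// static storage and never runs its destructor.
template <typename T>
T* staticNew() {
    // Raw, suitably aligned static bytes: no heap allocation is involved,
    // so leak checkers have nothing to report.
    alignas(T) static unsigned char storage[sizeof(T)];
    // The pointer is initialized once (thread-safe since C++11). Only the
    // pointer is a static object, so no destructor of T runs at exit.
    static T* instance = new (storage) T();
    return instance;
}

// Example accessor for a process-wide mutex that is never destroyed.
std::mutex& neverDestroyedMutex() {
    return *staticNew<std::mutex>();
}
```

Because the object is never destroyed, code that runs during static
destruction can still lock it safely.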
Remove the dependencies from the ICU library code on static constructors
that were introduced by using std::mutex and condition variables. The
mutexes are lazily initialized by embedding them as local static variables
in getter functions, and relying on the C++ compiler/runtime to do thread
safe initialization of them.
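
A minimal sketch of the getter pattern described above. The function names
and the guarded counter are hypothetical and only illustrate the technique;
the actual ICU code differs.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical getters: each object is constructed on the first call, and
// C++11 guarantees that this initialization is thread safe. No static
// constructor runs at library load time.
std::mutex& sharedMutex() {
    static std::mutex m;
    return m;
}

std::condition_variable& sharedCondition() {
    static std::condition_variable cv;
    return cv;
}

// Example use: a counter guarded by the lazily created mutex.
int incrementCounter() {
    static int counter = 0;
    std::lock_guard<std::mutex> lock(sharedMutex());
    ++counter;
    sharedCondition().notify_all();  // wake any threads waiting on a change
    return counter;
}
```

Unlike a namespace-scope `std::mutex`, nothing here is constructed until a
caller first asks for it.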
Other issues addressed:
* Some performance enhancements were added for good measure. Creating new RuleBasedNumberFormat objects can take a long time due to all the rule parsing. This was ported from ICU4J.
* I fixed a potential infinite recursion problem when RuleBasedNumberFormat used NumberFormat.createInstance, which could occasionally depend on creating RuleBasedNumberFormat for itself, which was bad. This was ported from ICU4J.
* I fixed a potential memory leak due to lazy initialization of some RBNF data members in a multithreaded environment, which is fine in Java, but it's not okay in C++. We no longer cast away const due to this, which is good.
* There were some compiler warnings and errors found while trying to debug this code on my machine. I fixed those too.
X-SVN-Rev: 37810