Commit graph

360 commits

Author SHA1 Message Date
Robin Leroy
215131c1a4 ICU-22984 Remove some obnoxious tests from 2003 2025-02-19 23:30:34 +01:00
Robin Leroy
8a11097b2e ICU-22984 Generate the C++ UAX29 monkeys 2025-01-29 10:02:44 +01:00
Robin Leroy
a154b5839c ICU-22984 code motion: move SegmentationRule before RBBIMonkeyKind 2025-01-29 10:02:44 +01:00
Robin Leroy
7d60bb844e ICU-22986 GL takes CM 2024-12-20 03:54:59 +01:00
Robin Leroy
e59065cc74 ICU-22984 Clean up old monkeys 2024-12-04 18:38:23 +01:00
Robin Leroy
757f27cd35 ICU-22984 Move old monkeys 2024-12-04 18:38:23 +01:00
Robin Leroy
3f959352b5 ICU-22984 Optimize old monkeys 2024-12-04 18:38:23 +01:00
Robin Leroy
5519b85730 ICU-22984 Generate old monkeys 2024-12-04 18:38:23 +01:00
Robin Leroy
e000c5c3cc ICU-22127 Remove obsolete WordBreakTest.txt known issues 2024-11-22 18:40:54 +01:00
Robin Leroy
0b9eb9ca71 ICU-22956 Use InCB for grapheme cluster segmentation 2024-11-12 10:45:16 +01:00
Robin Leroy
8d86ca142e ICU-22941 Revert "ICU-22112 word break updates for @,colon; colon tailorings for fi,sv"
This reverts commit 49d192fefe.
2024-11-05 22:59:24 +01:00
Robin Leroy
ca9fcca3c7 ICU-21097 Remove LineBreakTest.txt workarounds 2024-10-09 00:03:34 +02:00
Frank Yung-Fong Tang
8437d1d86b ICU-22767 Fix GCC warning and turn warning to errors
See #3129
2024-10-02 13:35:03 -07:00
Fredrik Roubert
37b2bc6999 ICU-22721 Use correct initializer list syntax.
This will make the code ever so slightly simpler but more importantly
make it possible to compile also when using -D_GLIBCXX_DEBUG.
2024-08-13 21:33:53 -07:00
Robin Leroy
20fdebcb35 ICU-22707 UTC-180? Give up on 16.0β rules, amend LB10 and LB21a instead. 2024-07-18 23:56:34 +00:00
Robin Leroy
cc64ec7c94 ICU-22707 feed more bits to the starving monkeys: ranlux48 rather than a 32-bit LCG 2024-07-18 23:56:34 +00:00
Robin Leroy
47a8ea4065 ICU-22707 smarter old monkeys: refine the partition on interesting sets 2024-07-18 23:56:34 +00:00
Robin Leroy
79f4745494 ICU-22707 UTC-179-C25 Limit LB21a to the Hebrew-hyphen-non-Hebrew case 2024-07-18 23:56:34 +00:00
Robin Leroy
83f3334b96 ICU-22707 UTC-179-C32 Upstream and improve the old Finnish tailoring LB20a from CLDR-3029 and ICU-8151 2024-07-18 23:56:34 +00:00
Robin Leroy
36fe0f0660 ICU-22707 UTC-179-C28 Simplify the UAX14 formulation 2024-07-18 23:56:34 +00:00
Robin Leroy
84ff5dacf8 ICU-22707 UTC-179-C28 LB19 change for simplified chinese 2024-07-18 23:56:34 +00:00
Robin Leroy
1513b66c32 ICU-22707 UTC-179-C35 Remove redundant rule, caught after UTC-179. Called out in review note in https://www.unicode.org/reports/tr14/tr14-52.html#LB25. 2024-07-18 23:56:34 +00:00
Robin Leroy
d3b361f23a ICU-22707 UTC-179-C35 LB25 alignment with the UAX14 formulation from 15.1 and earlier. 2024-07-18 23:56:34 +00:00
Robin Leroy
509f552e38 ICU-22707 UTC-179-C35 No regexes for old monkeys: Express LB25 using UAX14-style rules rather than a regex 2024-07-18 23:56:34 +00:00
Robin Leroy
9391cbb0b3 ICU-22707 Print random 🙈🙉🙊🐵🐒 rather than ... so that progress is visible when the screen is full and report monkey counts 2024-07-18 23:56:34 +00:00
Fredrik Roubert
0178a07a26 ICU-22793 Clang-Tidy: google-readability-casting
https://releases.llvm.org/17.0.1/tools/clang/tools/extra/docs/clang-tidy/checks/google/readability-casting.html
2024-07-04 22:32:12 +02:00
Markus Scherer
fe23620f12 ICU-22707 appease rbbitst UBSan for RBBIStateTableRow 2024-04-29 17:00:55 -07:00
Fredrik Roubert
6ad78a08c7 ICU-22621 Clang-Tidy: readability-redundant-control-flow
https://releases.llvm.org/17.0.1/tools/clang/tools/extra/docs/clang-tidy/checks/readability/redundant-control-flow.html
2024-03-19 15:55:56 +01:00
Fredrik Roubert
5401c12018 ICU-22621 Clang-Tidy: modernize-use-nullptr
https://releases.llvm.org/17.0.1/tools/clang/tools/extra/docs/clang-tidy/checks/modernize/use-nullptr.html
2024-03-15 14:31:54 +01:00
Fredrik Roubert
2a1853c9a9 ICU-22621 Clang-Tidy: modernize-use-emplace
https://releases.llvm.org/17.0.1/tools/clang/tools/extra/docs/clang-tidy/checks/modernize/use-emplace.html
2024-03-13 16:31:47 +01:00
Robin Leroy
ba1208e49b ICU-22518 Add a flag to export the output of the reference implementation from the old segmentation monkey tests 2024-02-08 04:54:33 +01:00
Frank Tang
9832f48e22 ICU-22636 Return U_BRK_RULE_SYNTAX when status number is too large
See #2793
2024-01-19 17:16:54 -08:00
Frank Tang
19af9e7ce3 ICU-22602 Fix stack overflow inside flattenVariables
Limit the recursive call of flattenVariables to maximum depth 3500
since Java on my machine throw stack overflow exception around 3900.
2023-12-14 15:14:21 -08:00
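Note on 19af9e7ce3: the depth cap described in that message can be sketched as follows. This is a minimal illustration, not the ICU code: `Expr`, `makeNested`, `withinDepthLimit`, and `kMaxDepth` are hypothetical names, and only the figure 3500 comes from the commit message.

```cpp
#include <memory>
#include <utility>

// Hypothetical nested structure standing in for variable references
// that refer to further variable references.
struct Expr {
    std::unique_ptr<Expr> child;
};

// Depth cap; 3500 is the value quoted in the commit message.
constexpr int kMaxDepth = 3500;

// Builds a chain of `n` nested nodes iteratively (illustration helper).
std::unique_ptr<Expr> makeNested(int n) {
    std::unique_ptr<Expr> root;
    for (int i = 0; i < n; ++i) {
        std::unique_ptr<Expr> e(new Expr);
        e->child = std::move(root);
        root = std::move(e);
    }
    return root;
}

// Walks the structure recursively, but gives up (returns false) once the
// nesting exceeds kMaxDepth instead of recursing until the stack overflows.
bool withinDepthLimit(const Expr* e, int depth = 0) {
    if (e == nullptr) return true;
    if (depth > kMaxDepth) return false;
    return withinDepthLimit(e->child.get(), depth + 1);
}
```

A caller would treat a `false` result as an error rather than continuing; how the real code reports that error is not shown in the log.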
Frank Tang
4a7d61d261 ICU-22579 Fix Null deref while Unicode Set only has string 2023-12-12 14:39:12 -08:00
Frank Tang
8b14c05791 ICU-22585 Fix infinity loop while unicode set contains single surrogate 2023-12-11 15:33:12 -08:00
Frank Tang
7d3cd7cba5 ICU-22584 Fix def of nullptr
ICU-22584 fix
2023-12-11 14:35:10 -08:00
Frank Tang
73f972f7ff ICU-22581 Fix RBBI leakage
Duplicate variable references in the rule should not cause leakage
2023-12-08 15:47:51 -08:00
Andy Heninger
e6892996b1 ICU-22584 Fix RBBI rule builder stack overflow.
The problem was found by fuzz testing.

A rule consisting of a long literal string produces a large, unbalanced parse tree,
one node per string element. Deleting the tree was recursive, once per node, resulting
in deep recursion.

This PR changes node deletion to use an iterative (non-recursive) approach.

This change only affects rule building. There is no change to the RBBI run time
using pre-built rules.
2023-12-08 12:49:26 -08:00
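Note on e6892996b1: the switch from recursive to iterative node deletion can be sketched as follows. This is a minimal illustration under assumptions, not the ICU implementation: `Node`, `makeChain`, and `deleteTree` are hypothetical names, and the explicit `std::vector` work list is one common way to remove the recursion, not necessarily the one the PR uses.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical binary parse-tree node (not ICU's node class).
struct Node {
    Node* left = nullptr;
    Node* right = nullptr;
};

// Builds a maximally unbalanced tree: a left-leaning chain of `length`
// nodes, the shape a long literal string would produce.
Node* makeChain(std::size_t length) {
    Node* root = nullptr;
    for (std::size_t i = 0; i < length; ++i) {
        Node* n = new Node;
        n->left = root;
        root = n;
    }
    return root;
}

// Deletes the whole tree without recursion. Pending nodes live in a
// heap-allocated vector, so tree depth no longer consumes call stack.
// Returns the number of nodes deleted.
std::size_t deleteTree(Node* root) {
    std::size_t deleted = 0;
    std::vector<Node*> pending;
    if (root != nullptr) pending.push_back(root);
    while (!pending.empty()) {
        Node* n = pending.back();
        pending.pop_back();
        // Queue the children before freeing the parent.
        if (n->left != nullptr) pending.push_back(n->left);
        if (n->right != nullptr) pending.push_back(n->right);
        delete n;
        ++deleted;
    }
    return deleted;
}
```

With a recursive destructor, a chain of a few hundred thousand nodes can exhaust the call stack; here the depth of the tree only affects the size of the vector.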
Fredrik Roubert
f99f8c678b ICU-22522 Delete unused variables.
Clang 16 is more thorough in finding unused variables, so these must be
removed to be able to compile this code using Clang 16 and -Werror.
2023-11-30 15:34:36 +01:00
Frank Tang
9fb9bd4950 ICU-22342 Rename fillBreak to fillBreaks 2023-09-14 10:04:57 -07:00
Frank Tang
02d5e71903 ICU-22342 Implement ExternalBreakEngineAPI
ICU-22342 Fix comments
2023-08-30 11:43:16 -07:00
Elango Cheran
2e45e6ec0e ICU-22404 Unicode 15.1 beta data files & API constants
See #2492

Co-authored-by: Andy Heninger <andy.heninger@gmail.com>
Co-authored-by: Robin Leroy <egg.robin.leroy@gmail.com>
2023-07-13 19:26:14 -07:00
Peter Edberg
5618203821 ICU-22360 revert portions of #2159 which included @ in ALetter for wordbreak, update tests 2023-05-06 21:36:46 -07:00
Markus Scherer
b6dcc95d3c ICU-21833 remove redundant void parameter lists
See #2351
2023-03-02 09:31:57 -08:00
Frank Tang
638acd0c38 ICU-21374 Add a CFI build bot for ICU4C
Add the github action bot to build with cfi
Also fix all the known issues which require the change from C style cast to
static_cast inside the i18n and common directory while we are sure about
the object. and use
C++ style dynamic_cast for base-to-derive cast in other code inside i18n
and common and in test code or tool.
Change to use const_cast for casting between const / non-const
2023-02-06 15:47:14 -08:00
Fredrik Roubert
2de88f9d9c ICU-21833 Replace UChar with char16_t in all C++ code. 2023-02-06 19:27:44 +01:00
Frank Tang
de0a28644b ICU-22251 Move sprintf to snprintf.
See #2291
2023-01-25 23:23:29 -08:00
Shuhei Iitsuka
b6b7b045e9 ICU-22100 Incorporate BudouX into ICU (C++) 2022-12-02 10:11:06 -08:00
Andy Heninger
67a7e2caf0 ICU-21180 RuleBasedBreakIterator, refactor init.
In class RuleBasedBreakIterator, refactor how object initialization is handled
by the various constructors, taking advantage of C++11's ability to directly
initialize data members in the class declaration.

This will simplify ongoing maintenance of the code by eliminating the need
to keep initialization lists synchronized with the class data members.
This is being done now in preparation for additional changes to fix problems
with the handling of memory allocation failures.
2022-11-02 16:25:41 -07:00
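Note on 67a7e2caf0: the C++11 feature that message relies on, default member initializers, can be sketched as follows. `SketchIterator` and its members are hypothetical and are not the actual data members of RuleBasedBreakIterator.

```cpp
#include <cstdint>

// Hypothetical class for illustration only.
class SketchIterator {
public:
    // No initializer list needed: every member takes its in-class default.
    SketchIterator() = default;

    // Only the member that differs from its default is listed here.
    explicit SketchIterator(int32_t start) : fPosition(start) {}

    int32_t position() const { return fPosition; }
    int32_t ruleStatusIndex() const { return fRuleStatusIndex; }
    bool done() const { return fDone; }

private:
    // Defaults are declared once, next to the members themselves, so adding
    // a member does not require touching every constructor.
    int32_t fPosition = 0;
    int32_t fRuleStatusIndex = 0;
    bool fDone = false;
};
```

Each constructor then only mentions the members it sets differently, which is the maintenance benefit the commit message describes.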
Andy Heninger
866254ef12 ICU-21180 BreakIterator, change all NULL to nullptr
In the C++ break iterator code, change all use of NULL to nullptr.
This is in preparation for follow-on PRs to improve out-of-memory error handling
in Break Iterators, keeping use of nullptr consistent between old and new
or updated code.
2022-10-26 18:55:48 -07:00