Robin Leroy
215131c1a4
ICU-22984 Remove some obnoxious tests from 2003
2025-02-19 23:30:34 +01:00
Robin Leroy
8a11097b2e
ICU-22984 Generate the C++ UAX29 monkeys
2025-01-29 10:02:44 +01:00
Robin Leroy
a154b5839c
ICU-22984 code motion: move SegmentationRule before RBBIMonkeyKind
2025-01-29 10:02:44 +01:00
Robin Leroy
7d60bb844e
ICU-22986 GL takes CM
2024-12-20 03:54:59 +01:00
Robin Leroy
e59065cc74
ICU-22984 Clean up old monkeys
2024-12-04 18:38:23 +01:00
Robin Leroy
757f27cd35
ICU-22984 Move old monkeys
2024-12-04 18:38:23 +01:00
Robin Leroy
3f959352b5
ICU-22984 Optimize old monkeys
2024-12-04 18:38:23 +01:00
Robin Leroy
5519b85730
ICU-22984 Generate old monkeys
2024-12-04 18:38:23 +01:00
Robin Leroy
e000c5c3cc
ICU-22127 Remove obsolete WordBreakTest.txt known issues
2024-11-22 18:40:54 +01:00
Robin Leroy
0b9eb9ca71
ICU-22956 Use InCB for grapheme cluster segmentation
2024-11-12 10:45:16 +01:00
Robin Leroy
8d86ca142e
ICU-22941 Revert "ICU-22112 word break updates for @,colon; colon tailorings for fi,sv"
This reverts commit 49d192fefe.
2024-11-05 22:59:24 +01:00
Robin Leroy
ca9fcca3c7
ICU-21097 Remove LineBreakTest.txt workarounds
2024-10-09 00:03:34 +02:00
Frank Yung-Fong Tang
8437d1d86b
ICU-22767 Fix GCC warning and turn warnings into errors
See #3129
2024-10-02 13:35:03 -07:00
Fredrik Roubert
37b2bc6999
ICU-22721 Use correct initializer list syntax.
This will make the code ever so slightly simpler but more importantly
make it possible to compile also when using -D_GLIBCXX_DEBUG.
2024-08-13 21:33:53 -07:00
Robin Leroy
20fdebcb35
ICU-22707 UTC-180? Give up on 16.0β rules, amend LB10 and LB21a instead.
2024-07-18 23:56:34 +00:00
Robin Leroy
cc64ec7c94
ICU-22707 feed more bits to the starving monkeys: ranlux48 rather than a 32-bit LCG
2024-07-18 23:56:34 +00:00
Robin Leroy
47a8ea4065
ICU-22707 smarter old monkeys: refine the partition on interesting sets
2024-07-18 23:56:34 +00:00
Robin Leroy
79f4745494
ICU-22707 UTC-179-C25 Limit LB21a to the Hebrew-hyphen-non-Hebrew case
2024-07-18 23:56:34 +00:00
Robin Leroy
83f3334b96
ICU-22707 UTC-179-C32 Upstream and improve the old Finnish tailoring LB20a from CLDR-3029 and ICU-8151
2024-07-18 23:56:34 +00:00
Robin Leroy
36fe0f0660
ICU-22707 UTC-179-C28 Simplify the UAX14 formulation
2024-07-18 23:56:34 +00:00
Robin Leroy
84ff5dacf8
ICU-22707 UTC-179-C28 LB19 change for simplified chinese
2024-07-18 23:56:34 +00:00
Robin Leroy
1513b66c32
ICU-22707 UTC-179-C35 Remove redundant rule, caught after UTC-179. Called out in review note in https://www.unicode.org/reports/tr14/tr14-52.html#LB25 .
2024-07-18 23:56:34 +00:00
Robin Leroy
d3b361f23a
ICU-22707 UTC-179-C35 LB25 alignment with the UAX14 formulation from 15.1 and earlier.
2024-07-18 23:56:34 +00:00
Robin Leroy
509f552e38
ICU-22707 UTC-179-C35 No regexes for old monkeys: Express LB25 using UAX14-style rules rather than a regex
2024-07-18 23:56:34 +00:00
Robin Leroy
9391cbb0b3
ICU-22707 Print random 🙈 🙉 🙊 🐵 🐒 rather than ... so that progress is visible when the screen is full and report monkey counts
2024-07-18 23:56:34 +00:00
Fredrik Roubert
0178a07a26
ICU-22793 Clang-Tidy: google-readability-casting
https://releases.llvm.org/17.0.1/tools/clang/tools/extra/docs/clang-tidy/checks/google/readability-casting.html
2024-07-04 22:32:12 +02:00
Markus Scherer
fe23620f12
ICU-22707 appease rbbitst UBSan for RBBIStateTableRow
2024-04-29 17:00:55 -07:00
Fredrik Roubert
6ad78a08c7
ICU-22621 Clang-Tidy: readability-redundant-control-flow
https://releases.llvm.org/17.0.1/tools/clang/tools/extra/docs/clang-tidy/checks/readability/redundant-control-flow.html
2024-03-19 15:55:56 +01:00
Fredrik Roubert
5401c12018
ICU-22621 Clang-Tidy: modernize-use-nullptr
https://releases.llvm.org/17.0.1/tools/clang/tools/extra/docs/clang-tidy/checks/modernize/use-nullptr.html
2024-03-15 14:31:54 +01:00
Fredrik Roubert
2a1853c9a9
ICU-22621 Clang-Tidy: modernize-use-emplace
https://releases.llvm.org/17.0.1/tools/clang/tools/extra/docs/clang-tidy/checks/modernize/use-emplace.html
2024-03-13 16:31:47 +01:00
Robin Leroy
ba1208e49b
ICU-22518 Add a flag to export the output of the reference implementation from the old segmentation monkey tests
2024-02-08 04:54:33 +01:00
Frank Tang
9832f48e22
ICU-22636 Return U_BRK_RULE_SYNTAX when status number is too large
See #2793
2024-01-19 17:16:54 -08:00
Frank Tang
19af9e7ce3
ICU-22602 Fix stack overflow inside flattenVariables
Limit the recursion depth of flattenVariables to a maximum of 3500,
since Java on my machine throws a stack overflow exception at around 3900.
2023-12-14 15:14:21 -08:00
Frank Tang
4a7d61d261
ICU-22579 Fix null dereference when a UnicodeSet contains only strings
2023-12-12 14:39:12 -08:00
Frank Tang
8b14c05791
ICU-22585 Fix infinite loop when a Unicode set contains a single surrogate
2023-12-11 15:33:12 -08:00
Frank Tang
7d3cd7cba5
ICU-22584 Fix def of nullptr
ICU-22584 fix
2023-12-11 14:35:10 -08:00
Frank Tang
73f972f7ff
ICU-22581 Fix RBBI leakage
Duplicate variable references in the rule should not cause leakage
2023-12-08 15:47:51 -08:00
Andy Heninger
e6892996b1
ICU-22584 Fix RBBI rule builder stack overflow.
The problem was found by fuzz testing.
A rule consisting of a long literal string produces a large, unbalanced parse tree,
one node per string element. Deleting the tree was recursive, once per node, resulting
in deep recursion.
This PR changes node deletion to use an iterative (non-recursive) approach.
This change only affects rule building. There is no change to the RBBI run time
using pre-built rules.
2023-12-08 12:49:26 -08:00
Fredrik Roubert
f99f8c678b
ICU-22522 Delete unused variables.
Clang 16 is more thorough in finding unused variables, so these must be
removed to be able to compile this code using Clang 16 and -Werror.
2023-11-30 15:34:36 +01:00
Frank Tang
9fb9bd4950
ICU-22342 Rename fillBreak to fillBreaks
2023-09-14 10:04:57 -07:00
Frank Tang
02d5e71903
ICU-22342 Implement ExternalBreakEngineAPI
ICU-22342 Fix comments
2023-08-30 11:43:16 -07:00
Elango Cheran
2e45e6ec0e
ICU-22404 Unicode 15.1 beta data files & API constants
See #2492
Co-authored-by: Andy Heninger <andy.heninger@gmail.com>
Co-authored-by: Robin Leroy <egg.robin.leroy@gmail.com>
2023-07-13 19:26:14 -07:00
Peter Edberg
5618203821
ICU-22360 revert portions of #2159 which included @ in ALetter for wordbreak, update tests
2023-05-06 21:36:46 -07:00
Markus Scherer
b6dcc95d3c
ICU-21833 remove redundant void parameter lists
See #2351
2023-03-02 09:31:57 -08:00
Frank Tang
638acd0c38
ICU-21374 Add a CFI build bot for ICU4C
Add a GitHub Actions bot that builds with CFI.
Also fix all known issues that require replacing C-style casts:
use static_cast inside the i18n and common directories where we are sure
about the object, and use C++-style dynamic_cast for base-to-derived casts
in other code inside i18n and common and in test code or tools.
Use const_cast for casting between const and non-const.
2023-02-06 15:47:14 -08:00
Fredrik Roubert
2de88f9d9c
ICU-21833 Replace UChar with char16_t in all C++ code.
2023-02-06 19:27:44 +01:00
Frank Tang
de0a28644b
ICU-22251 Move sprintf to snprintf.
See #2291
2023-01-25 23:23:29 -08:00
Shuhei Iitsuka
b6b7b045e9
ICU-22100 Incorporate BudouX into ICU (C++)
2022-12-02 10:11:06 -08:00
Andy Heninger
67a7e2caf0
ICU-21180 RuleBasedBreakIterator, refactor init.
In class RuleBasedBreakIterator, refactor how object initialization is handled
by the various constructors, taking advantage of C++11's ability to directly
initialize data members in the class declaration.
This will simplify ongoing maintenance of the code by eliminating the need
to keep initialization lists synchronized with the class data members.
This is being done now in preparation for additional changes to fix problems
with the handling of memory allocation failures.
2022-11-02 16:25:41 -07:00
Andy Heninger
866254ef12
ICU-21180 BreakIterator, change all NULL to nullptr
In the C++ break iterator code, change all use of NULL to nullptr.
This is in preparation for follow-on PRs to improve out-of-memory error handling
in Break Iterators, keeping use of nullptr consistent between old and new
or updated code.
2022-10-26 18:55:48 -07:00