122 commits

Author SHA1 Message Date
Fredrik Roubert
3f5fd0df73 ICU-22901 Update ulocimp_canonicalize() &co. to use std::string_view. 2025-02-13 15:50:54 +01:00
Fredrik Roubert
afa06d2bcd ICU-22901 Update _canonicalize() to use std::string_view. 2025-02-13 15:50:54 +01:00
Fredrik Roubert
5463eac8b4 ICU-22901 Update ulocimp_getKeywords() to use std::string_view. 2025-02-13 15:50:54 +01:00
Fredrik Roubert
aa724e1e3f ICU-22901 Move calls to uloc_getDefault() out of _canonicalize(). 2025-02-13 15:50:54 +01:00
Fredrik Roubert
83726260ef ICU-23031 Reinstate special case for "-u-va-posix" lost by ICU-22520.
Inside of ulocimp_forLanguageTag() in _appendKeywords() in uloc_tag.cpp
there's a hardcoded special case for "-u-va-posix" which appends the
"_POSIX" variant but this was missed during the refactoring made for
ICU-22520 (there isn't any test case that covers this).

So the call to ulocimp_forLanguageTag() did more than previously
understood, but we still don't want to have to call that for every
language tag that has BCP-47 extensions just in order to get to this
special case. Instead, add a special case also to ulocimp_getSubtags().

For this to work nicely, the loop in _getVariant() that copies variants
needs to be refactored so that it easily can break when encountering the
start of any BCP-47 extension (which also has the welcome side-effect of
making it more efficient, being able to append an entire variant at once
to the output sink).

This was broken by commit 678d5c1273.
2025-02-13 08:50:17 +01:00
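The variant loop described in commit 83726260ef can be pictured roughly as follows. This is only a sketch under assumptions: collectVariants() is a hypothetical name, it is not ICU's _getVariant(), and it matches the "-u-va-posix" special case only in lowercase and only when nothing follows it.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical sketch, not ICU's actual _getVariant().
// `rest` is whatever follows the region in a language tag,
// for example "-valencia-u-va-posix".
std::string collectVariants(std::string_view rest) {
    std::string out;
    while (rest.size() > 1 && (rest.front() == '-' || rest.front() == '_')) {
        // A one-letter subtag t, u or x followed by a separator starts an extension.
        const bool singleton = rest.size() >= 3 && (rest[2] == '-' || rest[2] == '_');
        if (singleton && (rest[1] == 't' || rest[1] == 'u' || rest[1] == 'x')) {
            // The special case: this one extension still contributes a variant.
            if (rest == "-u-va-posix") out += "_POSIX";
            break;
        }
        rest.remove_prefix(1);  // skip the separator
        const std::string_view variant = rest.substr(0, rest.find_first_of("-_"));
        const std::size_t start = out.size() + 1;
        out += '_';
        out.append(variant);  // the whole variant is appended in one call
        std::transform(out.begin() + start, out.end(), out.begin() + start,
                       [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
        rest.remove_prefix(variant.size());
    }
    return out;
}
```

Breaking out of the loop at the first extension is what makes it possible to handle each variant as a single span instead of copying it character by character.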
Fredrik Roubert
424d6a3e8b ICU-22901 Update ulocimp_getSubtags() &co. to use std::string_view. 2024-11-22 19:05:03 +01:00
Fredrik Roubert
1dccc10085 ICU-22901 Move calls to uloc_getDefault() out of ulocimp_getSubtags(). 2024-11-22 19:05:03 +01:00
Fredrik Roubert
7ffbe77e12 ICU-22696 Update ulocimp_setKeywordValue() to use std::string_view. 2024-08-13 14:03:18 +02:00
Fredrik Roubert
8a6d59ec80 ICU-22696 Update ulocimp_to*{Key,Type}() to use std::string_view. 2024-08-07 14:14:23 +02:00
Fredrik Roubert
dd65ee3f0b ICU-22696 Update ulocimp_getKeywordValue() to use std::string_view. 2024-07-31 15:39:15 +02:00
Fredrik Roubert
5d7cbdbc02 ICU-22696 Delete unused code.
These optional output parameters weren't used when these functions were
originally added so they were most likely included just in case someone
would want to use them in the future, but that was 10 years ago now and
they still haven't been used yet, so it's unlikely that they'll be used
in the foreseeable future and call sites as well as the implementation
can instead be simplified by removing them.
2024-07-29 22:03:10 +02:00
Fredrik Roubert
0178a07a26 ICU-22793 Clang-Tidy: google-readability-casting
https://releases.llvm.org/17.0.1/tools/clang/tools/extra/docs/clang-tidy/checks/google/readability-casting.html
2024-07-04 22:32:12 +02:00
Frank Tang
d259da8118 ICU-22700 Fix large POSIX charset name cause hang
Fix a fuzzer-found hang caused by a long POSIX charset name.
Limit the POSIX charset name to at most 64 chars.
2024-03-21 11:33:52 -07:00
Fredrik Roubert
5401c12018 ICU-22621 Clang-Tidy: modernize-use-nullptr
https://releases.llvm.org/17.0.1/tools/clang/tools/extra/docs/clang-tidy/checks/modernize/use-nullptr.html
2024-03-15 14:31:54 +01:00
Frank Tang
de9910659d ICU-22661 Limit the size of variants in Locale
See #2821
2024-03-14 16:23:51 -07:00
Fredrik Roubert
53568e8dfc ICU-22520 Refactor CharString & CharStringByteSink into helper.
The repeated sequence of allocating a CharString and CharStringByteSink,
before calling some function that writes into this, can be moved into a
single shared helper function which then is used to give all ulocimp.h
functions that write to ByteSink an overload that instead returns a
CharString, to make call sites look like perfectly normal C++ code.
2024-03-05 23:44:50 +01:00
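The helper pattern from commit 53568e8dfc might look something like this in miniature. Everything here is an assumption made for illustration: StringSink stands in for CharStringByteSink, std::string for CharString, and makeString() and writeLanguage() are hypothetical names.

```cpp
#include <string>
#include <string_view>
#include <utility>

// Hypothetical stand-in for a byte sink that appends to a string.
class StringSink {
public:
    explicit StringSink(std::string& dest) : dest_(dest) {}
    void Append(std::string_view bytes) { dest_.append(bytes); }
private:
    std::string& dest_;
};

// The shared helper: allocates the string and the sink, runs the writer,
// and returns the finished string.
template <typename F>
std::string makeString(F&& writer) {
    std::string result;
    StringSink sink(result);
    std::forward<F>(writer)(sink);
    return result;
}

// A function that writes its output to a sink.
void writeLanguage(StringSink& sink) { sink.Append("en"); }

// The overload that returns a string, built on the shared helper.
std::string writeLanguage() {
    return makeString([](StringSink& sink) { writeLanguage(sink); });
}
```

A call site can then simply write `std::string lang = writeLanguage();` without ever seeing the sink.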
Fredrik Roubert
02a1bfc59f ICU-22520 Refactor CheckedArrayByteSink & u_terminateChars into helper.
The repeated sequence of allocating a CheckedArrayByteSink, calling some
function that writes into this, then checking for overflow and returning
through u_terminateChars() can all be moved into a single shared helper
function.
2024-03-05 20:09:54 +01:00
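The sequence that commit 02a1bfc59f folds into one helper could be sketched as below. BoundedSink, Status and writeToArray() are hypothetical stand-ins for CheckedArrayByteSink, UErrorCode and the real helper, and the termination step is simplified compared with what u_terminateChars() does.

```cpp
#include <cstddef>
#include <string_view>

enum class Status { Ok, BufferOverflow };  // hypothetical stand-in for UErrorCode

// Hypothetical sink that writes into a caller-owned array and keeps counting
// after the array is full, so the required length is always known.
class BoundedSink {
public:
    BoundedSink(char* buffer, std::size_t capacity)
        : buffer_(buffer), capacity_(capacity) {}
    void Append(std::string_view bytes) {
        for (char c : bytes) {
            if (length_ < capacity_) buffer_[length_] = c;
            ++length_;
        }
    }
    std::size_t length() const { return length_; }
    bool overflowed() const { return length_ > capacity_; }
private:
    char* buffer_;
    std::size_t capacity_;
    std::size_t length_ = 0;
};

// The shared helper: create the sink, run the writer, check for overflow,
// terminate the output if there is room, and return the length.
template <typename F>
std::size_t writeToArray(char* buffer, std::size_t capacity, Status& status, F&& writer) {
    if (status != Status::Ok) return 0;
    BoundedSink sink(buffer, capacity);
    writer(sink);
    if (sink.overflowed()) {
        status = Status::BufferOverflow;
    } else if (sink.length() < capacity) {
        buffer[sink.length()] = '\0';
    }
    return sink.length();
}
```

On overflow the returned length is the size the caller would need, which keeps the usual preflighting idiom possible.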
Fredrik Roubert
232362bf17 ICU-22520 Use operator* instead of calling std::optional::value().
There's a subtle difference between these two ways of accessing the
value of an optional and that is that the value() method can throw an
exception if there isn't any value, but operator* won't do that (it's
just undefined behavior if there isn't any value).

ICU4C code never tries to access any optional value without first
checking that it exists, but the ability of the value() method to throw
an exception in case there wasn't any such check first is the reason why
std::exception symbols previously could show up in debug builds.

This reverts the changes that were made to dependencies.txt by
commit dc70b5a056.
2024-03-04 23:40:15 +01:00
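The difference that commit 232362bf17 relies on can be shown with the standard library alone. The function names below are made up for the example; only the std::optional behavior is standard.

```cpp
#include <optional>
#include <string>

// Checks first, then dereferences with operator-> (same rules as operator*):
// no exception machinery is involved on this path.
int lengthOrZero(const std::optional<std::string>& s) {
    if (!s.has_value()) return 0;
    return static_cast<int>(s->size());
}

// value() on an empty optional throws std::bad_optional_access, which is why
// merely calling value() can pull exception symbols into a build.
bool valueThrowsWhenEmpty() {
    std::optional<int> empty;
    try {
        (void)empty.value();
    } catch (const std::bad_optional_access&) {
        return true;
    }
    return false;
}
```

Dereferencing an empty optional with operator* would be undefined behavior, so the explicit check in lengthOrZero() is what makes that form safe.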
Fredrik Roubert
929cd9bb4f ICU-22520 Standardize return on error for all locale functions.
· No function should do anything if an error has already occurred.
· On error, a value of 0, nullptr, {}, etc., should be returned.
· Values shouldn't have overloaded meanings (e.g. index or found).
· Values that are never used should not be returned at all.
2024-02-29 20:42:03 +01:00
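The first two rules listed in commit 929cd9bb4f might translate into a function shaped like this. Status and firstSubtag() are hypothetical, and treating an empty ID as an error is purely an illustrative failure condition, not a statement about ICU's behavior.

```cpp
#include <string>
#include <string_view>

enum class Status { Ok, IllegalArgument };  // hypothetical stand-in for UErrorCode

std::string firstSubtag(std::string_view id, Status& status) {
    // Rule 1: do nothing if an error has already occurred.
    if (status != Status::Ok) return {};
    // Rule 2: on error, set the status and return an empty value.
    if (id.empty()) {
        status = Status::IllegalArgument;
        return {};
    }
    return std::string(id.substr(0, id.find_first_of("_-")));
}
```

Because every function follows the same convention, calls can be chained and the status checked once at the end.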
Fredrik Roubert
939f08f274 ICU-22520 Use C++ function signatures for internal C++ functions.
Some of this code was originally written as C code and some of this code
was originally written as C++ code but made to resemble the then already
existing code that had once been C code. Changing it all to normal C++
now will make it easier and safer to work with going forward.

· Use unnamed namespace instead of static.
· Use reference instead of non-nullable pointer.
· Use bool instead of UBool.
· Use constexpr for static data.
· Use U_EXPORT instead of U_CAPI or U_CFUNC.
· Use the default calling convention instead of U_EXPORT2.
2024-02-12 21:44:06 +01:00
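A few of the conventions listed in commit 939f08f274, applied to a small made-up example. The names are hypothetical, and the two export-related items are left out because they only make sense inside ICU's build.

```cpp
#include <string_view>

namespace {  // unnamed namespace instead of `static`

// constexpr for static data
constexpr std::string_view kSeparators = "_-";

// bool instead of UBool
bool isSeparator(char c) {
    return kSeparators.find(c) != std::string_view::npos;
}

// A reference instead of a pointer that must never be null.
void countSeparators(std::string_view id, int& count) {
    for (char c : id) {
        if (isSeparator(c)) ++count;
    }
}

}  // namespace
```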
Fredrik Roubert
63ae786bf7 ICU-22520 Refactor function macros into inline functions.
This is to facilitate further refactoring of the locale code.
2024-02-08 14:24:48 +01:00
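As an illustration of the kind of change commit 63ae786bf7 describes, here is a function macro next to an inline replacement. Both the macro and the function names are hypothetical, not copied from ICU.

```cpp
// Before (hypothetical macro): the argument is pasted in textually, so
// IS_ID_SEPARATOR(*p++) could advance p twice.
// #define IS_ID_SEPARATOR(a) ((a) == '_' || (a) == '-')

// After: an inline function evaluates its argument exactly once and is
// type-checked like any other function.
inline bool isIDSeparator(char a) { return a == '_' || a == '-'; }

// With the function form, an argument with side effects is safe.
inline int countLeadingSeparators(const char* p) {
    int n = 0;
    while (isIDSeparator(*p++)) ++n;
    return n;
}
```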
Fredrik Roubert
699555a5bd ICU-22520 Use a ByteSink append buffer instead of a local CharString.
These functions that eventually write their output to a ByteSink need a
small temporary buffer for processing the subtag they're about to write
and currently use a local CharString object to provide this buffer,
which then gets written to the ByteSink and discarded.

This intermediate step is unnecessary as a ByteSink can provide an
append buffer which can be used instead, eliminating the need to
allocate a local temporary buffer and to copy the data around.

This approach also makes it natural to split the processing into two
steps, first calculating the length of the subtag, then processing it,
which makes it possible to return early when no output is requested.
2024-02-08 00:38:09 +01:00
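The two-step approach from commit 699555a5bd can be sketched with a toy sink. This Sink class is hypothetical and much simpler than icu::ByteSink (its Commit() method in particular is an assumption of this sketch), and writeUpperSubtag() is a made-up example function.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical sink that can hand out writable storage of its own.
class Sink {
public:
    // Returns storage for at least `n` bytes, owned by the sink.
    char* GetAppendBuffer(std::size_t n) {
        used_ = data_.size();
        data_.resize(used_ + n);
        return data_.data() + used_;
    }
    // Keeps the first `n` bytes written into the append buffer.
    void Commit(std::size_t n) { data_.resize(used_ + n); }
    const std::string& str() const { return data_; }
private:
    std::string data_;
    std::size_t used_ = 0;
};

// Writes the leading subtag of `id`, upper-cased, to `sink` if one is given.
// Returns the length of that subtag either way.
std::size_t writeUpperSubtag(std::string_view id, Sink* sink) {
    // Step 1: calculate the length of the subtag.
    std::size_t len = id.find_first_of("_-");
    if (len == std::string_view::npos) len = id.size();
    if (sink == nullptr || len == 0) return len;  // no output requested
    // Step 2: process the subtag directly in the sink's own buffer.
    char* buffer = sink->GetAppendBuffer(len);
    for (std::size_t i = 0; i < len; ++i) {
        buffer[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(id[i])));
    }
    sink->Commit(len);
    return len;
}
```

No local temporary is allocated and nothing is copied a second time; the processed bytes are produced where they will stay.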
Fredrik Roubert
d28e12b1f2 ICU-22520 Replace char arrays with icu::CharString. 2024-02-06 19:53:53 +01:00
Fredrik Roubert
930b4d9ab9 ICU-22520 Add convenience wrappers for calling ulocimp_getSubtags().
These wrappers that call ulocimp_getSubtags() to get only one particular
subtag and then return that as icu::CharString will be convenient for
replacing code that currently calls the uloc_get*() functions writing
into a fixed size buffer.
2024-02-06 19:53:53 +01:00
Fredrik Roubert
835b009314 ICU-22520 Make ulocimp_get*() internal to ulocimp_getSubtags().
These functions now no longer have any other callers so they can be made
internal to the compilation unit of ulocimp_getSubtags(), thus bringing
them back to how they originally were intended to be used (and making
the comment above them true once again).

This also makes it possible to remove the temporary icu::CharString
objects that previously were returned to callers and instead write
directly to icu::ByteSink, making the code both simpler and less
wasteful (which is also how this was once intended).
2024-02-06 13:12:55 +01:00
Fredrik Roubert
1b768edbdf ICU-22520 Update all users of ulocimp_get*() to ulocimp_getSubtags().
This simplifies the code by removing the need for finding the positions
of the subtags; all that logic is now in just one single place.
2024-02-06 13:12:55 +01:00
Fredrik Roubert
dc70b5a056 ICU-22520 Move all localeID parsing logic into new ulocimp_getSubtags().
The logic for parsing a localeID string into its constituent subtags is
currently repeated over and over again in each one of the uloc_get*()
functions, so that calling all these functions one after the other in
order to get all the subtags does the parsing all over again from the
beginning for each function call.

In order to avoid having to do this parsing over and over again, a lot
of code instead has its own copy of the parsing logic in order to call
the underlying ulocimp_get*() functions directly for lower runtime cost
at the price of increased code complexity and repetition.

This new ulocimp_getSubtags() function, which writes natively to
icu::ByteSink and has a convenience wrapper to write to icu::CharString,
removes the repeated code from the uloc_get*() functions and makes it
possible to update all code that calls the ulocimp_get*() functions.
2024-02-06 13:12:55 +01:00
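The idea behind commit dc70b5a056, parsing the locale ID once and handing back all subtags together, can be illustrated with a deliberately simplified parser. Subtags, nextSubtag() and parseOnce() are hypothetical names, and the classification by length alone is an assumption of this sketch that is far cruder than what ulocimp_getSubtags() does.

```cpp
#include <cstddef>
#include <string_view>

struct Subtags {
    std::string_view language, script, region, variant;
};

// Takes the next subtag off the front of `rest`, consuming its separator.
std::string_view nextSubtag(std::string_view& rest) {
    const std::size_t end = rest.find_first_of("_-");
    const std::string_view tag = rest.substr(0, end);
    rest.remove_prefix(end == std::string_view::npos ? rest.size() : end + 1);
    return tag;
}

// One pass over the ID yields every subtag, instead of one parse per subtag.
Subtags parseOnce(std::string_view id) {
    Subtags result;
    std::string_view rest = id;
    result.language = nextSubtag(rest);
    std::string_view peek = rest;
    std::string_view tag = nextSubtag(peek);
    if (tag.size() == 4) {  // crude assumption: four characters means script
        result.script = tag;
        rest = peek;
        tag = nextSubtag(peek);
    }
    if (tag.size() == 2 || tag.size() == 3) {  // crude assumption: region
        result.region = tag;
        rest = peek;
    }
    result.variant = rest;  // whatever remains
    return result;
}
```

A caller that needs both language and region reads two fields of one result rather than triggering two separate parses.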
Fredrik Roubert
678d5c1273 ICU-22520 Replace use of ulocimp_forLanguageTag() in uloc_getVariant().
Originally added by commit 24055f8585
for ICU-7882, converting any language tag with a BCP-47 extension into a
legacy Unicode locale ID was a simple way to make the existing code keep
working unchanged also with BCP-47 extensions.

But the only thing that uloc_getVariant() needs is being able to find
out where variants end and extensions begin, for which converting the
entire language tag is unnecessary; it's much more straightforward to
instead just check for the -t-, -u- or -x- marker that indicates the
start of a BCP-47 extension.
2024-02-06 13:12:55 +01:00
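The marker check that commit 678d5c1273 describes amounts to a short scan. A minimal sketch, assuming lowercase singletons and '-' separators only; findExtensionStart() is a hypothetical name.

```cpp
#include <cstddef>
#include <string_view>

// Returns the offset of the first "-t-", "-u-" or "-x-" marker in `tag`,
// or std::string_view::npos if there is none.
std::size_t findExtensionStart(std::string_view tag) {
    for (std::size_t i = 0; i + 2 < tag.size(); ++i) {
        if (tag[i] == '-' && tag[i + 2] == '-') {
            const char c = tag[i + 1];
            if (c == 't' || c == 'u' || c == 'x') return i;
        }
    }
    return std::string_view::npos;
}
```

Everything before the returned offset can then be searched for variants without converting the whole tag first.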
Fredrik Roubert
ae9cc8cbd1 ICU-22520 Replace char arrays with icu::CharString. 2024-01-30 12:04:53 +01:00
Fredrik Roubert
1b0f5e41c5 ICU-22520 Switch to using CharString for calling uloc_setKeywordValue(). 2024-01-30 12:04:53 +01:00
Fredrik Roubert
340806bf9a ICU-22520 Add a ulocimp_setKeywordValue() that writes to icu::ByteSink. 2024-01-30 12:04:53 +01:00
Fredrik Roubert
5eded36279 ICU-22520 Bugfix: Use macro parameter name instead of variable name. 2024-01-23 13:00:26 +09:00
Peter Edberg
2f7bfd87cb ICU-22326 CLDR release-44-beta5 to ICU main part 3 (ICU sources: lib, tools, tests) 2023-10-26 10:59:18 -07:00
Fredrik Roubert
037449fff8 ICU-21289 Switch to using CharString for calling uloc_getParent(). 2023-09-26 23:41:24 +02:00
Fredrik Roubert
96dcaf7da8 ICU-21289 Switch to using CharString for calling uloc_forLanguageTag(). 2023-09-26 17:52:51 +02:00
Peter Edberg
2270c174a5 ICU-22325 CLDR release-44-alpha1 to main:
- binaries, binary-as-source, CLDR data sources;
  - CLDR test data & dtd, ICU lib/tool/test source updates.
2023-08-22 14:40:51 -07:00
Markus Scherer
b6dcc95d3c ICU-21833 remove redundant void parameter lists
See #2351
2023-03-02 09:31:57 -08:00
Peter Edberg
18f6a3a6e2 ICU-22220 CLDR release-43-alpha2 to ICU main 2023-02-27 11:09:02 -08:00
Fredrik Roubert
2e0d30cfcf ICU-21833 Replace NULL with nullptr in all C++ code. 2023-02-03 20:20:38 +01:00
Fredrik Roubert
030fa1a479 ICU-21148 Consistently use standard lowercase true/false everywhere.
This is the normal standard way in C, C++ as well as Java and there's no
longer any reason for ICU to be different. The various internal macros
providing custom boolean constants can all be deleted and code as well
as documentation can be updated to use lowercase true/false everywhere.
2022-09-07 20:56:33 +02:00
Peter Edberg
0266970e97 ICU-21957 integrate CLDR release-42-alpha1 to ICU main for 72 2022-08-05 09:39:58 -07:00
Peter Edberg
6330704974 ICU-21944 Sync recent uloc_getLanguage/Countries updates to ICU4J; add "mo" mapping for C 2022-03-16 09:01:59 -07:00
Peter Edberg
dbf7c20be6 ICU-21942 Fix Kosovo 3-letter code to be XKK for uloc_getISO3Country etc. 2022-03-15 10:59:13 -07:00
Frank Tang
adb109f440 ICU-21749 Fix stack-use-after-scope bug in uloc
See #1858
2021-09-15 11:25:45 -07:00
Rich Gillam
01e1adc9e4 ICU-21460 Changed the ULocale initializers to allow locale IDs that use BCP47 syntax, but with '_' as a field delimiter.
(APIs that specifically require BCP47 syntax are unaffected-- they still require '-').
2021-09-03 12:47:02 -07:00
Rich Gillam
b03b8be741 ICU-21639 Added an internal utility class to streamline preflighting and heap-allocating a char buffer for a locale ID
and changed several internal methods in ULocale to use it, so that they work correctly on locale IDs that are longer
than ULOC_FULLNAME_CAPACITY.
2021-08-02 13:15:29 -07:00
Peter Edberg
6c26ea21b2 ICU-21341 for getISO codes, add some CLDR-valid regions, remove some CLDR-invalid langs 2021-03-11 14:31:07 -08:00
Frank Tang
9663195189 ICU-21385 Fix assertion when setKeywordValue w/ long value.
See #1461
2020-11-12 10:00:28 -08:00
Markus Scherer
18c4a69f80 ICU-9961 replace U_DRAFT/U_STABLE/U_INTERNAL with U_CAPI 2020-09-10 11:23:44 -07:00
Fredrik Roubert
936f53a1f1 ICU-21035 Update locale implementation to use ulocimp_getKeywordValue(). 2020-09-03 19:02:47 +02:00