The repeated sequence of allocating a CheckedArrayByteSink, calling some
function that writes into this, then checking for overflow and returning
through u_terminateChars() can all be moved into a single shared helper
function.
There's a subtle difference between these two ways of accessing the
value of an optional and that is that the value() method can throw an
exception if there isn't any value, but operator* won't do that (it's
just undefined behavior if there isn't any value).
ICU4C code never tries to access any optional value without first
checking that it exists, but the ability of the value() method to throw
an exception in case there wasn't any such check first is the reason why
std::exception symbols previously could show up in debug builds.
This reverts the changes that were made to dependencies.txt by
commit dc70b5a056.
tz2024a change "Asia/Qostanay" "Asia/Almaty" but test machines has
not yet update their zoneinfo to 2024a so we mark them as known issues
extern long timezone; in <time.h> (set man tzset on Linux shell)
returns wrong value when TZ=America/Scoresbysund
· No function should do anything if an error has already occurred.
· On error, a value of 0, nullptr, {}, etc., should be returned.
· Values shouldn't have overloaded meanings (eg. index or found).
· Values that are never used should not be returned at all.
- [x] Required: Issue filed: https://unicode-org.atlassian.net/browse/ICU-22671
- [x] Required: The PR title must be prefixed with a JIRA Issue number. <!-- For example: "ICU-1234 Fix xyz" -->
- [x] Required: The PR description must include the link to the Jira Issue, for example by completing the URL in the first checklist item
- [x] Required: Each commit message must be prefixed with a JIRA Issue number. <!-- For example: "ICU-1234 Fix xyz" -->
- [ ] Issue accepted (done by Technical Committee after discussion)
- [ ] Tests included, if applicable
- [ ] API docs and/or User Guide docs changed or added, if applicable
When building icu4c, it defaults to clang instead of gcc when the default compiler, cc / c++, is a symlink to gcc / g++.
This not the expected behavior when building C and C++ code.
It appears that this behavior was put in place originally for supporting C++11, which hopefully is no longer such a concern.
This PR adjusts the configure.ac for icu4c to prefer the cc and c++ compilers first.
Some of this code was originally written as C code and some of this code
was originally written as C++ code but made to resemble the then already
existing code that had once been C code. Changing it all to normal C++
now will make it easier and safer to work with going forward.
· Use unnamed namespace instead of static.
· Use reference instead of non-nullable pointer.
· Use bool instead of UBool.
· Use constexpr for static data.
· Use U_EXPORT instead of U_CAPI or U_CFUNC.
· Use the default calling convention instead of U_EXPORT2.
Now when the parseTagString() helper function just is a wrapper over
ulocimp_getSubtags() it can be replaced by calling that function
directly instead and letting it handle variant subtags as well.
These functions that eventually write their output to a ByteSink need a
small temporary buffer for processing the subtag they're about to write
and currently use a local CharString object to provide this buffer,
which then gets written to the ByteSink and discarded.
This intermediate step is unnecessary as a ByteSink can provide an
append buffer which can be used instead, eliminating the need to
allocate a local temporary buffer and to copy the data around.
This approach also makes it natural to split the processing into two
steps, first calculating the length of the subtag, then processing it,
which makes it possible to return early when no output is requested.
These wrappers that call ulocimp_getSubtags() to get only one particular
subtag and then return that as icu::CharString will be convenient for
replacing code that currently calls the uloc_get*() functions writing
into a fixed size buffer.
These functions now no longer have any other callers so they can be made
internal to the compilation unit of ulocimp_getSubtags(), thus bringing
them back to how they originally were intended to be used (and making
the comment above them true once again).
This also makes it possible to remove the temporary icu::CharString
objects that previously were returned to callers and instead write
directly to icu::ByteSink, making the code both simpler and less
wasteful (also that how this was once intended).
The logic for parsing a localeID string into its constituent subtags is
currently repeated over and over again in each one of the uloc_get*()
functions, so that calling all these functions one after the other in
order to get all the subtags does the parsing all over again from the
beginning for each function call.
In order to avoid having to do this parsing over and over again, a lot
of code instead has its own copy of the parsing logic in order to call
the underlying ulocimp_get*() functions directly for lower runtime cost
at the price of increased code complexity and repetition.
This new ulocimp_getSubtags() function, which writes natively to
icu::ByteSink and has a convenience wrapper to write to icu::CharString,
removes the repeated code from the uloc_get*() functions and makes it
possible to update all code that calls the ulocimp_get*() functions.
Originally added by commit 24055f8585
for ICU-7882, converting any language tag with a BCP-47 extension into a
legacy Unicode locale ID was a simple way to make the existing code keep
working unchanged also with BCP-47 extensions.
But the only thing that uloc_getVariant() needs is being able to find
out where variants end and extensions begin, for which converting the
entire language tag is unnecessary, it's much more straightforward to
instead just check for the -t-, -u- or -x- marker that indicates the
start of a BCP-47 extension.