Some of this code was originally written as C, and some was originally
written as C++ but made to resemble the then already existing code that
had once been C. Changing it all to normal C++ now will make it easier
and safer to work with going forward.
· Use unnamed namespace instead of static.
· Use reference instead of non-nullable pointer.
· Use bool instead of UBool.
· Use constexpr for static data.
· Use U_EXPORT instead of U_CAPI or U_CFUNC.
· Use the default calling convention instead of U_EXPORT2.
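As an illustration of the first four items above, here is a minimal
sketch in standard C++17. The names kPrivateUsePrefix, consumePrefix()
and startsWithPrivateUse() are hypothetical and do not come from the ICU
sources. The export macros from the last two items are mentioned only in
a comment, since using them would require the ICU headers.

```cpp
#include <string_view>

namespace {  // Unnamed namespace instead of `static`: internal linkage.

// constexpr for static data (hypothetical constant, for illustration).
constexpr std::string_view kPrivateUsePrefix = "x-";

// Reference instead of a non-nullable pointer, bool instead of UBool.
// Strips `prefix` from the front of `tag` and reports whether it did.
bool consumePrefix(std::string_view& tag, std::string_view prefix) {
    if (tag.substr(0, prefix.size()) != prefix) { return false; }
    tag.remove_prefix(prefix.size());
    return true;
}

}  // namespace

// Hypothetical function with external linkage. In ICU, a declaration
// like this would carry U_EXPORT and the default calling convention.
bool startsWithPrivateUse(std::string_view tag) {
    return consumePrefix(tag, kPrivateUsePrefix);
}
```

Taking the string view by reference makes it obvious at the call site
that no null check is needed inside the helper.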
Now that the parseTagString() helper function is just a wrapper over
ulocimp_getSubtags(), it can be replaced by calling that function
directly, letting it handle variant subtags as well.
These functions that eventually write their output to a ByteSink need a
small temporary buffer for processing the subtag they're about to write
and currently use a local CharString object to provide this buffer,
which then gets written to the ByteSink and discarded.
This intermediate step is unnecessary as a ByteSink can provide an
append buffer which can be used instead, eliminating the need to
allocate a local temporary buffer and to copy the data around.
This approach also makes it natural to split the processing into two
steps, first calculating the length of the subtag, then processing it,
which makes it possible to return early when no output is requested.
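The two-step approach described above could look roughly like the
following sketch. StringSink is a simplified, hypothetical stand-in for
a byte sink (the real icu::ByteSink::GetAppendBuffer() takes more
parameters), and writeLowercaseSubtag() is an invented example function,
not one of the functions touched by this change.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-in for a byte sink that appends to a std::string.
class StringSink {
public:
    explicit StringSink(std::string& out) : out_(out), used_(out.size()) {}

    // Hands out writable space at the end of the destination string.
    char* getAppendBuffer(std::size_t minCapacity) {
        used_ = out_.size();
        out_.resize(used_ + minCapacity);
        return out_.data() + used_;
    }

    // Commits the first n bytes written into the append buffer.
    // Assumes it is called right after getAppendBuffer().
    void append(const char* /*bytes*/, std::size_t n) {
        out_.resize(used_ + n);
    }

private:
    std::string& out_;
    std::size_t used_;
};

// Writes the leading subtag of `tag` in lowercase and returns its
// length. A null sink means that no output is requested.
std::size_t writeLowercaseSubtag(std::string_view tag, StringSink* sink) {
    // Step 1: calculate the length of the subtag.
    std::size_t len = 0;
    while (len < tag.size() && tag[len] != '-' && tag[len] != '_') {
        ++len;
    }
    if (sink == nullptr || len == 0) { return len; }  // Early return.

    // Step 2: process the subtag directly in the sink's own buffer.
    char* buffer = sink->getAppendBuffer(len);
    for (std::size_t i = 0; i < len; ++i) {
        buffer[i] = static_cast<char>(
            std::tolower(static_cast<unsigned char>(tag[i])));
    }
    sink->append(buffer, len);
    return len;
}
```

Because the length is known before any output is produced, the caller
that only wants to skip over a subtag pays nothing for a buffer.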
These wrappers that call ulocimp_getSubtags() to get only one particular
subtag and then return that as icu::CharString will be convenient for
replacing code that currently calls the uloc_get*() functions writing
into a fixed size buffer.
These functions now no longer have any other callers so they can be made
internal to the compilation unit of ulocimp_getSubtags(), thus bringing
them back to how they originally were intended to be used (and making
the comment above them true once again).
This also makes it possible to remove the temporary icu::CharString
objects that previously were returned to callers and instead write
directly to icu::ByteSink, making the code both simpler and less
wasteful (which is also how this was once intended).
The logic for parsing a localeID string into its constituent subtags is
currently repeated over and over again in each one of the uloc_get*()
functions, so that calling all these functions one after the other in
order to get all the subtags does the parsing all over again from the
beginning for each function call.
In order to avoid having to do this parsing over and over again, a lot
of code instead has its own copy of the parsing logic in order to call
the underlying ulocimp_get*() functions directly for lower runtime cost
at the price of increased code complexity and repetition.
This new ulocimp_getSubtags() function, which writes natively to
icu::ByteSink and has a convenience wrapper to write to icu::CharString,
removes the repeated code from the uloc_get*() functions and makes it
possible to update all code that calls the ulocimp_get*() functions.
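To illustrate the idea of parsing once and getting all subtags back
together, here is a heavily simplified sketch. The Subtags struct and
parseSubtags() are hypothetical, the classification rules (a 4-letter
script, a 2-letter or 3-digit region) are assumptions made for this
example, and edge cases such as empty subtags, keywords and case
normalization are ignored. It is not how ulocimp_getSubtags() is
implemented.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical result type holding all subtags from a single parse.
struct Subtags {
    std::string language, script, region, variant;
};

namespace {

bool isSeparator(char c) { return c == '_' || c == '-'; }

// Removes the next subtag (and one separator) from the front of `rest`.
std::string_view nextSubtag(std::string_view& rest) {
    std::size_t len = 0;
    while (len < rest.size() && !isSeparator(rest[len])) { ++len; }
    std::string_view subtag = rest.substr(0, len);
    rest.remove_prefix(len < rest.size() ? len + 1 : len);
    return subtag;
}

bool allAlpha(std::string_view s) {
    for (char c : s) {
        if (!std::isalpha(static_cast<unsigned char>(c))) { return false; }
    }
    return true;
}

bool allDigits(std::string_view s) {
    for (char c : s) {
        if (!std::isdigit(static_cast<unsigned char>(c))) { return false; }
    }
    return true;
}

}  // namespace

// Walks the locale ID once, front to back, filling in every field.
Subtags parseSubtags(std::string_view localeID) {
    Subtags result;
    std::string_view rest = localeID;
    result.language = std::string(nextSubtag(rest));

    std::string_view peek = rest;
    std::string_view candidate = nextSubtag(peek);
    if (candidate.size() == 4 && allAlpha(candidate)) {
        result.script = std::string(candidate);
        rest = peek;
        candidate = nextSubtag(peek);
    }
    if ((candidate.size() == 2 && allAlpha(candidate)) ||
        (candidate.size() == 3 && allDigits(candidate))) {
        result.region = std::string(candidate);
        rest = peek;
    }
    result.variant = std::string(rest);  // Whatever remains.
    return result;
}
```

A caller that needs both the language and the region then makes one
call and reads two fields, instead of parsing the same string twice.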
Originally added by commit 24055f8585
for ICU-7882, converting any language tag with a BCP-47 extension into a
legacy Unicode locale ID was a simple way to make the existing code keep
working unchanged also with BCP-47 extensions.
But the only thing that uloc_getVariant() needs is being able to find
out where variants end and extensions begin, for which converting the
entire language tag is unnecessary. It's much more straightforward to
instead just check for the -t-, -u- or -x- marker that indicates the
start of a BCP-47 extension.
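The marker check described above can be sketched as follows. The
function findExtensionStart() is a hypothetical name, and the sketch
assumes lowercase markers and '-' as the only separator, which may not
match what the actual code accepts.

```cpp
#include <cstddef>
#include <string_view>

// Returns the index of the first -t-, -u- or -x- marker in `tag`, or
// tag.size() if there is no such marker. Everything before the
// returned index belongs to the language, script, region and variants.
std::size_t findExtensionStart(std::string_view tag) {
    for (std::size_t i = 0; i + 2 < tag.size(); ++i) {
        if (tag[i] == '-' && tag[i + 2] == '-' &&
            (tag[i + 1] == 't' || tag[i + 1] == 'u' || tag[i + 1] == 'x')) {
            return i;
        }
    }
    return tag.size();
}
```

For the tag de-DE-1996-u-co-phonebk this returns 10, so the variant
1996 can be read from the part before the marker without converting
the extension at all.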