mirror of
https://github.com/unicode-org/icu.git
synced 2025-04-04 21:15:35 +00:00
ICU-22723 download 76rc
parent
73626da0ca
commit
61bbeb8898
1 changed file with 177 additions and 43 deletions
|
@ -14,15 +14,29 @@ License & terms of use: http://www.unicode.org/copyright.html
|
|||
|
||||
# ICU 76
|
||||
|
||||
ICU is the [premier library for software internationalization](https://icu.unicode.org/#h.i33fakvpjb7o), used by a [wide array of companies and organizations](https://icu.unicode.org/#h.f9qwubthqabj).
|
||||
ICU is the [premier library for software internationalization](https://icu.unicode.org/#h.i33fakvpjb7o),
|
||||
used by a [wide array of companies and organizations](https://icu.unicode.org/#h.f9qwubthqabj).
|
||||
|
||||
## Release Overview
|
||||
|
||||
ICU 76 updates to [Unicode 16](https://www.unicode.org/versions/Unicode16.0.0/) (TODO: link to blog),
|
||||
ICU 76 updates to
|
||||
[Unicode 16](https://www.unicode.org/versions/Unicode16.0.0/)
|
||||
([blog](https://blog.unicode.org/2024/09/announcing-unicode-standard-version-160.html)),
|
||||
including new characters and scripts, emoji, collation & IDNA changes, and corresponding APIs and implementations.
|
||||
It also updates to [CLDR 46](https://github.com/unicode-org/cldr/blob/main/docs/site/downloads/cldr-46.md) (TODO: link to blog) locale data with new locales and various additions and corrections.
|
||||
|
||||
It also updates to
|
||||
[CLDR 46](https://cldr.unicode.org/downloads/cldr-46)
|
||||
([beta blog](https://blog.unicode.org/2024/09/unicode-cldr-46-beta-available-for.html))
|
||||
locale data with new locales, significant updates to existing locales,
|
||||
and various additions and corrections.
|
||||
For example, the CLDR and Unicode default sort orders are now very nearly the same.
|
||||
|
||||
Most of the java.time (Temporal) types can now be formatted directly
|
||||
using the existing ICU4J date/time formatting classes.
|
||||
|
||||
There are some new APIs to make ICU easier to use with modern C++ and Java patterns.
|
||||
Most of the C/C++ APIs added for this purpose are implemented as C++ header-only APIs,
|
||||
and usable on top of binary stable C APIs, which is a first for ICU.
|
||||
|
||||
The Java and C++ technology preview implementations of the CLDR MessageFormat 2.0 specification (itself also in [tech preview](https://github.com/unicode-org/message-format-wg?tab=readme-ov-file#messageformat-2-technical-preview)) have been updated to match recent changes.
|
||||
|
||||
|
@ -34,7 +48,7 @@ Please use the [icu-support mailing list](https://icu.unicode.org/contacts) and/
|
|||
|
||||
The initial release has library version number 76.1.
|
||||
|
||||
* Release date: 2024-10-TODO
|
||||
* Release date: _planned for_ 2024-10-24
|
||||
* [List of tickets fixed in ICU 76](https://unicode-org.atlassian.net/issues/?jql=project%20%3D%20ICU%20AND%20status%20%3D%20Done%20AND%20resolution%20in%20%28Fixed%2C%20%22Fixed%20by%20Other%20Ticket%22%29%20AND%20fixVersion%20%3D%2076.1%20ORDER%20BY%20component%20ASC%2C%20created%20DESC)
|
||||
|
||||
If there are maintenance releases, they will be 76.2, 76.3, etc. (During ICU 76 development, the library version number was 76.0.x.)
|
||||
|
@ -43,51 +57,168 @@ Note: There may be additional commits on the [maint/maint-76](https://github.com
|
|||
|
||||
## Common Changes
|
||||
|
||||
* [Unicode 16](https://www.unicode.org/versions/Unicode16.0.0/) (TODO: link to blog):
|
||||
* TODO
|
||||
* [CLDR 46](https://github.com/unicode-org/cldr/blob/main/docs/site/downloads/cldr-46.md) (TODO: link to blog):
|
||||
* TODO: new stuff
|
||||
* TODO: below is from 45
|
||||
* MessageFormat 2.0 tech preview being included into LDML.
|
||||
* Structural “under the hood” work and limited data bug fixes, but no new data collection.
|
||||
* Some time zones deprecated following IANA TZ database changes.
|
||||
* TODO: new stuff
|
||||
* TODO: below is from 75
|
||||
* New Unicode properties APIs for Identifier_Status and Identifier_Type, defined by UTS \#39 Unicode Security Mechanisms, [General Security Profile for Identifiers](https://www.unicode.org/reports/tr39/#General_Security_Profile). ([ICU-11396](https://unicode-org.atlassian.net/browse/ICU-11396))
|
||||
* Time zone data (tzdata) version 2024a (2024-jan). Note that pre-1970 data for a number of time zones has been removed, as has been the case in the upstream [tzdata](https://www.iana.org/time-zones) release since 2021b.
|
||||
* [Unicode 16](https://www.unicode.org/versions/Unicode16.0.0/)
|
||||
([blog](https://blog.unicode.org/2024/09/announcing-unicode-standard-version-160.html)):
|
||||
* Adds five modern-use scripts: Garay, Gurung Khema, Kirat Rai, Ol Onal, Sunuwar
|
||||
* Adds two historic scripts & almost 4000 additional Egyptian Hieroglyphs
|
||||
* Seven new emoji characters
|
||||
* Over 700 symbols from legacy computing environments
|
||||
* ICU line breaking improvements have been upstreamed into
|
||||
[UAX #14](https://www.unicode.org/reports/tr14/tr14-53.html#Modifications)
|
||||
* ICU 76 adds support for the new UCD property Modifier_Combining_Mark for
|
||||
[UAX #53](https://www.unicode.org/reports/tr53/) Arabic Mark Rendering
|
||||
* ICU 76 also adds support for the UCD property Indic_Conjunct_Break
|
||||
which was new in Unicode 15.1. ([ICU-22503](https://unicode-org.atlassian.net/browse/ICU-22503))
|
||||
* [IDNA](https://www.unicode.org/reports/tr46/tr46-33.html#Modifications):
|
||||
The handling of UseSTD3ASCIIRules was simplified.
|
||||
Some existing characters changed from disallowed (when that was only for compatibility with
|
||||
long-obsolete IDNA2003) to valid.
|
||||
* [CLDR 46](https://github.com/unicode-org/cldr/blob/main/docs/site/downloads/cldr-46.md)
|
||||
([beta blog](https://blog.unicode.org/2024/09/unicode-cldr-46-beta-available-for.html)):
|
||||
* Significant data updates across all locales
|
||||
* Locales which are now at modern coverage level: Nigerian Pidgin, Tigrinya
|
||||
* Locales which are now at moderate coverage level:
|
||||
Akan, Baluchi (Latin), Kangri, Tajik, Tatar, Wolof
|
||||
* New measurement units "night" and "light-speed"
|
||||
* Note: ICU 76 does not yet support `portion-per-1e9` (aka per-billion). (See [ICU-22781](https://unicode-org.atlassian.net/browse/ICU-22781))
|
||||
* [MessageFormat 2.0 tech preview updates](https://cldr.unicode.org/downloads/cldr-46#message-format-specification)
|
||||
* Language matching: Dropped the fallback mapping
|
||||
desired="uk" → supported="ru"
|
||||
(so that Ukrainian (uk) doesn’t fall back to Russian (ru))
|
||||
* [Collation](https://cldr.unicode.org/downloads/cldr-46#collation-data-changes):
|
||||
Significant changes to the CLDR root collation (CLDR default sort order)
|
||||
* Realigned With DUCET:
|
||||
The order of groups of characters which sort below letters is now the same.
|
||||
In both sort orders, non-decimal-digit numeric characters now sort after decimal digits,
|
||||
and the CLDR root collation no longer tailors any currency symbols
|
||||
(making some of them sort like letter sequences, as in the DUCET).
|
||||
_These changes eliminate sort order differences among almost all
|
||||
regular characters between the CLDR root collation and the DUCET._
|
||||
* Improved Han Radical-Stroke Order:
|
||||
The CLDR radical-stroke order now matches that of the Unicode Radical-Stroke Index;
|
||||
traditional vs. simplified forms of radicals are now distinguished on a lower level than the number of residual strokes.
|
||||
In alphabetic indexes for radical-stroke sort orders,
|
||||
only the traditional forms of radicals are now available as index characters.
|
||||
* Time zone data (tzdata) version 2024b (2024-sep). Note that pre-1970 data for a number of time zones has been removed, as has been the case in the upstream [tzdata](https://www.iana.org/time-zones) release since 2021b.
|
||||
* The Asia/Almaty time zone has become an alias following IANA TZ database changes.
|
||||
* CLDR added support for deprecated timezone codes by remapping:
|
||||
CST6CDT → America/Chicago, EST → America/Panama, EST5EDT → America/New_York,
|
||||
MST7MDT → America/Denver, PST8PDT → America/Los_Angeles
|
||||
(These IANA TZ changes were motivated by CLDR, see
|
||||
[CLDR-17111](https://unicode-org.atlassian.net/browse/CLDR-17111))
|
||||
|
||||
## ICU4C Specific Changes
|
||||
|
||||
* [API changes since ICU4C 75 (Markdown)](https://github.com/unicode-org/icu/blob/maint/maint-76/icu4c/APIChangeReport.md) / [(HTML)](https://htmlpreview.github.io/?https://github.com/unicode-org/icu/blob/maint/maint-76/icu4c/APIChangeReport.html)
|
||||
* TODO: new stuff
|
||||
* TODO: below is from 75
|
||||
* MessageFormat 2.0 tech preview new API ([ICU-22261](https://unicode-org.atlassian.net/browse/ICU-22261))
|
||||
* C: Require C11 (up from C99)
|
||||
* C++: Require C++17 (up from C++11)
|
||||
* Many changes for more robust string and buffer handling.
|
||||
* [API changes since ICU4C 75 (Markdown)](https://github.com/unicode-org/icu/blob/maint/maint-76/icu4c/APIChangeReport.md) / [(HTML)](https://htmlpreview.github.io/?https://github.com/unicode-org/icu/blob/maint/maint-76/icu4c/APIChangeReport.html)
|
||||
* A UnicodeString can now be converted to & from UTF-16 standard string_view types
|
||||
(std::u16string_view, and on Windows to/from std::wstring_view)
|
||||
and other UTF-16 types (string literals, standard string classes).
|
||||
Several other member functions have been widened to accept standard UTF-16 types as well.
|
||||
([ICU-22843](https://unicode-org.atlassian.net/browse/ICU-22843))
|
||||
* New APIs for colloquial iteration over the elements of a C++ UnicodeSet or a C USet. ([ICU-22876](https://unicode-org.atlassian.net/browse/ICU-22876))
|
||||
* For details and an example see the “C++ Header-Only APIs” section of the [Migration Issues](#migration-issues) below.
|
||||
* New APIs for colloquial use of C++ Collator / C UCollator with
|
||||
standard C++ algorithms (e.g., sort) & data structures (e.g., map).
|
||||
([ICU-22879](https://unicode-org.atlassian.net/browse/ICU-22879))
|
||||
(The UCollator wrappers are also C++ header-only APIs.)
|
||||
* Note: Some APIs were changed to accept a wider range of input types than before,
|
||||
but in the API change report it looks as if the old, stable signatures were removed
|
||||
and the wider signatures were added as “born stable”.
|
||||
For example, several UnicodeString constructors that take a raw pointer
|
||||
have been replaced with a signature that accepts such raw pointers but also additional input types.
|
||||
* Note: Similarly, the API change report appears to show removal+addition of
|
||||
certain UnicodeString::remove() and UnicodeString::removeBetween() overloads,
|
||||
but only the _expression_ of one of their default parameter values has changed.
|
||||
* Many changes for more robust string and memory handling.
|
||||
|
||||
## ICU4J Specific Changes
|
||||
|
||||
* [API Changes since ICU4J 75](https://htmlpreview.github.io/?https://github.com/unicode-org/icu/blob/maint/maint-76/icu4j/APIChangeReport.html)
|
||||
* TODO: new stuff
|
||||
* TODO: below is from 75
|
||||
* MessageFormat 2.0 tech preview update ([ICU-22690](https://unicode-org.atlassian.net/browse/ICU-22690))
|
||||
* Performance (multi-threading / lock contention) improvement for BreakIterator.clone() and ULocale.getDefault(). ([ICU-22582](https://unicode-org.atlassian.net/browse/ICU-22582))
|
||||
* [API Changes since ICU4J 75](https://htmlpreview.github.io/?https://github.com/unicode-org/icu/blob/maint/maint-76/icu4j/APIChangeReport.html)
|
||||
* Most of the java.time (Temporal) types can now be formatted directly
|
||||
using the existing ICU4J date/time formatting classes. ([ICU-22853](https://unicode-org.atlassian.net/browse/ICU-22853))
|
||||
* New APIs for colloquial iteration over the elements of a UnicodeSet.
|
||||
In addition to the existing ranges(), strings(), and UnicodeSet-is-an-Iterable,
|
||||
there is a new codePoints() (returns an Iterable),
|
||||
and new methods that return Streams (e.g., codePointStream() & rangeStream()).
|
||||
([ICU-22845](https://unicode-org.atlassian.net/browse/ICU-22845))
|
||||
|
||||
## Known Issues
|
||||
|
||||
* TODO: new stuff
|
||||
* TODO: below is from 75
|
||||
* [ICU-22729](https://unicode-org.atlassian.net/browse/ICU-22729) udatpg_getBestPattern requires exact skeleton match in ICU 76
|
||||
* Due to a combination of an ICU bug fix and issues with CLDR availableFormats data, some skeletons in some languages yield inconsistent date/time formatting patterns.
|
||||
* None yet
|
||||
|
||||
## Migration Issues
|
||||
|
||||
* See [CLDR 46 migration issues](https://github.com/unicode-org/cldr/blob/main/docs/site/downloads/cldr-46.md#migration)
|
||||
* TODO: new stuff
|
||||
* TODO: below is from 75
|
||||
* ICU4C behavior for ill-formed locale IDs/language tags: uloc_getName(), uloc_getLanguage() and similar functions (and functions that rely on them) may fail with a U_ILLEGAL_ARGUMENT_ERROR when they used to fail only with a U_BUFFER_OVERFLOW_ERROR. (due to changes for [ICU-22520](https://unicode-org.atlassian.net/browse/ICU-22520))
|
||||
* On Linux, the configure script now defaults to "cc" rather than preferring "clang". If you want to choose clang, then configure for "Linux/clang". ([ICU-22556](https://unicode-org.atlassian.net/browse/ICU-22556))
|
||||
### IDNA Default Option Changed to Nontransitional Processing
|
||||
After all major browsers had switched to nontransitional processing,
|
||||
Unicode 15.1 (September 2023) changed the [UTS #46 spec](https://www.unicode.org/reports/tr46/#Processing)
|
||||
to declare transitional processing deprecated.
|
||||
|
||||
ICU 76 changes the "DEFAULT" API constants from 0 to UIDNA_NONTRANSITIONAL_TO_ASCII | UIDNA_NONTRANSITIONAL_TO_UNICODE.
|
||||
|
||||
ICU 76 does not change the behavior of using options value 0.
|
||||
(That would change the behavior of existing binaries linking with new ICU libraries.)
|
||||
However, when code is recompiled against a new version of ICU,
|
||||
and when it uses the DEFAULT constant, then it will pass these option flags into the factory method.
|
||||
|
||||
* In C/C++: unicode/uidna.h [UIDNA_DEFAULT](https://unicode-org.github.io/icu-docs/apidoc/dev/icu4c/uidna_8h.html#a726ca809ffd3d67ab4b8476646f26635aa1eb63014cdaf41c7ea6cf3abecf1169)
|
||||
* In Java: IDNA.java [DEFAULT](https://unicode-org.github.io/icu-docs/apidoc/dev/icu4j/com/ibm/icu/text/IDNA.html#DEFAULT)
|
||||
|
||||
See [ICU-22294](https://unicode-org.atlassian.net/browse/ICU-22294)
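
The effect of the changed constant can be sketched without linking against ICU. The snippet below is only an illustration: the `k...` names are hypothetical stand-ins, and the numeric bit values are assumptions chosen for the example; real code should use `UIDNA_DEFAULT`, `UIDNA_NONTRANSITIONAL_TO_ASCII`, and `UIDNA_NONTRANSITIONAL_TO_UNICODE` from unicode/uidna.h. It shows that an options value of 0 still selects transitional processing, while code recompiled against the new constant passes both nontransitional bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the option bits named in the text.
// The numeric values are assumptions for illustration only.
constexpr uint32_t kNontransitionalToAscii = 0x10;
constexpr uint32_t kNontransitionalToUnicode = 0x20;

// The "DEFAULT" constant as seen by code compiled against older headers...
constexpr uint32_t kDefaultBeforeIcu76 = 0;
// ...and as seen by code recompiled against ICU 76 headers.
constexpr uint32_t kDefaultSinceIcu76 =
    kNontransitionalToAscii | kNontransitionalToUnicode;

// A direction is nontransitional only when its bit is set,
// so an options value of 0 keeps its old (transitional) meaning.
constexpr bool isNontransitionalToAscii(uint32_t options) {
    return (options & kNontransitionalToAscii) != 0;
}

constexpr bool isNontransitionalToUnicode(uint32_t options) {
    return (options & kNontransitionalToUnicode) != 0;
}
```

Because the constant is compiled into the caller, an existing binary keeps passing 0 and keeps its behavior; only a rebuild picks up the new flags. Callers that need a specific behavior regardless of the ICU version can pass the individual option bits explicitly instead of the DEFAULT constant.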
|
||||
|
||||
### SimpleNumber::truncateStart() Removed
|
||||
ICU 75 renamed the still-draft SimpleNumber::truncateStart() to setMaximumIntegerDigits().
|
||||
ICU 76 removes the never-stable, original function.
|
||||
Same for the C API usnum_truncateStart().
|
||||
([ICU-22900](https://unicode-org.atlassian.net/browse/ICU-22900))
|
||||
|
||||
### C++ Header-Only APIs
|
||||
ICU 76 is the first version where we add what we call C++ header-only APIs.
|
||||
These are especially intended for users who rely only on the binary-stable DLL/library exports of C APIs
|
||||
(C++ APIs cannot be binary stable).
|
||||
|
||||
_Please test these new APIs and let us know if you find problems —
|
||||
especially if you find a platform/compiler/options combination
|
||||
where the call site does end up calling into ICU DLL/library exports._
|
||||
|
||||
Remember that regular C++ APIs can be hidden by callers defining `U_SHOW_CPLUSPLUS_API=0`.
|
||||
The new header-only APIs can be separately enabled via `U_SHOW_CPLUSPLUS_HEADER_API=1`.
|
||||
|
||||
([GitHub query for `U_SHOW_CPLUSPLUS_HEADER_API` in public header files](https://github.com/search?q=repo%3Aunicode-org%2Ficu+U_SHOW_CPLUSPLUS_HEADER_API+path%3Aunicode%2F*.h&type=code))
|
||||
|
||||
These are C++ definitions that are not exported by the ICU DLLs/libraries,
|
||||
are thus inlined into the calling code,
|
||||
and may call ICU C APIs but not ICU's non-header-only C++ APIs.
|
||||
|
||||
The header-only APIs are defined in a nested `header` namespace.
|
||||
If entry point renaming is turned off (the main namespace is `icu` rather than `icu_76` etc.),
|
||||
then the new `U_HEADER_ONLY_NAMESPACE` is `icu::header`.
|
||||
|
||||
([Link to the API proposal which introduced this concept](https://docs.google.com/document/d/1xERVccTYsptzjfbjcj6HDtoKVF_mEKmslPsOiQzzaFg/view#heading=h.cf4bmhjgozry))
|
||||
|
||||
For example, for iterating over the code point ranges in a `USet` (excluding the strings):
|
||||
|
||||
```c++
|
||||
U_NAMESPACE_USE
|
||||
using U_HEADER_NESTED_NAMESPACE::USetRanges;
|
||||
LocalUSetPointer uset(uset_openPattern(u"[abcçカ🚴]", -1, &errorCode));
|
||||
for (auto [start, end] : USetRanges(uset.getAlias())) {
|
||||
printf("uset.range U+%04lx..U+%04lx\n", (long)start, (long)end);
|
||||
}
|
||||
for (auto range : USetRanges(uset.getAlias())) {
|
||||
for (UChar32 c : range) {
|
||||
printf("uset.range.c U+%04lx\n", (long)c);
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
(Implementation note: On most platforms, when compiling ICU itself,
|
||||
the `U_HEADER_ONLY_NAMESPACE` is `icu::internal`,
|
||||
so that any such symbols that get exported differ from the ones that calling code sees.
|
||||
On Windows, where DLL exports are explicit,
|
||||
the namespace is always the same, but these header-only APIs are not marked for export.)
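
To make the mechanism more concrete, here is a minimal, self-contained sketch of how a header-only C++ wrapper can sit on top of a C-style API. Nothing in it is ICU code: `DemoSet`, the `demo_*` functions, and the `demo::header` namespace are hypothetical names invented for this illustration, and the real `USetRanges` implementation may differ. The point is only that the iterator classes are defined entirely inline, so the calling code would need nothing from the library beyond the C functions.

```cpp
#include <cassert>
#include <cstdint>

// ---- Hypothetical C-style API (stand-in for a library's stable C exports) ----
struct DemoSet {
    const int32_t *bounds;  // start0, end0, start1, end1, ... (ends inclusive)
    int32_t rangeCount;
};

int32_t demo_getRangeCount(const DemoSet *set) { return set->rangeCount; }
int32_t demo_getRangeStart(const DemoSet *set, int32_t i) { return set->bounds[2 * i]; }
int32_t demo_getRangeEnd(const DemoSet *set, int32_t i) { return set->bounds[2 * i + 1]; }

// ---- Header-only C++ layer: inline definitions that call only the C API ----
namespace demo {
namespace header {

struct CodePointRange {
    int32_t start;
    int32_t end;  // inclusive
};

class RangeIterator {
public:
    RangeIterator(const DemoSet *set, int32_t index) : set_(set), index_(index) {}
    bool operator!=(const RangeIterator &other) const { return index_ != other.index_; }
    RangeIterator &operator++() { ++index_; return *this; }
    CodePointRange operator*() const {
        return {demo_getRangeStart(set_, index_), demo_getRangeEnd(set_, index_)};
    }
private:
    const DemoSet *set_;
    int32_t index_;
};

// Adapter that makes a DemoSet usable in a range-based for loop.
class Ranges {
public:
    explicit Ranges(const DemoSet *set) : set_(set) {}
    RangeIterator begin() const { return {set_, 0}; }
    RangeIterator end() const { return {set_, demo_getRangeCount(set_)}; }
private:
    const DemoSet *set_;
};

}  // namespace header
}  // namespace demo

// Example use: count the code points covered by all ranges.
int32_t countCodePoints(const DemoSet &set) {
    int32_t total = 0;
    for (demo::header::CodePointRange range : demo::header::Ranges(&set)) {
        total += range.end - range.start + 1;
    }
    return total;
}
```

For a set holding the ranges U+0061..U+0063 and U+00E7..U+00E7, `countCodePoints` returns 4. Keeping the wrapper in its own nested namespace mirrors the separation described above between header-only definitions and the regular C++ API.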
|
||||
|
||||
### Migration Issues Related to CLDR
|
||||
* See [CLDR 46 migration issues](https://cldr.unicode.org/downloads/cldr-46#migration)
|
||||
|
||||
## ICU4C Platform Support
|
||||
|
||||
|
@ -97,27 +228,30 @@ We routinely test on recent versions of Linux, macOS, and Windows.
|
|||
|
||||
We accept patches for other platforms.
|
||||
|
||||
For ICU 76, we have received a contribution to make ICU4C work again on z/OS,
|
||||
using a newer (clang-based) compiler. ([ICU-22714](https://unicode-org.atlassian.net/browse/ICU-22714) [icu/pull/3008](https://github.com/unicode-org/icu/pull/3008) + [ICU-22916](https://unicode-org.atlassian.net/browse/ICU-22916) [icu/pull/3208](https://github.com/unicode-org/icu/pull/3208))
|
||||
|
||||
Windows: The minimum supported version is Windows 7. (See [How To Build And Install On Windows](../userguide/icu4c/build.html#how-to-build-and-install-on-windows) for more details.)
|
||||
|
||||
## ICU4J Platform Support
|
||||
|
||||
ICU4J works on Java 8..17 (at least).
|
||||
ICU4J works on Java 8..21 (at least).
|
||||
|
||||
ICU4J should work on Android API level 21 and later but may require “[library desugaring](https://developer.android.com/studio/write/java8-support#library-desugaring)”.
|
||||
|
||||
## Download
|
||||
|
||||
Source and binary downloads are available on the git/GitHub tag page: TODO: https://github.com/unicode-org/icu/releases/tag/release-76-1
|
||||
Source and binary downloads are available on the git/GitHub tag page: https://github.com/unicode-org/icu/releases/tag/release-76-rc
|
||||
|
||||
See the [Source Code Setup](../devsetup/source/) page for how to download the ICU file tree directly from GitHub.
|
||||
|
||||
ICU locale data was generated from CLDR data equivalent to:
|
||||
|
||||
* TODO: fix/update
|
||||
* https://github.com/unicode-org/cldr/releases/tag/release-46-beta4
|
||||
* https://github.com/unicode-org/cldr-staging/releases/tag/release-46-beta4
|
||||
* https://github.com/unicode-org/cldr/releases/tag/release-46-beta3
|
||||
* https://github.com/unicode-org/cldr-staging/releases/tag/release-46-beta3
|
||||
|
||||
TODO: Maven dependency:
|
||||
[Maven dependency](https://central.sonatype.com/artifact/com.ibm.icu/icu4j):
|
||||
TODO
|
||||
```
|
||||
<dependency>
|
||||
<groupId>com.ibm.icu</groupId>
|
||||
|
|