ICU-22773 Migrate the CLDR conversion tool to Maven

Mihai Nita 2024-12-09 19:26:40 +00:00
parent 3b9c0fc4a5
commit 2fa8a0908c
32 changed files with 1347 additions and 1114 deletions

@ -1,6 +1,6 @@
// © 2022 and later: Unicode, Inc. and others.
// License & terms of use: http://www.unicode.org/copyright.html
// Generated using tools/cldr/cldr-to-icu/build-icu-data.xml
// Generated using tools/cldr/cldr-to-icu/
//
// Include Japanese adaboost model.
{

@ -1,6 +1,6 @@
// © 2021 and later: Unicode, Inc. and others.
// License & terms of use: http://www.unicode.org/copyright.html
// Generated using tools/cldr/cldr-to-icu/build-icu-data.xml
// Generated using tools/cldr/cldr-to-icu/
//
// Include Burmese and Thai lstm models.
{

@ -27,8 +27,8 @@ All Rights Reserved.
# Intro and setup
These instructions describe how to regenerate ICU4C locale and linguistic data from CLDR,
and then how to convert that ICU4 data for ICU4J (data jars and maven resources).
They apply to CLDR 44 / ICU 74 and later.
and then how to convert that ICU4C data for ICU4J (data jars and maven resources).
They apply to CLDR 47 / ICU 77 and later.
To use these instructions just for generating ICU4J data from ICU4C, you only need to use
steps 1, 8, and 12 in the Process section.
@ -37,22 +37,26 @@ The full process requires local copies of
* CLDR (the source of most of the data, and some Java tools)
* The complete ICU source tree, including:
* tools: includes the LdmlConverter build tool and associated config files
* icu4c: the target for converted CLDR data, and source for ICU4J data; includes tests for the converted data
* icu4j: the target for updated data jars; includes tests for the converted data
* `tools`: includes the `LdmlConverter` build tool and associated config files
* `icu4c`: the target for converted CLDR data, and source for ICU4J data; includes tests for the converted data
* `icu4j`: the target for updated data jars; includes tests for the converted data
For an official CLDR data integration into ICU, these should be clean, freshly
checked-out. For released CLDR sources, an alternative to checking out sources
for a given version is downloading the zipped sources for the common (core.zip)
and tools (tools.zip) directory subtrees from the Data column in
for a given version is downloading the zipped sources for the common (`core.zip`)
and tools (`tools.zip`) directory subtrees from the Data column in
[CLDR Releases/Downloads](https://cldr.unicode.org/index/downloads)
Besides a standard JDK, the process also requires [ant](https://ant.apache.org) and
Besides a standard JDK 11+, the process also requires [ant](https://ant.apache.org) and
[maven](https://maven.apache.org) plus the xml-apis.jar from the
[Apache xalan package](https://xalan.apache.org/xalan-j/downloads.html) _(Is this
latter requirement still true?)_. You will also need to have performed the
latter requirement still true?)_.
If you do CLDR development you can configure maven as documented at
[CLDR Maven setup](http://cldr.unicode.org/development/maven) (non-Eclipse version).
But for the CLDR to ICU data conversion, or for regular ICU development this is not needed.
Notes:
* Enough things can (and will) fail in this process that it is best to
@ -65,12 +69,12 @@ Notes:
files are used in addition to the CLDR files as inputs to the CLDR data build
process for ICU):
* The primary file to edit for adding/removing locales and/or collation and
rbnf data is<br>
`$TOOLS_ROOT/cldr/cldr-to-icu/build-icu-data.xml`.
`rbnf` data is \
`$ICU_DIR/tools/cldr/cldr-to-icu/config.xml`.
* There are some files in `icu4c/source/data/xml/` that may need editing for
certain additions. This is especially true for brkitr additions; however there
are rbnf files there that add some rules. The collation files there mainly
hook up the UCA collation rules in `icu4c/data/unidata/UCARules.txt` to the
certain additions. This is especially true for `brkitr` additions; however there
are `rbnf` files there that add some rules. The collation files there mainly
hook up the UCA collation rules in `icu4c/source/data/unidata/UCARules.txt` to the
collation data. To process these files, certain CLDR dtds are copied over to
ICU.
@ -88,14 +92,14 @@ considerations:
# CLDR prerequisites for BRS integrations
The following tasks should be done in the CLDR repo before beginning a CLDR-ICU
integration that ss part of the BRS process; handle each of these using a separate
integration that is part of the BRS process; handle each of these using a separate
ticket and a separate PR:
1. Generate updated CLDR test data (which is copied to ICU), using the process in
[Generating CLDR testData](https://docs.google.com/document/d/1-RC99npKcSSwUoYGkSzxaKOe76gYRkWhGdFzCdIBCu4/edit#heading=h.2rum9c6hrr4w)
2. Run CLDRModify with no options with no options and then with -fP. The webpage
for CLDRModify is currently being converted to markdown, a reference to it will
2. Run `CLDRModify` with no options and then with `-fP`. The web page
for `CLDRModify` is currently being converted to markdown; a reference to it will
be added when that process is complete.
# Environment variables
@ -120,61 +124,61 @@ There are several environment variables that need to be defined.
* `CLDR_TMP_DIR`: Parent of temporary CLDR production data. Defaults to
`$CLDR_DIR/../cldr-aux` (sibling to `CLDR_DIR`).
> **NOTE:** As of CLDR 36 and 37, the GenerateProductionData tool no longer
> **NOTE:** As of CLDR 36 and 37, the `GenerateProductionData` tool no longer
generates data by default into `$CLDR_TMP_DIR/production`; instead it
generates data into `$CLDR_DIR/../cldr-staging/production` (though there is
a command-line option to override this). However the rest of the build still
assumes that the generated data is in `$CLDR_TMP_DIR/production`.
So `CLDR_TMP_DIR` must be defined to be `$CLDR_DIR/../cldr-staging`.
3. ICU-related variables
* `ICU4C_DIR`: Path to root of ICU4C sources, below which is the source dir.
* `ICU_DIR`: Path to root of ICU directory, below which are (e.g.) the
`icu4c`, `icu4j`, and `tools` directories.
* `ICU4J_ROOT`: Path to root of ICU4J sources, below which is the main dir.
* `ICU4C_DIR`: Path to root of ICU4C sources, below which is the `source` dir.
* `ICU4J_ROOT`: Path to root of ICU4J sources, below which is the `main` dir.
* `TOOLS_ROOT`: Path to root of ICU tools directory, below which are (e.g.) the
cldr and unicodetools dirs.
# Process
## 1 Environment variables
1a. Java, ant, and maven variables, adjust for your system
```
```sh
# On macOS, /usr/libexec/java_home is a program that prints the JDK path
export JAVA_HOME=$(/usr/libexec/java_home)
export ANT_OPTS="-Xmx8192m"
export MAVEN_ARGS="--no-transfer-progress"
```
1b. CLDR variables, adjust for your setup; with cygwin it might be e.g.
```
```sh
CLDR_DIR=`cygpath -wp /build/cldr`
```
Note that for cldr-staging we do not use personal forks; we commit directly.
```
```sh
export CLDR_DIR=$HOME/cldr-myfork
export CLDR_TMP_DIR=$HOME/cldr-staging
export CLDR_DATA_DIR=$HOME/cldr-staging/production
```
1c. ICU variables
```
```sh
export ICU_DIR=$HOME/icu-myfork
export ICU4C_DIR=$HOME/icu-myfork/icu4c
export ICU4J_ROOT=$HOME/icu-myfork/icu4j
export TOOLS_ROOT=$HOME/icu-myfork/tools
```
1d. Directory for logs/notes (create it if it does not exist)
```
```sh
export NOTES=...(some directory)...
mkdir -p $NOTES
```
1e. The name of the icu data directory for Java (for example `icudt74b`)
```
```sh
export ICU_DATA_VER=icudt(version)b
```
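Before moving on to step 2, it can help to confirm that every variable from steps 1b to 1e is set and that the directory variables point at existing directories. The following is a minimal sketch, not part of the ICU tooling; the function name `check_icu_env` is hypothetical, and `ICU_DIR` is included because later steps reference it.

```sh
# Hypothetical helper (not part of ICU): report variables from step 1 that
# are unset, or that name a directory which does not exist.
# The body runs in a subshell so its working variables do not leak.
check_icu_env() (
    status=0
    for name in CLDR_DIR CLDR_TMP_DIR CLDR_DATA_DIR \
                ICU_DIR ICU4C_DIR ICU4J_ROOT TOOLS_ROOT NOTES; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name is not set" >&2
            status=1
        elif [ ! -d "$value" ]; then
            echo "missing: $name=$value is not a directory" >&2
            status=1
        fi
    done
    # ICU_DATA_VER is a name (for example icudt74b), not a path.
    if [ -z "${ICU_DATA_VER:-}" ]; then
        echo "missing: ICU_DATA_VER is not set" >&2
        status=1
    fi
    exit "$status"
)
```

Running `check_icu_env && echo "environment looks complete"` prints one line per problem on standard error and succeeds only when nothing is missing.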
@ -182,10 +186,10 @@ export ICU_DATA_VER=icudt(version)b
2a. Configure ICU4C, build and test without new data first, to verify that
there are no pre-existing errors, and to build some tools needed for later
steps. Here `<platform>` is the runConfigureICU code for the platform you
steps. Here `<platform>` is the `runConfigureICU` code for the platform you
are building on, e.g. Linux, macOS, Cygwin.
(optionally build with debug enabled)
```
```sh
cd $ICU4C_DIR/source
./runConfigureICU [--enable-debug] <platform>
make clean
@ -195,7 +199,7 @@ make check 2>&1 | tee $NOTES/icu4c-oldData-makeCheck.txt
2b. Now with ICU4J, build and test without new data first, to verify that
there are no pre-existing errors (or at least to have the pre-existing errors
as a base for comparison):
```
```sh
cd $ICU4J_ROOT
mvn clean
mvn verify 2>&1 | tee $NOTES/icu4j-oldData-mvnCheck.txt
@ -210,31 +214,33 @@ cp -p $CLDR_DIR/common/dtd/ldmlICU.dtd $ICU4C_DIR/source/data/dtd/cldr/common/dt
```
3b. Update the cldr-icu tooling to use the latest tagged version of ICU
```
open $TOOLS_ROOT/cldr/cldr-to-icu/pom.xml
```sh
open $ICU_DIR/tools/cldr/cldr-to-icu/pom.xml
```
(search for `icu4j-for-cldr` and update to the latest tagged version per instructions)
3c. Update the build for any new icu version, added locales, etc.
```sh
# ICU version
open $ICU_DIR/tools/cldr/cldr-to-icu/pom.xml
# Locales and other configuration changes
open $ICU_DIR/tools/cldr/cldr-to-icu/config.xml
```
open $TOOLS_ROOT/cldr/cldr-to-icu/build-icu-data.xml
```
(update icuVersion, icuDataVersion if necessary; update lists of locales to include if necessary)
(update `icuVersion`, `icuDataVersion` if necessary; update lists of locales to include if necessary)
3d. If there are new data types or variants in CLDR, you may need to update the
files that specify mapping of CLDR data to ICU rseources:
```
open $TOOLS_ROOT/cldr/cldr-to-icu/src/main/resources/ldml2icu_locale.txt
open $TOOLS_ROOT/cldr/cldr-to-icu/src/main/resources/ldml2icu_supplemental.txt
files that specify mapping of CLDR data to ICU resources:
```sh
open $ICU_DIR/tools/cldr/cldr-to-icu/src/main/resources/ldml2icu_locale.txt
open $ICU_DIR/tools/cldr/cldr-to-icu/src/main/resources/ldml2icu_supplemental.txt
```
## 4 Build and install CLDR jar
See `$TOOLS_ROOT/cldr/lib/README.txt` for more information on the CLDR
jar and the `install-cldr-jars.sh` script.
```
cd $TOOLS_ROOT/cldr
ant install-cldr-libs
See `$ICU_DIR/tools/cldr/cldr-to-icu/README.md` for more information on the CLDR jar.
```sh
cd "$CLDR_DIR"
mvn clean install -pl :cldr-all,:cldr-code -DskipTests -DskipITs
```
## 5 Generate CLDR production data and convert for ICU
@ -247,14 +253,15 @@ This process uses ant with ICU4C's `data/build.xml`
(usually `$CLDR_TMP_DIR/production`), required if any CLDR data has changed.
* Running `ant setup` is not required, but it will print useful errors to
debug issues with your path when it fails.
```
```sh
cd $ICU4C_DIR/source/data
ant cleanprod
ant setup
ant proddata 2>&1 | tee $NOTES/cldr-newData-proddataLog.txt
```
> Note, for CLDR development, at this point tests are sometimes run on the
> Note, for CLDR development, at this point tests are sometimes run on the
production data, see
[BRS: Run tests on production data](https://cldr.unicode.org/development/cldr-big-red-switch/brs-run-tests-on-production-data)
@ -262,26 +269,27 @@ ant proddata 2>&1 | tee $NOTES/cldr-newData-proddataLog.txt
These include .txt files and .py files. These new files will replace whatever was
already present in the ICU4C sources. This process uses the `LdmlConverter` in
`$TOOLS_ROOT/cldr/cldr-to-icu/`; see `$TOOLS_ROOT/cldr/cldr-to-icu/README.txt`.
`$ICU_DIR/tools/cldr/cldr-to-icu/`; see `$ICU_DIR/tools/cldr/cldr-to-icu/README.md`.
* This process will take several minutes, during most of which there will be no log
output (so do not assume nothing is happening). Keep a log so you can investigate
anything that looks suspicious.
* Note that `ant clean` should _not_ be run before this. The `build-icu-data.xml` process
* The conversion tool
will automatically run its own "clean" step to delete files it cannot determine to
be ones that it would generate, except for paths listed in `<retain>` elements such as
`coll/de__PHONEBOOK.txt`, `coll/de_.txt`, etc.
* Before running ant to regenerate the data, make any necessary changes to the
build-icu-data.xml file, such as adding new locales etc.
```
cd $TOOLS_ROOT/cldr/cldr-to-icu
ant -f build-icu-data.xml -DcldrDataDir="$CLDR_TMP_DIR/production" | tee $NOTES/cldr-newData-builddataLog.txt
* Before running the tool to regenerate the data, make any necessary changes to the
`config.xml` file, such as adding new locales etc.
```sh
cd $ICU_DIR/tools/cldr/cldr-to-icu
java -jar target/cldr-to-icu-1.0-SNAPSHOT-jar-with-dependencies.jar --cldrDataDir="$CLDR_TMP_DIR/production" | tee $NOTES/cldr-newData-builddataLog.txt
```
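The command above pipes only standard output into `tee`, and in a plain `sh` pipeline the exit status is that of `tee`, so a failed conversion can go unnoticed. A minimal sketch of a wrapper that also captures standard error and preserves the exit status follows; `run_logged` is a hypothetical helper name, not part of the ICU tooling.

```sh
# Hypothetical helper: run_logged LOGFILE COMMAND [ARGS...]
# Streams stdout and stderr of COMMAND to the terminal and to LOGFILE,
# then returns COMMAND's own exit status rather than tee's.
run_logged() (
    log=$1
    shift
    status_file=$(mktemp) || exit 1
    # Record the command's status inside the braces, before tee finishes.
    { "$@" 2>&1; echo "$?" > "$status_file"; } | tee "$log"
    status=$(cat "$status_file")
    rm -f "$status_file"
    # Fall back to failure if the status could not be recorded.
    exit "${status:-1}"
)
```

For example: `run_logged "$NOTES/cldr-newData-builddataLog.txt" java -jar target/cldr-to-icu-1.0-SNAPSHOT-jar-with-dependencies.jar --cldrDataDir="$CLDR_TMP_DIR/production"`.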
5c. Update the CLDR testData files needed by ICU4C/J tests, ensuring
they are representative of the newest CLDR data.
```
cd $TOOLS_ROOT/cldr
```sh
cd $ICU_DIR/tools/cldr
ant copy-cldr-testdata
```
@ -289,7 +297,7 @@ ant copy-cldr-testdata
(This step has been subsumed into 5c above)
5e. For now, manually re-add the `lstm` entries in `data/brkitr/root.txt`
```
```sh
open $ICU4C_DIR/source/data/brkitr/root.txt
```
Paste the following block after the dictionaries block and before the final closing '}':
@ -302,20 +310,20 @@ Paste the following block after the dictionaries block and before the final clos
5f. Update hard-coded lists in ICU
ICU4 has some hard-coded lists of locale-related codes that may need updating. Ideally these should
ICU has some hard-coded lists of locale-related codes that may need updating. Ideally these should
be replaced by data converted from CLDR ([ICU-22839](https://unicode-org.atlassian.net/browse/ICU-22839)). In the
meantime these need to be updated manually.
| code type | icu4c/source library file(s) | icu4c/source test file(s) |
| -------------------------------------------------------------------------------------------- | ------------------------------------------- | ------------------------------------------- |
| language<BR>(at least all language codes in ICU locales or CLDR attributeValueValidity.xml) | common/uloc.cpp: LANGUAGES[], LANGUAGES_3[] | test/testdata/structLocale.txt: Languages |
| region<BR>(at least all region codes in ICU locales or CLDR attributeValueValidity.xml) | common/uloc.cpp: COUNTRIES[], COUNTRIES_3[] | test/testdata/structLocale.txt: Countries |
| currency (see note below)<BR>(at least everything in CLDR supplementalData.xml currencyData) | common/ucurr.cpp: gCurrencyList[]] | test/testdata/structLocale.txt: Currencies,CurrencyPlurals<BR>test/cintltst/currtest.c:TestEnumList() |
| timezone | (not currently aware of hard-coded list) | test/testdata/structLocale.txt: zoneStrings |
| language<BR>(at least all language codes in ICU locales or CLDR `attributeValueValidity.xml`) | `common/uloc.cpp`: `LANGUAGES[], LANGUAGES_3[]` | `test/testdata/structLocale.txt`: Languages |
| region<BR>(at least all region codes in ICU locales or CLDR `attributeValueValidity.xml`) | `common/uloc.cpp`: `COUNTRIES[], COUNTRIES_3[]` | `test/testdata/structLocale.txt`: Countries |
| currency (see note below)<BR>(at least everything in CLDR `supplementalData.xml` `currencyData`) | `common/ucurr.cpp`: `gCurrencyList[]` | `test/testdata/structLocale.txt`: `Currencies`,`CurrencyPlurals`<BR>`test/cintltst/currtest.c`:`TestEnumList()` |
| timezone | (not currently aware of hard-coded list) | `test/testdata/structLocale.txt`: `zoneStrings` |
Note: currency code lists are also in other code lists along with measurement units,
but these are re-generated using the procedure in
[Updating MeasureUnit with new CLDR data](https://unicode-org.github.io/icu/processes/release/tasks/updating-measure-unit.html)
[Updating `MeasureUnit` with new CLDR data](https://unicode-org.github.io/icu/processes/release/tasks/updating-measure-unit.html)
(also mentioned in step 14 below).
## 6 Check the results
@ -323,7 +331,7 @@ but these are re-generated using the procedure in
Check which data files have modifications, which have been added or removed
(if there are no changes, you may not need to proceed further). Make sure the
list seems reasonable. You may want to save logs, and possibly examine them...
```
```sh
cd $ICU4C_DIR/..
git status
git status > $NOTES/gitStatusDelta-data.txt
@ -332,7 +340,7 @@ open $NOTES/gitDiffDelta-data.txt
```
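To judge whether the list of changes seems reasonable, a count of added, modified, and deleted files can be easier to scan than the full `git status` output. The following is a minimal sketch that assumes input in the `git status --porcelain` short format; the helper name `summarize_status` is hypothetical.

```sh
# Hypothetical helper: read `git status --porcelain` lines on stdin and
# print one summary line. The first two characters of each line are the
# status code; untracked files ("??") are counted as added.
summarize_status() {
    awk '
        { code = substr($0, 1, 2) }
        code == "??" || code ~ /A/  { added++;    next }
        code ~ /D/                  { deleted++;  next }
        code ~ /M/                  { modified++; next }
                                    { other++ }
        END {
            printf "added=%d modified=%d deleted=%d other=%d\n",
                   added, modified, deleted, other
        }
    '
}
```

Typical use would be `git status --porcelain | summarize_status`, run from `$ICU4C_DIR/..`; renames and other less common states fall into the `other` count.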
6a. You may also want to check which files were modified in CLDR production data:
```
```sh
cd $CLDR_TMP_DIR
git status
git status > $NOTES/gitStatusDelta-staging.txt
@ -342,25 +350,25 @@ git diff > $NOTES/gitDiffDelta-staging.txt
## 7 Fix data generation errors
Look for evident errors in the list of file changes, or in the file diffs.
Fixing them may entail modifying CLDR source data or `TOOLS_ROOT` config files or
Fixing them may entail modifying CLDR source data or `$ICU_DIR/tools/cldr/cldr-to-icu` config files or
tooling.
## 8 Rebuild ICU4C with new data, run tests
8a. Re-run configure and make clean, necessary to handle any files added or deleted:
```
```sh
cd $ICU4C_DIR/source
./runConfigureICU [--enable-debug] <platform>
make clean
```
8b. Do the rebuild, keeping a log as before:
```
```sh
make check 2>&1 | tee $NOTES/icu4c-newData-makeCheck.txt
```
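Because step 2a saved a baseline log, new failures can be separated from pre-existing ones by comparing the two logs. A minimal sketch follows; the helper name `new_failures` is hypothetical, and the default search pattern is an assumption that should be adjusted to whatever text the actual `make check` output uses to mark failures.

```sh
# Hypothetical helper: new_failures OLD_LOG NEW_LOG [PATTERN]
# Prints lines matching PATTERN that occur in NEW_LOG but not in OLD_LOG.
# The default PATTERN is an assumption, not a documented ICU log format.
new_failures() (
    old=$1
    new=$2
    pattern="${3:-FAIL|ERROR}"
    old_sorted=$(mktemp) || exit 1
    new_sorted=$(mktemp) || exit 1
    grep -E -- "$pattern" "$old" | sort -u > "$old_sorted"
    grep -E -- "$pattern" "$new" | sort -u > "$new_sorted"
    # comm -13 keeps only the lines unique to the second file.
    comm -13 "$old_sorted" "$new_sorted"
    rm -f "$old_sorted" "$new_sorted"
)
```

For example: `new_failures "$NOTES/icu4c-oldData-makeCheck.txt" "$NOTES/icu4c-newData-makeCheck.txt"`.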
To re-run a specific test if necessary when fixing bugs; for example:
```
```sh
cd test/intltest
DYLD_LIBRARY_PATH=../../lib:../../stubdata:../../tools/ctestfw:$DYLD_LIBRARY_PATH ./intltest -e -G format/NumberTest/NumberPermutationTest
cd ../..
@ -380,7 +388,8 @@ ticket under which you are performing the integration, if you have one), fix the
and regenerate from step 4.
If the data is OK, other sources of failure can include:
* Problems with the CLDR-ICU conversion process (pehaps some locale data is not getting
* Problems with the CLDR-ICU conversion process (perhaps some locale data is not getting
converted properly); go back to step 3, adjust and repeat from there.
* Problems with ICU library code that may not be using new resources properly. Fix and
repeat from step 8.
@ -390,9 +399,9 @@ If the data is OK , other sources of failure can include:
you will need to update `icu4c/test/testdata/structLocale.txt` (otherwise
`/tsutil/cldrtest/TestLocaleStructure` may fail).
## 10 Running ICU4C tests in exhaustive mode.
## 10 Running ICU4C tests in exhaustive mode
Exhautive tests should always be run for a CLDR-ICU integration PR before it is merged.
Exhaustive tests should always be run for a CLDR-ICU integration PR before it is merged.
Once you have a PR, you can do this for both C and J as part of the pre-merge CI tests
by manually running a workflow (the exhaustive tests are not run automatically on every PR).
See [Continuous Integration / Exhaustive Tests](../userguide/dev/ci.md#exhaustive-tests).
@ -400,7 +409,7 @@ See [Continuous Integration / Exhaustive Tests](../userguide/dev/ci.md#exhaustiv
The following instructions run the ICU4C exhaustive tests locally (which you may want to do
before even committing changes, or which may be necessary to diagnose failures in the
CI tests):
```
```sh
cd $ICU4C_DIR/source
export INTLTEST_OPTS="-e"
export CINTLTST_OPTS="-e"
@ -415,13 +424,13 @@ appropriate, and repeating from step 4 or 8 as appropriate.
## 12 Transfer the ICU4C data to ICU4J
12a. You need to reconfigure ICU4C to include the unicore data.
```
```sh
cd $ICU4C_DIR/source
ICU_DATA_BUILDTOOL_OPTS=--include_uni_core_data ./runConfigureICU <platform>
```
12b. Rebuild the data with the new config setting, then create the ICU4J data jar.
```
```sh
cd $ICU4C_DIR/source/data
make clean
make -j -l2.5
@ -429,13 +438,13 @@ make icu4j-data-install
```
12c. Create the test data jar
```
```sh
cd $ICU4C_DIR/source/test/testdata
make icu4j-data-install
```
12d. Update the extracted {main, test} data files in the Maven build
```
```sh
cd $ICU4J_ROOT
./extract-data-files.sh
```
@ -443,7 +452,7 @@ cd $ICU4J_ROOT
## 13 Rebuild ICU4J with new data, run tests
13a. Run the tests using the maven build
```
```sh
cd $ICU4J_ROOT
mvn clean
mvn install 2>&1 | tee $NOTES/icu4j-newData-mvnCheck.txt
@ -451,26 +460,29 @@ mvn install 2>&1 | tee $NOTES/icu4j-newData-mvnCheck.txt
It is possible to re-run a specific test class or method if necessary when fixing bugs.
For example (using artifactId, full class name, test all methods):
```
For example (using `artifactId`, full class name, test all methods):
```sh
mvn install -pl :core -Dtest=com.ibm.icu.dev.test.util.LocaleBuilderTest
```
or (example of using module path, class name, one method):
```
```sh
mvn install -pl main/common_tests -Dtest=MeasureUnitTest#TestGreek
```
13b. Optionally run the tests in exhautive mode
13b. Optionally run the tests in exhaustive mode
Optionally run before committing changes, or run to diagnose failures from
running exhastive CI tests in the PR using `/azp run CI-Exhaustive`:
```
Optionally run exhaustive tests locally before committing changes:
```sh
cd $ICU4J_ROOT
mvn install -DICU.exhaustive=10 2>&1 | tee $NOTES/icu4j-newData-mvnCheckEx.txt
```
Exhaustive tests in CI can be triggered by running the "Exhaustive Tests for ICU"
action from the GitHub web UI.
See [Continuous Integration / Exhaustive Tests](../userguide/dev/ci.md#exhaustive-tests).
Running a specific test is the same as above:
```
```sh
mvn install -pl :core -DICU.exhaustive=10 -Dtest=ExhaustiveNumberTest
```
@ -482,7 +494,7 @@ step 4, as appropriate, until there are no more failures in ICU4C or ICU4J.
Note that certain data changes and related test failures may require the
rebuilding of other kinds of data and/or code. For example:
### Updating MeasureUnit code and tests
### Updating `MeasureUnit` code and tests
If you see a failure such as
```
@ -490,7 +502,7 @@ MeasureUnitTest testCLDRUnitAvailability Failure (MeasureUnitTest.java:3410) : U
```
then you will need to update the C and J library and test code for new measurement
units, see the procedure at
[Updating MeasureUnit with new CLDR data](https://unicode-org.github.io/icu/processes/release/tasks/updating-measure-unit.html)
[Updating `MeasureUnit` with new CLDR data](https://unicode-org.github.io/icu/processes/release/tasks/updating-measure-unit.html)
### Updating plurals test data
@ -503,12 +515,12 @@ To address these requires updating the LOCALE_SNAPSHOT data in
```
$ICU4J_ROOT/main/common_tests/src/test/java/com/ibm/icu/dev/test/format/PluralRulesTest.java
```
by modifying the TestLocales() test there to run `generateLOCALE_SNAPSHOT()` and
by modifying the `TestLocales()` test there to run `generateLOCALE_SNAPSHOT()` and
then copying in the updated data.
## 15 Check the ICU file changes and commit
```
```sh
cd $ICU4C_DIR/source
make clean
cd $ICU4J_ROOT
@ -528,13 +540,13 @@ git push origin ICU-nnnnn-branchname
(Only for an official integration from CLDR git repositories)
16a. Check cldr-staging changes, and commit
```
```sh
cd $CLDR_TMP_DIR
git status
```
Then `git add` or `git rm` files as necessary. Record the changes, commit and push.
```
```sh
git status > $NOTES/gitStatusDelta-production-afterAdd.txt
git commit -m 'CLDR-nnnnn production data corresponding to CLDR release-nn-stage'
git push origin main
@ -545,8 +557,8 @@ git push origin main
(There may be other cldr-staging changes unrelated to production data, such as charts
or spec; we want to include them in the tag, so pull first, but log to see what the
chnages are first)
```
changes are first)
```sh
cd $CLDR_TMP_DIR
git pull
git log
@ -559,7 +571,7 @@ git push --tags
We need to tag the main cldr repository. If $CLDR_DIR represents that repository,
this is easy:
```
```sh
cd $CLDR_DIR
git tag -a "release-nn-stage" -m "CLDR-nnnnn: tag CLDR release-nn-stage"
git push --tags
@ -567,7 +579,7 @@ git push --tags
However if $CLDR_DIR represents your personal fork or a branch from it, you need to
figure out what commit hash you have integrated, and tag that hash in the main repo.
```
```sh
cd $CLDR_DIR
git log
```
@ -575,7 +587,7 @@ Note the latest commit hash hhhhhhhh...
Then switch to the main repo, update it, and tag the appropriate hash (making sure
it is in that repo!):
```
```sh
cd $HOME/cldr
git pull
git log
@ -583,7 +595,7 @@ git tag -a "release-nn-stage" -m "CLDR-nnnnn: tag CLDR release-nn-stage" hhhhhhh
git push --tags
```
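The check that the hash is in the main repo can be done with git itself before tagging. A minimal sketch, using a hypothetical helper name `commit_on_head`: `git cat-file -e` verifies that the object exists, and `git merge-base --is-ancestor` exits with status 0 when the first commit is reachable from the second.

```sh
# Hypothetical helper: commit_on_head HASH
# Succeeds only if HASH names a commit that exists in the current
# repository and is reachable from the current HEAD.
commit_on_head() {
    git cat-file -e "$1^{commit}" 2>/dev/null &&
        git merge-base --is-ancestor "$1" HEAD
}
```

After `cd $HOME/cldr` and `git pull`, running `commit_on_head hhhhhhhh || echo "hash not found on this branch"` gives a quick answer before the tag is created.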
## 18 Pubish the cldr tags in github
## 18 Publish the cldr tags in github
You should publish the cldr and cldr-staging tags in github.

@ -53,6 +53,13 @@ need to be correspondingly updated. See below for more files to be updated and s
[icu4c/source/data/misc/icuver.txt](https://github.com/unicode-org/icu/blob/main/icu4c/source/data/misc/icuver.txt)
needs to be updated with the correct version number for ICU and its data.
#### Since ICU 77
The tool takes the `icuVersion` and `icuDataVersion` from the official ICU APIs
(from the icu4j listed as a dependency of the tool, usually the one you just built from the `icu4j` folder).
If you need different values, you can specify them as command line parameters (`--icuVersion` and `--icuDataVersion`).
#### Since ICU 68
In
@ -212,8 +219,18 @@ The command requires a version number string that follows the typical Java / Mav
6. cldr-to-icu build tool has a dependency on the icu4j packages which needs to be updated in [`tools/cldr/cldr-to-icu/pom.xml`](https://github.com/unicode-org/icu/blob/main/tools/cldr/cldr-to-icu/pom.xml). Please update it to match the version that was updated in `icu4j/pom.xml` in the steps above.
`<version>74.0.1-SNAPSHOT</version>`
```xml
<version>74.0.1-SNAPSHOT</version>
```
Since ICU 77 this moved to a property:
```xml
<icu4j.version>77.0.1-SNAPSHOT</icu4j.version>
```
This can easily be set from the command line:
```sh
mvn versions:set-property -Dproperty=icu4j.version -DnewVersion=77.1 -f $ICU_DIR/tools/cldr/cldr-to-icu
```
#### Until ICU 73 (inclusive)

@ -290,6 +290,8 @@ copying that version number into the $ICU_SRC/.bazeliskrc config file.
- run Unicode Tools GenerateUnihanCollators & GenerateUnihanCollatorFiles,
check CLDR diffs, copy to CLDR, test CLDR, ... as documented there
- generate ICU zh collation data
WARNING: outdated, don't do this, follow the tools/cldr/cldr-to-icu/README.md file!
--- Old text from here:
instructions inspired by
https://github.com/unicode-org/icu/blob/main/tools/cldr/cldr-to-icu/README.txt and
https://github.com/unicode-org/icu/blob/main/icu4c/source/data/cldr-icu-readme.txt

13
tools/cldr/.gitignore vendored
@ -1,9 +1,4 @@
# Exclude the Maven local repository but keep the lib directory and the top-level readme, scripts and build config.
/lib/**
!/lib/README.txt
!/lib/install-cldr-jars.sh
!/lib/pom.xml
# Ignore the default Maven target directory.
/cldr-to-icu/target
# Eclipse IDE generated files
.classpath
.project
.settings/

@ -3,7 +3,7 @@
<!-- This build file is intended to become the single mechanism for working with CLDR
code and data when building ICU data.
Eventually it will encompass:
* Building ICU data from CLDR data via cldr-to-icu.
* Building the CLDR libraries needed to support ICU data conversion.
@ -70,23 +70,4 @@
<delete dir="${testDataDir4J}"/>
</target>
<!-- Builds the ICU data, using the Ant build file in the cldr-to-icu directory and passing.
through any specified arguments for controlling the build. If you need more control when
building ICU data (such as incrementally building parts of the data), you should use the
build-icu-data.xml file directly. -->
<target name="build-icu-data">
<ant dir="cldr-to-icu" antfile="build-icu-data.xml" target="all" inheritAll="true"/>
</target>
<!-- Deletes generated ICU data by invoking "clean" in cldr-to-icu/build-icu-data.xml -->
<target name="clean-icu-data">
<ant dir="cldr-to-icu" antfile="build-icu-data.xml" target="clean" inheritAll="true"/>
</target>
<!-- Installs the CLDR library dependencies needed for building ICU data. -->
<target name="install-cldr-libs" depends="init-args">
<exec dir="lib" executable="install-cldr-jars.sh" resolveexecutable="true" failonerror="true">
<arg line="${cldrDir}"/>
</exec>
</target>
</project>

@ -1,31 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<classpath>
<classpathentry kind="src" output="target/classes" path="src/main/java">
<attributes>
<attribute name="optional" value="true"/>
<attribute name="maven.pomderived" value="true"/>
</attributes>
</classpathentry>
<classpathentry excluding="**" kind="src" output="target/classes" path="src/main/resources">
<attributes>
<attribute name="maven.pomderived" value="true"/>
</attributes>
</classpathentry>
<classpathentry kind="src" output="target/test-classes" path="src/test/java">
<attributes>
<attribute name="optional" value="true"/>
<attribute name="maven.pomderived" value="true"/>
</attributes>
</classpathentry>
<classpathentry kind="con" path="org.eclipse.jdt.launching.JRE_CONTAINER/org.eclipse.jdt.internal.debug.ui.launcher.StandardVMType/JavaSE-1.8">
<attributes>
<attribute name="maven.pomderived" value="true"/>
</attributes>
</classpathentry>
<classpathentry kind="con" path="org.eclipse.m2e.MAVEN2_CLASSPATH_CONTAINER">
<attributes>
<attribute name="maven.pomderived" value="true"/>
</attributes>
</classpathentry>
<classpathentry kind="output" path="target/classes"/>
</classpath>

@ -1,23 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<projectDescription>
<name>cldr-to-icu</name>
<comment></comment>
<projects>
</projects>
<buildSpec>
<buildCommand>
<name>org.eclipse.jdt.core.javabuilder</name>
<arguments>
</arguments>
</buildCommand>
<buildCommand>
<name>org.eclipse.m2e.core.maven2Builder</name>
<arguments>
</arguments>
</buildCommand>
</buildSpec>
<natures>
<nature>org.eclipse.jdt.core.javanature</nature>
<nature>org.eclipse.m2e.core.maven2Nature</nature>
</natures>
</projectDescription>

@ -1,5 +0,0 @@
eclipse.preferences.version=1
org.eclipse.jdt.core.compiler.codegen.targetPlatform=1.8
org.eclipse.jdt.core.compiler.compliance=1.8
org.eclipse.jdt.core.compiler.problem.forbiddenReference=warning
org.eclipse.jdt.core.compiler.source=1.8

@ -1,5 +0,0 @@
eclipse.preferences.version=1
org.eclipse.jdt.ui.ignorelowercasenames=true
org.eclipse.jdt.ui.importorder=java;javax;org;com;
org.eclipse.jdt.ui.ondemandthreshold=9999
org.eclipse.jdt.ui.staticondemandthreshold=9999

@ -1,4 +0,0 @@
activeProfiles=
eclipse.preferences.version=1
resolveWorkspaceProjects=true
version=1

@ -6,32 +6,56 @@ License & terms of use: http://www.unicode.org/copyright.html
# Basic instructions for running the LdmlConverter via Maven
> Note: While this document provides useful background information about the
LdmlConverter, the actual complete process for integrating CLDR data to ICU
`LdmlConverter`, the actual complete process for integrating CLDR data to ICU
is described in the document `../../../docs/processes/cldr-icu.md` which is
best viewed as
[CLDR-ICU integration](https://unicode-org.github.io/icu/processes/cldr-icu.html)
## TLDR
* Define the `ICU_DIR`, `CLDR_DIR`, and `CLDR_DATA_DIR` environment variables, or override them (see below)
* Check / update versions
* Build ICU4J:
```sh
cd "$ICU_DIR"
mvn clean install -f icu4j -DskipTests -DskipITs
```
* Build the `cldr-code` library from the `cldr` repo:
```sh
cd "$CLDR_DIR"
mvn clean install -pl :cldr-all,:cldr-code -DskipTests -DskipITs
```
* Build the conversion tool:
```sh
cd "$ICU_DIR/tools/cldr/cldr-to-icu/"
mvn clean package -DskipTests -DskipITs
```
* Run the conversion tool:
```sh
java -jar target/cldr-to-icu-1.0-SNAPSHOT-jar-with-dependencies.jar
```
## Requirements
* A CLDR release for supplying CLDR data and the CLDR API.
* JDK 11+
* The Maven build tool
* The Ant build tool (using JDK 11+)
## Important directories
| Directory | Description |
|-----------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `TOOLS_ROOT` | Path to root of ICU tools directory, below which are (e.g.) the `cldr/` and `unicodetools/` directories. |
| `ICU_DIR` | Path to root of ICU directory, below which are (e.g.) the `icu4c/`, `icu4j/` and `tools/` directories. |
| `CLDR_DIR` | This is the path to the root of standard CLDR sources, below which are the `common/` and `tools/` directories. |
| `CLDR_DATA_DIR` | The top-level directory for the CLDR production data (typically the "production" directory in the staging repository). Usually generated locally or obtained from: https://github.com/unicode-org/cldr-staging/tree/main/production |
On POSIX systems, it's best to set these as exported shell variables, and any
following instructions assume they have been set accordingly:
```
$ export TOOLS_ROOT=/path/to/icu/tools
$ export CLDR_DIR=/path/to/cldr
$ export CLDR_DATA_DIR=/path/to/cldr-staging/production
```sh
export TOOLS_ROOT=/path/to/icu/tools
export CLDR_DIR=/path/to/cldr
export CLDR_DATA_DIR=/path/to/cldr-staging/production
```
Note that you should not attempt to use data from the CLDR project directory
@ -40,65 +64,132 @@ relies on a pre-processing step, and the CLDR data must come from the separate
"staging" repository (i.e. https://github.com/unicode-org/cldr-staging) or be
pre-processed locally into a different directory.
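A quick sanity check of these variables can catch path mistakes before a long build. The sketch below is not part of the ICU tooling: `check_dir` is a hypothetical helper, and the subdirectory names it is asked to look for are the ones listed in the table above.

```sh
# Hypothetical helper (not part of the ICU tooling): verify that a variable
# names an existing directory that contains the expected subdirectories.
check_dir() {
    name=$1
    dir=$2
    shift 2
    if [ -z "$dir" ] || [ ! -d "$dir" ]; then
        echo "$name is unset or not a directory: '$dir'" >&2
        return 1
    fi
    for sub in "$@"; do
        if [ ! -d "$dir/$sub" ]; then
            echo "$name: missing expected subdirectory '$sub'" >&2
            return 1
        fi
    done
}

# Usage (subdirectory names taken from the table above):
#   check_dir ICU_DIR       "$ICU_DIR"  icu4c icu4j tools
#   check_dir CLDR_DIR      "$CLDR_DIR" common tools
#   check_dir CLDR_DATA_DIR "$CLDR_DATA_DIR"
```

The helper only checks that directories exist; it cannot tell whether `CLDR_DATA_DIR` really holds pre-processed production data.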
:point_right: **Note**: the 3 folders can also be overridden:
* with Java properties (e.g. `-DCLDR_DIR=/foo/bar`)
* from the command line when invoking the tool (the `icuDir`, `cldrDir`, and `cldrDataDir` options)
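The two override styles might look as follows. This is a sketch that only prints the commands rather than running them: the paths are placeholders, `print_override_examples` is a hypothetical name, and the `--name=value` spelling of the options is an assumption, so check `--help` for the authoritative form.

```sh
# Sketch only: print (do not run) the two override styles described above.
# All paths are placeholders.
print_override_examples() {
    jar=target/cldr-to-icu-1.0-SNAPSHOT-jar-with-dependencies.jar
    # 1) Java system property, in the form given in the note above.
    echo "java -DCLDR_DIR=/foo/bar -jar $jar"
    # 2) Command-line options; the --name=value spelling is an assumption.
    echo "java -jar $jar --icuDir=/path/to/icu --cldrDir=/path/to/cldr --cldrDataDir=/path/to/cldr-staging/production"
}

print_override_examples
```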
## Initial Setup
This project relies on the Maven build tool for managing dependencies and uses
Ant for configuration purposes, so both will need to be installed. On a Debian
This project relies on the Maven build tool for managing dependencies, so it will need to be installed. On a Debian
based system, this should be as simple as:
```
$ sudo apt-get install maven ant
```sh
sudo apt-get install maven
```
You must also install an additional CLDR JAR file the local Maven repository at
`$TOOLS_ROOT/cldr/lib` (see the `README.txt` in that directory for more
information).
## Check / update versions
### Real versions
**ICU version (`real_icu_ver`):**
```sh
mvn help:evaluate -Dexpression=project.version -q -DforceStdout -f $ICU_DIR/icu4j
```
$ cd "$TOOLS_ROOT/cldr/lib"
$ ./install-cldr-jars.sh "$CLDR_DIR"
**CLDR Library version (`real_cldr_ver`):**
```sh
mvn help:evaluate -Dexpression=project.version -q -DforceStdout -f $CLDR_DIR/tools
```
### Dependency versions
**ICU version used by the cldr conversion tool:** \
⚠️ **Warning:** Must be the same as `real_icu_ver`
```sh
mvn help:evaluate -Dexpression=icu4j.version -q -DforceStdout -f $ICU_DIR/tools/cldr/cldr-to-icu
```
**CLDR library version used by the cldr conversion tool:** \
⚠️ **Warning:** Must be the same as `real_cldr_ver`
```sh
mvn help:evaluate -Dexpression=cldr-code.version -q -DforceStdout -f $ICU_DIR/tools/cldr/cldr-to-icu
```
**ICU version used by the cldr library:** \
⚠️ **Warning:** Must be the same as `real_icu_ver`
```sh
mvn help:evaluate -Dexpression=icu4j.version -q -DforceStdout -f $CLDR_DIR/tools
```
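The comparisons above can be scripted instead of checked by eye. The following is a minimal sketch: `check_same` is a hypothetical helper, not something provided by ICU or CLDR, and it only compares two strings that you have already captured with the `mvn help:evaluate` commands above.

```sh
# Hypothetical helper: report whether a dependency version matches the real one.
check_same() {
    label=$1
    expected=$2
    actual=$3
    if [ "$expected" = "$actual" ]; then
        echo "OK: $label = $actual"
    else
        echo "MISMATCH: $label is '$actual', expected '$expected'" >&2
        return 1
    fi
}

# Usage, with values captured from the commands above:
#   real_icu_ver=$(mvn help:evaluate -Dexpression=project.version -q -DforceStdout -f "$ICU_DIR/icu4j")
#   tool_icu_ver=$(mvn help:evaluate -Dexpression=icu4j.version -q -DforceStdout -f "$ICU_DIR/tools/cldr/cldr-to-icu")
#   check_same "icu4j.version (cldr-to-icu)" "$real_icu_ver" "$tool_icu_ver"
```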
### TLDR (quickly update versions without checking)
```sh
# Get real versions
real_icu_ver=$(mvn help:evaluate -Dexpression=project.version -q -DforceStdout -f "$ICU_DIR/icu4j")
echo "$real_icu_ver"
real_cldr_ver=$(mvn help:evaluate -Dexpression=project.version -q -DforceStdout -f "$CLDR_DIR/tools")
echo "$real_cldr_ver"
# Set dependency versions
mvn versions:set-property -Dproperty=icu4j.version -DnewVersion="$real_icu_ver" -f "$ICU_DIR/tools/cldr/cldr-to-icu"
mvn versions:set-property -Dproperty=cldr-code.version -DnewVersion="$real_cldr_ver" -f "$ICU_DIR/tools/cldr/cldr-to-icu"
mvn versions:set-property -Dproperty=icu4j.version -DnewVersion="$real_icu_ver" -f "$CLDR_DIR/tools"
```
## Build everything
You must also build and install an additional CLDR library in the local Maven repository.
Since that library depends on ICU4J, you need to build and install ICU4J first.
Lastly, build the conversion tool.
```sh
# Build ICU4J
cd "$ICU_DIR"
mvn clean install -f icu4j -DskipTests -DskipITs
# Build the CLDR library
cd "$CLDR_DIR"
mvn clean install -pl :cldr-all,:cldr-code -DskipTests -DskipITs
# Build the conversion tool
cd "$ICU_DIR/tools/cldr/cldr-to-icu/"
mvn clean package -DskipTests -DskipITs
```
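After the three builds, it can be useful to confirm that the conversion tool jar was actually produced before trying to run it. The sketch below assumes the jar name used in the run command of this README; `tool_jar_ready` is a hypothetical helper.

```sh
# Hypothetical helper: succeed (and print the jar path) only if the conversion
# tool jar exists under the given project directory.
tool_jar_ready() {
    jar="$1/target/cldr-to-icu-1.0-SNAPSHOT-jar-with-dependencies.jar"
    if [ -f "$jar" ]; then
        echo "$jar"
    else
        echo "Conversion tool jar not found: $jar" >&2
        return 1
    fi
}

# Usage:
#   tool_jar_ready "$ICU_DIR/tools/cldr/cldr-to-icu"
```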
## Generating all ICU data and source code
Run the conversion tool:
```sh
cd "$ICU_DIR/tools/cldr/cldr-to-icu/"
java -jar target/cldr-to-icu-1.0-SNAPSHOT-jar-with-dependencies.jar
```
$ cd "$TOOLS_ROOT/cldr/cldr-to-icu"
$ ant -f build-icu-data.xml
```
You can run it with `--help` for all the options supported.
## Other Examples
* Outputting a subset of the supplemental data into a specified directory:
```
$ ant -f build-icu-data.xml -DoutDir=/tmp/cldr -DoutputTypes=plurals,dayPeriods -DdontGenCode=true
```sh
java -jar target/cldr-to-icu-1.0-SNAPSHOT-jar-with-dependencies.jar --outDir=/tmp/cldr --outputTypes=plurals,dayPeriods --dontGenCode=true
```
Note: Output types can be listed with mixedCase, lower_underscore or UPPER_UNDERSCORE.
Pass `--outputTypes=help` to see the full list.
* Outputting only a subset of locale IDs (and all the supplemental data):
```
$ ant -f build-icu-data.xml -DoutDir=/tmp/cldr -DlocaleIdFilter='(zh|yue).*' -DdontGenCode=true
```sh
java -jar target/cldr-to-icu-1.0-SNAPSHOT-jar-with-dependencies.jar --outDir=/tmp/cldr --localeIdFilter='(zh|yue).*' --dontGenCode=true
```
* Overriding the default CLDR version string (which normally matches the CLDR library code):
```
$ ant -f build-icu-data.xml -DcldrVersion="36.1"
```sh
java -jar target/cldr-to-icu-1.0-SNAPSHOT-jar-with-dependencies.jar --cldrVersion="36.1"
```
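To get a feel for what a `--localeIdFilter` pattern such as the one above selects, you can preview it against a list of locale IDs. This is only an approximation under two assumptions: that the filter must match the whole locale ID, and that the pattern behaves the same in `grep -E` as in the tool's own regular expression engine. `preview_locale_filter` is a hypothetical helper.

```sh
# Hypothetical helper: read locale IDs (one per line) on stdin and print those
# that fully match the given extended regular expression.
preview_locale_filter() {
    grep -E -x -e "$1"
}

# Usage:
#   printf '%s\n' zh zh_Hant yue_CN en en_GB | preview_locale_filter '(zh|yue).*'
```

With the sample input in the usage comment, the helper prints `zh`, `zh_Hant` and `yue_CN`, one per line.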
### Using `alt="ascii"` CLDR alternate values from the CLDR XML
CLDR provides alternate values in addition to the default values for locale data.
For example, some locales have time formats using U+202F NARROW NO-BREAK SPACE (NNBSP) between the hours/minutes/seconds and the day periods.
For example, some locales have time formats using U+202F NARROW NO-BREAK SPACE (`NNBSP`) between the hours/minutes/seconds and the day periods.
In order to provide the equivalent time formats that use the ASCII space
U+0020 SPACE,
the alternate values have the extra attribute `alt="ascii"`.
Follow these steps to generate ICU data using the ASCII versions of locale data:
1. First, edit the `build-icu-data.xml` file where it mentions `ALTERNATE VALUES`
1. First, edit the `config.xml` file where it mentions `ALTERNATE VALUES`
with the correctly annotated source path, target path, and locales list
as follows:
@ -150,10 +241,10 @@ as follows:
+ source="//ldml/dates/calendars/calendar[@type='generic']/dateTimeFormats/availableFormats/dateFormatItem[@id='hms'][@alt='ascii']"/>
```
1. Then run the generator:
1. Then run the generator:
```
$ ant -f build-icu-data.xml <options>
```sh
java -jar target/cldr-to-icu-1.0-SNAPSHOT-jar-with-dependencies.jar <options>
```
## Config syntax details
@ -167,15 +258,13 @@ the following excerpt of the DTD schema indicates that there is a default value
<!ATTLIST timeFormat type NMTOKEN "standard" >
```
See `build-icu-data.xml` for documentation of all options and additional customization.
See `config.xml` for documentation of all options and additional customization.
## Running unit tests (CURRENTLY FAILING)
## Running unit tests
```sh
mvn test -DCLDR_DIR="$CLDR_DATA_DIR"
```
$ mvn test -DCLDR_DIR="$CLDR_DATA_DIR"
```
## Importing and running from an IDE
@ -183,3 +272,5 @@ This project should be easy to import into an IDE which supports Maven developme
as IntelliJ or Eclipse. It uses a local Maven repository directory for the unpublished
CLDR libraries (which are included in the project), but otherwise gets all dependencies
via Maven's public repositories.
Before importing and running it, however, you still need to build ICU4J and the CLDR library (see above).

View file

@ -1,11 +0,0 @@
*********************************************************************
*** © 2019 and later: Unicode, Inc. and others. ***
*** License & terms of use: http://www.unicode.org/copyright.html ***
*********************************************************************
The instructions for the LdmlConverter tool (a.k.a. CLDR-to-ICU converter) have
moved to README.md in this directory.
Please read README.md, or better yet, view the rendered form of its Markdown
contents online at Github
(ex: https://github.com/unicode-org/icu/tree/main/tools/cldr/cldr-to-icu)

View file

@ -1,472 +0,0 @@
<!-- © 2019 and later: Unicode, Inc. and others.
License & terms of use: http://www.unicode.org/copyright.html -->
<!--================================================================================
Setup:
Follow the installation instructions in README.txt in this directory.
To build ICU data files:
1: Determine the CLDR base directory and set the CLDR_DIR environment variable.
2: Determine the flags required (see the list of properties below).
3: Run: ant -f build-icu-data.xml -D<flag-name>=<flag-value>...
================================================================================-->
<!-- TODO: Add things like copying of a template directory and deleting previous files
(perhaps always generate into a temporary directory and copy back to avoid having
inconsistent state when the conversion is cancelled). -->
<project name="Convert" default="all" basedir="." xmlns:if="ant:if" xmlns:unless="ant:unless">
<target name="all" depends="init-args, prepare-jar, clean, convert"/>
<!-- Initialize the properties which were not already set on the command line. -->
<target name="init-args">
<property environment="env"/>
<!-- Inherit properties from environment variable unless specified. As usual
with Ant, this is messier than it should be. All we are saying here is:
"Use the property if explicitly set, otherwise use the environment variable."
We cannot just set the property to the environment variable, since expansion
fails for non existent properties, and you are left with a literal value of
"${env.CLDR_DATA_DIR}". -->
<condition property="cldrDataDir" value="${env.CLDR_DATA_DIR}">
<isset property="env.CLDR_DATA_DIR"/>
</condition>
<fail unless="cldrDataDir"
message="Set the CLDR_DATA_DIR environment variable (or cldrDataDir property) to the CLDR data directory (typically ending in '/production')"/>
<!-- Ant does not inherit this from the user's environment (and it can matter).
This is only needed because we have to "exec" a new Ant task below. -->
<condition property="javaHome" value="${env.JAVA_HOME}">
<isset property="env.JAVA_HOME"/>
</condition>
<!-- The output directory into which to write the converted ICU data. By default
this will overwrite (without deletion) the ICU data files in this ICU release,
so it is recommended that for testing, it be set to another value. -->
<property name="outDir" value="${basedir}/../../../icu4c/source/data/"/>
<!-- The output directory into which to write generated C/C++ code. By default
this will overwrite (without deletion) the generated C/C++ files in this
ICU release, so it is recommended that for testing, it be set to another value. -->
<property name="genCCodeDir" value="${basedir}/../../../icu4c/source/"/>
<!-- The output directory into which to write generated Java code. By default
this will overwrite (without deletion) the generated Java files in this
ICU release, so it is recommended that for testing, it be set to another value. -->
<property name="genJavaCodeDir" value="${basedir}/../../../icu4j/main/core"/>
<!-- Set this to true to prevent build-icu-data.xml from generating the generated
ICU source files -->
<property name="dontGenCode" value="false" />
<!-- The directory in which the additional ICU XML data is stored. -->
<property name="specialsDir" value="${basedir}/../../../icu4c/source/data/xml"/>
<!-- Default value for ICU version (icuver.txt). Update this for each release. -->
<property name="icuVersion" value="76.1.0.0"/>
<!-- Default value for ICU data version (icuver.txt). Update this for each release. -->
<property name="icuDataVersion" value="76.1.0.0"/>
<!-- An override for the CLDR version string (icuver.txt and others). This will be
extracted from the CLDR library used for building the data if not set here. -->
<property name="cldrVersion" value=""/>
<!-- The minimum draft status for CLDR data to be used in the conversion. See
CldrDraftStatus for more details. -->
<property name="minDraftStatus" value="contributed"/>
<!-- A regular expression to match the locale IDs to be generated (useful for
debugging specific regions). This is applied after locale ID specifications
have been expanded into full locale IDs, so the value "en" will NOT match
"en_GB" or "en_001" etc. -->
<property name="localeIdFilter" value=""/>
<!-- Whether to synthetically generate "pseudo locale" data ("en_XA" and "ar_XB"). -->
<property name="includePseudoLocales" value="false"/>
<!-- Whether to emit a debug report containing some possibly useful information after
the conversion has finished. -->
<!-- TODO: Currently this isn't hugely useful, so find out what people want. -->
<property name="emitReport" value="false"/>
<!-- List of output "types" to be generated (e.g. "rbnf,plurals,locales"); an empty
list means "build everything".
Note that the grouping of types is based on the legacy converter behaviour and
is not always directly associated with an output directory (e.g. "locales"
produces locale data for curr/, lang/, main/, region/, unit/, zone/ but NOT
coll/, brkitr/ or rbnf/).
Pass in the value "HELP" (or any invalid value) to see the full list of types. -->
<!-- TODO: Find out what common use cases are and use them. -->
<property name="outputTypes" value=""/>
<!-- Override to force the 'clean' task to delete files it cannot determine to be
auto-generated by this tool. This is useful if the file header changes since
the heading is what's used to recognize auto-generated files. -->
<property name="forceDelete" value="false"/>
</target>
<!-- Build a standalone JAR which is called by Ant (and which avoids needing to mess
about making Ant know the Maven class-path). -->
<target name="prepare-jar" depends="init-args">
<exec executable="mvn" searchpath="true" failonerror="true">
<arg value="compile"/>
</exec>
</target>
<!-- Somewhat hacky wrapper target which invokes the real conversion task.
This is done so we can set the environment variable of the new process and
effectively overwrite the CLDR_DIR value. If ever the CLDR library doesn't
need to use CLDR_DIR at runtime to find the production data, this can all be
removed. -->
<target name="convert" depends="init-args, prepare-jar">
<exec executable="ant" searchpath="true" failonerror="true">
<!-- The CLDR library wants CLDR_DIR set, to the data directory. -->
<env key="CLDR_DIR" value="${cldrDataDir}" />
<!-- Force inherit JAVA_HOME (this can be important). -->
<env key="JAVA_HOME" value="${javaHome}" />
<!-- Initial Ant command line with all the "interesting" bit in. -->
<arg line="-f build-icu-data.xml convert-impl -DcldrDir=${cldrDataDir}"/>
<!-- List all properties in the "convert-impl" task (except cldrDir). -->
<arg value="-DoutDir=${outDir}"/>
<arg value="-DgenCCodeDir=${genCCodeDir}"/>
<arg value="-DgenJavaCodeDir=${genJavaCodeDir}"/>
<arg value="-DdontGenCode=${dontGenCode}"/>
<arg value="-DspecialsDir=${specialsDir}"/>
<arg value="-DoutputTypes=${outputTypes}"/>
<arg value="-DicuVersion=${icuVersion}"/>
<arg value="-DicuDataVersion=${icuDataVersion}"/>
<arg value="-DcldrVersion=${cldrVersion}"/>
<arg value="-DminDraftStatus=${minDraftStatus}"/>
<arg value="-DlocaleIdFilter=${localeIdFilter}"/>
<arg value="-DincludePseudoLocales=${includePseudoLocales}"/>
<arg value="-DemitReport=${emitReport}"/>
</exec>
</target>
<!-- Do the actual CLDR data conversion, based on the command line arguments, built in
default properties and the configuration in the "<convert>" element below. -->
<target name="convert-impl">
<taskdef name="convert" classname="org.unicode.icu.tool.cldrtoicu.ant.ConvertIcuDataTask">
<classpath>
<pathelement path="target/cldr-to-icu-1.0-SNAPSHOT-jar-with-dependencies.jar"/>
</classpath>
</taskdef>
<taskdef name="generateCode" classname="org.unicode.icu.tool.cldrtoicu.ant.GenerateCodeTask">
<classpath>
<pathelement path="target/cldr-to-icu-1.0-SNAPSHOT-jar-with-dependencies.jar"/>
</classpath>
</taskdef>
<convert cldrDir="${cldrDir}" outputDir="${outDir}" specialsDir="${specialsDir}"
outputTypes="${outputTypes}" cldrVersion="${cldrVersion}"
icuVersion="${icuVersion}" icuDataVersion="${icuDataVersion}"
minimalDraftStatus="${minDraftStatus}" localeIdFilter="${localeIdFilter}"
includePseudoLocales="${includePseudoLocales}" emitReport="${emitReport}">
<!-- The primary set of locale IDs to be generated by default. The IDs in this list are
automatically expanded to include default scripts and all available regions. The
rules are:
1) Base languages are expanded to include default scripts (e.g. "en" -> "en_Latn").
2) All region and variant subtags are added for any base language or language+script
(e.g. "en" -> "en_GB" or "shi_Latn" -> "shi_Latn_MA").
If a non-default script is desired it should be listed explicitly (e.g. "sr_Latn").
Locale IDs with deprecated subtags (which become aliases) must still be listed in
full (e.g. "en_RH" or "sr_Latn_YU").
-->
<localeIds>
// A
af, agq, ak, am, ar, ars, as, asa, ast, az, az_AZ, az_Cyrl
// B
bas, be, bem, bez, bg, bgc, bho, blo, bm, bn, bo, br, brx, bs, bs_BA, bs_Cyrl
// C
ca, ccp, ce, ceb, cgg, chr, ckb, cs, csw, cv, cy
// D
da, dav, de, dje, doi, dsb, dua, dyo, dz
// E
ebu, ee, el, en, en_NH, en_RH, eo, es, et, eu, ewo
// F
fa, ff, ff_Adlm, ff_CM, ff_GN, ff_MR, ff_SN, fi, fil, fo, fr, fur, fy
// G
ga, gaa, gd, gl, gsw, gu, guz, gv
// H
ha, haw, he, hi, hi_Latn, hr, hsb, hu, hy
// I
ia, id, ie, ig, ii, in, in_ID, is, it, iw, iw_IL
// J
ja, jgo, jmc, jv
// K
ka, kab, kam, kde, kea, kgp, khq, ki, kk, kkj, kl, kln, km, kn, ko, kok, kok_Latn, ks
ks_Deva, ks_IN, ksb, ksf, ksh, ku, kw, kxv, kxv_Deva, kxv_IN, kxv_Orya, kxv_Telu, ky
// L
lag, lb, lg, lij, lkt, lmo, ln, lo, lrc, lt, lu, luo, luy, lv
// M
mai, mas, mer, mfe, mg, mgh, mgo, mi, mk, ml, mn, mni, mni_IN, mo, mr, ms
mt, mua, my, mzn
// N
naq, nb, nd, nds, ne, nl, nmg, nn, nnh, no, no_NO, no_NO_NY, nqo, nso, nus, nyn
// O
oc, om, or, os
// P
pa, pa_Arab, pa_IN, pa_PK, pcm, pl, prg, ps, pt
// Q
qu
// R
raj, rm, rn, ro, rof, ru, rw, rwk
// S
sa, sah, saq, sat, sat_IN, sbp, sc, sd, sd_Deva, sd_IN, sd_PK, se, seh, ses, sg, sh, sh_BA, sh_CS, sh_YU
shi, shi_Latn, shi_MA, si, sk, sl, smn, sn, so, sq, sr, sr_BA, sr_CS, sr_Cyrl_CS, sr_Cyrl_YU, sr_Latn
sr_Latn_CS, sr_Latn_YU, sr_ME, sr_RS, sr_XK, sr_YU, st, su, su_ID, sv, sw, syr, szl
// T
ta, te, teo, tg, th, ti, tk, tl, tl_PH, tn, to, tok, tr, tt, twq, tzm
// U
ug, uk, ur, uz, uz_AF, uz_Arab, uz_Cyrl, uz_UZ
// V
vai, vai_LR, vai_Latn, vec, vi, vmw, vun
// W
wae, wo
// X
xh, xnr, xog
// Y
yav, yi, yo, yrl, yue, yue_CN, yue_HK, yue_Hans
// Z
za, zgh, zh, zh_CN, zh_HK, zh_Hant, zh_MO, zh_SG, zh_TW, zu
</localeIds>
<!-- The following elements configure directories in which a subset of the available
locale IDs should be generated. Unlike the main <localeIds> element, these
filters must specify all locale IDs in full (but since they mostly select base
languages, this isn't a big deal).
As well as allowing some data directories to have a subset of available data (via
the <localeIds> element) there are also mechanisms for controlling aliasing and
the locale parent relation which allows the sharing of some ICU data in cases
where it would otherwise need to be copied. The two mechanisms are:
1: inheritLanguageSubtag: Used to rewrite the parent of a locale ID from "root" to
its language subtag (e.g. "zh_Hant" has a natural parent of "root", but to allow
some base language data to be shared it can be made to have a parent of "zh").
2: forcedAlias: Used to add aliases for specific directories in order to affect the
ICU behaviour in special cases.
Between them these mechanisms are known as "tailorings" of the affected locales. -->
<!-- TODO: Explain why these special cases are needed/different. -->
<!-- Collation data is large, but also more sharable than other data, which is why there
are a number of aliases and parent remappings for this directory. -->
<directory dir="coll" inheritLanguageSubtag="bs_Cyrl, sr_Latn, zh_Hant">
<!-- These aliases are to avoid needing to copy and maintain the same collation data
for "zh" and "yue". The maximized version of "yue_Hans" is "yue_Hans_CN" (vs
"zh_Hans_CN"), and for "yue" it's "yue_Hant_HK" (vs "zh_Hant_HK"), so the
aliases are effectively just rewriting the base language. -->
<forcedAlias source="yue" target="zh_Hant"/>
<forcedAlias source="yue_Hant" target="zh_Hant"/>
<forcedAlias source="yue_CN" target="zh_Hans"/>
<forcedAlias source="yue_Hans" target="zh_Hans"/>
<forcedAlias source="yue_Hans_CN" target="zh_Hans"/>
<!-- TODO: Find out and document this properly. -->
<forcedAlias source="sr_ME" target="sr_Cyrl_ME"/>
<localeIds>
root,
// A-B
af, am, ars, ar, as, az, be, bg, bn, bo, br, bs_Cyrl, bs,
// C-F
ca, ceb, chr, cs, cy, da, de_AT, de, dsb, dz, ee, el, en,
en_US_POSIX, en_US, eo, es, et, fa_AF, fa, ff_Adlm, ff, fil, fi, fo, fr_CA, fr, fy,
// G-J
ga, gl, gu, ha, haw, he, hi, hr, hsb, hu, hy,
id_ID, id, ig, in, in_ID, is, it, iw_IL, iw, ja,
// K-P
ka, kk, kl, km, kn, kok, ko, ku, ky, lb, lij, lkt, ln, lo, lt, lv,
mk, ml, mn, mo, mr, ms, mt, my, nb, nb_NO, ne, nl, nn, no, no_NO, nso,
om, or, pa_IN, pa, pa_Guru, pl, ps, pt,
// R-T
ro, ru, sa, se, sh_BA, sh_CS, sh, sh_YU, si, sk, sl, smn, sq,
sr_BA, sr_Cyrl_ME, sr_Latn, sr_ME, sr_RS, sr, st, sv, sw,
ta, te, th, tk, tn, to, tr,
// U-Z
ug, uk, ur, uz, vi, wae, wo, xh, yi, yo, yue_CN, yue_Hans_CN, yue_Hans
yue_Hant, yue, zh_CN, zh_Hans, zh_Hant, zh_HK, zh_MO, zh_SG, zh_TW, zh, zu
</localeIds>
</directory>
<directory dir="rbnf">
<!-- It is not at all clear why this is being done. It's certainly not exactly the
same as above, since (a) the alias is reversed (b) "zh_Hant" does exist, with
different data than "yue", so this alias is not just rewriting the base
language. -->
<!-- TODO: Find out and document this properly. -->
<forcedAlias source="zh_Hant_HK" target="yue"/>
<localeIds>
root,
// A-E
af, ak, am, ars, ar, az, be, bg, bs, ca, ccp, chr, cs, cy,
da, de_CH, de, ee, el, en_001, en_IN, en, eo, es_419, es_DO,
es_GT, es_HN, es_MX, es_NI, es_PA, es_PR, es_SV, es, es_US, et,
// F-P
fa_AF, fa, ff, fil, fi, fo, fr_BE, fr_CH, fr, ga, he, hi, hr,
hu, hy, id, in, is, it, iw, ja, ka, kk, kl, km, ko, ky, lb,
lo, lrc, lt, lv, mk, ms, mt, my, nb, ne, nl, nn, no, pl, pt_PT, pt,
// Q-Z
qu, ro, ru, se, sh, sk, sl, sq, sr_Latn, sr, su, sv, sw, ta, th, tr,
uk, vec, vi, yue_Hans, yue, zh_Hant_HK, zh_Hant, zh_HK, zh_MO, zh_TW, zh
</localeIds>
</directory>
<directory dir="brkitr" inheritLanguageSubtag="zh_Hant">
<localeIds>
root,
de, el, en, en_US_POSIX, en_US, es, fi, fr, it, ja, ko, pt, ru, sv, zh_Hant, zh
</localeIds>
</directory>
<!-- GLOBAL ALIASES -->
<!-- Some spoken languages (e.g. "ars") inherit all their data from a written language
(e.g. "ar_SA"). However CLDR doesn't currently support a way to represent that
relationship. Unlike deprecated languages for which an alias can be inferred from
the "languageAlias" CLDR data, there's no way in CLDR to represent the fact that
we want "ars" (a non-deprecated language) to inherit the data of "ar_SA".
This alias is the first example of potentially many cases where ICU needs to
generate an alias in order to effect "sideways inheritance" for spoken languages,
and at some stage it should probably be supported properly in the CLDR data. -->
<forcedAlias source="ars" target="ar_SA"/>
<!-- A legacy global alias (note that "no_NO_NY" is not even structurally valid). -->
<forcedAlias source="no_NO_NY" target="nn_NO"/>
<!-- This one is a bit silly, it is just to generate a stub for no_NO, which is
not in CLDR. If we do not do this, then including it in localeIds will generate
empty no_Latn and no_Latn_NO and then no_NO aliasing to no_Latn_NO. -->
<forcedAlias source="no_NO" target="no"/>
<!-- ALTERNATE VALUES -->
<!-- The following elements configure alternate values for some special case paths.
The target path will only be replaced if both it, and the source path, exist in
the CLDR data (paths will not be modified if only the source path exists).
Since the paths must represent the same semantic type of data, they must be in the
same "namespace" (same element names) and must not contain value attributes. Thus
they can only differ by distinguishing attributes (either added or modified).
This feature is typically used to select alternate translations (e.g. short forms)
for certain paths. -->
<!-- <altPath target="//path/to/value[@attr='foo']"
source="//path/to/value[@attr='bar']"
locales="xx,yy_ZZ"/> -->
</convert>
<generateCode cldrDir="${cldrDir}" cOutDir="${genCCodeDir}" javaOutDir="${genJavaCodeDir}" unless:true="${dontGenCode}" />
</target>
<target name="clean" depends="init-args, prepare-jar">
<taskdef name="outputDirectories" classname="org.unicode.icu.tool.cldrtoicu.ant.CleanOutputDirectoryTask">
<classpath>
<pathelement path="target/cldr-to-icu-1.0-SNAPSHOT-jar-with-dependencies.jar"/>
</classpath>
</taskdef>
<taskdef name="generateCode" classname="org.unicode.icu.tool.cldrtoicu.ant.GenerateCodeTask">
<classpath>
<pathelement path="target/cldr-to-icu-1.0-SNAPSHOT-jar-with-dependencies.jar"/>
</classpath>
</taskdef>
<!-- If a directory is listed here, then every file in it is assumed to be automatically
generated by the conversion tool, unless it is explicitly listed in a <retain> element.
The tool then checks every file to determine if it has the expected header present,
indicating that it was automatically generated, before deleting it.
If unexpected files are found, the "clean" task will fail without deleting anything
(unless 'forceDelete' is set to override this). Note that even if 'forceDelete' is set,
the files listed explicitly below will never be deleted by this process.
This two-step approach minimizes the risk that the conversion process will ever
accidentally delete a manually maintained file.
-->
<outputDirectories root="${outDir}" forceDelete="${forceDelete}">
<dir name="brkitr">
<retain path="adaboost"/>
<retain path="dictionaries"/>
<retain path="lstm"/>
<retain path="rules"/>
</dir>
<dir name="coll">
<!-- Legacy files whose file names aren't supported for automatic generation.
Simple to maintain manually and unlikely to ever change again. -->
<retain path="de__PHONEBOOK.txt"/>
<retain path="de_.txt"/>
<retain path="es__TRADITIONAL.txt"/>
<retain path="es_.txt"/>
</dir>
<dir name="curr"/>
<dir name="lang"/>
<dir name="locales"/>
<dir name="misc">
<!-- Machine generated files produced by different tools.
Possibly worth moving into the new LDML conversion tool one day. -->
<retain path="currencyNumericCodes.txt"/>
<retain path="zoneinfo64.txt"/>
<!-- Project file (not ICU data), unlikely to ever be auto-generated. -->
<retain path="icudata.rc"/>
<!-- Small high-level metadata file, stable and easy to maintain manually. -->
<retain path="icustd.txt"/>
</dir>
<dir name="rbnf"/>
<dir name="region"/>
<dir name="translit">
<!-- Small, easy to maintain, special case top-level files. -->
<retain path="en.txt"/>
<retain path="el.txt"/>
</dir>
<dir name="unit"/>
<dir name="zone">
<!-- Manually edited to support TZ database name compatibility. -->
<retain path="tzdbNames.txt"/>
</dir>
</outputDirectories>
<generateCode cOutDir="${genCCodeDir}" javaOutDir="${genJavaCodeDir}" action="clean" />
</target>
</project>

View file

@ -0,0 +1,295 @@
<!-- © 2019 and later: Unicode, Inc. and others.
License & terms of use: http://www.unicode.org/copyright.html -->
<config>
<convert>
<!-- The primary set of locale IDs to be generated by default. The IDs in this list are
automatically expanded to include default scripts and all available regions. The
rules are:
1) Base languages are expanded to include default scripts (e.g. "en" -> "en_Latn").
2) All region and variant subtags are added for any base language or language+script
(e.g. "en" -> "en_GB" or "shi_Latn" -> "shi_Latn_MA").
If a non-default script is desired it should be listed explicitly (e.g. "sr_Latn").
Locale IDs with deprecated subtags (which become aliases) must still be listed in
full (e.g. "en_RH" or "sr_Latn_YU").
-->
<localeIds>
// A
af, agq, ak, am, ar, ars, as, asa, ast, az, az_AZ, az_Cyrl
// B
bas, be, bem, bez, bg, bgc, bho, blo, bm, bn, bo, br, brx, bs, bs_BA, bs_Cyrl
// C
ca, ccp, ce, ceb, cgg, chr, ckb, cs, csw, cv, cy
// D
da, dav, de, dje, doi, dsb, dua, dyo, dz
// E
ebu, ee, el, en, en_NH, en_RH, eo, es, et, eu, ewo
// F
fa, ff, ff_Adlm, ff_CM, ff_GN, ff_MR, ff_SN, fi, fil, fo, fr, fur, fy
// G
ga, gaa, gd, gl, gsw, gu, guz, gv
// H
ha, haw, he, hi, hi_Latn, hr, hsb, hu, hy
// I
ia, id, ie, ig, ii, in, in_ID, is, it, iw, iw_IL
// J
ja, jgo, jmc, jv
// K
ka, kab, kam, kde, kea, kgp, khq, ki, kk, kkj, kl, kln, km, kn, ko, kok, kok_Latn, ks
ks_Deva, ks_IN, ksb, ksf, ksh, ku, kw, kxv, kxv_Deva, kxv_IN, kxv_Orya, kxv_Telu, ky
// L
lag, lb, lg, lij, lkt, lmo, ln, lo, lrc, lt, lu, luo, luy, lv
// M
mai, mas, mer, mfe, mg, mgh, mgo, mi, mk, ml, mn, mni, mni_IN, mo, mr, ms
mt, mua, my, mzn
// N
naq, nb, nd, nds, ne, nl, nmg, nn, nnh, no, no_NO, no_NO_NY, nqo, nso, nus, nyn
// O
oc, om, or, os
// P
pa, pa_Arab, pa_IN, pa_PK, pcm, pl, prg, ps, pt
// Q
qu
// R
raj, rm, rn, ro, rof, ru, rw, rwk
// S
sa, sah, saq, sat, sat_IN, sbp, sc, sd, sd_Deva, sd_IN, sd_PK, se, seh, ses, sg, sh, sh_BA, sh_CS, sh_YU
shi, shi_Latn, shi_MA, si, sk, sl, smn, sn, so, sq, sr, sr_BA, sr_CS, sr_Cyrl_CS, sr_Cyrl_YU, sr_Latn
sr_Latn_CS, sr_Latn_YU, sr_ME, sr_RS, sr_XK, sr_YU, st, su, su_ID, sv, sw, syr, szl
// T
ta, te, teo, tg, th, ti, tk, tl, tl_PH, tn, to, tok, tr, tt, twq, tzm
// U
ug, uk, ur, uz, uz_AF, uz_Arab, uz_Cyrl, uz_UZ
// V
vai, vai_LR, vai_Latn, vec, vi, vmw, vun
// W
wae, wo
// X
xh, xnr, xog
// Y
yav, yi, yo, yrl, yue, yue_CN, yue_HK, yue_Hans
// Z
za, zgh, zh, zh_CN, zh_HK, zh_Hant, zh_MO, zh_SG, zh_TW, zu
</localeIds>
<!-- The following elements configure directories in which a subset of the available
locale IDs should be generated. Unlike the main <localeIds> element, these
filters must specify all locale IDs in full (but since they mostly select base
languages, this isn't a big deal).
As well as allowing some data directories to have a subset of available data (via
the <localeIds> element) there are also mechanisms for controlling aliasing and
the locale parent relation which allows the sharing of some ICU data in cases
where it would otherwise need to be copied. The two mechanisms are:
1: inheritLanguageSubtag: Used to rewrite the parent of a locale ID from "root" to
its language subtag (e.g. "zh_Hant" has a natural parent of "root", but to allow
some base language data to be shared it can be made to have a parent of "zh").
2: forcedAlias: Used to add aliases for specific directories in order to affect the
ICU behaviour in special cases.
Between them these mechanisms are known as "tailorings" of the affected locales. -->
<!-- TODO: Explain why these special cases are needed/different. -->
<!-- Collation data is large, but also more sharable than other data, which is why there
are a number of aliases and parent remappings for this directory. -->
<directory dir="coll" inheritLanguageSubtag="bs_Cyrl, sr_Latn, zh_Hant">
<!-- These aliases are to avoid needing to copy and maintain the same collation data
for "zh" and "yue". The maximized version of "yue_Hans" is "yue_Hans_CN" (vs
"zh_Hans_CN"), and for "yue" it's "yue_Hant_HK" (vs "zh_Hant_HK"), so the
aliases are effectively just rewriting the base language. -->
<forcedAlias source="yue" target="zh_Hant"/>
<forcedAlias source="yue_Hant" target="zh_Hant"/>
<forcedAlias source="yue_CN" target="zh_Hans"/>
<forcedAlias source="yue_Hans" target="zh_Hans"/>
<forcedAlias source="yue_Hans_CN" target="zh_Hans"/>
<!-- TODO: Find out and document this properly. -->
<forcedAlias source="sr_ME" target="sr_Cyrl_ME"/>
<localeIds>
root,
// A-B
af, am, ars, ar, as, az, be, bg, bn, bo, br, bs_Cyrl, bs,
// C-F
ca, ceb, chr, cs, cy, da, de_AT, de, dsb, dz, ee, el, en,
en_US_POSIX, en_US, eo, es, et, fa_AF, fa, ff_Adlm, ff, fil, fi, fo, fr_CA, fr, fy,
// G-J
ga, gl, gu, ha, haw, he, hi, hr, hsb, hu, hy,
id_ID, id, ig, in, in_ID, is, it, iw_IL, iw, ja,
// K-P
ka, kk, kl, km, kn, kok, ko, ku, ky, lb, lij, lkt, ln, lo, lt, lv,
mk, ml, mn, mo, mr, ms, mt, my, nb, nb_NO, ne, nl, nn, no, no_NO, nso,
om, or, pa_IN, pa, pa_Guru, pl, ps, pt,
// R-T
ro, ru, sa, se, sh_BA, sh_CS, sh, sh_YU, si, sk, sl, smn, sq,
sr_BA, sr_Cyrl_ME, sr_Latn, sr_ME, sr_RS, sr, st, sv, sw,
ta, te, th, tk, tn, to, tr,
// U-Z
ug, uk, ur, uz, vi, wae, wo, xh, yi, yo, yue_CN, yue_Hans_CN, yue_Hans,
yue_Hant, yue, zh_CN, zh_Hans, zh_Hant, zh_HK, zh_MO, zh_SG, zh_TW, zh, zu
</localeIds>
</directory>
<directory dir="rbnf">
<!-- It is not at all clear why this is being done. It's certainly not exactly the
same as above, since (a) the alias is reversed (b) "zh_Hant" does exist, with
different data than "yue", so this alias is not just rewriting the base
language. -->
<!-- TODO: Find out and document this properly. -->
<forcedAlias source="zh_Hant_HK" target="yue"/>
<localeIds>
root,
// A-E
af, ak, am, ars, ar, az, be, bg, bs, ca, ccp, chr, cs, cy,
da, de_CH, de, ee, el, en_001, en_IN, en, eo, es_419, es_DO,
es_GT, es_HN, es_MX, es_NI, es_PA, es_PR, es_SV, es, es_US, et,
// F-P
fa_AF, fa, ff, fil, fi, fo, fr_BE, fr_CH, fr, ga, he, hi, hr,
hu, hy, id, in, is, it, iw, ja, ka, kk, kl, km, ko, ky, lb,
lo, lrc, lt, lv, mk, ms, mt, my, nb, ne, nl, nn, no, pl, pt_PT, pt,
// Q-Z
qu, ro, ru, se, sh, sk, sl, sq, sr_Latn, sr, su, sv, sw, ta, th, tr,
uk, vec, vi, yue_Hans, yue, zh_Hant_HK, zh_Hant, zh_HK, zh_MO, zh_TW, zh
</localeIds>
</directory>
<directory dir="brkitr" inheritLanguageSubtag="zh_Hant">
<localeIds>
root,
de, el, en, en_US_POSIX, en_US, es, fi, fr, it, ja, ko, pt, ru, sv, zh_Hant, zh
</localeIds>
</directory>
<!-- GLOBAL ALIASES -->
<!-- Some spoken languages (e.g. "ars") inherit all their data from a written language
(e.g. "ar_SA"). However CLDR doesn't currently support a way to represent that
relationship. Unlike deprecated languages for which an alias can be inferred from
the "languageAlias" CLDR data, there's no way in CLDR to represent the fact that
we want "ars" (a non-deprecated language) to inherit the data of "ar_SA".
This alias is the first example of potentially many cases where ICU needs to
generate an alias in order to effect "sideways inheritance" for spoken languages,
and at some stage it should probably be supported properly in the CLDR data. -->
<forcedAlias source="ars" target="ar_SA"/>
<!-- A legacy global alias (note that "no_NO_NY" is not even structurally valid). -->
<forcedAlias source="no_NO_NY" target="nn_NO"/>
<!-- This one is a bit silly, it is just to generate a stub for no_NO, which is
not in CLDR. If we do not do this, then including it in localeIds will generate
empty no_Latn and no_Latn_NO and then no_NO aliasing to no_Latn_NO. -->
<forcedAlias source="no_NO" target="no"/>
<!-- ALTERNATE VALUES -->
<!-- The following elements configure alternate values for some special case paths.
The target path will only be replaced if both it, and the source path, exist in
the CLDR data (paths will not be modified if only the source path exists).
Since the paths must represent the same semantic type of data, they must be in the
same "namespace" (same element names) and must not contain value attributes. Thus
they can only differ by distinguishing attributes (either added or modified).
This feature is typically used to select alternate translations (e.g. short forms)
for certain paths. -->
<!-- <altPath target="//path/to/value[@attr='foo']"
source="//path/to/value[@attr='bar']"
locales="xx,yy_ZZ"/> -->
</convert>
<!-- If a directory is listed here, then every file in it is assumed to be automatically
generated by the conversion tool, unless it is explicitly listed in a <retain> element.
The tool then checks every file to determine if it has the expected header present,
indicating that it was automatically generated, before deleting it.
If unexpected files are found, the "clean" task will fail without deleting anything
(unless 'forceDelete' is set to override this). Note that even if 'forceDelete' is set,
the files listed explicitly below will never be deleted by this process.
This two-step approach minimizes the risk that the conversion process will ever
accidentally delete a manually maintained file.
-->
<outputDirectories root="${outDir}" forceDelete="${forceDelete}">
<dir name="brkitr">
<retain path="adaboost"/>
<retain path="dictionaries"/>
<retain path="lstm"/>
<retain path="rules"/>
</dir>
<dir name="coll">
<!-- Legacy files whose file names aren't supported for automatic generation.
Simple to maintain manually and unlikely to ever change again. -->
<retain path="de__PHONEBOOK.txt"/>
<retain path="de_.txt"/>
<retain path="es__TRADITIONAL.txt"/>
<retain path="es_.txt"/>
</dir>
<dir name="curr"/>
<dir name="lang"/>
<dir name="locales"/>
<dir name="misc">
<!-- Machine generated files produced by different tools.
Possibly worth moving into the new LDML conversion tool one day. -->
<retain path="currencyNumericCodes.txt"/>
<retain path="zoneinfo64.txt"/>
<!-- Project file (not ICU data), unlikely to ever be auto-generated. -->
<retain path="icudata.rc"/>
<!-- Small high-level metadata file, stable and easy to maintain manually. -->
<retain path="icustd.txt"/>
</dir>
<dir name="rbnf"/>
<dir name="region"/>
<dir name="translit">
<!-- Small, easy to maintain, special case top-level files. -->
<retain path="en.txt"/>
<retain path="el.txt"/>
</dir>
<dir name="unit"/>
<dir name="zone">
<!-- Manually edited to support TZ database name compatibility. -->
<retain path="tzdbNames.txt"/>
</dir>
</outputDirectories>
</config>
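
The clean-up rules described in the comment above `<outputDirectories>` can be summarized as a small decision function. The following is a minimal sketch under stated assumptions, not the tool's implementation: the class `CleanDecisionSketch`, the `Action` enum, and the 10-line scan limit are hypothetical, and only the label string is taken from the generated-file headers shown in this change.

```java
import java.util.List;
import java.util.Set;

final class CleanDecisionSketch {
    // Label that appears in the header of files written by the conversion tool.
    static final String WAS_GENERATED_LABEL = "Generated using tools/cldr/cldr-to-icu/";
    // Assumption: number of leading lines to scan; the real tool defines its own limit.
    static final int MAX_HEADER_CHECK_LINES = 10;

    enum Action { KEEP, DELETE, FAIL }

    /** Returns true if one of the leading lines contains the generated label. */
    static boolean hasGeneratedHeader(List<String> lines) {
        int limit = Math.min(lines.size(), MAX_HEADER_CHECK_LINES);
        for (int i = 0; i < limit; i++) {
            if (lines.get(i).contains(WAS_GENERATED_LABEL)) {
                return true;
            }
        }
        return false;
    }

    /** Decides what to do with one file in a managed output directory. */
    static Action decide(String relativePath, List<String> lines,
            Set<String> retained, boolean forceDelete) {
        if (retained.contains(relativePath)) {
            return Action.KEEP;      // explicitly retained files are never deleted
        }
        if (hasGeneratedHeader(lines)) {
            return Action.DELETE;    // recognized as auto-generated
        }
        // Unexpected file: only deleted when the caller forces it.
        return forceDelete ? Action.DELETE : Action.FAIL;
    }

    public static void main(String[] args) {
        Set<String> retained = Set.of("icustd.txt");
        List<String> generated = List.of(
                "// License & terms of use: http://www.unicode.org/copyright.html",
                "// Generated using tools/cldr/cldr-to-icu/");
        List<String> manual = List.of("// maintained by hand");
        System.out.println(decide("icustd.txt", manual, retained, true));
        System.out.println(decide("en.txt", generated, retained, false));
        System.out.println(decide("notes.txt", manual, retained, false));
    }
}
```

To honor the "fail without deleting anything" behavior, a caller would evaluate every file first and start deleting only if no file produced `FAIL`.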


@ -9,71 +9,60 @@
<modelVersion>4.0.0</modelVersion>
<!-- Include the parent POM file to add the CLDR API dependency. -->
<parent>
<groupId>org.unicode.icu</groupId>
<artifactId>cldr-lib</artifactId>
<version>1.0</version>
<relativePath>../lib</relativePath>
</parent>
<groupId>org.unicode.icu</groupId>
<artifactId>cldr-to-icu</artifactId>
<version>1.0-SNAPSHOT</version>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<!-- cldr/tools/ uses JDK 11, and because we depend on it we must
use the same version or above -->
<maven.compiler.source>11</maven.compiler.source>
<maven.compiler.target>11</maven.compiler.target>
<icu4j.version>76.1</icu4j.version>
<cldr-code.version>47.0-SNAPSHOT</cldr-code.version>
<guava.version>32.1.1-jre</guava.version>
<truth.version>1.4.4</truth.version>
<commons-cli.version>1.9.0</commons-cli.version>
</properties>
<!-- No need for <groupId> here (it's defined by the parent POM). -->
<artifactId>cldr-to-icu</artifactId>
<version>1.0-SNAPSHOT</version>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.5.1</version>
<version>3.13.0</version>
<configuration>
<source>8</source>
<target>8</target>
</configuration>
</plugin>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>exec-maven-plugin</artifactId>
<version>1.6.0</version>
<configuration>
<mainClass>
org.unicode.icu.tool.cldrtoicu.LdmlConverter
</mainClass>
<systemProperties>
<property>
<key>ICU_DIR</key>
<value>${project.basedir}/../../..</value>
</property>
</systemProperties>
<source>${maven.compiler.source}</source>
<target>${maven.compiler.target}</target>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-assembly-plugin</artifactId>
<version>3.1.1</version>
<version>3.7.1</version>
<executions>
<execution>
<phase>compile</phase>
<goals>
<goal>single</goal>
</goals>
<configuration>
<archive>
<manifest>
<mainClass>
org.unicode.icu.tool.cldrtoicu.LdmlConverter
</mainClass>
</manifest>
</archive>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
</configuration>
</execution>
</executions>
<configuration>
<archive>
<manifest>
<mainClass>
org.unicode.icu.tool.cldrtoicu.Cldr2Icu
</mainClass>
</manifest>
</archive>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
</configuration>
</plugin>
</plugins>
</build>
@ -83,11 +72,16 @@
<dependency>
<groupId>com.ibm.icu</groupId>
<artifactId>icu4j</artifactId>
<version>76.1</version>
<!-- Note: see https://github.com/unicode-org/icu/packages/1954682/versions
for the icu4j.version tag to use. In general we should just use the latest
SNAPSHOT for the ICU version that we want, so this should only need updating
when the ICU version changes e.g. from 74.0.1, to 74.1, then to 75.0.1 -->
<version>${icu4j.version}</version>
<!-- Note: see https://github.com/unicode-org/icu/packages/1954682/versions
for the icu4j.version tag to use. In general we should just use the latest
SNAPSHOT for the ICU version that we want, so this should only need updating
when the ICU version changes e.g. from 74.0.1, to 74.1, then to 75.0.1 -->
</dependency>
<dependency>
<groupId>org.unicode.cldr</groupId>
<artifactId>cldr-code</artifactId>
<version>${cldr-code.version}</version>
</dependency>
<!-- Useful common libraries. Note that some of the code in the CLDR library is also
@ -96,36 +90,21 @@
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>30.0-jre</version>
<version>${guava.version}</version>
</dependency>
<!-- Ant: Only used for running the conversion tool, not compiling it. -->
<dependency>
<groupId>org.apache.ant</groupId>
<artifactId>ant</artifactId>
<version>1.10.11</version>
<groupId>commons-cli</groupId>
<artifactId>commons-cli</artifactId>
<version>${commons-cli.version}</version>
</dependency>
<!-- Testing only dependencies. -->
<dependency>
<groupId>com.google.truth</groupId>
<artifactId>truth</artifactId>
<version>1.0</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.google.truth.extensions</groupId>
<artifactId>truth-java8-extension</artifactId>
<version>1.0</version>
<version>${truth.version}</version>
<scope>test</scope>
</dependency>
</dependencies>
<repositories>
<repository>
<id>githubcldr</id>
<name>GitHub unicode-org/icu Apache Maven Packages</name>
<url>https://maven.pkg.github.com/unicode-org/icu</url>
</repository>
</repositories>
</project>


@ -0,0 +1,71 @@
// © 2024 and later: Unicode, Inc. and others.
// License & terms of use: http://www.unicode.org/copyright.html
package org.unicode.icu.tool.cldrtoicu;
import org.unicode.icu.tool.cldrtoicu.ant.CleanOutputDirectoryTask;
import org.unicode.icu.tool.cldrtoicu.ant.ConvertIcuDataTask;
import org.unicode.icu.tool.cldrtoicu.ant.GenerateCodeTask;
public class Cldr2Icu {
private final Cldr2IcuCliOptions options = new Cldr2IcuCliOptions();
private void convert() {
ConvertIcuDataTask convert = ConvertIcuDataTask.fromXml(options.xmlConfig);
convert.setCldrDir(options.cldrDataDir);
convert.setOutputDir(options.outDir);
convert.setSpecialsDir(options.specialsDir);
convert.setOutputTypes(options.outputTypes);
convert.setIcuVersion(options.icuVersion);
convert.setIcuDataVersion(options.icuDataVersion);
convert.setCldrVersion(options.cldrVersion);
convert.setMinimalDraftStatus(options.minDraftStatus);
convert.setLocaleIdFilter(options.localeIdFilter);
convert.setIncludePseudoLocales(options.includePseudoLocales);
convert.setEmitReport(options.emitReport);
convert.init();
convert.execute();
}
private void generateCode(String action) {
GenerateCodeTask generateCode = new GenerateCodeTask();
generateCode.setCldrDir(options.cldrDataDir);
generateCode.setCOutDir(options.genCCodeDir);
generateCode.setJavaOutDir(options.genJavaCodeDir);
generateCode.setAction(action);
generateCode.init();
generateCode.execute();
}
private void outputDirectories() {
CleanOutputDirectoryTask clean = CleanOutputDirectoryTask.fromXml(options.xmlConfig);
clean.setRoot(options.outDir);
clean.setForceDelete(options.forceDelete);
clean.init();
clean.execute();
}
private void clean() {
outputDirectories();
generateCode("clean");
}
private void generate() {
convert();
if (!options.dontGenCode) {
generateCode(null);
}
}
public static void main(String[] args) {
Cldr2Icu self = new Cldr2Icu();
self.options.processArgs(args);
self.clean();
self.generate();
}
}
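
Both the convert and clean tasks above are built with `fromXml(options.xmlConfig)` rather than by Ant. As a rough illustration of what reading the `<outputDirectories>` part of such a config can look like with the JDK's DOM parser, here is a minimal sketch. The class `OutputDirsSketch`, its `readRetained` method, and the inline sample XML are hypothetical and are not part of the tool.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class OutputDirsSketch {
    // Tiny stand-in for the real config file (assumption: same element and attribute names).
    static final String SAMPLE =
            "<config><outputDirectories root=\"out\">"
            + "<dir name=\"zone\"><retain path=\"tzdbNames.txt\"/></dir>"
            + "<dir name=\"unit\"/>"
            + "</outputDirectories></config>";

    /** Maps each <dir name="..."> to the paths of its <retain> children, in document order. */
    static Map<String, List<String>> readRetained(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, List<String>> result = new LinkedHashMap<>();
            NodeList dirs = doc.getDocumentElement().getElementsByTagName("dir");
            for (int i = 0; i < dirs.getLength(); i++) {
                Element dir = (Element) dirs.item(i);
                List<String> retained = new ArrayList<>();
                NodeList retains = dir.getElementsByTagName("retain");
                for (int j = 0; j < retains.getLength(); j++) {
                    retained.add(((Element) retains.item(j)).getAttribute("path"));
                }
                result.put(dir.getAttribute("name"), retained);
            }
            return result;
        } catch (Exception e) {
            // Wrap checked parser exceptions so callers need no throws clause.
            throw new IllegalStateException("cannot parse config", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readRetained(SAMPLE));
    }
}
```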


@ -0,0 +1,401 @@
// © 2024 and later: Unicode, Inc. and others.
// License & terms of use: http://www.unicode.org/copyright.html
package org.unicode.icu.tool.cldrtoicu;
import java.io.File;
import java.util.Arrays;
import java.util.StringJoiner;
import org.apache.commons.cli.CommandLine;
import org.apache.commons.cli.CommandLineParser;
import org.apache.commons.cli.DefaultParser;
import org.apache.commons.cli.HelpFormatter;
import org.apache.commons.cli.Option;
import org.apache.commons.cli.Options;
import org.unicode.icu.tool.cldrtoicu.LdmlConverter.OutputType;
import com.ibm.icu.util.VersionInfo;
class Cldr2IcuCliOptions {
private static final String HELP = "help";
private static final String HELP_DESC = "this text";
private static final String ICU_DIR = "icuDir";
private static final String ICU_DIR_DESC = "Path to the top-level ICU directory"
+ " (containing `.git`, `icu4c`, `icu4j`, `tools` directories)";
private static final String ICU_DIR_DEFAULT = "${environ.ICU_DIR}";
String icuDir;
private static final String CLDR_DIR = "cldrDir";
private static final String CLDR_DIR_DESC = "The path to the root of the standard CLDR sources"
+ " (containing `common` and `tools` directories).";
private static final String CLDR_DIR_DEFAULT = "${environ.CLDR_DIR}";
String cldrDir;
private static final String CLDR_DATA_DIR = "cldrDataDir";
private static final String CLDR_DATA_DIR_DESC = "The top-level directory for the CLDR production data"
+ " (typically the `production` directory in the staging repository)."
+ " Usually generated locally or obtained from https://github.com/unicode-org/cldr-staging/tree/main/production";
private static final String CLDR_DATA_DIR_DEFAULT = "${environ.CLDR_DATA_DIR}";
String cldrDataDir;
private static final String OUT_DIR = "outDir";
private static final String OUT_DIR_DESC = "The output directory into which to write the converted ICU data. By default"
+ " this will overwrite (without deletion) the ICU data files in this ICU release,"
+ " so it is recommended that for testing, it be set to another value.";
private static final String OUT_DIR_DEFAULT = "${icuDir}/icu4c/source/data";
String outDir;
private static final String GEN_C_CODE_DIR = "genCCodeDir";
private static final String GEN_C_CODE_DIR_DESC = "The output directory into which to write generated C/C++ code."
+ " By default this will overwrite (without deletion) the generated C/C++ files in this ICU release,"
+ " so it is recommended that for testing, it be set to another value.";
private static final String GEN_C_CODE_DIR_DEFAULT = "${icuDir}/icu4c/source";
String genCCodeDir;
private static final String GEN_JAVA_CODE_DIR = "genJavaCodeDir";
private static final String GEN_JAVA_CODE_DIR_DESC = "The output directory into which to write generated Java code."
+ " By default this will overwrite (without deletion) the generated Java files in this ICU release,"
+ " so it is recommended that for testing, it be set to another value.";
private static final String GEN_JAVA_CODE_DIR_DEFAULT = "${icuDir}/icu4j/main/core";
String genJavaCodeDir;
private static final String DONT_GEN_CODE = "dontGenCode";
private static final String DONT_GEN_CODE_DESC = "Set this to true to prevent the generation of"
+ " ICU source files";
private static final String DONT_GEN_CODE_DEFAULT = "false";
boolean dontGenCode;
private static final String SPECIALS_DIR = "specialsDir";
private static final String SPECIALS_DIR_DESC = "The directory in which the additional ICU XML data is stored.";
private static final String SPECIALS_DIR_DEFAULT = "${icuDir}/icu4c/source/data/xml";
String specialsDir;
private static final String ICU_VERSION = "icuVersion";
private static final String ICU_VERSION_DESC = "Default value for ICU version (`icuver.txt`)."
+ " Update this for each release.";
private static final String ICU_VERSION_DEFAULT = VersionInfo.ICU_VERSION.toString();
String icuVersion;
private static final String ICU_DATA_VERSION = "icuDataVersion";
private static final String ICU_DATA_VERSION_DESC = "Default value for ICU data version (`icuver.txt`)."
+ " Update this for each release.";
private static final String ICU_DATA_VERSION_DEFAULT = VersionInfo.ICU_DATA_VERSION.toString();
String icuDataVersion;
private static final String CLDR_VERSION = "cldrVersion";
private static final String CLDR_VERSION_DESC = "An override for the CLDR version string (`icuver.txt` and others)."
+ " This will be extracted from the CLDR library used for building the data if not set here.";
private static final String CLDR_VERSION_DEFAULT = "";
String cldrVersion;
private static final String MIN_DRAFT_STATUS = "minDraftStatus";
private static final String MIN_DRAFT_STATUS_DESC = "The minimum draft status for CLDR data to be used in the conversion."
+ " See CldrDraftStatus for more details.";
private static final String MIN_DRAFT_STATUS_DEFAULT = "CONTRIBUTED";
String minDraftStatus;
private static final String LOCALE_ID_FILTER = "localeIdFilter";
private static final String LOCALE_ID_FILTER_DESC = "A regular expression to match the locale IDs to be generated"
+ " (useful for debugging specific regions). This is applied after locale ID specifications"
+ " have been expanded into full locale IDs, so the value `en` will NOT match `en_GB` or `en_001` etc.";
private static final String LOCALE_ID_FILTER_DEFAULT = "";
String localeIdFilter;
private static final String INCLUDE_PSEUDO_LOCALES = "includePseudoLocales";
private static final String INCLUDE_PSEUDO_LOCALES_DESC = "Whether to synthetically generate \"pseudo locale\" data"
+ " (`en_XA` and `ar_XB`).";
private static final String INCLUDE_PSEUDO_LOCALES_DEFAULT = "false";
boolean includePseudoLocales;
private static final String EMIT_REPORT = "emitReport";
private static final String EMIT_REPORT_DESC = "Whether to emit a debug report containing some possibly"
+ " useful information after the conversion has finished.";
private static final String EMIT_REPORT_DEFAULT = "false";
boolean emitReport;
private static final String OUTPUT_TYPES = "outputTypes";
private static final String OUTPUT_TYPES_DESC = "List of output \"types\" to be generated (e.g. `rbnf,plurals,locales`);"
+ " an empty list means \"build everything\".\n"
+ "Note that the grouping of types is based on the legacy converter behaviour and"
+ " is not always directly associated with an output directory (e.g. \"locales\" produces locale data"
+ " for `curr/`, `lang/`, `main/`, `region/`, `unit/`, `zone/` but NOT `coll/`, `brkitr/` or `rbnf/`).\n"
// It would be nice to initialize this from OutputType, but to do that we need to read an XML file,
// so we need to know what the cldrDir folder is. But we only know that AFTER we parse the command line.
+ "Use outputTypesList to get a list of currently known values.";
private static final String OUTPUT_TYPES_DEFAULT = "";
String outputTypes;
private static final String OUTPUT_TYPES_LIST = "outputTypesList";
private static final String OUTPUT_TYPES_LIST_DESC = "Show the complete list of known output types and exit.";
private static final String OUTPUT_TYPES_LIST_DEFAULT = "false";
private static final String FORCE_DELETE = "forceDelete";
private static final String FORCE_DELETE_DESC = "Override to force the 'clean' task to delete files it cannot"
+ " determine to be auto-generated by this tool. This is useful if the file header changes since"
+ " the heading is what's used to recognize auto-generated files.";
private static final String FORCE_DELETE_DEFAULT = "false";
boolean forceDelete;
private static final String XML_CONFIG = "xmlConfig";
private static final String XML_CONFIG_DESC = "The XML configuration file that describes what to convert"
+ " and which output directories the 'clean' step manages.";
private static final String XML_CONFIG_DEFAULT = "${icuDir}/tools/cldr/cldr-to-icu/config.xml";
String xmlConfig;
// These must be kept in sync with processArgs().
private static final Options options = new Options()
.addOption(Option.builder()
.longOpt(HELP)
.desc(HELP_DESC)
.build())
.addOption(Option.builder()
.longOpt(ICU_DIR)
.hasArg()
.argName("path")
.desc(descWithDefault(ICU_DIR_DESC, ICU_DIR_DEFAULT))
.build())
.addOption(Option.builder()
.longOpt(CLDR_DIR)
.hasArg()
.argName("path")
.desc(descWithDefault(CLDR_DIR_DESC, CLDR_DIR_DEFAULT))
.build())
.addOption(Option.builder()
.longOpt(CLDR_DATA_DIR)
.hasArg()
.argName("path")
.desc(descWithDefault(CLDR_DATA_DIR_DESC, CLDR_DATA_DIR_DEFAULT))
.build())
.addOption(Option.builder()
.longOpt(OUT_DIR)
.hasArg()
.argName("path")
.desc(descWithDefault(OUT_DIR_DESC, OUT_DIR_DEFAULT))
.build())
.addOption(Option.builder()
.longOpt(GEN_C_CODE_DIR)
.hasArg()
.argName("path")
.desc(descWithDefault(GEN_C_CODE_DIR_DESC, GEN_C_CODE_DIR_DEFAULT))
.build())
.addOption(Option.builder()
.longOpt(GEN_JAVA_CODE_DIR)
.hasArg()
.argName("path")
.desc(descWithDefault(GEN_JAVA_CODE_DIR_DESC, GEN_JAVA_CODE_DIR_DEFAULT))
.build())
.addOption(Option.builder()
.longOpt(DONT_GEN_CODE)
.desc(descWithDefault(DONT_GEN_CODE_DESC, DONT_GEN_CODE_DEFAULT))
.build())
.addOption(Option.builder()
.longOpt(SPECIALS_DIR)
.hasArg()
.argName("path")
.desc(descWithDefault(SPECIALS_DIR_DESC, SPECIALS_DIR_DEFAULT))
.build())
.addOption(Option.builder()
.longOpt(OUTPUT_TYPES)
.hasArg()
.argName("out_types")
.desc(descWithDefault(OUTPUT_TYPES_DESC, OUTPUT_TYPES_DEFAULT))
.build())
.addOption(Option.builder()
.longOpt(OUTPUT_TYPES_LIST)
.desc(descWithDefault(OUTPUT_TYPES_LIST_DESC, OUTPUT_TYPES_LIST_DEFAULT))
.build())
.addOption(Option.builder()
.longOpt(ICU_VERSION)
.hasArg()
.argName("version")
.desc(descWithDefault(ICU_VERSION_DESC, ICU_VERSION_DEFAULT))
.build())
.addOption(Option.builder()
.longOpt(ICU_DATA_VERSION)
.hasArg()
.argName("version")
.desc(descWithDefault(ICU_DATA_VERSION_DESC, ICU_DATA_VERSION_DEFAULT))
.build())
.addOption(Option.builder()
.longOpt(CLDR_VERSION)
.hasArg()
.argName("version")
.desc(descWithDefault(CLDR_VERSION_DESC, CLDR_VERSION_DEFAULT))
.build())
.addOption(Option.builder()
.longOpt(MIN_DRAFT_STATUS)
.hasArg()
.argName("draft_status")
.desc(descWithDefault(MIN_DRAFT_STATUS_DESC, MIN_DRAFT_STATUS_DEFAULT))
.build())
.addOption(Option.builder()
.longOpt(LOCALE_ID_FILTER)
.hasArg()
.argName("locale_list")
.desc(descWithDefault(LOCALE_ID_FILTER_DESC, LOCALE_ID_FILTER_DEFAULT))
.build())
.addOption(Option.builder()
.longOpt(INCLUDE_PSEUDO_LOCALES)
.desc(descWithDefault(INCLUDE_PSEUDO_LOCALES_DESC, INCLUDE_PSEUDO_LOCALES_DEFAULT))
.build())
.addOption(Option.builder()
.longOpt(EMIT_REPORT)
.desc(descWithDefault(EMIT_REPORT_DESC, EMIT_REPORT_DEFAULT))
.build())
.addOption(Option.builder()
.longOpt(FORCE_DELETE)
.desc(descWithDefault(FORCE_DELETE_DESC, FORCE_DELETE_DEFAULT))
.build())
.addOption(Option.builder()
.longOpt(XML_CONFIG)
.hasArg()
.argName("path")
.desc(descWithDefault(XML_CONFIG_DESC, XML_CONFIG_DEFAULT))
.build())
;
void processArgs(String[] args) {
CommandLine cli = null;
try {
CommandLineParser parser = new DefaultParser();
cli = parser.parse(options, args);
} catch (Exception e) {
cli = CommandLine.builder().build();
showUsageAndExit();
}
if (cli.hasOption(HELP)) {
showUsageAndExit();
}
icuDir = cli.getOptionValue(ICU_DIR, icuDir);
cldrDir = cli.getOptionValue(CLDR_DIR, cldrDir);
cldrDataDir = cli.getOptionValue(CLDR_DATA_DIR, cldrDataDir);
outDir = cli.getOptionValue(OUT_DIR, expandFolders(OUT_DIR_DEFAULT));
genCCodeDir = cli.getOptionValue(GEN_C_CODE_DIR, expandFolders(GEN_C_CODE_DIR_DEFAULT));
genJavaCodeDir = cli.getOptionValue(GEN_JAVA_CODE_DIR, expandFolders(GEN_JAVA_CODE_DIR_DEFAULT));
dontGenCode = cli.hasOption(DONT_GEN_CODE);
specialsDir = cli.getOptionValue(SPECIALS_DIR, expandFolders(SPECIALS_DIR_DEFAULT));
outputTypes = cli.getOptionValue(OUTPUT_TYPES, ""); // empty means all
icuVersion = cli.getOptionValue(ICU_VERSION, ICU_VERSION_DEFAULT);
icuDataVersion = cli.getOptionValue(ICU_DATA_VERSION, ICU_DATA_VERSION_DEFAULT);
cldrVersion = cli.getOptionValue(CLDR_VERSION, CLDR_VERSION_DEFAULT);
minDraftStatus = cli.getOptionValue(MIN_DRAFT_STATUS, MIN_DRAFT_STATUS_DEFAULT);
localeIdFilter = cli.getOptionValue(LOCALE_ID_FILTER, LOCALE_ID_FILTER_DEFAULT);
includePseudoLocales = cli.hasOption(INCLUDE_PSEUDO_LOCALES);
emitReport = cli.hasOption(EMIT_REPORT);
forceDelete = cli.hasOption(FORCE_DELETE);
xmlConfig = cli.getOptionValue(XML_CONFIG, expandFolders(XML_CONFIG_DEFAULT));
if (cli.hasOption(OUTPUT_TYPES_LIST)) {
OutputType[] outTypesToSort = OutputType.values();
Arrays.sort(outTypesToSort, (o1, o2) -> o1.name().compareTo(o2.name()));
StringJoiner strOutType = new StringJoiner(", ");
for (OutputType ot : outTypesToSort) {
strOutType.add(ot.name());
}
System.out.println("Known output types: " + strOutType);
System.exit(2);
}
}
private static String descWithDefault(String description, String defaultValue) {
if (defaultValue != null) {
return description + "\nDefaults to: \"" + defaultValue + "\"";
} else {
return description;
}
}
private void showUsageAndExit() {
String thisClassName = Cldr2Icu.class.getCanonicalName();
HelpFormatter formatter = new HelpFormatter();
formatter.printHelp(
/*width*/ 120,
/*cmdLineSyntax*/ thisClassName + " [OPTIONS]\n",
/*header*/ "\n"
+ "This program is used to convert CLDR xml files to ICU ResourceBundle txt files.\n"
+ "Options:",
options,
/*footer*/ "\nExample: " + thisClassName + " --outDir /tmp/debug --localeIdFilter=fr");
System.exit(-1);
}
Cldr2IcuCliOptions() {
// This will initialize icuDir, cldrDir, and cldrDataDir from environment variables
validateEnvironment();
}
String expandFolders(String str) {
return str
.replace("${icuDir}", icuDir)
.replace("${cldrDir}", cldrDir)
.replace("${cldrDataDir}", cldrDataDir);
}
// For certain things we want to check both the environment, and Java properties
// (passed with -Dkey=value)
// The property takes precedence.
private static String getEnvironOrProperty(String key) {
String result = System.getProperty(key);
if (result == null) {
result = System.getenv(key);
}
return result;
}
// Check that the environment variables point to the proper `icu` / `cldr` / `cldr-staging` folders
private void validateEnvironment() {
icuDir = getEnvironOrProperty("ICU_DIR");
cldrDir = getEnvironOrProperty("CLDR_DIR");
cldrDataDir = getEnvironOrProperty("CLDR_DATA_DIR");
String icuMessage = "Set the ICU_DIR environment variable to the top level ICU directory (containing `.git`, `icu4c`, `icu4j`, `tools` directories)";
String cldrMessage = "Set the CLDR_DIR environment variable to the top level CLDR directory (containing `common` and `tools` directories)";
String cldrDataMessage = "Set the CLDR_DATA_DIR environment variable to the top level CLDR production data directory (typically the `production` directory in the staging repository)\n"
+ "Usually generated locally or obtained from: https://github.com/unicode-org/cldr-staging/tree/main/production";
if (icuDir == null) {
System.err.println(icuMessage);
System.exit(1);
}
if (cldrDir == null) {
System.err.println(cldrMessage);
System.exit(1);
}
if (cldrDataDir == null) {
System.err.println(cldrDataMessage);
System.exit(1);
}
if (!new File(icuDir).isDirectory()
|| ! new File(icuDir, "icu4c").isDirectory()
|| ! new File(icuDir, "icu4j").isDirectory()
|| ! new File(icuDir, "tools/cldr/cldr-to-icu").isDirectory()
|| ! new File(icuDir, "tools/cldr/cldr-to-icu/pom.xml").isFile()) {
System.err.println("The `" + icuDir + "` directory does not look like a valid icu root.");
System.err.println(icuMessage);
System.exit(1);
}
if (!new File(cldrDir).isDirectory()
|| ! new File(cldrDir, "tools/cldr-code").isDirectory()
|| ! new File(cldrDir, "tools/cldr-code/pom.xml").isFile()) {
System.err.println("The `" + cldrDir + "` directory does not look like a valid cldr root.");
System.err.println(cldrMessage);
System.exit(1);
}
if (!new File(cldrDataDir).isDirectory()
|| ! new File(cldrDataDir, "common/supplemental").isDirectory()
|| ! new File(cldrDataDir, "common/main").isDirectory()
|| ! new File(cldrDataDir, "common/main/en.xml").isFile()) {
System.err.println("The `" + cldrDataDir + "` directory does not look like a valid cldr-staging/ root.");
System.err.println(cldrDataMessage);
System.exit(1);
}
// The cldr-code library checks for CLDR_DIR in the Java properties.
// So if we got cldrDir from the environment or the command line, we update the property.
System.setProperty("CLDR_DIR", cldrDir);
}
}
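
Two small mechanisms in the options class above are worth seeing in isolation: a Java property taking precedence over an environment variable, and `${icuDir}`-style placeholders being expanded in default paths. The following is a minimal sketch, not the class itself: `PathDefaultsSketch` and its methods are hypothetical, and the property and environment lookups are passed in as maps (instead of calling `System.getProperty` and `System.getenv`) purely so the example can be run without any setup.

```java
import java.util.Map;

final class PathDefaultsSketch {
    /** Returns the property value if present, otherwise the environment value (may be null). */
    static String propertyOrEnv(String key,
            Map<String, String> properties, Map<String, String> environment) {
        String result = properties.get(key);
        return result != null ? result : environment.get(key);
    }

    /** Replaces the ${icuDir} placeholder in a default path template. */
    static String expand(String template, String icuDir) {
        return template.replace("${icuDir}", icuDir);
    }

    public static void main(String[] args) {
        // Hypothetical values: -DICU_DIR=/from/property and ICU_DIR=/from/env both set.
        Map<String, String> properties = Map.of("ICU_DIR", "/from/property");
        Map<String, String> environment = Map.of("ICU_DIR", "/from/env");
        String icuDir = propertyOrEnv("ICU_DIR", properties, environment);
        System.out.println(expand("${icuDir}/icu4c/source/data", icuDir));
    }
}
```

Because the defaults are expanded only when an option is absent, a value given explicitly on the command line is used as-is.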


@ -179,7 +179,6 @@ final class IcuDataDumper {
LineMatch match = LineType.match(line, inBlockComment);
checkState(match.getType().isValidTransitionFrom(lastType),
"invalid state transition: %s --//-> %s", lastType, match.getType());
boolean isEndOfWrappedValue = false;
switch (match.getType()) {
case COMMENT:
if (name != null) {


@ -11,6 +11,7 @@ import static java.util.stream.Collectors.joining;
import static java.util.stream.Collectors.partitioningBy;
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
@ -28,9 +29,14 @@ import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import org.apache.tools.ant.BuildException;
import org.apache.tools.ant.Task;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.unicode.icu.tool.cldrtoicu.LdmlConverterConfig.IcuLocaleDir;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import com.google.common.base.CharMatcher;
import com.google.common.collect.ImmutableList;
@ -38,7 +44,6 @@ import com.google.common.collect.ImmutableSet;
import com.google.common.collect.Iterables;
import com.google.common.io.CharStreams;
// Note: Auto-magical Ant methods are listed as "unused" by IDEs, unless the warning is suppressed.
public final class CleanOutputDirectoryTask extends Task {
private static final ImmutableSet<String> ALLOWED_DIRECTORIES =
Stream
@ -58,8 +63,7 @@ public final class CleanOutputDirectoryTask extends Task {
// header without it (since that's the old behaviour).
// Once there's been an ICU release with this line included in the headers of all data
// files, we can remove the fallback and just test for this line and nothing else.
private static final String WAS_GENERATED_LABEL =
"Generated using tools/cldr/cldr-to-icu/build-icu-data.xml";
private static final String WAS_GENERATED_LABEL = "Generated using tools/cldr/cldr-to-icu/";
// The number of header lines to check before giving up if we don't find the generated
// label.
@ -84,9 +88,8 @@ public final class CleanOutputDirectoryTask extends Task {
public static final class Retain extends Task {
private Path path = null;
// Don't use "Path" for the argument type because that always makes an absolute path (e.g.
// relative to the working directory for the Ant task). We want relative paths.
@SuppressWarnings("unused")
// Don't use "Path" for the argument type because that always makes an absolute path
// (e.g. relative to the working directory). We want relative paths.
public void setPath(String path) {
Path p = Paths.get(path).normalize();
checkBuild(!p.isAbsolute() && !p.startsWith(".."), "invalid path: %s", path);
@ -103,14 +106,12 @@ public final class CleanOutputDirectoryTask extends Task {
private String name;
private final Set<Path> retained = new HashSet<>();
@SuppressWarnings("unused")
public void setName(String name) {
checkBuild(ALLOWED_DIRECTORIES.contains(name),
"unknown directory name '%s'; allowed values: %s", name, ALLOWED_DIRECTORIES);
this.name = name;
}
@SuppressWarnings("unused")
public void addConfiguredRetain(Retain retain) {
retained.add(retain.path);
}
@ -121,18 +122,15 @@ public final class CleanOutputDirectoryTask extends Task {
}
}
@SuppressWarnings("unused")
public void setRoot(String root) {
// Use String here since on some systems Ant doesn't support automatically converting Path instances.
this.root = Paths.get(root);
}
@SuppressWarnings("unused")
public void setForceDelete(boolean forceDelete) {
this.forceDelete = forceDelete;
}
@SuppressWarnings("unused")
public void addConfiguredDir(Dir dir) {
outputDirs.add(dir);
}
@ -255,7 +253,7 @@ public final class CleanOutputDirectoryTask extends Task {
fileReader.reset();
}
boolean isLenientHeaderMatchSoFar = true;
for (int n = 0; n < MAX_HEADER_CHECK_LINES ; n++) {
for (int n = 0; n < MAX_HEADER_CHECK_LINES; n++) {
String line = fileReader.readLine();
// True if we have processed the header, not including the trailing generated label.
boolean headerIsProcessed = n >= headerLines.size() - 1;
@ -340,4 +338,77 @@ public final class CleanOutputDirectoryTask extends Task {
throw new RuntimeException("cannot read resource: " + name, e);
}
}
private static Retain getRetain(Element elem) {
if (!"retain".equals(elem.getTagName())) {
return null;
}
String path = elem.getAttribute("path");
Retain retain = new Retain();
retain.setPath(path);
return retain;
}
private static Dir getDirectory(Element element) {
if (!"dir".equals(element.getTagName())) {
return null;
}
String name = element.getAttribute("name");
Dir dir = new Dir();
dir.setName(name);
Node node = element.getFirstChild();
while (node != null) {
if (node.getNodeType() == Node.ELEMENT_NODE) {
Element childElement = (Element) node;
switch (childElement.getTagName()) {
case "retain":
Retain retain = getRetain(childElement);
dir.addConfiguredRetain(retain);
break;
default:
}
}
node = node.getNextSibling();
}
return dir;
}
public static CleanOutputDirectoryTask fromXml(String fileName) {
try {
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = builder.parse(new File(fileName));
Element root = doc.getDocumentElement();
if (!"config".equals(root.getTagName())) {
System.err.println("The root of the config file should be <config>");
return null;
}
NodeList outputDirectories = root.getElementsByTagName("outputDirectories");
if (outputDirectories.getLength() != 1) {
System.err.println("Exactly one <outputDirectories> element allowed and required");
return null;
}
CleanOutputDirectoryTask cleaner = new CleanOutputDirectoryTask();
Node node = outputDirectories.item(0).getFirstChild();
while (node != null) {
if (node instanceof Element) {
Element childElement = (Element) node;
String nodeName = childElement.getTagName();
switch (nodeName) {
case "dir":
Dir dir = getDirectory(childElement);
cleaner.addConfiguredDir(dir);
break;
default:
break;
}
}
node = node.getNextSibling();
}
return cleaner;
} catch (Exception e) {
e.printStackTrace();
}
return null;
}
}
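
Review note: the new `fromXml` and `getDirectory` methods above replace Ant's reflective `addConfigured*` wiring with a manual walk over DOM child nodes. The sketch below shows that walk in isolation, assuming a config shaped like `<config><outputDirectories><dir name="..."><retain path="..."/></dir></outputDirectories></config>`. The class `ConfigWalkSketch`, the helper `readDirs`, and the sample directory names are hypothetical and are not part of this commit.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class ConfigWalkSketch {
    // Hypothetical helper: maps each <dir name="..."> under <outputDirectories>
    // to the "path" attributes of its <retain> children, in document order.
    public static Map<String, List<String>> readDirs(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse config XML", e);
        }
        Map<String, List<String>> result = new LinkedHashMap<>();
        Node outputDirs =
                doc.getDocumentElement().getElementsByTagName("outputDirectories").item(0);
        if (outputDirs == null) {
            return result;
        }
        // Walk siblings and skip non-element nodes (whitespace text, comments).
        for (Node n = outputDirs.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (!(n instanceof Element) || !"dir".equals(((Element) n).getTagName())) {
                continue;
            }
            Element dir = (Element) n;
            List<String> retained = new ArrayList<>();
            for (Node c = dir.getFirstChild(); c != null; c = c.getNextSibling()) {
                if (c instanceof Element && "retain".equals(((Element) c).getTagName())) {
                    retained.add(((Element) c).getAttribute("path"));
                }
            }
            result.put(dir.getAttribute("name"), retained);
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample names are illustrative only.
        String xml = "<config><outputDirectories>"
                + "<dir name=\"brkitr\"><retain path=\"dictionaries\"/></dir>"
                + "<dir name=\"locales\"/>"
                + "</outputDirectories></config>";
        System.out.println(readDirs(xml));
    }
}
```

Unlike the committed code, this sketch throws on a parse failure instead of printing the stack trace and returning `null`; that is a design choice of the example only.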

View file

@ -15,6 +15,7 @@ import static com.google.common.collect.Tables.immutableCell;
import static java.util.stream.Collectors.joining;
import static org.unicode.cldr.api.CldrPath.parseDistinguishingPath;
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
@ -25,8 +26,9 @@ import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import org.apache.tools.ant.BuildException;
import org.apache.tools.ant.Task;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.unicode.cldr.api.CldrDataSupplier;
import org.unicode.cldr.api.CldrDraftStatus;
import org.unicode.cldr.api.CldrPath;
@ -38,6 +40,10 @@ import org.unicode.icu.tool.cldrtoicu.LdmlConverter.OutputType;
import org.unicode.icu.tool.cldrtoicu.LdmlConverterConfig.IcuLocaleDir;
import org.unicode.icu.tool.cldrtoicu.PseudoLocales;
import org.unicode.icu.tool.cldrtoicu.SupplementalData;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import com.google.common.base.Ascii;
import com.google.common.base.CaseFormat;
@ -53,10 +59,9 @@ import com.google.common.collect.SetMultimap;
import com.google.common.collect.Sets;
import com.google.common.collect.Table.Cell;
// Note: Auto-magical Ant methods are listed as "unused" by IDEs, unless the warning is suppressed.
public final class ConvertIcuDataTask extends Task {
private static final Splitter LIST_SPLITTER =
Splitter.on(CharMatcher.anyOf(",\n")).trimResults(whitespace()).omitEmptyStrings();
Splitter.on(CharMatcher.anyOf(",\n")).trimResults(whitespace()).omitEmptyStrings();
private static final CharMatcher DIGIT_OR_UNDERSCORE = inRange('0', '9').or(is('_'));
private static final CharMatcher UPPER_UNDERSCORE = inRange('A', 'Z').or(DIGIT_OR_UNDERSCORE);
@ -77,39 +82,32 @@ public final class ConvertIcuDataTask extends Task {
private boolean includePseudoLocales = false;
private Predicate<String> idFilter = id -> true;
@SuppressWarnings("unused")
public void setOutputDir(String path) {
// Use String here since on some systems Ant doesn't support automatically converting Path instances.
config.setOutputDir(Paths.get(path));
}
@SuppressWarnings("unused")
public void setCldrDir(String path) {
// Use String here since on some systems Ant doesn't support automatically converting Path instances.
this.cldrPath = checkNotNull(Paths.get(path));
}
@SuppressWarnings("unused")
public void setIcuVersion(String icuVersion) {
config.setIcuVersion(icuVersion);
}
@SuppressWarnings("unused")
public void setIcuDataVersion(String icuDataVersion) {
config.setIcuDataVersion(icuDataVersion);
}
@SuppressWarnings("unused")
public void setCldrVersion(String cldrVersion) {
config.setCldrVersion(cldrVersion);
}
@SuppressWarnings("unused")
public void setMinimalDraftStatus(String status) {
minimumDraftStatus = resolve(CldrDraftStatus.class, status);
}
@SuppressWarnings("unused")
public void setOutputTypes(String types) {
ImmutableList<OutputType> typeList =
LIST_SPLITTER
@ -121,23 +119,19 @@ public final class ConvertIcuDataTask extends Task {
}
}
@SuppressWarnings("unused")
public void setSpecialsDir(String path) {
// Use String here since on some systems Ant doesn't support automatically converting Path instances.
config.setSpecialsDir(Paths.get(path));
}
@SuppressWarnings("unused")
public void setIncludePseudoLocales(boolean includePseudoLocales) {
this.includePseudoLocales = includePseudoLocales;
}
@SuppressWarnings("unused")
public void setLocaleIdFilter(String idFilterRegex) {
this.idFilter = Pattern.compile(idFilterRegex).asPredicate();
}
@SuppressWarnings("unused")
public void setEmitReport(boolean emit) {
config.setEmitReport(emit);
}
@ -145,7 +139,6 @@ public final class ConvertIcuDataTask extends Task {
public static final class LocaleIds extends Task {
private ImmutableSet<String> ids;
@SuppressWarnings("unused")
public void addText(String localeIds) {
this.ids = parseLocaleIds(localeIds);
}
@ -162,22 +155,18 @@ public final class ConvertIcuDataTask extends Task {
private final List<ForcedAlias> forcedAliases = new ArrayList<>();
private LocaleIds localeIds = null;
@SuppressWarnings("unused")
public void setDir(String directory) {
this.dir = resolve(IcuLocaleDir.class, directory);
}
@SuppressWarnings("unused")
public void setInheritLanguageSubtag(String localeIds) {
this.inheritLanguageSubtag = parseLocaleIds(localeIds);
}
@SuppressWarnings("unused")
public void addConfiguredForcedAlias(ForcedAlias alias) {
forcedAliases.add(alias);
}
@SuppressWarnings("unused")
public void addConfiguredLocaleIds(LocaleIds localeIds) {
checkBuild(this.localeIds == null,
"Cannot add more than one <localeIds> element for <directory>: %s", dir);
@ -195,12 +184,10 @@ public final class ConvertIcuDataTask extends Task {
private String source = "";
private String target = "";
@SuppressWarnings("unused")
public void setSource(String source) {
this.source = whitespace().trimFrom(source);
}
@SuppressWarnings("unused")
public void setTarget(String target) {
this.target = whitespace().trimFrom(target);
}
@ -217,17 +204,14 @@ public final class ConvertIcuDataTask extends Task {
private String target = "";
private ImmutableSet<String> localeIds = ImmutableSet.of();
@SuppressWarnings("unused")
public void setTarget(String target) {
this.target = target.replace('\'', '"');
}
@SuppressWarnings("unused")
public void setSource(String source) {
this.source = source.replace('\'', '"');
}
@SuppressWarnings("unused")
public void setLocales(String localeIds) {
this.localeIds = parseLocaleIds(localeIds);
}
@ -239,13 +223,11 @@ public final class ConvertIcuDataTask extends Task {
}
}
@SuppressWarnings("unused")
public void addConfiguredLocaleIds(LocaleIds localeIds) {
checkBuild(this.localeIds == null, "Cannot add more than one <localeIds> element");
this.localeIds = localeIds;
}
@SuppressWarnings("unused")
public void addConfiguredDirectory(Directory filter) {
checkState(!perDirectoryIds.containsKey(filter.dir),
"directory %s specified twice", filter.dir);
@ -289,14 +271,12 @@ public final class ConvertIcuDataTask extends Task {
}
// Aliases on the outside are applied to all directories.
@SuppressWarnings("unused")
public void addConfiguredForcedAlias(ForcedAlias alias) {
for (IcuLocaleDir dir : IcuLocaleDir.values()) {
config.addForcedAlias(dir, alias.source, alias.target);
}
}
@SuppressWarnings("unused")
public void addConfiguredAltPath(AltPath altPath) {
// Don't convert to CldrPath here (it triggers a bunch of CLDR data loading for the DTDs).
// Wait until the "execute()" method since in future we expect to use the configured CLDR
@ -304,7 +284,6 @@ public final class ConvertIcuDataTask extends Task {
altPaths.add(altPath);
}
@SuppressWarnings("unused")
public void execute() throws BuildException {
// Spin up CLDRConfig outside of other inner loops, to
// avoid static init problems seen in CLDR-14636
@ -408,4 +387,128 @@ public final class ConvertIcuDataTask extends Task {
"invalid enumeration name " + name + "; expected one of: " + validNames);
}
}
private static AltPath getAltPath(Element elem) {
if (!"altPath".equals(elem.getTagName())) {
return null;
}
String source = elem.getAttribute("source");
String target = elem.getAttribute("target");
String locales = elem.getAttribute("locales");
AltPath ap = new AltPath();
ap.setSource(source);
ap.setTarget(target);
ap.setLocales(locales);
ap.init();
return ap;
}
private static ForcedAlias getForcedAlias(Element elem) {
if (!"forcedAlias".equals(elem.getTagName())) {
return null;
}
String source = elem.getAttribute("source");
String target = elem.getAttribute("target");
ForcedAlias fa = new ForcedAlias();
fa.setSource(source);
fa.setTarget(target);
fa.init();
return fa;
}
private static LocaleIds getLocaleIds(Element elem) {
if (!"localeIds".equals(elem.getTagName())) {
return null;
}
LocaleIds localeIds = new LocaleIds();
String strLocaleIds = elem.getTextContent();
localeIds.addText(strLocaleIds);
localeIds.init();
return localeIds;
}
private static Directory getDirectory(Element element) {
if (!"directory".equals(element.getTagName())) {
return null;
}
String dir = element.getAttribute("dir");
String inheritLanguageSubtag = element.getAttribute("inheritLanguageSubtag");
Directory directory = new Directory();
directory.setDir(dir);
directory.setInheritLanguageSubtag(inheritLanguageSubtag);
Node node = element.getFirstChild();
while (node != null) {
if (node.getNodeType() == Node.ELEMENT_NODE) {
Element childElement = (Element) node;
switch (childElement.getTagName()) {
case "localeIds":
LocaleIds localeIds = getLocaleIds(childElement);
directory.addConfiguredLocaleIds(localeIds);
break;
case "forcedAlias":
ForcedAlias fa = getForcedAlias(childElement);
directory.addConfiguredForcedAlias(fa);
break;
default:
}
}
node = node.getNextSibling();
}
if (directory.localeIds == null) {
directory.addConfiguredLocaleIds(new LocaleIds());
}
directory.init();
return directory;
}
public static ConvertIcuDataTask fromXml(String fileName) {
try {
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = builder.parse(new File(fileName));
Element root = doc.getDocumentElement();
if (!"config".equals(root.getTagName())) {
System.err.println("The root of the config file should be <config>");
return null;
}
NodeList convertNodes = root.getElementsByTagName("convert");
if (convertNodes.getLength() != 1) {
System.err.println("Exactly one <convert> element allowed and required");
return null;
}
ConvertIcuDataTask converter = new ConvertIcuDataTask();
Node node = convertNodes.item(0).getFirstChild();
while (node != null) {
if (node instanceof Element) {
Element childElement = (Element) node;
String nodeName = childElement.getTagName();
switch (nodeName) {
case "localeIds":
LocaleIds localeIds = getLocaleIds(childElement);
converter.addConfiguredLocaleIds(localeIds);
break;
case "directory":
Directory directory = getDirectory(childElement);
converter.addConfiguredDirectory(directory);
break;
case "forcedAlias":
ForcedAlias fa = getForcedAlias(childElement);
converter.addConfiguredForcedAlias(fa);
break;
case "altPath":
AltPath altPath = getAltPath(childElement);
converter.addConfiguredAltPath(altPath);
break;
default:
break;
}
}
node = node.getNextSibling();
}
return converter;
} catch (Exception e) {
e.printStackTrace();
}
return null;
}
}

View file

@ -12,12 +12,9 @@ import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import org.apache.tools.ant.BuildException;
import org.apache.tools.ant.Task;
import org.unicode.icu.tool.cldrtoicu.CodeGenerator;
import org.unicode.icu.tool.cldrtoicu.generator.ResourceFallbackCodeGenerator;
// Note: Auto-magical Ant methods are listed as "unused" by IDEs, unless the warning is suppressed.
public final class GenerateCodeTask extends Task {
private Path cldrPath;
private Path cOutDir;
@ -40,31 +37,26 @@ public final class GenerateCodeTask extends Task {
new GeneratedFileDef("common/localefallback_data.h", "src/main/java/com/ibm/icu/impl/LocaleFallbackData.java", new ResourceFallbackCodeGenerator()),
};
@SuppressWarnings("unused")
public void setCldrDir(String path) {
// Use String here since on some systems Ant doesn't support automatically converting Path instances.
this.cldrPath = checkNotNull(Paths.get(path));
}
@SuppressWarnings("unused")
public void setCOutDir(String path) {
// Use String here since on some systems Ant doesn't support automatically converting Path instances.
this.cOutDir = Paths.get(path);
}
@SuppressWarnings("unused")
public void setJavaOutDir(String path) {
// Use String here since on some systems Ant doesn't support automatically converting Path instances.
this.javaOutDir = Paths.get(path);
}
@SuppressWarnings("unused")
public void setAction(String action) {
// The action is a plain string; no Path conversion is involved here.
this.action = action;
}
@SuppressWarnings("unused")
public void execute() throws BuildException {
for (GeneratedFileDef task : generatedFileDefs) {
Path cOutPath = cOutDir.resolve(task.cRelativePath);
@ -91,5 +83,4 @@ public final class GenerateCodeTask extends Task {
}
}
}
}

View file

@ -0,0 +1,25 @@
// © 2024 and later: Unicode, Inc. and others.
// License & terms of use: http://www.unicode.org/copyright.html
package org.unicode.icu.tool.cldrtoicu.ant;
public class Task {
public static class BuildException extends RuntimeException {
private static final long serialVersionUID = 2430911677116799373L;
public BuildException(String message, Throwable cause) {
super(message, cause);
}
public BuildException(String message) {
super(message);
}
}
void log(String format) {
System.out.println(format);
}
public void execute() throws BuildException {}
public void init() throws BuildException {}
}

View file

@ -1,3 +1,3 @@
© 2016 and later: Unicode, Inc. and others.
License & terms of use: http://www.unicode.org/copyright.html
Generated using tools/cldr/cldr-to-icu/build-icu-data.xml
Generated using tools/cldr/cldr-to-icu/

View file

@ -4,7 +4,6 @@ package org.unicode.icu.tool.cldrtoicu;
import static com.google.common.truth.Truth.assertThat;
import static com.google.common.truth.Truth.assertWithMessage;
import static com.google.common.truth.Truth8.assertThat;
import static org.unicode.cldr.api.CldrValue.parseValue;
import java.nio.file.Path;
@ -38,7 +37,11 @@ public class SupplementalDataTest {
@BeforeClass
public static void loadRegressionData() {
Path cldrRoot = Paths.get(System.getProperty("CLDR_DIR"));
String cldrDir = System.getProperty("CLDR_DIR");
if (cldrDir == null) {
cldrDir = System.getenv("CLDR_DIR");
}
Path cldrRoot = Paths.get(cldrDir);
regressionData = SupplementalData.create(CldrDataSupplier.forCldrFilesIn(cldrRoot));
likelySubtags = new LikelySubtags();
}
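
Review note: the hunk above falls back from the `CLDR_DIR` system property to the environment variable of the same name, but `Paths.get(cldrDir)` still throws a bare `NullPointerException` when neither is set. The following is a minimal sketch of the same lookup with an explicit error message. The class `CldrDirLookupSketch` and the helper `resolveDir` are hypothetical names for illustration, not code from this commit.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.function.Function;

public class CldrDirLookupSketch {
    // Hypothetical helper: tries the first lookup (e.g. system properties), then the
    // second (e.g. the environment), and fails with a readable message if both miss.
    public static Path resolveDir(
            String key, Function<String, String> props, Function<String, String> env) {
        String value = props.apply(key);
        if (value == null) {
            value = env.apply(key);
        }
        if (value == null) {
            throw new IllegalStateException(
                    key + " is not set as a system property or environment variable");
        }
        return Paths.get(value);
    }

    public static void main(String[] args) {
        try {
            // Same precedence as the test setup: -DCLDR_DIR=... wins over $CLDR_DIR.
            Path cldrRoot = resolveDir("CLDR_DIR", System::getProperty, System::getenv);
            System.out.println("CLDR root: " + cldrRoot);
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

Passing the two lookups as functions keeps the precedence testable without touching the real process environment.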

View file

@ -18,7 +18,7 @@ import java.util.Arrays;
public class CleanOutputDirectoryTaskTest {
// Not using the original field since we want this test to fail if this changes unexpectedly.
private static final String WAS_GENERATED_LABEL =
"Generated using tools/cldr/cldr-to-icu/build-icu-data.xml";
"Generated using tools/cldr/cldr-to-icu/";
// Commented version of the label for test data.
private static final String WAS_GENERATED_LINE = "// " + WAS_GENERATED_LABEL;

View file

@ -135,14 +135,15 @@ public class LocaleDistanceMapperTest {
// LSR values come in (language, script, region) tuples. They are the mapped-to
// values for the likely subtag mappings, ordered by the DTD order in which the
// mapping keys were encountered.
assertThat(icuData).hasValuesFor("likely/lsrs",
"", "", "",
"skip", "script", "",
"zh", "Hans", "CN",
"zh", "Hant", "TW",
"en", "Latn", "US",
"zh", "Hant", "HK",
"zh", "Hant", "MO");
assertThat(icuData).hasValuesFor("likely/lsrnum:intvector",
"0", // "", "", ""
"1", // "skip", "script", ""
"1232236233", // "zh", "Hans", "CN"
"1254131029", // "zh", "Hant", "TW"
"429941505", // "en", "Latn", "US"
"1247517541", // "zh", "Hant", "HK"
"1249741720" // "zh", "Hant", "MO"
);
// It's a bit easier to see how match keys are grouped against the partitions.
ImmutableSetMultimap<Integer, String> likelyTrie =
@ -174,11 +175,12 @@ public class LocaleDistanceMapperTest {
// Pairs of expanded paradigm locales (using LSR tuples) in declaration order.
// This is just the list from the CLDR data with no processing.
assertThat(icuData).hasValuesFor("match/paradigms",
"en", "Latn", "US",
"en", "Latn", "GB",
"es", "Latn", "ES",
"es", "Latn", "419");
assertThat(icuData).hasValuesFor("match/paradigmnum:intvector",
"429941505", // "en", "Latn", "US"
"420631446", // "en", "Latn", "GB"
"429626712", // "es", "Latn", "ES"
"419470284" // "es", "Latn", "419"
);
// See PartitionInfoTest for a description of the ordering of these strings.
assertThat(icuData).hasValuesFor("match/partitions",

View file

@ -28,7 +28,9 @@ public class Bcp47MapperTest {
RbPath.of("typeAlias", "timezone:alias"),
RbValue.of("/ICUDATA/timezoneTypes/typeAlias/timezone"),
RbPath.of("typeMap", "timezone:alias"),
RbValue.of("/ICUDATA/timezoneTypes/typeMap/timezone"));
RbValue.of("/ICUDATA/timezoneTypes/typeMap/timezone"),
RbPath.of("ianaMap", "timezone:alias"),
RbValue.of("/ICUDATA/timezoneTypes/ianaMap/timezone"));
@Test
public void testSimple() {

View file

@ -1,101 +0,0 @@
*********************************************************************
*** © 2019 and later: Unicode, Inc. and others. ***
*** License & terms of use: http://www.unicode.org/copyright.html ***
*********************************************************************
What is this directory and why is it empty?
-------------------------------------------
This is the root of a local Maven repository which needs to be populated before
code which uses the CLDR data API can be executed.
To do this, you need to have a local copy of the CLDR project configured on your
computer and be able to build the API jar file and copy an existing utility
jar file. In the examples below it is assumed that $CLDR_ROOT references this
CLDR release.
Setup
-----
This project relies on the Maven build tool for managing dependencies and uses
Ant for configuration purposes, so both will need to be installed. On a Debian
based system, this should be as simple as:
$ sudo apt-get install maven ant
Installing the CLDR API jar
---------------------------
From this directory:
$ ./install-cldr-jars.sh "$CLDR_DIR"
Manually installing the CLDR API jar
------------------------------------
Only follow these remaining steps if the installation script isn't suitable or
doesn't work on your system.
To regenerate the CLDR API jar you need to build the "jar" target manually
using the Maven pom.xml file in the "tools" directory of the CLDR project:
$ cd "$CLDR_ROOT/tools"
$ mvn package -DskipTests=true
This should result in the cldr-code.jar file being built into the cldr-code/target
sub-directory, which can then be installed as a Maven dependency as described below.
Updating local Maven repository
-------------------------------
To update the local Maven repository (e.g. to install the CLDR jar), run the
following from this directory (lib/):
$ mvn install:install-file \
-Dproject.parent.relativePath="" \
-DgroupId=org.unicode.cldr \
-DartifactId=cldr-api \
-Dversion=0.1-SNAPSHOT \
-Dpackaging=jar \
-DgeneratePom=true \
-DlocalRepositoryPath=. \
-Dfile="$CLDR_ROOT/tools/cldr-code/target/cldr-code.jar"
And if you have updated one of these libraries then from this directory run:
$ mvn dependency:purge-local-repository \
-Dproject.parent.relativePath="" \
-DmanualIncludes=org.unicode.cldr:cldr-api:jar
After doing this, you should see something like the following list of files in
this directory:
README.txt <-- this file
org/unicode/cldr/cldr-api/maven-metadata-local.xml
org/unicode/cldr/cldr-api/0.1-SNAPSHOT/maven-metadata-local.xml
org/unicode/cldr/cldr-api/0.1-SNAPSHOT/cldr-api-0.1-SNAPSHOT.pom
org/unicode/cldr/cldr-api/0.1-SNAPSHOT/cldr-api-0.1-SNAPSHOT.jar
Finally, if you choose to update the version number of the snapshot, then also
update all the pom.xml files which reference it (but this is unlikely to be
necessary).
Troubleshooting
---------------
While the Maven system should keep the CLDR JAR up to date, there is a chance
that you may have an out of date JAR installed elsewhere. If you have any
issues with the JAR not being the expected version (e.g. after making changes)
then run the above "purge" step again, from this directory.
This should re-resolve the current JAR snapshot from the repository in this
directory. Having purged the Maven cache, next time you build a project, you
should see something like:
[exec] Downloading from <xxx>: <url>/org/unicode/cldr/cldr-api/0.1-SNAPSHOT/maven-metadata.xml
[exec] [INFO] Building jar: <path-to-icu-root>/tools/cldr/cldr-to-icu/target/cldr-to-icu-1.0-SNAPSHOT-jar-with-dependencies.jar
This shows that it has had to re-fetch the JAR file.

View file

@ -1,102 +0,0 @@
#!/bin/bash -u
#
#####################################################################
### © 2020 and later: Unicode, Inc. and others. ###
### License & terms of use: http://www.unicode.org/copyright.html ###
#####################################################################
#
# This script will attempt to build and install the necessary CLDR JAR files
# from a given CLDR installation root directory. The JAR files are installed
# according to the manual instructions given in README.txt and lib/README.txt.
#
# The user must have installed both 'ant' and 'maven' in accordance with the
# instructions in README.txt before attempting to run this script.
#
# Usage (from the directory of this script):
#
# ./install-cldr-jars.sh <CLDR-root-directory>
#
# Note to maintainers: This script cannot be assumed to run on a Unix/Linux
# based system, and while a Posix compliant bash shell is required, any
# assumptions about auxiliary Unix tools should be minimized (e.g. things
# like "dirname" or "tempfile" may not exist). Where bash-only alternatives
# have to be used, they should be clearly documented.
# Exit with a message for fatal errors.
function die() {
echo "$1"
echo "Exiting..."
exit 1
} >&2
# Runs a given command and captures output to the global log file.
# If a command errors, the user can then view the log file.
function run_with_logging() {
echo >> "${LOG_FILE}"
echo "Running: ${@}" >> "${LOG_FILE}"
echo -- "----------------------------------------------------------------" >> "${LOG_FILE}"
"${@}" >> "${LOG_FILE}" 2>&1
if (( $? != 0 )) ; then
echo -- "---- Previous command failed ----" >> "${LOG_FILE}"
echo "Error running: ${@}"
read -p "Show log file? " -n 1 -r
echo
if [[ "${REPLY}" =~ ^[Yy]$ ]] ; then
less -RX "${LOG_FILE}"
fi
echo "Log file: ${LOG_FILE}"
exit 1
fi
echo -- "---- Previous command succeeded ----" >> "${LOG_FILE}"
}
# First require that we are run from the same directory as the script.
# Can't assume users have "dirname" available so hack it a bit with shell
# substitution (if no directory path was prepended, SCRIPT_DIR==$0).
SCRIPT_DIR=${0%/*}
if [[ "$SCRIPT_DIR" != "$0" ]] ; then
cd "$SCRIPT_DIR"
fi
# Check for some expected environmental things early.
which ant > /dev/null || die "Cannot find Ant executable 'ant' in the current path."
which mvn > /dev/null || die "Cannot find Maven executable 'mvn' in the current path."
# Check there's one argument that points at a directory (or a symbolic link to a directory).
(( $# == 1 )) && [[ -d "$1" ]] || die "Usage: ./install-cldr-jars.sh <CLDR-root-directory>"
# Set up a log file (and be nice about tidying it up).
# Cannot assume "tempfile" exists so use a timestamp (we expect "date" to exist though).
LOG_FILE="${TMPDIR:-/tmp}/cldr2icu_log_$(date '+%m%d_%H%M%S').txt"
touch "$LOG_FILE" || die "Cannot create temporary file: ${LOG_FILE}"
echo -- "---- LOG FILE ---- $(date '+%F %T') ----" >> "${LOG_FILE}"
# Build the cldr-code.jar in the cldr-code/target subdirectory of the CLDR tools directory.
CLDR_TOOLS_DIR="$1/tools"
pushd "${CLDR_TOOLS_DIR}" > /dev/null || die "Cannot change directory to: ${CLDR_TOOLS_DIR}"
echo "Building CLDR JAR file..."
run_with_logging mvn package -DskipTests=true
[[ -f "cldr-code/target/cldr-code.jar" ]] || die "Error creating cldr-code.jar file"
popd > /dev/null
# The -B flag is "batch" mode and won't mess about with escape codes in the log file.
echo "Installing CLDR JAR file..."
run_with_logging mvn -B install:install-file \
-Dproject.parent.relativePath="" \
-DgroupId=org.unicode.cldr \
-DartifactId=cldr-api \
-Dversion=0.1-SNAPSHOT \
-Dpackaging=jar \
-DgeneratePom=true \
-DlocalRepositoryPath=. \
-Dfile="${CLDR_TOOLS_DIR}/cldr-code/target/cldr-code.jar"
echo "Syncing local Maven repository..."
run_with_logging mvn -B dependency:purge-local-repository \
-Dproject.parent.relativePath="" \
-DmanualIncludes=org.unicode.cldr:cldr-api:jar
echo "All done!"
echo "Log file: ${LOG_FILE}"

View file

@ -1,53 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<!-- © 2020 and later: Unicode, Inc. and others.
License & terms of use: http://www.unicode.org/copyright.html
See README.txt for instructions on updating the local repository.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<!-- This POM file acts as a parent POM file for any tool which is built
via Maven and requires access to the CLDR data APIs. This POM file
and the other files in this directory encapsulate the somewhat messy
task of including the Ant-built CLDR JAR file in Maven projects. -->
<!-- Declares this to be a POM that's included by other POM files. -->
<packaging>pom</packaging>
<!-- This must match any child POM file's <parent> declaration. -->
<groupId>org.unicode.icu</groupId>
<artifactId>cldr-lib</artifactId>
<version>1.0</version>
<!-- Important: The "${project.basedir}" property is the directory of the
child POM file, not this directory (and there's no easy way in Maven
to identify the absolute path of a parent POM file). However since
child POM files should have a <parent> declaration with the relative
path in it, we can use that. Note however that this is a bit fragile
and relies on <relativePath> being a directory, not a POM file.
In order to allow the local repository to work either when it is used
by a child POM file or when it's used directly (e.g. for installing
or purging the cache) when it is invoked from this directory, the
-Dproject.parent.relativePath=""
argument must be given. -->
<repositories>
<repository>
<id>local-maven-repo</id>
<url>file://${project.basedir}/${project.parent.relativePath}</url>
</repository>
</repositories>
<!-- Ant-built JAR file(s) installed into the local Maven repository in this
directory by the 'install-cldr-jars.sh' script. -->
<dependencies>
<dependency>
<groupId>org.unicode.cldr</groupId>
<artifactId>cldr-api</artifactId>
<version>0.1-SNAPSHOT</version>
</dependency>
</dependencies>
</project>