mirror of
https://github.com/unicode-org/icu.git
synced 2025-04-04 21:15:35 +00:00
ICU-22773 Migrate the CLDR conversion tool to Maven
parent
3b9c0fc4a5
commit
2fa8a0908c
32 changed files with 1347 additions and 1114 deletions
2
.github/adaboost.json
vendored
|
@ -1,6 +1,6 @@
|
|||
// © 2022 and later: Unicode, Inc. and others.
|
||||
// License & terms of use: http://www.unicode.org/copyright.html
|
||||
// Generated using tools/cldr/cldr-to-icu/build-icu-data.xml
|
||||
// Generated using tools/cldr/cldr-to-icu/
|
||||
//
|
||||
// Include Japanese adaboost model.
|
||||
{
|
||||
|
|
2
.github/lstm_for_th_my.json
vendored
|
@ -1,6 +1,6 @@
|
|||
// © 2021 and later: Unicode, Inc. and others.
|
||||
// License & terms of use: http://www.unicode.org/copyright.html
|
||||
// Generated using tools/cldr/cldr-to-icu/build-icu-data.xml
|
||||
// Generated using tools/cldr/cldr-to-icu/
|
||||
//
|
||||
// Include Burmese and Thai lstm models.
|
||||
{
|
||||
|
|
|
@ -27,8 +27,8 @@ All Rights Reserved.
|
|||
# Intro and setup
|
||||
|
||||
These instructions describe how to regenerate ICU4C locale and linguistic data from CLDR,
|
||||
and then how to convert that ICU4 data for ICU4J (data jars and maven resources).
|
||||
They apply to CLDR 44 / ICU 74 and later.
|
||||
and then how to convert that ICU4C data for ICU4J (data jars and maven resources).
|
||||
They apply to CLDR 47 / ICU 77 and later.
|
||||
|
||||
To use these instructions just for generating ICU4J data from ICU4C, you only need to use
|
||||
steps 1, 8, and 12 in the Process section.
|
||||
|
@ -37,22 +37,26 @@ The full process requires local copies of
|
|||
|
||||
* CLDR (the source of most of the data, and some Java tools)
|
||||
* The complete ICU source tree, including:
|
||||
* tools: includes the LdmlConverter build tool and associated config files
|
||||
* icu4c: the target for converted CLDR data, and source for ICU4J data; includes tests for the converted data
|
||||
* icu4j: the target for updated data jars; includes tests for the converted data
|
||||
* `tools`: includes the `LdmlConverter` build tool and associated config files
|
||||
* `icu4c`: the target for converted CLDR data, and source for ICU4J data; includes tests for the converted data
|
||||
* `icu4j`: the target for updated data jars; includes tests for the converted data
|
||||
|
||||
For an official CLDR data integration into ICU, these should be clean, freshly
|
||||
checked-out. For released CLDR sources, an alternative to checking out sources
|
||||
for a given version is downloading the zipped sources for the common (core.zip)
|
||||
and tools (tools.zip) directory subtrees from the Data column in
|
||||
for a given version is downloading the zipped sources for the common (`core.zip`)
|
||||
and tools (`tools.zip`) directory subtrees from the Data column in
|
||||
[CLDR Releases/Downloads](https://cldr.unicode.org/index/downloads)
|
||||
|
||||
Besides a standard JDK, the process also requires [ant](https://ant.apache.org) and
|
||||
Besides a standard JDK 11+, the process also requires [ant](https://ant.apache.org) and
|
||||
[maven](https://maven.apache.org) plus the xml-apis.jar from the
|
||||
[Apache xalan package](https://xalan.apache.org/xalan-j/downloads.html) _(Is this
|
||||
latter requirement still true?)_. You will also need to have performed the
|
||||
latter requirement still true?)_.
|
||||
|
||||
If you do CLDR development you can configure maven as documented at
|
||||
[CLDR Maven setup](http://cldr.unicode.org/development/maven) (non-Eclipse version).
|
||||
|
||||
But for the CLDR to ICU data conversion, or for regular ICU development this is not needed.
|
||||
|
||||
Notes:
|
||||
|
||||
* Enough things can (and will) fail in this process that it is best to
|
||||
|
@ -65,12 +69,12 @@ Notes:
|
|||
files are used in addition to the CLDR files as inputs to the CLDR data build
|
||||
process for ICU):
|
||||
* The primary file to edit for adding/removing locales and/or collation and
|
||||
rbnf data is<br>
|
||||
`$TOOLS_ROOT/cldr/cldr-to-icu/build-icu-data.xml`.
|
||||
`rbnf` data is \
|
||||
`$ICU_DIR/tools/cldr/cldr-to-icu/config.xml`.
|
||||
* There are some files in `icu4c/source/data/xml/` that may need editing for
|
||||
certain additions. This is especially true for brkitr additions; however there
|
||||
are rbnf files there that add some rules. The collation files there mainly
|
||||
hook up the UCA collation rules in `icu4c/data/unidata/UCARules.txt` to the
|
||||
certain additions. This is especially true for `brkitr` additions; however there
|
||||
are `rbnf` files there that add some rules. The collation files there mainly
|
||||
hook up the UCA collation rules in `icu4c/source/data/unidata/UCARules.txt` to the
|
||||
collation data. To process these files, certain CLDR dtds are copied over to
|
||||
ICU.
|
||||
|
||||
|
@ -88,14 +92,14 @@ considerations:
|
|||
# CLDR prerequisites for BRS integrations
|
||||
|
||||
The following tasks should be done in the CLDR repo before beginning a CLDR-ICU
|
||||
integration that ss part of the BRS process; handle each of these using a separate
|
||||
integration that is part of the BRS process; handle each of these using a separate
|
||||
ticket and a separate PR:
|
||||
|
||||
1. Generate updated CLDR test data (which is copied to ICU), using the process in
|
||||
[Generating CLDR testData](https://docs.google.com/document/d/1-RC99npKcSSwUoYGkSzxaKOe76gYRkWhGdFzCdIBCu4/edit#heading=h.2rum9c6hrr4w)
|
||||
|
||||
2. Run CLDRModify with no options with no options and then with -fP. The webpage
|
||||
for CLDRModify is currently being converted to markdown, a reference to it will
|
||||
2. Run `CLDRModify` with no options and then with `-fP`. The web page
|
||||
for `CLDRModify` is currently being converted to markdown, a reference to it will
|
||||
be added when that process is complete.
|
||||
|
||||
# Environment variables
|
||||
|
@ -120,61 +124,61 @@ There are several environment variables that need to be defined.
|
|||
|
||||
* `CLDR_TMP_DIR`: Parent of temporary CLDR production data. Defaults to
|
||||
`$CLDR_DIR/../cldr-aux` (sibling to `CLDR_DIR`).
|
||||
|
||||
> **NOTE:** As of CLDR 36 and 37, the GenerateProductionData tool no longer
|
||||
|
||||
> **NOTE:** As of CLDR 36 and 37, the `GenerateProductionData` tool no longer
|
||||
generates data by default into `$CLDR_TMP_DIR/production`; instead it
|
||||
generates data into `$CLDR_DIR/../cldr-staging/production` (though there is
|
||||
a command-line option to override this). However the rest of the build still
|
||||
assumes that the generated data is in `$CLDR_TMP_DIR/production`.
|
||||
So `CLDR_TMP_DIR` must be defined to be `CLDR_DIR/../cldr-staging`.
|
||||
|
||||
|
||||
3. ICU-related variables
|
||||
|
||||
* `ICU4C_DIR`: Path to root of ICU4C sources, below which is the source dir.
|
||||
* `ICU_DIR`: Path to root of ICU directory, below which are (e.g.) the
|
||||
`icu4c`, `icu4j`, and `tools` directories.
|
||||
|
||||
* `ICU4J_ROOT`: Path to root of ICU4J sources, below which is the main dir.
|
||||
* `ICU4C_DIR`: Path to root of ICU4C sources, below which is the `source` dir.
|
||||
|
||||
* `ICU4J_ROOT`: Path to root of ICU4J sources, below which is the `main` dir.
|
||||
|
||||
* `TOOLS_ROOT`: Path to root of ICU tools directory, below which are (e.g.) the
|
||||
cldr and unicodetools dirs.
|
||||
|
||||
# Process
|
||||
|
||||
|
||||
## 1 Environment variables
|
||||
|
||||
1a. Java, ant, and maven variables, adjust for your system
|
||||
```
|
||||
```sh
|
||||
export JAVA_HOME=$(/usr/libexec/java_home)
|
||||
export ANT_OPTS="-Xmx8192m"
|
||||
export MAVEN_ARGS="--no-transfer-progress"
|
||||
```
|
||||
|
||||
1b. CLDR variables, adjust for your setup; with cygwin it might be e.g.
|
||||
```
|
||||
```sh
|
||||
CLDR_DIR=`cygpath -wp /build/cldr`
|
||||
```
|
||||
|
||||
Note that for cldr-staging we do not use personal forks, we commit directly.
|
||||
```
|
||||
```sh
|
||||
export CLDR_DIR=$HOME/cldr-myfork
|
||||
export CLDR_TMP_DIR=$HOME/cldr-staging
|
||||
export CLDR_DATA_DIR=$HOME/cldr-staging/production
|
||||
```
|
||||
|
||||
1c. ICU variables
|
||||
```
|
||||
```sh
|
||||
export ICU4C_DIR=$HOME/icu-myfork/icu4c
|
||||
export ICU4J_ROOT=$HOME/icu-myfork/icu4j
|
||||
export TOOLS_ROOT=$HOME/icu-myfork/tools
|
||||
```
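
Because later steps fail in confusing ways when a path variable is wrong, it can help to check the variables before continuing. The following is a minimal sketch, not part of the ICU or CLDR tooling: `check_dir_var` is a hypothetical helper name, and the list of variables is an assumption taken from the "Environment variables" section above (adjust it to the variables you actually use).

```sh
# check_dir_var is a hypothetical helper, not part of ICU or CLDR.
# It prints one status line for the named variable and returns non-zero
# when the variable is unset/empty or does not name an existing directory.
check_dir_var() {
  name=$1
  eval "value=\${$name:-}"
  if [ -z "$value" ]; then
    echo "unset: $name"
    return 1
  fi
  if [ ! -d "$value" ]; then
    echo "not a directory: $name=$value"
    return 1
  fi
  echo "ok: $name=$value"
}

# Variable names assumed from the "Environment variables" section.
missing=0
for v in CLDR_DIR CLDR_TMP_DIR ICU_DIR ICU4C_DIR ICU4J_ROOT; do
  check_dir_var "$v" || missing=1
done
[ "$missing" -eq 0 ] || echo "fix the variables reported above before continuing"
```

The loop keeps going after the first problem so that all misconfigured variables are reported in one pass.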
|
||||
|
||||
1d. Directory for logs/notes (create if does not exist)
|
||||
```
|
||||
```sh
|
||||
export NOTES=...(some directory)...
|
||||
mkdir -p $NOTES
|
||||
```
|
||||
|
||||
1e. The name of the icu data directory for Java (for example `icudt74b`)
|
||||
```
|
||||
```sh
|
||||
export ICU_DATA_VER=icudt(version)b
|
||||
```
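
The `(version)` placeholder above has to be replaced by hand; left as written, the parentheses are a shell syntax error. As a small sketch, the name can be assembled from the major version number. `ICU_MAJOR` is a hypothetical variable introduced only for this illustration, and `77` is an example value.

```sh
# ICU_MAJOR is a hypothetical helper variable; set it to the major version
# of the ICU release you are building (77 is only an example).
ICU_MAJOR=77
# "icudt" + major version + "b" (the big-endian data used by ICU4J).
export ICU_DATA_VER="icudt${ICU_MAJOR}b"
echo "$ICU_DATA_VER"
```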
|
||||
|
||||
|
@ -182,10 +186,10 @@ export ICU_DATA_VER=icudt(version)b
|
|||
|
||||
2a. Configure ICU4C, build and test without new data first, to verify that
|
||||
there are no pre-existing errors, and to build some tools needed for later
|
||||
steps. Here `<platform>` is the runConfigureICU code for the platform you
|
||||
steps. Here `<platform>` is the `runConfigureICU` code for the platform you
|
||||
are building on, e.g. Linux, macOS, Cygwin.
|
||||
(optionally build with debug enabled)
|
||||
```
|
||||
```sh
|
||||
cd $ICU4C_DIR/source
|
||||
./runConfigureICU [--enable-debug] <platform>
|
||||
make clean
|
||||
|
@ -195,7 +199,7 @@ make check 2>&1 | tee $NOTES/icu4c-oldData-makeCheck.txt
|
|||
2b. Now with ICU4J, build and test without new data first, to verify that
|
||||
there are no pre-existing errors (or at least to have the pre-existing errors
|
||||
as a base for comparison):
|
||||
```
|
||||
```sh
|
||||
cd $ICU4J_ROOT
|
||||
mvn clean
|
||||
mvn verify 2>&1 | tee $NOTES/icu4j-oldData-mvnCheck.txt
|
||||
|
@ -210,31 +214,33 @@ cp -p $CLDR_DIR/common/dtd/ldmlICU.dtd $ICU4C_DIR/source/data/dtd/cldr/common/dt
|
|||
```
|
||||
|
||||
3b. Update the cldr-icu tooling to use the latest tagged version of ICU
|
||||
```
|
||||
open $TOOLS_ROOT/cldr/cldr-to-icu/pom.xml
|
||||
```sh
|
||||
open $ICU_DIR/tools/cldr/cldr-to-icu/pom.xml
|
||||
```
|
||||
(search for `icu4j-for-cldr` and update to the latest tagged version per instructions)
|
||||
|
||||
3c. Update the build for any new icu version, added locales, etc.
|
||||
```sh
|
||||
# ICU version
|
||||
open $ICU_DIR/tools/cldr/cldr-to-icu/pom.xml
|
||||
# Locales and other configuration changes
|
||||
open $ICU_DIR/tools/cldr/cldr-to-icu/config.xml
|
||||
```
|
||||
open $TOOLS_ROOT/cldr/cldr-to-icu/build-icu-data.xml
|
||||
```
|
||||
(update icuVersion, icuDataVersion if necessary; update lists of locales to include if necessary)
|
||||
(update `icuVersion`, `icuDataVersion` if necessary; update lists of locales to include if necessary)
|
||||
|
||||
3d. If there are new data types or variants in CLDR, you may need to update the
|
||||
files that specify mapping of CLDR data to ICU rseources:
|
||||
```
|
||||
open $TOOLS_ROOT/cldr/cldr-to-icu/src/main/resources/ldml2icu_locale.txt
|
||||
open $TOOLS_ROOT/cldr/cldr-to-icu/src/main/resources/ldml2icu_supplemental.txt
|
||||
files that specify mapping of CLDR data to ICU resources:
|
||||
```sh
|
||||
open $ICU_DIR/tools/cldr/cldr-to-icu/src/main/resources/ldml2icu_locale.txt
|
||||
open $ICU_DIR/tools/cldr/cldr-to-icu/src/main/resources/ldml2icu_supplemental.txt
|
||||
```
|
||||
|
||||
## 4 Build and install CLDR jar
|
||||
|
||||
See `$TOOLS_ROOT/cldr/lib/README.txt` for more information on the CLDR
|
||||
jar and the `install-cldr-jars.sh` script.
|
||||
```
|
||||
cd $TOOLS_ROOT/cldr
|
||||
ant install-cldr-libs
|
||||
See `$ICU_DIR/tools/cldr/cldr-to-icu/README.md` for more information on the CLDR jar.
|
||||
```sh
|
||||
cd "$CLDR_DIR"
|
||||
mvn clean install -pl :cldr-all,:cldr-code -DskipTests -DskipITs
|
||||
```
|
||||
|
||||
## 5 Generate CLDR production data and convert for ICU
|
||||
|
@ -247,14 +253,15 @@ This process uses ant with ICU4C's `data/build.xml`
|
|||
(usually `$CLDR_TMP_DIR/production`), required if any CLDR data has changed.
|
||||
* Running `ant setup` is not required, but it will print useful errors to
|
||||
debug issues with your path when it fails.
|
||||
```
|
||||
|
||||
```sh
|
||||
cd $ICU4C_DIR/source/data
|
||||
ant cleanprod
|
||||
ant setup
|
||||
ant proddata 2>&1 | tee $NOTES/cldr-newData-proddataLog.txt
|
||||
```
|
||||
|
||||
> Note, for CLDR development, at this point tests are sometimes run on the
|
||||
> Note, for CLDR development, at this point tests are sometimes run on the
|
||||
production data, see
|
||||
[BRS: Run tests on production data](https://cldr.unicode.org/development/cldr-big-red-switch/brs-run-tests-on-production-data)
|
||||
|
||||
|
@ -262,26 +269,27 @@ ant proddata 2>&1 | tee $NOTES/cldr-newData-proddataLog.txt
|
|||
|
||||
These include .txt files and .py files. These new files will replace whatever was
|
||||
already present in the ICU4C sources. This process uses the `LdmlConverter` in
|
||||
`$TOOLS_ROOT/cldr/cldr-to-icu/`; see `$TOOLS_ROOT/cldr/cldr-to-icu/README.txt`.
|
||||
`$ICU_DIR/tools/cldr/cldr-to-icu/`; see `$ICU_DIR/tools/cldr/cldr-to-icu/README.md`.
|
||||
|
||||
* This process will take several minutes, during most of which there will be no log
|
||||
output (so do not assume nothing is happening). Keep a log so you can investigate
|
||||
anything that looks suspicious.
|
||||
* Note that `ant clean` should _not_ be run before this. The `build-icu-data.xml` process
|
||||
* The conversion tool
|
||||
will automatically run its own "clean" step to delete files it cannot determine to
|
||||
be ones that it would generate, except for paths listed in `<retain>` elements such as
|
||||
`coll/de__PHONEBOOK.txt`, `coll/de_.txt`, etc.
|
||||
* Before running ant to regenerate the data, make any necessary changes to the
|
||||
build-icu-data.xml file, such as adding new locales etc.
|
||||
```
|
||||
cd $TOOLS_ROOT/cldr/cldr-to-icu
|
||||
ant -f build-icu-data.xml -DcldrDataDir="$CLDR_TMP_DIR/production" | tee $NOTES/cldr-newData-builddataLog.txt
|
||||
* Before running the tool to regenerate the data, make any necessary changes to the
|
||||
`config.xml` file, such as adding new locales etc.
|
||||
|
||||
```sh
|
||||
cd $ICU_DIR/tools/cldr/cldr-to-icu
|
||||
java -jar target/cldr-to-icu-1.0-SNAPSHOT-jar-with-dependencies.jar --cldrDataDir="$CLDR_TMP_DIR/production" | tee $NOTES/cldr-newData-builddataLog.txt
|
||||
```
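
Since this step runs for several minutes and the log is the main debugging aid, you may want the log to include error output and the shell to see the tool's own exit status. A plain `cmd | tee file` captures only standard output and reports the status of `tee`. The following is a sketch of one way to handle both; `run_logged` is a hypothetical helper, not something provided by the conversion tool.

```sh
# run_logged is a hypothetical helper, not part of the ICU tooling.
# Usage: run_logged LOGFILE COMMAND [ARGS...]
# Shows combined stdout/stderr live, saves it to LOGFILE, and returns
# the exit status of COMMAND rather than that of tee.
run_logged() {
  log=$1
  shift
  status_file=$(mktemp)
  # Record the command's status in a file, because the pipeline itself
  # reports only the status of its last command (tee).
  { "$@" 2>&1 && echo 0 > "$status_file" || echo $? > "$status_file"; } | tee "$log"
  status=$(cat "$status_file")
  rm -f "$status_file"
  return "$status"
}
```

With this helper, the conversion above could be started as `run_logged "$NOTES/cldr-newData-builddataLog.txt" java -jar target/cldr-to-icu-1.0-SNAPSHOT-jar-with-dependencies.jar --cldrDataDir="$CLDR_TMP_DIR/production"`.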
|
||||
|
||||
5c. Update the CLDR testData files needed by ICU4C/J tests, ensuring
|
||||
they are representative of the newest CLDR data.
|
||||
```
|
||||
cd $TOOLS_ROOT/cldr
|
||||
```sh
|
||||
cd $ICU_DIR/tools/cldr
|
||||
ant copy-cldr-testdata
|
||||
```
|
||||
|
||||
|
@ -289,7 +297,7 @@ ant copy-cldr-testdata
|
|||
(This step has been subsumed into 5c above)
|
||||
|
||||
5e. For now, manually re-add the `lstm` entries in `data/brkitr/root.txt`
|
||||
```
|
||||
```sh
|
||||
open $ICU4C_DIR/source/data/brkitr/root.txt
|
||||
```
|
||||
Paste the following block after the dictionaries block and before the final closing '}':
|
||||
|
@ -302,20 +310,20 @@ Paste the following block after the dictionaries block and before the final clos
|
|||
|
||||
5f. Update hard-coded lists in ICU
|
||||
|
||||
ICU4 has some hard-coded lists of locale-related codes that may need updating. Ideally these should
|
||||
ICU has some hard-coded lists of locale-related codes that may need updating. Ideally these should
|
||||
be replaced by data converted from CLDR ([ICU-22839](https://unicode-org.atlassian.net/browse/ICU-22839)). In the
|
||||
meantime these need to be updated manually.
|
||||
|
||||
| code type | icu4c/source library file(s) | icu4c/source test file(s) |
|
||||
| -------------------------------------------------------------------------------------------- | ------------------------------------------- | ------------------------------------------- |
|
||||
| language<BR>(at least all language codes in ICU locales or CLDR attributeValueValidity.xml) | common/uloc.cpp: LANGUAGES[], LANGUAGES_3[] | test/testdata/structLocale.txt: Languages |
|
||||
| region<BR>(at least all region codes in ICU locales or CLDR attributeValueValidity.xml) | common/uloc.cpp: COUNTRIES[], COUNTRIES_3[] | test/testdata/structLocale.txt: Countries |
|
||||
| currency (see note below)<BR>(at least everything in CLDR supplementalData.xml currencyData) | common/ucurr.cpp: gCurrencyList[]] | test/testdata/structLocale.txt: Currencies,CurrencyPlurals<BR>test/cintltst/currtest.c:TestEnumList() |
|
||||
| timezone | (not currently aware of hard-coded list) | test/testdata/structLocale.txt: zoneStrings |
|
||||
| language<BR>(at least all language codes in ICU locales or CLDR `attributeValueValidity.xml`) | `common/uloc.cpp`: `LANGUAGES[], LANGUAGES_3[]` | `test/testdata/structLocale.txt`: Languages |
|
||||
| region<BR>(at least all region codes in ICU locales or CLDR `attributeValueValidity.xml`) | `common/uloc.cpp`: `COUNTRIES[], COUNTRIES_3[]` | `test/testdata/structLocale.txt`: Countries |
|
||||
| currency (see note below)<BR>(at least everything in CLDR `supplementalData.xml` `currencyData`) | `common/ucurr.cpp`: `gCurrencyList[]` | `test/testdata/structLocale.txt`: `Currencies`,`CurrencyPlurals`<BR>`test/cintltst/currtest.c`:`TestEnumList()` |
|
||||
| timezone | (not currently aware of hard-coded list) | `test/testdata/structLocale.txt`: `zoneStrings` |
|
||||
|
||||
Note: currency code lists are also in other code lists along with measurement units,
|
||||
but these are re-generated using the procedure in
|
||||
[Updating MeasureUnit with new CLDR data](https://unicode-org.github.io/icu/processes/release/tasks/updating-measure-unit.html)
|
||||
[Updating `MeasureUnit` with new CLDR data](https://unicode-org.github.io/icu/processes/release/tasks/updating-measure-unit.html)
|
||||
(also mentioned in step 14 below).
|
||||
|
||||
## 6 Check the results
|
||||
|
@ -323,7 +331,7 @@ but these are re-generated using the procedure in
|
|||
Check which data files have modifications, which have been added or removed
|
||||
(if there are no changes, you may not need to proceed further). Make sure the
|
||||
list seems reasonable. You may want to save logs, and possibly examine them...
|
||||
```
|
||||
```sh
|
||||
cd $ICU4C_DIR/..
|
||||
git status
|
||||
git status > $NOTES/gitStatusDelta-data.txt
|
||||
|
@ -332,7 +340,7 @@ open $NOTES/gitDiffDelta-data.txt
|
|||
```
|
||||
|
||||
6a. You may also want to check which files were modified in CLDR production data:
|
||||
```
|
||||
```sh
|
||||
cd $CLDR_TMP_DIR
|
||||
git status
|
||||
git status > $NOTES/gitStatusDelta-staging.txt
|
||||
|
@ -342,25 +350,25 @@ git diff > $NOTES/gitDiffDelta-staging.txt
|
|||
## 7 Fix data generation errors
|
||||
|
||||
Look for evident errors in the list of file changes, or in the file diffs.
|
||||
Fixing them may entail modifying CLDR source data or `TOOLS_ROOT` config files or
|
||||
Fixing them may entail modifying CLDR source data or `$ICU_DIR/tools/cldr/cldr-to-icu` config files or
|
||||
tooling.
|
||||
|
||||
## 8 Rebuild ICU4C with new data, run tests
|
||||
|
||||
8a. Re-run configure and make clean, necessary to handle any files added or deleted:
|
||||
```
|
||||
```sh
|
||||
cd $ICU4C_DIR/source
|
||||
./runConfigureICU [--enable-debug] <platform>
|
||||
make clean
|
||||
```
|
||||
|
||||
8b. Do the rebuild, keeping a log as before:
|
||||
```
|
||||
```sh
|
||||
make check 2>&1 | tee $NOTES/icu4c-newData-makeCheck.txt
|
||||
```
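
The log saved in step 2a exists so that pre-existing failures can be told apart from failures caused by the new data. A rough sketch of that comparison follows. `new_matches` is a hypothetical helper, and the search pattern is an assumption: the exact wording of failure lines in the `make check` output is not specified here, so choose a pattern that matches what you see in your own logs.

```sh
# new_matches is a hypothetical helper, not part of the ICU tooling.
# Usage: new_matches PATTERN OLD_LOG NEW_LOG
# Prints the lines matching PATTERN that occur in NEW_LOG but not in OLD_LOG.
new_matches() {
  pattern=$1
  old_log=$2
  new_log=$3
  old_sorted=$(mktemp)
  new_sorted=$(mktemp)
  # grep exits non-zero when nothing matches; that is not an error here.
  grep -e "$pattern" "$old_log" | sort -u > "$old_sorted" || true
  grep -e "$pattern" "$new_log" | sort -u > "$new_sorted" || true
  # comm -13 keeps only the lines unique to the second (new) file.
  comm -13 "$old_sorted" "$new_sorted"
  rm -f "$old_sorted" "$new_sorted"
}
```

For example, `new_matches 'FAIL' "$NOTES/icu4c-oldData-makeCheck.txt" "$NOTES/icu4c-newData-makeCheck.txt"` would list matching lines that appear only in the new-data run, with `'FAIL'` standing in for whatever pattern fits your logs.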
|
||||
|
||||
To re-run a specific test if necessary when fixing bugs; for example:
|
||||
```
|
||||
```sh
|
||||
cd test/intltest
|
||||
DYLD_LIBRARY_PATH=../../lib:../../stubdata:../../tools/ctestfw:$DYLD_LIBRARY_PATH ./intltest -e -G format/NumberTest/NumberPermutationTest
|
||||
cd ../..
|
||||
|
@ -380,7 +388,8 @@ ticket under which you are performing the integration, if you have one), fix the
|
|||
and regenerate from step 4.
|
||||
|
||||
If the data is OK, other sources of failure can include:
|
||||
* Problems with the CLDR-ICU conversion process (pehaps some locale data is not getting
|
||||
|
||||
* Problems with the CLDR-ICU conversion process (perhaps some locale data is not getting
|
||||
converted properly); go back to step 3, adjust and repeat from there.
|
||||
* Problems with ICU library code that may not be using new resources properly. Fix and
|
||||
repeat from step 8.
|
||||
|
@ -390,9 +399,9 @@ If the data is OK , other sources of failure can include:
|
|||
you will need to update `icu4c/test/testdata/structLocale.txt` (otherwise
|
||||
`/tsutil/cldrtest/TestLocaleStructure` may fail).
|
||||
|
||||
## 10 Running ICU4C tests in exhaustive mode.
|
||||
## 10 Running ICU4C tests in exhaustive mode
|
||||
|
||||
Exhautive tests should always be run for a CLDR-ICU integration PR before it is merged.
|
||||
Exhaustive tests should always be run for a CLDR-ICU integration PR before it is merged.
|
||||
Once you have a PR, you can do this for both C and J as part of the pre-merge CI tests
|
||||
by manually running a workflow (the exhaustive tests are not run automatically on every PR).
|
||||
See [Continuous Integration / Exhaustive Tests](../userguide/dev/ci.md#exhaustive-tests).
|
||||
|
@ -400,7 +409,7 @@ See [Continuous Integration / Exhaustive Tests](../userguide/dev/ci.md#exhaustiv
|
|||
The following instructions run the ICU4C exhaustive tests locally (which you may want to do
|
||||
before even committing changes, or which may be necessary to diagnose failures in the
|
||||
CI tests):
|
||||
```
|
||||
```sh
|
||||
cd $ICU4C_DIR/source
|
||||
export INTLTEST_OPTS="-e"
|
||||
export CINTLTST_OPTS="-e"
|
||||
|
@ -415,13 +424,13 @@ appropriate, and repeating from step 4 or 8 as appropriate.
|
|||
## 12 Transfer the ICU4C data to ICU4J
|
||||
|
||||
12a. You need to reconfigure ICU4C to include the unicore data.
|
||||
```
|
||||
```sh
|
||||
cd $ICU4C_DIR/source
|
||||
ICU_DATA_BUILDTOOL_OPTS=--include_uni_core_data ./runConfigureICU <platform>
|
||||
```
|
||||
|
||||
12b. Rebuild the data with the new config setting, then create the ICU4J data jar.
|
||||
```
|
||||
```sh
|
||||
cd $ICU4C_DIR/source/data
|
||||
make clean
|
||||
make -j -l2.5
|
||||
|
@ -429,13 +438,13 @@ make icu4j-data-install
|
|||
```
|
||||
|
||||
12c. Create the test data jar
|
||||
```
|
||||
```sh
|
||||
cd $ICU4C_DIR/source/test/testdata
|
||||
make icu4j-data-install
|
||||
```
|
||||
|
||||
12d. Update the extracted {main, test} data files in the Maven build
|
||||
```
|
||||
```sh
|
||||
cd $ICU4J_ROOT
|
||||
./extract-data-files.sh
|
||||
```
|
||||
|
@ -443,7 +452,7 @@ cd $ICU4J_ROOT
|
|||
## 13 Rebuild ICU4J with new data, run tests
|
||||
|
||||
13a. Run the tests using the maven build
|
||||
```
|
||||
```sh
|
||||
cd $ICU4J_ROOT
|
||||
mvn clean
|
||||
mvn install 2>&1 | tee $NOTES/icu4j-newData-mvnCheck.txt
|
||||
|
@ -451,26 +460,29 @@ mvn install 2>&1 | tee $NOTES/icu4j-newData-mvnCheck.txt
|
|||
|
||||
It is possible to re-run a specific test class or method if necessary when fixing bugs.
|
||||
|
||||
For example (using artifactId, full class name, test all methods):
|
||||
```
|
||||
For example (using `artifactId`, full class name, test all methods):
|
||||
```sh
|
||||
mvn install -pl :core -Dtest=com.ibm.icu.dev.test.util.LocaleBuilderTest
|
||||
```
|
||||
or (example of using module path, class name, one method):
|
||||
```
|
||||
```sh
|
||||
mvn install -pl main/common_tests -Dtest=MeasureUnitTest#TestGreek
|
||||
```
|
||||
|
||||
13b. Optionally run the tests in exhautive mode
|
||||
13b. Optionally run the tests in exhaustive mode
|
||||
|
||||
Optionally run before committing changes, or run to diagnose failures from
|
||||
running exhastive CI tests in the PR using `/azp run CI-Exhaustive`:
|
||||
```
|
||||
Optionally run exhaustive tests locally before committing changes:
|
||||
```sh
|
||||
cd $ICU4J_ROOT
|
||||
mvn install -DICU.exhaustive=10 2>&1 | tee $NOTES/icu4j-newData-mvnCheckEx.txt
|
||||
```
|
||||
|
||||
Exhaustive tests in CI can be triggered by running the "Exhaustive Tests for ICU"
|
||||
action from the GitHub web UI.
|
||||
See [Continuous Integration / Exhaustive Tests](../userguide/dev/ci.md#exhaustive-tests).
|
||||
|
||||
Running a specific test is the same as above:
|
||||
```
|
||||
```sh
|
||||
mvn install -pl :core -DICU.exhaustive=10 -Dtest=ExhaustiveNumberTest
|
||||
```
|
||||
|
||||
|
@ -482,7 +494,7 @@ step 4, as appropriate, until there are no more failures in ICU4C or ICU4J.
|
|||
Note that certain data changes and related test failures may require the
|
||||
rebuilding of other kinds of data and/or code. For example:
|
||||
|
||||
### Updating MeasureUnit code and tests
|
||||
### Updating `MeasureUnit` code and tests
|
||||
|
||||
If you see a failure such as
|
||||
```
|
||||
|
@ -490,7 +502,7 @@ MeasureUnitTest testCLDRUnitAvailability Failure (MeasureUnitTest.java:3410) : U
|
|||
```
|
||||
then you will need to update the C and J library and test code for new measurement
|
||||
units, see the procedure at
|
||||
[Updating MeasureUnit with new CLDR data](https://unicode-org.github.io/icu/processes/release/tasks/updating-measure-unit.html)
|
||||
[Updating `MeasureUnit` with new CLDR data](https://unicode-org.github.io/icu/processes/release/tasks/updating-measure-unit.html)
|
||||
|
||||
### Updating plurals test data
|
||||
|
||||
|
@ -503,12 +515,12 @@ To address these requires updating the LOCALE_SNAPSHOT data in
|
|||
```
|
||||
$ICU4J_ROOT/main/common_tests/src/test/java/com/ibm/icu/dev/test/format/PluralRulesTest.java
|
||||
```
|
||||
by modifying the TestLocales() test there to run `generateLOCALE_SNAPSHOT()` and
|
||||
by modifying the `TestLocales()` test there to run `generateLOCALE_SNAPSHOT()` and
|
||||
then copying in the updated data.
|
||||
|
||||
## 15 Check the ICU file changes and commit
|
||||
|
||||
```
|
||||
```sh
|
||||
cd $ICU4C_DIR/source
|
||||
make clean
|
||||
cd $ICU4J_ROOT
|
||||
|
@ -528,13 +540,13 @@ git push origin ICU-nnnnn-branchname
|
|||
(Only for an official integration from CLDR git repositories)
|
||||
|
||||
16a. Check cldr-staging changes, and commit
|
||||
```
|
||||
```sh
|
||||
cd $CLDR_TMP_DIR
|
||||
git status
|
||||
```
|
||||
|
||||
Then `git add` or `git rm` files as necessary. Record the changes, commit and push.
|
||||
```
|
||||
```sh
|
||||
git status > $NOTES/gitStatusDelta-production-afterAdd.txt
|
||||
git commit -m 'CLDR-nnnnn production data corresponding to CLDR release-nn-stage'
|
||||
git push origin main
|
||||
|
@ -545,8 +557,8 @@ git push origin main
|
|||
|
||||
(There may be other cldr-staging changes unrelated to production data, such as charts
|
||||
or spec; we want to include them in the tag, so pull first, but log to see what the
|
||||
chnages are first)
|
||||
```
|
||||
changes are first)
|
||||
```sh
|
||||
cd $CLDR_TMP_DIR
|
||||
git pull
|
||||
git log
|
||||
|
@ -559,7 +571,7 @@ git push --tags
|
|||
|
||||
We need to tag the main cldr repository. If $CLDR_DIR represents that repository,
|
||||
this is easy:
|
||||
```
|
||||
```sh
|
||||
cd $CLDR_DIR
|
||||
git tag -a "release-nn-stage" -m "CLDR-nnnnn: tag CLDR release-nn-stage"
|
||||
git push --tags
|
||||
|
@ -567,7 +579,7 @@ git push --tags
|
|||
|
||||
However if $CLDR_DIR represents your personal fork or a branch from it, you need to
|
||||
figure out what commit hash you have integrated, and tag that hash in the main repo.
|
||||
```
|
||||
```sh
|
||||
cd $CLDR_DIR
|
||||
git log
|
||||
```
|
||||
|
@ -575,7 +587,7 @@ Note the latest commit hash hhhhhhhh...
|
|||
|
||||
Then switch to the main repo, update it, and tag the appropriate hash (making sure
|
||||
it is in that repo!):
|
||||
```
|
||||
```sh
|
||||
cd $HOME/cldr
|
||||
git pull
|
||||
git log
|
||||
|
@ -583,7 +595,7 @@ git tag -a "release-nn-stage" -m "CLDR-nnnnn: tag CLDR release-nn-stage" hhhhhhh
|
|||
git push --tags
|
||||
```
|
||||
|
||||
## 18 Pubish the cldr tags in github
|
||||
## 18 Publish the cldr tags in github
|
||||
|
||||
You should publish the cldr and cldr-staging tags in github.
|
||||
|
||||
|
|
|
@ -53,6 +53,13 @@ need to be correspondingly updated. See below for more files to be updated and s
|
|||
[icu4c/source/data/misc/icuver.txt](https://github.com/unicode-org/icu/blob/main/icu4c/source/data/misc/icuver.txt)
|
||||
needs to be updated with the correct version number for ICU and its data.
|
||||
|
||||
#### Since ICU 77
|
||||
|
||||
The tool takes the `icuVersion` and `icuDataVersion` from the official ICU APIs
|
||||
(from the icu4j listed as a dependency of the tool, usually the one you just built from the `icu4j` folder).
|
||||
|
||||
If you need different values, you can pass them as command-line parameters (`--icuVersion` and `--icuDataVersion`).
|
||||
|
||||
#### Since ICU 68
|
||||
|
||||
In
|
||||
|
@ -212,8 +219,18 @@ The command requires a version number string that follows the typical Java / Mav
|
|||
|
||||
6. cldr-to-icu build tool has a dependency on the icu4j packages which needs to be updated in [`tools/cldr/cldr-to-icu/pom.xml`](https://github.com/unicode-org/icu/blob/main/tools/cldr/cldr-to-icu/pom.xml). Please update it to match the version that was updated in `icu4j/pom.xml` in the steps above.
|
||||
|
||||
`<version>74.0.1-SNAPSHOT</version>`
|
||||
```xml
|
||||
<version>74.0.1-SNAPSHOT</version>
|
||||
```
|
||||
|
||||
Since ICU 77 this moved to a property:
|
||||
```xml
|
||||
<icu4j.version>77.0.1-SNAPSHOT</icu4j.version>
|
||||
```
|
||||
This can easily be set from the command line:
|
||||
```sh
|
||||
mvn versions:set-property -Dproperty=icu4j.version -DnewVersion=77.1 -f $ICU_DIR/tools/cldr/cldr-to-icu
|
||||
```
|
||||
|
||||
#### Until ICU 73 (inclusive)
|
||||
|
||||
|
|
|
@ -290,6 +290,8 @@ copying that version number into the $ICU_SRC/.bazeliskrc config file.
|
|||
- run Unicode Tools GenerateUnihanCollators & GenerateUnihanCollatorFiles,
|
||||
check CLDR diffs, copy to CLDR, test CLDR, ... as documented there
|
||||
- generate ICU zh collation data
|
||||
WARNING: outdated, don't do this, follow the tools/cldr/cldr-to-icu/README.md file!
|
||||
--- Old text from here:
|
||||
instructions inspired by
|
||||
https://github.com/unicode-org/icu/blob/main/tools/cldr/cldr-to-icu/README.txt and
|
||||
https://github.com/unicode-org/icu/blob/main/icu4c/source/data/cldr-icu-readme.txt
|
||||
|
|
13
tools/cldr/.gitignore
vendored
|
@ -1,9 +1,4 @@
|
|||
# Exclude the Maven local repository but keep the lib directory and the top-level readme, scripts and build config.
|
||||
/lib/**
|
||||
!/lib/README.txt
|
||||
!/lib/install-cldr-jars.sh
|
||||
!/lib/pom.xml
|
||||
|
||||
# Ignore the default Maven target directory.
|
||||
/cldr-to-icu/target
|
||||
|
||||
# Eclipse IDE generated files
|
||||
.classpath
|
||||
.project
|
||||
.settings/
|
||||
|
|
|
@ -3,7 +3,7 @@
|
|||
|
||||
<!-- This build file is intended to become the single mechanism for working with CLDR
|
||||
code and data when building ICU data.
|
||||
|
||||
|
||||
Eventually it will encompass:
|
||||
* Building ICU data from CLDR data via cldr-to-icu.
|
||||
* Building the CLDR libraries needed to support ICU data conversion.
|
||||
|
@ -70,23 +70,4 @@
|
|||
<delete dir="${testDataDir4J}"/>
|
||||
</target>
|
||||
|
||||
<!-- Builds the ICU data, using the Ant build file in the cldr-to-icu directory and passing.
|
||||
through any specified arguments for controlling the build. If you need more control when
|
||||
building ICU data (such as incrementally building parts of the data), you should use the
|
||||
build-icu-data.xml file directly. -->
|
||||
<target name="build-icu-data">
|
||||
<ant dir="cldr-to-icu" antfile="build-icu-data.xml" target="all" inheritAll="true"/>
|
||||
</target>
|
||||
|
||||
<!-- Deletes generated ICU data by invoking "clean" in cldr-to-icu/build-icu-data.xml -->
|
||||
<target name="clean-icu-data">
|
||||
<ant dir="cldr-to-icu" antfile="build-icu-data.xml" target="clean" inheritAll="true"/>
|
||||
</target>
|
||||
|
||||
<!-- Installs the CLDR library dependencies needed for building ICU data. -->
|
||||
<target name="install-cldr-libs" depends="init-args">
|
||||
<exec dir="lib" executable="install-cldr-jars.sh" resolveexecutable="true" failonerror="true">
|
||||
<arg line="${cldrDir}"/>
|
||||
</exec>
|
||||
</target>
|
||||
</project>
|
||||
|
|
|
@ -1,31 +0,0 @@
|
|||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<classpath>
|
||||
<classpathentry kind="src" output="target/classes" path="src/main/java">
|
||||
<attributes>
|
||||
<attribute name="optional" value="true"/>
|
||||
<attribute name="maven.pomderived" value="true"/>
|
||||
</attributes>
|
||||
</classpathentry>
|
||||
<classpathentry excluding="**" kind="src" output="target/classes" path="src/main/resources">
|
||||
<attributes>
|
||||
<attribute name="maven.pomderived" value="true"/>
|
||||
</attributes>
|
||||
</classpathentry>
|
||||
<classpathentry kind="src" output="target/test-classes" path="src/test/java">
|
||||
<attributes>
|
||||
<attribute name="optional" value="true"/>
|
||||
<attribute name="maven.pomderived" value="true"/>
|
||||
</attributes>
|
||||
</classpathentry>
|
||||
<classpathentry kind="con" path="org.eclipse.jdt.launching.JRE_CONTAINER/org.eclipse.jdt.internal.debug.ui.launcher.StandardVMType/JavaSE-1.8">
|
||||
<attributes>
|
||||
<attribute name="maven.pomderived" value="true"/>
|
||||
</attributes>
|
||||
</classpathentry>
|
||||
<classpathentry kind="con" path="org.eclipse.m2e.MAVEN2_CLASSPATH_CONTAINER">
|
||||
<attributes>
|
||||
<attribute name="maven.pomderived" value="true"/>
|
||||
</attributes>
|
||||
</classpathentry>
|
||||
<classpathentry kind="output" path="target/classes"/>
|
||||
</classpath>
|
|
@ -1,23 +0,0 @@
|
|||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<projectDescription>
|
||||
<name>cldr-to-icu</name>
|
||||
<comment></comment>
|
||||
<projects>
|
||||
</projects>
|
||||
<buildSpec>
|
||||
<buildCommand>
|
||||
<name>org.eclipse.jdt.core.javabuilder</name>
|
||||
<arguments>
|
||||
</arguments>
|
||||
</buildCommand>
|
||||
<buildCommand>
|
||||
<name>org.eclipse.m2e.core.maven2Builder</name>
|
||||
<arguments>
|
||||
</arguments>
|
||||
</buildCommand>
|
||||
</buildSpec>
|
||||
<natures>
|
||||
<nature>org.eclipse.jdt.core.javanature</nature>
|
||||
<nature>org.eclipse.m2e.core.maven2Nature</nature>
|
||||
</natures>
|
||||
</projectDescription>
|
|
@ -1,5 +0,0 @@
|
|||
eclipse.preferences.version=1
|
||||
org.eclipse.jdt.core.compiler.codegen.targetPlatform=1.8
|
||||
org.eclipse.jdt.core.compiler.compliance=1.8
|
||||
org.eclipse.jdt.core.compiler.problem.forbiddenReference=warning
|
||||
org.eclipse.jdt.core.compiler.source=1.8
|
|
@ -1,5 +0,0 @@
|
|||
eclipse.preferences.version=1
|
||||
org.eclipse.jdt.ui.ignorelowercasenames=true
|
||||
org.eclipse.jdt.ui.importorder=java;javax;org;com;
|
||||
org.eclipse.jdt.ui.ondemandthreshold=9999
|
||||
org.eclipse.jdt.ui.staticondemandthreshold=9999
|
|
@ -1,4 +0,0 @@
|
|||
activeProfiles=
|
||||
eclipse.preferences.version=1
|
||||
resolveWorkspaceProjects=true
|
||||
version=1
|
|
@ -6,32 +6,56 @@ License & terms of use: http://www.unicode.org/copyright.html
|
|||
# Basic instructions for running the LdmlConverter via Maven
|
||||
|
||||
> Note: While this document provides useful background information about the
|
||||
LdmlConverter, the actual complete process for integrating CLDR data to ICU
|
||||
`LdmlConverter`, the actual complete process for integrating CLDR data to ICU
|
||||
is described in the document `../../../docs/processes/cldr-icu.md` which is
|
||||
best viewed as
|
||||
[CLDR-ICU integration](https://unicode-org.github.io/icu/processes/cldr-icu.html)
|
||||
|
||||
## TLDR
|
||||
|
||||
* Define the `ICU_DIR`, `CLDR_DIR`, and `CLDR_DATA_DIR` environment variables, or pass the equivalent Java properties or command-line options (see below)
|
||||
* Check / update versions
|
||||
* Build ICU4J:
|
||||
```sh
|
||||
cd "$ICU_DIR"
|
||||
mvn clean install -f icu4j -DskipTests -DskipITs
|
||||
```
|
||||
* Build the `cldr-code` library from the `cldr` repo:
|
||||
```sh
|
||||
cd "$CLDR_DIR"
|
||||
mvn clean install -pl :cldr-all,:cldr-code -DskipTests -DskipITs
|
||||
```
|
||||
* Build the conversion tool:
|
||||
```sh
|
||||
cd "$ICU_DIR/tools/cldr/cldr-to-icu/"
|
||||
mvn clean package -DskipTests -DskipITs
|
||||
```
|
||||
* Run the conversion tool:
|
||||
```sh
|
||||
java -jar target/cldr-to-icu-1.0-SNAPSHOT-jar-with-dependencies.jar
|
||||
```
|
||||
|
||||
## Requirements
|
||||
|
||||
* A CLDR release for supplying CLDR data and the CLDR API.
|
||||
* JDK 11+
|
||||
* The Maven build tool
|
||||
* The Ant build tool (using JDK 11+)
|
||||
|
||||
## Important directories
|
||||
|
||||
| Directory | Description |
|
||||
|-----------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
||||
| `TOOLS_ROOT` | Path to root of ICU tools directory, below which are (e.g.) the `cldr/` and `unicodetools/` directories. |
|
||||
| `ICU_DIR` | Path to root of ICU directory, below which are (e.g.) the `icu4c/`, `icu4j/` and `tools/` directories. |
|
||||
| `CLDR_DIR` | Path to root of standard CLDR sources, below which are the `common/` and `tools/` directories. |
|
||||
| `CLDR_DATA_DIR` | The top-level directory for the CLDR production data (typically the "production" directory in the staging repository). Usually generated locally or obtained from: https://github.com/unicode-org/cldr-staging/tree/main/production |
|
||||
|
||||
In Posix systems, it's best to set these as exported shell variables, and any
|
||||
following instructions assume they have been set accordingly:
|
||||
|
||||
```
|
||||
$ export TOOLS_ROOT=/path/to/icu/tools
|
||||
$ export CLDR_DIR=/path/to/cldr
|
||||
$ export CLDR_DATA_DIR=/path/to/cldr-staging/production
|
||||
```sh
|
||||
export TOOLS_ROOT=/path/to/icu/tools
|
||||
export CLDR_DIR=/path/to/cldr
|
||||
export CLDR_DATA_DIR=/path/to/cldr-staging/production
|
||||
```
|
||||
|
||||
Note that you should not attempt to use data from the CLDR project directory
|
||||
|
@ -40,65 +64,132 @@ relies on a pre-processing step, and the CLDR data must come from the separate
|
|||
"staging" repository (i.e. https://github.com/unicode-org/cldr-staging) or be
|
||||
pre-processed locally into a different directory.
|
||||
|
||||
:point_right: **Note**: the 3 folders can also be overridden:
|
||||
|
||||
* with Java properties (e.g. `-DCLDR_DIR=/foo/bar`)
|
||||
* from the command line when invoking the tool (the `icuDir`, `cldrDir`, and `cldrDataDir` options)
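
Before running any of the later steps, it can help to confirm that the three directory variables actually point somewhere. The following is only a sketch: `check_cldr_dirs` is a hypothetical helper, not part of the ICU or CLDR tooling, and it checks environment variables only (it knows nothing about the Java property or command-line overrides).

```sh
# Hypothetical helper (not part of the ICU tooling): report which of the
# directory variables named in this README are unset or not directories.
check_cldr_dirs() {
    rc=0
    for name in ICU_DIR CLDR_DIR CLDR_DATA_DIR; do
        # Indirect lookup of the variable called $name, portable to POSIX sh.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set" >&2
            rc=1
        elif [ ! -d "$value" ]; then
            echo "$name does not point to a directory: $value" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Example use at the top of a script:
#   check_cldr_dirs || exit 1
```

The function reports every problem it finds rather than stopping at the first one, so a single run shows all the variables that still need to be set.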
|
||||
|
||||
## Initial Setup
|
||||
|
||||
This project relies on the Maven build tool for managing dependencies and uses
|
||||
Ant for configuration purposes, so both will need to be installed. On a Debian
|
||||
This project relies on the Maven build tool for managing dependencies, so it will need to be installed. On a Debian
|
||||
based system, this should be as simple as:
|
||||
|
||||
```
|
||||
$ sudo apt-get install maven ant
|
||||
```sh
|
||||
sudo apt-get install maven
|
||||
```
|
||||
|
||||
You must also install an additional CLDR JAR file in the local Maven repository at
|
||||
`$TOOLS_ROOT/cldr/lib` (see the `README.txt` in that directory for more
|
||||
information).
|
||||
## Check / update versions
|
||||
|
||||
### Real versions
|
||||
|
||||
**ICU version (`real_icu_ver`):**
|
||||
```sh
|
||||
mvn help:evaluate -Dexpression=project.version -q -DforceStdout -f $ICU_DIR/icu4j
|
||||
```
|
||||
$ cd "$TOOLS_ROOT/cldr/lib"
|
||||
$ ./install-cldr-jars.sh "$CLDR_DIR"
|
||||
|
||||
**CLDR Library version (`real_cldr_ver`):**
|
||||
```sh
|
||||
mvn help:evaluate -Dexpression=project.version -q -DforceStdout -f $CLDR_DIR/tools
|
||||
```
|
||||
|
||||
### Dependency versions
|
||||
|
||||
**ICU version used by the cldr conversion tool:** \
|
||||
⚠️ **Warning:** Must be the same as `real_icu_ver`
|
||||
```sh
|
||||
mvn help:evaluate -Dexpression=icu4j.version -q -DforceStdout -f $ICU_DIR/tools/cldr/cldr-to-icu
|
||||
```
|
||||
|
||||
**CLDR library version used by the cldr conversion tool:** \
|
||||
⚠️ **Warning:** Must be the same as `real_cldr_ver`
|
||||
```sh
|
||||
mvn help:evaluate -Dexpression=cldr-code.version -q -DforceStdout -f $ICU_DIR/tools/cldr/cldr-to-icu
|
||||
```
|
||||
|
||||
**ICU version used by the cldr library:** \
|
||||
⚠️ **Warning:** Must be the same as `real_icu_ver`
|
||||
```sh
|
||||
mvn help:evaluate -Dexpression=icu4j.version -q -DforceStdout -f $CLDR_DIR/tools
|
||||
```
|
||||
|
||||
### TLDR (Quick update versions without checking)
|
||||
|
||||
```sh
|
||||
# Get real versions
|
||||
real_icu_ver=`mvn help:evaluate -Dexpression=project.version -q -DforceStdout -f $ICU_DIR/icu4j`
|
||||
echo $real_icu_ver
|
||||
real_cldr_ver=`mvn help:evaluate -Dexpression=project.version -q -DforceStdout -f $CLDR_DIR/tools`
|
||||
echo $real_cldr_ver
|
||||
# Set dependency versions
|
||||
mvn versions:set-property -Dproperty=icu4j.version -DnewVersion=$real_icu_ver -f $ICU_DIR/tools/cldr/cldr-to-icu
|
||||
mvn versions:set-property -Dproperty=cldr-code.version -DnewVersion=$real_cldr_ver -f $ICU_DIR/tools/cldr/cldr-to-icu
|
||||
mvn versions:set-property -Dproperty=icu4j.version -DnewVersion=$real_icu_ver -f $CLDR_DIR/tools
|
||||
```
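
If you would rather check the versions than overwrite them, the comparisons described under "Dependency versions" can be scripted. This is a minimal sketch: `mvn_eval` and `check_version` are hypothetical helper names, and the only Maven invocation used is the `help:evaluate` call already shown in this section.

```sh
# Hypothetical helper: read a Maven expression ($1) from the project in directory $2,
# using the same help:evaluate call as the commands above.
mvn_eval() {
    mvn help:evaluate -Dexpression="$1" -q -DforceStdout -f "$2"
}

# Hypothetical helper: compare a dependency version with the expected one.
# Usage: check_version LABEL EXPECTED ACTUAL
check_version() {
    if [ "$2" = "$3" ]; then
        echo "OK: $1 = $3"
    else
        echo "MISMATCH: $1 is $3, expected $2"
        return 1
    fi
}

# Example use, with the directory variables from this README:
#   real_icu_ver=$(mvn_eval project.version "$ICU_DIR/icu4j")
#   check_version "cldr-to-icu icu4j.version" "$real_icu_ver" \
#       "$(mvn_eval icu4j.version "$ICU_DIR/tools/cldr/cldr-to-icu")"
```

Keeping the comparison separate from the Maven call means the check can be exercised without a Maven installation, and a non-zero return makes it easy to stop a script on a mismatch.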
|
||||
|
||||
## Build everything
|
||||
|
||||
You must also build and install an additional CLDR library in the local Maven repository.
|
||||
|
||||
Since that depends on ICU4J, you need to build and install that first.
|
||||
|
||||
Lastly, build the conversion tool.
|
||||
|
||||
```sh
|
||||
# Build ICU4J
|
||||
cd "$ICU_DIR"
|
||||
mvn clean install -f icu4j -DskipTests -DskipITs
|
||||
# Build the CLDR library
|
||||
cd "$CLDR_DIR"
|
||||
mvn clean install -pl :cldr-all,:cldr-code -DskipTests -DskipITs
|
||||
# Build the conversion tool
|
||||
cd "$ICU_DIR/tools/cldr/cldr-to-icu/"
|
||||
mvn clean package -DskipTests -DskipITs
|
||||
```
|
||||
|
||||
## Generating all ICU data and source code
|
||||
|
||||
Run the conversion tool:
|
||||
```sh
|
||||
cd "$ICU_DIR/tools/cldr/cldr-to-icu/"
|
||||
java -jar target/cldr-to-icu-1.0-SNAPSHOT-jar-with-dependencies.jar
|
||||
```
|
||||
$ cd "$TOOLS_ROOT/cldr/cldr-to-icu"
|
||||
$ ant -f build-icu-data.xml
|
||||
```
|
||||
|
||||
You can run it with `--help` for all the options supported.
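
Because the jar only exists after the `mvn clean package` step, a small wrapper can give a clearer error than the one `java` prints for a missing file. This is a sketch under assumptions: `run_cldr_to_icu` and the `CLDR_TO_ICU_JAR` override are hypothetical names introduced here, and the default path is the one used throughout this README, relative to the `cldr-to-icu` directory.

```sh
# Hypothetical wrapper: run the converter jar named in this README, with a
# clearer error if it has not been built yet.
# CLDR_TO_ICU_JAR is an illustrative override, not something the tool reads.
run_cldr_to_icu() {
    jar="${CLDR_TO_ICU_JAR:-target/cldr-to-icu-1.0-SNAPSHOT-jar-with-dependencies.jar}"
    if [ ! -f "$jar" ]; then
        echo "Converter jar not found: $jar (run 'mvn clean package' first)" >&2
        return 1
    fi
    # Pass all arguments through to the tool unchanged.
    java -jar "$jar" "$@"
}

# Example use:
#   cd "$ICU_DIR/tools/cldr/cldr-to-icu/"
#   run_cldr_to_icu --help
```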
|
||||
|
||||
## Other Examples
|
||||
|
||||
* Outputting a subset of the supplemental data into a specified directory:
|
||||
```
|
||||
$ ant -f build-icu-data.xml -DoutDir=/tmp/cldr -DoutputTypes=plurals,dayPeriods -DdontGenCode=true
|
||||
```sh
|
||||
java -jar target/cldr-to-icu-1.0-SNAPSHOT-jar-with-dependencies.jar --outDir=/tmp/cldr --outputTypes=plurals,dayPeriods --dontGenCode=true
|
||||
```
|
||||
Note: Output types can be listed with mixedCase, lower_underscore or UPPER_UNDERSCORE.
|
||||
Pass `--outputTypes=help` to see the full list.
|
||||
|
||||
|
||||
* Outputting only a subset of locale IDs (and all the supplemental data):
|
||||
```
|
||||
$ ant -f build-icu-data.xml -DoutDir=/tmp/cldr -DlocaleIdFilter='(zh|yue).*' -DdontGenCode=true
|
||||
```sh
|
||||
java -jar target/cldr-to-icu-1.0-SNAPSHOT-jar-with-dependencies.jar --outDir=/tmp/cldr --localeIdFilter='(zh|yue).*' --dontGenCode=true
|
||||
```
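
The filter is a regular expression applied to fully expanded locale IDs, and the old build file noted that a value of `en` does not match `en_GB`, which suggests whole-ID matching. To preview roughly what a pattern would select, you can approximate that with `grep`. This is only a sketch: `filter_locale_ids` is a hypothetical helper, and POSIX extended regular expressions are not identical to the Java regular expressions the tool uses, so treat the result as a sanity check for simple patterns.

```sh
# Hypothetical helper: print the locale IDs (one per line on stdin) that a
# filter would select, assuming the pattern must match the entire ID.
# -E enables extended regular expressions, -x requires a whole-line match.
filter_locale_ids() {
    grep -E -x -- "$1"
}

# Preview the filter from the example above against a few sample IDs.
printf '%s\n' zh zh_Hant yue_HK en en_GB | filter_locale_ids '(zh|yue).*'
```

With whole-ID matching, a pattern such as `en` selects only `en`; to include regional variants as well you would write something like `en(_.*)?`.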
|
||||
|
||||
* Overriding the default CLDR version string (which normally matches the CLDR library code):
|
||||
```
|
||||
$ ant -f build-icu-data.xml -DcldrVersion="36.1"
|
||||
```sh
|
||||
java -jar target/cldr-to-icu-1.0-SNAPSHOT-jar-with-dependencies.jar --cldrVersion="36.1"
|
||||
```
|
||||
|
||||
### Using `alt="ascii"` CLDR alternate values from the CLDR XML
|
||||
|
||||
CLDR provides alternate values in addition to the default values for locale data.
|
||||
|
||||
For example, some locales have time formats using U+202F NARROW NO-BREAK SPACE (NNBSP) between the hours/minutes/seconds and the day periods.
|
||||
For example, some locales have time formats using U+202F NARROW NO-BREAK SPACE (`NNBSP`) between the hours/minutes/seconds and the day periods.
|
||||
In order to provide the equivalent time formats that use the ASCII space
|
||||
U+0020 SPACE,
|
||||
the alternate values have the extra attribute `alt="ascii"`.
|
||||
|
||||
Follow these steps to generate ICU data using the ASCII versions of locale data:
|
||||
|
||||
1. First, edit the `build-icu-data.xml` file where it mentions `ALTERNATE VALUES`
|
||||
1. First, edit the `config.xml` file where it mentions `ALTERNATE VALUES`
|
||||
with the correctly annotated source path, target path, and locales list
|
||||
as follows:
|
||||
|
||||
|
@ -150,10 +241,10 @@ as follows:
|
|||
+ source="//ldml/dates/calendars/calendar[@type='generic']/dateTimeFormats/availableFormats/dateFormatItem[@id='hms'][@alt='ascii']"/>
|
||||
```
|
||||
|
||||
1. Then run the generator:
|
||||
1. Then run the generator:
|
||||
|
||||
```
|
||||
$ ant -f build-icu-data.xml <options>
|
||||
```sh
|
||||
java -jar target/cldr-to-icu-1.0-SNAPSHOT-jar-with-dependencies.jar <options>
|
||||
```
|
||||
|
||||
## Config syntax details
|
||||
|
@ -167,15 +258,13 @@ the following excerpt of the DTD schema indicates that there is a default value
|
|||
<!ATTLIST timeFormat type NMTOKEN "standard" >
|
||||
```
|
||||
|
||||
See `build-icu-data.xml` for documentation of all options and additional customization.
|
||||
See `config.xml` for documentation of all options and additional customization.
|
||||
|
||||
## Running unit tests (CURRENTLY FAILING)
|
||||
|
||||
## Running unit tests
|
||||
|
||||
```sh
|
||||
mvn test -DCLDR_DIR="$CLDR_DATA_DIR"
|
||||
```
|
||||
$ mvn test -DCLDR_DIR="$CLDR_DATA_DIR"
|
||||
```
|
||||
|
||||
|
||||
## Importing and running from an IDE
|
||||
|
||||
|
@ -183,3 +272,5 @@ This project should be easy to import into an IDE which supports Maven developme
|
|||
as IntelliJ or Eclipse. It uses a local Maven repository directory for the unpublished
|
||||
CLDR libraries (which are included in the project), but otherwise gets all dependencies
|
||||
via Maven's public repositories.
|
||||
|
||||
But before importing and running it you still need to build ICU4J and the CLDR library (see above).
|
||||
|
|
|
@ -1,11 +0,0 @@
|
|||
*********************************************************************
|
||||
*** © 2019 and later: Unicode, Inc. and others. ***
|
||||
*** License & terms of use: http://www.unicode.org/copyright.html ***
|
||||
*********************************************************************
|
||||
|
||||
The instructions for the LdmlConverter tool (a.k.a. CLDR-to-ICU converter) have
|
||||
moved to README.md in this directory.
|
||||
|
||||
Please read README.md, or better yet, view the rendered form of its Markdown
|
||||
contents online at GitHub
|
||||
(ex: https://github.com/unicode-org/icu/tree/main/tools/cldr/cldr-to-icu)
|
|
@ -1,472 +0,0 @@
|
|||
<!-- © 2019 and later: Unicode, Inc. and others.
|
||||
License & terms of use: http://www.unicode.org/copyright.html -->
|
||||
|
||||
<!--================================================================================
|
||||
Setup:
|
||||
Follow the installation instructions in README.txt in this directory.
|
||||
|
||||
To build ICU data files:
|
||||
1: Determine the CLDR base directory and set the CLDR_DIR environment variable.
|
||||
2: Determine the flags required (see the list of properties below).
|
||||
3: Run: ant -f build-icu-data.xml -D<flag-name>=<flag-value>...
|
||||
================================================================================-->
|
||||
<!-- TODO: Add things like copying of a template directory and deleting previous files
|
||||
(perhaps always generate into a temporary directory and copy back to avoid having
|
||||
inconsistent state when the conversion is cancelled). -->
|
||||
<project name="Convert" default="all" basedir="." xmlns:if="ant:if" xmlns:unless="ant:unless">
|
||||
|
||||
<target name="all" depends="init-args, prepare-jar, clean, convert"/>
|
||||
|
||||
<!-- Initialize the properties which were not already set on the command line. -->
|
||||
<target name="init-args">
|
||||
<property environment="env"/>
|
||||
<!-- Inherit properties from environment variable unless specified. As usual
|
||||
with Ant, this is messier than it should be. All we are saying here is:
|
||||
"Use the property if explicitly set, otherwise use the environment variable."
|
||||
We cannot just set the property to the environment variable, since expansion
|
||||
fails for non existent properties, and you are left with a literal value of
|
||||
"${env.CLDR_DATA_DIR}". -->
|
||||
<condition property="cldrDataDir" value="${env.CLDR_DATA_DIR}">
|
||||
<isset property="env.CLDR_DATA_DIR"/>
|
||||
</condition>
|
||||
<fail unless="cldrDataDir"
|
||||
message="Set the CLDR_DATA_DIR environment variable (or cldrDataDir property) to the CLDR data directory (typically ending in '/production')"/>
|
||||
|
||||
<!-- Ant does not inherit this from the user's environment (and it can matter).
|
||||
This is only needed because we have to "exec" a new Ant task below. -->
|
||||
<condition property="javaHome" value="${env.JAVA_HOME}">
|
||||
<isset property="env.JAVA_HOME"/>
|
||||
</condition>
|
||||
|
||||
<!-- The output directory into which to write the converted ICU data. By default
|
||||
this will overwrite (without deletion) the ICU data files in this ICU release,
|
||||
so it is recommended that for testing, it be set to another value. -->
|
||||
<property name="outDir" value="${basedir}/../../../icu4c/source/data/"/>
|
||||
|
||||
<!-- The output directory into which to write generated C/C++ code. By default
|
||||
this will overwrite (without deletion) the generated C/C++ files in this
|
||||
ICU release, so it is recommended that for testing, it be set to another value. -->
|
||||
<property name="genCCodeDir" value="${basedir}/../../../icu4c/source/"/>
|
||||
|
||||
<!-- The output directory into which to write generated Java code. By default
|
||||
this will overwrite (without deletion) the generated Java files in this
|
||||
ICU release, so it is recommended that for testing, it be set to another value. -->
|
||||
<property name="genJavaCodeDir" value="${basedir}/../../../icu4j/main/core"/>
|
||||
|
||||
<!-- Set this to true to prevent build-icu-data.xml from generating the generated
|
||||
ICU source files -->
|
||||
<property name="dontGenCode" value="false" />
|
||||
|
||||
<!-- The directory in which the additional ICU XML data is stored. -->
|
||||
<property name="specialsDir" value="${basedir}/../../../icu4c/source/data/xml"/>
|
||||
|
||||
<!-- Default value for ICU version (icuver.txt). Update this for each release. -->
|
||||
<property name="icuVersion" value="76.1.0.0"/>
|
||||
|
||||
<!-- Default value for ICU data version (icuver.txt). Update this for each release. -->
|
||||
<property name="icuDataVersion" value="76.1.0.0"/>
|
||||
|
||||
<!-- An override for the CLDR version string (icuver.txt and others). This will be
|
||||
extracted from the CLDR library used for building the data if not set here. -->
|
||||
<property name="cldrVersion" value=""/>
|
||||
|
||||
<!-- The minimum draft status for CLDR data to be used in the conversion. See
|
||||
CldrDraftStatus for more details. -->
|
||||
<property name="minDraftStatus" value="contributed"/>
|
||||
|
||||
<!-- A regular expression to match the locale IDs to be generated (useful for
|
||||
debugging specific regions). This is applied after locale ID specifications
|
||||
have been expanded into full locale IDs, so the value "en" will NOT match
|
||||
"en_GB" or "en_001" etc. -->
|
||||
<property name="localeIdFilter" value=""/>
|
||||
|
||||
<!-- Whether to synthetically generate "pseudo locale" data ("en_XA" and "ar_XB"). -->
|
||||
<property name="includePseudoLocales" value="false"/>
|
||||
|
||||
<!-- Whether to emit a debug report containing some possibly useful information after
|
||||
the conversion has finished. -->
|
||||
<!-- TODO: Currently this isn't hugely useful, so find out what people want. -->
|
||||
<property name="emitReport" value="false"/>
|
||||
|
||||
<!-- List of output "types" to be generated (e.g. "rbnf,plurals,locales"); an empty
|
||||
list means "build everything".
|
||||
|
||||
Note that the grouping of types is based on the legacy converter behaviour and
|
||||
is not always directly associated with an output directory (e.g. "locales"
|
||||
produces locale data for curr/, lang/, main/, region/, unit/, zone/ but NOT
|
||||
coll/, brkitr/ or rbnf/).
|
||||
|
||||
Pass in the value "HELP" (or any invalid value) to see the full list of types. -->
|
||||
<!-- TODO: Find out what common use cases are and use them. -->
|
||||
<property name="outputTypes" value=""/>
|
||||
|
||||
<!-- Override to force the 'clean' task to delete files it cannot determine to be
|
||||
auto-generated by this tool. This is useful if the file header changes since
|
||||
the heading is what's used to recognize auto-generated files. -->
|
||||
<property name="forceDelete" value="false"/>
|
||||
</target>
|
||||
|
||||
<!-- Build a standalone JAR which is called by Ant (and which avoids needing to mess
|
||||
about making Ant know the Maven class-path). -->
|
||||
<target name="prepare-jar" depends="init-args">
|
||||
<exec executable="mvn" searchpath="true" failonerror="true">
|
||||
<arg value="compile"/>
|
||||
</exec>
|
||||
</target>
|
||||
|
||||
<!-- Somewhat hacky wrapper target which invokes the real conversion task.
|
||||
This is done so we can set the environment variable of the new process and
|
||||
effectively overwrite the CLDR_DIR value. If ever the CLDR library doesn't
|
||||
need to use CLDR_DIR at runtime to find the production data, this can all be
|
||||
removed. -->
|
||||
<target name="convert" depends="init-args, prepare-jar">
|
||||
<exec executable="ant" searchpath="true" failonerror="true">
|
||||
<!-- The CLDR library wants CLDR_DIR set, to the data directory. -->
|
||||
<env key="CLDR_DIR" value="${cldrDataDir}" />
|
||||
<!-- Force inherit JAVA_HOME (this can be important). -->
|
||||
<env key="JAVA_HOME" value="${javaHome}" />
|
||||
<!-- Initial Ant command line with all the "interesting" bits in. -->
|
||||
<arg line="-f build-icu-data.xml convert-impl -DcldrDir=${cldrDataDir}"/>
|
||||
<!-- List all properties in the "convert-impl" task (except cldrDir). -->
|
||||
<arg value="-DoutDir=${outDir}"/>
|
||||
<arg value="-DgenCCodeDir=${genCCodeDir}"/>
|
||||
<arg value="-DgenJavaCodeDir=${genJavaCodeDir}"/>
|
||||
<arg value="-DdontGenCode=${dontGenCode}"/>
|
||||
<arg value="-DspecialsDir=${specialsDir}"/>
|
||||
<arg value="-DoutputTypes=${outputTypes}"/>
|
||||
<arg value="-DicuVersion=${icuVersion}"/>
|
||||
<arg value="-DicuDataVersion=${icuDataVersion}"/>
|
||||
<arg value="-DcldrVersion=${cldrVersion}"/>
|
||||
<arg value="-DminDraftStatus=${minDraftStatus}"/>
|
||||
<arg value="-DlocaleIdFilter=${localeIdFilter}"/>
|
||||
<arg value="-DincludePseudoLocales=${includePseudoLocales}"/>
|
||||
<arg value="-DemitReport=${emitReport}"/>
|
||||
</exec>
|
||||
</target>
|
||||
|
||||
<!-- Do the actual CLDR data conversion, based on the command line arguments, built in
|
||||
default properties and the configuration in the "<convert>" element below. -->
|
||||
<target name="convert-impl">
|
||||
<taskdef name="convert" classname="org.unicode.icu.tool.cldrtoicu.ant.ConvertIcuDataTask">
|
||||
<classpath>
|
||||
<pathelement path="target/cldr-to-icu-1.0-SNAPSHOT-jar-with-dependencies.jar"/>
|
||||
</classpath>
|
||||
</taskdef>
|
||||
<taskdef name="generateCode" classname="org.unicode.icu.tool.cldrtoicu.ant.GenerateCodeTask">
|
||||
<classpath>
|
||||
<pathelement path="target/cldr-to-icu-1.0-SNAPSHOT-jar-with-dependencies.jar"/>
|
||||
</classpath>
|
||||
</taskdef>
|
||||
<convert cldrDir="${cldrDir}" outputDir="${outDir}" specialsDir="${specialsDir}"
|
||||
outputTypes="${outputTypes}" cldrVersion="${cldrVersion}"
|
||||
icuVersion="${icuVersion}" icuDataVersion="${icuDataVersion}"
|
||||
minimalDraftStatus="${minDraftStatus}" localeIdFilter="${localeIdFilter}"
|
||||
includePseudoLocales="${includePseudoLocales}" emitReport="${emitReport}">
|
||||
|
||||
<!-- The primary set of locale IDs to be generated by default. The IDs in this list are
|
||||
automatically expanded to include default scripts and all available regions. The
|
||||
rules are:
|
||||
|
||||
1) Base languages are expanded to include default scripts (e.g. "en" -> "en_Latn").
|
||||
2) All region and variant subtags are added for any base language or language+script
|
||||
(e.g. "en" -> "en_GB" or "shi_Latn" -> "shi_Latn_MA").
|
||||
|
||||
If a non-default script is desired it should be listed explicitly (e.g. "sr_Latn").
|
||||
|
||||
Locale IDs with deprecated subtags (which become aliases) must still be listed in
|
||||
full (e.g. "en_RH" or "sr_Latn_YU").
|
||||
-->
|
||||
<localeIds>
|
||||
// A
|
||||
af, agq, ak, am, ar, ars, as, asa, ast, az, az_AZ, az_Cyrl
|
||||
|
||||
// B
|
||||
bas, be, bem, bez, bg, bgc, bho, blo, bm, bn, bo, br, brx, bs, bs_BA, bs_Cyrl
|
||||
|
||||
// C
|
||||
ca, ccp, ce, ceb, cgg, chr, ckb, cs, csw, cv, cy
|
||||
|
||||
// D
|
||||
da, dav, de, dje, doi, dsb, dua, dyo, dz
|
||||
|
||||
// E
|
||||
ebu, ee, el, en, en_NH, en_RH, eo, es, et, eu, ewo
|
||||
|
||||
// F
|
||||
fa, ff, ff_Adlm, ff_CM, ff_GN, ff_MR, ff_SN, fi, fil, fo, fr, fur, fy
|
||||
|
||||
// G
|
||||
ga, gaa, gd, gl, gsw, gu, guz, gv
|
||||
|
||||
// H
|
||||
ha, haw, he, hi, hi_Latn, hr, hsb, hu, hy
|
||||
|
||||
// I
|
||||
ia, id, ie, ig, ii, in, in_ID, is, it, iw, iw_IL
|
||||
|
||||
// J
|
||||
ja, jgo, jmc, jv
|
||||
|
||||
// K
|
||||
ka, kab, kam, kde, kea, kgp, khq, ki, kk, kkj, kl, kln, km, kn, ko, kok, kok_Latn, ks
|
||||
ks_Deva, ks_IN, ksb, ksf, ksh, ku, kw, kxv, kxv_Deva, kxv_IN, kxv_Orya, kxv_Telu, ky
|
||||
|
||||
// L
|
||||
lag, lb, lg, lij, lkt, lmo, ln, lo, lrc, lt, lu, luo, luy, lv
|
||||
|
||||
// M
|
||||
mai, mas, mer, mfe, mg, mgh, mgo, mi, mk, ml, mn, mni, mni_IN, mo, mr, ms
|
||||
mt, mua, my, mzn
|
||||
|
||||
// N
|
||||
naq, nb, nd, nds, ne, nl, nmg, nn, nnh, no, no_NO, no_NO_NY, nqo, nso, nus, nyn
|
||||
|
||||
// O
|
||||
oc, om, or, os
|
||||
|
||||
// P
|
||||
pa, pa_Arab, pa_IN, pa_PK, pcm, pl, prg, ps, pt
|
||||
|
||||
// Q
|
||||
qu
|
||||
|
||||
// R
|
||||
raj, rm, rn, ro, rof, ru, rw, rwk
|
||||
|
||||
// S
|
||||
sa, sah, saq, sat, sat_IN, sbp, sc, sd, sd_Deva, sd_IN, sd_PK, se, seh, ses, sg, sh, sh_BA, sh_CS, sh_YU
|
||||
shi, shi_Latn, shi_MA, si, sk, sl, smn, sn, so, sq, sr, sr_BA, sr_CS, sr_Cyrl_CS, sr_Cyrl_YU, sr_Latn
|
||||
sr_Latn_CS, sr_Latn_YU, sr_ME, sr_RS, sr_XK, sr_YU, st, su, su_ID, sv, sw, syr, szl
|
||||
|
||||
// T
|
||||
ta, te, teo, tg, th, ti, tk, tl, tl_PH, tn, to, tok, tr, tt, twq, tzm
|
||||
|
||||
// U
|
||||
ug, uk, ur, uz, uz_AF, uz_Arab, uz_Cyrl, uz_UZ
|
||||
|
||||
// V
|
||||
vai, vai_LR, vai_Latn, vec, vi, vmw, vun
|
||||
|
||||
// W
|
||||
wae, wo
|
||||
|
||||
// X
|
||||
xh, xnr, xog
|
||||
|
||||
// Y
|
||||
yav, yi, yo, yrl, yue, yue_CN, yue_HK, yue_Hans
|
||||
|
||||
// Z
|
||||
za, zgh, zh, zh_CN, zh_HK, zh_Hant, zh_MO, zh_SG, zh_TW, zu
|
||||
</localeIds>
|
||||
|
||||
<!-- The following elements configure directories in which a subset of the available
|
||||
locale IDs should be generated. Unlike the main <localeIds> element, these
|
||||
filters must specify all locale IDs in full (but since they mostly select base
|
||||
languages, this isn't a big deal).
|
||||
|
||||
As well as allowing some data directories to have a subset of available data (via
|
||||
the <localeIds> element) there are also mechanisms for controlling aliasing and
|
||||
the locale parent relation which allows the sharing of some ICU data in cases
|
||||
where it would otherwise need to be copied. The two mechanisms are:
|
||||
|
||||
1: inheritLanguageSubtag: Used to rewrite the parent of a locale ID from "root" to
|
||||
its language subtag (e.g. "zh_Hant" has a natural parent of "root", but to allow
|
||||
some base language data to be shared it can be made to have a parent of "zh").
|
||||
|
||||
2: forcedAlias: Used to add aliases for specific directories in order to affect the
|
||||
ICU behaviour in special cases.
|
||||
|
||||
Between them these mechanisms are known as "tailorings" of the affected locales. -->
|
||||
<!-- TODO: Explain why these special cases are needed/different. -->
|
||||
|
||||
<!-- Collation data is large, but also more sharable than other data, which is why there
|
||||
are a number of aliases and parent remappings for this directory. -->
|
||||
<directory dir="coll" inheritLanguageSubtag="bs_Cyrl, sr_Latn, zh_Hant">
|
||||
<!-- These aliases are to avoid needing to copy and maintain the same collation data
|
||||
for "zh" and "yue". The maximized version of "yue_Hans" is "yue_Hans_CN" (vs
|
||||
"zh_Hans_CN"), and for "yue" it's "yue_Hant_HK" (vs "zh_Hant_HK"), so the
|
||||
aliases are effectively just rewriting the base language. -->
|
||||
<forcedAlias source="yue" target="zh_Hant"/>
|
||||
<forcedAlias source="yue_Hant" target="zh_Hant"/>
|
||||
<forcedAlias source="yue_CN" target="zh_Hans"/>
|
||||
<forcedAlias source="yue_Hans" target="zh_Hans"/>
|
||||
<forcedAlias source="yue_Hans_CN" target="zh_Hans"/>
|
||||
|
||||
<!-- TODO: Find out and document this properly. -->
|
||||
<forcedAlias source="sr_ME" target="sr_Cyrl_ME"/>
|
||||
|
||||
<localeIds>
|
||||
root,
|
||||
|
||||
// A-B
|
||||
af, am, ars, ar, as, az, be, bg, bn, bo, br, bs_Cyrl, bs,
|
||||
|
||||
// C-F
|
||||
ca, ceb, chr, cs, cy, da, de_AT, de, dsb, dz, ee, el, en,
|
||||
en_US_POSIX, en_US, eo, es, et, fa_AF, fa, ff_Adlm, ff, fil, fi, fo, fr_CA, fr, fy,
|
||||
|
||||
// G-J
|
||||
ga, gl, gu, ha, haw, he, hi, hr, hsb, hu, hy,
|
||||
id_ID, id, ig, in, in_ID, is, it, iw_IL, iw, ja,
|
||||
|
||||
// K-P
|
||||
ka, kk, kl, km, kn, kok, ko, ku, ky, lb, lij, lkt, ln, lo, lt, lv,
|
||||
mk, ml, mn, mo, mr, ms, mt, my, nb, nb_NO, ne, nl, nn, no, no_NO, nso,
|
||||
om, or, pa_IN, pa, pa_Guru, pl, ps, pt,
|
||||
|
||||
// R-T
|
||||
ro, ru, sa, se, sh_BA, sh_CS, sh, sh_YU, si, sk, sl, smn, sq,
|
||||
sr_BA, sr_Cyrl_ME, sr_Latn, sr_ME, sr_RS, sr, st, sv, sw,
|
||||
ta, te, th, tk, tn, to, tr,
|
||||
|
||||
// U-Z
|
||||
ug, uk, ur, uz, vi, wae, wo, xh, yi, yo, yue_CN, yue_Hans_CN, yue_Hans
|
||||
yue_Hant, yue, zh_CN, zh_Hans, zh_Hant, zh_HK, zh_MO, zh_SG, zh_TW, zh, zu
|
||||
</localeIds>
|
||||
</directory>
|
||||
|
||||
<directory dir="rbnf">
|
||||
<!-- It is not at all clear why this is being done. It's certainly not exactly the
|
||||
same as above, since (a) the alias is reversed (b) "zh_Hant" does exist, with
|
||||
different data than "yue", so this alias is not just rewriting the base
|
||||
language. -->
|
||||
<!-- TODO: Find out and document this properly. -->
|
||||
<forcedAlias source="zh_Hant_HK" target="yue"/>
|
||||
|
||||
<localeIds>
|
||||
root,
|
||||
|
||||
// A-E
|
||||
af, ak, am, ars, ar, az, be, bg, bs, ca, ccp, chr, cs, cy,
|
||||
da, de_CH, de, ee, el, en_001, en_IN, en, eo, es_419, es_DO,
|
||||
es_GT, es_HN, es_MX, es_NI, es_PA, es_PR, es_SV, es, es_US, et,
|
||||
|
||||
// F-P
|
||||
fa_AF, fa, ff, fil, fi, fo, fr_BE, fr_CH, fr, ga, he, hi, hr,
|
||||
hu, hy, id, in, is, it, iw, ja, ka, kk, kl, km, ko, ky, lb,
|
||||
lo, lrc, lt, lv, mk, ms, mt, my, nb, ne, nl, nn, no, pl, pt_PT, pt,
|
||||
|
||||
// Q-Z
|
||||
qu, ro, ru, se, sh, sk, sl, sq, sr_Latn, sr, su, sv, sw, ta, th, tr,
|
||||
uk, vec, vi, yue_Hans, yue, zh_Hant_HK, zh_Hant, zh_HK, zh_MO, zh_TW, zh
|
||||
</localeIds>
|
||||
</directory>
|
||||
|
||||
<directory dir="brkitr" inheritLanguageSubtag="zh_Hant">
|
||||
<localeIds>
|
||||
root,
|
||||
de, el, en, en_US_POSIX, en_US, es, fi, fr, it, ja, ko, pt, ru, sv, zh_Hant, zh
|
||||
</localeIds>
|
||||
</directory>
|
||||
|
||||
<!-- GLOBAL ALIASES -->
|
||||
|
||||
<!-- Some spoken languages (e.g. "ars") inherit all their data from a written language
|
||||
(e.g. "ar_SA"). However CLDR doesn't currently support a way to represent that
|
||||
relationship. Unlike deprecated languages for which an alias can be inferred from
|
||||
the "languageAlias" CLDR data, there's no way in CLDR to represent the fact that
|
||||
we want "ars" (a non-deprecated language) to inherit the data of "ar_SA".
|
||||
|
||||
This alias is the first example of potentially many cases where ICU needs to
|
||||
generate an alias in order to effect "sideways inheritance" for spoken languages,
|
||||
and at some stage it should probably be supported properly in the CLDR data. -->
|
||||
<forcedAlias source="ars" target="ar_SA"/>
|
||||
|
||||
<!-- A legacy global alias (note that "no_NO_NY" is not even structurally valid). -->
|
||||
<forcedAlias source="no_NO_NY" target="nn_NO"/>
|
||||
|
||||
<!-- This one is a bit silly, it is just to generate a stub for no_NO, which is
|
||||
not in CLDR. If we do not do this, then including it in localeIds will generate
|
||||
empty no_Latn and no_Latn_NO and then no_NO aliasing to no_Latn_NO. -->
|
||||
<forcedAlias source="no_NO" target="no"/>
|
||||
|
||||
<!-- ALTERNATE VALUES -->
|
||||
|
||||
<!-- The following elements configure alternate values for some special case paths.
|
||||
The target path will only be replaced if both it, and the source path, exist in
|
||||
the CLDR data (paths will not be modified if only the source path exists).
|
||||
|
||||
Since the paths must represent the same semantic type of data, they must be in the
|
||||
same "namespace" (same element names) and must not contain value attributes. Thus
|
||||
they can only differ by distinguishing attributes (either added or modified).
|
||||
|
||||
This feature is typically used to select alternate translations (e.g. short forms)
|
||||
for certain paths. -->
|
||||
<!-- <altPath target="//path/to/value[@attr='foo']"
|
||||
source="//path/to/value[@attr='bar']"
|
||||
locales="xx,yy_ZZ"/> -->
|
||||
</convert>
|
||||
|
||||
<generateCode cldrDir="${cldrDir}" cOutDir="${genCCodeDir}" javaOutDir="${genJavaCodeDir}" unless:true="${dontGenCode}" />
|
||||
</target>
|
||||
|
||||
<target name="clean" depends="init-args, prepare-jar">
|
||||
<taskdef name="outputDirectories" classname="org.unicode.icu.tool.cldrtoicu.ant.CleanOutputDirectoryTask">
|
||||
<classpath>
|
||||
<pathelement path="target/cldr-to-icu-1.0-SNAPSHOT-jar-with-dependencies.jar"/>
|
||||
</classpath>
|
||||
</taskdef>
|
||||
<taskdef name="generateCode" classname="org.unicode.icu.tool.cldrtoicu.ant.GenerateCodeTask">
|
||||
<classpath>
|
||||
<pathelement path="target/cldr-to-icu-1.0-SNAPSHOT-jar-with-dependencies.jar"/>
|
||||
</classpath>
|
||||
</taskdef>
|
||||
|
||||
<!-- If a directory is listed here, then every file in it is assumed to be automatically
|
||||
generated by the conversion tool, unless it is explicitly listed in a <retain> element.
|
||||
The tool then checks every file to determine if it has the expected header present,
|
||||
indicating that it was automatically generated, before deleting it.
|
||||
|
||||
If unexpected files are found, the "clean" task will fail without deleting anything
|
||||
(unless 'forceDelete' is set to override this). Note that even if 'forceDelete' is set,
|
||||
the files listed explicitly below will never be deleted by this process.
|
||||
|
||||
This two-step approach minimizes the risk that the conversion process will ever
|
||||
accidentally delete a manually maintained file.
|
||||
-->
|
||||
<outputDirectories root="${outDir}" forceDelete="${forceDelete}">
|
||||
<dir name="brkitr">
|
||||
<retain path="adaboost"/>
|
||||
<retain path="dictionaries"/>
|
||||
<retain path="lstm"/>
|
||||
<retain path="rules"/>
|
||||
</dir>
|
||||
<dir name="coll">
|
||||
<!-- Legacy files whose file names aren't supported for automatic generation.
|
||||
Simple to maintain manually and unlikely to ever change again. -->
|
||||
<retain path="de__PHONEBOOK.txt"/>
|
||||
<retain path="de_.txt"/>
|
||||
<retain path="es__TRADITIONAL.txt"/>
|
||||
<retain path="es_.txt"/>
|
||||
</dir>
|
||||
<dir name="curr"/>
|
||||
<dir name="lang"/>
|
||||
<dir name="locales"/>
|
||||
<dir name="misc">
|
||||
<!-- Machine generated files produced by different tools.
|
||||
Possibly worth moving into the new LDML conversion tool one day. -->
|
||||
<retain path="currencyNumericCodes.txt"/>
|
||||
<retain path="zoneinfo64.txt"/>
|
||||
<!-- Project file (not ICU data), unlikely to ever be auto-generated. -->
|
||||
<retain path="icudata.rc"/>
|
||||
<!-- Small high-level metadata file, stable and easy to maintain manually. -->
|
||||
<retain path="icustd.txt"/>
|
||||
</dir>
|
||||
<dir name="rbnf"/>
|
||||
<dir name="region"/>
|
||||
<dir name="translit">
|
||||
<!-- Small, easy to maintain, special case top-level files. -->
|
||||
<retain path="en.txt"/>
|
||||
<retain path="el.txt"/>
|
||||
</dir>
|
||||
<dir name="unit"/>
|
||||
<dir name="zone">
|
||||
<!-- Manually edited to support TZ database name compatibility. -->
|
||||
<retain path="tzdbNames.txt"/>
|
||||
</dir>
|
||||
</outputDirectories>
|
||||
|
||||
<generateCode cOutDir="${genCCodeDir}" javaOutDir="${genJavaCodeDir}" action="clean" />
|
||||
</target>
|
||||
</project>
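The two-step "clean" behaviour described in the comment above (classify every file first, delete only afterwards) can be sketched in plain Java. This is a minimal illustration, not the code of `CleanOutputDirectoryTask`: the class name, the `MARKER` string, and the choice to inspect only the first five lines are assumptions made for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TwoStepCleanSketch {
    // Hypothetical header marker; the real tool checks for its own generated-file header.
    static final String MARKER = "// Generated using tools/cldr/cldr-to-icu/";

    /** Returns true if one of the first five lines starts with the marker. */
    static boolean looksGenerated(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file)) {
            return lines.limit(5).anyMatch(line -> line.startsWith(MARKER));
        }
    }

    /**
     * Step 1: classify every regular file in dir without touching anything.
     * Step 2: delete, but only if step 1 found no unexpected files or forceDelete is set.
     * Files named in retained are never deleted. Returns the deleted paths.
     */
    static List<Path> clean(Path dir, Set<String> retained, boolean forceDelete)
            throws IOException {
        List<Path> files;
        try (Stream<Path> listing = Files.list(dir)) {
            files = listing.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        List<Path> toDelete = new ArrayList<>();
        List<Path> unexpected = new ArrayList<>();
        for (Path file : files) {
            if (retained.contains(file.getFileName().toString())) {
                continue; // explicitly retained, skipped even with forceDelete
            }
            if (looksGenerated(file)) {
                toDelete.add(file);
            } else {
                unexpected.add(file);
            }
        }
        if (!unexpected.isEmpty() && !forceDelete) {
            throw new IllegalStateException("Unexpected files, nothing deleted: " + unexpected);
        }
        toDelete.addAll(unexpected); // non-empty here only when forceDelete is set
        for (Path file : toDelete) {
            Files.delete(file);
        }
        return toDelete;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("clean-sketch");
        Files.write(dir.resolve("generated.txt"), List.of(MARKER, "data"));
        Files.write(dir.resolve("icustd.txt"), List.of("manually maintained"));
        System.out.println(clean(dir, Set.of("icustd.txt"), false));
    }
}
```

Collecting the full deletion list before removing anything is what lets the sketch fail without side effects when an unrecognised file turns up.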
|
295
tools/cldr/cldr-to-icu/config.xml
Normal file
|
@ -0,0 +1,295 @@
|
|||
<!-- © 2019 and later: Unicode, Inc. and others.
|
||||
License & terms of use: http://www.unicode.org/copyright.html -->
|
||||
|
||||
<config>
|
||||
<convert>
|
||||
|
||||
<!-- The primary set of locale IDs to be generated by default. The IDs in this list are
|
||||
automatically expanded to include default scripts and all available regions. The
|
||||
rules are:
|
||||
|
||||
1) Base languages are expanded to include default scripts (e.g. "en" -> "en_Latn").
|
||||
2) All region and variant subtags are added for any base language or language+script
|
||||
(e.g. "en" -> "en_GB" or "shi_Latn" -> "shi_Latn_MA").
|
||||
|
||||
If a non-default script is desired it should be listed explicitly (e.g. "sr_Latn").
|
||||
|
||||
Locale IDs with deprecated subtags (which become aliases) must still be listed in
|
||||
full (e.g. "en_RH" or "sr_Latn_YU").
|
||||
-->
|
||||
<localeIds>
|
||||
// A
|
||||
af, agq, ak, am, ar, ars, as, asa, ast, az, az_AZ, az_Cyrl
|
||||
|
||||
// B
|
||||
bas, be, bem, bez, bg, bgc, bho, blo, bm, bn, bo, br, brx, bs, bs_BA, bs_Cyrl
|
||||
|
||||
// C
|
||||
ca, ccp, ce, ceb, cgg, chr, ckb, cs, csw, cv, cy
|
||||
|
||||
// D
|
||||
da, dav, de, dje, doi, dsb, dua, dyo, dz
|
||||
|
||||
// E
|
||||
ebu, ee, el, en, en_NH, en_RH, eo, es, et, eu, ewo
|
||||
|
||||
// F
|
||||
fa, ff, ff_Adlm, ff_CM, ff_GN, ff_MR, ff_SN, fi, fil, fo, fr, fur, fy
|
||||
|
||||
// G
|
||||
ga, gaa, gd, gl, gsw, gu, guz, gv
|
||||
|
||||
// H
|
||||
ha, haw, he, hi, hi_Latn, hr, hsb, hu, hy
|
||||
|
||||
// I
|
||||
ia, id, ie, ig, ii, in, in_ID, is, it, iw, iw_IL
|
||||
|
||||
// J
|
||||
ja, jgo, jmc, jv
|
||||
|
||||
// K
|
||||
ka, kab, kam, kde, kea, kgp, khq, ki, kk, kkj, kl, kln, km, kn, ko, kok, kok_Latn, ks
|
||||
ks_Deva, ks_IN, ksb, ksf, ksh, ku, kw, kxv, kxv_Deva, kxv_IN, kxv_Orya, kxv_Telu, ky
|
||||
|
||||
// L
|
||||
lag, lb, lg, lij, lkt, lmo, ln, lo, lrc, lt, lu, luo, luy, lv
|
||||
|
||||
// M
|
||||
mai, mas, mer, mfe, mg, mgh, mgo, mi, mk, ml, mn, mni, mni_IN, mo, mr, ms
|
||||
mt, mua, my, mzn
|
||||
|
||||
// N
|
||||
naq, nb, nd, nds, ne, nl, nmg, nn, nnh, no, no_NO, no_NO_NY, nqo, nso, nus, nyn
|
||||
|
||||
// O
|
||||
oc, om, or, os
|
||||
|
||||
// P
|
||||
pa, pa_Arab, pa_IN, pa_PK, pcm, pl, prg, ps, pt
|
||||
|
||||
// Q
|
||||
qu
|
||||
|
||||
// R
|
||||
raj, rm, rn, ro, rof, ru, rw, rwk
|
||||
|
||||
// S
|
||||
sa, sah, saq, sat, sat_IN, sbp, sc, sd, sd_Deva, sd_IN, sd_PK, se, seh, ses, sg, sh, sh_BA, sh_CS, sh_YU
|
||||
shi, shi_Latn, shi_MA, si, sk, sl, smn, sn, so, sq, sr, sr_BA, sr_CS, sr_Cyrl_CS, sr_Cyrl_YU, sr_Latn
|
||||
sr_Latn_CS, sr_Latn_YU, sr_ME, sr_RS, sr_XK, sr_YU, st, su, su_ID, sv, sw, syr, szl
|
||||
|
||||
// T
|
||||
ta, te, teo, tg, th, ti, tk, tl, tl_PH, tn, to, tok, tr, tt, twq, tzm
|
||||
|
||||
// U
|
||||
ug, uk, ur, uz, uz_AF, uz_Arab, uz_Cyrl, uz_UZ
|
||||
|
||||
// V
|
||||
vai, vai_LR, vai_Latn, vec, vi, vmw, vun
|
||||
|
||||
// W
|
||||
wae, wo
|
||||
|
||||
// X
|
||||
xh, xnr, xog
|
||||
|
||||
// Y
|
||||
yav, yi, yo, yrl, yue, yue_CN, yue_HK, yue_Hans
|
||||
|
||||
// Z
|
||||
za, zgh, zh, zh_CN, zh_HK, zh_Hant, zh_MO, zh_SG, zh_TW, zu
|
||||
</localeIds>
|
||||
|
||||
<!-- The following elements configure directories in which a subset of the available
|
||||
locale IDs should be generated. Unlike the main <localeIds> element, these
|
||||
filters must specify all locale IDs in full (but since they mostly select base
|
||||
languages, this isn't a big deal).
|
||||
|
||||
As well as allowing some data directories to have a subset of available data (via
|
||||
the <localeIds> element) there are also mechanisms for controlling aliasing and
|
||||
the locale parent relation which allows the sharing of some ICU data in cases
|
||||
where it would otherwise need to be copied. The two mechanisms are:
|
||||
|
||||
1: inheritLanguageSubtag: Used to rewrite the parent of a locale ID from "root" to
|
||||
its language subtag (e.g. "zh_Hant" has a natural parent of "root", but to allow
|
||||
some base language data to be shared it can be made to have a parent of "zh").
|
||||
|
||||
2: forcedAlias: Used to add aliases for specific directories in order to affect the
|
||||
ICU behaviour in special cases.
|
||||
|
||||
Between them these mechanisms are known as "tailorings" of the affected locales. -->
|
||||
<!-- TODO: Explain why these special cases are needed/different. -->
|
||||
|
||||
<!-- Collation data is large, but also more sharable than other data, which is why there
|
||||
are a number of aliases and parent remappings for this directory. -->
|
||||
<directory dir="coll" inheritLanguageSubtag="bs_Cyrl, sr_Latn, zh_Hant">
|
||||
<!-- These aliases are to avoid needing to copy and maintain the same collation data
|
||||
for "zh" and "yue". The maximized version of "yue_Hans" is "yue_Hans_CN" (vs
|
||||
"zh_Hans_CN"), and for "yue" it's "yue_Hant_HK" (vs "zh_Hant_HK"), so the
|
||||
aliases are effectively just rewriting the base language. -->
|
||||
<forcedAlias source="yue" target="zh_Hant"/>
|
||||
<forcedAlias source="yue_Hant" target="zh_Hant"/>
|
||||
<forcedAlias source="yue_CN" target="zh_Hans"/>
|
||||
<forcedAlias source="yue_Hans" target="zh_Hans"/>
|
||||
<forcedAlias source="yue_Hans_CN" target="zh_Hans"/>
|
||||
<!-- TODO: Find out and document this properly. -->
|
||||
<forcedAlias source="sr_ME" target="sr_Cyrl_ME"/>
|
||||
|
||||
<localeIds>
|
||||
root,
|
||||
|
||||
// A-B
|
||||
af, am, ars, ar, as, az, be, bg, bn, bo, br, bs_Cyrl, bs,
|
||||
|
||||
// C-F
|
||||
ca, ceb, chr, cs, cy, da, de_AT, de, dsb, dz, ee, el, en,
|
||||
en_US_POSIX, en_US, eo, es, et, fa_AF, fa, ff_Adlm, ff, fil, fi, fo, fr_CA, fr, fy,
|
||||
|
||||
// G-J
|
||||
ga, gl, gu, ha, haw, he, hi, hr, hsb, hu, hy,
|
||||
id_ID, id, ig, in, in_ID, is, it, iw_IL, iw, ja,
|
||||
|
||||
// K-P
|
||||
ka, kk, kl, km, kn, kok, ko, ku, ky, lb, lij, lkt, ln, lo, lt, lv,
|
||||
mk, ml, mn, mo, mr, ms, mt, my, nb, nb_NO, ne, nl, nn, no, no_NO, nso,
|
||||
om, or, pa_IN, pa, pa_Guru, pl, ps, pt,
|
||||
|
||||
// R-T
|
||||
ro, ru, sa, se, sh_BA, sh_CS, sh, sh_YU, si, sk, sl, smn, sq,
|
||||
sr_BA, sr_Cyrl_ME, sr_Latn, sr_ME, sr_RS, sr, st, sv, sw,
|
||||
ta, te, th, tk, tn, to, tr,
|
||||
|
||||
// U-Z
|
||||
ug, uk, ur, uz, vi, wae, wo, xh, yi, yo, yue_CN, yue_Hans_CN, yue_Hans
|
||||
yue_Hant, yue, zh_CN, zh_Hans, zh_Hant, zh_HK, zh_MO, zh_SG, zh_TW, zh, zu
|
||||
</localeIds>
|
||||
</directory>
|
||||
|
||||
<directory dir="rbnf">
|
||||
<!-- It is not at all clear why this is being done. It's certainly not exactly the
|
||||
same as above, since (a) the alias is reversed (b) "zh_Hant" does exist, with
|
||||
different data than "yue", so this alias is not just rewriting the base
|
||||
language. -->
|
||||
<!-- TODO: Find out and document this properly. -->
|
||||
<forcedAlias source="zh_Hant_HK" target="yue"/>
|
||||
|
||||
<localeIds>
|
||||
root,
|
||||
|
||||
// A-E
|
||||
af, ak, am, ars, ar, az, be, bg, bs, ca, ccp, chr, cs, cy,
|
||||
da, de_CH, de, ee, el, en_001, en_IN, en, eo, es_419, es_DO,
|
||||
es_GT, es_HN, es_MX, es_NI, es_PA, es_PR, es_SV, es, es_US, et,
|
||||
|
||||
// F-P
|
||||
fa_AF, fa, ff, fil, fi, fo, fr_BE, fr_CH, fr, ga, he, hi, hr,
|
||||
hu, hy, id, in, is, it, iw, ja, ka, kk, kl, km, ko, ky, lb,
|
||||
lo, lrc, lt, lv, mk, ms, mt, my, nb, ne, nl, nn, no, pl, pt_PT, pt,
|
||||
|
||||
// Q-Z
|
||||
qu, ro, ru, se, sh, sk, sl, sq, sr_Latn, sr, su, sv, sw, ta, th, tr,
|
||||
uk, vec, vi, yue_Hans, yue, zh_Hant_HK, zh_Hant, zh_HK, zh_MO, zh_TW, zh
|
||||
</localeIds>
|
||||
</directory>
|
||||
|
||||
<directory dir="brkitr" inheritLanguageSubtag="zh_Hant">
|
||||
<localeIds>
|
||||
root,
|
||||
de, el, en, en_US_POSIX, en_US, es, fi, fr, it, ja, ko, pt, ru, sv, zh_Hant, zh
|
||||
</localeIds>
|
||||
</directory>
|
||||
|
||||
<!-- GLOBAL ALIASES -->
|
||||
|
||||
<!-- Some spoken languages (e.g. "ars") inherit all their data from a written language
|
||||
(e.g. "ar_SA"). However CLDR doesn't currently support a way to represent that
|
||||
relationship. Unlike deprecated languages for which an alias can be inferred from
|
||||
the "languageAlias" CLDR data, there's no way in CLDR to represent the fact that
|
||||
we want "ars" (a non-deprecated language) to inherit the data of "ar_SA".
|
||||
|
||||
This alias is the first example of potentially many cases where ICU needs to
|
||||
generate an alias in order to effect "sideways inheritance" for spoken languages,
|
||||
and at some stage it should probably be supported properly in the CLDR data. -->
|
||||
<forcedAlias source="ars" target="ar_SA"/>
|
||||
|
||||
<!-- A legacy global alias (note that "no_NO_NY" is not even structurally valid). -->
|
||||
<forcedAlias source="no_NO_NY" target="nn_NO"/>
|
||||
|
||||
<!-- This one is a bit silly, it is just to generate a stub for no_NO, which is
|
||||
not in CLDR. If we do not do this, then including it in localeIds will generate
|
||||
empty no_Latn and no_Latn_NO and then no_NO aliasing to no_Latn_NO. -->
|
||||
<forcedAlias source="no_NO" target="no"/>
|
||||
|
||||
<!-- ALTERNATE VALUES -->
|
||||
|
||||
<!-- The following elements configure alternate values for some special case paths.
|
||||
The target path will only be replaced if both it, and the source path, exist in
|
||||
the CLDR data (paths will not be modified if only the source path exists).
|
||||
|
||||
Since the paths must represent the same semantic type of data, they must be in the
|
||||
same "namespace" (same element names) and must not contain value attributes. Thus
|
||||
they can only differ by distinguishing attributes (either added or modified).
|
||||
|
||||
This feature is typically used to select alternate translations (e.g. short forms)
|
||||
for certain paths. -->
|
||||
<!-- <altPath target="//path/to/value[@attr='foo']"
|
||||
source="//path/to/value[@attr='bar']"
|
||||
locales="xx,yy_ZZ"/> -->
|
||||
</convert>
|
||||
|
||||
<!-- If a directory is listed here, then every file in it is assumed to be automatically
|
||||
generated by the conversion tool, unless it is explicitly listed in a <retain> element.
|
||||
The tool then checks every file to determine if it has the expected header present,
|
||||
indicating that it was automatically generated, before deleting it.
|
||||
|
||||
If unexpected files are found, the "clean" task will fail without deleting anything
|
||||
(unless 'forceDelete' is set to override this). Note that even if 'forceDelete' is set,
|
||||
the files listed explicitly below will never be deleted by this process.
|
||||
|
||||
This two-step approach minimizes the risk that the conversion process will ever
|
||||
accidentally delete a manually maintained file.
|
||||
-->
|
||||
<outputDirectories root="${outDir}" forceDelete="${forceDelete}">
|
||||
<dir name="brkitr">
|
||||
<retain path="adaboost"/>
|
||||
<retain path="dictionaries"/>
|
||||
<retain path="lstm"/>
|
||||
<retain path="rules"/>
|
||||
</dir>
|
||||
<dir name="coll">
|
||||
<!-- Legacy files whose file names aren't supported for automatic generation.
|
||||
Simple to maintain manually and unlikely to ever change again. -->
|
||||
<retain path="de__PHONEBOOK.txt"/>
|
||||
<retain path="de_.txt"/>
|
||||
<retain path="es__TRADITIONAL.txt"/>
|
||||
<retain path="es_.txt"/>
|
||||
</dir>
|
||||
<dir name="curr"/>
|
||||
<dir name="lang"/>
|
||||
<dir name="locales"/>
|
||||
<dir name="misc">
|
||||
<!-- Machine generated files produced by different tools.
|
||||
Possibly worth moving into the new LDML conversion tool one day. -->
|
||||
<retain path="currencyNumericCodes.txt"/>
|
||||
<retain path="zoneinfo64.txt"/>
|
||||
<!-- Project file (not ICU data), unlikely to ever be auto-generated. -->
|
||||
<retain path="icudata.rc"/>
|
||||
<!-- Small high-level metadata file, stable and easy to maintain manually. -->
|
||||
<retain path="icustd.txt"/>
|
||||
</dir>
|
||||
<dir name="rbnf"/>
|
||||
<dir name="region"/>
|
||||
<dir name="translit">
|
||||
<!-- Small, easy to maintain, special case top-level files. -->
|
||||
<retain path="en.txt"/>
|
||||
<retain path="el.txt"/>
|
||||
</dir>
|
||||
<dir name="unit"/>
|
||||
<dir name="zone">
|
||||
<!-- Manually edited to support TZ database name compatibility. -->
|
||||
<retain path="tzdbNames.txt"/>
|
||||
</dir>
|
||||
</outputDirectories>
|
||||
</config>
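The `inheritLanguageSubtag` tailoring described in config.xml above rewrites a locale's parent from "root" to its language subtag, but only for the IDs listed on the `<directory>` element. A minimal sketch of that rule follows, assuming the natural parent is supplied by the caller; the class and method names are hypothetical and are not taken from the converter.

```java
import java.util.Set;

public class ParentTailoringSketch {
    /**
     * Returns the parent to use for localeId in one output directory.
     * The rewrite applies only when the natural parent is "root", the ID is
     * listed in inheritLanguageSubtag, and the ID has more than one subtag.
     */
    static String tailoredParent(
            String localeId, String naturalParent, Set<String> inheritLanguageSubtag) {
        int sep = localeId.indexOf('_');
        boolean rewrite = naturalParent.equals("root")
                && sep > 0
                && inheritLanguageSubtag.contains(localeId);
        return rewrite ? localeId.substring(0, sep) : naturalParent;
    }

    public static void main(String[] args) {
        // Values copied from <directory dir="coll" inheritLanguageSubtag="...">.
        Set<String> coll = Set.of("bs_Cyrl", "sr_Latn", "zh_Hant");
        System.out.println(tailoredParent("zh_Hant", "root", coll));     // zh
        System.out.println(tailoredParent("zh_Hant", "root", Set.of())); // root
        System.out.println(tailoredParent("sr_Latn_RS", "sr_Latn", coll)); // sr_Latn
    }
}
```

Because the set is per directory, the same locale ID can keep "root" as its parent in one directory and inherit from its base language in another, which matches the per-`<directory>` attribute in the configuration.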
|
||||
|
|
@ -9,71 +9,60 @@
|
|||
<modelVersion>4.0.0</modelVersion>
|
||||
|
||||
<!-- Include the parent POM file to add the CLDR API dependency. -->
|
||||
<parent>
|
||||
<groupId>org.unicode.icu</groupId>
|
||||
<artifactId>cldr-lib</artifactId>
|
||||
<version>1.0</version>
|
||||
<relativePath>../lib</relativePath>
|
||||
</parent>
|
||||
<groupId>org.unicode.icu</groupId>
|
||||
<artifactId>cldr-to-icu</artifactId>
|
||||
<version>1.0-SNAPSHOT</version>
|
||||
|
||||
<properties>
|
||||
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
|
||||
<!-- cldr/tools/ uses JDK 11, and because we depend on it we must
|
||||
use the same version or above -->
|
||||
<maven.compiler.source>11</maven.compiler.source>
|
||||
<maven.compiler.target>11</maven.compiler.target>
|
||||
|
||||
<icu4j.version>76.1</icu4j.version>
|
||||
<cldr-code.version>47.0-SNAPSHOT</cldr-code.version>
|
||||
<guava.version>32.1.1-jre</guava.version>
|
||||
<truth.version>1.4.4</truth.version>
|
||||
<commons-cli.version>1.9.0</commons-cli.version>
|
||||
</properties>
|
||||
|
||||
<!-- No need for <groupId> here (it's defined by the parent POM). -->
|
||||
<artifactId>cldr-to-icu</artifactId>
|
||||
<version>1.0-SNAPSHOT</version>
|
||||
<build>
|
||||
<plugins>
|
||||
<plugin>
|
||||
<groupId>org.apache.maven.plugins</groupId>
|
||||
<artifactId>maven-compiler-plugin</artifactId>
|
||||
<version>3.5.1</version>
|
||||
<version>3.13.0</version>
|
||||
<configuration>
|
||||
<source>8</source>
|
||||
<target>8</target>
|
||||
</configuration>
|
||||
</plugin>
|
||||
<plugin>
|
||||
<groupId>org.codehaus.mojo</groupId>
|
||||
<artifactId>exec-maven-plugin</artifactId>
|
||||
<version>1.6.0</version>
|
||||
<configuration>
|
||||
<mainClass>
|
||||
org.unicode.icu.tool.cldrtoicu.LdmlConverter
|
||||
</mainClass>
|
||||
<systemProperties>
|
||||
<property>
|
||||
<key>ICU_DIR</key>
|
||||
<value>${project.basedir}/../../..</value>
|
||||
</property>
|
||||
</systemProperties>
|
||||
<source>${maven.compiler.source}</source>
|
||||
<target>${maven.compiler.target}</target>
|
||||
</configuration>
|
||||
</plugin>
|
||||
<plugin>
|
||||
<groupId>org.apache.maven.plugins</groupId>
|
||||
<artifactId>maven-assembly-plugin</artifactId>
|
||||
<version>3.1.1</version>
|
||||
<version>3.7.1</version>
|
||||
<executions>
|
||||
<execution>
|
||||
<phase>compile</phase>
|
||||
<goals>
|
||||
<goal>single</goal>
|
||||
</goals>
|
||||
<configuration>
|
||||
<archive>
|
||||
<manifest>
|
||||
<mainClass>
|
||||
org.unicode.icu.tool.cldrtoicu.LdmlConverter
|
||||
</mainClass>
|
||||
</manifest>
|
||||
</archive>
|
||||
<descriptorRefs>
|
||||
<descriptorRef>jar-with-dependencies</descriptorRef>
|
||||
</descriptorRefs>
|
||||
</configuration>
|
||||
</execution>
|
||||
</executions>
|
||||
<configuration>
|
||||
<archive>
|
||||
<manifest>
|
||||
<mainClass>
|
||||
org.unicode.icu.tool.cldrtoicu.Cldr2Icu
|
||||
</mainClass>
|
||||
</manifest>
|
||||
</archive>
|
||||
<descriptorRefs>
|
||||
<descriptorRef>jar-with-dependencies</descriptorRef>
|
||||
</descriptorRefs>
|
||||
</configuration>
|
||||
</plugin>
|
||||
</plugins>
|
||||
</build>
|
||||
|
@ -83,11 +72,16 @@
|
|||
<dependency>
|
||||
<groupId>com.ibm.icu</groupId>
|
||||
<artifactId>icu4j</artifactId>
|
||||
<version>76.1</version>
|
||||
<!-- Note: see https://github.com/unicode-org/icu/packages/1954682/versions
|
||||
for the icu4j.version tag to use. In general we should just use the latest
|
||||
SNAPSHOT for the ICU version that we want, so this should only need updating
|
||||
when the ICU version changes e.g. from 74.0.1, to 74.1, then to 75.0.1 -->
|
||||
<version>${icu4j.version}</version>
|
||||
<!-- Note: see https://github.com/unicode-org/icu/packages/1954682/versions
|
||||
for the icu4j.version tag to use. In general we should just use the latest
|
||||
SNAPSHOT for the ICU version that we want, so this should only need updating
|
||||
when the ICU version changes e.g. from 74.0.1, to 74.1, then to 75.0.1 -->
|
||||
</dependency>
|
||||
<dependency>
|
||||
<groupId>org.unicode.cldr</groupId>
|
||||
<artifactId>cldr-code</artifactId>
|
||||
<version>${cldr-code.version}</version>
|
||||
</dependency>
|
||||
|
||||
<!-- Useful common libraries. Note that some of the code in the CLDR library is also
|
||||
|
@ -96,36 +90,21 @@
|
|||
<dependency>
|
||||
<groupId>com.google.guava</groupId>
|
||||
<artifactId>guava</artifactId>
|
||||
<version>30.0-jre</version>
|
||||
<version>${guava.version}</version>
|
||||
</dependency>
|
||||
|
||||
<!-- Ant: Only used for running the conversion tool, not compiling it. -->
|
||||
<dependency>
|
||||
<groupId>org.apache.ant</groupId>
|
||||
<artifactId>ant</artifactId>
|
||||
<version>1.10.11</version>
|
||||
<groupId>commons-cli</groupId>
|
||||
<artifactId>commons-cli</artifactId>
|
||||
<version>${commons-cli.version}</version>
|
||||
</dependency>
|
||||
|
||||
<!-- Testing only dependencies. -->
|
||||
<dependency>
|
||||
<groupId>com.google.truth</groupId>
|
||||
<artifactId>truth</artifactId>
|
||||
<version>1.0</version>
|
||||
<scope>test</scope>
|
||||
</dependency>
|
||||
<dependency>
|
||||
<groupId>com.google.truth.extensions</groupId>
|
||||
<artifactId>truth-java8-extension</artifactId>
|
||||
<version>1.0</version>
|
||||
<version>${truth.version}</version>
|
||||
<scope>test</scope>
|
||||
</dependency>
|
||||
</dependencies>
|
||||
|
||||
<repositories>
|
||||
<repository>
|
||||
<id>githubcldr</id>
|
||||
<name>GitHub unicode-org/icu Apache Maven Packages</name>
|
||||
<url>https://maven.pkg.github.com/unicode-org/icu</url>
|
||||
</repository>
|
||||
</repositories>
|
||||
</project>
|
||||
|
|
|
@ -0,0 +1,71 @@
|
|||
// © 2024 and later: Unicode, Inc. and others.
|
||||
// License & terms of use: http://www.unicode.org/copyright.html
|
||||
package org.unicode.icu.tool.cldrtoicu;
|
||||
|
||||
import org.unicode.icu.tool.cldrtoicu.ant.CleanOutputDirectoryTask;
|
||||
import org.unicode.icu.tool.cldrtoicu.ant.ConvertIcuDataTask;
|
||||
import org.unicode.icu.tool.cldrtoicu.ant.GenerateCodeTask;
|
||||
|
||||
public class Cldr2Icu {
|
||||
private final Cldr2IcuCliOptions options = new Cldr2IcuCliOptions();
|
||||
|
||||
private void convert() {
|
||||
ConvertIcuDataTask convert = ConvertIcuDataTask.fromXml(options.xmlConfig);
|
||||
|
||||
convert.setCldrDir(options.cldrDataDir);
|
||||
convert.setOutputDir(options.outDir);
|
||||
convert.setSpecialsDir(options.specialsDir);
|
||||
convert.setOutputTypes(options.outputTypes);
|
||||
convert.setIcuVersion(options.icuVersion);
|
||||
convert.setIcuDataVersion(options.icuDataVersion);
|
||||
convert.setCldrVersion(options.cldrVersion);
|
||||
convert.setMinimalDraftStatus(options.minDraftStatus);
|
||||
convert.setLocaleIdFilter(options.localeIdFilter);
|
||||
convert.setIncludePseudoLocales(options.includePseudoLocales);
|
||||
convert.setEmitReport(options.emitReport);
|
||||
|
||||
convert.init();
|
||||
convert.execute();
|
||||
}
|
||||
|
||||
private void generateCode(String action) {
|
||||
GenerateCodeTask generateCode = new GenerateCodeTask();
|
||||
|
||||
generateCode.setCldrDir(options.cldrDataDir);
|
||||
generateCode.setCOutDir(options.genCCodeDir);
|
||||
generateCode.setJavaOutDir(options.genJavaCodeDir);
|
||||
generateCode.setAction(action);
|
||||
|
||||
generateCode.init();
|
||||
generateCode.execute();
|
||||
}
|
||||
|
||||
private void outputDirectories() {
|
||||
CleanOutputDirectoryTask clean = CleanOutputDirectoryTask.fromXml(options.xmlConfig);
|
||||
|
||||
clean.setRoot(options.outDir);
|
||||
clean.setForceDelete(options.forceDelete);
|
||||
|
||||
clean.init();
|
||||
clean.execute();
|
||||
}
|
||||
|
||||
private void clean() {
|
||||
outputDirectories();
|
||||
generateCode("clean");
|
||||
}
|
||||
|
||||
private void generate() {
|
||||
convert();
|
||||
if (!options.dontGenCode) {
|
||||
generateCode(null);
|
||||
}
|
||||
}
|
||||
|
||||
public static void main(String[] args) {
|
||||
Cldr2Icu self = new Cldr2Icu();
|
||||
self.options.processArgs(args);
|
||||
self.clean();
|
||||
self.generate();
|
||||
}
|
||||
}
|
|
@ -0,0 +1,401 @@
|
|||
// © 2024 and later: Unicode, Inc. and others.
|
||||
// License & terms of use: http://www.unicode.org/copyright.html
|
||||
package org.unicode.icu.tool.cldrtoicu;
|
||||
|
||||
import java.io.File;
|
||||
import java.util.Arrays;
|
||||
import java.util.StringJoiner;
|
||||
|
||||
import org.apache.commons.cli.CommandLine;
|
||||
import org.apache.commons.cli.CommandLineParser;
|
||||
import org.apache.commons.cli.DefaultParser;
|
||||
import org.apache.commons.cli.HelpFormatter;
|
||||
import org.apache.commons.cli.Option;
|
||||
import org.apache.commons.cli.Options;
|
||||
import org.unicode.icu.tool.cldrtoicu.LdmlConverter.OutputType;
|
||||
|
||||
import com.ibm.icu.util.VersionInfo;
|
||||
|
||||
class Cldr2IcuCliOptions {
|
||||
private static final String HELP = "help";
|
||||
private static final String HELP_DESC = "this text";
|
||||
|
||||
private static final String ICU_DIR = "icuDir";
|
||||
private static final String ICU_DIR_DESC = "Path to the top level ICU directory"
|
||||
+ " (containing `.git`, `icu4c`, `icu4j`, `tools` directories)";
|
||||
private static final String ICU_DIR_DEFAULT = "${environ.ICU_DIR}";
|
||||
String icuDir;
|
||||
|
||||
private static final String CLDR_DIR = "cldrDir";
|
||||
private static final String CLDR_DIR_DESC = "This is the path to the root of the standard CLDR sources"
|
||||
+ " (containing `common` and `tools` directories).";
|
||||
private static final String CLDR_DIR_DEFAULT = "${environ.CLDR_DIR}";
|
||||
String cldrDir;
|
||||
|
||||
private static final String CLDR_DATA_DIR = "cldrDataDir";
|
||||
private static final String CLDR_DATA_DIR_DESC = "The top-level directory for the CLDR production data"
|
||||
+ " (typically the `production` directory in the staging repository)."
|
||||
+ " Usually generated locally or obtained from https://github.com/unicode-org/cldr-staging/tree/main/production";
|
||||
private static final String CLDR_DATA_DIR_DEFAULT = "${environ.CLDR_DATA_DIR}";
|
||||
String cldrDataDir;
|
||||
|
||||
private static final String OUT_DIR = "outDir";
|
||||
private static final String OUT_DIR_DESC = "The output directory into which to write the converted ICU data. By default"
|
||||
+ " this will overwrite (without deletion) the ICU data files in this ICU release,"
|
||||
+ " so it is recommended that for testing, it be set to another value.";
|
||||
private static final String OUT_DIR_DEFAULT = "${icuDir}/icu4c/source/data";
|
||||
String outDir;
|
||||
|
||||
private static final String GEN_C_CODE_DIR = "genCCodeDir";
|
||||
private static final String GEN_C_CODE_DIR_DESC = "The output directory into which to write generated C/C++ code."
|
||||
+ " By default this will overwrite (without deletion) the generated C/C++ files in this ICU release,"
|
||||
+ " so it is recommended that for testing, it be set to another value.";
|
||||
private static final String GEN_C_CODE_DIR_DEFAULT = "${icuDir}/icu4c/source";
|
||||
String genCCodeDir;
|
||||
|
||||
private static final String GEN_JAVA_CODE_DIR = "genJavaCodeDir";
|
||||
private static final String GEN_JAVA_CODE_DIR_DESC = "The output directory into which to write generated Java code."
|
||||
+ " By default this will overwrite (without deletion) the generated Java files in this ICU release,"
|
||||
+ " so it is recommended that for testing, it be set to another value.";
|
||||
private static final String GEN_JAVA_CODE_DIR_DEFAULT = "${icuDir}/icu4j/main/core";
|
||||
String genJavaCodeDir;
|
||||
|
||||
private static final String DONT_GEN_CODE = "dontGenCode";
|
||||
private static final String DONT_GEN_CODE_DESC = "Set this to true to prevent the generation of"
|
||||
+ " ICU source files";
|
||||
private static final String DONT_GEN_CODE_DEFAULT = "false";
|
||||
boolean dontGenCode;
|
||||
|
||||
private static final String SPECIALS_DIR = "specialsDir";
|
||||
private static final String SPECIALS_DIR_DESC = "The directory in which the additional ICU XML data is stored.";
|
||||
private static final String SPECIALS_DIR_DEFAULT = "${icuDir}/icu4c/source/data/xml";
|
||||
String specialsDir;
|
||||
|
||||
private static final String ICU_VERSION = "icuVersion";
|
||||
private static final String ICU_VERSION_DESC = "Default value for ICU version (`icuver.txt`)."
|
||||
+ " Update this for each release.";
|
||||
private static final String ICU_VERSION_DEFAULT = VersionInfo.ICU_VERSION.toString();
|
||||
String icuVersion;
|
||||
|
||||
private static final String ICU_DATA_VERSION = "icuDataVersion";
|
||||
private static final String ICU_DATA_VERSION_DESC = "Default value for ICU data version (`icuver.txt`)."
|
||||
+ " Update this for each release.";
|
||||
private static final String ICU_DATA_VERSION_DEFAULT = VersionInfo.ICU_DATA_VERSION.toString();
|
||||
String icuDataVersion;
|
||||
|
||||
private static final String CLDR_VERSION = "cldrVersion";
|
||||
private static final String CLDR_VERSION_DESC = "An override for the CLDR version string (`icuver.txt` and others)."
|
||||
+ " This will be extracted from the CLDR library used for building the data if not set here.";
|
||||
private static final String CLDR_VERSION_DEFAULT = "";
|
||||
String cldrVersion;
|
||||
|
||||
private static final String MIN_DRAFT_STATUS = "minDraftStatus";
|
||||
private static final String MIN_DRAFT_STATUS_DESC = "The minimum draft status for CLDR data to be used in the conversion."
|
||||
+ " See CldrDraftStatus for more details.";
|
||||
private static final String MIN_DRAFT_STATUS_DEFAULT = "CONTRIBUTED";
|
||||
String minDraftStatus;
|
||||
|
||||
private static final String LOCALE_ID_FILTER = "localeIdFilter";
|
||||
private static final String LOCALE_ID_FILTER_DESC = "A regular expression to match the locale IDs to be generated"
|
||||
+ " (useful for debugging specific regions). This is applied after locale ID specifications"
|
||||
+ " have been expanded into full locale IDs, so the value `en` will NOT match `en_GB` or `en_001` etc.";
|
||||
private static final String LOCALE_ID_FILTER_DEFAULT = "";
|
||||
String localeIdFilter;
|
||||
|
||||
private static final String INCLUDE_PSEUDO_LOCALES = "includePseudoLocales";
|
||||
private static final String INCLUDE_PSEUDO_LOCALES_DESC = "Whether to synthetically generate \"pseudo locale\" data"
|
||||
+ " (`en_XA` and `ar_XB`).";
|
||||
private static final String INCLUDE_PSEUDO_LOCALES_DEFAULT = "false";
|
||||
boolean includePseudoLocales;
|
||||
|
||||
private static final String EMIT_REPORT = "emitReport";
|
||||
private static final String EMIT_REPORT_DESC = "Whether to emit a debug report containing some possibly"
|
||||
+ " useful information after the conversion has finished.";
|
||||
private static final String EMIT_REPORT_DEFAULT = "false";
|
||||
boolean emitReport;
|
||||
|
||||
private static final String OUTPUT_TYPES = "outputTypes";
|
||||
private static final String OUTPUT_TYPES_DESC = "List of output \"types\" to be generated (e.g. `rbnf,plurals,locales`);"
|
||||
+ " an empty list means \"build everything\".\n"
|
||||
+ "Note that the grouping of types is based on the legacy converter behaviour and"
|
||||
+ " is not always directly associated with an output directory (e.g. \"locales\" produces locale data"
|
||||
+ " for `curr/`, `lang/`, `main/`, `region/`, `unit/`, `zone/` but NOT `coll/`, `brkitr/` or `rbnf/`).\n"
|
||||
// It would be nice to initialize this from OutputType, but to do that we need to read an XML file,
|
||||
// so we need to know what the cldrDir folder is. But we only know that AFTER we parse the command line.
|
||||
+ "Use outputTypesList to get a list of currently known values.";
|
||||
private static final String OUTPUT_TYPES_DEFAULT = "";
|
||||
String outputTypes;
|
||||
|
||||
private static final String OUTPUT_TYPES_LIST = "outputTypesList";
|
||||
private static final String OUTPUT_TYPES_LIST_DESC = "Show the complete list of known output types and exit.";
|
||||
private static final String OUTPUT_TYPES_LIST_DEFAULT = "false";
|
||||
|
||||
private static final String FORCE_DELETE = "forceDelete";
|
||||
private static final String FORCE_DELETE_DESC = "Override to force the 'clean' task to delete files it cannot"
|
||||
+ " determine to be auto-generated by this tool. This is useful if the file header changes since"
|
||||
+ " the heading is what's used to recognize auto-generated files.";
|
||||
private static final String FORCE_DELETE_DEFAULT = "false";
|
||||
boolean forceDelete;
|
||||
|
||||
private static final String XML_CONFIG = "xmlConfig";
|
||||
private static final String XML_CONFIG_DESC = "The XML configuration file describing the conversion and the output directories to clean.";
|
||||
private static final String XML_CONFIG_DEFAULT = "${icuDir}/tools/cldr/cldr-to-icu/config.xml";
|
||||
String xmlConfig;
|
||||
|
||||
// These must be kept in sync with getOptions().
|
||||
private static final Options options = new Options()
|
||||
.addOption(Option.builder()
|
||||
.longOpt(HELP)
|
||||
.desc(HELP_DESC)
|
||||
.build())
|
||||
.addOption(Option.builder()
|
||||
.longOpt(ICU_DIR)
|
||||
.hasArg()
|
||||
.argName("path")
|
||||
.desc(descWithDefault(ICU_DIR_DESC, ICU_DIR_DEFAULT))
|
||||
.build())
|
||||
.addOption(Option.builder()
|
||||
.longOpt(CLDR_DIR)
|
||||
.hasArg()
|
||||
.argName("path")
|
||||
.desc(descWithDefault(CLDR_DIR_DESC, CLDR_DIR_DEFAULT))
|
||||
.build())
|
||||
.addOption(Option.builder()
|
||||
.longOpt(CLDR_DATA_DIR)
|
||||
.hasArg()
|
||||
.argName("path")
|
||||
.desc(descWithDefault(CLDR_DATA_DIR_DESC, CLDR_DATA_DIR_DEFAULT))
|
||||
.build())
|
||||
.addOption(Option.builder()
|
||||
.longOpt(OUT_DIR)
|
||||
.hasArg()
|
||||
.argName("path")
|
||||
.desc(descWithDefault(OUT_DIR_DESC, OUT_DIR_DEFAULT))
|
||||
.build())
|
||||
.addOption(Option.builder()
|
||||
.longOpt(GEN_C_CODE_DIR)
|
||||
.hasArg()
|
||||
.argName("path")
|
||||
.desc(descWithDefault(GEN_C_CODE_DIR_DESC, GEN_C_CODE_DIR_DEFAULT))
|
||||
.build())
|
||||
.addOption(Option.builder()
|
||||
.longOpt(GEN_JAVA_CODE_DIR)
|
||||
.hasArg()
|
||||
.argName("path")
|
||||
.desc(descWithDefault(GEN_JAVA_CODE_DIR_DESC, GEN_JAVA_CODE_DIR_DEFAULT))
|
||||
.build())
|
||||
.addOption(Option.builder()
|
||||
.longOpt(DONT_GEN_CODE)
|
||||
.desc(descWithDefault(DONT_GEN_CODE_DESC, DONT_GEN_CODE_DEFAULT))
|
||||
.build())
|
||||
.addOption(Option.builder()
|
||||
.longOpt(SPECIALS_DIR)
|
||||
.hasArg()
|
||||
.argName("path")
|
||||
.desc(descWithDefault(SPECIALS_DIR_DESC, SPECIALS_DIR_DEFAULT))
|
||||
.build())
|
||||
.addOption(Option.builder()
|
||||
.longOpt(OUTPUT_TYPES)
|
||||
.hasArg()
|
||||
.argName("out_types")
|
||||
.desc(descWithDefault(OUTPUT_TYPES_DESC, OUTPUT_TYPES_DEFAULT))
|
||||
.build())
|
||||
.addOption(Option.builder()
|
||||
.longOpt(OUTPUT_TYPES_LIST)
|
||||
.desc(descWithDefault(OUTPUT_TYPES_LIST_DESC, OUTPUT_TYPES_LIST_DEFAULT))
|
||||
.build())
|
||||
.addOption(Option.builder()
|
||||
.longOpt(ICU_VERSION)
|
||||
.hasArg()
|
||||
.argName("version")
|
||||
.desc(descWithDefault(ICU_VERSION_DESC, ICU_VERSION_DEFAULT))
|
||||
.build())
|
||||
.addOption(Option.builder()
|
||||
.longOpt(ICU_DATA_VERSION)
|
||||
.hasArg()
|
||||
.argName("version")
|
||||
.desc(descWithDefault(ICU_DATA_VERSION_DESC, ICU_DATA_VERSION_DEFAULT))
|
||||
.build())
|
||||
.addOption(Option.builder()
|
||||
.longOpt(CLDR_VERSION)
|
||||
.hasArg()
|
||||
.argName("version")
|
||||
.desc(descWithDefault(CLDR_VERSION_DESC, CLDR_VERSION_DEFAULT))
|
||||
.build())
|
||||
.addOption(Option.builder()
|
||||
.longOpt(MIN_DRAFT_STATUS)
|
||||
.hasArg()
|
||||
.argName("draft_status")
|
||||
.desc(descWithDefault(MIN_DRAFT_STATUS_DESC, MIN_DRAFT_STATUS_DEFAULT))
|
||||
.build())
|
||||
.addOption(Option.builder()
|
||||
.longOpt(LOCALE_ID_FILTER)
|
||||
.hasArg()
|
||||
.argName("locale_list")
|
||||
.desc(descWithDefault(LOCALE_ID_FILTER_DESC, LOCALE_ID_FILTER_DEFAULT))
|
||||
.build())
|
||||
.addOption(Option.builder()
|
||||
.longOpt(INCLUDE_PSEUDO_LOCALES)
|
||||
.desc(descWithDefault(INCLUDE_PSEUDO_LOCALES_DESC, INCLUDE_PSEUDO_LOCALES_DEFAULT))
|
||||
.build())
|
||||
.addOption(Option.builder()
|
||||
.longOpt(EMIT_REPORT)
|
||||
.desc(descWithDefault(EMIT_REPORT_DESC, EMIT_REPORT_DEFAULT))
|
||||
.build())
|
||||
.addOption(Option.builder()
|
||||
.longOpt(FORCE_DELETE)
|
||||
.desc(descWithDefault(FORCE_DELETE_DESC, FORCE_DELETE_DEFAULT))
|
||||
.build())
|
||||
.addOption(Option.builder()
|
||||
.longOpt(XML_CONFIG)
|
||||
.hasArg()
|
||||
.argName("path")
|
||||
.desc(descWithDefault(XML_CONFIG_DESC, XML_CONFIG_DEFAULT))
|
||||
.build())
|
||||
;
|
||||
|
||||
void processArgs(String[] args) {
|
||||
CommandLine cli = null;
|
||||
try{
|
||||
CommandLineParser parser = new DefaultParser();
|
||||
cli = parser.parse(options, args);
|
||||
} catch (Exception e){
|
||||
cli = CommandLine.builder().build();
|
||||
showUsageAndExit();
|
||||
}
|
||||
if (cli.hasOption(HELP)) {
|
||||
showUsageAndExit();
|
||||
}
|
||||
|
||||
icuDir = cli.getOptionValue(ICU_DIR, icuDir);
|
||||
cldrDir = cli.getOptionValue(CLDR_DIR, cldrDir);
|
||||
cldrDataDir = cli.getOptionValue(CLDR_DATA_DIR, cldrDataDir);
|
||||
|
||||
outDir = cli.getOptionValue(OUT_DIR, expandFolders(OUT_DIR_DEFAULT));
|
||||
genCCodeDir = cli.getOptionValue(GEN_C_CODE_DIR, expandFolders(GEN_C_CODE_DIR_DEFAULT));
|
||||
genJavaCodeDir = cli.getOptionValue(GEN_JAVA_CODE_DIR, expandFolders(GEN_JAVA_CODE_DIR_DEFAULT));
|
||||
dontGenCode = cli.hasOption(DONT_GEN_CODE);
|
||||
specialsDir = cli.getOptionValue(SPECIALS_DIR, expandFolders(SPECIALS_DIR_DEFAULT));
|
||||
outputTypes = cli.getOptionValue(OUTPUT_TYPES, ""); // empty means all
|
||||
icuVersion = cli.getOptionValue(ICU_VERSION, ICU_VERSION_DEFAULT);
|
||||
icuDataVersion = cli.getOptionValue(ICU_DATA_VERSION, ICU_DATA_VERSION_DEFAULT);
|
||||
cldrVersion = cli.getOptionValue(CLDR_VERSION, CLDR_VERSION_DEFAULT);
|
||||
minDraftStatus = cli.getOptionValue(MIN_DRAFT_STATUS, MIN_DRAFT_STATUS_DEFAULT);
|
||||
localeIdFilter = cli.getOptionValue(LOCALE_ID_FILTER, LOCALE_ID_FILTER_DEFAULT);
|
||||
includePseudoLocales = cli.hasOption(INCLUDE_PSEUDO_LOCALES);
|
||||
emitReport = cli.hasOption(EMIT_REPORT);
|
||||
forceDelete = cli.hasOption(FORCE_DELETE);
|
||||
xmlConfig = cli.getOptionValue(XML_CONFIG, expandFolders(XML_CONFIG_DEFAULT));
|
||||
|
||||
if (cli.hasOption(OUTPUT_TYPES_LIST)) {
|
||||
OutputType[] outTypesToSort = OutputType.values();
|
||||
Arrays.sort(outTypesToSort, (o1, o2) -> o1.name().compareTo(o2.name()));
|
||||
StringJoiner strOutType = new StringJoiner(", ");
|
||||
for (OutputType ot : outTypesToSort) {
|
||||
strOutType.add(ot.name());
|
||||
}
|
||||
System.out.println("Known output types: " + strOutType);
|
||||
System.exit(2);
|
||||
}
|
||||
}
|
||||
|
||||
private static String descWithDefault(String description, String defaultValue) {
|
||||
if (defaultValue != null) {
|
||||
return description + "\nDefaults to: \"" + defaultValue + "\"";
|
||||
} else {
|
||||
return description;
|
||||
}
|
||||
}
|
||||
|
||||
private void showUsageAndExit() {
|
||||
String thisClassName = Cldr2Icu.class.getCanonicalName();
|
||||
HelpFormatter formatter = new HelpFormatter();
|
||||
formatter.printHelp(
|
||||
/*width*/ 120,
|
||||
/*cmdLineSyntax*/ thisClassName + " [OPTIONS]\n",
|
||||
/*header*/ "\n"
|
||||
+ "This program is used to convert CLDR xml files to ICU ResourceBundle txt files.\n"
|
||||
+ "Options:",
|
||||
options,
|
||||
/*footer*/ "\nExample: " + thisClassName + " --outDir /tmp/debug --localeIdFilter=fr");
|
||||
System.exit(-1);
|
||||
}
|
||||
|
||||
Cldr2IcuCliOptions() {
|
||||
// This will initialize icuDir, cldrDir, and cldrDataDir from environment variables
|
||||
validateEnvironment();
|
||||
}
|
||||
|
||||
String expandFolders(String str) {
|
||||
return str
|
||||
.replace("${icuDir}", icuDir)
|
||||
.replace("${cldrDir}", cldrDir)
|
||||
.replace("${cldrDataDir}", cldrDataDir);
|
||||
}
|
||||
|
||||
// For certain things we want to check both the environment, and Java properties
|
||||
// (passed with -Dkey=value)
|
||||
// The property takes precedence.
|
||||
private static String getEnvironOrProperty(String key) {
|
||||
String result = System.getProperty(key);
|
||||
if (result == null) {
|
||||
result = System.getenv(key);
|
||||
}
|
||||
return result;
|
||||
}
|
||||
|
||||
// Check that the environment variables point to the proper `icu` / `cldr` / `cldr-staging` folders
|
||||
private void validateEnvironment() {
|
||||
icuDir = getEnvironOrProperty("ICU_DIR");
|
||||
cldrDir = getEnvironOrProperty("CLDR_DIR");
|
||||
cldrDataDir = getEnvironOrProperty("CLDR_DATA_DIR");
|
||||
|
||||
String icuMessage = "Set the ICU_DIR environment variable to the top level ICU directory (containing `.git`, `icu4c`, `icu4j`, `tools` directories)";
|
||||
String cldrMessage = "Set the CLDR_DIR environment variable to the top level CLDR directory (containing `common` and `tools` directories)";
|
||||
String cldrDataMessage = "Set the CLDR_DATA_DIR environment variable to the top level CLDR production data directory (typically the `production` directory in the staging repository)\n"
|
||||
+ "Usually generated locally or obtained from: https://github.com/unicode-org/cldr-staging/tree/main/production";
|
||||
if (icuDir == null) {
|
||||
System.err.println(icuMessage);
|
||||
System.exit(1);
|
||||
}
|
||||
if (cldrDir == null) {
|
||||
System.err.println(cldrMessage);
|
||||
System.exit(1);
|
||||
}
|
||||
if (cldrDataDir == null) {
|
||||
System.err.println(cldrDataMessage);
|
||||
System.exit(1);
|
||||
}
|
||||
|
||||
if (!new File(icuDir).isDirectory()
|
||||
|| ! new File(icuDir, "icu4c").isDirectory()
|
||||
|| ! new File(icuDir, "icu4j").isDirectory()
|
||||
|| ! new File(icuDir, "tools/cldr/cldr-to-icu").isDirectory()
|
||||
|| ! new File(icuDir, "tools/cldr/cldr-to-icu/pom.xml").isFile()) {
|
||||
System.err.println("The `" + icuDir + "` directory does not look like a valid icu root.");
|
||||
System.err.println(icuMessage);
|
||||
System.exit(1);
|
||||
}
|
||||
if (!new File(cldrDir).isDirectory()
|
||||
|| ! new File(cldrDir, "tools/cldr-code").isDirectory()
|
||||
|| ! new File(cldrDir, "tools/cldr-code/pom.xml").isFile()) {
|
||||
System.err.println("The `" + cldrDir + "` directory does not look like a valid cldr root.");
|
||||
System.err.println(cldrMessage);
|
||||
System.exit(1);
|
||||
}
|
||||
if (!new File(cldrDataDir).isDirectory()
|
||||
|| ! new File(cldrDataDir, "common/supplemental").isDirectory()
|
||||
|| ! new File(cldrDataDir, "common/main").isDirectory()
|
||||
|| ! new File(cldrDataDir, "common/main/en.xml").isFile()) {
|
||||
System.err.println("The `" + cldrDataDir + "` directory does not look like a valid cldr-staging/ root.");
|
||||
System.err.println(cldrDataMessage);
|
||||
System.exit(1);
|
||||
}
|
||||
|
||||
// The cldr-code library checks for CLDR_DIR in the Java properties.
|
||||
// So if we got cldrDir from the environment or the command line, we update the property.
|
||||
System.setProperty("CLDR_DIR", cldrDir);
|
||||
}
|
||||
}
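The lookup order and placeholder expansion used by `Cldr2IcuCliOptions` above can be tried in isolation. The following is a minimal sketch, not the tool's code: the class name `FolderPlaceholderSketch`, the injected `env` function (standing in for `System.getenv`, so the example does not depend on the real environment) and all paths are assumptions made for illustration.

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-alone sketch; not part of the cldr-to-icu tool.
final class FolderPlaceholderSketch {

    // A Java property (-Dkey=value) takes precedence over the environment.
    // The environment is passed in as a function so the sketch is testable.
    static String propertyOrEnv(String key, Function<String, String> env) {
        String result = System.getProperty(key);
        return result != null ? result : env.apply(key);
    }

    // Literal replacement of the three placeholders that appear in the defaults.
    static String expandFolders(String template, String icuDir, String cldrDir, String cldrDataDir) {
        return template
                .replace("${icuDir}", icuDir)
                .replace("${cldrDir}", cldrDir)
                .replace("${cldrDataDir}", cldrDataDir);
    }

    public static void main(String[] args) {
        // Illustrative values only.
        Map<String, String> fakeEnv = Map.of("SKETCH_ICU_DIR", "/env/icu");

        // No property set yet: the value comes from the (fake) environment.
        System.out.println(propertyOrEnv("SKETCH_ICU_DIR", fakeEnv::get));

        // Once the property is set, it wins over the environment.
        System.setProperty("SKETCH_ICU_DIR", "/prop/icu");
        String icuDir = propertyOrEnv("SKETCH_ICU_DIR", fakeEnv::get);
        System.out.println(icuDir);

        // Expand a default such as the one used for specialsDir.
        System.out.println(expandFolders(
                "${icuDir}/icu4c/source/data/xml", icuDir, "/work/cldr", "/work/cldr-production"));
    }
}
```

Because the expansion is a plain string replacement, defaults containing `${icuDir}` have to be expanded after the directory values are known, which is presumably why `processArgs` calls `expandFolders` on each default rather than storing expanded constants.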
|
|
@ -179,7 +179,6 @@ final class IcuDataDumper {
|
|||
LineMatch match = LineType.match(line, inBlockComment);
|
||||
checkState(match.getType().isValidTransitionFrom(lastType),
|
||||
"invalid state transition: %s --//-> %s", lastType, match.getType());
|
||||
boolean isEndOfWrappedValue = false;
|
||||
switch (match.getType()) {
|
||||
case COMMENT:
|
||||
if (name != null) {
|
||||
|
|
|
@ -11,6 +11,7 @@ import static java.util.stream.Collectors.joining;
|
|||
import static java.util.stream.Collectors.partitioningBy;
|
||||
|
||||
import java.io.BufferedReader;
|
||||
import java.io.File;
|
||||
import java.io.IOException;
|
||||
import java.io.InputStream;
|
||||
import java.io.InputStreamReader;
|
||||
|
@ -28,9 +29,14 @@ import java.util.TreeSet;
|
|||
import java.util.stream.Collectors;
|
||||
import java.util.stream.Stream;
|
||||
|
||||
import org.apache.tools.ant.BuildException;
|
||||
import org.apache.tools.ant.Task;
|
||||
import javax.xml.parsers.DocumentBuilder;
|
||||
import javax.xml.parsers.DocumentBuilderFactory;
|
||||
|
||||
import org.unicode.icu.tool.cldrtoicu.LdmlConverterConfig.IcuLocaleDir;
|
||||
import org.w3c.dom.Document;
|
||||
import org.w3c.dom.Element;
|
||||
import org.w3c.dom.Node;
|
||||
import org.w3c.dom.NodeList;
|
||||
|
||||
import com.google.common.base.CharMatcher;
|
||||
import com.google.common.collect.ImmutableList;
|
||||
|
@ -38,7 +44,6 @@ import com.google.common.collect.ImmutableSet;
|
|||
import com.google.common.collect.Iterables;
|
||||
import com.google.common.io.CharStreams;
|
||||
|
||||
// Note: Auto-magical Ant methods are listed as "unused" by IDEs, unless the warning is suppressed.
|
||||
public final class CleanOutputDirectoryTask extends Task {
|
||||
private static final ImmutableSet<String> ALLOWED_DIRECTORIES =
|
||||
Stream
|
||||
|
@ -58,8 +63,7 @@ public final class CleanOutputDirectoryTask extends Task {
|
|||
// header without it (since that's the old behaviour).
|
||||
// Once there's been an ICU release with this line included in the headers of all data
|
||||
// files, we can remove the fallback and just test for this line and nothing else.
|
||||
private static final String WAS_GENERATED_LABEL =
|
||||
"Generated using tools/cldr/cldr-to-icu/build-icu-data.xml";
|
||||
private static final String WAS_GENERATED_LABEL = "Generated using tools/cldr/cldr-to-icu/";
|
||||
|
||||
// The number of header lines to check before giving up if we don't find the generated
|
||||
// label.
|
||||
|
@ -84,9 +88,8 @@ public final class CleanOutputDirectoryTask extends Task {
|
|||
public static final class Retain extends Task {
|
||||
private Path path = null;
|
||||
|
||||
// Don't use "Path" for the argument type because that always makes an absolute path (e.g.
|
||||
// relative to the working directory for the Ant task). We want relative paths.
|
||||
@SuppressWarnings("unused")
|
||||
// Don't use "Path" for the argument type because that always makes an absolute path
|
||||
// (e.g. relative to the working directory). We want relative paths.
|
||||
public void setPath(String path) {
|
||||
Path p = Paths.get(path).normalize();
|
||||
checkBuild(!p.isAbsolute() && !p.startsWith(".."), "invalid path: %s", path);
|
||||
|
@ -103,14 +106,12 @@ public final class CleanOutputDirectoryTask extends Task {
|
|||
private String name;
|
||||
private final Set<Path> retained = new HashSet<>();
|
||||
|
||||
@SuppressWarnings("unused")
|
||||
public void setName(String name) {
|
||||
checkBuild(ALLOWED_DIRECTORIES.contains(name),
|
||||
"unknown directory name '%s'; allowed values: %s", name, ALLOWED_DIRECTORIES);
|
||||
this.name = name;
|
||||
}
|
||||
|
||||
@SuppressWarnings("unused")
|
||||
public void addConfiguredRetain(Retain retain) {
|
||||
retained.add(retain.path);
|
||||
}
|
||||
|
@ -121,18 +122,15 @@ public final class CleanOutputDirectoryTask extends Task {
|
|||
}
|
||||
}
|
||||
|
||||
@SuppressWarnings("unused")
|
||||
public void setRoot(String root) {
|
||||
// Use String here since on some systems Ant doesn't support automatically converting Path instances.
|
||||
this.root = Paths.get(root);
|
||||
}
|
||||
|
||||
@SuppressWarnings("unused")
|
||||
public void setForceDelete(boolean forceDelete) {
|
||||
this.forceDelete = forceDelete;
|
||||
}
|
||||
|
||||
@SuppressWarnings("unused")
|
||||
public void addConfiguredDir(Dir dir) {
|
||||
outputDirs.add(dir);
|
||||
}
|
||||
|
@ -255,7 +253,7 @@ public final class CleanOutputDirectoryTask extends Task {
|
|||
fileReader.reset();
|
||||
}
|
||||
boolean isLenientHeaderMatchSoFar = true;
|
||||
for (int n = 0; n < MAX_HEADER_CHECK_LINES ; n++) {
|
||||
for (int n = 0; n < MAX_HEADER_CHECK_LINES; n++) {
|
||||
String line = fileReader.readLine();
|
||||
// True if we have processed the header, not including the trailing generated label.
|
||||
boolean headerIsProcessed = n >= headerLines.size() - 1;
|
||||
|
@ -340,4 +338,77 @@ public final class CleanOutputDirectoryTask extends Task {
|
|||
throw new RuntimeException("cannot read resource: " + name, e);
|
||||
}
|
||||
}
|
||||
|
||||
private static Retain getRetain(Element elem) {
|
||||
if (!"retain".equals(elem.getTagName())) {
|
||||
return null;
|
||||
}
|
||||
String path = elem.getAttribute("path");
|
||||
Retain retain = new Retain();
|
||||
retain.setPath(path);
|
||||
return retain;
|
||||
}
|
||||
|
||||
private static Dir getDirectory(Element element) {
|
||||
if (!"dir".equals(element.getTagName())) {
|
||||
return null;
|
||||
}
|
||||
String name = element.getAttribute("name");
|
||||
Dir dir = new Dir();
|
||||
dir.setName(name);
|
||||
Node node = element.getFirstChild();
|
||||
while (node != null) {
|
||||
if (node.getNodeType() == Node.ELEMENT_NODE) {
|
||||
Element childElement = (Element) node;
|
||||
switch (childElement.getTagName()) {
|
||||
case "retain":
|
||||
Retain retain = getRetain(childElement);
|
||||
dir.addConfiguredRetain(retain);
|
||||
break;
|
||||
default:
|
||||
}
|
||||
}
|
||||
node = node.getNextSibling();
|
||||
}
|
||||
return dir;
|
||||
}
|
||||
|
||||
public static CleanOutputDirectoryTask fromXml(String fileName) {
|
||||
try {
|
||||
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
|
||||
Document doc = builder.parse(new File(fileName));
|
||||
Element root = doc.getDocumentElement();
|
||||
if (!"config".equals(root.getTagName())) {
|
||||
System.err.println("The root of the config file should be <config>");
|
||||
return null;
|
||||
}
|
||||
|
||||
NodeList outputDirectories = root.getElementsByTagName("outputDirectories");
|
||||
if (outputDirectories.getLength() != 1) {
|
||||
System.err.println("Exactly one <outputDirectories> element allowed and required");
|
||||
return null;
|
||||
}
|
||||
CleanOutputDirectoryTask cleaner = new CleanOutputDirectoryTask();
|
||||
Node node = outputDirectories.item(0).getFirstChild();
|
||||
while (node != null) {
|
||||
if (node instanceof Element) {
|
||||
Element childElement = (Element) node;
|
||||
String nodeName = childElement.getTagName();
|
||||
switch (nodeName) {
|
||||
case "dir":
|
||||
Dir dir = getDirectory(childElement);
|
||||
cleaner.addConfiguredDir(dir);
|
||||
break;
|
||||
default:
|
||||
break;
|
||||
}
|
||||
}
|
||||
node = node.getNextSibling();
|
||||
}
|
||||
return cleaner;
|
||||
} catch (Exception e) {
|
||||
e.printStackTrace();
|
||||
}
|
||||
return null;
|
||||
}
|
||||
}
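The element walk in `CleanOutputDirectoryTask.fromXml` above can be exercised on a small in-memory document. This is a minimal sketch under stated assumptions: the class name `CleanConfigSketch`, the sample XML and the retained path are hypothetical, and unlike the code above it throws on a malformed file instead of printing a message and returning `null`. Only the tag and attribute names (`config`, `outputDirectories`, `dir`/`name`, `retain`/`path`) are taken from the diff.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical stand-alone sketch; not part of the cldr-to-icu tool.
final class CleanConfigSketch {

    // Returns directory name -> retained relative paths, in document order.
    static Map<String, List<String>> readRetainedPaths(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element root = doc.getDocumentElement();
        if (!"config".equals(root.getTagName())) {
            throw new IllegalArgumentException("The root of the config file should be <config>");
        }
        NodeList outputDirectories = root.getElementsByTagName("outputDirectories");
        if (outputDirectories.getLength() != 1) {
            throw new IllegalArgumentException("Exactly one <outputDirectories> element allowed and required");
        }
        Map<String, List<String>> result = new LinkedHashMap<>();
        // Walk direct children only; text and comment nodes are skipped.
        for (Node n = outputDirectories.item(0).getFirstChild(); n != null; n = n.getNextSibling()) {
            if (!(n instanceof Element) || !"dir".equals(((Element) n).getTagName())) {
                continue;
            }
            List<String> retained = new ArrayList<>();
            for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
                if (c instanceof Element && "retain".equals(((Element) c).getTagName())) {
                    retained.add(((Element) c).getAttribute("path"));
                }
            }
            result.put(((Element) n).getAttribute("name"), retained);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Illustrative config fragment; the retained path is made up.
        String xml = "<config><outputDirectories>"
                + "<dir name=\"brkitr\"><retain path=\"dictionaries\"/></dir>"
                + "<dir name=\"coll\"/>"
                + "</outputDirectories></config>";
        System.out.println(readRetainedPaths(xml));
    }
}
```

Iterating with `getFirstChild()`/`getNextSibling()` and checking for element nodes, as the diff does, matters because whitespace between tags shows up as text nodes in the DOM.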
|
||||
|
|
|
@ -15,6 +15,7 @@ import static com.google.common.collect.Tables.immutableCell;
|
|||
import static java.util.stream.Collectors.joining;
|
||||
import static org.unicode.cldr.api.CldrPath.parseDistinguishingPath;
|
||||
|
||||
import java.io.File;
|
||||
import java.nio.file.Path;
|
||||
import java.nio.file.Paths;
|
||||
import java.util.ArrayList;
|
||||
|
@ -25,8 +26,9 @@ import java.util.function.Predicate;
|
|||
import java.util.regex.Pattern;
|
||||
import java.util.stream.Collectors;
|
||||
|
||||
import org.apache.tools.ant.BuildException;
|
||||
import org.apache.tools.ant.Task;
|
||||
import javax.xml.parsers.DocumentBuilder;
|
||||
import javax.xml.parsers.DocumentBuilderFactory;
|
||||
|
||||
import org.unicode.cldr.api.CldrDataSupplier;
|
||||
import org.unicode.cldr.api.CldrDraftStatus;
|
||||
import org.unicode.cldr.api.CldrPath;
|
||||
|
@ -38,6 +40,10 @@ import org.unicode.icu.tool.cldrtoicu.LdmlConverter.OutputType;
|
|||
import org.unicode.icu.tool.cldrtoicu.LdmlConverterConfig.IcuLocaleDir;
|
||||
import org.unicode.icu.tool.cldrtoicu.PseudoLocales;
|
||||
import org.unicode.icu.tool.cldrtoicu.SupplementalData;
|
||||
import org.w3c.dom.Document;
|
||||
import org.w3c.dom.Element;
|
||||
import org.w3c.dom.Node;
|
||||
import org.w3c.dom.NodeList;
|
||||
|
||||
import com.google.common.base.Ascii;
|
||||
import com.google.common.base.CaseFormat;
|
||||
|
@ -53,10 +59,9 @@ import com.google.common.collect.SetMultimap;
|
|||
import com.google.common.collect.Sets;
|
||||
import com.google.common.collect.Table.Cell;
|
||||
|
||||
// Note: Auto-magical Ant methods are listed as "unused" by IDEs, unless the warning is suppressed.
|
||||
public final class ConvertIcuDataTask extends Task {
|
||||
private static final Splitter LIST_SPLITTER =
|
||||
Splitter.on(CharMatcher.anyOf(",\n")).trimResults(whitespace()).omitEmptyStrings();
|
||||
Splitter.on(CharMatcher.anyOf(",\n")).trimResults(whitespace()).omitEmptyStrings();
|
||||
|
||||
private static final CharMatcher DIGIT_OR_UNDERSCORE = inRange('0', '9').or(is('_'));
|
||||
private static final CharMatcher UPPER_UNDERSCORE = inRange('A', 'Z').or(DIGIT_OR_UNDERSCORE);
|
||||
|
@ -77,39 +82,32 @@ public final class ConvertIcuDataTask extends Task {
|
|||
private boolean includePseudoLocales = false;
|
||||
private Predicate<String> idFilter = id -> true;
|
||||
|
||||
@SuppressWarnings("unused")
|
||||
public void setOutputDir(String path) {
|
||||
// Use String here since on some systems Ant doesn't support automatically converting Path instances.
|
||||
config.setOutputDir(Paths.get(path));
|
||||
}
|
||||
|
||||
@SuppressWarnings("unused")
|
||||
public void setCldrDir(String path) {
|
||||
// Use String here since on some systems Ant doesn't support automatically converting Path instances.
|
||||
this.cldrPath = checkNotNull(Paths.get(path));
|
||||
}
|
||||
|
||||
@SuppressWarnings("unused")
|
||||
public void setIcuVersion(String icuVersion) {
|
||||
config.setIcuVersion(icuVersion);
|
||||
}
|
||||
|
||||
@SuppressWarnings("unused")
|
||||
public void setIcuDataVersion(String icuDataVersion) {
|
||||
config.setIcuDataVersion(icuDataVersion);
|
||||
}
|
||||
|
||||
@SuppressWarnings("unused")
|
||||
public void setCldrVersion(String cldrVersion) {
|
||||
config.setCldrVersion(cldrVersion);
|
||||
}
|
||||
|
||||
@SuppressWarnings("unused")
|
||||
public void setMinimalDraftStatus(String status) {
|
||||
minimumDraftStatus = resolve(CldrDraftStatus.class, status);
|
||||
}
|
||||
|
||||
@SuppressWarnings("unused")
|
||||
public void setOutputTypes(String types) {
|
||||
ImmutableList<OutputType> typeList =
|
||||
LIST_SPLITTER
|
||||
|
@ -121,23 +119,19 @@ public final class ConvertIcuDataTask extends Task {
|
|||
}
|
||||
}
|
||||
|
||||
@SuppressWarnings("unused")
|
||||
public void setSpecialsDir(String path) {
|
||||
// Use String here since on some systems Ant doesn't support automatically converting Path instances.
|
||||
config.setSpecialsDir(Paths.get(path));
|
||||
}
|
||||
|
||||
@SuppressWarnings("unused")
|
||||
public void setIncludePseudoLocales(boolean includePseudoLocales) {
|
||||
this.includePseudoLocales = includePseudoLocales;
|
||||
}
|
||||
|
||||
@SuppressWarnings("unused")
|
||||
public void setLocaleIdFilter(String idFilterRegex) {
|
||||
this.idFilter = Pattern.compile(idFilterRegex).asPredicate();
|
||||
}
|
||||
|
||||
@SuppressWarnings("unused")
|
||||
public void setEmitReport(boolean emit) {
|
||||
config.setEmitReport(emit);
|
||||
}
|
||||
|
@ -145,7 +139,6 @@ public final class ConvertIcuDataTask extends Task {
|
|||
public static final class LocaleIds extends Task {
|
||||
private ImmutableSet<String> ids;
|
||||
|
||||
@SuppressWarnings("unused")
|
||||
public void addText(String localeIds) {
|
||||
this.ids = parseLocaleIds(localeIds);
|
||||
}
|
||||
|
@ -162,22 +155,18 @@ public final class ConvertIcuDataTask extends Task {
|
|||
private final List<ForcedAlias> forcedAliases = new ArrayList<>();
|
||||
private LocaleIds localeIds = null;
|
||||
|
||||
@SuppressWarnings("unused")
|
||||
public void setDir(String directory) {
|
||||
this.dir = resolve(IcuLocaleDir.class, directory);
|
||||
}
|
||||
|
||||
@SuppressWarnings("unused")
|
||||
public void setInheritLanguageSubtag(String localeIds) {
|
||||
this.inheritLanguageSubtag = parseLocaleIds(localeIds);
|
||||
}
|
||||
|
||||
@SuppressWarnings("unused")
|
||||
public void addConfiguredForcedAlias(ForcedAlias alias) {
|
||||
forcedAliases.add(alias);
|
||||
}
|
||||
|
||||
@SuppressWarnings("unused")
|
||||
public void addConfiguredLocaleIds(LocaleIds localeIds) {
|
||||
checkBuild(this.localeIds == null,
|
||||
"Cannot add more than one <localeIds> element for <directory>: %s", dir);
|
||||
|
@ -195,12 +184,10 @@ public final class ConvertIcuDataTask extends Task {
|
|||
private String source = "";
|
||||
private String target = "";
|
||||
|
||||
@SuppressWarnings("unused")
|
||||
public void setSource(String source) {
|
||||
this.source = whitespace().trimFrom(source);
|
||||
}
|
||||
|
||||
@SuppressWarnings("unused")
|
||||
public void setTarget(String target) {
|
||||
this.target = whitespace().trimFrom(target);
|
||||
}
|
||||
|
@ -217,17 +204,14 @@ public final class ConvertIcuDataTask extends Task {
|
|||
private String target = "";
|
||||
private ImmutableSet<String> localeIds = ImmutableSet.of();
|
||||
|
||||
@SuppressWarnings("unused")
|
||||
public void setTarget(String target) {
|
||||
this.target = target.replace('\'', '"');
|
||||
}
|
||||
|
||||
@SuppressWarnings("unused")
|
||||
public void setSource(String source) {
|
||||
this.source = source.replace('\'', '"');
|
||||
}
|
||||
|
||||
@SuppressWarnings("unused")
|
||||
public void setLocales(String localeIds) {
|
||||
this.localeIds = parseLocaleIds(localeIds);
|
||||
}
|
||||
|
@ -239,13 +223,11 @@ public final class ConvertIcuDataTask extends Task {
|
|||
}
|
||||
}
|
||||
|
||||
@SuppressWarnings("unused")
|
||||
public void addConfiguredLocaleIds(LocaleIds localeIds) {
|
||||
checkBuild(this.localeIds == null, "Cannot add more than one <localeIds> element");
|
||||
this.localeIds = localeIds;
|
||||
}
|
||||
|
||||
@SuppressWarnings("unused")
|
||||
public void addConfiguredDirectory(Directory filter) {
|
||||
checkState(!perDirectoryIds.containsKey(filter.dir),
|
||||
"directory %s specified twice", filter.dir);
|
||||
|
@ -289,14 +271,12 @@ public final class ConvertIcuDataTask extends Task {
|
|||
}
|
||||
|
||||
// Aliases on the outside are applied to all directories.
|
||||
@SuppressWarnings("unused")
|
||||
public void addConfiguredForcedAlias(ForcedAlias alias) {
|
||||
for (IcuLocaleDir dir : IcuLocaleDir.values()) {
|
||||
config.addForcedAlias(dir, alias.source, alias.target);
|
||||
}
|
||||
}
|
||||
|
||||
@SuppressWarnings("unused")
|
||||
public void addConfiguredAltPath(AltPath altPath) {
|
||||
// Don't convert to CldrPath here (it triggers a bunch of CLDR data loading for the DTDs).
|
||||
// Wait until the "execute()" method since in future we expect to use the configured CLDR
|
||||
|
@ -304,7 +284,6 @@ public final class ConvertIcuDataTask extends Task {
|
|||
altPaths.add(altPath);
|
||||
}
|
||||
|
||||
@SuppressWarnings("unused")
|
||||
public void execute() throws BuildException {
|
||||
// Spin up CLDRConfig outside of other inner loops, to
|
||||
// avoid static init problems seen in CLDR-14636
|
||||
|
@ -408,4 +387,128 @@ public final class ConvertIcuDataTask extends Task {
|
|||
"invalid enumeration name " + name + "; expected one of: " + validNames);
|
||||
}
|
||||
}
|
||||
|
||||
private static AltPath getAltPath(Element elem) {
|
||||
if (!"altPath".equals(elem.getTagName())) {
|
||||
return null;
|
||||
}
|
||||
String source = elem.getAttribute("source");
|
||||
        String target = elem.getAttribute("target");
        String locales = elem.getAttribute("locales");
        AltPath ap = new AltPath();
        ap.setSource(source);
        ap.setTarget(target);
        ap.setLocales(locales);
        ap.init();
        return ap;
    }

    private static ForcedAlias getForcedAlias(Element elem) {
        if (!"forcedAlias".equals(elem.getTagName())) {
            return null;
        }
        String source = elem.getAttribute("source");
        String target = elem.getAttribute("target");
        ForcedAlias fa = new ForcedAlias();
        fa.setSource(source);
        fa.setTarget(target);
        fa.init();
        return fa;
    }

    private static LocaleIds getLocaleIds(Element elem) {
        if (!"localeIds".equals(elem.getTagName())) {
            return null;
        }
        LocaleIds localeIds = new LocaleIds();
        String strLocaleIds = elem.getTextContent();
        localeIds.addText(strLocaleIds);
        localeIds.init();
        return localeIds;
    }

    private static Directory getDirectory(Element element) {
        if (!"directory".equals(element.getTagName())) {
            return null;
        }
        String dir = element.getAttribute("dir");
        String inheritLanguageSubtag = element.getAttribute("inheritLanguageSubtag");
        Directory directory = new Directory();
        directory.setDir(dir);
        directory.setInheritLanguageSubtag(inheritLanguageSubtag);
        Node node = element.getFirstChild();
        while (node != null) {
            if (node.getNodeType() == Node.ELEMENT_NODE) {
                Element childElement = (Element) node;
                switch (childElement.getTagName()) {
                    case "localeIds":
                        LocaleIds localeIds = getLocaleIds(childElement);
                        directory.addConfiguredLocaleIds(localeIds);
                        break;
                    case "forcedAlias":
                        ForcedAlias fa = getForcedAlias(childElement);
                        directory.addConfiguredForcedAlias(fa);
                        break;
                    default:
                }
            }
            node = node.getNextSibling();
        }
        if (directory.localeIds == null) {
            directory.addConfiguredLocaleIds(new LocaleIds());
        }
        directory.init();
        return directory;
    }

    public static ConvertIcuDataTask fromXml(String fileName) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new File(fileName));
            Element root = doc.getDocumentElement();
            if (!"config".equals(root.getTagName())) {
                System.err.println("The root of the config file should be <config>");
                return null;
            }

            NodeList convertNodes = root.getElementsByTagName("convert");
            if (convertNodes.getLength() != 1) {
                System.err.println("Exactly one <convert> element allowed and required");
                return null;
            }
            ConvertIcuDataTask converter = new ConvertIcuDataTask();
            Node node = convertNodes.item(0).getFirstChild();
            while (node != null) {
                if (node instanceof Element) {
                    Element childElement = (Element) node;
                    String nodeName = childElement.getTagName();
                    switch (nodeName) {
                        case "localeIds":
                            LocaleIds localeIds = getLocaleIds(childElement);
                            converter.addConfiguredLocaleIds(localeIds);
                            break;
                        case "directory":
                            Directory directory = getDirectory(childElement);
                            converter.addConfiguredDirectory(directory);
                            break;
                        case "forcedAlias":
                            ForcedAlias fa = getForcedAlias(childElement);
                            converter.addConfiguredForcedAlias(fa);
                            break;
                        case "altPath":
                            AltPath altPath = getAltPath(childElement);
                            converter.addConfiguredAltPath(altPath);
                            break;
                        default:
                            break;
                    }
                }
                node = node.getNextSibling();
            }
            return converter;
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }
}

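The element-walking pattern used by `fromXml` and `getDirectory` above can be tried in isolation. The following is a minimal sketch, not code from this commit: the class name `ConfigWalkSketch`, the method `childTagNames`, and the inline XML (including its attribute values) are hypothetical. It applies the same two checks (a `<config>` root and exactly one `<convert>` element) and then lists the tag names of the direct element children, skipping text nodes. It does not build any of the real task objects.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration; not part of the ICU conversion tool.
public class ConfigWalkSketch {

    // Returns the tag names of the direct element children of the single <convert> element.
    public static List<String> childTagNames(String xml) {
        Document doc;
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Cannot parse config", e);
        }
        Element root = doc.getDocumentElement();
        if (!"config".equals(root.getTagName())) {
            throw new IllegalArgumentException("The root of the config file should be <config>");
        }
        NodeList convertNodes = root.getElementsByTagName("convert");
        if (convertNodes.getLength() != 1) {
            throw new IllegalArgumentException("Exactly one <convert> element allowed and required");
        }
        List<String> names = new ArrayList<>();
        // Walk the sibling chain; whitespace text nodes and comments are skipped.
        for (Node node = convertNodes.item(0).getFirstChild(); node != null; node = node.getNextSibling()) {
            if (node.getNodeType() == Node.ELEMENT_NODE) {
                names.add(((Element) node).getTagName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Made-up config content; only the element and attribute names follow the parser above.
        String xml = "<config><convert>\n"
                + "  <localeIds>root en</localeIds>\n"
                + "  <directory dir=\"locales\"><forcedAlias source=\"a\" target=\"b\"/></directory>\n"
                + "  <altPath source=\"x\" target=\"y\" locales=\"en\"/>\n"
                + "</convert></config>";
        System.out.println(childTagNames(xml));
    }
}
```

The nested `<forcedAlias>` is not listed because only direct children of `<convert>` are visited, which is also why `getDirectory` runs its own loop over the children of each `<directory>`.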
@@ -12,12 +12,9 @@ import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.tools.ant.BuildException;
import org.apache.tools.ant.Task;
import org.unicode.icu.tool.cldrtoicu.CodeGenerator;
import org.unicode.icu.tool.cldrtoicu.generator.ResourceFallbackCodeGenerator;

// Note: Auto-magical Ant methods are listed as "unused" by IDEs, unless the warning is suppressed.
public final class GenerateCodeTask extends Task {
    private Path cldrPath;
    private Path cOutDir;
@@ -40,31 +37,26 @@ public final class GenerateCodeTask extends Task {
        new GeneratedFileDef("common/localefallback_data.h", "src/main/java/com/ibm/icu/impl/LocaleFallbackData.java", new ResourceFallbackCodeGenerator()),
    };

    @SuppressWarnings("unused")
    public void setCldrDir(String path) {
        // Use String here since on some systems Ant doesn't support automatically converting Path instances.
        this.cldrPath = checkNotNull(Paths.get(path));
    }

    @SuppressWarnings("unused")
    public void setCOutDir(String path) {
        // Use String here since on some systems Ant doesn't support automatically converting Path instances.
        this.cOutDir = Paths.get(path);
    }

    @SuppressWarnings("unused")
    public void setJavaOutDir(String path) {
        // Use String here since on some systems Ant doesn't support automatically converting Path instances.
        this.javaOutDir = Paths.get(path);
    }

    @SuppressWarnings("unused")
    public void setAction(String action) {
        // Use String here since on some systems Ant doesn't support automatically converting Path instances.
        this.action = action;
    }

    @SuppressWarnings("unused")
    public void execute() throws BuildException {
        for (GeneratedFileDef task : generatedFileDefs) {
            Path cOutPath = cOutDir.resolve(task.cRelativePath);
@@ -91,5 +83,4 @@ public final class GenerateCodeTask extends Task {
            }
        }
    }

}

@@ -0,0 +1,25 @@
+// © 2024 and later: Unicode, Inc. and others.
+// License & terms of use: http://www.unicode.org/copyright.html
+package org.unicode.icu.tool.cldrtoicu.ant;
+
+public class Task {
+    public static class BuildException extends RuntimeException {
+        private static final long serialVersionUID = 2430911677116799373L;
+
+        public BuildException(String message, Throwable cause) {
+            super(message, cause);
+        }
+
+        public BuildException(String message) {
+            super(message);
+        }
+    }
+
+    void log(String format) {
+        System.out.println(format);
+    }
+
+    public void execute() throws BuildException {}
+
+    public void init() throws BuildException {}
+}
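The new `Task` class above appears intended to let classes written as Ant tasks compile and run without the Ant library. The sketch below illustrates that idea under stated assumptions: `TaskShimSketch`, its nested stand-in `Task`, and `HelloTask` are all hypothetical names, and the stand-in only mirrors the shape of the shim in this commit rather than reusing it.

```java
// Hypothetical, self-contained sketch; none of these names come from the commit.
public class TaskShimSketch {

    // Stand-in with the same shape as the Task shim added in this commit.
    public static class Task {
        public static class BuildException extends RuntimeException {
            private static final long serialVersionUID = 1L;

            public BuildException(String message) {
                super(message);
            }
        }

        void log(String message) {
            System.out.println(message);
        }

        public void init() throws BuildException {}

        public void execute() throws BuildException {}
    }

    // A made-up task in the Ant style: setters for attributes, then execute().
    public static final class HelloTask extends Task {
        private String name;

        public void setName(String name) {
            this.name = name;
        }

        @Override
        public void execute() throws BuildException {
            if (name == null) {
                // BuildException is the nested Task.BuildException, visible here by inheritance.
                throw new BuildException("The 'name' attribute is required");
            }
            log("Hello, " + name);
        }
    }

    public static void main(String[] args) {
        // Without Ant, the caller sets the attributes and drives the lifecycle directly.
        HelloTask task = new HelloTask();
        task.setName("CLDR");
        task.init();
        task.execute();
    }
}
```

Because the exception type is nested inside the base class, a subclass can keep writing `throws BuildException` without an import, which may be why the shim is shaped this way.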
@@ -1,3 +1,3 @@
 © 2016 and later: Unicode, Inc. and others.
 License & terms of use: http://www.unicode.org/copyright.html
-Generated using tools/cldr/cldr-to-icu/build-icu-data.xml
+Generated using tools/cldr/cldr-to-icu/
@@ -4,7 +4,6 @@ package org.unicode.icu.tool.cldrtoicu;

 import static com.google.common.truth.Truth.assertThat;
 import static com.google.common.truth.Truth.assertWithMessage;
-import static com.google.common.truth.Truth8.assertThat;
 import static org.unicode.cldr.api.CldrValue.parseValue;

 import java.nio.file.Path;
@@ -38,7 +37,11 @@ public class SupplementalDataTest {

     @BeforeClass
     public static void loadRegressionData() {
-        Path cldrRoot = Paths.get(System.getProperty("CLDR_DIR"));
+        String cldrDir = System.getProperty("CLDR_DIR");
+        if (cldrDir == null) {
+            cldrDir = System.getenv("CLDR_DIR");
+        }
+        Path cldrRoot = Paths.get(cldrDir);
         regressionData = SupplementalData.create(CldrDataSupplier.forCldrFilesIn(cldrRoot));
         likelySubtags = new LikelySubtags();
     }
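The `loadRegressionData` change above reads `CLDR_DIR` from the system property first and falls back to the environment variable. A minimal sketch of that lookup order follows, assuming a hypothetical class `CldrDirSketch` whose `resolve` method takes the properties and environment as parameters so it can be exercised without touching the real process environment. Unlike the test code in the diff, the sketch also adds an explicit error for the case where neither source is set.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper for illustration; not part of the commit.
public class CldrDirSketch {

    // Prefers the "CLDR_DIR" system property and falls back to the environment variable.
    public static Path resolve(Properties props, Map<String, String> env) {
        String cldrDir = props.getProperty("CLDR_DIR");
        if (cldrDir == null) {
            cldrDir = env.get("CLDR_DIR");
        }
        if (cldrDir == null) {
            // Added in this sketch only: fail with a readable message.
            throw new IllegalStateException(
                    "CLDR_DIR is set neither as a system property nor as an environment variable");
        }
        return Paths.get(cldrDir);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("CLDR_DIR", "cldr-from-property");
        // The property wins when both are present.
        System.out.println(resolve(props, Map.of("CLDR_DIR", "cldr-from-env")));
        // With no property, the environment value is used.
        System.out.println(resolve(new Properties(), Map.of("CLDR_DIR", "cldr-from-env")));
    }
}
```

In real use the arguments would be `System.getProperties()` and `System.getenv()`; with a fallback of this kind the tests can run whether the directory is passed as `-DCLDR_DIR=...` or exported in the shell.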
@@ -18,7 +18,7 @@ import java.util.Arrays;
 public class CleanOutputDirectoryTaskTest {
     // Not using the original field since we want this test to fail if this changes unexpectedly.
     private static final String WAS_GENERATED_LABEL =
-            "Generated using tools/cldr/cldr-to-icu/build-icu-data.xml";
+            "Generated using tools/cldr/cldr-to-icu/";

     // Commented version of the label for test data.
     private static final String WAS_GENERATED_LINE = "// " + WAS_GENERATED_LABEL;
@@ -135,14 +135,15 @@ public class LocaleDistanceMapperTest {
         // LSR values come in (language, script, region) tuples. They are the mapped-to
         // values for the likely subtag mappings, ordered by the DTD order in which the
         // mapping keys were encountered.
-        assertThat(icuData).hasValuesFor("likely/lsrs",
-            "", "", "",
-            "skip", "script", "",
-            "zh", "Hans", "CN",
-            "zh", "Hant", "TW",
-            "en", "Latn", "US",
-            "zh", "Hant", "HK",
-            "zh", "Hant", "MO");
+        assertThat(icuData).hasValuesFor("likely/lsrnum:intvector",
+            "0", // "", "", ""
+            "1", // "skip", "script", ""
+            "1232236233", // "zh", "Hans", "CN"
+            "1254131029", // "zh", "Hant", "TW"
+            "429941505", // "en", "Latn", "US"
+            "1247517541", // "zh", "Hant", "HK"
+            "1249741720" // "zh", "Hant", "MO"
+        );

         // It's a bit easier to see how match keys are grouped against the partitions.
         ImmutableSetMultimap<Integer, String> likelyTrie =
@@ -174,11 +175,12 @@ public class LocaleDistanceMapperTest {

         // Pairs of expanded paradigm locales (using LSR tuples) in declaration order.
         // This is just the list from the CLDR data with no processing.
-        assertThat(icuData).hasValuesFor("match/paradigms",
-            "en", "Latn", "US",
-            "en", "Latn", "GB",
-            "es", "Latn", "ES",
-            "es", "Latn", "419");
+        assertThat(icuData).hasValuesFor("match/paradigmnum:intvector",
+            "429941505", // "en", "Latn", "US"
+            "420631446", // "en", "Latn", "GB"
+            "429626712", // "es", "Latn", "ES"
+            "419470284" // "es", "Latn", "419"
+        );

         // See PartitionInfoTest for a description of the ordering of these strings.
         assertThat(icuData).hasValuesFor("match/partitions",
@@ -28,7 +28,9 @@ public class Bcp47MapperTest {
         RbPath.of("typeAlias", "timezone:alias"),
         RbValue.of("/ICUDATA/timezoneTypes/typeAlias/timezone"),
         RbPath.of("typeMap", "timezone:alias"),
-        RbValue.of("/ICUDATA/timezoneTypes/typeMap/timezone"));
+        RbValue.of("/ICUDATA/timezoneTypes/typeMap/timezone"),
+        RbPath.of("ianaMap", "timezone:alias"),
+        RbValue.of("/ICUDATA/timezoneTypes/ianaMap/timezone"));

     @Test
     public void testSimple() {
@@ -1,101 +0,0 @@
-*********************************************************************
-*** © 2019 and later: Unicode, Inc. and others.                   ***
-*** License & terms of use: http://www.unicode.org/copyright.html ***
-*********************************************************************
-
-What is this directory and why is it empty?
--------------------------------------------
-
-This is the root of a local Maven repository which needs to be populated before
-code which uses the CLDR data API can be executed.
-
-To do this, you need to have a local copy of the CLDR project configured on your
-computer and be able able to build the API jar file and copy an existing utility
-jar file. In the examples below it is assumed that $CLDR_ROOT references this
-CLDR release.
-
-Setup
------
-
-This project relies on the Maven build tool for managing dependencies and uses
-Ant for configuration purposes, so both will need to be installed. On a Debian
-based system, this should be as simple as:
-
-    $ sudo apt-get install maven ant
-
-
-Installing the CLDR API jar
----------------------------
-
-From this directory:
-
-    $ ./install-cldr-jars.sh "$CLDR_DIR"
-
-
-Manually installing the CLDR API jar
-------------------------------------
-
-Only follow these remaining steps if the installation script isn't suitable or
-doesn't work on your system.
-
-To regenerate the CLDR API jar you need to build the "jar" target manually
-using the Maven pom.xml file in the "tools" directory of the CLDR project:
-
-    $ cd "$CLDR_ROOT/tools"
-    $ mvn package -DskipTests=true
-
-This should result in the cldr-code.jar file being built into the cldr-code/target
-sub-directory, which can then be installed as a Maven dependency as described above.
-
-
-Updating local Maven repository
--------------------------------
-
-To update the local Maven repository (e.g. to install the CLDR jar) then from
-this directory (lib/) you should run:
-
-    $ mvn install:install-file \
-        -Dproject.parent.relativePath="" \
-        -DgroupId=org.unicode.cldr \
-        -DartifactId=cldr-api \
-        -Dversion=0.1-SNAPSHOT \
-        -Dpackaging=jar \
-        -DgeneratePom=true \
-        -DlocalRepositoryPath=. \
-        -Dfile="$CLDR_ROOT/tools/cldr-code/target/cldr-code.jar"
-
-And if you have updated one of these libraries then from this directory run:
-
-    $ mvn dependency:purge-local-repository \
-        -Dproject.parent.relativePath="" \
-        -DmanualIncludes=org.unicode.cldr:cldr-api:jar
-
-After doing this, you should see something like the following list of files in
-this directory:
-
-    README.txt <-- this file
-    org/unicode/cldr/cldr-api/maven-metadata-local.xml
-    org/unicode/cldr/cldr-api/0.1-SNAPSHOT/maven-metadata-local.xml
-    org/unicode/cldr/cldr-api/0.1-SNAPSHOT/cldr-api-0.1-SNAPSHOT.pom
-    org/unicode/cldr/cldr-api/0.1-SNAPSHOT/cldr-api-0.1-SNAPSHOT.jar
-
-Finally, if you choose to update the version number of the snapshot, then also
-update all the the pom.xml files which reference it (but this is unlikely to be
-necessary).
-
-Troubleshooting
----------------
-
-While the Maven system should keep the CLDR JAR up to date, there is a chance
-that you may have an out of date JAR installed elsewhere. If you have any
-issues with the JAR not being the expected version (e.g. after making changes)
-then run the above "purge" step again, from this directory.
-
-This should re-resolve the current JAR snapshot from the repository in this
-directory. Having purged the Maven cache, next time you build a project, you
-should see something like:
-
-    [exec] Downloading from <xxx>: <url>/org/unicode/cldr/cldr-api/0.1-SNAPSHOT/maven-metadata.xml
-    [exec] [INFO] Building jar: <path-to-icu-root>/tools/cldr/cldr-to-icu/target/cldr-to-icu-1.0-SNAPSHOT-jar-with-dependencies.jar
-
-This shows that it has had to re-fetch the JAR file.
@@ -1,102 +0,0 @@
-#!/bin/bash -u
-#
-#####################################################################
-### © 2020 and later: Unicode, Inc. and others.                   ###
-### License & terms of use: http://www.unicode.org/copyright.html ###
-#####################################################################
-#
-# This script will attempt to build and install the necessary CLDR JAR files
-# from a given CLDR installation root directory. The JAR files are installed
-# according to the manual instructions given in README.txt and lib/README.txt.
-#
-# The user must have installed both 'ant' and 'maven' in accordance with the
-# instructions in README.txt before attempting to run this script.
-#
-# Usage (from the directory of this script):
-#
-#     ./install-cldr-jars.sh <CLDR-root-directory>
-#
-# Note to maintainers: This script cannot be assumed to run on a Unix/Linux
-# based system, and while a Posix compliant bash shell is required, any
-# assumptions about auxiliary Unix tools should be minimized (e.g. things
-# like "dirname" or "tempfile" may not exist). Where bash-only alternatives
-# have to be used, they should be clearly documented.
-
-# Exit with a message for fatal errors.
-function die() {
-  echo "$1"
-  echo "Exiting..."
-  exit 1
-} >&2
-
-# Runs a given command and captures output to the global log file.
-# If a command errors, the user can then view the log file.
-function run_with_logging() {
-  echo >> "${LOG_FILE}"
-  echo "Running: ${@}" >> "${LOG_FILE}"
-  echo -- "----------------------------------------------------------------" >> "${LOG_FILE}"
-  "${@}" >> "${LOG_FILE}" 2>&1
-  if (( $? != 0 )) ; then
-    echo -- "---- Previous command failed ----" >> "${LOG_FILE}"
-    echo "Error running: ${@}"
-    read -p "Show log file? " -n 1 -r
-    echo
-    if [[ "${REPLY}" =~ ^[Yy]$ ]] ; then
-      less -RX "${LOG_FILE}"
-    fi
-    echo "Log file: ${LOG_FILE}"
-    exit 1
-  fi
-  echo -- "---- Previous command succeeded ----" >> "${LOG_FILE}"
-}
-
-# First require that we are run from the same directory as the script.
-# Can't assume users have "dirname" available so hack it a bit with shell
-# substitution (if no directory path was prepended, SCRIPT_DIR==$0).
-SCRIPT_DIR=${0%/*}
-if [[ "$SCRIPT_DIR" != "$0" ]] ; then
-  cd $SCRIPT_DIR
-fi
-
-# Check for some expected environmental things early.
-which ant > /dev/null || die "Cannot find Ant executable 'ant' in the current path."
-which mvn > /dev/null || die "Cannot find Maven executable 'mvn' in the current path."
-
-# Check there's one argument that points at a directory (or a symbolic link to a directory).
-(( $# == 1 )) && [[ -d "$1" ]] || die "Usage: ./install-cldr-jars.sh <CLDR-root-directory>"
-
-# Set up a log file (and be nice about tidying it up).
-# Cannot assume "tempfile" exists so use a timestamp (we expect "date" to exist though).
-LOG_FILE="${TMPDIR:-/tmp}/cldr2icu_log_$(date '+%m%d_%H%M%S').txt"
-touch $LOG_FILE || die "Cannot create temporary file: ${LOG_FILE}"
-echo -- "---- LOG FILE ---- $(date '+%F %T') ----" >> "${LOG_FILE}"
-
-# Build the cldr-code.jar in the cldr-code/target subdirectory of the CLDR tools directory.
-CLDR_TOOLS_DIR="$1/tools"
-pushd "${CLDR_TOOLS_DIR}" > /dev/null || die "Cannot change directory to: ${CLDR_TOOLS_DIR}"
-
-echo "Building CLDR JAR file..."
-run_with_logging mvn package -DskipTests=true
-[[ -f "cldr-code/target/cldr-code.jar" ]] || die "Error creating cldr-code.jar file"
-
-popd > /dev/null
-
-# The -B flag is "batch" mode and won't mess about with escape codes in the log file.
-echo "Installing CLDR JAR file..."
-run_with_logging mvn -B install:install-file \
-  -Dproject.parent.relativePath="" \
-  -DgroupId=org.unicode.cldr \
-  -DartifactId=cldr-api \
-  -Dversion=0.1-SNAPSHOT \
-  -Dpackaging=jar \
-  -DgeneratePom=true \
-  -DlocalRepositoryPath=. \
-  -Dfile="${CLDR_TOOLS_DIR}/cldr-code/target/cldr-code.jar"
-
-echo "Syncing local Maven repository..."
-run_with_logging mvn -B dependency:purge-local-repository \
-  -Dproject.parent.relativePath="" \
-  -DmanualIncludes=org.unicode.cldr:cldr-api:jar
-
-echo "All done!"
-echo "Log file: ${LOG_FILE}"
@@ -1,53 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!-- © 2020 and later: Unicode, Inc. and others.
-     License & terms of use: http://www.unicode.org/copyright.html
-     See README.txt for instructions on updating the local repository.
-     -->
-<project xmlns="http://maven.apache.org/POM/4.0.0"
-         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
-         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
-    <modelVersion>4.0.0</modelVersion>
-
-    <!-- This POM file acts as a parent POM file for any tool which is built
-         via Maven and requires access to the CLDR data APIs. This POM file
-         and the other files in this directory encapsulate the somewhat messy
-         task of including the Ant-built CLDR JAR file in Maven projects. -->
-
-    <!-- Declares this to be a POM that's included by other POM files. -->
-    <packaging>pom</packaging>
-
-    <!-- This must match any child POM file's <parent> declaration. -->
-    <groupId>org.unicode.icu</groupId>
-    <artifactId>cldr-lib</artifactId>
-    <version>1.0</version>
-
-    <!-- Important: The "${project.basedir}" property is the directory of the
-         child POM file, not this directory (and there's no easy way in Maven
-         to identify the absolute path of a parent POM file). However since
-         child POM files should have a <parent> declaration with the relative
-         path in it, we can use that. Note however that this is a bit fragile
-         and relies on <relativePath> being a directory, not a POM file.
-
-         In order to allow the local repository to work either when it is used
-         by a child POM file or when it's used directly (e.g. for installing
-         or purging the cache) when it is invoked from this directory, the
-         -Dproject.parent.relativePath=""
-         argument must be given. -->
-    <repositories>
-        <repository>
-            <id>local-maven-repo</id>
-            <url>file://${project.basedir}/${project.parent.relativePath}</url>
-        </repository>
-    </repositories>
-
-    <!-- Ant-built JAR file(s) installed into the local Maven repository in this
-         directory by the 'install-cldr-jars.sh' script. -->
-    <dependencies>
-        <dependency>
-            <groupId>org.unicode.cldr</groupId>
-            <artifactId>cldr-api</artifactId>
-            <version>0.1-SNAPSHOT</version>
-        </dependency>
-    </dependencies>
-</project>
-