ICU-22707 change log for Unicode 16 minus hardcoded props

@@ -59,6 +59,377 @@ TODO
+ No more need to check via grep.
+ Still: If the test fails, then update the hardcoded implementation.
https://www.unicode.org/versions/Unicode15.1.0/
https://www.unicode.org/versions/beta-15.1.0.html
https://www.unicode.org/Public/draft/
https://www.unicode.org/reports/uax-proposed-updates.html
https://www.unicode.org/reports/tr44/tr44-31.html
https://unicode-org.atlassian.net/browse/ICU-22404 Unicode 15.1
https://unicode-org.atlassian.net/browse/CLDR-16669 BRS Unicode 15.1
https://github.com/unicode-org/unicodetools/issues/492 adjust cldr/*BreakTest generation for Unicode 15.1
* Command-line environment setup
Markus:
export UNIDATA_ROOT=~/unidata
export UNICODE_DATA=$UNIDATA_ROOT/uni15.1/final
export CLDR_SRC=~/cldr/uni/src
export ICU_ROOT=~/icu/uni
export ICU_SRC=$ICU_ROOT/src
export ICU_OUT=$ICU_ROOT/dbg
export ICUDT=icudt74b
export ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
export ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
export LD_LIBRARY_PATH=$ICU_OUT/icu4c/lib
export UNICODE_TOOLS=~/unitools/mine/src
Elango:
export UNIDATA_ROOT=~/oss/unidata
export UNICODE_DATA=$UNIDATA_ROOT/uni15.1/snapshot
export CLDR_SRC=~/oss/cldr/mine/src
export ICU_ROOT=~/oss/icu
export ICU_SRC=$ICU_ROOT
export ICU_OUT=$ICU_ROOT
export ICUDT=icudt74b
export ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
export ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
export LD_LIBRARY_PATH=$ICU_OUT/icu4c/lib
export UNICODE_TOOLS=~/oss/unicodetools/mine/src
*** Unicode version numbers
- makedata.mak
- uchar.h
- com.ibm.icu.util.VersionInfo
- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
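- for example, one quick way to locate the old version strings to change
  (illustrative; in uchar.h this is the U_UNICODE_VERSION #define, paths may differ by branch):
  cd $ICU_SRC
  grep -n '15\.0' icu4c/source/data/makedata.mak icu4c/source/common/unicode/uchar.h
  grep -rn '15[_.]0' --include=VersionInfo.java --include=UCharacterTest.java icu4j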
*** Configure: Build Unicode data for ICU4J
- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
so that the makefiles see the new version number.
cd $ICU_OUT/icu4c
ICU_DATA_BUILDTOOL_OPTS=--include_uni_core_data ../../../doconfig-clang-dbg.sh
*** data files & enums & parser code
* download files
- same as for the early Unicode Tools setup and data refresh:
https://github.com/unicode-org/unicodetools/blob/main/docs/index.md
https://github.com/unicode-org/unicodetools/blob/main/docs/inputdata.md
- mkdir -p $UNICODE_DATA
- download Unicode files into $UNICODE_DATA
+ new since Unicode 15.1:
for the pre-release (alpha, beta) data files,
download all of https://www.unicode.org/Public/draft/
(you can omit or discard the UCD/charts/ and UCD/ucdxml/ files/folders)
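+ for example, one possible way to mirror the draft folder with wget
  (illustrative; adjust --cut-dirs so that UCD/, security/, etc. land directly under $UNICODE_DATA):
  wget -r -np -nH --cut-dirs=2 -R 'index.html*' -P $UNICODE_DATA https://www.unicode.org/Public/draft/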
+ if one of us produces the alpha.zip or beta.zip collection of data files for publication,
then we can use its contents directly (no FTP from unicode.org necessary)
+ for final-release data files, the source of truth is the set of files in
https://www.unicode.org/Public/(version) [=UCD],
https://www.unicode.org/Public/UCA/(version),
https://www.unicode.org/Public/idna/(version),
etc.
+ use an FTP client; anonymous FTP from www.unicode.org at /Public/draft etc.
+ subfolders: emoji, idna, security, ucd, uca
+ whichever way you download the files:
~ inside ucd: extract Unihan.zip to "here" (.../UCD/ucd/Unihan/*.txt), delete Unihan.zip
~ split Unihan into single-property files
~/unitools/mine/src$ py/splitunihan.py $UNICODE_DATA/UCD/ucd/Unihan
~ TODO: for updating ICU, we should not need Unihan.zip contents, correct?
+ alternate way of fetching files, if available:
copy the files from a Unicode Tools workspace that is up to date with
https://github.com/unicode-org/unicodetools
and which might at this point be *ahead* of "Public"
~ before the Unicode release copy files from "dev" subfolders, for example
https://github.com/unicode-org/unicodetools/tree/main/unicodetools/data/ucd/dev
- get the CLDR version of GraphemeBreakTest.txt from CLDR (if it has been updated there already)
or from the UCD/cldr/ output folder of the Unicode Tools:
From Unicode 12/CLDR 35/ICU 64 to Unicode 15.0/CLDR 43/ICU 73,
CLDR used modified grapheme break rules.
This might happen again.
cp $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt icu4c/source/test/testdata
or
cp ~/unitools/mine/Generated/UCD/15.1.0/cldr/GraphemeBreakTest-cldr.txt icu4c/source/test/testdata/GraphemeBreakTest.txt
cp ~/unitools/mine/Generated/UCD/15.1.0/cldr/GraphemeBreakTest-cldr.txt $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt
cp ~/unitools/mine/Generated/UCD/15.1.0/cldr/GraphemeBreakTest-cldr.html $CLDR_SRC/common/properties/segments/GraphemeBreakTest.html
+ TODO: figure out whether we need a CLDR version of LineBreakTest.txt:
unicodetools issue #492
- cp -v $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
+ TODO: modify preparseucd.py to copy this file
* Note: Since Unicode 15.1, data files are no longer published with version suffixes
even during the alpha or beta.
Thus we no longer need steps & tools to remove those suffixes.
(remove this note next time)
* process and/or copy files
- cd $ICU_SRC/tools/unicode
py/preparseucd.py $UNICODE_DATA $ICU_SRC
+ This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
+ For debugging, and tweaking how ppucd.txt is written,
the tool has an --only_ppucd option:
py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
* new constants for new property values
- preparseucd.py error:
ValueError: missing uchar.h enum constants for some property values: [('blk', {'CJK_Ext_I'}), ('lb', {'VF', 'VI', 'AS', 'AK', 'AP'})]
= PropertyValueAliases.txt new property values (diff old & new .txt files)
cd $UNIDATA_ROOT
$ diff -u uni15.0/ucd/PropertyValueAliases.txt uni15.1/snapshot/UCD/ucd/PropertyValueAliases.txt | egrep '^[-+][a-zA-Z]'
+age; 15.1 ; V15_1
+blk; CJK_Ext_I ; CJK_Unified_Ideographs_Extension_I
+IDSU; N ; No ; F ; False
+IDSU; Y ; Yes ; T ; True
+ID_Compat_Math_Continue; N ; No ; F ; False
+ID_Compat_Math_Continue; Y ; Yes ; T ; True
+ID_Compat_Math_Start; N ; No ; F ; False
+ID_Compat_Math_Start; Y ; Yes ; T ; True
+lb ; AK ; Aksara
+lb ; AP ; Aksara_Prebase
+lb ; AS ; Aksara_Start
+lb ; VF ; Virama_Final
+lb ; VI ; Virama
-> add new blocks to uchar.h before UBLOCK_COUNT
use long property names for the enum constants;
for the trailing comment, get the block start code point by diffing the old & new Blocks.txt:
cd $UNIDATA_ROOT
$ diff -u uni15.0/ucd/Blocks.txt uni15.1/snapshot/UCD/ucd/Blocks.txt | egrep '^[-+][0-9A-Z]'
+2EBF0..2EE4F; CJK Unified Ideographs Extension I
(ignore blocks whose end code point changed)
-> add new blocks to UCharacter.UnicodeBlock IDs
Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
replace public static final int \1_ID = \2; \3
-> add new blocks to UCharacter.UnicodeBlock objects
Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
-> add new line break values to uchar.h & UCharacter.LineBreak
* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
(not strictly necessary for NOT_ENCODED scripts)
$ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt
* build ICU
to make sure that there are no syntax errors
$ICU_OUT/icu4c$ echo;echo; date; make -j7 tests &> out.txt ; tail -n 30 out.txt ; date
* update spoof checker UnicodeSet initializers:
inclusionPat & recommendedPat in i18n/uspoof.cpp
INCLUSION & RECOMMENDED in SpoofChecker.java
- make sure that the Unicode Tools tree contains the latest security data files
- go to Unicode Tools org.unicode.text.tools.RecommendedSetGenerator
- run the tool (no special environment variables needed)
cd $UNICODE_TOOLS
mvn -s ~/.m2/settings.xml compile exec:java -Dexec.mainClass="org.unicode.text.tools.RecommendedSetGenerator" \
-Dexec.args="" -am -pl unicodetools -DCLDR_DIR=$(cd ../../../cldr/mine/src ; pwd) -DUNICODETOOLS_REPO_DIR=$(pwd)
- copy & paste from the Console output into the .cpp & .java files
* check hardcoded IDS_Unary_Operator
- new in Unicode 15.1, hardcoded because trivial, and unlikely to change
- check that it has not changed:
(cd $UNICODE_DATA && grep -r --include=PropList.txt IDS_Unary_Operator)
- if it has changed, then update the implementation and the tests
- Since ICU 75, this property is tested in C++ intltest against ppucd.txt.
* check hardcoded ID_Compat_Math_Start & ID_Compat_Math_Continue
- new in Unicode 15.1, hardcoded because trivial, and unlikely to change
- check that they have not changed:
(cd $UNICODE_DATA && grep -r --include=PropList.txt ID_Compat_Math)
- if they have changed, then update the implementation and the tests
- Since ICU 75, these properties are tested in C++ intltest against ppucd.txt.
* Bazel build process
See https://unicode-org.github.io/icu/processes/unicode-update#bazel-build-process
for an overview and for setup instructions.
Consider running `bazelisk --version` outside of the $ICU_SRC folder
to find out the latest `bazel` version, and
copying that version number into the $ICU_SRC/.bazeliskrc config file.
(Revert if you find incompatibilities, or, better, update our build & config files.)
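  For example (the version number below is illustrative; use whatever `bazelisk --version` reports):
    (cd /tmp && bazelisk --version)
    echo 'USE_BAZEL_VERSION=7.1.1' > $ICU_SRC/.bazeliskrc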
* generate data files
- remember to define the environment variables
(see the start of the section for this Unicode version)
- cd $ICU_SRC
- optional, usually not necessary:
bazelisk clean
or even
bazelisk clean --expunge
- build/bootstrap/generate new files:
icu4c/source/data/unidata/generate.sh
* Since Unicode 15.1, the UTS #46 data derivation no longer looks at the decompositions (NFD),
so the affected characters are now just valid, no longer disallowed_STD3_valid.
Remove special handling of U+2260, U+226E, U+226F (isNonASCIIDisallowedSTD3Valid())
from uts46.cpp & UTS46.java,
and special test code from uts46test.cpp & UTS46Test.java.
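for example, a quick way to find the affected code and tests (illustrative):
  grep -rn isNonASCIIDisallowedSTD3Valid $ICU_SRC/icu4c/source/common
  egrep -rn '226[0EF]' --include=UTS46.java --include=UTS46Test.java $ICU_SRC/icu4j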
(remove this section next time)
* run & fix ICU4C tests
- Note: Some of the collation data and test data will be updated below,
so at this time we might get some collation test failures.
Ignore these for now.
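- for example, run the full C++ test suite (same pattern as the syntax-check build above):
  $ICU_OUT/icu4c$ echo;echo; date; make -j7 check &> out.txt ; tail -n 30 out.txt ; date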
- fix Unicode Tools class Segmenter to generate correct *BreakTest.txt files
- update CLDR GraphemeBreakTest.txt
cd ~/unitools/mine/Generated
cp UCD/15.1.0/cldr/GraphemeBreakTest-cldr.txt $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt
cp UCD/15.1.0/cldr/GraphemeBreakTest-cldr.html $CLDR_SRC/common/properties/segments/GraphemeBreakTest.html
cp $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt $ICU_SRC/icu4c/source/test/testdata
- Robin or Andy helps with RBBI & spoof check test failures
* collation: CLDR collation root, UCA DUCET
- UCA DUCET goes into Mark's Unicode tools,
and a tool-tailored version goes into CLDR, see
https://github.com/unicode-org/unicodetools/blob/main/docs/uca/index.md
- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
cp -v $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
cp -v $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
(note that the underscore before "Rules" is dropped in the ICU file name)
cp -v $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
- restore TODO diffs in UCARules.txt
meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
- update (ICU4C)/source/test/testdata/CollationTest_*.txt
and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
from the CLDR root files (..._CLDR_..._SHORT.txt)
cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
cp -v $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
- if CLDR common/uca/unihan-index.txt changes, then update
CLDR common/collation/root.xml <collation type="private-unihan">
and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
- generate data files, as above (generate.sh), now to pick up new collation data
- update CollationFCD.java:
copy & paste the initializers of lcccIndex[] etc. from
ICU4C/source/i18n/collationfcd.cpp to
ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
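for example, view the two files side by side and copy the initializers over (paths as above):
  meld $ICU_SRC/icu4c/source/i18n/collationfcd.cpp $ICU_SRC/icu4j/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java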
- rebuild ICU4C (make clean, make check, as usual)
* Unihan collators
https://github.com/unicode-org/unicodetools/blob/main/docs/unihan.md
- run Unicode Tools GenerateUnihanCollators & GenerateUnihanCollatorFiles,
check CLDR diffs, copy to CLDR, test CLDR, ... as documented there
- generate ICU zh collation data
instructions inspired by
https://github.com/unicode-org/icu/blob/main/tools/cldr/cldr-to-icu/README.txt and
https://github.com/unicode-org/icu/blob/main/icu4c/source/data/cldr-icu-readme.txt
+ setup:
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
(didn't work without setting JAVA_HOME,
nor with the Google default of /usr/local/buildtools/java/jdk
[Google security limitations in the XML parser])
export TOOLS_ROOT=$ICU_SRC/tools
export CLDR_DIR=$CLDR_SRC
export CLDR_DATA_DIR=$CLDR_DIR
(pointing to the "raw" data, not cldr-staging/.../production should be ok for the relevant files)
cd "$TOOLS_ROOT/cldr/lib"
./install-cldr-jars.sh "$CLDR_DIR"
+ generate the files we need
cd "$TOOLS_ROOT/cldr/cldr-to-icu"
ant -f build-icu-data.xml -DoutDir=/tmp/icu -DoutputTypes=coll,transforms -DlocaleIdFilter='zh.*'
+ diff
cd $ICU_SRC
meld icu4c/source/data/coll/zh.txt /tmp/icu/coll/zh.txt
meld icu4c/source/data/translit/Hani_Latn.txt /tmp/icu/translit/Hani_Latn.txt
+ copy into the source tree
cd $ICU_SRC
cp /tmp/icu/coll/zh.txt icu4c/source/data/coll/zh.txt
cp /tmp/icu/translit/Hani_Latn.txt icu4c/source/data/translit/Hani_Latn.txt
- rebuild ICU4C
* run & fix ICU4C tests, now with new CLDR collation root data
- run all tests with the collation test data *_SHORT.txt or the full files
(the full ones have comments, useful for debugging)
- note on intltest: if collate/UCAConformanceTest fails, then
utility/MultithreadTest/TestCollators will fail as well;
fix the conformance test before looking into the multi-thread test
* update Java data files
- to be safe, refresh just the UCD/UCA-related/derived files
- see (ICU4C)/source/data/icu4j-readme.txt
- mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
- $ICU_OUT/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
NOTE: If you get the error "No rule to make target 'out/build/icudt70l/uprops.icu'",
you need to reconfigure with unicore data; see the "configure" line above.
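(i.e., rerun the configure step from the start of this section:
 cd $ICU_OUT/icu4c && ICU_DATA_BUILDTOOL_OPTS=--include_uni_core_data ../../../doconfig-clang-dbg.sh)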
output:
...
make[1]: Entering directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt74b
mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt74b
LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt74l.dat ./out/icu4j/icudt74b.dat -s ./out/build/icudt74l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt74b
mv ./out/icu4j/"com/ibm/icu/impl/data/icudt74b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt74b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt74b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt74b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt74b"
jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt74b/
mkdir -p /tmp/icu4j/main/shared/data
cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt74b/
mkdir -p /tmp/icu4j/main/shared/data
cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
make[1]: Leaving directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
- copy the binary data files into the ICU4J tree
cd $ICU_OUT/icu4c/data/out/icu4j
cp -v com/ibm/icu/impl/data/$ICUDT/coll/* $ICU_SRC/icu4j/main/core/src/main/resources/com/ibm/icu/impl/data/$ICUDT/coll
cp -v com/ibm/icu/impl/data/$ICUDT/brkitr/* $ICU_SRC/icu4j/main/core/src/main/resources/com/ibm/icu/impl/data/$ICUDT/brkitr
cp -v com/ibm/icu/impl/data/$ICUDT/confusables.cfu $ICU_SRC/icu4j/main/core/src/main/resources/com/ibm/icu/impl/data/$ICUDT
cp -v com/ibm/icu/impl/data/$ICUDT/*.nrm $ICU_SRC/icu4j/main/core/src/main/resources/com/ibm/icu/impl/data/$ICUDT
cd com/ibm/icu/impl/data/$ICUDT/
ls *.icu | egrep -v "cnvalias.icu" | awk '{print "cp " $0 " $ICU_SRC/icu4j/main/core/src/main/resources/com/ibm/icu/impl/data/$ICUDT";}' | sh
- The procedure above is very conservative:
It refreshes only the parts of the ICU4J data that we think are affected by a Unicode data update.
It avoids dealing with any other discrepancies
between the source and generated data files.
*If* instead we wanted to refresh *all* of the ICU4J data from ICU4C:
$ICU_OUT/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
* refresh Java test .txt files
- copy new .txt files into ICU4J's main/core/src/test/resources/com/ibm/icu/dev/data/unicode
cd $ICU_SRC/icu4c/source/data/unidata
cp -v confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/core/src/test/resources/com/ibm/icu/dev/data/unicode
cd ../../test/testdata
cp -v BidiCharacterTest.txt BidiTest.txt IdnaTestV2.txt $ICU_SRC/icu4j/main/core/src/test/resources/com/ibm/icu/dev/data/unicode
cp -v $UNICODE_DATA/UCD/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/core/src/test/resources/com/ibm/icu/dev/data/unicode
* run & fix ICU4J tests
*** API additions
- send notice to icu-design about new born-@stable API (enum constants etc.)
*** CLDR numbering systems
- look for new sets of decimal digits (gc=Nd & nv=4) and add them to CLDR
for example:
~/icu/mine/src$ egrep ';gc=Nd.+;nv=4' icu4c/source/data/unidata/ppucd.txt > /tmp/icu/nv4-15.txt
~/icu/uni/src$ egrep ';gc=Nd.+;nv=4' icu4c/source/data/unidata/ppucd.txt > /tmp/icu/nv4-15.1.txt
~/icu/uni/src$ diff -u /tmp/icu/nv4-15.txt /tmp/icu/nv4-15.1.txt
-->
(empty this time)
or:
~/unitools/mine/src$ diff -u unicodetools/data/ucd/15.0.0/extracted/DerivedGeneralCategory.txt unicodetools/data/ucd/dev/extracted/DerivedGeneralCategory.txt | grep '; Nd' | egrep '^\+'
-->
(empty this time)
Unicode 15.1:
(none this time)
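if a future update does add new digit sets, one place they go is CLDR's numbering systems data, e.g.:
  grep -n 'numberingSystem id=' $CLDR_SRC/common/supplemental/numberingSystems.xml | tail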
*** merge the Unicode update branch back onto the main branch
- do not merge the icudata.jar and testdata.jar,
instead rebuild them from merged & tested ICU4C
- if there is a merge conflict in icudata.jar, here is one way to deal with it:
+ remove icudata.jar from the commit so that rebasing is trivial
+ ~/icu/uni/src$ git restore --source=main icu4j/main/shared/data/icudata.jar
+ ~/icu/uni/src$ git commit -a --amend
+ switch to main, pull updates, switch back to the dev branch
+ ~/icu/uni/src$ git rebase main
+ rebuild icudata.jar
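  (for example, as under "update Java data files" above:
   $ICU_OUT/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data/icudata.jar)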
+ ~/icu/uni/src$ git commit -a --amend
+ ~/icu/uni/src$ git push -f
- make sure that changes to Unicode tools are checked in:
https://github.com/unicode-org/unicodetools
---------------------------------------------------------------------------- ***
Unicode 15.1 update for ICU 74
@@ -234,55 +605,15 @@ export UNICODE_TOOLS=~/oss/unicodetools/mine/src
- new in Unicode 15.1, hardcoded because trivial, and unlikely to change
- check that it has not changed:
(cd $UNICODE_DATA && grep -r --include=PropList.txt IDS_Unary_Operator)
->
ucd/PropList.txt:2FFE..2FFF ; IDS_Unary_Operator # So [2] IDEOGRAPHIC DESCRIPTION CHAR...
- if it has changed, then update the implementation and the tests
- Since ICU 75, this property is tested in C++ intltest against ppucd.txt.
* check hardcoded ID_Compat_Math_Start & ID_Compat_Math_Continue
- new in Unicode 15.1, hardcoded because trivial, and unlikely to change
- check that they have not changed:
(cd $UNICODE_DATA && grep -r --include=PropList.txt ID_Compat_Math)
->
ucd/PropList.txt:00B2..00B3 ; ID_Compat_Math_Continue # No [2] SUPERSCRIPT TWO..SUPERSCRIPT THREE
ucd/PropList.txt:00B9 ; ID_Compat_Math_Continue # No SUPERSCRIPT ONE
ucd/PropList.txt:2070 ; ID_Compat_Math_Continue # No SUPERSCRIPT ZERO
ucd/PropList.txt:2074..2079 ; ID_Compat_Math_Continue # No [6] SUPERSCRIPT FOUR..SUPERSCRIPT NINE
ucd/PropList.txt:207A..207C ; ID_Compat_Math_Continue # Sm [3] SUPERSCRIPT PLUS SIGN..SUPERSCRIPT EQUALS SIGN
ucd/PropList.txt:207D ; ID_Compat_Math_Continue # Ps SUPERSCRIPT LEFT PARENTHESIS
ucd/PropList.txt:207E ; ID_Compat_Math_Continue # Pe SUPERSCRIPT RIGHT PARENTHESIS
ucd/PropList.txt:2080..2089 ; ID_Compat_Math_Continue # No [10] SUBSCRIPT ZERO..SUBSCRIPT NINE
ucd/PropList.txt:208A..208C ; ID_Compat_Math_Continue # Sm [3] SUBSCRIPT PLUS SIGN..SUBSCRIPT EQUALS SIGN
ucd/PropList.txt:208D ; ID_Compat_Math_Continue # Ps SUBSCRIPT LEFT PARENTHESIS
ucd/PropList.txt:208E ; ID_Compat_Math_Continue # Pe SUBSCRIPT RIGHT PARENTHESIS
ucd/PropList.txt:2202 ; ID_Compat_Math_Continue # Sm PARTIAL DIFFERENTIAL
ucd/PropList.txt:2207 ; ID_Compat_Math_Continue # Sm NABLA
ucd/PropList.txt:221E ; ID_Compat_Math_Continue # Sm INFINITY
ucd/PropList.txt:1D6C1 ; ID_Compat_Math_Continue # Sm MATHEMATICAL BOLD NABLA
ucd/PropList.txt:1D6DB ; ID_Compat_Math_Continue # Sm MATHEMATICAL BOLD PARTIAL DIFFERENTIAL
ucd/PropList.txt:1D6FB ; ID_Compat_Math_Continue # Sm MATHEMATICAL ITALIC NABLA
ucd/PropList.txt:1D715 ; ID_Compat_Math_Continue # Sm MATHEMATICAL ITALIC PARTIAL DIFFERENTIAL
ucd/PropList.txt:1D735 ; ID_Compat_Math_Continue # Sm MATHEMATICAL BOLD ITALIC NABLA
ucd/PropList.txt:1D74F ; ID_Compat_Math_Continue # Sm MATHEMATICAL BOLD ITALIC PARTIAL DIFFERENTIAL
ucd/PropList.txt:1D76F ; ID_Compat_Math_Continue # Sm MATHEMATICAL SANS-SERIF BOLD NABLA
ucd/PropList.txt:1D789 ; ID_Compat_Math_Continue # Sm MATHEMATICAL SANS-SERIF BOLD PARTIAL DIFFERENTIAL
ucd/PropList.txt:1D7A9 ; ID_Compat_Math_Continue # Sm MATHEMATICAL SANS-SERIF BOLD ITALIC NABLA
ucd/PropList.txt:1D7C3 ; ID_Compat_Math_Continue # Sm MATHEMATICAL SANS-SERIF BOLD ITALIC PARTIAL DIFFERENTIAL
ucd/PropList.txt:2202 ; ID_Compat_Math_Start # Sm PARTIAL DIFFERENTIAL
ucd/PropList.txt:2207 ; ID_Compat_Math_Start # Sm NABLA
ucd/PropList.txt:221E ; ID_Compat_Math_Start # Sm INFINITY
ucd/PropList.txt:1D6C1 ; ID_Compat_Math_Start # Sm MATHEMATICAL BOLD NABLA
ucd/PropList.txt:1D6DB ; ID_Compat_Math_Start # Sm MATHEMATICAL BOLD PARTIAL DIFFERENTIAL
ucd/PropList.txt:1D6FB ; ID_Compat_Math_Start # Sm MATHEMATICAL ITALIC NABLA
ucd/PropList.txt:1D715 ; ID_Compat_Math_Start # Sm MATHEMATICAL ITALIC PARTIAL DIFFERENTIAL
ucd/PropList.txt:1D735 ; ID_Compat_Math_Start # Sm MATHEMATICAL BOLD ITALIC NABLA
ucd/PropList.txt:1D74F ; ID_Compat_Math_Start # Sm MATHEMATICAL BOLD ITALIC PARTIAL DIFFERENTIAL
ucd/PropList.txt:1D76F ; ID_Compat_Math_Start # Sm MATHEMATICAL SANS-SERIF BOLD NABLA
ucd/PropList.txt:1D789 ; ID_Compat_Math_Start # Sm MATHEMATICAL SANS-SERIF BOLD PARTIAL DIFFERENTIAL
ucd/PropList.txt:1D7A9 ; ID_Compat_Math_Start # Sm MATHEMATICAL SANS-SERIF BOLD ITALIC NABLA
ucd/PropList.txt:1D7C3 ; ID_Compat_Math_Start # Sm MATHEMATICAL SANS-SERIF BOLD ITALIC PARTIAL DIFFERENTIAL
- if they have changed, then update the implementation and the tests
- TODO: There is a ticket for using ppucd.txt in test code.
Do that and check these hardcoded properties against that.
- Since ICU 75, these properties are tested in C++ intltest against ppucd.txt.
* Bazel build process