diff --git a/tools/unicodetools/readme.html b/tools/unicodetools/readme.html new file mode 100644 index 00000000000..e111393e715 --- /dev/null +++ b/tools/unicodetools/readme.html @@ -0,0 +1,216 @@ + + +
+ + + + +/**
+*******************************************************************************
+* Copyright (C) 1996-2001, International Business Machines Corporation and *
+* others. All Rights Reserved. *
+*******************************************************************************
+*
+* $Source: /xsrl/Nsvn/icu/unicodetools/readme.html,v $
+* $Date: 2004/11/13 23:28:39 $
+* $Revision: 1.1 $
+*
+*******************************************************************************
+*/
This file provides instructions for building and running the UnicodeTools, which
+can be used to:
WARNING!!
+The rest of this will assume that you have set up CVS so that you load the ICU4J project into
+C:\ICU4J
+
+You need both the main icu4j and a subproject called unicodetools. See:
+
+http://oss.software.ibm.com/icu/develop/cvs.html. Inside unicodetools, look at com/ibm/text. The
+main directories of interest are UCD, UCA and utility.
Set up Eclipse to build two projects: ICU4J and UnicodeTools:
+
+Project Name: ICU4J
+Directory: C:\ICU4J\icu4j
+Default output folder = ICU4J/classes
+
+Project Name: UnicodeTools
+Directory: C:\ICU4J\unicodetools
+Default Output Folder: UnicodeTools/classes
+
+After Eclipse is set up with these, exclude certain files from UnicodeTools:
+
+Right-Click UnicodeTools > Properties > Java Build Path > Exclusions
+com/ibm/rbm/
+com/ibm/text/utility/UnicodeMapInt.java
+com/ibm/text/utility/TestUtility.java
+com/ibm/text/UCD/GenerateThaiBreaks-old.java/
+com/ibm/text/UCD/ProcessUnihan.java/
+com/ibm/text/UCA/WriteHTMLCollation.java/
+
+UnicodeTools must also include the ICU4J project, with
+
+Right-Click UnicodeTools > Properties > Java Build Path > Projects
public static final String DATA_DIR = "C:\\DATA\\";
+public static final String UCD_DIR = BASE_DIR + "UCD\\";
+public static final String BIN_DIR = DATA_DIR + "BIN\\";
+public static final String GEN_DIR = DATA_DIR + "GEN\\";
+
+Make sure that each of these directories exists. Also make sure that the following
+exist:
+
+<GEN_DIR>/DerivedData
+<GEN_DIR>/DerivedData/ExtractedProperties
+<UCD_DIR>/EXTRAS-Update
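The directory checklist above can be scripted. The following is only a sketch, not part of the UnicodeTools: the class name `DirSetup` and its methods are hypothetical, the folder names are taken from the constants and list above, and it assumes a Java version that has `java.nio.file` (Java 7 or later).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the UnicodeTools sources.
public class DirSetup {
    // Lists the directories this readme asks for, resolved under dataDir.
    public static List<Path> requiredDirs(Path dataDir) {
        Path ucd = dataDir.resolve("UCD");
        Path bin = dataDir.resolve("BIN");
        Path gen = dataDir.resolve("GEN");
        List<Path> dirs = new ArrayList<>();
        dirs.add(ucd);
        dirs.add(bin);
        dirs.add(gen);
        dirs.add(gen.resolve("DerivedData"));
        dirs.add(gen.resolve("DerivedData").resolve("ExtractedProperties"));
        dirs.add(ucd.resolve("EXTRAS-Update"));
        return dirs;
    }

    // Creates every required directory; existing directories are left alone.
    public static void createAll(Path dataDir) throws IOException {
        for (Path dir : requiredDirs(dataDir)) {
            Files.createDirectories(dir);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the data root is C:\DATA unless a path is given.
        Path dataDir = Paths.get(args.length > 0 ? args[0] : "C:\\DATA");
        createAll(dataDir);
        for (Path dir : requiredDirs(dataDir)) {
            System.out.println(dir + " exists: " + Files.isDirectory(dir));
        }
    }
}
```

Passing the data root as an argument keeps the sketch usable if your copy of the tools points at a different location.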
The folder names must be of the form "3.2.0-Update", so rename the folders you download from the
+Unicode site to this format.
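A quick way to check folder names before running the tools is a regular expression. This is a sketch under an assumption: the readme gives only the single example "3.2.0-Update", so the pattern below (three numeric fields followed by "-Update") is an inference, and the class `UpdateFolderName` is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical checker, not part of the UnicodeTools sources.
public class UpdateFolderName {
    // Assumed form: major.minor.update followed by "-Update".
    private static final Pattern FORM =
            Pattern.compile("\\d+\\.\\d+\\.\\d+-Update");

    public static boolean isValid(String folderName) {
        return FORM.matcher(folderName).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"3.2.0-Update", "4.0-Update"};
        for (String name : samples) {
            System.out.println(name + " -> " + isValid(name));
        }
    }
}
```

The check applies only to version folders; EXTRAS-Update is a separate, non-versioned folder and is not expected to match.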
If you are downloading any "incomplete" release (one that does not contain a complete set of data
+files for that release), you also need to download the previous complete release. Most of the N.M-Update
+directories are complete, *except*:
+4.0-Update, which does not contain a copy of Unihan.txt and some other files
+3.1-Update, which does not contain a copy of BidiMirroring.txt
Also, make the following changes to UnicodeData for 1.1.5:
+Delete
+3400;HANGUL SYLLABLE KIYEOK A;Lo;0;L;1100 1161;;;;N;;;;;
+4DFF;HANGUL SYLLABLE MIEUM WEO RIEUL-THIEUTH;Lo;0;L;1106 116F 11B4;;;;N;;;;;
+4E00;+;Lo;0;L;;;;;N;;;;;
Add:
+4E00;+;Lo;0;L;;;;;N;;;;;
+9FA5; ;Lo;0;L;;;;;N;;;;;
+E000; ;Co;0;L;;;;;N;;;;;
+F8FF; ;Co;0;L;;;;;N;;;;;
And from a later version of Unicode, add:
+F900;CJK COMPATIBILITY IDEOGRAPH-F900;Lo;0;L;8C48;;;;N;;;;;
+...
+FA2D;CJK COMPATIBILITY IDEOGRAPH-FA2D;Lo;0;L;9DB4;;;;N;;;;;
+
If you are building any of the UCA tools, you need to get a copy of the UCA data file
+from http://www.unicode.org/reports/tr10/#AllKeys. The default location for this is:
+
+BASE_DIR + "Collation\allkeys" + VERSION + ".txt".
+
+If you have it in a different location, change the value of KEYS in UCA.java, and
+the value of BASE_DIR.
C://DATA/
+
+    BIN/
+
+    Collation/
+        allkeys-3.1.1.txt
+
+    GEN/
+        DerivedData/
+            ExtractedProperties
+    UCD/
+        3.0.0-Update/
+            Unihan-3.2.0.txt
+            ...
+        3.0.1-Update/
+            ...
+        3.1.0-Update/
+            ...
+        3.1.1-Update/
+            ...
+        3.2.0-Update/
+            ...
+        4.0.0-Update/
+            ArabicShaping-4.0.0d14b.txt
+            BidiMirroring-4.0.0d1b.txt
+            ...
+        EXTRAS-Update/
+
All of the following take "version X" in the options you give to Java (either on the
+command line, or in the Eclipse 'run' options). If you want a specific version like 3.1.0, then you
+would write "version 3.1.0". If you want the latest version (4.1.0), you can omit the "version X".
+The Working directory has to be C:\ICU4J\unicodetools\com\ibm\text\UCD
+(In Eclipse you can also use ${workspace_loc:UnicodeTools/com/ibm/text/UCD}, which abstracts away
+the location.)
+
+The same for UCA:
main: com.ibm.text.UCD.Main
+directory:
+C:\ICU4J\unicodetools\com\ibm\text\UCA
For each version, the tools build a set of binary data in BIN that contains the information for
+that release. This is done automatically, or you can do it manually with the options
+
+version X build
+
+This builds a compressed format of all the UCD data (except blocks and Unihan) in the BIN
+directory. Don't worry about the voluminous console messages, unless one says "FAIL".
+
+You have to manually do this if you change any of the data files in that
+version!!
Note: if for any reason you modify the binary format of the BIN files, you also have to bump the
+value in that file:
+
+static final byte BINARY_FORMAT = 8; // bumped if binary format of UCD changes
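To illustrate why the bump matters, here is a sketch of a format-byte check. It is not the actual layout of the BIN files or the tools' real reader: the class `FormatCheck`, its methods, and the idea that the format byte is the first byte of the file are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical illustration, not the UnicodeTools binary reader or writer.
public class FormatCheck {
    static final byte BINARY_FORMAT = 8; // bumped if the binary format changes

    // Writes the format byte first (an assumption), then the payload.
    public static byte[] write(byte[] payload) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeByte(BINARY_FORMAT);
        out.write(payload);
        out.flush();
        return bytes.toByteArray();
    }

    // Returns true only if the data was written with the current format.
    public static boolean isCurrent(byte[] file) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
        return in.readByte() == BINARY_FORMAT;
    }

    public static void main(String[] args) throws IOException {
        byte[] fresh = write(new byte[] {1, 2, 3});
        byte[] stale = {7, 1, 2, 3}; // written by an older format
        System.out.println("fresh is current: " + isCurrent(fresh));
        System.out.println("stale is current: " + isCurrent(stale));
    }
}
```

With a check of this kind, data written before a bump is recognized as stale and can be rebuilt instead of being misread.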
MakeUnicodeFiles.generateFile
+This will execute the commands in the file MakeUnicodeFiles.txt.
+Edit that file if you want a different 'd' version for the files, or if you want to
+change which files are built. At the top of the file you will see the following text:
+Generate:
+
DeltaVersion: 7
+
Generate: .*line.* prop.*
+
The matching is case-insensitive.
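As a rough illustration of how a Generate value such as `.*line.* prop.*` could select files, the sketch below treats the value as whitespace-separated regular expressions and tests each one case-insensitively against a file name. The class `GenerateFilter`, the whole-name match, and the sample file names are assumptions; the real logic lives in MakeUnicodeFiles and may differ.

```java
import java.util.regex.Pattern;

// Hypothetical illustration, not the MakeUnicodeFiles implementation.
public class GenerateFilter {
    // Returns true if any pattern in the Generate value matches the whole name.
    // An empty value selects nothing in this sketch; the real tool may differ.
    public static boolean selected(String generateValue, String fileName) {
        for (String regex : generateValue.trim().split("\\s+")) {
            Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
            if (p.matcher(fileName).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String generate = ".*line.* prop.*";
        // Sample names chosen for illustration only.
        String[] names = {"LineBreak", "PropList", "DerivedAge"};
        for (String name : names) {
            System.out.println(name + " -> " + selected(generate, name));
        }
    }
}
```

Because of `Pattern.CASE_INSENSITIVE`, the lowercase pattern `.*line.*` also selects a name such as LineBreak.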
+version X verify
+
+Don't worry about any console messages except those that say FAIL.
The files will be generated in the GEN directories.
+java <UCA>Main ICU
+