diff --git a/tools/unicodetools/readme.html b/tools/unicodetools/readme.html
new file mode 100644
index 00000000000..e111393e715
--- /dev/null
+++ b/tools/unicodetools/readme.html
@@ -0,0 +1,216 @@
+New Page 18

/**
+*******************************************************************************
+* Copyright (C) 1996-2001, International Business Machines Corporation and *
+* others. All Rights Reserved. *
+*******************************************************************************
+*
+* $Source: /xsrl/Nsvn/icu/unicodetools/readme.html,v $
+* $Date: 2004/11/13 23:28:39 $
+* $Revision: 1.1 $
+*
+*******************************************************************************
+*/

+

UnicodeTools

+

This file provides instructions for building and running the UnicodeTools, which
+can be used to:

+ +

WARNING!!

+ +

Instructions:

+

0. You will need to get ICU4J on your system, using CVS.

+

The rest of this document assumes that you have set up CVS to check out the ICU4J project into C:\ICU4J
+
+You need both the main icu4j and a subproject called unicodetools. See: + +http://oss.software.ibm.com/icu/develop/cvs.html. Inside unicodetools, look at com/ibm/text. The +main directories of interest are UCD, UCA and utility.

+

0a. If you are using Eclipse for your IDE, look at the instructions on + +http://oss.software.ibm.com/icu/docs/eclipse_howto/eclipse_howto.htm

+

Set up Eclipse to build two projects: ICU4J and UnicodeTools:
+
+Project Name: ICU4J
+Directory: C:\ICU4J\icu4j
+Default Output Folder: ICU4J/classes
+
+Project Name: UnicodeTools
+Directory: C:\ICU4J\unicodetools
+Default Output Folder: UnicodeTools/classes
+
+After Eclipse is set up with these, exclude certain files from UnicodeTools:
+
+Right-Click UnicodeTools > Properties > Java Build Path > Exclusions
+com/ibm/rbm/
+com/ibm/text/utility/UnicodeMapInt.java
+com/ibm/text/utility/TestUtility.java
+com/ibm/text/UCD/GenerateThaiBreaks-old.java/
+com/ibm/text/UCD/ProcessUnihan.java/
+com/ibm/text/UCA/WriteHTMLCollation.java/
+
+UnicodeTools must also include the ICU4J project, with
+
+Right-Click UnicodeTools > Properties > Java Build Path > Projects

+

1. In UCD, you must edit UCD_Types.java at the top, to set the directories for the build:

+

public static final String DATA_DIR = "C:\\DATA\\";
+public static final String UCD_DIR = DATA_DIR + "UCD\\";
+public static final String BIN_DIR = DATA_DIR + "BIN\\";
+public static final String GEN_DIR = DATA_DIR + "GEN\\";
+
+Make sure that each of these directories exists. Also make sure that the following
+exist:
+
+<GEN_DIR>/DerivedData
+<GEN_DIR>/DerivedData/ExtractedProperties
+<UCD_DIR>/EXTRAS-Update
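The directory checks above can also be scripted. The following is a minimal sketch, not part of UnicodeTools: the class name UnicodeToolsDirs and the method ensure are hypothetical, and it assumes that UCD, BIN and GEN all sit directly under one data root (C:\DATA\ in this readme).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of UnicodeTools.
final class UnicodeToolsDirs {
    // Directories named in step 1, relative to the data root (an assumption).
    static final String[] REQUIRED = {
        "UCD",
        "BIN",
        "GEN",
        "GEN/DerivedData",
        "GEN/DerivedData/ExtractedProperties",
        "UCD/EXTRAS-Update"
    };

    /** Creates every missing directory and returns the ones that had to be created. */
    static List<Path> ensure(Path dataRoot) {
        List<Path> created = new ArrayList<>();
        for (String rel : REQUIRED) {
            Path dir = dataRoot.resolve(rel);
            if (!Files.isDirectory(dir)) {
                try {
                    Files.createDirectories(dir);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                created.add(dir);
            }
        }
        return created;
    }
}
```

For example, UnicodeToolsDirs.ensure(java.nio.file.Paths.get("C:\\DATA")) would create whatever is missing; an empty result means everything was already in place.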

+

2. Download all of the UnicodeData files for each version into UCD_DIR.

+

The folder names must be of the form: "3.2.0-Update", so rename the folders on the
+Unicode site to this format.
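The renaming can be sketched in code. This is only an illustration: the class UpdateFolderNames is hypothetical, and it assumes that a folder on the site is named N.M-Update or N.M.P-Update and that a missing third field stands for 0.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of UnicodeTools.
final class UpdateFolderNames {
    // Assumed input shapes: "3.2-Update" or "3.1.1-Update".
    private static final Pattern SITE_NAME =
        Pattern.compile("(\\d+)\\.(\\d+)(?:\\.(\\d+))?-Update");

    /** Returns the three-field form, for example "3.2-Update" becomes "3.2.0-Update". */
    static String normalize(String folder) {
        Matcher m = SITE_NAME.matcher(folder);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not an N.M[.P]-Update name: " + folder);
        }
        // Assumption: an absent third field means update version 0.
        String update = m.group(3) == null ? "0" : m.group(3);
        return m.group(1) + "." + m.group(2) + "." + update + "-Update";
    }
}
```

Names that already have three fields are returned unchanged, so the method can be applied to every folder.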

+

2a. Ensure Complete Release

+

If you are downloading any "incomplete" release (one that does not contain a complete set of data files for that release), you also need to download the previous complete release. Most of the N.M-Update directories are complete, *except*:

+

4.0-Update, which does not contain a copy of Unihan.txt and some other files
+3.1-Update, which does not contain a copy of BidiMirroring.txt

+

Also, make the following changes to UnicodeData for 1.1.5:

+

Delete

+
3400;HANGUL SYLLABLE KIYEOK A;Lo;0;L;1100 1161;;;;N;;;;;
+4DFF;HANGUL SYLLABLE MIEUM WEO RIEUL-THIEUTH;Lo;0;L;1106 116F 11B4;;;;N;;;;;
+4E00;;Lo;0;L;;;;;N;;;;;
+

Add:

+
4E00;;Lo;0;L;;;;;N;;;;;
+9FA5;;Lo;0;L;;;;;N;;;;;
+E000;;Co;0;L;;;;;N;;;;;
+F8FF;;Co;0;L;;;;;N;;;;;
+

And from a later version of Unicode, add:

+
F900;CJK COMPATIBILITY IDEOGRAPH-F900;Lo;0;L;8C48;;;;N;;;;;
+...
+FA2D;CJK COMPATIBILITY IDEOGRAPH-FA2D;Lo;0;L;9DB4;;;;N;;;;;
+

2b. UCA data

+

If you are building any of the UCA tools, you need to get a copy of the UCA data file
+from http://www.unicode.org/reports/tr10/#AllKeys. The default location for this is:
+
+BASE_DIR + "Collation\allkeys" + VERSION + ".txt".
+
+If you have it in a different location, change the value of KEYS in UCA.java, and
+the value of BASE_DIR.
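To see how the default location expands, here is a small sketch of the concatenation. The class KeysPath and its parameters are hypothetical; the point is only that, for the result to match a file name such as allkeys-3.1.1.txt, the version string presumably has to carry its own leading hyphen.

```java
// Hypothetical helper, not part of UnicodeTools.
final class KeysPath {
    /** Mirrors the readme's expression: BASE_DIR + "Collation\allkeys" + VERSION + ".txt". */
    static String keysFile(String baseDir, String version) {
        return baseDir + "Collation\\allkeys" + version + ".txt";
    }
}
```

With a base directory of C:\DATA\ and a version string of -3.1.1 (both assumptions), this gives C:\DATA\Collation\allkeys-3.1.1.txt.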

+

2c. Here is an example of the default directory structure with files:

+
C://DATA/
+
+        BIN/
+    
+        Collation/
+            allkeys-3.1.1.txt
+        
+        GEN/
+            DerivedData/
+                ExtractedProperties/
+        UCD/
+            3.0.0-Update/
+                Unihan-3.2.0.txt
+                ...
+            3.0.1-Update/
+                ...
+            3.1.0-Update/
+                ...
+            3.1.1-Update/
+                ...
+            3.2.0-Update/
+                ...
+            4.0.0-Update/
+                ArabicShaping-4.0.0d14b.txt
+                BidiMirroring-4.0.0d1b.txt
+                ...
+            EXTRAS-Update/
+

3. Versions

+

All of the following take "version X" in the options you give to Java (either on the command line, or in the Eclipse 'run' options). If you want a specific version like 3.1.0, then you would write "version 3.1.0". If you want the latest version (4.0.0), you can omit the "version X".
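The "version X" convention can be pictured with a short sketch. It is not the tools' actual argument handling: the class VersionArg is hypothetical, and the default is passed in by the caller rather than hard-coded.

```java
// Hypothetical helper, not part of UnicodeTools.
final class VersionArg {
    /** Returns the token that follows "version", or the supplied default when it is absent. */
    static String parse(String[] args, String latest) {
        for (int i = 0; i < args.length - 1; i++) {
            if (args[i].equalsIgnoreCase("version")) {
                return args[i + 1];
            }
        }
        return latest;
    }
}
```

So the options "version 3.1.0 build" select 3.1.0, while "build" alone falls back to whatever the caller treats as the latest version.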

+

4. Running UCD, you will use com.ibm.text.UCD.Main as your main class.

+

The Working directory has to be C:\ICU4J\unicodetools\com\ibm\text\UCD
+(In Eclipse you can also use ${workspace_loc:UnicodeTools/com/ibm/text/UCD}, which abstracts away +the location.)
+
+The same for UCA:

+

main: com.ibm.text.UCA.Main
+directory: +C:\ICU4J\unicodetools\com\ibm\text\UCA
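Outside Eclipse, the same settings (main class, working directory, options) can be handed to a launcher. The sketch below only assembles the command and does not start it. The class ToolLauncher is hypothetical, and the classpath value is an assumption that depends on where your two projects put their class files.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of UnicodeTools.
final class ToolLauncher {
    /** Builds "java -cp <classpath> <mainClass> <options...>" with the given working directory. */
    static ProcessBuilder command(String classpath, String mainClass,
                                  File workingDir, String... options) {
        List<String> cmd = new ArrayList<>(Arrays.asList("java", "-cp", classpath, mainClass));
        cmd.addAll(Arrays.asList(options));
        return new ProcessBuilder(cmd).directory(workingDir);
    }
}
```

For the UCD tools this might be called as ToolLauncher.command(classpath, "com.ibm.text.UCD.Main", new File("C:\\ICU4J\\unicodetools\\com\\ibm\\text\\UCD"), "version", "3.1.0", "build"), followed by inheritIO().start() to run it.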

+

4a. BIN

+

For each version, the tools build a set of binary data in BIN that contains the information for that release. This is done automatically, or you can do it manually with the options
+
+version X build
+
+This builds a compressed format of all the UCD data (except blocks and Unihan) into the BIN directory. Don't worry about the voluminous console messages, unless one says "FAIL".
+
+You have to manually do this if you change any of the data files in that +version!!

+

Note: if for any reason you modify the binary format of the BIN files, you also have to bump the +value in that file:
+
+static final byte BINARY_FORMAT = 8; // bumped if binary format of UCD changes
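Why the bump matters can be shown with a generic format-byte check. This is a sketch of the general technique only. How the tools really store and test BINARY_FORMAT is not described here, so the class FormatByteSketch, the one-byte header layout and the error text are all assumptions.

```java
import java.util.Arrays;

// Hypothetical illustration, not the tools' actual BIN layout.
final class FormatByteSketch {
    static final byte BINARY_FORMAT = 8;

    /** Prefixes the payload with a format byte (assumed layout). */
    static byte[] wrap(byte format, byte[] payload) {
        byte[] file = new byte[payload.length + 1];
        file[0] = format;
        System.arraycopy(payload, 0, file, 1, payload.length);
        return file;
    }

    /** Returns the payload, refusing data written with a different format byte. */
    static byte[] unwrap(byte[] file) {
        if (file.length == 0 || file[0] != BINARY_FORMAT) {
            throw new IllegalStateException("BIN data has a different format; rebuild it");
        }
        return Arrays.copyOfRange(file, 1, file.length);
    }
}
```

Under this scheme, data written before a bump is rejected instead of being misread, which is the reason to change the constant together with the format.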

+

4b. To build the Unicode files for a particular version X, run the Main with the following +argument:

+

MakeUnicodeFiles.generateFile

+

This will execute the commands in the file MakeUnicodeFiles.txt.

+

You will edit that file if you want a different 'd' version for the files, OR if you want to +change which files are built. At the top of the file you will see the following text:

+
Generate: 
+
DeltaVersion: 7
+

4c. To change which files are built, put any number of regular expressions separated by spaces after Generate. E.g.,

+
Generate: .*line.* prop.*
+

The matching is case-insensitive.
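The effect of a Generate line can be previewed with a small sketch. It is not the code in MakeUnicodeFiles: the class GenerateFilter is hypothetical, and it assumes that each expression must match the whole file name and that an empty Generate line selects every file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of UnicodeTools.
final class GenerateFilter {
    /** Returns the names matched by any of the space-separated expressions. */
    static List<String> select(String generateValue, List<String> names) {
        String trimmed = generateValue.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>(names); // assumption: empty means "build everything"
        }
        List<Pattern> patterns = new ArrayList<>();
        for (String regex : trimmed.split("\\s+")) {
            patterns.add(Pattern.compile(regex, Pattern.CASE_INSENSITIVE));
        }
        List<String> selected = new ArrayList<>();
        for (String name : names) {
            for (Pattern p : patterns) {
                if (p.matcher(name).matches()) { // assumption: whole-name match
                    selected.add(name);
                    break;
                }
            }
        }
        return selected;
    }
}
```

With the value ".*line.* prop.*" and the names LineBreak, PropList and Scripts, the first two are selected and Scripts is skipped.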

+

4d. To change the 'd' number that is appended to the generated file names, change the DeltaVersion.
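As an illustration of where the number ends up, the sketch below composes a file name. The pattern is inferred from names such as ArabicShaping-4.0.0d14b.txt in the directory listing above, without the trailing letter. The class DeltaName is hypothetical and the tools may build names differently.

```java
// Hypothetical helper, not part of UnicodeTools.
final class DeltaName {
    /** Assumed pattern: <name>-<version>d<DeltaVersion>.txt */
    static String fileName(String baseName, String version, int deltaVersion) {
        return baseName + "-" + version + "d" + deltaVersion + ".txt";
    }
}
```

Under that assumption, a DeltaVersion of 7 for version 4.0.0 would give names like LineBreak-4.0.0d7.txt.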

+

4e. To run basic consistency checking, run:

+

version X verify
+
+Don't worry about any console messages except those that say FAIL.
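If you capture the console output, the lines worth reading can be picked out mechanically. A minimal sketch, assuming the output is available as one string; the class FailScan is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of UnicodeTools.
final class FailScan {
    /** Returns every console line that contains the text FAIL. */
    static List<String> failures(String consoleOutput) {
        List<String> hits = new ArrayList<>();
        for (String line : consoleOutput.split("\\r?\\n")) {
            if (line.contains("FAIL")) {
                hits.add(line);
            }
        }
        return hits;
    }
}
```

An empty result means no message asked for attention.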

+

4f. Output

+

The files will be generated in the GEN directories.

+ +

5. Running UCA, you will use com.ibm.text.UCA.Main as your main class.

+

5a. To build all the UCA files used by ICU, use the option:

+

java <UCA>Main ICU

+

6. To build all the charts, use the UCA project, with options: normalizationChart caseChart +scriptChart indexChart

+ + + +