mirror of
https://github.com/unicode-org/icu.git
synced 2025-04-18 11:14:22 +00:00
Jitterbug 3092: initial checkin of the readme file for the RBBI tools.
X-SVN-Rev: 12607
parent
268999268d
commit
8c0145ac8e
1 changed files with 61 additions and 0 deletions
61
icu4j/src/com/ibm/icu/dev/tool/rbbi/readme.html
Normal file
@ -0,0 +1,61 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="content-type"
content="text/html; charset=ISO-8859-1">
<title>README For RBBI Tools</title>
</head>
<body>
<h3>What Are These Tools?</h3>
This directory contains two tools: WriteTablesToFiles, which converts
the Java BreakIterators into .brk files for ICU4C, and
BuildDictionaryFile, which builds the binary Thai word break
dictionary from a Unicode text file containing a list of Thai words.
The rest of this document describes how to use these tools.<br>
<h3>How To Build The ICU4C BreakIterator Files</h3>
The RuleBasedBreakIterator code was originally developed for ICU4J and
then ported to ICU4C. For various reasons, the code which compiled the
state tables from the rule text was hard to port. Instead, the
WriteTablesToFiles tool was written to read in the Java data and write
the .brk files which ICU4C reads. Later, the RBBI code was rewritten
for ICU4C, including the ability to compile the state tables from rules
stored in text files. This means that the WriteTablesToFiles tool is
now obsolete.<br>
<br>
<h3>How To Build The Thai Word Break Dictionary</h3>
The Thai word break code was originally developed for ICU4J and then
ported to ICU4C. The dictionary builder tool was never ported, so you
have to use the Java tool to build the dictionary file for ICU4C. On
the other hand, all of the rest of the ICU locale data was originally
developed for ICU4C, and a tool was written to convert the ICU4C locale
data to Java resource bundles for use by ICU4J. Consequently, the
process of building the Thai word break dictionary for ICU4C and
ICU4J is a bit convoluted. Here are the steps:<br>
<div style="margin-left: 40px;">
<ol>
<li>Download and build both ICU4C and ICU4J on a <span
style="font-weight: bold;">Big Endian</span> machine.<br>
</li>
<li>Run the following command line to build the Thai dictionary file:<br>
java -classpath $icu4j_root/classes
com.ibm.icu.dev.tool.rbbi.BuildDictionaryFile
$icu4j_root/src/com/ibm/icu/dev/data/thai6.ucs Unicode
$icu_root/source/data/brkitr/thai_dict.brk</li>
<li>Rebuild the ICU4C resources.</li>
<li>Rebuild the ICU4J ICULocaleData.jar file. (See <a
href="../../../../../../../readme.html">the ICU4J readme file</a> for
instructions.)</li>
<li>Move ICULocaleData.jar from $icu_root/source/data/locales/java to
$icu4j_root/src/com/ibm/icu/impl/data</li>
<li>Build ICU4J's _resources target to unjar the new files.<br>
</li>
</ol>
</div>
In the above, $icu_root is the root of your ICU4C source tree, for
example "~/dev/icu", and $icu4j_root is the root of your ICU4J source
tree, for example "~/dev/icu4j".<br>
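<br>
The steps above ask for a Big Endian machine without saying why. One
plausible reason, which is an assumption here and not something this
readme confirms, is that the Java tool writes multi-byte values high
byte first (as java.io.DataOutputStream always does), so the resulting
file would only match the native byte order of a big-endian ICU4C build.
The following minimal sketch shows that byte order and reports what the
current host is. The class name EndianCheck and its methods are
hypothetical and are not part of the RBBI tools.<br>

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteOrder;

// Hypothetical helper, not part of ICU4J: checks host byte order and
// shows the byte order Java's DataOutputStream uses for an int.
class EndianCheck {

    /** Returns true when the JVM reports a big-endian host. */
    public static boolean hostIsBigEndian() {
        return ByteOrder.nativeOrder() == ByteOrder.BIG_ENDIAN;
    }

    /** Serializes one int with DataOutputStream, which writes the high byte first. */
    public static byte[] javaIntBytes(int value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(value);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = javaIntBytes(0x01020304);
        System.out.printf("DataOutputStream wrote: %02x %02x %02x %02x%n",
                bytes[0], bytes[1], bytes[2], bytes[3]);
        System.out.println(hostIsBigEndian()
                ? "Host is big-endian."
                : "Host is little-endian; step 1 asks for a big-endian machine.");
    }
}
```

To try it, save the class as EndianCheck.java, compile it with javac,
and run "java EndianCheck" before starting step 1.<br>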
</body>
</html>