Jitterbug 3092: initial checkin of the readme file for the RBBI tools.

X-SVN-Rev: 12607
2025-04-18 11:14:22 +00:00 · 2003-07-09 23:12:01 +00:00 · 2003-07-09 23:12:01 +00:00 · 8c0145ac8e
commit 8c0145ac8e
parent 268999268d
1 changed files with 61 additions and 0 deletions
--- a/icu4j/src/com/ibm/icu/dev/tool/rbbi/readme.html
+++ b/icu4j/src/com/ibm/icu/dev/tool/rbbi/readme.html
@ -0,0 +1,61 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html>
+<head>
+  <meta http-equiv="content-type"
+ content="text/html; charset=ISO-8859-1">
+  <title>README For RBBI Tools</title>
+</head>
+<body>
+<h3>What Are These Tools?</h3>
+This directory contains two tools, WriteTablesToFiles, which converts
+the Java&nbsp; BreakIterators into .brk files for ICU4C, and
+BuildDictionaryFile, which builds the binary the Thai word break
+dictionary from a Unicode text file containing a list of Thai words.
+The rest of this document describes how to use these tools.<br>
+<h3>How To Build The ICU4C BreakIterator Files</h3>
+The RuleBasedBreakIterator code was originally developed for ICU4J, and
+then ported to ICU4C. For various reasons, the code which compiled the
+state tables from the rule text was hard to port. Instead the
+WriteTablesToFiles tool was wirtten to read in the Java data and write
+the .brk files which ICU4C reads. Later the RBBI code was re-written
+for ICU4C, including the ability to compile the state tables from rules
+stored in text files. This means that the WriteTablesToFiles tool is
+now obsolete.<br>
+<br>
+<h3>How To Build The Thai Word Break Dictionary</h3>
+The Thai word berak code was developed originally for ICU4J, and then
+ported to ICU4C - the dictionary builder tool was never ported, so you
+have to use the Java tool to build the dictionary file for ICU4C. On
+the other hand, all of the rest of the ICU locale data was developed
+originally for
+ICU4C, and a tool was written to covert the ICU4C locale data to Java
+resource bundles for use by ICU4J. Consequently, the process of
+building the Thai
+word break dictionary for ICU4C and
+ICU4J is a bit convoluted. Here are the steps:<br>
+<div style="margin-left: 40px;">
+<ol>
+  <li>Download and build both ICU4C and ICU4J on a <span
+ style="font-weight: bold;">Big Endian</span> machine.<br>
+  </li>
+  <li>Run the following command line to build the Thai dictionary file:<br>
+java -classpath $icu4j_root/classes
+com.ibm.icu.dev.tool.rbbi.BuildDictionaryFile
+$icu4j_root/src/com/ibm/icu/dev/data/thai6.ucs Unicode
+$icu_root/soruce/data/brkitr/thai_dict.brk</li>
+  <li>Rebuild the ICU4C resources.</li>
+  <li>Rebuild the ICU4J ICULocaleData.jar file. (See <a
+ href="../../../../../../../readme.html">the ICU4J readme file</a> for
+instructions)</li>
+  <li>Move ICULocaleData.jar from $icu_root/source/data/locales/java to
+$icu4j_root/src/com/ibm/icu/impl/data</li>
+  <li>Build ICU4J's _resources target to unjar the new files.<br>
+  </li>
+</ol>
+</div>
+In the above, $icu_root is the root of your ICU4C source tree, for
+example
+"~/dev/icu" and $icu4j_root is the root of your ICU4J source tree, for
+example "~/dev/icu4j".<br>
+</body>
+</html>