ICU-1955 doc update

X-SVN-Rev: 8924
2025-04-13 08:53:20 +00:00 · 2002-06-22 07:23:45 +00:00 · 2002-06-22 07:23:45 +00:00 · d2500d9618
commit d2500d9618
parent 6939f0b1bb
4 changed files with 2487 additions and 2503 deletions
--- a/icu4j/src/com/ibm/icu/text/BOSCU.java
+++ b/icu4j/src/com/ibm/icu/text/BOSCU.java
@@ -5,8 +5,8 @@
 *******************************************************************************
 *
 * $Source: /xsrl/Nsvn/icu/icu4j/src/com/ibm/icu/text/Attic/BOSCU.java,v $ 
-* $Date: 2002/06/20 01:21:18 $ 
-* $Revision: 1.2 $
+* $Date: 2002/06/22 07:23:45 $ 
+* $Revision: 1.3 $
 *
 *******************************************************************************
 */
@@ -17,366 +17,367 @@ import com.ibm.icu.impl.UnicodeCharacterIterator;
 /**
 * <p>Binary Ordered Compression Scheme for Unicode</p>
 * 
- * <p>Specific application:<br>
- * Encode a Unicode string for the identical level of a sort key.<br>
- * Restrictions:
- * <ul>
- * <li> byte stream (unsigned 8-bit bytes)
- * <li> lexical order of the identical-level run must be the same as code 
- * 		point order for the string
- * <li> avoid byte values 0, 1, 2
- * </ul>
- * </p>
+ * <p>(Syn Wee: reference a paper if we have one on our site)</p>
+ * <p>BOCU is used to compress Unicode text into a stream of unsigned
+ * bytes.  For many kinds of text the compression compares favorably
+ * to UTF-8, and for some kinds of text (such as CJK) it does better.
+ * The resulting bytes will compare in the same order as the original
+ * code points.  The byte stream does not contain the values 0, 1, or
+ * 2. (Syn Wee, I don't understand the comment later in the source
+ * about these values being used in sort keys, can you explain?)</p>
+ *
+ * <p>Unlike a UTF encoding, BOCU-compressed text is not suitable for
+ * random access.</p>
 * 
- * <p>Method: Slope Detection<br>
- * Remember the previous code point (initial 0).
- * For each cp in the string, encode the difference to the previous one.
- * </p>
- * <p>With a compact encoding of differences, this yields good results for
- * small scripts and UTF-like results otherwise.
- * </p>
- * <p>Encoding of differences:<br>
- * <ul> 
- * <li>Similar to a UTF, encoding the length of the byte sequence in the lead 
- * 		bytes.
- * <li> Does not need to be friendly for decoding or random access
- *     (trail byte values may overlap with lead/single byte values).
- * <li> The signedness must be encoded as the most significant part.
- * </ul>
- * </p>
- * <p>We encode differences with few bytes if their absolute values are small.
- * For correct ordering, we must treat the entire value range -10ffff..+10ffff
- * in ascending order, which forbids encoding the sign and the absolute value 
- * separately.
- * Instead, we split the lead byte range in the middle and encode non-negative 
- * values going up and negative values going down.
- * </p>
- * <p>For very small absolute values, the difference is added to a middle byte 
- * value for single-byte encoded differences.
- * For somewhat larger absolute values, the difference is divided by the number
- * of byte values available, the modulo is used for one trail byte, and the 
- * remainder is added to a lead byte avoiding the single-byte range.
- * For large absolute values, the difference is similarly encoded in three 
- * bytes.
- * </p>
- * <p>This encoding does not use byte values 0, 1, 2, but uses all other byte 
- * values for lead/single bytes so that the middle range of single bytes is as 
- * large as possible.
- * </p>
- * <p>Note that the lead byte ranges overlap some, but that the sequences as a 
- * whole are well ordered. I.e., even if the lead byte is the same for 
- * sequences of different lengths, the trail bytes establish correct order.
- * It would be possible to encode slightly larger ranges for each length (>1) 
- * by subtracting the lower bound of the range. However, that would also slow 
- * down the calculation.
- * </p>
- * <p>For the actual string encoding, an optimization moves the previous code 
- * point value to the middle of its Unicode script block to minimize the 
- * differences in same-script text runs.
- * </p>
+ * <p>Method: Slope Detection<br> Remember the previous code point
+ * (initial 0).  For each code point in the string, encode the
+ * difference from the previous one.  Similar to a UTF, the length of
+ * the byte sequence is encoded in the lead bytes.  Unlike a UTF, the
+ * trail byte values may overlap with lead/single byte values.  The
+ * signedness of the difference must be encoded as the most
+ * significant part.</p>
+ *
+ * <p>We encode differences with few bytes if their absolute values
+ * are small.  For correct ordering, we must treat the entire value
+ * range -10ffff..+10ffff in ascending order, which forbids encoding
+ * the sign and the absolute value separately. Instead, we split the
+ * lead byte range in the middle and encode non-negative values going
+ * up and negative values going down.</p>
+ *
+ * <p>For very small absolute values, the difference is added to a
+ * middle byte value for single-byte encoded differences.  For
+ * somewhat larger absolute values, the difference is divided by the
+ * number of byte values available, the modulo is used for one trail
+ * byte, and the quotient is added to a lead byte avoiding the
+ * single-byte range.  For large absolute values, the difference is
+ * similarly encoded in three bytes. (Syn Wee, I need examples
+ * here.)</p>
+ *
+ * <p>BOCU does not use byte values 0, 1, or 2, but uses all other
+ * byte values for lead and single bytes, so that the middle range of
+ * single bytes is as large as possible.</p>
+ *
+ * <p>Note that the lead byte ranges overlap some, but that the
+ * sequences as a whole are well ordered. I.e., even if the lead byte
+ * is the same for sequences of different lengths, the trail bytes
+ * establish correct order.  It would be possible to encode slightly
+ * larger ranges for each length (>1) by subtracting the lower bound
+ * of the range. However, that would also slow down the calculation.
+ * (Syn Wee, need an example).</p>
+ *
+ * <p>For the actual string encoding, an optimization moves the
+ * previous code point value to the middle of its Unicode script block
+ * to minimize the differences in same-script text runs.  (Syn Wee,
+ * need an example.)</p>
+ *
 * @author Syn Wee Quek
 * @since release 2.2, May 3rd 2002
- * @draft 2.2
- */
+ * @draft 2.2 */
 public class BOSCU 
 {      
-	// public constructors --------------------------------------------------
+    // public constructors --------------------------------------------------
    
-	// public methods -------------------------------------------------------
-	
-	/**
-	 * <p>Encode the code points of a string as a sequence of byte-encoded 
-	 * differences (slope detection), preserving lexical order.</p>
-	 * <p>Optimize the difference-taking for runs of Unicode text within
-	 * small scripts:<br>
-	 * Most small scripts are allocated within aligned 128-blocks of Unicode
-	 * code points. Lexical order is preserved if "prev" is always moved
-	 * into the middle of such a block.</p>
-	 * <p>Additionally, "prev" is moved from anywhere in the Unihan area into 
-	 * the middle of that area.</p>
-	 * <p>Note that the identical-level run in a sort key is generated from
-	 * NFD text - there are never Hangul characters included.</p>
-	 * @param source text source
-	 * @param buffer output buffer
-	 * @param offset to start writing to
-	 * @return end offset where the writing stop
-	 */
-	public static int writeIdenticalLevelRun(String source, byte buffer[], 
-																int offset) 
-	{
-	    int prev = 0;
-	    UnicodeCharacterIterator iterator = new UnicodeCharacterIterator(source);
-	    int codepoint = iterator.nextCodePoint();
-	    while (codepoint != UnicodeCharacterIterator.DONE_CODEPOINT) {
-	        if (prev < 0x4e00 || prev >= 0xa000) {
-	            prev = (prev & ~0x7f) - SLOPE_REACH_NEG_1_;
-	        } 
-	        else {
-	            // Unihan U+4e00..U+9fa5:
-	            // double-bytes down from the upper end
-	            prev = 0x9fff - SLOPE_REACH_POS_2_;
-	        }
-	
-	        offset = writeDiff(codepoint - prev, buffer, offset);
-	        prev = codepoint;
-	        codepoint = iterator.nextCodePoint();
-	    }
-	    return offset;
-	}
-	
-	/** 
-	 * How many bytes would writeIdenticalLevelRun() write? 
-	 * @param source text source string
-	 * @return the length of the BOSCU result 
-	 */
-	public static int lengthOfIdenticalLevelRun(String source) 
-	{
-	    int prev = 0;
-	    int result = 0;
-	    UnicodeCharacterIterator iterator = new UnicodeCharacterIterator(source);
-	    int codepoint = iterator.nextCodePoint();
-	    while (codepoint != UnicodeCharacterIterator.DONE_CODEPOINT) {
-	        if (prev < 0x4e00 || prev >= 0xa000) {
-	            prev = (prev & ~0x7f) - SLOPE_REACH_NEG_1_;
-	        } 
-	        else {
-	            // Unihan U+4e00..U+9fa5:
-	            // double-bytes down from the upper end
-	            prev = 0x9fff - SLOPE_REACH_POS_2_;
-	        }
-	
-	        codepoint = iterator.nextCodePoint();
-	        result += lengthOfDiff(codepoint - prev);
-	        prev = codepoint;
-	    }
-	    return result;
-	}
+    // public methods -------------------------------------------------------
+        
+    /**
+     * <p>(Syn Wee-- I think this should be renamed to 'compress')</p>
+     * <p>Encode the code points of a string as a sequence of bytes,
+     * preserving lexical order.</p>
+     *
+     * @param source text source
+     * @param buffer output buffer
+     * @param offset to start writing to
+     * @return end offset where the writing stopped
+     */
+    public static int writeIdenticalLevelRun(String source, byte buffer[], 
+                                             int offset) 
+    {
+        // (Syn Wee - this is a public function so comments of this nature don't
+        // really belong in the documentation, I think.  So I moved them.)
+        // Optimize the difference-taking for runs of Unicode text within
+        // small scripts.
+        // Most small scripts are allocated within aligned 128-blocks of Unicode
+        // code points. Lexical order is preserved if "prev" is always moved
+        // into the middle of such a block.
+        // Additionally, "prev" is moved from anywhere in the Unihan area into 
+        // the middle of that area.
+        // Note that the identical-level run in a sort key is generated from
+        // NFD text - there are never Hangul characters included.

-	// public setter methods -------------------------------------------------
-	
+        int prev = 0;
+        UnicodeCharacterIterator iterator = new UnicodeCharacterIterator(source);
+        int codepoint = iterator.nextCodePoint();
+        while (codepoint != UnicodeCharacterIterator.DONE_CODEPOINT) {
+            if (prev < 0x4e00 || prev >= 0xa000) {
+                prev = (prev & ~0x7f) - SLOPE_REACH_NEG_1_;
+            } 
+            else {
+                // Unihan U+4e00..U+9fa5:
+                // double-bytes down from the upper end
+                prev = 0x9fff - SLOPE_REACH_POS_2_;
+            }
+        
+            offset = writeDiff(codepoint - prev, buffer, offset);
+            prev = codepoint;
+            codepoint = iterator.nextCodePoint();
+        }
+        return offset;
+    }
+        
+    /** 
+     * <p>(Syn Wee, I think this should be renamed getCompressedLength).</p>
+     * Return the number of bytes that writeIdenticalLevelRun() would write.
+     * @param source text source string
+     * @return the length of the BOCU result 
+     */
+    public static int lengthOfIdenticalLevelRun(String source) 
+    {
+        int prev = 0;
+        int result = 0;
+        UnicodeCharacterIterator iterator = new UnicodeCharacterIterator(source);
+        int codepoint = iterator.nextCodePoint();
+        while (codepoint != UnicodeCharacterIterator.DONE_CODEPOINT) {
+            if (prev < 0x4e00 || prev >= 0xa000) {
+                prev = (prev & ~0x7f) - SLOPE_REACH_NEG_1_;
+            } 
+            else {
+                // Unihan U+4e00..U+9fa5:
+                // double-bytes down from the upper end
+                prev = 0x9fff - SLOPE_REACH_POS_2_;
+            }
+        
+            result += lengthOfDiff(codepoint - prev);
+            prev = codepoint;
+            codepoint = iterator.nextCodePoint();
+        }
+        return result;
+    }
+
+    // public setter methods -------------------------------------------------
+        
    // public getter methods ------------------------------------------------
-	    
-	// public other methods -------------------------------------------------
+            
+    // public other methods -------------------------------------------------
    
    // protected constructor ------------------------------------------------
      
-  	// protected data members ------------------------------------------------
+    // protected data members ------------------------------------------------
    
    // protected methods -----------------------------------------------------
 
- 	// private data members --------------------------------------------------
+    // private data members --------------------------------------------------
    
    /** 
     * Do not use byte values 0, 1, 2 because they are separators in sort keys.
     */
-	private static final int SLOPE_MIN_ = 3;
-	private static final int SLOPE_MAX_ = 0xff;
-	private static final int SLOPE_MIDDLE_ = 0x81;
-	private static final int SLOPE_TAIL_COUNT_ = SLOPE_MAX_ - SLOPE_MIN_ + 1;
-	private static final int SLOPE_MAX_BYTES_ = 4;
+    private static final int SLOPE_MIN_ = 3;
+    private static final int SLOPE_MAX_ = 0xff;
+    private static final int SLOPE_MIDDLE_ = 0x81;
+    private static final int SLOPE_TAIL_COUNT_ = SLOPE_MAX_ - SLOPE_MIN_ + 1;
+    private static final int SLOPE_MAX_BYTES_ = 4;

-	/**
- 	 * Number of lead bytes:
-	 * 1        middle byte for 0
-	 * 2*80=160 single bytes for !=0
-	 * 2*42=84  for double-byte values
-	 * 2*3=6    for 3-byte values
-	 * 2*1=2    for 4-byte values
-	 *
-	 * The sum must be <=SLOPE_TAIL_COUNT.
-	 *
-	 * Why these numbers?
-	 * - There should be >=128 single-byte values to cover 128-blocks
-	 *   with small scripts.
-	 * - There should be >=20902 single/double-byte values to cover Unihan.
-	 * - It helps CJK Extension B some if there are 3-byte values that cover
-	 *   the distance between them and Unihan.
-	 *   This also helps to jump among distant places in the BMP.
-	 * - Four-byte values are necessary to cover the rest of Unicode.
-	 *
- 	 * Symmetrical lead byte counts are for convenience.
-	 * With an equal distribution of even and odd differences there is also
-	 * no advantage to asymmetrical lead byte counts.
-	 */
-	private static final int SLOPE_SINGLE_ = 80;
-	private static final int SLOPE_LEAD_2_ = 42;
-	private static final int SLOPE_LEAD_3_ = 3;
-	private static final int SLOPE_LEAD_4_ = 1;
+    /**
+     * Number of lead bytes:
+     * 1        middle byte for 0
+     * 2*80=160 single bytes for !=0
+     * 2*42=84  for double-byte values
+     * 2*3=6    for 3-byte values
+     * 2*1=2    for 4-byte values
+     *
+     * The sum must be <=SLOPE_TAIL_COUNT.
+     *
+     * Why these numbers?
+     * - There should be >=128 single-byte values to cover 128-blocks
+     *   with small scripts.
+     * - There should be >=20902 single/double-byte values to cover Unihan.
+     * - It helps CJK Extension B some if there are 3-byte values that cover
+     *   the distance between them and Unihan.
+     *   This also helps to jump among distant places in the BMP.
+     * - Four-byte values are necessary to cover the rest of Unicode.
+     *
+     * Symmetrical lead byte counts are for convenience.
+     * With an equal distribution of even and odd differences there is also
+     * no advantage to asymmetrical lead byte counts.
+     */
+    private static final int SLOPE_SINGLE_ = 80;
+    private static final int SLOPE_LEAD_2_ = 42;
+    private static final int SLOPE_LEAD_3_ = 3;
+    private static final int SLOPE_LEAD_4_ = 1;

-	/** 
-	 * The difference value range for single-byters.
-	 */
-	private static final int SLOPE_REACH_POS_1_ = SLOPE_SINGLE_;
-	private static final int SLOPE_REACH_NEG_1_ = (-SLOPE_SINGLE_);
+    /** 
+     * The difference value range for single-byters.
+     */
+    private static final int SLOPE_REACH_POS_1_ = SLOPE_SINGLE_;
+    private static final int SLOPE_REACH_NEG_1_ = (-SLOPE_SINGLE_);

-	/** 
-	 * The difference value range for double-byters.
-	 */
-	private static final int SLOPE_REACH_POS_2_ = 
-					SLOPE_LEAD_2_ * SLOPE_TAIL_COUNT_ + SLOPE_LEAD_2_ - 1;
-	private static final int SLOPE_REACH_NEG_2_ = (-SLOPE_REACH_POS_2_ - 1);
+    /** 
+     * The difference value range for double-byters.
+     */
+    private static final int SLOPE_REACH_POS_2_ = 
+        SLOPE_LEAD_2_ * SLOPE_TAIL_COUNT_ + SLOPE_LEAD_2_ - 1;
+    private static final int SLOPE_REACH_NEG_2_ = (-SLOPE_REACH_POS_2_ - 1);

-	/** 
-	 * The difference value range for 3-byters.
-	 */
-	private static final int SLOPE_REACH_POS_3_ = SLOPE_LEAD_3_ 
-	 											  * SLOPE_TAIL_COUNT_ 
-												  * SLOPE_TAIL_COUNT_ 
-												  + (SLOPE_LEAD_3_ - 1)
-												  * SLOPE_TAIL_COUNT_ +
-												  (SLOPE_TAIL_COUNT_ - 1);
-	private static final int SLOPE_REACH_NEG_3_ = (-SLOPE_REACH_POS_3_ - 1);
+    /** 
+     * The difference value range for 3-byters.
+     */
+    private static final int SLOPE_REACH_POS_3_ = SLOPE_LEAD_3_ 
+        * SLOPE_TAIL_COUNT_ 
+        * SLOPE_TAIL_COUNT_ 
+        + (SLOPE_LEAD_3_ - 1)
+        * SLOPE_TAIL_COUNT_ +
+        (SLOPE_TAIL_COUNT_ - 1);
+    private static final int SLOPE_REACH_NEG_3_ = (-SLOPE_REACH_POS_3_ - 1);

-	/** 
-	 * The lead byte start values.
-	 */
-	private static final int SLOPE_START_POS_2_ = SLOPE_MIDDLE_ 
-													+ SLOPE_SINGLE_ + 1;
-	private static final int SLOPE_START_POS_3_ = SLOPE_START_POS_2_ 
-													+ SLOPE_LEAD_2_;
-	private static final int SLOPE_START_NEG_2_ = SLOPE_MIDDLE_ + 
-													SLOPE_REACH_NEG_1_;
-	private static final int SLOPE_START_NEG_3_ = SLOPE_START_NEG_2_
-													- SLOPE_LEAD_2_;
-													
-	// private constructor ---------------------------------------------------
-	
-	/**
-	 * Constructor private to prevent initialization
-	 */
-	private BOSCU()
-	{
-	}													
+    /** 
+     * The lead byte start values.
+     */
+    private static final int SLOPE_START_POS_2_ = SLOPE_MIDDLE_ 
+        + SLOPE_SINGLE_ + 1;
+    private static final int SLOPE_START_POS_3_ = SLOPE_START_POS_2_ 
+        + SLOPE_LEAD_2_;
+    private static final int SLOPE_START_NEG_2_ = SLOPE_MIDDLE_ + 
+        SLOPE_REACH_NEG_1_;
+    private static final int SLOPE_START_NEG_3_ = SLOPE_START_NEG_2_
+        - SLOPE_LEAD_2_;
+                                                                                                        
+    // private constructor ---------------------------------------------------
+        
+    /**
+     * Constructor private to prevent instantiation
+     */
+    private BOSCU()
+    {
+    }                                                                                                   
    
    // private methods -------------------------------------------------------
    
    /**
- 	 * Integer division and modulo with negative numerators
- 	 * yields negative modulo results and quotients that are one more than
- 	 * what we need here.
- 	 * @param number which operations are to be performed on
- 	 * @param factor the factor to use for division
- 	 * @return (result of division) << 32 | modulo 
- 	 */
-	private static final long getNegDivMod(int number, int factor) 
-	{
-    	int modulo = number % factor; 
-    	long result = number / factor;
-    	if (modulo < 0) { 
-        	-- result; 
-        	modulo += factor; 
-    	} 
-    	return (result << 32) | modulo;
-   	}
-   	
-   	/**
-	 * Encode one difference value -0x10ffff..+0x10ffff in 1..3 bytes,
-	 * preserving lexical order
-	 * @param diff
-	 * @param buffer byte buffer to append to
-	 * @param offset to the byte buffer to start appending
-	 * @return end offset where the appending stops
-	 */
-	private static final int writeDiff(int diff, byte buffer[], int offset) 
-	{
-	    if (diff >= SLOPE_REACH_NEG_1_) {
-	        if (diff <= SLOPE_REACH_POS_1_) {
-	            buffer[offset ++] = (byte)(SLOPE_MIDDLE_ + diff);
-	        } 
-	        else if (diff <= SLOPE_REACH_POS_2_) {
-	            buffer[offset ++] = (byte)(SLOPE_START_POS_2_ 
-	            							+ (diff / SLOPE_TAIL_COUNT_));
-	            buffer[offset ++] = (byte)(SLOPE_MIN_ + 
-	            								(diff % SLOPE_TAIL_COUNT_));
-	        } 
-	        else if (diff <= SLOPE_REACH_POS_3_) {
-	            buffer[offset + 2] = (byte)(SLOPE_MIN_ 
-	            							+ (diff % SLOPE_TAIL_COUNT_));
-	            diff /= SLOPE_TAIL_COUNT_;
-	            buffer[offset + 1] = (byte)(SLOPE_MIN_ 
-	            							+ (diff % SLOPE_TAIL_COUNT_));
-	            buffer[offset] = (byte)(SLOPE_START_POS_3_ 
-	            						+ (diff / SLOPE_TAIL_COUNT_));
-	            offset += 3;
-	        } 
-	        else {
-	            buffer[offset + 3] = (byte)(SLOPE_MIN_ 
-	            							+ diff % SLOPE_TAIL_COUNT_);
-	            diff /= SLOPE_TAIL_COUNT_;
-	            buffer[offset] = (byte)(SLOPE_MIN_ 
-	            						+ diff % SLOPE_TAIL_COUNT_);
-	            diff /= SLOPE_TAIL_COUNT_;
-	            buffer[offset + 1] = (byte)(SLOPE_MIN_ 
-	            							+ diff % SLOPE_TAIL_COUNT_);
-	            buffer[offset] = (byte)SLOPE_MAX_;
-	            offset += 4;
-	        }
-	    } 
-	    else {
-	        long division = getNegDivMod(diff, SLOPE_TAIL_COUNT_);
-	        int modulo = (int)division;
-	        if (diff >= SLOPE_REACH_NEG_2_) {
-	            diff = (int)(division >> 32);
-	            buffer[offset ++] = (byte)(SLOPE_START_NEG_2_ + diff);
-	            buffer[offset ++] = (byte)(SLOPE_MIN_ + modulo);
-	        } 
-	        else if (diff >= SLOPE_REACH_NEG_3_) {
-	            buffer[offset + 2] = (byte)(SLOPE_MIN_ + modulo);
-	            diff = (int)(division >> 32);
-	            division = getNegDivMod(diff, SLOPE_TAIL_COUNT_);
-	            modulo = (int)division;
-	            diff = (int)(division >> 32);
-	            buffer[offset + 1] = (byte)(SLOPE_MIN_ + modulo);
-	            buffer[offset] = (byte)(SLOPE_START_NEG_3_ + diff);
-	            offset += 3;
-	        } 
-	        else {
-	            buffer[offset + 3] = (byte)(SLOPE_MIN_ + modulo);
-	            diff = (int)(division >> 32);
-	            division = getNegDivMod(diff, SLOPE_TAIL_COUNT_);
-	            modulo = (int)division;
-	            diff = (int)(division >> 32);
-	            buffer[offset + 2] = (byte)(SLOPE_MIN_ + modulo);
-	            division = getNegDivMod(diff, SLOPE_TAIL_COUNT_);
-	            modulo = (int)division;
-	            buffer[offset + 1] = (byte)(SLOPE_MIN_ + modulo);
-	            buffer[offset] = SLOPE_MIN_;
-	            offset += 4;
-	        }
-	    }
-	    return offset;
-	}
-	
-	/**
-	 * How many bytes would writeDiff() write? 
-	 * @param diff
-	 */
-	private static final int lengthOfDiff(int diff) 
-	{
-	    if (diff >= SLOPE_REACH_NEG_1_) {
-	        if (diff <= SLOPE_REACH_POS_1_) {
-	            return 1;
-	        } 
-	        else if (diff <= SLOPE_REACH_POS_2_) {
-	            return 2;
-	        } 
-	        else if(diff <= SLOPE_REACH_POS_3_) {
-	            return 3;
-	        } 
-	        else {
-	            return 4;
-	        }
-	    } 
-	    else {
-	        if (diff >= SLOPE_REACH_NEG_2_) {
-	            return 2;
-	        } 
-	        else if (diff >= SLOPE_REACH_NEG_3_) {
-	            return 3;
-	        } 
-	        else {
-	            return 4;
-	        }
-	    }
-	}
+     * Integer division and modulo with negative numerators
+     * yield negative modulo results and quotients that are one more than
+     * what we need here.
+     * @param number which operations are to be performed on
+     * @param factor the factor to use for division
+     * @return (result of division) << 32 | modulo 
+     */
+    private static final long getNegDivMod(int number, int factor) 
+    {
+        int modulo = number % factor; 
+        long result = number / factor;
+        if (modulo < 0) { 
+            -- result; 
+            modulo += factor; 
+        } 
+        return (result << 32) | modulo;
+    }
+        
+    /**
+     * Encode one difference value -0x10ffff..+0x10ffff in 1..4 bytes,
+     * preserving lexical order
+     * @param diff the difference value to encode
+     * @param buffer byte buffer to append to
+     * @param offset to the byte buffer to start appending
+     * @return end offset where the appending stops
+     */
+    private static final int writeDiff(int diff, byte buffer[], int offset) 
+    {
+        if (diff >= SLOPE_REACH_NEG_1_) {
+            if (diff <= SLOPE_REACH_POS_1_) {
+                buffer[offset ++] = (byte)(SLOPE_MIDDLE_ + diff);
+            } 
+            else if (diff <= SLOPE_REACH_POS_2_) {
+                buffer[offset ++] = (byte)(SLOPE_START_POS_2_ 
+                                           + (diff / SLOPE_TAIL_COUNT_));
+                buffer[offset ++] = (byte)(SLOPE_MIN_ + 
+                                           (diff % SLOPE_TAIL_COUNT_));
+            } 
+            else if (diff <= SLOPE_REACH_POS_3_) {
+                buffer[offset + 2] = (byte)(SLOPE_MIN_ 
+                                            + (diff % SLOPE_TAIL_COUNT_));
+                diff /= SLOPE_TAIL_COUNT_;
+                buffer[offset + 1] = (byte)(SLOPE_MIN_ 
+                                            + (diff % SLOPE_TAIL_COUNT_));
+                buffer[offset] = (byte)(SLOPE_START_POS_3_ 
+                                        + (diff / SLOPE_TAIL_COUNT_));
+                offset += 3;
+            } 
+            else {
+                buffer[offset + 3] = (byte)(SLOPE_MIN_ 
+                                            + diff % SLOPE_TAIL_COUNT_);
+                diff /= SLOPE_TAIL_COUNT_;
+                buffer[offset + 2] = (byte)(SLOPE_MIN_ 
+                                            + diff % SLOPE_TAIL_COUNT_);
+                diff /= SLOPE_TAIL_COUNT_;
+                buffer[offset + 1] = (byte)(SLOPE_MIN_ 
+                                            + diff % SLOPE_TAIL_COUNT_);
+                buffer[offset] = (byte)SLOPE_MAX_;
+                offset += 4;
+            }
+        } 
+        else {
+            long division = getNegDivMod(diff, SLOPE_TAIL_COUNT_);
+            int modulo = (int)division;
+            if (diff >= SLOPE_REACH_NEG_2_) {
+                diff = (int)(division >> 32);
+                buffer[offset ++] = (byte)(SLOPE_START_NEG_2_ + diff);
+                buffer[offset ++] = (byte)(SLOPE_MIN_ + modulo);
+            } 
+            else if (diff >= SLOPE_REACH_NEG_3_) {
+                buffer[offset + 2] = (byte)(SLOPE_MIN_ + modulo);
+                diff = (int)(division >> 32);
+                division = getNegDivMod(diff, SLOPE_TAIL_COUNT_);
+                modulo = (int)division;
+                diff = (int)(division >> 32);
+                buffer[offset + 1] = (byte)(SLOPE_MIN_ + modulo);
+                buffer[offset] = (byte)(SLOPE_START_NEG_3_ + diff);
+                offset += 3;
+            } 
+            else {
+                buffer[offset + 3] = (byte)(SLOPE_MIN_ + modulo);
+                diff = (int)(division >> 32);
+                division = getNegDivMod(diff, SLOPE_TAIL_COUNT_);
+                modulo = (int)division;
+                diff = (int)(division >> 32);
+                buffer[offset + 2] = (byte)(SLOPE_MIN_ + modulo);
+                division = getNegDivMod(diff, SLOPE_TAIL_COUNT_);
+                modulo = (int)division;
+                buffer[offset + 1] = (byte)(SLOPE_MIN_ + modulo);
+                buffer[offset] = SLOPE_MIN_;
+                offset += 4;
+            }
+        }
+        return offset;
+    }
+        
+    /**
+     * How many bytes would writeDiff() write? 
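+     * @param diff the difference value to measure
+     * @return the number of bytes (1..4) that writeDiff() would write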
+     */
+    private static final int lengthOfDiff(int diff) 
+    {
+        if (diff >= SLOPE_REACH_NEG_1_) {
+            if (diff <= SLOPE_REACH_POS_1_) {
+                return 1;
+            } 
+            else if (diff <= SLOPE_REACH_POS_2_) {
+                return 2;
+            } 
+            else if(diff <= SLOPE_REACH_POS_3_) {
+                return 3;
+            } 
+            else {
+                return 4;
+            }
+        } 
+        else {
+            if (diff >= SLOPE_REACH_NEG_2_) {
+                return 2;
+            } 
+            else if (diff >= SLOPE_REACH_NEG_3_) {
+                return 3;
+            } 
+            else {
+                return 4;
+            }
+        }
+    }
 }
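
The class comment in BOSCU.java above asks for examples of the difference encoding. The following is a minimal sketch of the one- and two-byte cases only, using the constant values from the diff. The class name `SlopeSketch` and the helpers `encode` and `compare` are hypothetical and not part of ICU4J, and `Math.floorDiv`/`Math.floorMod` stand in for `getNegDivMod`. Three- and four-byte differences are deliberately left out.

```java
import java.util.Arrays;

public class SlopeSketch {
    // Values copied from the private constants in BOSCU.java.
    public static final int SLOPE_MIN = 3;
    public static final int SLOPE_MAX = 0xff;
    public static final int SLOPE_MIDDLE = 0x81;
    public static final int SLOPE_TAIL_COUNT = SLOPE_MAX - SLOPE_MIN + 1; // 253
    public static final int SLOPE_SINGLE = 80;
    public static final int SLOPE_LEAD_2 = 42;
    public static final int REACH_POS_2 =
        SLOPE_LEAD_2 * SLOPE_TAIL_COUNT + SLOPE_LEAD_2 - 1;
    public static final int REACH_NEG_2 = -REACH_POS_2 - 1;
    public static final int START_POS_2 = SLOPE_MIDDLE + SLOPE_SINGLE + 1;
    public static final int START_NEG_2 = SLOPE_MIDDLE - SLOPE_SINGLE;

    /** Hypothetical helper: encode diff as one or two unsigned byte values. */
    public static int[] encode(int diff) {
        if (diff < REACH_NEG_2 || diff > REACH_POS_2) {
            throw new IllegalArgumentException("needs 3 or 4 bytes: " + diff);
        }
        if (diff >= -SLOPE_SINGLE && diff <= SLOPE_SINGLE) {
            // Small differences are added to the middle byte value.
            return new int[] { SLOPE_MIDDLE + diff };
        }
        // Floor division keeps the modulo non-negative for negative diffs,
        // which is what getNegDivMod() does in the original code.
        int quotient = Math.floorDiv(diff, SLOPE_TAIL_COUNT);
        int modulo = Math.floorMod(diff, SLOPE_TAIL_COUNT);
        int start = diff > 0 ? START_POS_2 : START_NEG_2;
        return new int[] { start + quotient, SLOPE_MIN + modulo };
    }

    /** Hypothetical helper: lexicographic comparison of byte-value sequences. */
    public static int compare(int[] a, int[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                return Integer.compare(a[i], b[i]);
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encode(0)));   // [129]
        System.out.println(Arrays.toString(encode(80)));  // [209]
        System.out.println(Arrays.toString(encode(81)));  // [210, 84]
        System.out.println(Arrays.toString(encode(-81))); // [48, 175]
    }
}
```

In this sketch the single-byte values occupy 49..209, while two-byte sequences use lead bytes below 49 for negative differences and above 209 for positive ones, so comparing the byte sequences gives the same order as comparing the differences themselves.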
--- a/icu4j/src/com/ibm/icu/text/CollationElementIterator.java
+++ b/icu4j/src/com/ibm/icu/text/CollationElementIterator.java
--- a/icu4j/src/com/ibm/icu/text/CollationKey.java
+++ b/icu4j/src/com/ibm/icu/text/CollationKey.java
@@ -5,8 +5,8 @@
 *******************************************************************************
 *
 * $Source: /xsrl/Nsvn/icu/icu4j/src/com/ibm/icu/text/CollationKey.java,v $ 
-* $Date: 2002/06/21 23:56:44 $ 
-* $Revision: 1.6 $
+* $Date: 2002/06/22 07:23:45 $ 
+* $Revision: 1.7 $
 *
 *******************************************************************************
 */
@@ -15,43 +15,49 @@ package com.ibm.icu.text;
 import java.util.Arrays;

 /**
- * <p>
- * A <code>CollationKey</code> represents a <code>String</code> under the
- * rules of a specific <code>Collator</code> object. Comparing two
- * <code>CollationKey</code>s returns the relative order of the
- * <code>String</code>s they represent.
- * </p>
- * <p>
- * <code>CollationKey</code> instances can not be create directly. Rather, 
- * they are generated by calling <code>Collator.getCollationKey(String)</code>. 
- * Since the rule set of each <code>Collator differs</code>, the sort orders of 
- * the same string under two unique <code>Collator</code> may not be the same. 
- * Hence comparing <code>CollationKey</code>s generated from different 
- * <code>Collator</code> objects may not give the right results.
- * </p>
- * <p>
- * Similar to <code>CollationKey.compareTo(CollationKey)</code>, 
- * the method <code>RuleBasedCollator.compare(String, String)</code> compares
- * two strings and returns the relative order. During the construction
- * of a <code>CollationKey</code> object, the entire source string is examined
- * and processed into a series of bits that are stored in the 
- * <code>CollationKey</code> object. Bitwise comparison on the bit sequences 
- * are then performed during <code>CollationKey.compareTo(CollationKey)</code>. 
- * This comparison could incurr expensive startup costs while creating 
- * the <code>CollationKey</code> object, but once the objects are created, 
- * binary comparisons are fast, and is recommended when the same strings are
- * to be compared over and over again. 
- * On the other hand <code>Collator.compare(String, String)</code> examines 
- * and processes the string only until the first characters differing in order,
- * and is recommend for use if the <code>String</code>s are to be compared only
- * once.
- * </p>
- * <p>
- * Details of the composition of the bit sequence is located at
- * <a href=http://oss.software.ibm.com/icu/userguide/Collate_ServiceArchitecture.html>
- * user guide</a>.
- * </p>
- * <p>The following example shows how <code>CollationKey</code>s might be used
+ * <p>A <code>CollationKey</code> represents a <code>String</code>
+ * under the rules of a specific <code>Collator</code>
+ * object. Comparing two <code>CollationKey</code>s returns the
+ * relative order of the <code>String</code>s they represent.</p>
+ *
+ * <p><code>CollationKey</code> instances are not created
+ * directly. Rather, they are generated by calling
+ * <code>Collator.getCollationKey(String)</code>.</p>
+ *
+ * <p>Since the rule set of <code>Collator</code>s can differ, the
+ * sort orders of the same string under two different
+ * <code>Collator</code>s might differ.  Hence comparing
+ * <code>CollationKey</code>s generated from different
+ * <code>Collator</code>s can give incorrect results.</p>
+ *
+ * <p>Both the method
+ * <code>CollationKey.compareTo(CollationKey)</code> and the method
+ * <code>Collator.compare(String, String)</code> compare two strings
+ * and return their relative order.  The performance characteristics
+ * of these two approaches can differ.</p>
+ *
+ * <p>During the construction of a <code>CollationKey</code>, the
+ * entire source string is examined and processed into a series of
+ * bits that are stored in the <code>CollationKey</code>. When
+ * <code>CollationKey.compareTo(CollationKey)</code> executes, it
+ * performs bitwise comparison on the bit sequences.  This can incur
+ * startup cost when creating the <code>CollationKey</code>, but once
+ * the key is created, binary comparisons are fast.  This approach is
+ * recommended when the same strings are to be compared over and over
+ * again.</p>
+ *
+ * <p>On the other hand, implementations of
+ * <code>Collator.compare(String, String)</code> can examine and
+ * process the strings only until the first characters differing in
+ * order.  This approach is recommended if the strings are to be
+ * compared only once.</p>
+ * 
+ * <p>More information about the composition of the bit sequence can
+ * be found in the 
+ * <a href="http://oss.software.ibm.com/icu/userguide/Collate_ServiceArchitecture.html">
+ * user guide</a>.</p>
+ *
+ * <p>The following example shows how <code>CollationKey</code>s can be used
 * to sort a list of <code>String</code>s.</p>
 * <blockquote>
 * <pre>
@ -82,16 +88,16 @@ import java.util.Arrays;
 * @see RuleBasedCollator
 * @author Syn Wee Quek
 * @since release 2.2, April 18 2002
- * @draft 2.2
+ * @draft 2.2 
 */
 public final class CollationKey implements Comparable 
 {
-	// public methods -------------------------------------------------------
+    // public methods -------------------------------------------------------

-	// public getters -------------------------------------------------------
+    // public getters -------------------------------------------------------
 	
    /**
-     * Returns the source string that this CollationKey represents.
+     * Return the source string that this CollationKey represents.
     * @return source string that this CollationKey represents
     * @draft 2.2
     */
@ -101,20 +107,19 @@ public final class CollationKey implements Comparable
    }

    /**
-     * <p>
-     * Duplicates and returns the value of this CollationKey as a sequence 
-     * of big-endian bytes terminated by a null.
-     * </p> 
-     * <p>
-     * If two CollationKeys could be legitimately compared, then one could 
-     * compare the byte arrays of each to obtain the same result.
+     * <p>Duplicates and returns the value of this CollationKey as a sequence 
+     * of big-endian bytes terminated by a null.</p> 
+     *
+     * <p>If two CollationKeys can be legitimately compared, then one can
+     * compare the byte arrays of each to obtain the same result, e.g.
     * <pre>
     * byte key1[] = collationkey1.toByteArray();
     * byte key2[] = collationkey2.toByteArray();
+     * int key, targetkey;
     * int i = 0;
-     * while (key1[i] != 0 && key2[i] != 0) {
-     *	   int key = key1[i] & 0xFF;
-     *     int targetkey = key2[i] & 0xFF;
+     * do {
+     *	   key = key1[i] & 0xFF;
+     *     targetkey = key2[i] & 0xFF;
     *     if (key &lt; targetkey) {
     *         System.out.println("String 1 is less than string 2");
     *         return;
@ -123,18 +128,9 @@ public final class CollationKey implements Comparable
     *         System.out.println("String 1 is more than string 2");
     *     }
     *     i ++;
-     * }
-     * int key = key1[i] & 0xFF;
-     * int targetkey = key2[i] & 0xFF;
-     * if (key &lt; targetkey) {
-     *     System.out.println("String 1 is less than string 2");
-     *     return;
-     * }
-     * if (targetkey &lt; key) {
-     *     System.out.println("String 1 is more than string 2");
-     *     return;
-     * }
-     * System.out.println("String 1 is equals to string 2");;
+     * } while (key != 0 && targetkey != 0);
+     *
+     * System.out.println("Strings are equal.");
     * </pre>
     * </p>  
     * @return CollationKey value in a sequence of big-endian byte bytes 
@ -145,10 +141,10 @@ public final class CollationKey implements Comparable
    {
    	int length = 0;
    	while (true) {
-    		if (m_key_[length] == 0) {
-    			break;
-    		}
-    		length ++;
+	    if (m_key_[length] == 0) {
+		break;
+	    }
+	    length ++;
    	}
    	length ++;
    	byte result[] = new byte[length];
@ -156,94 +152,88 @@ public final class CollationKey implements Comparable
        return result;
    }

- 	// public other methods -------------------------------------------------	
+    // public other methods -------------------------------------------------	
 	
    /**
-     * <p>
-     * Compare this CollationKey to the argument target CollationKey. 
-     * The collation 
-     * rules of the Collator object which created these keys are applied.
-     * </p>
-     * <p>
-     * <strong>Note:</strong> Comparison between CollationKeys created by 
-     * different Collators may not return the correct result. See class 
-     * documentation.
-     * </p>
+     * <p>Compare this CollationKey to another CollationKey.  The
+     * collation rules of the Collator that created this key are
+     * applied.</p>
+     *
+     * <p><strong>Note:</strong> Comparison between CollationKeys
+     * created by different Collators might return incorrect
+     * results.  See class documentation.</p>
+     *
     * @param target target CollationKey
-     * @return an integer value, if value is less than zero this CollationKey
-     *         is less than than target, if value is zero if they are equal 
-     *         and value is greater than zero if this CollationKey is greater 
+     * @return an integer value.  If the value is less than zero this CollationKey
+     *         is less than target, if the value is zero they are equal, and
+     *         if the value is greater than zero this CollationKey is greater 
     *         than target.
-     * @exception NullPointerException thrown when argument is null.
+     * @exception NullPointerException is thrown if argument is null.
     * @see Collator#compare(String, String)
-     * @draft 2.2
-     */
+     * @draft 2.2 */
    public int compareTo(CollationKey target)
    {
    	int i = 0;
    	while (m_key_[i] != 0 && target.m_key_[i] != 0) {
-    		int key = m_key_[i] & 0xFF;
-    		int targetkey = target.m_key_[i] & 0xFF;
-    		if (key < targetkey) {
-    			return -1;
-    		}
-    		if (targetkey < key) {
-    			return 1;
-    		}
-    		i ++;
+	    int key = m_key_[i] & 0xFF;
+	    int targetkey = target.m_key_[i] & 0xFF;
+	    if (key < targetkey) {
+		return -1;
+	    }
+	    if (targetkey < key) {
+		return 1;
+	    }
+	    i ++;
    	}
    	// last comparison if we encounter a 0
    	int key = m_key_[i] & 0xFF;
    	int targetkey = target.m_key_[i] & 0xFF;
        if (key < targetkey) {
-    		return -1;
+	    return -1;
    	}
    	if (targetkey < key) {
-    		return 1;
+	    return 1;
    	}
        return 0;
    }

    /**
-     * <p>
-     * Compares this CollationKey with the specified Object.
-     * The collation 
-     * rules of the Collator object which created these objects are applied.
-     * </p>
-     * <p>
-     * See note in compareTo(CollationKey) for warnings of incorrect results
-     * </p>
-     * @param obj the Object to be compared.
+     * <p>Compare this CollationKey with the specified Object.  The
+     * collation rules of the Collator that created this key are
+     * applied.</p>
+     * 
+     * <p>See note in compareTo(CollationKey) for warnings about possible
+     * incorrect results.</p>
+     *
+     * @param obj the Object to be compared to.
     * @return Returns a negative integer, zero, or a positive integer 
     *         respectively if this CollationKey is less than, equal to, or 
     *         greater than the given Object.
-     * @exception ClassCastException thrown when the specified argument is not 
-     *            a CollationKey. NullPointerException thrown when argument 
+     * @exception ClassCastException is thrown when the argument is not 
+     *            a CollationKey.  NullPointerException is thrown when the argument 
     *            is null.
     * @see #compareTo(CollationKey)
-     * @draft 2.2
-     */
+     * @draft 2.2 */
    public int compareTo(Object obj) 
    {
- 		return compareTo((CollationKey)obj);
+	return compareTo((CollationKey)obj);
    }

    /**
-     * <p>
-     * Compare this CollationKey and the argument target object for equality.
-     * The collation 
-     * rules of the Collator object which created these objects are applied.
-     * </p>
-     * <p>
-     * See note in compareTo(CollationKey) for warnings of incorrect results
-     * </p>
+     * <p>Compare this CollationKey and the specified Object for
+     * equality.  The collation rules of the Collator that created
+     * this key are applied.</p>
+     *
+     * <p>See note in compareTo(CollationKey) for warnings about
+     * possible incorrect results.</p>
+     *
     * @param target the object to compare to.
-     * @return true if two objects are equal, false otherwise.
+     * @return true if the two keys compare as equal, false otherwise.
     * @see #compareTo(CollationKey)
-     * @exception ClassCastException thrown when the specified argument is not 
-     *            a CollationKey. NullPointerException thrown when argument 
+     * @exception ClassCastException is thrown when the argument is not 
+     *            a CollationKey.  NullPointerException is thrown when the argument 
     *            is null.
-     * @draft 2.2
+     * @draft 2.2 
     */
    public boolean equals(Object target) 
    {
@ -266,13 +256,13 @@ public final class CollationKey implements Comparable
     * </p>
     * @param target the CollationKey to compare to.
     * @return true if two objects are equal, false otherwise.
-     * @exception NullPointerException thrown when argument is null.
+     * @exception NullPointerException is thrown when the argument is null.
     * @draft 2.2
     */
    public boolean equals(CollationKey target) 
    {
        if (this == target) {
-        	return true;
+	    return true;
        }
        if (target == null) {
            return false;
@ -280,20 +270,19 @@ public final class CollationKey implements Comparable
        CollationKey other = (CollationKey)target;
        int i = 0;
        while (true) {
-        	if (m_key_[i] != other.m_key_[i]) {
-        		return false;
-        	}
-        	if (m_key_[i] == 0) {
-        		break;
-        	}
-        	i ++;
+	    if (m_key_[i] != other.m_key_[i]) {
+		return false;
+	    }
+	    if (m_key_[i] == 0) {
+		break;
+	    }
+	    i ++;
        }
        return true;
    }

    /**
-     * <p>
-     * Creates a hash code for this CollationKey. The hash value is calculated 
+     * <p>Returns a hash code for this CollationKey. The hash value is calculated 
     * on the key itself, not the String from which the key was created. Thus 
     * if x and y are CollationKeys, then x.hashCode(x) == y.hashCode() 
     * if x.equals(y) is true. This allows language-sensitive comparison in a 
@ -305,25 +294,25 @@ public final class CollationKey implements Comparable
    public int hashCode() 
    {
    	if (m_hashCode_ == 0) {
-    		int size = m_key_.length >> 1;
-    		StringBuffer key = new StringBuffer(size);
-    		int i = 0;
-    		while (m_key_[i] != 0 && m_key_[i + 1] != 0) {
-    			key.append((char)((m_key_[i] << 8) | m_key_[i + 1]));
-    			i += 2;
-    		}
-    		if (m_key_[i] != 0) {
-    			key.append((char)(m_key_[i] << 8));
-    		}
-    		m_hashCode_ = key.toString().hashCode();
+	    int size = m_key_.length >> 1;
+	    StringBuffer key = new StringBuffer(size);
+	    int i = 0;
+	    while (m_key_[i] != 0 && m_key_[i + 1] != 0) {
+		key.append((char)((m_key_[i] << 8) | m_key_[i + 1]));
+		i += 2;
+	    }
+	    if (m_key_[i] != 0) {
+		key.append((char)(m_key_[i] << 8));
+	    }
+	    m_hashCode_ = key.toString().hashCode();
    	}
        return m_hashCode_;
    }

-	// protected constructor ------------------------------------------------
+    // protected constructor ------------------------------------------------
    
    /**
-     * Protected CollationKey can only be generated by Collator objects
+     * CollationKey can only be generated by Collator objects
     * @param source string the CollationKey represents
     * @param key sort key array of bytes
     * @param size of sort key 
@ -336,18 +325,20 @@ public final class CollationKey implements Comparable
    	m_hashCode_ = 0;
    }

-	// private data members -------------------------------------------------
+    // private data members -------------------------------------------------

-	/**
-	 * Source string this CollationKey represents
-	 */	
+    /**
+     * Source string this CollationKey represents
+     */	
    private String m_source_;
+
    /**
     * Sequence of bytes that represents the sort key
     */
    private byte m_key_[];
+
    /**
     * Hash code for the key
     */
    private int m_hashCode_;
-}
+}
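
The class documentation above contrasts key-based comparison with direct `Collator.compare(String, String)` calls. The following is a minimal sketch of the key-based approach, not the ICU4J implementation: it assumes the JDK's `java.text.Collator` and `java.text.CollationKey` as stand-ins for the `com.ibm.icu.text` classes in this commit, since they expose the same `getCollationKey`, `compareTo`, `getSourceString` and `toByteArray` methods. The class name `CollationKeySketch` and its helper methods are hypothetical.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

final class CollationKeySketch {

    /**
     * Sorts strings through precomputed keys: each string is processed
     * once into a key, and the sort then compares keys only.
     */
    static String[] sortWithKeys(Collator collator, String[] input) {
        CollationKey[] keys = new CollationKey[input.length];
        for (int i = 0; i < input.length; i++) {
            keys[i] = collator.getCollationKey(input[i]);
        }
        Arrays.sort(keys); // CollationKey implements Comparable
        String[] sorted = new String[keys.length];
        for (int i = 0; i < keys.length; i++) {
            sorted[i] = keys[i].getSourceString();
        }
        return sorted;
    }

    /**
     * Compares two key byte arrays as unsigned, most-significant-byte-first
     * sequences. A shorter array that is a prefix of the other sorts first.
     */
    static int compareKeyBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xFF; // mask to treat the byte as unsigned
            int y = b[i] & 0xFF;
            if (x != y) {
                return x < y ? -1 : 1;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.US);
        String[] sorted = sortWithKeys(collator,
                new String[] {"banana", "Cherry", "apple"});
        System.out.println(Arrays.toString(sorted));
    }
}
```

Keys from different collators should not be mixed in either helper, for the reason given in the class documentation: the byte sequences are only meaningful relative to the collator that produced them.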
--- a/icu4j/src/com/ibm/icu/text/Collator.java
+++ b/icu4j/src/com/ibm/icu/text/Collator.java
@ -5,8 +5,8 @@
 *******************************************************************************
 *
 * $Source: /xsrl/Nsvn/icu/icu4j/src/com/ibm/icu/text/Collator.java,v $ 
-* $Date: 2002/06/21 23:56:44 $ 
-* $Revision: 1.7 $
+* $Date: 2002/06/22 07:23:45 $ 
+* $Revision: 1.8 $
 *
 *******************************************************************************
 */
@ -15,18 +15,16 @@ package com.ibm.icu.text;
 import java.util.Locale;

 /**
-* <p>
-* Collator is an abstract base class, its subclasses performs 
-* locale-sensitive String comparison. A concrete subclass, RuleBasedCollator, 
-* is provided and it allows customization of the collation ordering by the use 
-* of rule sets.
-* </p>
-* <p>
-* Following the 
-* <a href=http://www.unicode.org>Unicode Consortium</a>'s specifications for
-* the <a href=http://www.unicode.org/unicode/reports/tr10/>
-* Unicode Collation Algorithm (UCA)</a>, there are
-* 5 different levels of strength used in comparisons.
+* <p>Collator performs locale-sensitive string comparison. A concrete
+* subclass, RuleBasedCollator, allows customization of the collation
+* ordering by the use of rule sets.</p>
+* 
+* <p>Following the <a href="http://www.unicode.org">Unicode
+* Consortium</a>'s specifications for the 
+* <a href="http://www.unicode.org/unicode/reports/tr10/"> Unicode Collation
+* Algorithm (UCA)</a>, there are 5 different levels of strength used
+* in comparisons:
+*
 * <ul>
 * <li>PRIMARY strength: Typically, this is used to denote differences between 
 *     base characters (for example, "a" &lt; "b"). 
@ -60,11 +58,12 @@ import java.util.Locale;
 *     are compared, just in case there is no difference. 
 *     For example, Hebrew cantellation marks are only distinguished at this 
 *     strength. This strength should be used sparingly, as only code point 
-*     values differences between two strings is an extremely rare occurrence. 
+*     value differences between two strings is an extremely rare occurrence. 
 *     Using this strength substantially decreases the performance for both 
 *     comparison and collation key generation APIs. This strength also 
 *     increases the size of the collation key.
 * </ul>
+*
 * Unlike the JDK, ICU4J's Collator deals only with 2 decomposition modes, 
 * the canonical decomposition mode and one that does not use any decomposition.
 * The compatibility decomposition mode, java.text.Collator.FULL_DECOMPOSITION
@ -73,15 +72,13 @@ import java.util.Locale;
 * producing the same results as if the text were normalized in NFD. If 
 * canonical decomposition is turned off, it is the user's responsibility to 
 * ensure that all text is already in the appropriate form before performing
-* a comparison or before getting a CollationKey.
-* </p>
-* <p>
-* For more information about the collation service see the 
+* a comparison or before getting a CollationKey.</p>
+*
+* <p>For more information about the collation service see the 
 * <a href="http://oss.software.ibm.com/icu/userguide/Collate_Intro.html">users 
-* guide</a>.
-* </p>
-* <p>
-* Examples of use
+* guide</a>.</p>
+*
+* <p>Examples of use
 * <pre>
 * // Get the Collator for US English and set its strength to PRIMARY
 * Collator usCollator = Collator.getInstance(Locale.US);
@ -90,8 +87,9 @@ import java.util.Locale;
 *     System.out.println("Strings are equivalent");
 * }
 * 
-* The following example shows how to compare two strings using the Collator 
-* for the default locale. 
+* The following example shows how to compare two strings using the
+* Collator for the default locale.
+*
 * // Compare two strings in the default locale
 * Collator myCollator = Collator.getInstance();
 * myCollator.setDecomposition(NO_DECOMPOSITION);
@ -114,22 +112,21 @@ import java.util.Locale;
 * @see CollationKey
 * @author Syn Wee Quek
 * @since release 2.2, April 18 2002
-* @draft 2.2
+* @draft 2.2 
 */
-
 public abstract class Collator
 {     
-	// public data members ---------------------------------------------------
-	
-	/**
-     * Strongest collator strength value. Typically, used to denote differences 
-     * between base characters.
-     * See class documentation for more explanation.
+    // public data members ---------------------------------------------------
+        
+    /**
+     * Strongest collator strength value. Typically used to denote differences 
+     * between base characters. See class documentation for more explanation.
     * @see #setStrength
     * @see #getStrength
     * @draft 2.2
     */
    public final static int PRIMARY = 0;
+
    /**
     * Second level collator strength value. 
     * Accents in the characters are considered secondary differences.
@ -141,6 +138,7 @@ public abstract class Collator
     * @draft 2.2
     */
    public final static int SECONDARY = 1;
+
    /**
     * Third level collator strength value. 
     * Upper and lower case differences in characters are distinguished at this
@ -152,19 +150,21 @@ public abstract class Collator
     * @draft 2.2
     */
    public final static int TERTIARY = 2;                            
+
    /**
     * Fourth level collator strength value. 
     * When punctuation is ignored 
-     * <a href=http://www-124.ibm.com/icu/userguide/Collate_Concepts.html#Ignoring_Punctuation>
+     * <a href="http://www-124.ibm.com/icu/userguide/Collate_Concepts.html#Ignoring_Punctuation">
     * (see Ignoring Punctuations in the user guide)</a> at PRIMARY to TERTIARY 
     * strength, an additional strength level can 
-     * be used to distinguish words with and without punctuation
+     * be used to distinguish words with and without punctuation.
     * See class documentation for more explanation.
     * @see #setStrength
     * @see #getStrength
     * @draft 2.2
     */
    public final static int QUATERNARY = 3;
+
    /**
     * <p>
     * Smallest Collator strength value. When all other strengths are equal, 
@ -181,36 +181,32 @@ public abstract class Collator
    public final static int IDENTICAL = 15;

    /**
-     * <p>
-     * Decomposition mode value. With NO_DECOMPOSITION set, Strings will not be 
-     * decomposed for collation. This is the default 
-     * decomposition setting unless otherwise specified by the locale used
-     * to create the Collator.
-     * </p>
-     * <p>
-     * Note this value is different from JDK's
-     * </p>
+     * <p>Decomposition mode value. With NO_DECOMPOSITION set, Strings
+     * will not be decomposed for collation. This is the default
+     * decomposition setting unless otherwise specified by the locale
+     * used to create the Collator.</p>
+     *
+     * <p><strong>Note</strong> this value is different from the JDK's.</p>
     * @see #CANONICAL_DECOMPOSITION
     * @see #getDecomposition
     * @see #setDecomposition
-     * @draft 2.2
+     * @draft 2.2 
     */
    public final static int NO_DECOMPOSITION = 16;
+
    /**
-     * <p>
-     * Decomposition mode value. With CANONICAL_DECOMPOSITION set, 
-     * characters that are canonical variants according to Unicode 2.0 will be 
-     * decomposed for collation.
-     * </p>
-     * <p>
-     * CANONICAL_DECOMPOSITION corresponds to Normalization Form D as
+     * <p>Decomposition mode value. With CANONICAL_DECOMPOSITION set,
+     * characters that are canonical variants according to Unicode 2.0
+     * will be decomposed for collation.</p>
+     *
+     * <p>CANONICAL_DECOMPOSITION corresponds to Normalization Form D as
     * described in <a href="http://www.unicode.org/unicode/reports/tr15/">
     * Unicode Technical Report #15</a>.
     * </p>
     * @see #NO_DECOMPOSITION
     * @see #getDecomposition
     * @see #setDecomposition
-     * @draft 2.2
+     * @draft 2.2 
     */
    public final static int CANONICAL_DECOMPOSITION = 1;
    
@ -219,25 +215,23 @@ public abstract class Collator
    // public setters --------------------------------------------------------
    
    /**
-     * <p>
-     * Sets this Collator's strength property. The strength property 
+     * <p>Sets this Collator's strength property. The strength property 
     * determines the minimum level of difference considered significant 
-     * during comparison.
-     * </p>
-     * <p> 
-     * The default strength for the Collator is TERTIARY, unless specified 
-     * otherwise by the locale used to create the Collator.
-     * </p>
+     * during comparison.</p>
+     * 
+     * <p>The default strength for the Collator is TERTIARY, unless specified 
+     * otherwise by the locale used to create the Collator.</p>
+     *
     * <p>See the Collator class description for an example of use.</p>
-     * @param the new strength value.
+     * @param newStrength the new strength value.
     * @see #getStrength
     * @see #PRIMARY
     * @see #SECONDARY
     * @see #TERTIARY
     * @see #QUATERNARY
     * @see #IDENTICAL
-     * @exception IllegalArgumentException If the new strength value is not one 
-     * 		      of PRIMARY, SECONDARY, TERTIARY, QUATERNARY or IDENTICAL.
+     * @exception IllegalArgumentException if the new strength value is not one 
+     *                of PRIMARY, SECONDARY, TERTIARY, QUATERNARY or IDENTICAL.
     * @draft 2.2
     */
    public void setStrength(int newStrength) 
@ -253,35 +247,34 @@ public abstract class Collator
    }
    
    /**
-     * <p>
-     * Set the decomposition mode of this Collator. 
-     * Setting this decomposition property with CANONICAL_DECOMPOSITION allows 
-     * the Collator to handle 
-     * un-normalized text properly, producing the same results as if the text 
-     * were normalized. If NO_DECOMPOSITION is set, it is the user's 
-     * responsibility to insure that all text is already in the appropriate 
-     * form before a comparison or before getting a CollationKey. Adjusting
-     * decomposition mode allows the user to select between faster and more
-     * complete collation behavior.
-     * </p>
-     * <p>
-     * Since a great majority of the world languages does not require text
-     * normalization, most locales has NO_DECOMPOSITION has the default 
-     * decomposition mode.
-     * <p>
-     * The default decompositon mode for the Collator is NO_DECOMPOSITON, 
-     * unless specified otherwise by the locale used to create the Collator.
-     * </p>
-     * <p>
-     * See getDecomposition for a description of decomposition mode.
-     * </p>
+     * <p>Set the decomposition mode of this Collator.  Setting this
+     * decomposition property with CANONICAL_DECOMPOSITION allows the
+     * Collator to handle un-normalized text properly, producing the
+     * same results as if the text were normalized. If
+     * NO_DECOMPOSITION is set, it is the user's responsibility to
+     * ensure that all text is already in the appropriate form before
+     * a comparison or before getting a CollationKey. Adjusting
+     * decomposition mode allows the user to select between faster and
+     * more complete collation behavior.</p>
+     * 
+     * <p>Since a great many of the world's languages do not require
+     * text normalization, most locales set NO_DECOMPOSITION as the
+     * default decomposition mode.</p>
+     * 
+     * <p>The default decomposition mode for the Collator is
+     * NO_DECOMPOSITION, unless specified otherwise by the locale used
+     * to create the Collator.</p>
+     * 
+     * <p>See getDecomposition for a description of decomposition
+     * mode.</p>
+     * 
     * @param decomposition the new decomposition mode
     * @see #getDecomposition
     * @see #NO_DECOMPOSITION
     * @see #CANONICAL_DECOMPOSITION
     * @exception IllegalArgumentException If the given value is not a valid 
     *            decomposition mode.
-     * @draft 2.2
+     * @draft 2.2 
     */
    public void setDecomposition(int decomposition) 
    {
@ -324,17 +317,16 @@ public abstract class Collator
     */
    public static final Collator getInstance(Locale locale)
    {
-    	try {
-    		return new RuleBasedCollator(locale);
-    	} 
-    	catch(Exception e) {
-    		return RuleBasedCollator.UCA_;
-    	}
+        try {
+            return new RuleBasedCollator(locale);
+        } 
+        catch(Exception e) {
+            return RuleBasedCollator.UCA_;
+        }
    }
    
    /**
-     * <p>
-     * Returns this Collator's strength property. The strength property 
+     * <p>Returns this Collator's strength property. The strength property 
     * determines the minimum level of difference considered significant.
     * </p>
     * <p>
@ -376,12 +368,12 @@ public abstract class Collator
    // public other methods -------------------------------------------------

    /**
-     * Convenience method for comparing the equality of two text Strings based 
-     * on this Collator's collation rules, strength and decomposition mode.
-     * @param source the source string to be compared with.
-     * @param target the target string to be compared with.
+     * Convenience method for comparing the equality of two text Strings using
+     * this Collator's rules, strength and decomposition mode.
+     * @param source the source string to be compared.
+     * @param target the target string to be compared.
     * @return true if the strings are equal according to the collation
-     *         rules. false, otherwise.
+     *         rules, otherwise false.
     * @see #compare
     * @exception NullPointerException thrown if either arguments is null.
     * @draft 2.2
@ -412,7 +404,7 @@ public abstract class Collator
    /**
     * <p>
     * Compares the source text String to the target text String according to 
-     * the collation rules, strength and decomposition mode for this Collator. 
+     * this Collator's rules, strength and decomposition mode.
     * Returns an integer less than, 
     * equal to or greater than zero depending on whether the source String is 
     * less than, equal to or greater than the target String. See the Collator
@ -432,8 +424,8 @@ public abstract class Collator

    /**
     * <p>
-     * Transforms the String into a series of bits that can be compared 
-     * bitwise to other CollationKeys. Bits generated depends on the collation
+     * Transforms the String into a CollationKey suitable for efficient
+     * repeated comparison.  The resulting key depends on the collator's
     * rules, strength and decomposition mode.
     * </p> 
     * <p>See the CollationKey class documentation for more information.</p>
@ -448,7 +440,6 @@ public abstract class Collator
    public abstract CollationKey getCollationKey(String source);
    
    // protected constructor -------------------------------------------------
-
  
    // private data members --------------------------------------------------
    
@ -456,6 +447,7 @@ public abstract class Collator
     * Collation strength
     */
    private int m_strength_ = TERTIARY;
+
    /**
     * Decomposition mode
     */