ICU-1955 doc update

X-SVN-Rev: 8924
Doug Felt 2002-06-22 07:23:45 +00:00
parent 6939f0b1bb
commit d2500d9618
4 changed files with 2487 additions and 2503 deletions


@ -5,8 +5,8 @@
*******************************************************************************
*
* $Source: /xsrl/Nsvn/icu/icu4j/src/com/ibm/icu/text/Attic/BOSCU.java,v $
* $Date: 2002/06/20 01:21:18 $
* $Revision: 1.2 $
* $Date: 2002/06/22 07:23:45 $
* $Revision: 1.3 $
*
*******************************************************************************
*/
@ -17,366 +17,367 @@ import com.ibm.icu.impl.UnicodeCharacterIterator;
/**
* <p>Binary Ordered Compression Scheme for Unicode</p>
*
* <p>Specific application:<br>
* Encode a Unicode string for the identical level of a sort key.<br>
* Restrictions:
* <ul>
* <li> byte stream (unsigned 8-bit bytes)
* <li> lexical order of the identical-level run must be the same as code
* point order for the string
* <li> avoid byte values 0, 1, 2
* </ul>
* </p>
* <p>(Syn Wee: reference a paper if we have one on our site)</p>
* <p>BOCU is used to compress Unicode text into a stream of unsigned
* bytes. For many kinds of text the compression compares favorably
* to UTF-8, and for some kinds of text (such as CJK) it does better.
* The resulting bytes will compare in the same order as the original
* code points. The byte stream does not contain the values 0, 1, or
* 2. (Syn Wee, I don't understand the comment later in the source
* about these values being used in sort keys, can you explain?)</p>
*
* <p>Unlike a UTF encoding, BOCU-compressed text is not suitable for
* random access.</p>
*
* <p>Method: Slope Detection<br>
* Remember the previous code point (initial 0).
* For each cp in the string, encode the difference to the previous one.
* </p>
* <p>With a compact encoding of differences, this yields good results for
* small scripts and UTF-like results otherwise.
* </p>
* <p>Encoding of differences:<br>
* <ul>
* <li>Similar to a UTF, encoding the length of the byte sequence in the lead
* bytes.
* <li> Does not need to be friendly for decoding or random access
* (trail byte values may overlap with lead/single byte values).
* <li> The signedness must be encoded as the most significant part.
* </ul>
* </p>
* <p>We encode differences with few bytes if their absolute values are small.
* For correct ordering, we must treat the entire value range -10ffff..+10ffff
* in ascending order, which forbids encoding the sign and the absolute value
* separately.
* Instead, we split the lead byte range in the middle and encode non-negative
* values going up and negative values going down.
* </p>
* <p>For very small absolute values, the difference is added to a middle byte
* value for single-byte encoded differences.
* For somewhat larger absolute values, the difference is divided by the number
* of byte values available, the modulo is used for one trail byte, and the
* remainder is added to a lead byte avoiding the single-byte range.
* For large absolute values, the difference is similarly encoded in three
* bytes.
* </p>
* <p>This encoding does not use byte values 0, 1, 2, but uses all other byte
* values for lead/single bytes so that the middle range of single bytes is as
* large as possible.
* </p>
* <p>Note that the lead byte ranges overlap some, but that the sequences as a
* whole are well ordered. I.e., even if the lead byte is the same for
* sequences of different lengths, the trail bytes establish correct order.
* It would be possible to encode slightly larger ranges for each length (>1)
* by subtracting the lower bound of the range. However, that would also slow
* down the calculation.
* </p>
* <p>For the actual string encoding, an optimization moves the previous code
* point value to the middle of its Unicode script block to minimize the
* differences in same-script text runs.
* </p>
* <p>Method: Slope Detection<br> Remember the previous code point
* (initial 0). For each code point in the string, encode the
* difference with the previous one. Similar to a UTF, the length of
* the byte sequence is encoded in the lead bytes. Unlike a UTF, the
* trail byte values may overlap with lead/single byte values. The
* signedness of the difference must be encoded as the most
* significant part.</p>
*
* <p>We encode differences with few bytes if their absolute values
* are small. For correct ordering, we must treat the entire value
* range -10ffff..+10ffff in ascending order, which forbids encoding
* the sign and the absolute value separately. Instead, we split the
* lead byte range in the middle and encode non-negative values going
* up and negative values going down.</p>
*
* <p>For very small absolute values, the difference is added to a
* middle byte value for single-byte encoded differences. For
* somewhat larger absolute values, the difference is divided by the
* number of byte values available, the modulo is used for one trail
* byte, and the remainder is added to a lead byte avoiding the
* single-byte range. For large absolute values, the difference is
* similarly encoded in three bytes. (Syn Wee, I need examples
* here.)</p>
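* <p>For example, using the constants declared below (SLOPE_MIDDLE_ =
* 0x81, SLOPE_TAIL_COUNT_ = 253, SLOPE_START_POS_2_ = 0xD2,
* SLOPE_START_POS_3_ = 0xFC, SLOPE_START_NEG_2_ = 0x31), a few
* differences encode as follows (hand-computed):</p>
* <pre>
*   difference   bytes (hex)
*        0       81
*      +24       99
*      -80       31
*     +300       D3 32
*     -300       2F D1
*   +19888       FC 51 9D   (U+4E00 after the initial prev of 0x50)
* </pre>
* <p>For +300 the lead byte is 0xD2 + 300 / 253 = 0xD3 and the trail
* byte is 3 + 300 % 253 = 0x32. For -300 the division is taken toward
* negative infinity (quotient -2, modulo 206), giving lead byte
* 0x31 - 2 = 0x2F and trail byte 3 + 206 = 0xD1.</p>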
*
* <p>BOCU does not use byte values 0, 1, or 2, but uses all other
* byte values for lead and single bytes, so that the middle range of
* single bytes is as large as possible.</p>
*
* <p>Note that the lead byte ranges overlap some, but that the
* sequences as a whole are well ordered. I.e., even if the lead byte
* is the same for sequences of different lengths, the trail bytes
* establish correct order. It would be possible to encode slightly
* larger ranges for each length (>1) by subtracting the lower bound
* of the range. However, that would also slow down the calculation.
* (Syn Wee, need an example).</p>
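* <p>For example (hand-computed from the same constants), +10667 is
* the largest two-byte difference and encodes as FC 2C, while +10668
* is the smallest positive three-byte difference and encodes as
* FC 2D 2D. Both sequences start with lead byte 0xFC; the second byte
* (0x2C &lt; 0x2D) keeps them in order.</p>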
*
* <p>For the actual string encoding, an optimization moves the
* previous code point value to the middle of its Unicode script block
* to minimize the differences in same-script text runs. (Syn Wee,
* need an example.)</p>
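* <p>For example (hand-computed), at the start of a string prev is 0,
* which is moved to (0 &amp; ~0x7f) + 80 = 0x50. Every code point in
* U+0000..U+007F is then within -80..+47 of prev and encodes as a
* single byte; "h" (U+0068) becomes 0x81 + 0x18 = 0x99. Inside Unihan,
* prev is moved to 0x9fff - 10667 = 0x7654, so every character in
* U+4E00..U+9FA5 lies within -10324..+10577 of it and needs at most
* two bytes.</p>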
*
* @author Syn Wee Quek
* @since release 2.2, May 3rd 2002
* @draft 2.2
*/
public class BOSCU
{
// public constructors --------------------------------------------------
// public methods -------------------------------------------------------
/**
* <p>Encode the code points of a string as a sequence of byte-encoded
* differences (slope detection), preserving lexical order.</p>
* <p>Optimize the difference-taking for runs of Unicode text within
* small scripts:<br>
* Most small scripts are allocated within aligned 128-blocks of Unicode
* code points. Lexical order is preserved if "prev" is always moved
* into the middle of such a block.</p>
* <p>Additionally, "prev" is moved from anywhere in the Unihan area into
* the middle of that area.</p>
* <p>Note that the identical-level run in a sort key is generated from
* NFD text - there are never Hangul characters included.</p>
* @param source text source
* @param buffer output buffer
* @param offset to start writing to
* @return end offset where the writing stop
*/
public static int writeIdenticalLevelRun(String source, byte buffer[],
int offset)
{
int prev = 0;
UnicodeCharacterIterator iterator = new UnicodeCharacterIterator(source);
int codepoint = iterator.nextCodePoint();
while (codepoint != UnicodeCharacterIterator.DONE_CODEPOINT) {
if (prev < 0x4e00 || prev >= 0xa000) {
prev = (prev & ~0x7f) - SLOPE_REACH_NEG_1_;
}
else {
// Unihan U+4e00..U+9fa5:
// double-bytes down from the upper end
prev = 0x9fff - SLOPE_REACH_POS_2_;
}
offset = writeDiff(codepoint - prev, buffer, offset);
prev = codepoint;
codepoint = iterator.nextCodePoint();
}
return offset;
}
/**
* How many bytes would writeIdenticalLevelRun() write?
* @param source text source string
* @return the length of the BOSCU result
*/
public static int lengthOfIdenticalLevelRun(String source)
{
int prev = 0;
int result = 0;
UnicodeCharacterIterator iterator = new UnicodeCharacterIterator(source);
int codepoint = iterator.nextCodePoint();
while (codepoint != UnicodeCharacterIterator.DONE_CODEPOINT) {
if (prev < 0x4e00 || prev >= 0xa000) {
prev = (prev & ~0x7f) - SLOPE_REACH_NEG_1_;
}
else {
// Unihan U+4e00..U+9fa5:
// double-bytes down from the upper end
prev = 0x9fff - SLOPE_REACH_POS_2_;
}
result += lengthOfDiff(codepoint - prev);
prev = codepoint;
codepoint = iterator.nextCodePoint();
}
return result;
}
// public methods -------------------------------------------------------
/**
* <p>(Syn Wee-- I think this should be renamed to 'compress')</p>
* <p>Encode the code points of a string as a sequence of bytes,
* preserving lexical order.</p>
*
* @param source text source
* @param buffer output buffer
* @param offset to start writing to
* @return end offset where the writing stopped
*/
public static int writeIdenticalLevelRun(String source, byte buffer[],
int offset)
{
// (Syn Wee - this is a public function so comments of this nature don't
// really belong in the documentation, I think. So I moved them.)
// Optimize the difference-taking for runs of Unicode text within
// small scripts.
// Most small scripts are allocated within aligned 128-blocks of Unicode
// code points. Lexical order is preserved if "prev" is always moved
// into the middle of such a block.
// Additionally, "prev" is moved from anywhere in the Unihan area into
// the middle of that area.
// Note that the identical-level run in a sort key is generated from
// NFD text - there are never Hangul characters included.
int prev = 0;
UnicodeCharacterIterator iterator = new UnicodeCharacterIterator(source);
int codepoint = iterator.nextCodePoint();
while (codepoint != UnicodeCharacterIterator.DONE_CODEPOINT) {
if (prev < 0x4e00 || prev >= 0xa000) {
prev = (prev & ~0x7f) - SLOPE_REACH_NEG_1_;
}
else {
// Unihan U+4e00..U+9fa5:
// double-bytes down from the upper end
prev = 0x9fff - SLOPE_REACH_POS_2_;
}
offset = writeDiff(codepoint - prev, buffer, offset);
prev = codepoint;
codepoint = iterator.nextCodePoint();
}
return offset;
}
/**
* <p>(Syn Wee, I think this should be renamed getCompressedLength).</p>
* Return the number of bytes that writeIdenticalLevelRun() would write.
* @param source text source string
* @return the length of the BOCU result
*/
public static int lengthOfIdenticalLevelRun(String source)
{
int prev = 0;
int result = 0;
UnicodeCharacterIterator iterator = new UnicodeCharacterIterator(source);
int codepoint = iterator.nextCodePoint();
while (codepoint != UnicodeCharacterIterator.DONE_CODEPOINT) {
if (prev < 0x4e00 || prev >= 0xa000) {
prev = (prev & ~0x7f) - SLOPE_REACH_NEG_1_;
}
else {
// Unihan U+4e00..U+9fa5:
// double-bytes down from the upper end
prev = 0x9fff - SLOPE_REACH_POS_2_;
}
result += lengthOfDiff(codepoint - prev);
prev = codepoint;
codepoint = iterator.nextCodePoint();
}
return result;
}
// public setter methods -------------------------------------------------
// public getter methods ------------------------------------------------
// public other methods -------------------------------------------------
// protected constructor ------------------------------------------------
// protected data members ------------------------------------------------
// protected methods -----------------------------------------------------
// private data members --------------------------------------------------
/**
* Do not use byte values 0, 1, 2 because they are separators in sort keys.
*/
private static final int SLOPE_MIN_ = 3;
private static final int SLOPE_MAX_ = 0xff;
private static final int SLOPE_MIDDLE_ = 0x81;
private static final int SLOPE_TAIL_COUNT_ = SLOPE_MAX_ - SLOPE_MIN_ + 1;
private static final int SLOPE_MAX_BYTES_ = 4;
/**
* Number of lead bytes:
* 1 middle byte for 0
* 2*80=160 single bytes for !=0
* 2*42=84 for double-byte values
* 2*3=6 for 3-byte values
* 2*1=2 for 4-byte values
*
* The sum must be <=SLOPE_TAIL_COUNT. Here it is 1+160+84+6+2=253, so
* every available lead byte value is used.
*
* Why these numbers?
* - There should be >=128 single-byte values to cover 128-blocks
* with small scripts.
* - There should be >=20902 single/double-byte values to cover Unihan.
* - It helps CJK Extension B somewhat if there are 3-byte values that cover
* the distance between it and Unihan.
* This also helps to jump among distant places in the BMP.
* - Four-byte values are necessary to cover the rest of Unicode.
*
* Symmetrical lead byte counts are for convenience.
* With an equal distribution of even and odd differences there is also
* no advantage to asymmetrical lead byte counts.
*/
private static final int SLOPE_SINGLE_ = 80;
private static final int SLOPE_LEAD_2_ = 42;
private static final int SLOPE_LEAD_3_ = 3;
private static final int SLOPE_LEAD_4_ = 1;
/**
* The difference value range for single-byte encodings: -80..+80.
*/
private static final int SLOPE_REACH_POS_1_ = SLOPE_SINGLE_;
private static final int SLOPE_REACH_NEG_1_ = (-SLOPE_SINGLE_);
/**
* The difference value range for double-byte encodings: -10668..+10667.
*/
private static final int SLOPE_REACH_POS_2_ =
SLOPE_LEAD_2_ * SLOPE_TAIL_COUNT_ + SLOPE_LEAD_2_ - 1;
private static final int SLOPE_REACH_NEG_2_ = (-SLOPE_REACH_POS_2_ - 1);
/**
* The difference value range for 3-byte encodings: -192786..+192785.
*/
private static final int SLOPE_REACH_POS_3_ = SLOPE_LEAD_3_
* SLOPE_TAIL_COUNT_
* SLOPE_TAIL_COUNT_
+ (SLOPE_LEAD_3_ - 1)
* SLOPE_TAIL_COUNT_ +
(SLOPE_TAIL_COUNT_ - 1);
private static final int SLOPE_REACH_NEG_3_ = (-SLOPE_REACH_POS_3_ - 1);
/**
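* The lead byte start values: 0xD2 (positive 2-byte), 0xFC (positive
* 3-byte), 0x31 (negative 2-byte) and 0x07 (negative 3-byte).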
*/
private static final int SLOPE_START_POS_2_ = SLOPE_MIDDLE_
+ SLOPE_SINGLE_ + 1;
private static final int SLOPE_START_POS_3_ = SLOPE_START_POS_2_
+ SLOPE_LEAD_2_;
private static final int SLOPE_START_NEG_2_ = SLOPE_MIDDLE_ +
SLOPE_REACH_NEG_1_;
private static final int SLOPE_START_NEG_3_ = SLOPE_START_NEG_2_
- SLOPE_LEAD_2_;
// private constructor ---------------------------------------------------
/**
* Constructor private to prevent instantiation
*/
private BOSCU()
{
}
// private methods -------------------------------------------------------
/**
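* Java integer division and modulo with a negative numerator yield a
* negative modulo and a quotient one greater than the floor quotient
* whenever the division is inexact. This method returns the floor
* quotient and a non-negative modulo, which is what we need here.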
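* <p>For example, Java evaluates -300 / 253 to -1 and -300 % 253 to
* -47. The floor quotient needed here is -2 with modulo 206, since
* -2 * 253 + 206 == -300; getNegDivMod(-300, 253) returns that pair
* packed as (-2 &lt;&lt; 32) | 206.</p>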
* @param number which operations are to be performed on
* @param factor the factor to use for division
* @return (result of division) << 32 | modulo
*/
private static final long getNegDivMod(int number, int factor)
{
int modulo = number % factor;
long result = number / factor;
if (modulo < 0) {
-- result;
modulo += factor;
}
return (result << 32) | modulo;
}
/**
* Encode one difference value -0x10ffff..+0x10ffff in 1..4 bytes,
* preserving lexical order.
* @param diff the difference to encode
* @param buffer byte buffer to append to
* @param offset to the byte buffer to start appending
* @return end offset where the appending stops
*/
private static final int writeDiff(int diff, byte buffer[], int offset)
{
if (diff >= SLOPE_REACH_NEG_1_) {
if (diff <= SLOPE_REACH_POS_1_) {
buffer[offset ++] = (byte)(SLOPE_MIDDLE_ + diff);
}
else if (diff <= SLOPE_REACH_POS_2_) {
buffer[offset ++] = (byte)(SLOPE_START_POS_2_
+ (diff / SLOPE_TAIL_COUNT_));
buffer[offset ++] = (byte)(SLOPE_MIN_ +
(diff % SLOPE_TAIL_COUNT_));
}
else if (diff <= SLOPE_REACH_POS_3_) {
buffer[offset + 2] = (byte)(SLOPE_MIN_
+ (diff % SLOPE_TAIL_COUNT_));
diff /= SLOPE_TAIL_COUNT_;
buffer[offset + 1] = (byte)(SLOPE_MIN_
+ (diff % SLOPE_TAIL_COUNT_));
buffer[offset] = (byte)(SLOPE_START_POS_3_
+ (diff / SLOPE_TAIL_COUNT_));
offset += 3;
}
else {
buffer[offset + 3] = (byte)(SLOPE_MIN_
+ diff % SLOPE_TAIL_COUNT_);
diff /= SLOPE_TAIL_COUNT_;
buffer[offset + 2] = (byte)(SLOPE_MIN_
+ diff % SLOPE_TAIL_COUNT_);
diff /= SLOPE_TAIL_COUNT_;
buffer[offset + 1] = (byte)(SLOPE_MIN_
+ diff % SLOPE_TAIL_COUNT_);
buffer[offset] = (byte)SLOPE_MAX_;
offset += 4;
}
}
else {
long division = getNegDivMod(diff, SLOPE_TAIL_COUNT_);
int modulo = (int)division;
if (diff >= SLOPE_REACH_NEG_2_) {
diff = (int)(division >> 32);
buffer[offset ++] = (byte)(SLOPE_START_NEG_2_ + diff);
buffer[offset ++] = (byte)(SLOPE_MIN_ + modulo);
}
else if (diff >= SLOPE_REACH_NEG_3_) {
buffer[offset + 2] = (byte)(SLOPE_MIN_ + modulo);
diff = (int)(division >> 32);
division = getNegDivMod(diff, SLOPE_TAIL_COUNT_);
modulo = (int)division;
diff = (int)(division >> 32);
buffer[offset + 1] = (byte)(SLOPE_MIN_ + modulo);
buffer[offset] = (byte)(SLOPE_START_NEG_3_ + diff);
offset += 3;
}
else {
buffer[offset + 3] = (byte)(SLOPE_MIN_ + modulo);
diff = (int)(division >> 32);
division = getNegDivMod(diff, SLOPE_TAIL_COUNT_);
modulo = (int)division;
diff = (int)(division >> 32);
buffer[offset + 2] = (byte)(SLOPE_MIN_ + modulo);
division = getNegDivMod(diff, SLOPE_TAIL_COUNT_);
modulo = (int)division;
buffer[offset + 1] = (byte)(SLOPE_MIN_ + modulo);
buffer[offset] = SLOPE_MIN_;
offset += 4;
}
}
return offset;
}
/**
* How many bytes would writeDiff() write?
* @param diff the difference to be encoded
* @return the number of bytes writeDiff() would write, 1..4
*/
private static final int lengthOfDiff(int diff)
{
if (diff >= SLOPE_REACH_NEG_1_) {
if (diff <= SLOPE_REACH_POS_1_) {
return 1;
}
else if (diff <= SLOPE_REACH_POS_2_) {
return 2;
}
else if(diff <= SLOPE_REACH_POS_3_) {
return 3;
}
else {
return 4;
}
}
else {
if (diff >= SLOPE_REACH_NEG_2_) {
return 2;
}
else if (diff >= SLOPE_REACH_NEG_3_) {
return 3;
}
else {
return 4;
}
}
}
}
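The class documentation above asks several times for examples. The following is a hypothetical, self-contained Java sketch of the same scheme, not ICU code. The class name BocuSketch and its methods are illustrative; the constants copy the SLOPE_ values above; Math.floorDiv and Math.floorMod (Java 8 or later) stand in for getNegDivMod(). Unlike writeDiff(), it returns a fresh byte array instead of writing into a caller's buffer, and it writes each trail byte of a 4-byte sequence exactly once.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the BOSCU slope-detection encoding; not ICU API.
public class BocuSketch {
    static final int MIN = 3, MAX = 0xff, MIDDLE = 0x81;
    static final int TAIL = MAX - MIN + 1;                  // 253 usable byte values
    static final int SINGLE = 80, LEAD_2 = 42, LEAD_3 = 3;  // lead byte counts per sign
    static final int REACH_POS_2 = LEAD_2 * TAIL + LEAD_2 - 1;              // 10667
    static final int REACH_NEG_2 = -REACH_POS_2 - 1;                        // -10668
    static final int REACH_POS_3 =
            LEAD_3 * TAIL * TAIL + (LEAD_3 - 1) * TAIL + (TAIL - 1);        // 192785
    static final int REACH_NEG_3 = -REACH_POS_3 - 1;                        // -192786
    static final int START_POS_2 = MIDDLE + SINGLE + 1;     // 0xD2
    static final int START_POS_3 = START_POS_2 + LEAD_2;    // 0xFC
    static final int START_NEG_2 = MIDDLE - SINGLE;         // 0x31
    static final int START_NEG_3 = START_NEG_2 - LEAD_2;    // 0x07

    // Encodes one difference in -0x10ffff..+0x10ffff as 1..4 bytes, lead byte first.
    static byte[] encodeDiff(int diff) {
        if (diff >= -SINGLE && diff <= SINGLE) {
            return new byte[] {(byte) (MIDDLE + diff)};
        }
        int length = diff > 0
                ? (diff <= REACH_POS_2 ? 2 : diff <= REACH_POS_3 ? 3 : 4)
                : (diff >= REACH_NEG_2 ? 2 : diff >= REACH_NEG_3 ? 3 : 4);
        byte[] out = new byte[length];
        int rest = diff;
        // Trail bytes are base-253 digits, filled from the last byte backwards;
        // floorDiv/floorMod keep the digits non-negative for negative diffs.
        for (int i = length - 1; i >= 1; i--) {
            out[i] = (byte) (MIN + Math.floorMod(rest, TAIL));
            rest = Math.floorDiv(rest, TAIL);
        }
        if (length == 4) {
            out[0] = (byte) (diff > 0 ? MAX : MIN);   // 4-byte lead bytes are fixed
        } else if (diff > 0) {
            out[0] = (byte) ((length == 2 ? START_POS_2 : START_POS_3) + rest);
        } else {
            out[0] = (byte) ((length == 2 ? START_NEG_2 : START_NEG_3) + rest);
        }
        return out;
    }

    // Encodes a string: prev moves to the middle of its 128-block, or near the top of Unihan.
    static byte[] encodeString(String s) {
        List<Byte> bytes = new ArrayList<>();
        int prev = 0;
        for (int cp : s.codePoints().toArray()) {
            if (prev < 0x4e00 || prev >= 0xa000) {
                prev = (prev & ~0x7f) + SINGLE;   // same as "- SLOPE_REACH_NEG_1_"
            } else {
                prev = 0x9fff - REACH_POS_2;      // Unihan U+4E00..U+9FA5
            }
            for (byte b : encodeDiff(cp - prev)) {
                bytes.add(b);
            }
            prev = cp;
        }
        byte[] result = new byte[bytes.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = bytes.get(i);
        }
        return result;
    }

    // Lexicographic comparison of unsigned bytes; a proper prefix sorts first.
    static int compareUnsigned(byte[] a, byte[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int x = a[i] & 0xff, y = b[i] & 0xff;
            if (x != y) {
                return x < y ? -1 : 1;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        System.out.println(encodeString("hello").length);   // prints 5
        System.out.println(compareUnsigned(encodeString("abc"), encodeString("abd"))); // prints -1
    }
}
```

Running main prints 5 (one byte per ASCII letter) and -1 (the encoding of "abc" sorts before that of "abd"). Using floor division for both signs keeps one code path for positive and negative differences, at the cost of a little speed compared with the separate branches in writeDiff().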

File diff suppressed because it is too large.
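The CollationKey diff that follows documents that two comparable keys can be compared byte by byte via toByteArray(). A hypothetical standalone sketch of that comparison is shown here; SortKeyCompareSketch and compareKeys are illustrative names, and both arrays are assumed to end with a 0 byte, as toByteArray() describes. It shows why each byte is masked with 0xFF: Java bytes are signed, so without the mask 0x90 would sort below 0x20.

```java
// Hypothetical sketch: compares two null-terminated sort keys as unsigned bytes.
public class SortKeyCompareSketch {
    // Returns -1, 0 or 1; assumes both arrays contain a terminating 0 byte.
    static int compareKeys(byte[] key1, byte[] key2) {
        for (int i = 0; ; i++) {
            int a = key1[i] & 0xFF;   // read the byte as an unsigned value
            int b = key2[i] & 0xFF;
            if (a != b) {
                return a < b ? -1 : 1;   // a shorter key hits 0 first and sorts first
            }
            if (a == 0) {
                return 0;                // both keys ended at the same position
            }
        }
    }

    public static void main(String[] args) {
        byte[] k1 = {0x10, (byte) 0x90, 0};   // (byte) 0x90 is negative when signed
        byte[] k2 = {0x10, 0x20, 0};
        System.out.println(compareKeys(k1, k2));   // prints 1
    }
}
```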


@ -5,8 +5,8 @@
*******************************************************************************
*
* $Source: /xsrl/Nsvn/icu/icu4j/src/com/ibm/icu/text/CollationKey.java,v $
* $Date: 2002/06/21 23:56:44 $
* $Revision: 1.6 $
* $Date: 2002/06/22 07:23:45 $
* $Revision: 1.7 $
*
*******************************************************************************
*/
@ -15,43 +15,49 @@ package com.ibm.icu.text;
import java.util.Arrays;
/**
* <p>
* A <code>CollationKey</code> represents a <code>String</code> under the
* rules of a specific <code>Collator</code> object. Comparing two
* <code>CollationKey</code>s returns the relative order of the
* <code>String</code>s they represent.
* </p>
* <p>
* <code>CollationKey</code> instances can not be create directly. Rather,
* they are generated by calling <code>Collator.getCollationKey(String)</code>.
* Since the rule set of each <code>Collator differs</code>, the sort orders of
* the same string under two unique <code>Collator</code> may not be the same.
* Hence comparing <code>CollationKey</code>s generated from different
* <code>Collator</code> objects may not give the right results.
* </p>
* <p>
* Similar to <code>CollationKey.compareTo(CollationKey)</code>,
* the method <code>RuleBasedCollator.compare(String, String)</code> compares
* two strings and returns the relative order. During the construction
* of a <code>CollationKey</code> object, the entire source string is examined
* and processed into a series of bits that are stored in the
* <code>CollationKey</code> object. Bitwise comparison on the bit sequences
* are then performed during <code>CollationKey.compareTo(CollationKey)</code>.
* This comparison could incurr expensive startup costs while creating
* the <code>CollationKey</code> object, but once the objects are created,
* binary comparisons are fast, and is recommended when the same strings are
* to be compared over and over again.
* On the other hand <code>Collator.compare(String, String)</code> examines
* and processes the string only until the first characters differing in order,
* and is recommend for use if the <code>String</code>s are to be compared only
* once.
* </p>
* <p>
* Details of the composition of the bit sequence is located at
* <a href=http://oss.software.ibm.com/icu/userguide/Collate_ServiceArchitecture.html>
* user guide</a>.
* </p>
* <p>The following example shows how <code>CollationKey</code>s might be used
* <p>A <code>CollationKey</code> represents a <code>String</code>
* under the rules of a specific <code>Collator</code>
* object. Comparing two <code>CollationKey</code>s returns the
* relative order of the <code>String</code>s they represent.</p>
*
* <p><code>CollationKey</code> instances are not created
* directly. Rather, they are generated by calling
* <code>Collator.getCollationKey(String)</code>.</p>
*
* <p>Since the rule set of <code>Collator</code>s can differ, the
* sort orders of the same string under two different
* <code>Collator</code>s might differ. Hence comparing
* <code>CollationKey</code>s generated from different
* <code>Collator</code>s can give incorrect results.</p>
*
* <p>Both the method
* <code>CollationKey.compareTo(CollationKey)</code> and the method
* <code>Collator.compare(String, String)</code> compare two strings
* and return their relative order. The performance characteristics
* of these two approaches can differ.</p>
*
* <p>During the construction of a <code>CollationKey</code>, the
* entire source string is examined and processed into a series of
* bits that are stored in the <code>CollationKey</code>. When
* <code>CollationKey.compareTo(CollationKey)</code> executes, it
* performs bitwise comparison on the bit sequences. This incurs a
* startup cost when creating the <code>CollationKey</code>, but once
* the key is created, binary comparisons are fast. This approach is
* recommended when the same strings are to be compared over and over
* again.</p>
*
* <p>On the other hand, implementations of
* <code>Collator.compare(String, String)</code> can examine and
* process the strings only until the first characters differing in
* order. This approach is recommended if the strings are to be
* compared only once.</p>
*
* <p>More information about the composition of the bit sequence can
* be found in the
* <a href="http://oss.software.ibm.com/icu/userguide/Collate_ServiceArchitecture.html">
* user guide</a>.</p>
*
* <p>The following example shows how <code>CollationKey</code>s can be used
* to sort a list of <code>String</code>s.</p>
* <blockquote>
* <pre>
@ -82,16 +88,16 @@ import java.util.Arrays;
* @see RuleBasedCollator
* @author Syn Wee Quek
* @since release 2.2, April 18 2002
* @draft 2.2
*/
public final class CollationKey implements Comparable
{
// public methods -------------------------------------------------------
// public getters -------------------------------------------------------
/**
* Returns the source string that this CollationKey represents.
* Return the source string that this CollationKey represents.
* @return source string that this CollationKey represents
* @draft 2.2
*/
@ -101,20 +107,19 @@ public final class CollationKey implements Comparable
}
/**
* <p>
* Duplicates and returns the value of this CollationKey as a sequence
* of big-endian bytes terminated by a null.
* </p>
* <p>
* If two CollationKeys could be legitimately compared, then one could
* compare the byte arrays of each to obtain the same result.
* <p>Duplicates and returns the value of this CollationKey as a sequence
* of big-endian bytes terminated by a null.</p>
*
* <p>If two CollationKeys can be legitimately compared, then one can
* compare the byte arrays of each to obtain the same result, e.g.
* <pre>
* byte key1[] = collationkey1.toByteArray();
* byte key2[] = collationkey2.toByteArray();
* int key, targetkey;
* int i = 0;
* while (key1[i] != 0 && key2[i] != 0) {
* int key = key1[i] & 0xFF;
* int targetkey = key2[i] & 0xFF;
* do {
* key = key1[i] & 0xFF;
* targetkey = key2[i] & 0xFF;
* if (key &lt; targetkey) {
* System.out.println("String 1 is less than string 2");
* return;
@ -123,18 +128,9 @@ public final class CollationKey implements Comparable
* System.out.println("String 1 is more than string 2");
* return;
* }
* i ++;
* }
* int key = key1[i] & 0xFF;
* int targetkey = key2[i] & 0xFF;
* if (key &lt; targetkey) {
* System.out.println("String 1 is less than string 2");
* return;
* }
* if (targetkey &lt; key) {
* System.out.println("String 1 is more than string 2");
* return;
* }
* System.out.println("String 1 is equals to string 2");;
* } while (key != 0 && targetkey != 0);
*
* System.out.println("Strings are equal.");
* </pre>
* </p>
* @return CollationKey value in a sequence of big-endian bytes
@ -145,10 +141,10 @@ public final class CollationKey implements Comparable
{
int length = 0;
while (true) {
if (m_key_[length] == 0) {
break;
}
length ++;
}
length ++;
byte result[] = new byte[length];
@ -156,94 +152,88 @@ public final class CollationKey implements Comparable
return result;
}
// public other methods -------------------------------------------------
/**
* <p>
* Compare this CollationKey to the argument target CollationKey.
* The collation
* rules of the Collator object which created these keys are applied.
* </p>
* <p>
* <strong>Note:</strong> Comparison between CollationKeys created by
* different Collators may not return the correct result. See class
* documentation.
* </p>
* <p>Compare this CollationKey to another CollationKey. The
* collation rules of the Collator that created this key are
* applied.</p>
*
* <p><strong>Note:</strong> Comparison between CollationKeys
* created by different Collators might return incorrect
* results. See class documentation.</p>
*
* @param target target CollationKey
* @return an integer value, if value is less than zero this CollationKey
* is less than than target, if value is zero if they are equal
* and value is greater than zero if this CollationKey is greater
* @return an integer value. If the value is less than zero this CollationKey
* is less than target, if the value is zero they are equal, and
* if the value is greater than zero this CollationKey is greater
* than target.
* @exception NullPointerException thrown when argument is null.
* @exception NullPointerException is thrown if the argument is null.
* @see Collator#compare(String, String)
* @draft 2.2
*/
public int compareTo(CollationKey target)
{
int i = 0;
while (m_key_[i] != 0 && target.m_key_[i] != 0) {
int key = m_key_[i] & 0xFF;
int targetkey = target.m_key_[i] & 0xFF;
if (key < targetkey) {
return -1;
}
if (targetkey < key) {
return 1;
}
i ++;
int key = m_key_[i] & 0xFF;
int targetkey = target.m_key_[i] & 0xFF;
if (key < targetkey) {
return -1;
}
if (targetkey < key) {
return 1;
}
i ++;
}
// last comparison if we encounter a 0
int key = m_key_[i] & 0xFF;
int targetkey = target.m_key_[i] & 0xFF;
if (key < targetkey) {
return -1;
return -1;
}
if (targetkey < key) {
return 1;
return 1;
}
return 0;
}
/**
* <p>
* Compares this CollationKey with the specified Object.
* The collation
* rules of the Collator object which created these objects are applied.
* </p>
* <p>
* See note in compareTo(CollationKey) for warnings of incorrect results
* </p>
* @param obj the Object to be compared.
* <p>Compare this CollationKey with the specified Object. The
* collation rules of the Collator that created this key are
* applied.</p>
*
* <p>See note in compareTo(CollationKey) for warnings about possible
* incorrect results.</p>
*
* @param obj the Object to be compared to.
* @return Returns a negative integer, zero, or a positive integer
* respectively if this CollationKey is less than, equal to, or
* greater than the given Object.
* @exception ClassCastException thrown when the specified argument is not
* a CollationKey. NullPointerException thrown when argument
* @exception ClassCastException is thrown when the argument is not
* a CollationKey. NullPointerException is thrown when the argument
* is null.
* @see #compareTo(CollationKey)
* @draft 2.2
*/
* @draft 2.2 */
public int compareTo(Object obj)
{
return compareTo((CollationKey)obj);
return compareTo((CollationKey)obj);
}
/**
* <p>
* Compare this CollationKey and the argument target object for equality.
* The collation
* rules of the Collator object which created these objects are applied.
* </p>
* <p>
* See note in compareTo(CollationKey) for warnings of incorrect results
* </p>
* <p>Compare this CollationKey and the specified Object for
* equality. The collation rules of the Collator that created
* this key are applied.</p>
*
* <p>See note in compareTo(CollationKey) for warnings about
* possible incorrect results.</p>
*
* @param target the object to compare to.
* @return true if two objects are equal, false otherwise.
* @return true if the two keys compare as equal, false otherwise.
* @see #compareTo(CollationKey)
* @exception ClassCastException thrown when the specified argument is not
* a CollationKey. NullPointerException thrown when argument
* @exception ClassCastException is thrown when the argument is not
* a CollationKey. NullPointerException is thrown when the argument
* is null.
* @draft 2.2
* @draft 2.2
*/
public boolean equals(Object target)
{
@ -266,13 +256,13 @@ public final class CollationKey implements Comparable
* </p>
* @param target the CollationKey to compare to.
* @return true if two objects are equal, false otherwise.
* @exception NullPointerException thrown when argument is null.
* @exception NullPointerException is thrown when the argument is null.
* @draft 2.2
*/
public boolean equals(CollationKey target)
{
if (this == target) {
return true;
return true;
}
if (target == null) {
return false;
@ -280,20 +270,19 @@ public final class CollationKey implements Comparable
CollationKey other = (CollationKey)target;
int i = 0;
while (true) {
if (m_key_[i] != other.m_key_[i]) {
return false;
}
if (m_key_[i] == 0) {
break;
}
i ++;
if (m_key_[i] != other.m_key_[i]) {
return false;
}
if (m_key_[i] == 0) {
break;
}
i ++;
}
return true;
}
/**
* <p>
* Creates a hash code for this CollationKey. The hash value is calculated
* <p>Returns a hash code for this CollationKey. The hash value is calculated
* on the key itself, not the String from which the key was created. Thus
 * if x and y are CollationKeys, then x.hashCode() == y.hashCode()
* if x.equals(y) is true. This allows language-sensitive comparison in a
@ -305,25 +294,25 @@ public final class CollationKey implements Comparable
public int hashCode()
{
if (m_hashCode_ == 0) {
int size = m_key_.length >> 1;
StringBuffer key = new StringBuffer(size);
int i = 0;
while (m_key_[i] != 0 && m_key_[i + 1] != 0) {
key.append((char)((m_key_[i] << 8) | m_key_[i + 1]));
i += 2;
}
if (m_key_[i] != 0) {
key.append((char)(m_key_[i] << 8));
}
m_hashCode_ = key.toString().hashCode();
int size = m_key_.length >> 1;
StringBuffer key = new StringBuffer(size);
int i = 0;
while (m_key_[i] != 0 && m_key_[i + 1] != 0) {
key.append((char)((m_key_[i] << 8) | m_key_[i + 1]));
i += 2;
}
if (m_key_[i] != 0) {
key.append((char)(m_key_[i] << 8));
}
m_hashCode_ = key.toString().hashCode();
}
return m_hashCode_;
}
// protected constructor ------------------------------------------------
// protected constructor ------------------------------------------------
/**
* Protected CollationKey can only be generated by Collator objects
 * CollationKeys can only be generated by Collator objects.
* @param source string the CollationKey represents
* @param key sort key array of bytes
* @param size of sort key
@ -336,18 +325,20 @@ public final class CollationKey implements Comparable
m_hashCode_ = 0;
}
// private data members -------------------------------------------------
// private data members -------------------------------------------------
/**
* Source string this CollationKey represents
*/
/**
* Source string this CollationKey represents
*/
private String m_source_;
/**
* Sequence of bytes that represents the sort key
*/
private byte m_key_[];
/**
* Hash code for the key
*/
private int m_hashCode_;
}
}
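
The unsigned, zero-terminated comparison performed by compareTo(CollationKey) can be illustrated in isolation. This is a minimal sketch, not ICU4J code: the class SortKeyCompareSketch and the method compareSortKeys are hypothetical names, and the sketch assumes, as the loops above do, that every key ends with a 0 byte and that 0 does not occur earlier in the key.

```java
/**
 * Hypothetical stand-alone sketch (not part of ICU4J) of the loop used by
 * CollationKey.compareTo: sort keys are zero-terminated byte arrays whose
 * bytes are compared as unsigned values.
 */
public final class SortKeyCompareSketch {

    /** Compares two zero-terminated sort keys byte by byte, unsigned. */
    static int compareSortKeys(byte[] a, byte[] b) {
        int i = 0;
        while (true) {
            // Mask with 0xFF so bytes 0x80..0xFF are not treated as negative.
            int x = a[i] & 0xFF;
            int y = b[i] & 0xFF;
            if (x != y) {
                return x < y ? -1 : 1;
            }
            if (x == 0) {
                // Both keys reached the terminator at the same position.
                return 0;
            }
            i++;
        }
    }

    public static void main(String[] args) {
        byte[] low = {0x05, 0x10, 0x00};
        byte[] high = {0x05, (byte) 0x90, 0x00};
        byte[] shorter = {0x05, 0x00};
        System.out.println(compareSortKeys(low, high));        // -1: 0x10 < 0x90 unsigned
        System.out.println(compareSortKeys(shorter, low));     // -1: terminator sorts first
        System.out.println(compareSortKeys(low, low.clone())); // 0
    }
}
```

The 0xFF mask matters because Java bytes are signed: without it, (byte) 0x90 would be read as -112 and sort before 0x10. Because the terminator is 0, a key that is a prefix of another key sorts first.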


@ -5,8 +5,8 @@
*******************************************************************************
*
* $Source: /xsrl/Nsvn/icu/icu4j/src/com/ibm/icu/text/Collator.java,v $
* $Date: 2002/06/21 23:56:44 $
* $Revision: 1.7 $
* $Date: 2002/06/22 07:23:45 $
* $Revision: 1.8 $
*
*******************************************************************************
*/
@ -15,18 +15,16 @@ package com.ibm.icu.text;
import java.util.Locale;
/**
* <p>
* Collator is an abstract base class, its subclasses performs
* locale-sensitive String comparison. A concrete subclass, RuleBasedCollator,
* is provided and it allows customization of the collation ordering by the use
* of rule sets.
* </p>
* <p>
* Following the
* <a href=http://www.unicode.org>Unicode Consortium</a>'s specifications for
* the <a href=http://www.unicode.org/unicode/reports/tr10/>
* Unicode Collation Algorithm (UCA)</a>, there are
* 5 different levels of strength used in comparisons.
* <p>Collator performs locale-sensitive string comparison. A concrete
* subclass, RuleBasedCollator, allows customization of the collation
* ordering by the use of rule sets.</p>
*
 * <p>Following the <a href="http://www.unicode.org">Unicode
 * Consortium</a>'s specifications for the
 * <a href="http://www.unicode.org/unicode/reports/tr10/">Unicode Collation
* Algorithm (UCA)</a>, there are 5 different levels of strength used
* in comparisons:
*
* <ul>
* <li>PRIMARY strength: Typically, this is used to denote differences between
* base characters (for example, "a" &lt; "b").
@ -60,11 +58,12 @@ import java.util.Locale;
* are compared, just in case there is no difference.
 * For example, Hebrew cantillation marks are only distinguished at this
* strength. This strength should be used sparingly, as only code point
* values differences between two strings is an extremely rare occurrence.
 * value differences between two strings are extremely rare.
* Using this strength substantially decreases the performance for both
* comparison and collation key generation APIs. This strength also
* increases the size of the collation key.
* </ul>
*
* Unlike the JDK, ICU4J's Collator deals only with 2 decomposition modes,
* the canonical decomposition mode and one that does not use any decomposition.
* The compatibility decomposition mode, java.text.Collator.FULL_DECOMPOSITION
@ -73,15 +72,13 @@ import java.util.Locale;
* producing the same results as if the text were normalized in NFD. If
* canonical decomposition is turned off, it is the user's responsibility to
* ensure that all text is already in the appropriate form before performing
* a comparison or before getting a CollationKey.
* </p>
* <p>
* For more information about the collation service see the
* a comparison or before getting a CollationKey.</p>
*
* <p>For more information about the collation service see the
 * <a href="http://oss.software.ibm.com/icu/userguide/Collate_Intro.html">user
* guide</a>.
* </p>
* <p>
* Examples of use
* guide</a>.</p>
*
 * <p>Examples of use:
* <pre>
* // Get the Collator for US English and set its strength to PRIMARY
* Collator usCollator = Collator.getInstance(Locale.US);
@ -90,8 +87,9 @@ import java.util.Locale;
* System.out.println("Strings are equivalent");
* }
*
* The following example shows how to compare two strings using the Collator
* for the default locale.
* The following example shows how to compare two strings using the
* Collator for the default locale.
*
* // Compare two strings in the default locale
* Collator myCollator = Collator.getInstance();
 * myCollator.setDecomposition(Collator.NO_DECOMPOSITION);
@ -114,22 +112,21 @@ import java.util.Locale;
* @see CollationKey
* @author Syn Wee Quek
* @since release 2.2, April 18 2002
* @draft 2.2
* @draft 2.2
*/
public abstract class Collator
{
// public data members ---------------------------------------------------
/**
* Strongest collator strength value. Typically, used to denote differences
* between base characters.
* See class documentation for more explanation.
// public data members ---------------------------------------------------
/**
* Strongest collator strength value. Typically used to denote differences
* between base characters. See class documentation for more explanation.
* @see #setStrength
* @see #getStrength
* @draft 2.2
*/
public final static int PRIMARY = 0;
/**
* Second level collator strength value.
* Accents in the characters are considered secondary differences.
@ -141,6 +138,7 @@ public abstract class Collator
* @draft 2.2
*/
public final static int SECONDARY = 1;
/**
* Third level collator strength value.
* Upper and lower case differences in characters are distinguished at this
@ -152,19 +150,21 @@ public abstract class Collator
* @draft 2.2
*/
public final static int TERTIARY = 2;
/**
* Fourth level collator strength value.
* When punctuation is ignored
* <a href=http://www-124.ibm.com/icu/userguide/Collate_Concepts.html#Ignoring_Punctuation>
* <a href="http://www-124.ibm.com/icu/userguide/Collate_Concepts.html#Ignoring_Punctuation">
 * (see Ignoring Punctuation in the user guide)</a> at PRIMARY to TERTIARY
* strength, an additional strength level can
* be used to distinguish words with and without punctuation
* be used to distinguish words with and without punctuation.
* See class documentation for more explanation.
* @see #setStrength
* @see #getStrength
* @draft 2.2
*/
public final static int QUATERNARY = 3;
/**
* <p>
* Smallest Collator strength value. When all other strengths are equal,
@ -181,36 +181,32 @@ public abstract class Collator
public final static int IDENTICAL = 15;
/**
* <p>
* Decomposition mode value. With NO_DECOMPOSITION set, Strings will not be
* decomposed for collation. This is the default
* decomposition setting unless otherwise specified by the locale used
* to create the Collator.
* </p>
* <p>
* Note this value is different from JDK's
* </p>
* <p>Decomposition mode value. With NO_DECOMPOSITION set, Strings
* will not be decomposed for collation. This is the default
* decomposition setting unless otherwise specified by the locale
* used to create the Collator.</p>
*
 * <p><strong>Note:</strong> this value differs from the JDK's.</p>
* @see #CANONICAL_DECOMPOSITION
* @see #getDecomposition
* @see #setDecomposition
* @draft 2.2
* @draft 2.2
*/
public final static int NO_DECOMPOSITION = 16;
/**
* <p>
* Decomposition mode value. With CANONICAL_DECOMPOSITION set,
* characters that are canonical variants according to Unicode 2.0 will be
* decomposed for collation.
* </p>
* <p>
* CANONICAL_DECOMPOSITION corresponds to Normalization Form D as
* <p>Decomposition mode value. With CANONICAL_DECOMPOSITION set,
* characters that are canonical variants according to Unicode 2.0
* will be decomposed for collation.</p>
*
* <p>CANONICAL_DECOMPOSITION corresponds to Normalization Form D as
* described in <a href="http://www.unicode.org/unicode/reports/tr15/">
* Unicode Technical Report #15</a>.
* </p>
* @see #NO_DECOMPOSITION
* @see #getDecomposition
* @see #setDecomposition
* @draft 2.2
* @draft 2.2
*/
public final static int CANONICAL_DECOMPOSITION = 1;
@ -219,25 +215,23 @@ public abstract class Collator
// public setters --------------------------------------------------------
/**
* <p>
* Sets this Collator's strength property. The strength property
* <p>Sets this Collator's strength property. The strength property
* determines the minimum level of difference considered significant
* during comparison.
* </p>
* <p>
* The default strength for the Collator is TERTIARY, unless specified
* otherwise by the locale used to create the Collator.
* </p>
* during comparison.</p>
*
* <p>The default strength for the Collator is TERTIARY, unless specified
* otherwise by the locale used to create the Collator.</p>
*
* <p>See the Collator class description for an example of use.</p>
* @param the new strength value.
 * @param newStrength the new strength value.
* @see #getStrength
* @see #PRIMARY
* @see #SECONDARY
* @see #TERTIARY
* @see #QUATERNARY
* @see #IDENTICAL
* @exception IllegalArgumentException If the new strength value is not one
* of PRIMARY, SECONDARY, TERTIARY, QUATERNARY or IDENTICAL.
* @exception IllegalArgumentException if the new strength value is not one
* of PRIMARY, SECONDARY, TERTIARY, QUATERNARY or IDENTICAL.
* @draft 2.2
*/
public void setStrength(int newStrength)
@ -253,35 +247,34 @@ public abstract class Collator
}
/**
* <p>
* Set the decomposition mode of this Collator.
* Setting this decomposition property with CANONICAL_DECOMPOSITION allows
* the Collator to handle
* un-normalized text properly, producing the same results as if the text
* were normalized. If NO_DECOMPOSITION is set, it is the user's
* responsibility to insure that all text is already in the appropriate
* form before a comparison or before getting a CollationKey. Adjusting
* decomposition mode allows the user to select between faster and more
* complete collation behavior.
* </p>
* <p>
* Since a great majority of the world languages does not require text
* normalization, most locales has NO_DECOMPOSITION has the default
* decomposition mode.
* <p>
* The default decompositon mode for the Collator is NO_DECOMPOSITON,
* unless specified otherwise by the locale used to create the Collator.
* </p>
* <p>
* See getDecomposition for a description of decomposition mode.
* </p>
* <p>Set the decomposition mode of this Collator. Setting this
* decomposition property with CANONICAL_DECOMPOSITION allows the
* Collator to handle un-normalized text properly, producing the
* same results as if the text were normalized. If
* NO_DECOMPOSITION is set, it is the user's responsibility to
 * ensure that all text is already in the appropriate form before
* a comparison or before getting a CollationKey. Adjusting
* decomposition mode allows the user to select between faster and
* more complete collation behavior.</p>
*
* <p>Since a great many of the world's languages do not require
* text normalization, most locales set NO_DECOMPOSITION as the
* default decomposition mode.</p>
*
 * <p>The default decomposition mode for the Collator is
 * NO_DECOMPOSITION, unless specified otherwise by the locale used
* to create the Collator.</p>
*
* <p>See getDecomposition for a description of decomposition
* mode.</p>
*
* @param decomposition the new decomposition mode
* @see #getDecomposition
* @see #NO_DECOMPOSITION
* @see #CANONICAL_DECOMPOSITION
* @exception IllegalArgumentException If the given value is not a valid
* decomposition mode.
* @draft 2.2
* @draft 2.2
*/
public void setDecomposition(int decomposition)
{
@ -324,17 +317,16 @@ public abstract class Collator
*/
public static final Collator getInstance(Locale locale)
{
try {
return new RuleBasedCollator(locale);
}
catch(Exception e) {
return RuleBasedCollator.UCA_;
}
try {
return new RuleBasedCollator(locale);
}
catch(Exception e) {
return RuleBasedCollator.UCA_;
}
}
/**
* <p>
* Returns this Collator's strength property. The strength property
* <p>Returns this Collator's strength property. The strength property
* determines the minimum level of difference considered significant.
* </p>
* <p>
@ -376,12 +368,12 @@ public abstract class Collator
// public other methods -------------------------------------------------
/**
* Convenience method for comparing the equality of two text Strings based
* on this Collator's collation rules, strength and decomposition mode.
* @param source the source string to be compared with.
* @param target the target string to be compared with.
* Convenience method for comparing the equality of two text Strings using
* this Collator's rules, strength and decomposition mode.
* @param source the source string to be compared.
* @param target the target string to be compared.
* @return true if the strings are equal according to the collation
* rules. false, otherwise.
* rules, otherwise false.
* @see #compare
 * @exception NullPointerException is thrown if either argument is null.
* @draft 2.2
@ -412,7 +404,7 @@ public abstract class Collator
/**
* <p>
* Compares the source text String to the target text String according to
* the collation rules, strength and decomposition mode for this Collator.
* this Collator's rules, strength and decomposition mode.
* Returns an integer less than,
* equal to or greater than zero depending on whether the source String is
* less than, equal to or greater than the target String. See the Collator
@ -432,8 +424,8 @@ public abstract class Collator
/**
* <p>
* Transforms the String into a series of bits that can be compared
* bitwise to other CollationKeys. Bits generated depends on the collation
* Transforms the String into a CollationKey suitable for efficient
* repeated comparison. The resulting key depends on the collator's
* rules, strength and decomposition mode.
* </p>
* <p>See the CollationKey class documentation for more information.</p>
@ -448,7 +440,6 @@ public abstract class Collator
public abstract CollationKey getCollationKey(String source);
// protected constructor -------------------------------------------------
// private data members --------------------------------------------------
@ -456,6 +447,7 @@ public abstract class Collator
* Collation strength
*/
private int m_strength_ = TERTIARY;
/**
* Decomposition mode
*/