mirror of
https://github.com/unicode-org/icu.git
synced 2025-04-07 22:44:49 +00:00
ICU-12646 Syncing spoof data binary file format description with icu4c.
X-SVN-Rev: 39361
parent 09ad3d8e4b
commit a1b7d39c3b
1 changed file with 27 additions and 14 deletions
@@ -1670,30 +1670,43 @@ public class SpoofChecker {
  private static Normalizer2 nfdNormalizer = Normalizer2.getNFDInstance();
 
- // Confusable Mappings Data Structures
+ // Confusable Mappings Data Structures, version 2.0
  //
+ // This description and the corresponding implementation are to be kept
+ // in-sync with the copy in icu4c uspoof_impl.h.
+ //
  // For the confusable data, we are essentially implementing a map,
- // key: a code point
- // value: a string. Most commonly one char in length, but can be more.
+ // key: a code point
+ // value: a string. Most commonly one char in length, but can be more.
  //
  // The keys are stored as a sorted array of 32 bit ints.
- // bits 0-23 a code point value
- // bits 24-31 length of value string, in UChars (between 1 and 256 UChars).
- // The key table is sorted in ascending code point order. (not on the
- // 32 bit int value, the flag bits do not participate in the sorting.)
+ // bits 0-23 a code point value
+ // bits 24-31 length of value string, in UChars (between 1 and 256 UChars).
+ // The key table is sorted in ascending code point order. (not on the
+ // 32 bit int value, the flag bits do not participate in the sorting.)
  //
- // Lookup is done by means of a binary search in the key table.
+ // Lookup is done by means of a binary search in the key table.
  //
  // The corresponding values are kept in a parallel array of 16 bit ints.
- // If the value string is of length 1, it is literally in the value array.
- // For longer strings, the value array contains an index into the strings
- // table.
+ // If the value string is of length 1, it is literally in the value array.
+ // For longer strings, the value array contains an index into the strings
+ // table.
  //
  // String Table:
- // The strings table contains all of the value strings (those of length two or greater)
- // concatentated together into one long char (UTF-16) array.
+ // The strings table contains all of the value strings (those of length two or greater)
+ // concatentated together into one long char (UTF-16) array.
  //
+ // There is no nul character or other mark between adjacent strings.
+ //
+ //----------------------------------------------------------------------------
+ //
+ // Changes from format version 1 to format version 2:
+ // 1) Removal of the whole-script confusable data tables.
+ // 2) Removal of the SL/SA/ML/MA and multi-table flags in the key bitmask.
+ // 3) Expansion of string length value in the key bitmask from 2 bits to 8 bits.
+ // 4) Removal of the string lengths table since 8 bits is sufficient for the
+ // lengths of all entries in confusables.txt.
+ //
- // There is no nul character or other mark between adjacent strings.
  private static final class ConfusableDataUtils {
  public static final int FORMAT_VERSION = 2; // version for ICU 58
 
 
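The version 2.0 layout described in the comment can be illustrated with a small standalone sketch. This is not the ICU implementation: the class name, the helper names, and the four sample mappings are hypothetical, and the sketch assumes the length is stored as (length - 1), which is one way 8 bits could cover the stated range of 1 to 256 UChars.

```java
// Minimal sketch of the keys / values / strings tables and the lookup.
// All names and data here are illustrative assumptions, not ICU code.
final class ConfusableLookupSketch {

    // Pack a key: bits 0-23 hold the code point, bits 24-31 the value length.
    // Assumption: the length is stored minus one so 8 bits can express 1..256.
    static int key(int codePoint, int length) {
        return ((length - 1) << 24) | codePoint;
    }

    // Hypothetical sample data, sorted by code point (not by the full int value).
    static final int[] KEYS = {
        key(0x0031, 1),   // '1'      -> "l"
        key(0x006D, 2),   // 'm'      -> "rn"
        key(0x0077, 2),   // 'w'      -> "vv"
        key(0x1D7CF, 1),  // U+1D7CF  -> "l"
    };

    // Parallel 16-bit values: the char itself for length 1,
    // otherwise an index into STRINGS.
    static final char[] VALUES = { 'l', 0, 2, 'l' };

    // All multi-char values concatenated, with no separator between them.
    static final char[] STRINGS = "rnvv".toCharArray();

    // Binary search on the code point bits only; the length bits are masked off
    // so they do not participate in the ordering.
    static String lookup(int codePoint) {
        int lo = 0;
        int hi = KEYS.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int midCodePoint = KEYS[mid] & 0x00FFFFFF;
            if (midCodePoint < codePoint) {
                lo = mid + 1;
            } else if (midCodePoint > codePoint) {
                hi = mid - 1;
            } else {
                int length = (KEYS[mid] >>> 24) + 1;
                char value = VALUES[mid];
                if (length == 1) {
                    return String.valueOf(value);        // stored literally
                }
                return new String(STRINGS, value, length); // index into strings table
            }
        }
        return null; // no mapping for this code point
    }

    public static void main(String[] args) {
        System.out.println(lookup(0x006D)); // prints "rn"
        System.out.println(lookup(0x0031)); // prints "l"
        System.out.println(lookup(0x0041)); // prints "null"
    }
}
```

Because the length travels in the key, the strings table needs no terminators and no separate lengths table, which matches items 3 and 4 in the list of format changes.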