mirror of
https://github.com/unicode-org/icu.git
synced 2025-04-21 12:40:02 +00:00
ICU-22332 document advanced usage of bidi confusability
Co-authored-by: Markus Scherer <markus.icu@gmail.com>
parent aa377d7366
commit 9ed8532d2d
2 changed files with 93 additions and 1 deletions
@@ -89,7 +89,7 @@
  * \code{.c}
  * UErrorCode status = U_ZERO_ERROR;
  * // These strings look identical when rendered in a left-to-right context.
- * // They look distinct in a left-to-right context.
+ * // They look distinct in a right-to-left context.
  * UChar* str1 = (UChar*) u"A1\u05D0"; // A1א
  * UChar* str2 = (UChar*) u"A\u05D01"; // Aא1
  *
@@ -359,6 +359,51 @@
  * COMMON or INHERITED, such as numbers and punctuation, are ignored when computing whether a string has multiple
  * scripts.
  *
+ * <h2>Advanced bidirectional usage</h2>
+ * If the paragraph direction with which the identifiers will be displayed is not known, there are
+ * multiple options for confusable detection depending on the circumstances.
+ *
+ * <p>
+ * In some circumstances, the only concern is confusion between identifiers displayed with the same
+ * paragraph direction.
+ *
+ * <p>
+ * An example is the case where identifiers are usernames prefixed with the @ symbol.
+ * That symbol will appear to the left in a left-to-right context, and to the right in a
+ * right-to-left context, so that an identifier displayed in a left-to-right context can never be
+ * confused with an identifier displayed in a right-to-left context:
+ * <ul>
+ * <li>
+ * The usernames "A1א" (A one aleph) and "Aא1" (A aleph 1)
+ * would be considered confusable, since they both appear as \@A1א in a left-to-right context, and the
+ * usernames "אA_1" (aleph A underscore one) and "א1_A" (aleph one underscore A) would be considered
+ * confusable, since they both appear as A_1א@ in a right-to-left context.
+ * </li>
+ * <li>
+ * The username "Mark_" would not be considered confusable with the username "_Mark",
+ * even though the latter would appear as Mark_@ in a right-to-left context, and the
+ * former as \@Mark_ in a left-to-right context.
+ * </li>
+ * </ul>
+ * <p>
+ * In that case, the caller should check for both LTR-confusability and RTL-confusability:
+ *
+ * \code{.cpp}
+ * bool confusableInEitherDirection =
+ *     uspoof_areBidiConfusableUnicodeString(sc, UBIDI_LTR, id1, id2, &status) ||
+ *     uspoof_areBidiConfusableUnicodeString(sc, UBIDI_RTL, id1, id2, &status);
+ * \endcode
+ *
+ * If the bidiSkeleton is used, the LTR and RTL skeleta should be kept separately and compared, LTR
+ * with LTR and RTL with RTL.
+ *
+ * <p>
+ * In cases where confusability between the visual appearances of an identifier displayed in a
+ * left-to-right context with another identifier displayed in a right-to-left context is a concern,
+ * the LTR skeleton of one can be compared with the RTL skeleton of the other. However, this
+ * very broad definition of confusability may have unexpected results; for instance, it treats the
+ * ASCII identifiers "Mark_" and "_Mark" as confusable.
+ *
  * <h2>Additional Information</h2>
  *
  * A <code>USpoofChecker</code> instance may be used repeatedly to perform checks on any number of identifiers.
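The added documentation advises keeping LTR and RTL skeleta separate and comparing like with like. A minimal C++ sketch of that bookkeeping follows. It is not part of the commit and not ICU API: `Direction`, `SkeletonFn`, and `IdentifierRegistry` are hypothetical names, and the skeleton computation is assumed to be supplied by the caller (for example, by wrapping the spoof checker's bidiSkeleton functionality).

```cpp
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Paragraph direction under which a skeleton is computed (hypothetical type).
enum class Direction { kLtr, kRtl };

// Hypothetical hook: returns the bidi skeleton of `id` for direction `dir`.
// This is an assumption for illustration, not an ICU signature.
using SkeletonFn =
    std::function<std::u16string(Direction dir, const std::u16string& id)>;

// Stores one LTR and one RTL skeleton per identifier and only ever compares
// LTR with LTR and RTL with RTL.
class IdentifierRegistry {
 public:
  explicit IdentifierRegistry(SkeletonFn skeleton)
      : skeleton_(std::move(skeleton)) {}

  // Registers `id` and returns true, unless an already registered identifier
  // has the same LTR skeleton or the same RTL skeleton. In that case nothing
  // is stored, `*conflict` (if given) receives the existing identifier, and
  // the function returns false.
  bool tryRegister(const std::u16string& id,
                   std::u16string* conflict = nullptr) {
    std::u16string ltr = skeleton_(Direction::kLtr, id);
    std::u16string rtl = skeleton_(Direction::kRtl, id);
    auto hit = byLtr_.find(ltr);
    if (hit == byLtr_.end()) {
      hit = byRtl_.find(rtl);
      if (hit == byRtl_.end()) {
        byLtr_.emplace(std::move(ltr), id);
        byRtl_.emplace(std::move(rtl), id);
        return true;
      }
    }
    if (conflict != nullptr) *conflict = hit->second;
    return false;
  }

 private:
  SkeletonFn skeleton_;
  std::unordered_map<std::u16string, std::u16string> byLtr_;  // LTR skeleton -> identifier
  std::unordered_map<std::u16string, std::u16string> byRtl_;  // RTL skeleton -> identifier
};
```

Two separate maps keep the directions from mixing, so a pair such as "Mark_" and "_Mark" is rejected only if the supplied skeleton function makes them collide within a single direction.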
@@ -214,6 +214,53 @@ import com.ibm.icu.util.ULocale;
  * COMMON or INHERITED, such as numbers and punctuation, are ignored when computing whether a string has multiple
  * scripts.
  *
+ * <h2>Advanced bidirectional usage</h2>
+ * If the paragraph direction with which the identifiers will be displayed is not known, there are
+ * multiple options for confusable detection depending on the circumstances.
+ *
+ * <p>
+ * In some circumstances, the only concern is confusion between identifiers displayed with the same
+ * paragraph direction.
+ *
+ * <p>
+ * An example is the case where identifiers are usernames prefixed with the @ symbol.
+ * That symbol will appear to the left in a left-to-right context, and to the right in a
+ * right-to-left context, so that an identifier displayed in a left-to-right context can never be
+ * confused with an identifier displayed in a right-to-left context:
+ * <ul>
+ * <li>
+ * The usernames "A1א" (A one aleph) and "Aא1" (A aleph 1)
+ * would be considered confusable, since they both appear as @A1א in a left-to-right context, and the
+ * usernames "אA_1" (aleph A underscore one) and "א1_A" (aleph one underscore A) would be considered
+ * confusable, since they both appear as A_1א@ in a right-to-left context.
+ * </li>
+ * <li>
+ * The username "Mark_" would not be considered confusable with the username "_Mark",
+ * even though the latter would appear as Mark_@ in a right-to-left context, and the
+ * former as @Mark_ in a left-to-right context.
+ * </li>
+ * </ul>
+ * <p>
+ * In that case, the caller should check for both LTR-confusability and RTL-confusability:
+ *
+ * <pre>
+ * <code>
+ * boolean confusableInEitherDirection =
+ *     sc.areConfusable(Bidi.DIRECTION_LEFT_TO_RIGHT, id1, id2) ||
+ *     sc.areConfusable(Bidi.DIRECTION_RIGHT_TO_LEFT, id1, id2);
+ * </code>
+ * </pre>
+ *
+ * If the bidiSkeleton is used, the LTR and RTL skeleta should be kept separately and compared, LTR
+ * with LTR and RTL with RTL.
+ *
+ * <p>
+ * In cases where confusability between the visual appearances of an identifier displayed in a
+ * left-to-right context with another identifier displayed in a right-to-left context is a concern,
+ * the LTR skeleton of one can be compared with the RTL skeleton of the other. However, this
+ * very broad definition of confusability may have unexpected results; for instance, it treats the
+ * ASCII identifiers "Mark_" and "_Mark" as confusable.
+ *
  * <h2>Additional Information</h2>
  *
  * <p>
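The last added paragraph contrasts the same-direction check with the much broader cross-direction comparison. The C++ sketch below sets the two policies side by side. It is not part of the commit and not ICU API: `Direction`, `SkeletonFn`, and both function names are hypothetical, and the skeleton function is assumed to be supplied by the caller.

```cpp
#include <functional>
#include <string>

// Paragraph direction under which a skeleton is computed (hypothetical type).
enum class Direction { kLtr, kRtl };

// Hypothetical hook: returns the bidi skeleton of `id` for direction `dir`.
// This is an assumption for illustration, not an ICU signature.
using SkeletonFn =
    std::function<std::u16string(Direction dir, const std::u16string& id)>;

// Narrow policy: both identifiers are shown with the same paragraph
// direction, so skeletons are compared LTR with LTR and RTL with RTL.
inline bool confusableSameDirection(const SkeletonFn& skeleton,
                                    const std::u16string& a,
                                    const std::u16string& b) {
  return skeleton(Direction::kLtr, a) == skeleton(Direction::kLtr, b) ||
         skeleton(Direction::kRtl, a) == skeleton(Direction::kRtl, b);
}

// Broad policy: one identifier may be shown left-to-right while the other is
// shown right-to-left, so the LTR skeleton of each is compared with the RTL
// skeleton of the other.
inline bool confusableAcrossDirections(const SkeletonFn& skeleton,
                                       const std::u16string& a,
                                       const std::u16string& b) {
  return skeleton(Direction::kLtr, a) == skeleton(Direction::kRtl, b) ||
         skeleton(Direction::kRtl, a) == skeleton(Direction::kLtr, b);
}
```

A caller that wants the broad definition would presumably combine both functions. As the documentation warns, the second one is expected to flag plain ASCII pairs such as "Mark_" and "_Mark".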