mirror of
https://github.com/unicode-org/icu.git
synced 2025-04-06 22:15:31 +00:00
ICU-8112 moved by srl, incorrectly filed under #7548: spoof api docs cleanup
X-SVN-Rev: 27695
parent ac780b9f96
commit cae10acab5
1 changed files with 46 additions and 33 deletions
@@ -56,7 +56,7 @@ U_NAMESPACE_USE
  * from these Unicode documents.
  *
  * The tests available on identifiers fall into two general categories:
- * -# Single identier tests. Check whether an identifier is
+ * -# Single identifier tests. Check whether an identifier is
  * potentially confusable with any other string, or is suspicious
  * for other reasons.
  * -# Two identifier tests. Check whether two specific identifiers are confusable.
@@ -70,7 +70,7 @@ U_NAMESPACE_USE
  * -# Perform the checks using the pre-configured USpoofChecker. The results indicate
  * which (if any) of the selected tests have identified possible problems with the identifier.
  * Results are reported as a set of USpoofChecks flags; this mirrors the form in which
- * the set of tests to perform was originally specified tothe USpoofChecker.
+ * the set of tests to perform was originally specified to the USpoofChecker.
  *
  * A USpoofChecker may be used repeatedly to perform checks on any number of identifiers.
  *
@@ -88,19 +88,19 @@ U_NAMESPACE_USE
  * When testing whether pairs of identifiers are confusable, with the uspoof_areConfusable()
  * family of functions, the relevant tests are
  *
- * -# USPOOF_SINGLE_SCRIPT_CONFUSABLE: All of the characters from the two idenifiers are
+ * -# USPOOF_SINGLE_SCRIPT_CONFUSABLE: All of the characters from the two identifiers are
  * from a single script, and the two identifiers are visually confusable.
  * -# USPOOF_MIXED_SCRIPT_CONFUSABLE: At least one of the identifiers contains characters
  * from more than one script, and the two identifiers are visually confusable.
- * -# USPOOF_WHOLE_SCRIPT_CONFUSABLE: Each of the two idenifiers is of a single script, but
- * the the two identifiers are from different scripts, and they are visually confusable.
+ * -# USPOOF_WHOLE_SCRIPT_CONFUSABLE: Each of the two identifiers is of a single script, but
+ * the two identifiers are from different scripts, and they are visually confusable.
  *
  * The safest approach is to enable all three of these checks as a group.
  *
  * USPOOF_ANY_CASE is a modifier for the above tests. If the identifiers being checked can
  * be of mixed case and are used in a case-sensitive manner, this option should be specified.
  *
- * If the identiers being checked are used in a case-insensitive manner, and if they are
+ * If the identifiers being checked are used in a case-insensitive manner, and if they are
  * displayed to users in lower-case form only, the USPOOF_ANY_CASE option should not be
  * specified. Confusabality issues involving upper case letters will not be reported.
  *
@@ -108,10 +108,10 @@ U_NAMESPACE_USE
  * the relevant tests are:
  *
  * -# USPOOF_MIXED_SCRIPT_CONFUSABLE: the identifier contains characters from multiple
- * scripts, and there exists an identier of a single script that is visually confusable.
+ * scripts, and there exists an identifier of a single script that is visually confusable.
  * -# USPOOF_WHOLE_SCRIPT_CONFUSABLE: the identifier consists of characters from a single
  * script, and there exists a visually confusable identifier.
- * The visally confusable identifier also consists of characters from a single script.
+ * The visually confusable identifier also consists of characters from a single script.
  * but not the same script as the identifier being checked.
  * -# USPOOF_ANY_CASE: modifies the mixed script and whole script confusables tests. If
  * specified, the checks will confusable characters of any case. If this flag is not
@@ -121,7 +121,7 @@ U_NAMESPACE_USE
  * This is not a test for confusable identifiers
  * -# USPOOF_INVISIBLE: check an identifier for the presence of invisible characters,
  * such as zero-width spaces, or character sequences that are
- * likely not to display, such as multiple occurences of the same
+ * likely not to display, such as multiple occurrences of the same
  * non-spacing mark. This check does not test the input string as a whole
  * for conformance to any particular syntax for identifiers.
  * -# USPOOF_CHAR_LIMIT: check that an identifier contains only characters from a specified set
@@ -129,10 +129,23 @@ U_NAMESPACE_USE
  * uspoof_setAllowedLocales().
  *
  * Note on Scripts:
- * Characters from the Unicode Scripts "Common" and "Inherited" are ignored when consdering
+ * Characters from the Unicode Scripts "Common" and "Inherited" are ignored when considering
  * the script of an identifier. Common characters include digits and symbols that
  * are normally used with text from more than one script.
  *
+ * Identifier Skeletons: A skeleton is a transformation of an identifier, such that
+ * all identifiers that are confusable with each other have the same skeleton.
+ * Using skeletons, it is possible to build a dictionary data structure for
+ * a set of identifiers, and then quickly test whether a new identifier is
+ * confusable with an identifier already in the set. The uspoof_getSkeleton()
+ * family of functions will produce the skeleton from an identifier.
+ *
+ * Note that skeletons are not guaranteed to be stable between versions
+ * of Unicode or ICU, so an applications should not rely on creating a permanent,
+ * or difficult to update, database of skeletons. Instabilities result from
+ * identifying new pairs or sequences of characters that are visually
+ * confusable, and thus must be mapped to the same skeleton character(s).
+ *
  */
 
  struct USpoofChecker;
@@ -156,9 +169,9 @@ typedef enum USpoofChecks {
  /** Mixed script confusable test.
  * When checking a single identifier, report a problem if
  * the identifier contains multiple scripts, and
- * is confusable with some other identifer in a single script
+ * is confusable with some other identifier in a single script
  * When testing whether two identifiers are confusable, report that they are if
- * the two IDs are visually confusable, and
+ * the two IDs are visually confusable,
  * and at least one contains characters from more than one script.
  */
  USPOOF_MIXED_SCRIPT_CONFUSABLE = 2,
@@ -167,7 +180,7 @@ typedef enum USpoofChecks {
  * When checking a single identifier, report a problem if
  * The identifier is of a single script, and
  * there exists a confusable identifier in another script.
- * When testing whether two identfiers are confusable, report that they are if
+ * When testing whether two identifiers are confusable, report that they are if
  * each is of a single script,
  * the scripts of the two identifiers are different, and
  * the identifiers are visually confusable.
@@ -177,20 +190,20 @@ typedef enum USpoofChecks {
  /** Any Case Modifier for confusable identifier tests.
  If specified, consider all characters, of any case, when looking for confusables.
  If USPOOF_ANY_CASE is not specified, identifiers being checked are assumed to have been
- case folded. Upper case conusable characters will not be checked.
+ case folded. Upper case confusable characters will not be checked.
  Selects between Lower Case Confusable and
  Any Case Confusable. */
  USPOOF_ANY_CASE = 8,
 
- /** Check that an identifer contains only characters from a
+ /** Check that an identifier contains only characters from a
  * single script (plus chars from the common and inherited scripts.)
  * Applies to checks of a single identifier check only.
  */
  USPOOF_SINGLE_SCRIPT = 16,
 
- /** Check an identifier for the presence of invisble characters,
+ /** Check an identifier for the presence of invisible characters,
  * such as zero-width spaces, or character sequences that are
- * likely not to display, such as multiple occurences of the same
+ * likely not to display, such as multiple occurrences of the same
  * non-spacing mark. This check does not test the input string as a whole
  * for conformance to any particular syntax for identifiers.
  */
@@ -223,7 +236,7 @@ uspoof_open(UErrorCode *status);
  /**
  * Open a Spoof checker from its serialized from, stored in 32-bit-aligned memory.
  * Inverse of uspoof_serialize().
- * The memory containing the serailized data must remain valid and unchanged
+ * The memory containing the serialized data must remain valid and unchanged
  * as long as the spoof checker, or any cloned copies of the spoof checker,
  * are in use. Ownership of the memory remains with the caller.
  * The spoof checker (and any clones) must be closed prior to deleting the
@@ -260,7 +273,7 @@ uspoof_openFromSerialized(const void *data, int32_t length, int32_t *pActualLeng
  * input string is zero terminated.
  * @param confusablesWholeScript
  * a pointer to the whole script confusables definitions,
- * as found in the file xonfusablesWholeScript.txt from unicode.org.
+ * as found in the file confusablesWholeScript.txt from unicode.org.
  * @param confusablesWholeScriptLen The length of the whole script confusables text, or
  * -1 if the input string is zero terminated.
  * @param errType In the event of an error in the input, indicates
@@ -432,7 +445,7 @@ uspoof_getAllowedLocales(USpoofChecker *sc, UErrorCode *status);
  *
  * @param sc The USpoofChecker
  * @param chars A Unicode Set containing the list of
- * charcters that are permitted. Ownership of the set
+ * characters that are permitted. Ownership of the set
  * remains with the caller. The incoming set is cloned by
  * this function, so there are no restrictions on modifying
  * or deleting the USet after calling this function.
@@ -479,7 +492,7 @@ uspoof_getAllowedChars(const USpoofChecker *sc, UErrorCode *status);
  *
  * @param sc The USpoofChecker
  * @param chars A Unicode Set containing the list of
- * charcters that are permitted. Ownership of the set
+ * characters that are permitted. Ownership of the set
  * remains with the caller. The incoming set is cloned by
  * this function, so there are no restrictions on modifying
  * or deleting the USet after calling this function.
@@ -517,7 +530,7 @@ uspoof_getAllowedUnicodeSet(const USpoofChecker *sc, UErrorCode *status);
 
  /**
  * Check the specified string for possible security issues.
- * The text to be checked will typically be an indentifier of some sort.
+ * The text to be checked will typically be an identifier of some sort.
  * The set of checks to be performed is specified with uspoof_setChecks().
  *
  * @param sc The USpoofChecker
@@ -533,7 +546,7 @@ uspoof_getAllowedUnicodeSet(const USpoofChecker *sc, UErrorCode *status);
  * is not needed.
  * If the string passes the requested checks the
  * parameter value will not be set.
- * @param status The error code, set if an error occured while attempting to
+ * @param status The error code, set if an error occurred while attempting to
  * perform the check.
  * Spoofing or security issues detected with the input string are
  * not reported here, but through the function's return value.
@@ -552,7 +565,7 @@ uspoof_check(const USpoofChecker *sc,
 
  /**
  * Check the specified string for possible security issues.
- * The text to be checked will typically be an indentifier of some sort.
+ * The text to be checked will typically be an identifier of some sort.
  * The set of checks to be performed is specified with uspoof_setChecks().
  *
  * @param sc The USpoofChecker
@@ -566,7 +579,7 @@ uspoof_check(const USpoofChecker *sc,
  * is not needed.
  * If the string passes the requested checks the
  * parameter value will not be set.
- * @param status The error code, set if an error occured while attempting to
+ * @param status The error code, set if an error occurred while attempting to
  * perform the check.
  * Spoofing or security issues detected with the input string are
  * not reported here, but through the function's return value.
@@ -588,7 +601,7 @@ uspoof_checkUTF8(const USpoofChecker *sc,
  #if U_SHOW_CPLUSPLUS_API
  /**
  * Check the specified string for possible security issues.
- * The text to be checked will typically be an indentifier of some sort.
+ * The text to be checked will typically be an identifier of some sort.
  * The set of checks to be performed is specified with uspoof_setChecks().
  *
  * @param sc The USpoofChecker
@@ -600,7 +613,7 @@ uspoof_checkUTF8(const USpoofChecker *sc,
  * is not needed.
  * If the string passes the requested checks the
  * parameter value will not be set.
- * @param status The error code, set if an error occured while attempting to
+ * @param status The error code, set if an error occurred while attempting to
  * perform the check.
  * Spoofing or security issues detected with the input string are
  * not reported here, but through the function's return value.
@@ -649,7 +662,7 @@ uspoof_checkUnicodeString(const USpoofChecker *sc,
  * @param length2 The length of the second string, expressed in
  * 16 bit UTF-16 code units, or -1 if the string is
  * zero terminated.
- * @param status The error code, set if an error occured while attempting to
+ * @param status The error code, set if an error occurred while attempting to
  * perform the check.
  * Confusability of the strings is not reported here,
  * but through this function's return value.
@@ -682,7 +695,7 @@ uspoof_areConfusable(const USpoofChecker *sc,
  * confusability. The strings are in UTF-18 format.
  * @param length2 The length of the second string in bytes, or -1
  * if the string is zero terminated.
- * @param status The error code, set if an error occured while attempting to
+ * @param status The error code, set if an error occurred while attempting to
  * perform the check.
  * Confusability of the strings is not reported here,
  * but through this function's return value.
@@ -713,7 +726,7 @@ uspoof_areConfusableUTF8(const USpoofChecker *sc,
  * confusability. The strings are in UTF-8 format.
  * @param s2 The second of the two strings to be compared for
  * confusability. The strings are in UTF-18 format.
- * @param status The error code, set if an error occured while attempting to
+ * @param status The error code, set if an error occurred while attempting to
  * perform the check.
  * Confusability of the strings is not reported here,
  * but through this function's return value.
@@ -755,7 +768,7 @@ uspoof_areConfusableUnicodeString(const USpoofChecker *sc,
  * @param destCapacity The length of the output buffer, in 16 bit units.
  * The destCapacity may be zero, in which case the function will
  * return the actual length of the skeleton.
- * @param status The error code, set if an error occured while attempting to
+ * @param status The error code, set if an error occurred while attempting to
  * perform the check.
  * @return The length of the skeleton string. The returned length
  * is always that of the complete skeleton, even when the
@@ -794,7 +807,7 @@ uspoof_getSkeleton(const USpoofChecker *sc,
  * @param destCapacity The length of the output buffer, in bytes.
  * The destCapacity may be zero, in which case the function will
  * return the actual length of the skeleton.
- * @param status The error code, set if an error occured while attempting to
+ * @param status The error code, set if an error occurred while attempting to
  * perform the check. Possible Errors include U_INVALID_CHAR_FOUND
  * for invalid UTF-8 sequences, and
  * U_BUFFER_OVERFLOW_ERROR if the destination buffer is too small
@@ -835,7 +848,7 @@ uspoof_getSkeletonUTF8(const USpoofChecker *sc,
  * @param destCapacity The length of the output buffer, in bytes.
  * The destCapacity may be zero, in which case the function will
  * return the actual length of the skeleton.
- * @param status The error code, set if an error occured while attempting to
+ * @param status The error code, set if an error occurred while attempting to
  * perform the check.
  * @return A reference to the destination (skeleton) string.
  *
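
Reviewer note, not part of the commit: the header comments touched here say that check results come back as a set of USpoofChecks flags, in the same form the tests were selected. A minimal sketch of decoding such a bit mask follows. It deliberately does not include the ICU header; the `SK_` constants and `sketch_report()` are hypothetical stand-ins. The values 2, 8 and 16 are the ones visible in the diff, and the remaining values are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Local stand-ins for the USpoofChecks flags; this is not the ICU header.
 * 2, 8 and 16 appear in the diff. The other values are assumed. */
enum {
    SK_SINGLE_SCRIPT_CONFUSABLE = 1,   /* assumed value */
    SK_MIXED_SCRIPT_CONFUSABLE  = 2,
    SK_WHOLE_SCRIPT_CONFUSABLE  = 4,   /* assumed value */
    SK_ANY_CASE                 = 8,   /* modifier, not a test of its own */
    SK_SINGLE_SCRIPT            = 16,
    SK_INVISIBLE                = 32,  /* assumed value */
    SK_CHAR_LIMIT               = 64   /* assumed value */
};

/* Print the name of every test flag set in `result` and return how many
 * were set. SK_ANY_CASE is left out of the table because it only modifies
 * the confusable tests. */
static int sketch_report(int32_t result, FILE *out) {
    static const struct { int32_t flag; const char *name; } table[] = {
        { SK_SINGLE_SCRIPT_CONFUSABLE, "USPOOF_SINGLE_SCRIPT_CONFUSABLE" },
        { SK_MIXED_SCRIPT_CONFUSABLE,  "USPOOF_MIXED_SCRIPT_CONFUSABLE"  },
        { SK_WHOLE_SCRIPT_CONFUSABLE,  "USPOOF_WHOLE_SCRIPT_CONFUSABLE"  },
        { SK_SINGLE_SCRIPT,            "USPOOF_SINGLE_SCRIPT"            },
        { SK_INVISIBLE,                "USPOOF_INVISIBLE"                },
        { SK_CHAR_LIMIT,               "USPOOF_CHAR_LIMIT"               }
    };
    int count = 0;
    size_t i;

    assert(out != NULL);
    for (i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (result & table[i].flag) {
            fprintf(out, "failed: %s\n", table[i].name);
            count++;
        }
    }
    return count;
}
```

With a real checker, the value passed as `result` would presumably be the return value of uspoof_check(), and a return of zero from `sketch_report()` would mean none of the selected tests flagged the identifier.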