ICU-22489 Clarify the default setting of Collator

See #2595
2025-04-07 06:25:30 +00:00 · 2023-09-13 00:56:05 +00:00 · 9e9bc3695f
commit 9e9bc3695f
parent 9fb9bd4950
2 changed files with 48 additions and 29 deletions
--- a/icu4c/source/i18n/unicode/ucol.h
+++ b/icu4c/source/i18n/unicode/ucol.h
@@ -251,42 +251,52 @@ typedef enum {
      */
     UCOL_FRENCH_COLLATION, 
     /** Attribute for handling variable elements.
-      * Acceptable values are UCOL_NON_IGNORABLE (default)
-      * which treats all the codepoints with non-ignorable 
+      * Acceptable values are UCOL_NON_IGNORABLE
+      * which treats all the codepoints with non-ignorable
      * primary weights in the same way,
-      * and UCOL_SHIFTED which causes codepoints with primary 
+      * and UCOL_SHIFTED which causes codepoints with primary
      * weights that are equal or below the variable top value
-      * to be ignored on primary level and moved to the quaternary 
-      * level.
+      * to be ignored on primary level and moved to the quaternary
+      * level. The default setting in a Collator object depends on the
+      * locale data loaded from the resources. For most locales, the
+      * default is UCOL_NON_IGNORABLE, but for others, such as "th",
+      * the default could be UCOL_SHIFTED.
      * @stable ICU 2.0
      */
-     UCOL_ALTERNATE_HANDLING, 
+     UCOL_ALTERNATE_HANDLING,
     /** Controls the ordering of upper and lower case letters.
-      * Acceptable values are UCOL_OFF (default), which orders
+      * Acceptable values are UCOL_OFF, which orders
      * upper and lower case letters in accordance to their tertiary
-      * weights, UCOL_UPPER_FIRST which forces upper case letters to 
-      * sort before lower case letters, and UCOL_LOWER_FIRST which does 
-      * the opposite.
+      * weights, UCOL_UPPER_FIRST which forces upper case letters to
+      * sort before lower case letters, and UCOL_LOWER_FIRST which does
+      * the opposite. The default setting in a Collator object depends on the
+      * locale data loaded from the resources. For most locales, the
+      * default is UCOL_OFF, but for others, such as "da" or "mt",
+      * the default could be UCOL_UPPER_FIRST.
      * @stable ICU 2.0
      */
-     UCOL_CASE_FIRST, 
+     UCOL_CASE_FIRST,
     /** Controls whether an extra case level (positioned before the third
-      * level) is generated or not. Acceptable values are UCOL_OFF (default), 
+      * level) is generated or not. Acceptable values are UCOL_OFF,
      * when case level is not generated, and UCOL_ON which causes the case
      * level to be generated. Contents of the case level are affected by
-      * the value of UCOL_CASE_FIRST attribute. A simple way to ignore 
+      * the value of UCOL_CASE_FIRST attribute. A simple way to ignore
      * accent differences in a string is to set the strength to UCOL_PRIMARY
-      * and enable case level.
+      * and enable case level. The default setting in a Collator object depends
+      * on the locale data loaded from the resources.
      * @stable ICU 2.0
      */
     UCOL_CASE_LEVEL,
     /** Controls whether the normalization check and necessary normalizations
-      * are performed. When set to UCOL_OFF (default) no normalization check
-      * is performed. The correctness of the result is guaranteed only if the 
+      * are performed. When set to UCOL_OFF no normalization check
+      * is performed. The correctness of the result is guaranteed only if the
      * input data is in so-called FCD form (see users manual for more info).
      * When set to UCOL_ON, an incremental check is performed to see whether
      * the input data is in the FCD form. If the data is not in the FCD form,
-      * incremental NFD normalization is performed.
+      * incremental NFD normalization is performed. The default setting in a
+      * Collator object depends on the locale data loaded from the resources.
+      * For many locales, the default is UCOL_OFF, but for others, such as "hi",
+      * "vi", or "bn", the default could be UCOL_ON.
      * @stable ICU 2.0
      */
     UCOL_NORMALIZATION_MODE, 
--- a/icu4j/main/collate/src/main/java/com/ibm/icu/text/RuleBasedCollator.java
+++ b/icu4j/main/collate/src/main/java/com/ibm/icu/text/RuleBasedCollator.java
@@ -404,9 +404,11 @@ public final class RuleBasedCollator extends Collator {
    }

    /**
-     * Sets whether uppercase characters sort before lowercase characters or vice versa, in strength TERTIARY. The
-     * default mode is false, and so lowercase characters sort before uppercase characters. If true, sort upper case
-     * characters first.
+     * Sets whether uppercase characters sort before lowercase characters or vice versa, in strength TERTIARY.
+     * If false, lowercase characters sort before uppercase characters. If true, sort upper case
+     * characters first. The default setting in a Collator object depends on the
+     * locale data loaded from the resources. For most locales, the default is false,
+     * but for others, such as "da" or "mt", the default could be true.
     *
     * @param upperfirst
     *            true to sort uppercase characters before lowercase characters, false to sort lowercase characters
@@ -426,9 +428,11 @@ public final class RuleBasedCollator extends Collator {
    }

    /**
-     * Sets the orders of lower cased characters to sort before upper cased characters, in strength TERTIARY. The
-     * default mode is false. If true is set, the RuleBasedCollator will sort lower cased characters before the upper
+     * Sets the orders of lower cased characters to sort before upper cased characters, in strength TERTIARY.
+     * If true is set, the RuleBasedCollator will sort lower cased characters before the upper
     * cased ones. Otherwise, if false is set, the RuleBasedCollator will ignore case preferences.
+     * The default setting in a Collator object depends on the locale data loaded from
+     * the resources.
     *
     * @param lowerfirst
     *            true for sorting lower cased characters before upper cased characters, false to ignore case
@@ -568,10 +572,11 @@ public final class RuleBasedCollator extends Collator {
    }

    /**
-     * Sets the mode for the direction of SECONDARY weights to be used in French collation. The default value is false,
+     * Sets the mode for the direction of SECONDARY weights to be used in French collation. If set to false,
     * which treats SECONDARY weights in the order they appear. If set to true, the SECONDARY weights will be sorted
     * backwards. See the section on <a href="https://unicode-org.github.io/icu/userguide/collation/architecture">
-     * French collation</a> for more information.
+     * French collation</a> for more information. The default setting in a Collator object depends on the
+     * locale data loaded from the resources. For example, for "fr_CA" locale, the default is true.
     *
     * @param flag
     *            true to set the French collation on, false to set it off
@@ -590,11 +595,14 @@ public final class RuleBasedCollator extends Collator {
    /**
     * Sets the alternate handling for QUATERNARY strength to be either shifted or non-ignorable. See the UCA definition
     * on <a href="https://www.unicode.org/reports/tr10/#Variable_Weighting">Variable Weighting</a>. This
-     * attribute will only be effective when QUATERNARY strength is set. The default value for this mode is false,
-     * corresponding to the NON_IGNORABLE mode in UCA. In the NON_IGNORABLE mode, the RuleBasedCollator treats all
+     * attribute will only be effective when QUATERNARY strength is set. If the mode is set to
+     * false, it corresponds to the NON_IGNORABLE mode in UCA. In the NON_IGNORABLE mode, the RuleBasedCollator treats all
     * the code points with non-ignorable primary weights in the same way. If the mode is set to true, the behavior
     * corresponds to SHIFTED defined in UCA, this causes code points with PRIMARY orders that are equal or below the
     * variable top value to be ignored in PRIMARY order and moved to the QUATERNARY order.
+     * The default setting in a Collator object depends on the locale data loaded from the
+     * resources. For most locales, the default is false, but for others, such as "th",
+     * the default could be true.
     *
     * @param shifted
     *            true if SHIFTED behavior for alternate handling is desired, false for the NON_IGNORABLE behavior.
@@ -614,10 +622,11 @@ public final class RuleBasedCollator extends Collator {
     * <p>
     * When case level is set to true, an additional weight is formed between the SECONDARY and TERTIARY weight, known
     * as the case level. The case level is used to distinguish large and small Japanese Kana characters. Case level
-     * could also be used in other situations. For example to distinguish certain Pinyin characters. The default value
-     * is false, which means the case level is not generated. The contents of the case level are affected by the case
+     * could also be used in other situations, for example to distinguish certain Pinyin characters. If the value
+     * is false, it means the case level is not generated. The contents of the case level are affected by the case
     * first mode. A simple way to ignore accent differences in a string is to set the strength to PRIMARY and enable
-     * case level.
+     * case level. The default setting in a Collator object depends
+     * on the locale data loaded from the resources.
     * <p>
     * See the section on <a href="https://unicode-org.github.io/icu/userguide/collation/architecture">case
     * level</a> for more information.