ICU-1080 Some updates to the What's New section

X-SVN-Rev: 8415
2025-04-17 18:56:53 +00:00 · 2002-04-09 16:01:00 +00:00 · 2002-04-09 16:01:00 +00:00 · 4f62829e39
commit 4f62829e39
parent 8465aa1354
1 changed files with 33 additions and 307 deletions
--- a/icu4c/readme.html
+++ b/icu4c/readme.html
@@ -31,7 +31,7 @@
    <h1>International Components for Unicode<br>
     ICU 2.1 ReadMe</h1>

-    <p>Version: 2002-Apr-04<br>
+    <p>Version: 2002-Apr-07<br>
     Copyright &copy; 1997-2002 International Business Machines Corporation and
    others. All Rights Reserved.</p>
    <!-- Remember that there is a copyright at the end too -->
@@ -100,7 +100,7 @@

      <li>Character set conversions, with support for over 200 codepages</li>

-      <li>Locale data for more than 160 locales</li>
+      <li>Locale data for more than 220 locales</li>

      <li>Text collation (sorting) based on the Unicode Collation Algorithm
      (=ISO 14651), customizable and tailored for national standards</li>
@@ -187,7 +187,7 @@
      </tr>

      <tr>
-        <td>Contacts &amp; Bug Reports/Feature Requests</td>
+        <td>Contacts and Bug Reports/Feature Requests</td>

        <td><a href=
        "http://oss.software.ibm.com/icu/archives/">http://oss.software.ibm.com/icu/archives/</a></td>
@@ -202,7 +202,7 @@
    <p>The following list concentrates on changes that affect existing
    applications migrating from previous ICU releases. For more news about this
    release, see the <a href=
-    "http://oss.software.ibm.com/icu/download/2.0/">ICU 2.0 download
+    "http://oss.software.ibm.com/icu/download/2.1/">ICU 2.1 download
    page</a>.</p>

    <h3>Support for Unicode 3.1.1</h3>
@@ -218,54 +218,9 @@
    pairs). Especially, normalization is revamped for support of supplementary
    characters and higher performance.</p>

-    <h3>Euro transition</h3>
-
-    <p>Locale data for countries that are switching their national currencies
-    to the Euro is updated to use the Euro symbol and appropriate currency
-    formatting. The old data is available in _PREEURO locale variants. The
-    _EURO variant selector can still be used to unambiguously get Euro currency
-    symbol formatting. For some time around the transition, software should
-    explicitly specify _PREEURO and _EURO variants to make sure to get the
-    intended currency format.</p>
-
-    <p>For more on this topic see the <a href=
-    "http://www.ibm.com/developerworks/unicode/library/u-euro/">developerWorks
-    article "Are you really ready for the Euro?"</a>.</p>
-
-    <h3>API changes</h3>
-
-    <p>Functions that take C-style string input arguments with const UChar *src
-    and int32_t srcLength now consistently treat srcLength==-1 to mean that the
-    input string is NUL-terminated and get srcLength=u_strlen(src).</p>
-
-    <p>Functions that take C-style string output arguments with UChar *dest and
-    int32_t destCapacity now handle NUL-termination of the output string
-    consistently. If the output length is equal to destCapacity, then dest is
-    filled with the output string and a warning code is set. For details about
-    string handling see the <a href=
-    "http://oss.software.ibm.com/icu/userguide/strings.html">User's Guide
-    Strings chapter</a>.</p>
-
-    <p>Some APIs have been <i>deprecated</i> for a long time (more than a year)
-    and have been removed now.<br>
-     Some other APIs have been marked as <i>deprecated</i> because they are
-    replaced by improved APIs; the newly deprecated APIs will be available for
-    another year. In particular, the C++ classes UnicodeConverter, Unicode, and
-    BiDi are deprecated in favor of the equally powerful C APIs.<br>
-     A few <i>draft</i> APIs have changed, especially for transliteration.</p>
-
-    <p>APIs that take a rules or pattern string (for collation,
-    transliteration, message formats, etc.) now also take a
-    <code>UParseError</code> structure that is filled with useful debugging
-    information when a rule syntax error is detected. This makes it easier in
-    large rules to find problems. As a result, the signatures of some functions
-    have changed. The old signatures will be available for about a year by
-    #defining a constant. See affected header files for details.</p>
-
-    <p>The C++ Normalizer class had a partially broken model for iterative
-    normalization; this is redone in a more consistent way. See the <a href=
-    "http://oss.software.ibm.com/icu/apiref/class_Normalizer.html">Normalizer
-    API documentation</a> for details.</p>
+    <p>ICU 2.1 also includes <a
+    href="http://www.unicode.org/versions/corrigendum3.html">Corrigendum #3:
+    U+F951 Normalization</a>.</p>

    <h3>Memory and resource cleanup</h3>

@@ -282,24 +237,6 @@
    The ICU libraries can then even be unloaded cleanly without shutting down
    the process.</p>

-    <h3>ICU versioning - C++ namespaces</h3>
-
-    <p>Beginning with ICU 2.0, multiple releases of ICU can be used in the same
-    process. Together with an arbitrary number of post-2.0 releases, one
-    pre-2.0 release can be loaded and active.</p>
-
-    <p>This is achieved by renaming all library exports to include a release
-    number suffix. Each global function and each class is renamed in this way
-    using a header file with #defines. For C++, if the compiler supports
-    namespaces, all ICU C++ classes are defined in the "icu" namespace. If the
-    compiler does not support namespaces, then the classes are renamed instead.
-    This change also reduces the chance of naming collisions with other
-    libraries.</p>
-
-    <p>For details see the <a href=
-    "http://oss.software.ibm.com/icu/userguide/design.html">User's Guide Design
-    Chapter</a>.</p>
-
    <h3>Data loading changed</h3>

    <p>ICU data loading is simplified for most users. By default, the ICU build
@@ -323,248 +260,37 @@
    "http://oss.software.ibm.com/icu/userguide/icudata.html">User's Guide ICU
    Data Chapter</a>.</p>

-    <h3>Collation improvements</h3>
-
-    <p>The performance of Japanese Katakana collation is improved, and the
-    Japanese collation is changed for conformance with the JIS X 4061 standard.
-    The improvement is in the handling of the length and iteration marks,
-    making the processing of regular letters faster.</p>
-
-    <p>The JIS X 4061 standard specifies a 5-level sorting algorithm. Sorting
-    with all five levels according to JIS is achieved in ICU 2.0 with the
-    "identical" strength. The fifth level distinguishes regular character codes
-    from compatibility variants.</p>
-
-    <p>There is special code to handle the fourth (quarternary) level of the
-    JIS standard, which distinguishes between Hiragana and Katakana letters. In
-    ICU 2.0 string comparisons (like ucol_strcoll), when using the "shifted"
-    option, this is slow because it generates complete sort keys for both
-    strings. This is not an issue if the "shifted" option is not used, or if
-    the string comparison is done with fewer levels.</p>
-
-    <p>Quarternary strength, without the "shifted" option, is the default for
-    Japanese collation in ICU 2.0.</p>
-
-    <p>Three-level sorting (tertiary strength) and lower &mdash; if sufficient
-    &mdash; is faster even with "shifted" on (for string comparisons:
-    <em>much</em> faster in this case).</p>
-
-    <h3>License Change (for ICU 1.8.1 and up)</h3>
-
-    <p>The ICU projects (ICU4C and ICU4J) have changed their licenses from the
-    IPL (IBM Public License) to the X license. The X license is a non-viral and
-    recommended free software license that is compatible with the GNU GPL
-    license. This is effective starting with release 1.8.1 of ICU4C and release
-    1.3.1 of ICU4J. All previous ICU releases will continue to utilize the IPL.
-    New ICU releases will adopt the X license. The users of previous releases
-    of ICU will need to accept the terms and conditions of the X license in
-    order to adopt the new ICU releases.</p>
-
-    <p>The main effect of the change is to provide GPL compatibility. The X
-    license is listed as GPL compatible, see the gnu page at <a href=
-    "http://www.gnu.org/philosophy/license-list.html#GPLCompatibleLicenses">http://www.gnu.org/philosophy/license-list.html#GPLCompatibleLicenses</a>.</p>
-
-    <p>The text of the X license is available at <a href=
-    "http://www.x.org/terms.htm">http://www.x.org/terms.htm</a>. The IBM
-    version contains the essential text of the license, omitting the X-specific
-    trademarks and copyright notices.</p>
-
-    <p>For more details please see the <a href=
-    "http://oss.software.ibm.com/icu/press.html">press announcement</a> and the
-    <a href="http://oss.software.ibm.com/icu/project_faq.html#license">Project
-    FAQ</a>.</p>
-
-    <h3>Transliterator improvements</h3>
-
-    <p>The transliterator service has undergone an extensive overhaul, in both
-    the rule-based engine and the built-in system rules. For a complete
-    description see the <a href=
-    "http://oss.software.ibm.com/icu/userguide/Transliteration.html">User's
-    Guide chapter on transliteration</a>.</p>
-
-    <ul>
-      <li><b>New or rewritten rules:</b> <tt>Any-Accents</tt>,
-      <tt>Any-Publishing</tt>, <tt>Cyrillic-Latin</tt>*, <tt>Greek-Latin</tt>*,
-      <tt>Greek-Latin/UNGEGN</tt> (aka <tt>el-Latin</tt>),
-      <tt>Hiragana-Latin</tt>*, and <tt>Latin-Katakana</tt>*. New algorithmic
-      rules include <tt>Any-Name</tt>*, the normalization rules
-      <tt>Any-NFC</tt>, <tt>Any-NFKC</tt>, <tt>Any-NFD</tt>, and
-      <tt>Any-NFKD</tt>, casing rules <tt>Any-Upper</tt>, <tt>Any-Lower</tt>,
-      and <tt>Any-Title</tt>. <tt>Unicode-Hex</tt>* has been renamed
-      <tt>Any-Hex</tt>*. <tt>Any-Remove</tt> deletes its input. [*<em>applies
-      to reverse rule as well</em>]</li>
-
-      <li><b>Indic script rules:</b> Transliterators between Indic scripts and
-      from each script to and from Latin have been completely revised. Scripts
-      included are Bengali, Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam,
-      Oriya, Tamil, and Telugu. Taking Bengali as an example, transliterators
-      <tt>Bengali-X</tt> and <tt>X-Bengali</tt> exist, where X is any of the
-      other listed Indic scripts, or Latin.</li>
-
-      <li><b>Deleted rules:</b> <tt>UnicodeName-UnicodeChar</tt> has been
-      replaced by <tt>Any-Name</tt>*. <tt>Latin-Arabic</tt>* and
-      <tt>Latin-Hebrew</tt>* have been removed until they can be rewritten.
-      <tt>KeyboardEscape-Latin1</tt> has been replaced by <tt>Any-Accents</tt>
-      and <tt>Any-Publishing</tt>. <tt>Latin-Kana</tt>* has been replaced by
-      <tt>Latin-Katakana</tt>* and <tt>Latin-Hiragana</tt>*. [*<em>applies to
-      reverse rule as well</em>]</li>
-
-      <li><b>ID syntax changes:</b> Transliterator IDs ignore case and
-      whitespace now. They now have the standard form
-      <em>[filter]source-target/variant</em>. The "<em>[filter]</em>" element
-      is optional; if present, it limits the characters that the transliterator
-      operates on. The "<em>source-</em>" element is optional; if omitted, it
-      is taken to be <tt>Any</tt>. The "<em>/variant</em>" element is also
-      optional; if present, it selects between different flavors of a related
-      set of transliterators, for example, <tt>Greek-Latin</tt> and
-      <tt>Greek-Latin/UNGEGN</tt>. The source, target, and variant specifiers
-      are case-insensitive strings of the form
-      <tt>/[_[:L:]][_[:L:][:N:]]*/</tt>.</li>
-
-      <li>
-        <b>Locale support:</b> The source, target, or both may be locales. In
-        this case the transliterator rules will be looked up in the system
-        locale resource bundles. Rules are sought under three tags, listed
-        below. The text after the underscore in each tag is always
-        canonicalized to uppercase before lookup. <em>Note: The underscore is
-        currently omitted from ICU4C tags, but will be restored when
-        possible.</em> 
+    <h3>Library linking changed</h3>

+      <ul>
+        <li><b>Linkage improvements for HP-UX</b>
        <ul>
-          <li><tt>TransliterateTo_<em>SCRIPT</em></tt>: Unidirectional rules
-          from the enclosing locale to another script or specifier.</li>
-
-          <li><tt>TransliterateFrom_<em>SCRIPT</em></tt>: Unidirectional rules
-          from another script or specifier to the enclosing locale.</li>
-
-          <li><tt>Transliterate_<em>SCRIPT</em></tt>: Bidirectional rules, with
-          the forward direction being To and the reverse direction being
-          From.</li>
+            <li>The current directory (.) is now searched for libraries.</li>
+            <li>Where available, <tt>$ORIGIN</tt> is set in the embedded library path, so that once
+                one ICU library is found, the system can locate the others.</li>
        </ul>
-        Lookup proceeds in the following order: 
+        </li>
+        <li><b>Library versioning for AIX (xlC and VisualAge)</b>
+          <ul>
+            <li>AIX has no built-in facility for library versioning. The libraries are now
+            named with their version, for example <tt>libicuuc<b>20.1.so</b></tt>. Symlinks
+            still allow applications to link with <tt>-licuuc</tt>, but without versioning. To
+            get versioning on AIX, link against the major and minor version with <tt>-licuuc20</tt>.</li>
+          </ul>
+        </li>
+        <li><b>Data library versioning for all platforms</b>
+          <ul>
+            <li>The ICU libraries now link against the versioned name of the data library,
+            that is, <tt>libicudt20b.so</tt> instead of <tt>libicudata.so</tt>.</li>
+          </ul>
+        </li>
+      </ul>

-        <ul>
-          <li>In the dynamic registry: <em>source-target</em></li>
+    <h3>Multithreaded usage is safer</h3>

-          <li>In the <em>source</em> locale:
-          <tt>TransliterateTo_<em>TARGET</em></tt> then
-          <tt>Transliterate_<em>TARGET</em></tt> (forward direction)</li>
+    <p>Some parts of ICU were not initialized in a thread-safe manner. This
+    has been fixed.</p>

-          <li>In the <em>target</em> locale:
-          <tt>TransliterateFrom_<em>SOURCE</em></tt> then
-          <tt>Transliterate_<em>SOURCE</em></tt> (reverse direction)</li>
-        </ul>
-        If either the source or target specifier is not a locale then the
-        corresponding locale lookup is skipped. If either is a locale, then
-        locale fallback from <tt>aa_BB_CCC</tt> to <tt>aa_BB</tt> to
-        <tt>aa</tt> is performed (where <tt>aa</tt>, <tt>BB</tt>, and
-        <tt>CCC</tt> are the locale language, country, and variant). The final
-        fallback is from the specifier, whether it is a locale or not (e.g.,
-        script abbreviation), to the long script name associated with that
-        specifier. If a tag lookup succeeds, the attached element should be a
-        string array of <i>2n</i> items where <i>n</i> &gt;= 1. Each pair of
-        strings is a variant name and rule string. The variants are matched
-        against the requested variant. If no variant is specified then the
-        first variant is considered to match.
-      </li>
-
-      <li><b>Filters on compounds IDs:</b> A filter on a compound
-      transliterator can now be specified by giving a leading entry that
-      contains a filter and no transliterator ID. For example, "<tt>[abc];
-      Latin-Katakana; Katakana-Hiragana</tt>" submits only the characters
-      contained in the UnicodeSet <tt>[abc]</tt> to the compound transliterator
-      <tt>Latin-Katakana; Katakana-Hiragana</tt>.</li>
-
-      <li><b>Explicit reverse IDs:</b> Typically if a transliterator
-      <tt>A-B</tt> is formed, and its inverse is requested, the system tries to
-      create <tt>B-A</tt>. That is, the source and target are exchanged. In
-      some cases, the user may wish a different transliterator to be considered
-      the reverse. In order to do this, the reverse ID is specified in
-      parentheses immediately following the ID. For example, "<tt>A-B
-      (B-C)</tt>" is a transliterator <tt>A-B</tt> whose inverse is
-      <tt>B-C</tt>. If the ID of the inverse is requested, "<tt>B-C (A-B)</tt>"
-      is returned. The forward or reverse component may be empty, so
-      "<tt>(B-C)</tt>" and "<tt>A-B()</tt>" are legal IDs with <tt>Null</tt>
-      transliterator for the forward and reverse direction, respectively. This
-      is most useful in compounds where one element has no inverse or where a
-      different inverse from the standard inverse is desired. For example,
-      "<tt>Any-Lower(); Latin-Cyrillic</tt>".</li>
-
-      <li><b>Quantifiers:</b> Transliterator rules may now contain quantifiers
-      '<tt>*</tt>', '<tt>+</tt>', and '<tt>?</tt>'. These indicate zero or
-      more, one or more, and zero or one matches, respectively. Quantifiers
-      apply to the last element, be it a single character, a UnicodeSet, a
-      segment definition, or a quote; the entire preceding element is repeated.
-      Quantifiers are implemented as greedy, non-backtracking matchers, unlike
-      their typical implementation in regular expressions. As a result,
-      expressions that match in a traditional regular expression engine (e.g.,
-      Perl) will not match in transliterator. E.g., "[a-z]+ q &gt; x;" will
-      <em>not</em> match "abcq", since the '<tt>+</tt>' quantifier consumes all
-      four characters.</li>
-
-      <li><b>Dot character:</b> A new special character is recognized in rules,
-      '<tt>.</tt>' (U+0020). This character matches any characters in the set
-      <tt>[^[:Zp:][:Zl:]\r\n$]</tt>. Note the trailing '<tt>$</tt>' in the set
-      pattern, which indicates that the ETHER character is <em>not</em> matched
-      by '<tt>.</tt>'.</li>
-
-      <li><b>::ID blocks in rules:</b> Transliterator IDs may now be included
-      in rule sets. These may occur in two locations: as one contiguous block
-      before any other rules, and as one contiguous block after all rules. The
-      effect of placing <tt>::ID</tt>s into a rule set is to enclose the
-      rule-based transliterator within a compound transliterator containing the
-      indicated IDs. The <tt>::ID</tt> syntax is exactly the same as the
-      standard ID syntax, with the difference that each ID element is preceded
-      by the special token "<tt>::</tt>".</li>
-
-      <li><b>Segment definitions more flexible:</b> Segment definitions may be
-      nested and are now unlimited in number. Prior to 2.0, segments could not
-      be nested and were limited to nine ($1 to $9).</li>
-
-      <li><b>Variable range pragma:</b> A new pragma is supported. This follows
-      the syntax:<code>use variable range 0xE800 0xEFFF;</code> (Any two code
-      points may be specified.) The code points are specified as decimal
-      constants, octal constants with a leading '0', or hexadecimal constants
-      with a leading "0x". The given range is used internally for stand-in
-      characters during processing. The default range is <b>0xF000..0xF8FF</b>.
-      If a rule set explicitly uses characters in the default variable range, a
-      new range, not containing any characters in use in the rule set, must be
-      specified. <em>Note:</em> This is the first of several planned
-      pragmas.</li>
-
-      <li><b>Factory method registration:</b> Factory methods (function
-      pointers in ICU4C; functor objects in ICU4J) may be registered against
-      transliterator IDs. This is generally more efficient than the
-      registration of singleton prototypes, since no actual transliterator
-      object need be created until the user requires one. See the
-      <tt>registerFactory()</tt> method in <tt>Transliterator</tt>.</li>
-
-      <li><b>Filtering semantics changed for subclasses:</b> Subclasses now
-      need not concern themselves with filters. Instead, they may assume that
-      all characters received by <tt>handleTransliterate()</tt> have already
-      passed through the filter. This simplifies subclass code greatly.</li>
-    </ul>
-
-    <h3><a name="NewsUnicodeSet">UnicodeSet Improvements</a></h3>
-
-    <ul>
-      <li><b><tt>[:Any:]</tt> set:</b> The set <tt>[:Any:]</tt> matches all
-      Unicode code points, that is, U+0000..U+10FFFF.</li>
-
-      <li><b><tt>\p{}</tt> syntax:</b> UnicodeSet now recognizes a Perlish
-      syntax for character properties. Any property designated as
-      <tt>[:Foo:]</tt> may equivalently be designated <tt>\p{Foo}</tt>.</li>
-
-      <li><b>Short, medium, and long property names:</b> In addition to the
-      short property names, such as <tt>[:Ll:]</tt>, equivalent medium (e.g.,
-      <tt>[:gc=Ll:]</tt>) and long (e.g.,
-      <tt>[:GeneralCategory=LowercaseLetter:]</tt>) forms are recognized. See
-      the <a href=
-      "http://oss.software.ibm.com/cvs/icu/~checkout~/icuhtml/design/unicodeset_properties.html">
-      UnicodeSet Properties design document</a> for details. As of this
-      release, general categories, numeric value, and script are
-      supported.</li>
-    </ul>
    <hr>

    <h2><a name="Download" href="#Download">How to Download the Source
@@ -605,7 +331,7 @@
    distribution archives) in your file system. You can also view the <a href=
    "http://oss.software.ibm.com/icu/userguide/design.html">User's Guide</a> to
    see which libraries you need for your software product. You need at least
-    the data (icudt) and the common (icuuc) libraries in order to use ICU.</p>
+    the data (<code>[lib]icudt</code>) and the common (<code>[lib]icuuc</code>) libraries in order to use ICU.</p>

    <table border="1" cellpadding="0" width="100%" summary="">
      <caption>