ICU-2881 add comment about invariant characters to readme, say that we do not test on platforms where they do not have the same codes as elsewhere for the charset family

X-SVN-Rev: 12012
2025-04-18 11:14:22 +00:00 · 2003-05-19 22:26:25 +00:00 · 8820a52d8e
commit 8820a52d8e
parent f42d20e808
1 changed file with 69 additions and 2 deletions
--- a/icu4c/readme.html
+++ b/icu4c/readme.html
@@ -80,8 +80,9 @@

          <li><a href="#MakeICUSmaller">How to Make ICU Smaller</a></li>

-          <li><a href="#ImportantNotesDefaultCP">Using the Default
-          Codepage</a></li>
+          <li><a href="#CharStrings">char * strings in ICU</a></li>
+
+          <li><a href="#ImportantNotesDefaultCP">Using the Default Codepage</a></li>

          <li><a href="#ImportantNotesWindows">Windows Platform</a></li>

@@ -1477,6 +1478,72 @@ del common/libicuuc.o
    href="http://oss.software.ibm.com/icu/userguide/">User Guide</a> "ICU Data"
    chapter.</p>

+    <h3><a name="CharStrings" href="#CharStrings">char * strings in ICU</a></h3>
+
+    <p>The C/C++ languages do not provide a portable way to specify Unicode
+    code point or string literals other than with arrays of numeric constants.
+    For convenience, ICU4C tends to use char * strings in places where only
+    "invariant characters" are used &mdash; a portable subset of the 7-bit ASCII
+    repertoire &mdash; so that locale IDs, charset names, resource bundle item keys
+    and similar can be easily specified as string literals in the source code.
+    The same types of strings are also stored as "invariant character" char *
+    strings in the ICU data files.</p>
+
+    <p>ICU has hardcoded mapping tables in <code>source/common/putil.c</code>
+    to convert invariant characters to and from Unicode without using a full
+    ICU converter.
+    These tables must match the encoding of string literals in the ICU code
+    as well as in the ICU data files.</p>
+
+    <p><strong>Important: </strong>ICU assumes that at least the
+    invariant characters have the codes that are standard for the platform's
+    charset family (ASCII or EBCDIC).
+    <em>ICU has not been tested on platforms where this is not the case.</em></p>
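A minimal sketch of how the assumption above could be checked on a given compiler. It is not ICU code. The helper name `detect_charset_family`, the `InvariantSample` type, and the choice of sample characters are all hypothetical; the code values are taken from the ASCII and EBCDIC table in this section.

```c
#include <stddef.h>

/* Hypothetical helper type, not part of ICU: one sampled invariant character. */
typedef struct {
    char ch;              /* the character as a native char literal */
    unsigned char ascii;  /* expected code on an ASCII-family platform */
    unsigned char ebcdic; /* expected code on an EBCDIC-family platform */
} InvariantSample;

/* A few rows sampled from the readme table (not the full invariant set). */
static const InvariantSample kSamples[] = {
    { 'a', 0x61, 0x81 }, { 'z', 0x7A, 0xA9 },
    { 'A', 0x41, 0xC1 }, { 'Z', 0x5A, 0xE9 },
    { '0', 0x30, 0xF0 }, { '9', 0x39, 0xF9 },
    { ' ', 0x20, 0x40 }, { '_', 0x5F, 0x6D },
    { '/', 0x2F, 0x61 }, { '%', 0x25, 0x6C }
};

/*
 * Returns 1 if the compiler's char literals match the ASCII codes,
 * 2 if they match the EBCDIC codes, and 0 if they match neither
 * (the untested situation that the readme warns about).
 */
int detect_charset_family(void) {
    int is_ascii = 1;
    int is_ebcdic = 1;
    size_t i;
    for (i = 0; i < sizeof(kSamples) / sizeof(kSamples[0]); ++i) {
        unsigned char code = (unsigned char)kSamples[i].ch;
        if (code != kSamples[i].ascii) {
            is_ascii = 0;
        }
        if (code != kSamples[i].ebcdic) {
            is_ebcdic = 0;
        }
    }
    if (is_ascii) {
        return 1;
    }
    return is_ebcdic ? 2 : 0;
}
```

A build could call this once at startup and refuse to continue on a result of 0. Checking every invariant character rather than a sample would be a stricter variant of the same idea.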
+
+    <p>Some uses of char * strings in ICU assume the system charset
+    instead of invariant characters;
+    such strings are handled only with the default converter.
+    (The system charset is usually a superset of the invariant characters.)
+    See the following section.</p>
+
+    <p>The following are the ASCII and EBCDIC code values for all of the
+    invariant characters (see also unicode/utypes.h):</p>
+
+    <table border="1">
+      <tr><th>Character(s)</th><th>ASCII</th><th>EBCDIC</th></tr>
+      <tr><td>a..i</td><td>61..69</td><td>81..89</td></tr>
+      <tr><td>j..r</td><td>6A..72</td><td>91..99</td></tr>
+      <tr><td>s..z</td><td>73..7A</td><td>A2..A9</td></tr>
+
+      <tr><td>A..I</td><td>41..49</td><td>C1..C9</td></tr>
+      <tr><td>J..R</td><td>4A..52</td><td>D1..D9</td></tr>
+      <tr><td>S..Z</td><td>53..5A</td><td>E2..E9</td></tr>
+
+      <tr><td>0..9</td><td>30..39</td><td>F0..F9</td></tr>
+
+      <tr><td>(space)</td><td>20</td><td>40</td></tr>
+
+      <tr><td>"</td><td>22</td><td>7F</td></tr>
+      <tr><td>%</td><td>25</td><td>6C</td></tr>
+      <tr><td>&amp;</td><td>26</td><td>50</td></tr>
+      <tr><td>'</td><td>27</td><td>7D</td></tr>
+      <tr><td>(</td><td>28</td><td>4D</td></tr>
+      <tr><td>)</td><td>29</td><td>5D</td></tr>
+      <tr><td>*</td><td>2A</td><td>5C</td></tr>
+      <tr><td>+</td><td>2B</td><td>4E</td></tr>
+      <tr><td>,</td><td>2C</td><td>6B</td></tr>
+      <tr><td>-</td><td>2D</td><td>60</td></tr>
+      <tr><td>.</td><td>2E</td><td>4B</td></tr>
+      <tr><td>/</td><td>2F</td><td>61</td></tr>
+      <tr><td>:</td><td>3A</td><td>7A</td></tr>
+      <tr><td>;</td><td>3B</td><td>5E</td></tr>
+      <tr><td>&lt;</td><td>3C</td><td>4C</td></tr>
+      <tr><td>=</td><td>3D</td><td>7E</td></tr>
+      <tr><td>&gt;</td><td>3E</td><td>6E</td></tr>
+      <tr><td>?</td><td>3F</td><td>6F</td></tr>
+      <tr><td>_</td><td>5F</td><td>6D</td></tr>
+    </table>
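The table above can be turned into a small converter, sketched here for illustration. This is not the implementation in `source/common/putil.c`. The function names `ebcdic_invariant_to_ascii` and `ebcdic_invariant_string_to_ascii` are hypothetical, and the sketch assumes a NUL-terminated input where NUL is byte 0x00.

```c
#include <stddef.h>

/*
 * Hypothetical helper: map one EBCDIC byte to its ASCII code using the
 * readme table. Returns 0 for bytes outside the invariant set.
 */
unsigned char ebcdic_invariant_to_ascii(unsigned char e) {
    /* Space and punctuation rows of the table, in table order. */
    static const unsigned char kEbcdic[] = {
        0x40, 0x7F, 0x6C, 0x50, 0x7D, 0x4D, 0x5D, 0x5C, 0x4E, 0x6B,
        0x60, 0x4B, 0x61, 0x7A, 0x5E, 0x4C, 0x7E, 0x6E, 0x6F, 0x6D
    };
    static const unsigned char kAscii[] = {
        0x20, 0x22, 0x25, 0x26, 0x27, 0x28, 0x29, 0x2A, 0x2B, 0x2C,
        0x2D, 0x2E, 0x2F, 0x3A, 0x3B, 0x3C, 0x3D, 0x3E, 0x3F, 0x5F
    };
    size_t i;

    /* Letter and digit ranges are contiguous within each table row. */
    if (e >= 0x81 && e <= 0x89) return (unsigned char)(0x61 + (e - 0x81)); /* a..i */
    if (e >= 0x91 && e <= 0x99) return (unsigned char)(0x6A + (e - 0x91)); /* j..r */
    if (e >= 0xA2 && e <= 0xA9) return (unsigned char)(0x73 + (e - 0xA2)); /* s..z */
    if (e >= 0xC1 && e <= 0xC9) return (unsigned char)(0x41 + (e - 0xC1)); /* A..I */
    if (e >= 0xD1 && e <= 0xD9) return (unsigned char)(0x4A + (e - 0xD1)); /* J..R */
    if (e >= 0xE2 && e <= 0xE9) return (unsigned char)(0x53 + (e - 0xE2)); /* S..Z */
    if (e >= 0xF0 && e <= 0xF9) return (unsigned char)(0x30 + (e - 0xF0)); /* 0..9 */

    for (i = 0; i < sizeof(kEbcdic); ++i) {
        if (kEbcdic[i] == e) {
            return kAscii[i];
        }
    }
    return 0; /* not an invariant character */
}

/*
 * Hypothetical helper: convert a NUL-terminated EBCDIC string of invariant
 * characters into dst as ASCII. Returns 1 on success, 0 if a byte is not
 * invariant or dst (capacity cap, including the terminator) is too small.
 */
int ebcdic_invariant_string_to_ascii(const unsigned char *src, char *dst, size_t cap) {
    size_t i = 0;
    for (; src[i] != 0; ++i) {
        unsigned char a = ebcdic_invariant_to_ascii(src[i]);
        if (a == 0 || i + 1 >= cap) {
            return 0;
        }
        dst[i] = (char)a;
    }
    if (i >= cap) {
        return 0;
    }
    dst[i] = '\0';
    return 1;
}
```

For example, the EBCDIC bytes 85 95 6D E4 E2 (the locale ID en_US) convert to the ASCII bytes 65 6E 5F 55 53. Range arithmetic plus a short lookup keeps the sketch close to the table's layout; a full 256-entry table would be the more conventional choice for speed.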
+
    <h3><a name="ImportantNotesDefaultCP" href="#ImportantNotesDefaultCP">Using
    the default codepage</a></h3>