ICU-2881 add comment about invariant characters to readme, say that we do not test on platforms where they do not have the same codes as elsewhere for the charset family

X-SVN-Rev: 12012
Markus Scherer 2003-05-19 22:26:25 +00:00
parent f42d20e808
commit 8820a52d8e


@ -80,8 +80,9 @@
<li><a href="#MakeICUSmaller">How to Make ICU Smaller</a></li>
<li><a href="#ImportantNotesDefaultCP">Using the Default
Codepage</a></li>
<li><a href="#CharStrings">char * strings in ICU</a></li>
<li><a href="#ImportantNotesDefaultCP">Using the Default Codepage</a></li>
<li><a href="#ImportantNotesWindows">Windows Platform</a></li>
@ -1477,6 +1478,72 @@ del common/libicuuc.o
href="http://oss.software.ibm.com/icu/userguide/">User Guide</a> "ICU Data"
chapter.</p>
<h3><a name="CharStrings" href="#CharStrings">char * strings in ICU</a></h3>
<p>The C/C++ languages do not provide a portable way to specify Unicode
code point or string literals other than with arrays of numeric constants.
For convenience, ICU4C tends to use char * strings in places where only
"invariant characters" are used &mdash; a portable subset of the 7-bit ASCII
repertoire &mdash; so that locale IDs, charset names, resource bundle item keys
and similar can be easily specified as string literals in the source code.
The same types of strings are also stored as "invariant character" char *
strings in the ICU data files.</p>
<p>ICU has hardcoded mapping tables in <code>source/common/putil.c</code>
to convert invariant characters to and from Unicode without using a full
ICU converter.
These tables must match the encoding of string literals in the ICU code
as well as in the ICU data files.</p>
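<p>As a rough illustration of what such a table-driven conversion involves, here is a
minimal sketch. It is not the code in <code>putil.c</code>: every name in it
(<code>sketch_invariant_to_unicode</code>, <code>sketch_chars_to_utf16</code>,
<code>kSketchInvariant</code>, <code>kSketchPunct</code>) is hypothetical, and it looks up
characters by searching a literal string rather than indexing a table by byte value.
Because the invariant characters are written as C literals, the compiler encodes them in
the platform charset, so the same source should behave the same way on ASCII and EBCDIC
platforms.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch, not ICU code. The invariant characters are written
   as literals, so the compiler stores them in the platform charset. */
static const char kSketchInvariant[] =
    "abcdefghijklmnopqrstuvwxyz"   /* indexes  0..25 */
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"   /* indexes 26..51 */
    "0123456789"                   /* indexes 52..61 */
    " \"%&'()*+,-./:;<=>?_";       /* indexes 62..81 */

/* Unicode code points for the punctuation run, in the same order. */
static const uint16_t kSketchPunct[] = {
    0x20, 0x22, 0x25, 0x26, 0x27, 0x28, 0x29, 0x2A, 0x2B, 0x2C,
    0x2D, 0x2E, 0x2F, 0x3A, 0x3B, 0x3C, 0x3D, 0x3E, 0x3F, 0x5F
};

/* Returns the Unicode code point for an invariant character,
   or -1 if c is not an invariant character. */
static int32_t sketch_invariant_to_unicode(char c) {
    const char *p;
    size_t i;

    /* The punctuation table must cover everything after the digits. */
    assert(sizeof(kSketchPunct) / sizeof(kSketchPunct[0]) ==
           sizeof(kSketchInvariant) - 1 - 62);

    if (c == '\0') {
        return -1;  /* strchr would otherwise match the terminator */
    }
    p = strchr(kSketchInvariant, c);
    if (p == NULL) {
        return -1;
    }
    i = (size_t)(p - kSketchInvariant);
    if (i < 26) return (int32_t)(0x61 + i);         /* a..z */
    if (i < 52) return (int32_t)(0x41 + (i - 26));  /* A..Z */
    if (i < 62) return (int32_t)(0x30 + (i - 52));  /* 0..9 */
    return (int32_t)kSketchPunct[i - 62];
}

/* Converts a NUL-terminated invariant-character string to UTF-16 code
   units. Returns the number of units written, or -1 if the string holds
   a non-invariant character or does not fit in capacity units. */
static int32_t sketch_chars_to_utf16(const char *s, uint16_t *dest,
                                     size_t capacity) {
    size_t n = 0;
    for (; *s != '\0'; ++s, ++n) {
        int32_t cp = sketch_invariant_to_unicode(*s);
        if (cp < 0 || n >= capacity) {
            return -1;
        }
        dest[n] = (uint16_t)cp;
    }
    return (int32_t)n;
}
```

<p>For example, passing the locale ID <code>"en_US"</code> to the hypothetical
<code>sketch_chars_to_utf16</code> yields five code units, while a string containing
<code>@</code> is rejected because that character is not in the invariant set.</p>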
<p><strong>Important: </strong>ICU assumes that the invariant characters,
at a minimum, have the codes that are standard for the platform's
charset family (ASCII or EBCDIC), as listed in the table below.
<em>ICU has not been tested on platforms where this is not the case.</em></p>
<p>Some ICU functions take char * strings that are assumed to be in the
system charset rather than restricted to invariant characters;
such strings are converted only with the default converter.
See <a href="#ImportantNotesDefaultCP">Using the default codepage</a>.
(The system charset is usually a superset of the invariant characters.)</p>
<p>The following table lists the ASCII and EBCDIC code values, in hexadecimal,
for all of the invariant characters (see also <code>unicode/utypes.h</code>):</p>
<table border="1">
<tr><th>Character(s)</th><th>ASCII</th><th>EBCDIC</th></tr>
<tr><td>a..i</td><td>61..69</td><td>81..89</td></tr>
<tr><td>j..r</td><td>6A..72</td><td>91..99</td></tr>
<tr><td>s..z</td><td>73..7A</td><td>A2..A9</td></tr>
<tr><td>A..I</td><td>41..49</td><td>C1..C9</td></tr>
<tr><td>J..R</td><td>4A..52</td><td>D1..D9</td></tr>
<tr><td>S..Z</td><td>53..5A</td><td>E2..E9</td></tr>
<tr><td>0..9</td><td>30..39</td><td>F0..F9</td></tr>
<tr><td>(space)</td><td>20</td><td>40</td></tr>
<tr><td>"</td><td>22</td><td>7F</td></tr>
<tr><td>%</td><td>25</td><td>6C</td></tr>
<tr><td>&amp;</td><td>26</td><td>50</td></tr>
<tr><td>'</td><td>27</td><td>7D</td></tr>
<tr><td>(</td><td>28</td><td>4D</td></tr>
<tr><td>)</td><td>29</td><td>5D</td></tr>
<tr><td>*</td><td>2A</td><td>5C</td></tr>
<tr><td>+</td><td>2B</td><td>4E</td></tr>
<tr><td>,</td><td>2C</td><td>6B</td></tr>
<tr><td>-</td><td>2D</td><td>60</td></tr>
<tr><td>.</td><td>2E</td><td>4B</td></tr>
<tr><td>/</td><td>2F</td><td>61</td></tr>
<tr><td>:</td><td>3A</td><td>7A</td></tr>
<tr><td>;</td><td>3B</td><td>5E</td></tr>
<tr><td>&lt;</td><td>3C</td><td>4C</td></tr>
<tr><td>=</td><td>3D</td><td>7E</td></tr>
<tr><td>&gt;</td><td>3E</td><td>6E</td></tr>
<tr><td>?</td><td>3F</td><td>6F</td></tr>
<tr><td>_</td><td>5F</td><td>6D</td></tr>
</table>
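<p>One way to use the table is as a runtime self-check that the compiler's character
literals match one of the two families. The following is a minimal sketch under that
assumption, not something ICU itself provides: <code>SketchRow</code>,
<code>kSketchRows</code>, <code>sketch_charset_family</code>,
<code>sketch_require_known_family</code> and the <code>SKETCH_FAMILY_*</code> constants are
all hypothetical names. For the letter and digit ranges it checks only the first and last
character of each row.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constants for this sketch only. */
enum {
    SKETCH_FAMILY_UNKNOWN = -1,
    SKETCH_FAMILY_ASCII = 0,
    SKETCH_FAMILY_EBCDIC = 1
};

typedef struct {
    char ch;               /* literal, encoded in the platform charset */
    unsigned char ascii;   /* expected code in the ASCII family */
    unsigned char ebcdic;  /* expected code in the EBCDIC family */
} SketchRow;

/* Values copied from the table above; ranges contribute their endpoints. */
static const SketchRow kSketchRows[] = {
    {'a', 0x61, 0x81}, {'i', 0x69, 0x89}, {'j', 0x6A, 0x91}, {'r', 0x72, 0x99},
    {'s', 0x73, 0xA2}, {'z', 0x7A, 0xA9}, {'A', 0x41, 0xC1}, {'I', 0x49, 0xC9},
    {'J', 0x4A, 0xD1}, {'R', 0x52, 0xD9}, {'S', 0x53, 0xE2}, {'Z', 0x5A, 0xE9},
    {'0', 0x30, 0xF0}, {'9', 0x39, 0xF9}, {' ', 0x20, 0x40}, {'"', 0x22, 0x7F},
    {'%', 0x25, 0x6C}, {'&', 0x26, 0x50}, {'\'', 0x27, 0x7D}, {'(', 0x28, 0x4D},
    {')', 0x29, 0x5D}, {'*', 0x2A, 0x5C}, {'+', 0x2B, 0x4E}, {',', 0x2C, 0x6B},
    {'-', 0x2D, 0x60}, {'.', 0x2E, 0x4B}, {'/', 0x2F, 0x61}, {':', 0x3A, 0x7A},
    {';', 0x3B, 0x5E}, {'<', 0x3C, 0x4C}, {'=', 0x3D, 0x7E}, {'>', 0x3E, 0x6E},
    {'?', 0x3F, 0x6F}, {'_', 0x5F, 0x6D}
};

/* Reports which family every checked literal agrees with, or
   SKETCH_FAMILY_UNKNOWN if the literals match neither column. */
static int sketch_charset_family(void) {
    int ascii = 1;
    int ebcdic = 1;
    size_t i;
    for (i = 0; i < sizeof(kSketchRows) / sizeof(kSketchRows[0]); ++i) {
        unsigned char code = (unsigned char)kSketchRows[i].ch;
        if (code != kSketchRows[i].ascii) ascii = 0;
        if (code != kSketchRows[i].ebcdic) ebcdic = 0;
    }
    if (ascii) return SKETCH_FAMILY_ASCII;
    if (ebcdic) return SKETCH_FAMILY_EBCDIC;
    return SKETCH_FAMILY_UNKNOWN;
}

/* Aborts (in debug builds) on a platform that matches neither family. */
static void sketch_require_known_family(void) {
    assert(sketch_charset_family() != SKETCH_FAMILY_UNKNOWN);
}
```

<p>A check of this kind runs at runtime rather than in <code>#if</code> directives because
the C standard leaves it implementation-defined whether character constants in
preprocessor expressions have the same values as in ordinary expressions.</p>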
<h3><a name="ImportantNotesDefaultCP" href="#ImportantNotesDefaultCP">Using
the default codepage</a></h3>