mirror of
https://github.com/unicode-org/icu.git
synced 2025-04-18 11:14:22 +00:00
ICU-2881 add comment about invariant characters to readme, say that we do not test on platforms where they do not have the same codes as elsewhere for the charset family
X-SVN-Rev: 12012
This commit is contained in:
parent
f42d20e808
commit
8820a52d8e
1 changed files with 69 additions and 2 deletions
@@ -80,8 +80,9 @@
<li><a href="#MakeICUSmaller">How to Make ICU Smaller</a></li>

<li><a href="#ImportantNotesDefaultCP">Using the Default
Codepage</a></li>
<li><a href="#CharStrings">char * strings in ICU</a></li>

<li><a href="#ImportantNotesDefaultCP">Using the Default Codepage</a></li>

<li><a href="#ImportantNotesWindows">Windows Platform</a></li>

@@ -1477,6 +1478,72 @@ del common/libicuuc.o
href="http://oss.software.ibm.com/icu/userguide/">User Guide</a> "ICU Data"
chapter.</p>

<h3><a name="CharStrings" href="#CharStrings">char * strings in ICU</a></h3>

<p>The C/C++ languages do not provide a portable way to specify Unicode
code point or string literals other than with arrays of numeric constants.
For convenience, ICU4C tends to use char * strings in places where only
"invariant characters" (a portable subset of the 7-bit ASCII repertoire)
are used, so that locale IDs, charset names, resource bundle item keys
and similar identifiers can be easily specified as string literals in the
source code.
The same types of strings are also stored as "invariant character" char *
strings in the ICU data files.</p>

<p>ICU has hardcoded mapping tables in <code>source/common/putil.c</code>
to convert invariant characters to and from Unicode without using a full
ICU converter.
These tables must match the encoding of string literals in the ICU code
as well as in the ICU data files.</p>
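
<p>As an illustration only, the following sketch checks whether a char * string
consists solely of the invariant characters tabulated in this section.
The names <code>is_invariant_char</code> and <code>is_invariant_string</code>
are hypothetical and are not ICU APIs; the sketch assumes that the string
being checked uses the same charset as the compiler's character literals.</p>

```c
#include <string.h>

/*
 * Hypothetical helper, not part of ICU.
 * Returns 1 if c is one of the invariant characters, 0 otherwise.
 * The set is written with character literals, so the compiler supplies
 * the ASCII or EBCDIC codes of the build platform.
 */
int is_invariant_char(char c) {
    static const char invariants[] =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789"
        " \"%&'()*+,-./:;<=>?_";

    /* strchr() would also match the terminating NUL, so exclude it. */
    return c != '\0' && strchr(invariants, c) != NULL;
}

/*
 * Hypothetical helper, not part of ICU.
 * Returns 1 if every character of the NUL-terminated string s is an
 * invariant character (an empty string counts as invariant), 0 otherwise.
 */
int is_invariant_string(const char *s) {
    for (; *s != '\0'; ++s) {
        if (!is_invariant_char(*s)) {
            return 0;
        }
    }
    return 1;
}
```

<p>With this sketch, a locale ID such as <code>"en_US"</code> or a charset name
such as <code>"ISO-8859-1"</code> passes the check, while a string containing
<code>@</code> or <code>#</code> does not, because those characters are not in
the invariant set.</p>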

<p><strong>Important: </strong>ICU assumes that at least the
invariant characters have the codes that are usual for the platform's
charset family (ASCII vs. EBCDIC).
<em>ICU has not been tested on platforms where this is not the case.</em></p>

<p>Some uses of char * strings in ICU assume the system charset
instead of invariant characters;
such strings are handled only with the default converter.
See the following section.
(The system charset is usually a superset of the invariant characters.)</p>

<p>The following are the ASCII and EBCDIC code values (in hexadecimal) for
all of the invariant characters (see also unicode/utypes.h):</p>

<table border="1">
<tr><th>Character(s)</th><th>ASCII</th><th>EBCDIC</th></tr>
<tr><td>a..i</td><td>61..69</td><td>81..89</td></tr>
<tr><td>j..r</td><td>6A..72</td><td>91..99</td></tr>
<tr><td>s..z</td><td>73..7A</td><td>A2..A9</td></tr>

<tr><td>A..I</td><td>41..49</td><td>C1..C9</td></tr>
<tr><td>J..R</td><td>4A..52</td><td>D1..D9</td></tr>
<tr><td>S..Z</td><td>53..5A</td><td>E2..E9</td></tr>

<tr><td>0..9</td><td>30..39</td><td>F0..F9</td></tr>

<tr><td>(space)</td><td>20</td><td>40</td></tr>

<tr><td>"</td><td>22</td><td>7F</td></tr>
<tr><td>%</td><td>25</td><td>6C</td></tr>
<tr><td>&amp;</td><td>26</td><td>50</td></tr>
<tr><td>'</td><td>27</td><td>7D</td></tr>
<tr><td>(</td><td>28</td><td>4D</td></tr>
<tr><td>)</td><td>29</td><td>5D</td></tr>
<tr><td>*</td><td>2A</td><td>5C</td></tr>
<tr><td>+</td><td>2B</td><td>4E</td></tr>
<tr><td>,</td><td>2C</td><td>6B</td></tr>
<tr><td>-</td><td>2D</td><td>60</td></tr>
<tr><td>.</td><td>2E</td><td>4B</td></tr>
<tr><td>/</td><td>2F</td><td>61</td></tr>
<tr><td>:</td><td>3A</td><td>7A</td></tr>
<tr><td>;</td><td>3B</td><td>5E</td></tr>
<tr><td>&lt;</td><td>3C</td><td>4C</td></tr>
<tr><td>=</td><td>3D</td><td>7E</td></tr>
<tr><td>&gt;</td><td>3E</td><td>6E</td></tr>
<tr><td>?</td><td>3F</td><td>6F</td></tr>
<tr><td>_</td><td>5F</td><td>6D</td></tr>
</table>
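
<p>The table above can be expressed as a small lookup function. The following
is a minimal sketch, not the actual tables in
<code>source/common/putil.c</code>; the function name
<code>ascii_to_ebcdic_invariant</code> is hypothetical. It maps the ASCII code
of an invariant character to the EBCDIC code listed in the table and rejects
every other code.</p>

```c
#include <stddef.h>

/*
 * Hypothetical sketch, not ICU code.
 * Maps the ASCII code of an invariant character to its EBCDIC code,
 * using only the values listed in the table above.
 * Returns -1 if the input is not the code of an invariant character.
 */
int ascii_to_ebcdic_invariant(unsigned char a) {
    /* Space and punctuation; ascii_punct[i] corresponds to ebcdic_punct[i]. */
    static const unsigned char ascii_punct[] = {
        0x20, 0x22, 0x25, 0x26, 0x27, 0x28, 0x29, 0x2A, 0x2B, 0x2C,
        0x2D, 0x2E, 0x2F, 0x3A, 0x3B, 0x3C, 0x3D, 0x3E, 0x3F, 0x5F
    };
    static const unsigned char ebcdic_punct[] = {
        0x40, 0x7F, 0x6C, 0x50, 0x7D, 0x4D, 0x5D, 0x5C, 0x4E, 0x6B,
        0x60, 0x4B, 0x61, 0x7A, 0x5E, 0x4C, 0x7E, 0x6E, 0x6F, 0x6D
    };
    size_t i;

    /* Letters: contiguous in ASCII, split into three runs in EBCDIC. */
    if (a >= 0x61 && a <= 0x69) return 0x81 + (a - 0x61); /* a..i */
    if (a >= 0x6A && a <= 0x72) return 0x91 + (a - 0x6A); /* j..r */
    if (a >= 0x73 && a <= 0x7A) return 0xA2 + (a - 0x73); /* s..z */
    if (a >= 0x41 && a <= 0x49) return 0xC1 + (a - 0x41); /* A..I */
    if (a >= 0x4A && a <= 0x52) return 0xD1 + (a - 0x4A); /* J..R */
    if (a >= 0x53 && a <= 0x5A) return 0xE2 + (a - 0x53); /* S..Z */

    /* Digits are contiguous in both charset families. */
    if (a >= 0x30 && a <= 0x39) return 0xF0 + (a - 0x30); /* 0..9 */

    for (i = 0; i < sizeof(ascii_punct); ++i) {
        if (ascii_punct[i] == a) {
            return ebcdic_punct[i];
        }
    }
    return -1; /* not an invariant character */
}
```

<p>For example, the ASCII code 61 (<code>a</code>) maps to 81 and 5F
(<code>_</code>) maps to 6D, while 40 (<code>@</code>) is rejected because it
is not an invariant character. A production table would more likely be a
direct 128-entry array; the range checks here are used only to mirror the
rows of the table.</p>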

<h3><a name="ImportantNotesDefaultCP" href="#ImportantNotesDefaultCP">Using
the default codepage</a></h3>