ICU-6677 notice in readme.html about new validation in u_strFromUTF32() and u_strToUTF32()

X-SVN-Rev: 25449
Markus Scherer 2009-02-19 22:50:02 +00:00
parent 311b29556f
commit c45ac93e07


@ -6,7 +6,7 @@
<title>ReadMe for ICU</title>
<meta name="COPYRIGHT" content=
"Copyright (c) 1997-2008 IBM Corporation and others. All Rights Reserved." />
"Copyright (c) 1997-2009 IBM Corporation and others. All Rights Reserved." />
<meta name="KEYWORDS" content=
"ICU; International Components for Unicode; ICU4C; what's new; readme; read me; introduction; downloads; downloading; building; installation;" />
<meta name="DESCRIPTION" content=
@ -15,7 +15,7 @@
<style type="text/css">
/*<![CDATA[*/
h1 {border-width: 2px; border-style: solid; text-align: center; width: 100%; font-size: 200%; font-weight: bold}
h2 {margin-top: 3em; text-decoration: underline; page-break-before: always}
h2 {margin-top: 2em; text-decoration: underline; page-break-before: always}
h2.TOC {page-break-before: auto}
h3 {margin-top: 2em; text-decoration: underline}
h4 {text-decoration: underline}
@ -30,10 +30,10 @@
<body>
<h1>International Components for Unicode<br />
<abbr title="International Components for Unicode">ICU</abbr> 4.0 ReadMe</h1>
<abbr title="International Components for Unicode">ICU</abbr> 4.2 ReadMe</h1>
<p>Version: 2008 June 28th<br />
Copyright &copy; 1997-2008 International Business Machines Corporation and
<p>Version: 2009 February 19th<br />
Copyright &copy; 1997-2009 International Business Machines Corporation and
others. All Rights Reserved.</p>
<!-- Remember that there is a copyright at the end too -->
<hr />
@ -216,9 +216,40 @@
<p><!-- The following list concentrates on <em>changes that affect existing
applications migrating from previous ICU releases</em>. --> For more news about
this release, see the <a href="http://www.icu-project.org/download/">ICU 4.0
this release, see the <a href="http://www.icu-project.org/download/">ICU 4.2
download page</a>.</p>
<h3>u_strToUTF32() and u_strFromUTF32() validate input UTF strings</h3>
<p>
Before ICU 4.2, the ustring.h functions u_strToUTF32() and u_strFromUTF32()
did not fully validate their input strings.
In particular, u_strToUTF32() passed unpaired surrogates through as
surrogate code points, and u_strFromUTF32() accepted surrogate code points
and passed them through as unpaired surrogates
(which may by chance end up next to another surrogate and form a pair,
indistinguishable from a supplementary code point).
This was inconsistent with the function names,
with the use of "UTF-16" and "UTF-32" in the documentation,
and with their sibling UTF-8 functions, which do validate fully.
</p>
<p>
ICU 4.2 changes the u_strToUTF32() and u_strFromUTF32() implementations
to treat malformed UTF input as an error.
The API documentation has been clarified.
</p>
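<p>
As a rough illustration of what "treat malformed UTF input as an error" means,
here is a minimal standalone sketch of strict conversion in both directions.
The helpers strict_utf16_to_utf32() and strict_utf32_to_utf16() are hypothetical
names invented for this example; they are not ICU functions, they do not use
ICU's UErrorCode conventions, and they are not how ICU implements
u_strToUTF32() and u_strFromUTF32().
</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not part of ICU. */

/* Strict UTF-16 -> UTF-32.
 * Returns the number of code points written, or -1 if the input contains
 * an unpaired surrogate or the destination buffer is too small. */
long strict_utf16_to_utf32(const uint16_t *src, size_t srcLen,
                           uint32_t *dest, size_t destCap) {
    size_t i = 0, n = 0;
    while (i < srcLen) {
        uint32_t c = src[i++];
        if (c >= 0xD800 && c <= 0xDBFF) {
            /* Lead surrogate: must be followed by a trail surrogate. */
            if (i < srcLen && src[i] >= 0xDC00 && src[i] <= 0xDFFF) {
                c = 0x10000 + ((c - 0xD800) << 10) + (src[i++] - 0xDC00);
            } else {
                return -1; /* unpaired lead surrogate */
            }
        } else if (c >= 0xDC00 && c <= 0xDFFF) {
            return -1;     /* unpaired trail surrogate */
        }
        if (n >= destCap) {
            return -1;     /* destination too small */
        }
        dest[n++] = c;
    }
    return (long)n;
}

/* Strict UTF-32 -> UTF-16.
 * Returns the number of 16-bit code units written, or -1 if the input
 * contains a surrogate code point, a value above U+10FFFF, or the
 * destination buffer is too small. */
long strict_utf32_to_utf16(const uint32_t *src, size_t srcLen,
                           uint16_t *dest, size_t destCap) {
    size_t n = 0;
    for (size_t i = 0; i < srcLen; i++) {
        uint32_t c = src[i];
        if ((c >= 0xD800 && c <= 0xDFFF) || c > 0x10FFFF) {
            return -1;     /* not a Unicode scalar value */
        }
        if (c < 0x10000) {
            if (n + 1 > destCap) {
                return -1;
            }
            dest[n++] = (uint16_t)c;
        } else {
            if (n + 2 > destCap) {
                return -1;
            }
            c -= 0x10000;
            dest[n++] = (uint16_t)(0xD800 + (c >> 10));   /* lead */
            dest[n++] = (uint16_t)(0xDC00 + (c & 0x3FF)); /* trail */
        }
    }
    return (long)n;
}
```

<p>
With these helpers, a well-formed pair such as 0xD83D 0xDE00 converts to the
single code point U+1F600, while a lone 0xD83D at the end of the input, or a
UTF-32 value of 0xD800, is rejected instead of being passed through.
</p>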
<p>
Background: The implementation of these functions predates
Unicode's tightening of the UTF specifications.
We had adapted the UTF-8 ustring.h functions and the ucnv_ converter functions
to the tightened specifications, but not these UTF-32 ustring.h functions.
See the Unicode Standard chapter 3
<a href="http://www.unicode.org/versions/Unicode5.0.0/ch03.pdf#G7404">section 3.9 Unicode Encoding Forms</a>
for details, in particular definitions D79 (Unicode encoding form)
and D80 (Unicode string).
</p>
<h2><a name="Download" href="#Download" id="Download">How To Download the
Source Code</a></h2>