diff --git a/.gitattributes b/.gitattributes
index 27993ee855e..a1ca97c2592 100644
--- a/.gitattributes
+++ b/.gitattributes
@@ -17,6 +17,7 @@ icu4c/docs/nameConv.html svneol=native#text/html
 icu4c/docs/number.html svneol=native#text/html
 icu4c/docs/supp_loc.html svneol=native#text/html
 icu4c/docs/tzClasses.html svneol=native#text/html
+icu4c/docs/udata.html svneol=native#text/html
 icu4c/docs/utilCL.html svneol=native#text/html
 icu4c/license.html svneol=native#text/html
 icu4c/readme.html svneol=native#text/html
diff --git a/icu4c/docs/udata.html b/icu4c/docs/udata.html
new file mode 100644
index 00000000000..09f49e136fa
--- /dev/null
+++ b/icu4c/docs/udata.html
@@ -0,0 +1,153 @@
+ + + +ICU - Formats and API for Binary Data Files + + + + +

ICU - Formats and API for Binary Data Files

+ +

This is a raw draft.

+ +

Finding ICU data

+ +

ICU data, when stored in files, is loaded from the file system +directory that is returned by u_getDataDirectory(). +That directory is determined sequentially by +

+ +

When ICU data is loaded using the udata API functions, +a defined sequence of file locations and entry point names is +used to locate the data. See the description in icu/source/common/udata.h for +details. Note that the exact lookup depends on the implementation +of this API and may differ by platform and by build configuration. +See also icu/source/common/udata.c for implementation details.

+ + +

Binary Data File Formats

+ +

Data files for ICU and for applications loading their data with ICU +should have a memory-mappable format. This means that the data should be +laid out in the file in an immediately useful way, so that the code that uses +the data does not need to parse it or copy it to allocated memory and +build additional structures (like Hashtables). +Here are some points to consider:

+ + + + +
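As a rough illustration of "immediately useful" data, the following C sketch reads a sorted table directly from a loaded image, with no parsing pass and no Hashtable built at load time. The layout (a 32-bit count followed by sorted 32-bit keys) and the names `FlatTable`, `flatTableOpen` and `flatTableFind` are hypothetical and are not an ICU format or API. It assumes the image is in the byte order of the loading machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical flat layout, for illustration only (not an ICU format):
 *   uint32_t count;         number of keys
 *   uint32_t keys[count];   keys in ascending order
 */
typedef struct {
    const uint32_t *keys;   /* points into the loaded image, nothing is copied */
    uint32_t count;
} FlatTable;

/*
 * Interpret an already loaded (or memory-mapped) image in place.
 * Returns 1 on success, 0 if the image is too short or not 4-byte aligned.
 */
static int flatTableOpen(FlatTable *t, const void *image, size_t length) {
    assert(t != NULL && image != NULL);
    if (length < sizeof(uint32_t) ||
        ((uintptr_t)image % sizeof(uint32_t)) != 0) {
        return 0;
    }
    const uint32_t *words = (const uint32_t *)image;
    uint32_t count = words[0];
    if (count > (length - sizeof(uint32_t)) / sizeof(uint32_t)) {
        return 0;   /* the count claims more keys than the image holds */
    }
    t->keys = words + 1;
    t->count = count;
    return 1;
}

/* Binary search directly in the image. Returns the key's index, or -1. */
static int32_t flatTableFind(const FlatTable *t, uint32_t key) {
    uint32_t lo = 0, hi = t->count;
    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2;
        if (t->keys[mid] < key) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return (lo < t->count && t->keys[lo] == key) ? (int32_t)lo : -1;
}
```

Storing the keys pre-sorted moves the work to the tool that writes the file, so the loading code only needs a bounds check before it can search.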

Platform-dependency of Binary Data Files

+ +

Data files with formats as described above should be portable among +machines with the same set of relevant properties:

+ + + +

All of these properties can be verified by checking the +UDataInfo structure of the data, which is best +done in a UDataMemoryIsAcceptable() function passed into +the udata_openChoice() API function.
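The kind of check such a function performs might look like the following C sketch. To keep it self-contained it does not include the ICU headers: `DataInfoSketch` and `PlatformSketch` are hypothetical stand-ins, loosely modeled on UDataInfo, and the choice of byte order, character set family and code unit size as the relevant properties is an assumption made for the example. The format name "Demo" and version 1 are invented as well. See icu/source/common/udata.h for the real structure and callback signature.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the platform-related fields of a data header. */
typedef struct {
    uint8_t isBigEndian;       /* 1 if the file was written big-endian */
    uint8_t charsetFamily;     /* assumed encoding: 0 = ASCII, 1 = EBCDIC */
    uint8_t sizeofUChar;       /* size of a Unicode code unit in bytes */
    uint8_t dataFormat[4];     /* format identifier */
    uint8_t formatVersion[4];  /* major version in element 0 */
} DataInfoSketch;

/* Hypothetical description of the machine that loads the file. */
typedef struct {
    uint8_t isBigEndian;
    uint8_t charsetFamily;
    uint8_t sizeofUChar;
} PlatformSketch;

/*
 * Returns 1 if the data can be used as is on the given platform, else 0.
 * Accepts only the invented format "Demo" with major version 1.
 */
static int isAcceptableSketch(const PlatformSketch *host,
                              const DataInfoSketch *info) {
    assert(host != NULL && info != NULL);
    return info->isBigEndian == host->isBigEndian &&
           info->charsetFamily == host->charsetFamily &&
           info->sizeofUChar == host->sizeofUChar &&
           memcmp(info->dataFormat, "Demo", 4) == 0 &&
           info->formatVersion[0] == 1;
}
```

A check of this shape rejects a mismatched file at open time, before any code reads the data with the wrong assumptions.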

+ +

If a data file is loaded on a machine with different relevant properties +than the machine where the data file was generated, then the code +that uses it could adapt by detecting the differences and reformatting the +data on the fly or in a copy in memory. +This would improve portability of the data files but significantly +decrease performance.
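To give a sense of what reformatting "in a copy in memory" costs, here is a minimal C sketch that byte-swaps an array of 16-bit units into a newly allocated buffer. The function names are hypothetical, and a real file would need every multi-byte field handled according to its own width.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Exchange the two bytes of one 16-bit unit. */
static uint16_t swap16(uint16_t v) {
    return (uint16_t)((v << 8) | (v >> 8));
}

/*
 * Return a malloc()ed copy of src with every 16-bit unit byte-swapped,
 * or NULL if the allocation fails. The caller frees the result.
 */
static uint16_t *copySwapped16(const uint16_t *src, size_t count) {
    assert(src != NULL);
    uint16_t *dst = malloc(count * sizeof *dst);
    if (dst == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < count; i++) {
        dst[i] = swap16(src[i]);
    }
    return dst;
}
```

The allocation and the pass over all of the data are exactly the work that a memory-mappable file in the native format avoids.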

+ +

"Relevant" properties are those that affect the portability of the +data in the particular file.

+ +

For example, a flat (memory-mapped) binary data file +that contains 16-bit and 32-bit integers and is +created for a typical big-endian Unix machine can be used +on an OS/390 system or any other big-endian machine.
+If the file also contains char[] strings, +then it can be easily shared among all machines that are both big-endian and +ASCII-based, but not with (e.g.) an OS/390 system, which is EBCDIC-based.
+OS/390 and OS/400 systems, however, could easily share such +a data file.
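The reasoning in this example can be sketched in C as follows. The function names and the 0 = ASCII, 1 = EBCDIC encoding are assumptions made for the illustration, not ICU definitions. The sketch detects the byte order and character set family of the running machine and treats the character set as relevant only when the file contains char[] strings.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 on a big-endian host, 0 on a little-endian host. */
static int hostIsBigEndian(void) {
    const uint16_t probe = 0x0102;
    return *(const unsigned char *)&probe == 0x01;
}

/* Returns 0 for an ASCII-based host, 1 for EBCDIC, -1 if neither matches. */
static int hostCharsetFamily(void) {
    if ((unsigned char)'A' == 0x41) {
        return 0;
    }
    if ((unsigned char)'A' == 0xC1) {
        return 1;
    }
    return -1;
}

/*
 * Returns 1 if a file with the given properties can be used unchanged on
 * this host. The character set matters only if the file has char[] strings.
 */
static int canShare(int fileBigEndian, int fileCharsetFamily,
                    int fileHasCharStrings) {
    assert(fileBigEndian == 0 || fileBigEndian == 1);
    if (fileBigEndian != hostIsBigEndian()) {
        return 0;
    }
    if (fileHasCharStrings && fileCharsetFamily != hostCharsetFamily()) {
        return 0;
    }
    return 1;
}
```

With these definitions, a big-endian file of integers only is usable on any big-endian host, while adding char[] strings narrows it to hosts with the same character set family.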

+ +

To make sure that the relevant platform properties of +the data file and the loading machine match, the +udata_openChoice() API function should be used with a +UDataMemoryIsAcceptable() function that checks for +these properties.

+ +

Some data file loading mechanisms prevent using data files generated on +a different platform to begin with, especially data files packaged as DLLs +(shared libraries).

+ + +

Writing a binary data file

+ +

... Use icu/source/tools/toolutil/unewdata.h|.c to write data files; +the files can include a copyright statement or other comment...

+ + + +