libexpat/testdata/largefiles/README.txt
Snild Dolkow 3484383fa7 Add aaaaaa_*.xml with unreasonably large tokens
Some of these currently take a very long time to parse. I set those to
only run one loop in the run-benchmark make target.

A 4096-byte buffer may be fairly small, and it definitely makes the problem
worse than it otherwise would have been, but similar sizes exist in real code:

 * 2048 bytes in cpython Modules/pyexpat.c
 * 4096 bytes in skia SkXMLParser.cpp
 * BUFSIZ bytes (8192 on my machine) in expat/examples
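To illustrate why a smaller buffer makes huge tokens more expensive, here is a simplified cost model sketched in C. It assumes that the caller feeds a single large token in fixed-size chunks, and that each time a chunk arrives the parser rescans the still-incomplete token from its first byte. That assumption belongs to this sketch; it is not a description of Expat's actual tokenizer code. The function name `rescan_cost` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified cost model (hypothetical, not Expat's tokenizer): a single
   token of token_len bytes arrives in chunks of chunk_size bytes. Each
   time a chunk arrives, the parser rescans the still-incomplete token
   from its first byte to the end of the data received so far.
   Returns the total number of bytes scanned until the token completes. */
uint64_t rescan_cost(uint64_t token_len, uint64_t chunk_size) {
  uint64_t total = 0;
  uint64_t available = 0; /* bytes of the token buffered so far */

  if (chunk_size == 0)
    return 0; /* no data ever arrives, so nothing is scanned */

  while (available < token_len) {
    /* Add one chunk; the last one may be partial. Written this way to
       avoid overflowing available + chunk_size. */
    if (token_len - available < chunk_size)
      available = token_len;
    else
      available += chunk_size;
    total += available; /* rescan everything buffered so far */
  }
  return total;
}
```

Under this model, a hypothetical 10 MiB token (10485760 bytes) fed in 4096-byte chunks costs 13,427,015,680 bytes scanned, about 1280 times the token size. With 2048-byte chunks the cost roughly doubles, and with 8192-byte chunks it roughly halves. The cost grows quadratically with the token length (about T²/2C bytes for a token of T bytes and a chunk size of C), which would be consistent with some of these files taking a very long time to parse.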

The files, too, are inspired by real-life examples: Android stores
depth and gain maps as base64-encoded JPEGs inside the XMP data of
other JPEGs, sometimes as a text element and sometimes as an attribute
value. I've seen attribute values slightly over 5 MiB in size.
2024-01-29 17:09:35 +01:00


This directory contains some really large test files, mostly used to
benchmark various aspects of Expat's performance.

(As files are added, they should be described here, including which
benchmark program they're intended to be used with and what the
resulting measurements tell us.)

* nes96.xml (~2.8 MB):
  - properties: no namespaces, mixed content, average nesting depth
  - source: http://sda.berkeley.edu:7502/ddi/nes96/
    (no indication of license or copyright there)
  - purpose: mostly for performance testing with the benchmark utility
* wordnet_glossary-20010201.xml (~14.4 MB):
  - properties: namespaces, element content, flat
  - source: http://www.semanticweb.org/library/wordnet/
    (license looks Open Source, see license.html file on the same page)
  - purpose: mostly for performance testing with the benchmark utility
* recset.xml (~29.1 MB):
  - properties: small portion with namespaces, bulk without, element
    content, flat
  - source: test data donated by Karl Waclawek
  - purpose: mostly for performance testing with the benchmark utility
* ns_att_test.xml (~34.2 MB):
  - properties: lots of prefixed attributes (28 on average), element
    content, flat
  - source: test data donated by Karl Waclawek
  - purpose: mostly for performance testing with the benchmark
    utility, specifically for testing the duplicate attribute check in
    storeAttributes()
* aaaaaa_attr.xml (~10 MB):
  - properties: trivial file with a huge attribute value
  - source: generated by a simple shell script
  - purpose: performance/regression test
* aaaaaa_cdata.xml (~10 MB):
  - properties: trivial file with a huge CDATA section
  - source: generated by a simple shell script
  - purpose: performance/regression test
* aaaaaa_comment.xml (~10 MB):
  - properties: trivial file with a huge comment
  - source: generated by a simple shell script
  - purpose: performance/regression test
* aaaaaa_tag.xml (~10 MB):
  - properties: trivial file with a huge tag name
  - source: generated by a simple shell script
  - purpose: performance/regression test
* aaaaaa_text.xml (~10 MB):
  - properties: trivial file with a huge text segment (no newlines)
  - source: generated by a simple shell script
  - purpose: performance/regression test
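The aaaaaa_*.xml entries only say the files were generated by a simple shell script. As a rough illustration of what "a trivial file with one huge token" might look like, here is a C sketch that builds such a document in memory. It uses C rather than shell, to match the language Expat itself is written in. The enum, the function name `make_huge_doc`, the element name `r`, and every prefix/suffix pair are hypothetical; the real files' exact layout is not described here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token kinds mirroring the aaaaaa_*.xml file names. */
enum huge_kind { HUGE_ATTR, HUGE_CDATA, HUGE_COMMENT, HUGE_TAG, HUGE_TEXT };

/* Returns a malloc'd, NUL-terminated document whose single large token
   consists of n 'a' characters, or NULL on bad input or allocation
   failure. The prefix/suffix pairs are assumptions, not the layout of
   the real test files. Note: HUGE_TAG with n == 0 is not well-formed. */
char *make_huge_doc(enum huge_kind kind, size_t n) {
  static const char *const prefix[] = {"<r a=\"", "<r><![CDATA[", "<r><!--",
                                       "<", "<r>"};
  static const char *const suffix[] = {"\"/>", "]]></r>", "--></r>", "/>",
                                       "</r>"};
  if ((unsigned)kind > HUGE_TEXT)
    return NULL;

  size_t plen = strlen(prefix[kind]);
  size_t slen = strlen(suffix[kind]);
  char *doc = malloc(plen + n + slen + 1);
  if (doc == NULL)
    return NULL;

  memcpy(doc, prefix[kind], plen);               /* opening markup */
  memset(doc + plen, 'a', n);                    /* the huge token body */
  memcpy(doc + plen + n, suffix[kind], slen + 1); /* closing markup + NUL */
  return doc;
}
```

For example, `make_huge_doc(HUGE_ATTR, 10 * 1024 * 1024)` would yield a document of roughly 10 MiB with one enormous attribute value. Writing it out with `fwrite` produces a file that can be fed to a parser in chunks like any of the files above.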