Mirror of https://github.com/libexpat/libexpat.git (synced 2025-04-04 12:54:58 +00:00)
Some of these currently take a very long time to parse, so I set those to run only one loop in the run-benchmark make target.

4096 bytes may be a fairly small buffer, and it definitely makes the problem worse than it otherwise would've been, but similar sizes exist in real code:

* 2048 bytes in cpython Modules/pyexpat.c
* 4096 bytes in skia SkXMLParser.cpp
* BUFSIZ bytes (8192 on my machine) in expat/examples

The files, too, are inspired by real-life examples: Android stores depth and gain maps as base64-encoded JPEGs inside the XMP data of other JPEGs, sometimes as a text element and sometimes as an attribute value. I've seen attribute values slightly over 5 MiB in size.

* aaaaaa_attr.xml
* aaaaaa_cdata.xml
* aaaaaa_comment.xml
* aaaaaa_tag.xml
* aaaaaa_text.xml
* nes96.xml
* ns_att_test.xml
* README.txt
* recset.xml
* wordnet_glossary-20010201.rdf

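
The buffer sizes quoted in the commit note above describe callers that hand Expat a document in small fixed-size pieces. The following is a minimal sketch of that pattern, written in Python with the standard-library `xml.parsers.expat` binding rather than in C. The function name `parse_in_chunks`, the 4096-byte default, and the sample document are assumptions made for illustration; this is not the benchmark program from the repository.

```python
import io
import xml.parsers.expat


def parse_in_chunks(stream, chunk_size=4096):
    """Feed a binary stream to Expat in fixed-size chunks.

    Returns a dict with the number of chunks fed and the total number of
    characters delivered to the character-data handler.
    """
    parser = xml.parsers.expat.ParserCreate()
    counts = {"chunks": 0, "text_chars": 0}

    def on_text(data):
        # Expat may deliver one text node in several pieces; sum them up.
        counts["text_chars"] += len(data)

    parser.CharacterDataHandler = on_text

    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parser.Parse(chunk, False)  # more input may follow
        counts["chunks"] += 1

    parser.Parse(b"", True)  # signal end of input
    return counts


if __name__ == "__main__":
    # Hypothetical sample: one element holding 10000 characters of text.
    sample = b"<doc>" + b"a" * 10000 + b"</doc>"
    print(parse_in_chunks(io.BytesIO(sample)))
```

A smaller `chunk_size` means more calls into the parser for the same token, which is the situation the large single-token files in this directory are meant to exercise.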
This directory contains some really large test files, mostly used to
benchmark various aspects of Expat's performance.

(As files are added, they should be described here, including what
benchmark program they're intended to be used with and what the
resulting measurements tell us.)

* nes96.xml (~2.8 MB):
  - properties: no namespaces, mixed content, average nesting depth
  - source: http://sda.berkeley.edu:7502/ddi/nes96/
    (no indication of license or copyright there)
  - purpose: mostly for performance testing with the benchmark utility

* wordnet_glossary-20010201.rdf (~14.4 MB):
  - properties: namespaces, element content, flat
  - source: http://www.semanticweb.org/library/wordnet/
    (license looks Open Source, see license.html file on the same page)
  - purpose: mostly for performance testing with the benchmark utility

* recset.xml (~29.1 MB):
  - properties: small portion with namespaces, bulk without,
    element content, flat
  - source: test data donated by Karl Waclawek
  - purpose: mostly for performance testing with the benchmark utility

* ns_att_test.xml (~34.2 MB):
  - properties: lots of prefixed attributes (28 on average),
    element content, flat
  - source: test data donated by Karl Waclawek
  - purpose: mostly for performance testing with the benchmark utility,
    specifically for testing the duplicate attribute check in
    storeAttributes()

* aaaaaa_attr.xml (~10 MB):
  - properties: trivial file with a huge attribute value
  - source: generated by a simple shell script
  - purpose: performance/regression test

* aaaaaa_cdata.xml (~10 MB):
  - properties: trivial file with huge cdata content
  - source: generated by a simple shell script
  - purpose: performance/regression test

* aaaaaa_comment.xml (~10 MB):
  - properties: trivial file with a huge comment
  - source: generated by a simple shell script
  - purpose: performance/regression test

* aaaaaa_tag.xml (~10 MB):
  - properties: trivial file with a huge tag name
  - source: generated by a simple shell script
  - purpose: performance/regression test

* aaaaaa_text.xml (~10 MB):
  - properties: trivial file with a huge text segment (no newlines)
  - source: generated by a simple shell script
  - purpose: performance/regression test
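
The five aaaaaa_* files are described above only as "generated by a simple shell script", and that script is not shown here. As a rough illustration of what such files could look like, the Python sketch below builds one document per kind around a run of the letter "a". The element name `doc`, the attribute name `attr`, the templates, and the 10,000,000-character default are all assumptions, not the layout of the real files.

```python
from pathlib import Path

# Hypothetical templates; "{}" is replaced by the long run of "a" characters.
TEMPLATES = {
    "attr": '<doc attr="{}"/>',
    "cdata": "<doc><![CDATA[{}]]></doc>",
    "comment": "<doc><!--{}--></doc>",
    "tag": "<{}/>",
    "text": "<doc>{}</doc>",
}


def make_doc(kind, size):
    """Return a trivial XML document (as bytes) with one huge token."""
    return TEMPLATES[kind].format("a" * size).encode("ascii")


def write_all(directory, size=10_000_000):
    """Write aaaaaa_<kind>.xml for every kind; return the paths written."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for kind in TEMPLATES:
        path = out_dir / f"aaaaaa_{kind}.xml"
        path.write_bytes(make_doc(kind, size))
        paths.append(path)
    return paths
```

Calling `write_all("largefiles-sketch")` would write five files of roughly 10 MB each into a directory of that (hypothetical) name; passing a smaller `size` is convenient for quick experiments.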