libexpat/testdata/largefiles
Snild Dolkow 3484383fa7 Add aaaaaa_*.xml with unreasonably large tokens
Some of these currently take a very long time to parse. I set those to
only run one loop in the run-benchmark make target.

4096 may be a fairly small buffer, and definitely makes the problem worse
than it otherwise would've been, but similar sizes exist in real code:

 * 2048 bytes in cpython Modules/pyexpat.c
 * 4096 bytes in skia SkXMLParser.cpp
 * BUFSIZ bytes (8192 on my machine) in expat/examples

The files, too, are inspired by real-life examples: Android stores
depth and gain maps as base64-encoded JPEGs inside the XMP data of
other JPEGs. Sometimes as a text element, sometimes as an attribute
value. I've seen attribute values slightly over 5 MiB in size.
2024-01-29 17:09:35 +01:00
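
To make the small-buffer scenario concrete, here is a minimal sketch in
Python, using the standard xml.parsers.expat module (the binding built from
Modules/pyexpat.c). It feeds a document to the parser in fixed-size chunks,
so that a single large token spans many Parse() calls. The function name
parse_in_chunks, the stats dictionary, and the sample document are
illustrative assumptions, not part of Expat or its benchmark program.

```python
import xml.parsers.expat


def parse_in_chunks(data: bytes, chunk_size: int = 4096) -> dict:
    """Feed `data` to an Expat parser `chunk_size` bytes at a time.

    Returns a few counters so the caller can see how many Parse()
    calls one large token was spread across.
    """
    stats = {"parse_calls": 0, "start_elements": 0, "max_attr_len": 0}

    def on_start(name, attrs):
        # Called once per start tag, after the whole tag has been seen.
        stats["start_elements"] += 1
        for value in attrs.values():
            stats["max_attr_len"] = max(stats["max_attr_len"], len(value))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start

    for offset in range(0, len(data), chunk_size):
        parser.Parse(data[offset:offset + chunk_size], False)
        stats["parse_calls"] += 1
    parser.Parse(b"", True)  # signal end of input
    return stats


if __name__ == "__main__":
    # Hypothetical input: one element with a 1 MiB attribute value.
    doc = b'<doc a="' + b"a" * (1 << 20) + b'"/>'
    print(parse_in_chunks(doc, 4096))
```

Timing this function for different chunk_size values on the same input is
one way to observe how much the buffer size matters for a given Expat
version.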
aaaaaa_attr.xml                Add aaaaaa_*.xml with unreasonably large tokens                   2024-01-29 17:09:35 +01:00
aaaaaa_cdata.xml               Add aaaaaa_*.xml with unreasonably large tokens                   2024-01-29 17:09:35 +01:00
aaaaaa_comment.xml             Add aaaaaa_*.xml with unreasonably large tokens                   2024-01-29 17:09:35 +01:00
aaaaaa_tag.xml                 Add aaaaaa_*.xml with unreasonably large tokens                   2024-01-29 17:09:35 +01:00
aaaaaa_text.xml                Add aaaaaa_*.xml with unreasonably large tokens                   2024-01-29 17:09:35 +01:00
nes96.xml                      Fix typos and add ICPSR full name                                 2017-10-02 21:54:52 +02:00
ns_att_test.xml                Added a file for testing the duplicate prefixed attribute check.  2003-09-05 00:40:23 +00:00
README.txt                     Add aaaaaa_*.xml with unreasonably large tokens                   2024-01-29 17:09:35 +01:00
recset.xml                     Remove extraneous @ from test file (issue #120)                   2017-08-31 14:05:25 +01:00
wordnet_glossary-20010201.rdf  Added files for performance testing.                              2003-09-04 21:31:32 +00:00

This directory contains some really large test files, mostly used to
benchmark various aspects of Expat's performance.

(As files are added, they should be described here, including what
benchmark program they're intended to be used with and what the
resulting measurements tell us.)

* nes96.xml (~2.8 MB): 
  - properties: no namespaces, mixed content, average nesting depth
  - source: http://sda.berkeley.edu:7502/ddi/nes96/
    (no indication of license or copyright there)
  - purpose: mostly for performance testing with the benchmark utility

* wordnet_glossary-20010201.rdf (~14.4 MB):
  - properties: namespaces, element content, flat 
  - source: http://www.semanticweb.org/library/wordnet/
    (license looks Open Source, see license.html file on the same page)
  - purpose: mostly for performance testing with the benchmark utility

* recset.xml (~29.1 MB): 
  - properties: small portion with namespaces, bulk without, element
    content, flat
  - source: test data donated by Karl Waclawek
  - purpose: mostly for performance testing with the benchmark utility

* ns_att_test.xml (~34.2 MB): 
  - properties: lots of prefixed attributes (28 on average), element
    content, flat
  - source: test data donated by Karl Waclawek
  - purpose: mostly for performance testing with the benchmark
    utility, specifically for testing the duplicate attribute check in
    storeAttributes()

* aaaaaa_attr.xml (~10 MB):
  - properties: trivial file with a huge attribute value
  - source: generated by a simple shell script
  - purpose: performance/regression test

* aaaaaa_cdata.xml (~10 MB):
  - properties: trivial file with a huge CDATA section
  - source: generated by a simple shell script
  - purpose: performance/regression test

* aaaaaa_comment.xml (~10 MB):
  - properties: trivial file with a huge comment
  - source: generated by a simple shell script
  - purpose: performance/regression test

* aaaaaa_tag.xml (~10 MB):
  - properties: trivial file with a huge tag name
  - source: generated by a simple shell script
  - purpose: performance/regression test

* aaaaaa_text.xml (~10 MB):
  - properties: trivial file with a huge text segment (no newlines)
  - source: generated by a simple shell script
  - purpose: performance/regression test
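
The entries above only say that the aaaaaa_*.xml files were generated by a
simple shell script; that script is not reproduced here. As a rough
illustration of what such inputs could look like, the following Python
sketch builds one document per token type. The element name doc, the
attribute name a, the sketch_*.xml file names, and the exact layout are all
assumptions; the real files may differ.

```python
import pathlib
import tempfile
import xml.parsers.expat


def make_large_token_docs(size: int = 10 * 1024 * 1024) -> dict:
    """Return {kind: document bytes}, each with one token of `size` bytes.

    The layouts are guesses at "trivial file with a huge X"; they are
    not copied from the real test data.
    """
    filler = b"a" * size
    return {
        "attr": b'<doc a="' + filler + b'"/>',
        "cdata": b"<doc><![CDATA[" + filler + b"]]></doc>",
        "comment": b"<doc><!--" + filler + b"--></doc>",
        "tag": b"<" + filler + b"/>",
        "text": b"<doc>" + filler + b"</doc>",
    }


def is_well_formed(data: bytes) -> bool:
    """Parse `data` in one call and report whether Expat accepted it."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError:
        return False
    return True


if __name__ == "__main__":
    # Write 1 MiB variants to a temporary directory so that nothing in
    # the source tree is overwritten.
    out_dir = pathlib.Path(tempfile.mkdtemp(prefix="largefiles_"))
    for kind, data in make_large_token_docs(size=1024 * 1024).items():
        path = out_dir / f"sketch_{kind}.xml"
        path.write_bytes(data)
        print(path, len(data), is_well_formed(data))
```

Each document consists of a single root element, so any slowdown measured
while parsing it can be attributed to the one oversized token rather than
to document structure.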