Symptom was this compile warning from MinGW GCC 12:
> [..]/expat/tests/memcheck.c: In function ‘tracking_realloc’:
> [..]/expat/tests/memcheck.c:169:25: error: pointer ‘ptr’ may be used after ‘realloc’ [-Werror=use-after-free]
> 169 | entry->allocation = ptr;
> | ~~~~~~~~~~~~~~~~~~^~~~~
> [..]/expat/tests/memcheck.c:166:25: note: call to ‘realloc’ here
> 166 | entry->allocation = realloc(ptr, size);
> | ^~~~~~~~~~~~~~~~~~
The warning is a false positive since the code was only using ptr
when realloc failed where it is documented that the original pointer is
not freed.
The workaround is to no longer override-and-restore entry->allocation
but to only write to it when realloc was successful. The original
value was equal to ptr so the result is the same.
- Start treating -DXML_CONTEXT_BYTES=0 as "no context"
rather than "context of size 0". Was documented as
"must be set to a positive integer", previously.
- Enforce that macro XML_CONTEXT_BYTES is defined at build time to
avoid accidental misbuilds lacking context in environments that
bypass both of Expats official build systems.
- Detect and reject use of negative context size at compile time.
...in addition to 1-to-5-byte chunks as we've done so far.
By starting g_chunkSize at 0, we get to run all the tests that call
_XML_Parse_SINGLE_BYTES() as if they just called XML_Parse(). This
gives us extra test coverage.
%pe2; would ultimately expand to a plain "ABCDEF...", which is not
valid in this context. This was not normally hit, since the test would
get its expected XML_ERROR_NO_MEMORY before expanding this far.
With g_chunkSize=0 and EXPAT_CONTEXT_BYTES=OFF, the number of allocs
required to reach that point becomes *just* low enough to reach the
final expansion, making the test fail with a very unexpected syntax
error.
Nesting %pe2; in another entity declaration avoids the problem.
When test_repeated_stop_parser_between_char_data_calls runs without
chunking the input -- which I am about to do in my next commit -- the
parser_stop_character_handler callback happens multiple times. This is
because stopping the parser doesn't stop all callbacks immediately,
which is valid (documented) behavior.
The second callback tried to stop the parser again, getting unexpected
errors. Let's check the parser status on entry and return early if it's
already finished.
Before a parse call with isFinal=XML_TRUE, there is no guarantee that
all supplied data has been parsed. Removing the first comment count
check removes the test's assumption of such a guarantee.
The _XML_Parse_SINGLE_BYTES function currently calls XML_Parse() one
byte at a time. This is useful to detect possible parsing bugs related
to having to exit parsing, wait for more data, and resume.
This commit makes SINGLE_BYTES even more useful by repeating all tests,
changing the chunk size every time. So instead of just one byte at a
time, we now also test two bytes at a time, and so on. Tests that don't
use the SINGLE_BYTES also run multiple times, but are otherwise not
affected.
This uncovered some issues, which have been fixed in preceding commits.
On failure, the chunk size is included in the "FAIL" log prints.
...instead of a full-string match.
These tests were depending on getting handler callbacks with exactly
one character of data at a time. For example, if test_abort_epilog got
"\n\r\n" in one callback, it would fail to match on the '\r', and would
not abort parsing as expected.
By searching the callback arg for the magic character rather than
expecting a full match, the test no longer depends on exact callback
timing.
`userData` is never NULL in these tests, so that check was left out of
the new version.
Instead of testing the exact number and sequence of callbacks, we now
test that we get the exact data lengths and sequence of callbacks. The
checks become much more verbose, but will now accept any buffer fill
strategy -- single bytes, multiple bytes, or any combination thereof.
The byte order mark is not correctly consumed when followed by an
incomplete token in a non-final parse. This results in the BOM staying
in the buffer, causing an invalid token error later.
This was not detected by existing tests because they either parse
everything in one call, or add a single byte at a time.
By moving `s` forward when we find a BOM, we make sure that the BOM
bytes are properly consumed in all cases.