libexpat

mirror of https://github.com/libexpat/libexpat.git synced 2025-04-11 07:20:36 +00:00

Author	SHA1	Message	Date
Snild Dolkow	09957b8ced	Allow XML_GetBuffer() with len=0 on a fresh parser len=0 was previously OK if there had previously been a non-zero call. It makes sense to allow an application to work the same way on a newly-created parser, and not have to care if its incoming buffer happens to be 0.	2024-01-29 17:09:36 +01:00
Snild Dolkow	f1eea784d0	tests: Add max_slowdown info in test_big_tokens_take_linear_time Suggested-by: Sebastian Pipping <sebastian@pipping.org>	2024-01-29 17:09:36 +01:00
Snild Dolkow	9fe3672459	tests: Run both with and without partial token heuristic If we always run with the heuristic enabled, it may hide some bugs by grouping up input into bigger parse attempts. CI-fighting-assistance-by: Sebastian Pipping <sebastian@pipping.org>	2024-01-29 17:09:36 +01:00
Snild Dolkow	1b9d398517	Don't update partial token heuristic on error Suggested-by: Sebastian Pipping <sebastian@pipping.org>	2024-01-29 17:09:35 +01:00
Snild Dolkow	9cdf9b8d77	Skip parsing after repeated partials on the same token When the parse buffer contains the starting bytes of a token but not all of them, we cannot parse the token to completion. We call this a partial token. When this happens, the parse position is reset to the start of the token, and the parse() call returns. The client is then expected to provide more data and call parse() again. In extreme cases, this means that the bytes of a token may be parsed many times: once for every buffer refill required before the full token is present in the buffer. Math: Assume there's a token of T bytes Assume the client fills the buffer in chunks of X bytes We'll try to parse X, 2X, 3X, 4X ... until mX == T (technically >=) That's (m²+m)X/2 = (T²/X+T)/2 bytes parsed (arithmetic progression) While it is alleviated by larger refills, this amounts to O(T²) Expat grows its internal buffer by doubling it when necessary, but has no way to inform the client about how much space is available. Instead, we add a heuristic that skips parsing when we've repeatedly stopped on an incomplete token. Specifically: * Only try to parse if we have a certain amount of data buffered * Every time we stop on an incomplete token, double the threshold * As soon as any token completes, the threshold is reset This means that when we get stuck on an incomplete token, the threshold grows exponentially, effectively making the client perform larger buffer fills, limiting how many times we can end up re-parsing the same bytes. Math: Assume there's a token of T bytes Assume the client fills the buffer in chunks of X bytes We'll try to parse X, 2X, 4X, 8X ... until (2^k)X == T (or larger) That's (2^(k+1)-1)X bytes parsed -- e.g. 15X if T = 8X This is equal to 2T-X, which amounts to O(T) We could've chosen a faster growth rate, e.g. 4 or 8. Those seem to increase performance further, at the cost of further increasing the risk of growing the buffer more than necessary. This can easily be adjusted in the future, if desired. This is all completely transparent to the client, except for: 1. possible delay of some callbacks (when our heuristic overshoots) 2. apps that never do isFinal=XML_TRUE could miss data at the end For the affected testdata, this change shows a 100-400x speedup. The recset.xml benchmark shows no clear change either way. Before: benchmark -n ../testdata/largefiles/recset.xml 65535 3 3 loops, with buffer size 65535. Average time per loop: 0.270223 benchmark -n ../testdata/largefiles/aaaaaa_attr.xml 4096 3 3 loops, with buffer size 4096. Average time per loop: 15.033048 benchmark -n ../testdata/largefiles/aaaaaa_cdata.xml 4096 3 3 loops, with buffer size 4096. Average time per loop: 0.018027 benchmark -n ../testdata/largefiles/aaaaaa_comment.xml 4096 3 3 loops, with buffer size 4096. Average time per loop: 11.775362 benchmark -n ../testdata/largefiles/aaaaaa_tag.xml 4096 3 3 loops, with buffer size 4096. Average time per loop: 11.711414 benchmark -n ../testdata/largefiles/aaaaaa_text.xml 4096 3 3 loops, with buffer size 4096. Average time per loop: 0.019362 After: ./run.sh benchmark -n ../testdata/largefiles/recset.xml 65535 3 3 loops, with buffer size 65535. Average time per loop: 0.269030 ./run.sh benchmark -n ../testdata/largefiles/aaaaaa_attr.xml 4096 3 3 loops, with buffer size 4096. Average time per loop: 0.044794 ./run.sh benchmark -n ../testdata/largefiles/aaaaaa_cdata.xml 4096 3 3 loops, with buffer size 4096. Average time per loop: 0.016377 ./run.sh benchmark -n ../testdata/largefiles/aaaaaa_comment.xml 4096 3 3 loops, with buffer size 4096. Average time per loop: 0.027022 ./run.sh benchmark -n ../testdata/largefiles/aaaaaa_tag.xml 4096 3 3 loops, with buffer size 4096. Average time per loop: 0.099360 ./run.sh benchmark -n ../testdata/largefiles/aaaaaa_text.xml 4096 3 3 loops, with buffer size 4096. Average time per loop: 0.017956	2024-01-29 17:09:35 +01:00
Snild Dolkow	60dffa148c	tests: Use normal XML_Parse in test_suspend_resume_internal_entity When the parser is suspended, _XML_Parse_SINGLE_BYTES() will return early. At that point, there could be some amount of bytes that haven't been fed into Expat at all yet. This leaves us with an incomplete document. Furthermore, the last internal XML_Parse() call with isFinal=XML_TRUE will not have happened, so the parser will not know that no more input is to be expected. This is what allowed the test to pass when it was originally changed to use SINGLE_BYTES. With the new partial token heuristic, the lack of a final parse call means that we don't even reach the "Ho" text, and fail the test. The simplest solution is to go back to using XML_Parse() in this test. Another option would be to let SINGLE_BYTES expose how far it got in its loop, allowing for later continuation, but it doesn't seem worth the extra complexity.	2024-01-29 17:09:35 +01:00
Snild Dolkow	3484383fa7	Add aaaaaa_*.xml with unreasonably large tokens Some of these currently take a very long time to parse. I set those to only run one loop in the run-benchmark make target. 4096 may be a fairly small buffer, and definitely make the problem worse than it otherwise would've been, but similar sizes exist in real code: * 2048 bytes in cpython Modules/pyexpat.c * 4096 bytes in skia SkXMLParser.cpp * BUFSIZ bytes (8192 on my machine) in expat/examples The files, too, are inspired by real-life examples: Android stores depth and gain maps as base64-encoded JPEGs inside the XMP data of other JPEGs. Sometimes as a text element, sometimes as an attribute value. I've seen attribute values slightly over 5 MiB in size.	2024-01-29 17:09:35 +01:00
Sebastian Pipping	183270d565	Merge pull request #810 from libexpat/clang-18 CI: Upgrade to Clang 18 (except clang-tidy and clang-format)	2024-01-26 19:10:31 +01:00
Sebastian Pipping	f7ada131b7	Merge pull request #808 from libexpat/clang-tidy-18 CI: Upgrade to clang-tidy 18	2024-01-26 18:30:17 +01:00
Sebastian Pipping	6880fe4948	CI: Upgrade to Clang 18 (except clang-tidy and clang-format)	2024-01-26 16:20:04 +01:00
Sebastian Pipping	fc0b026ce5	clang-format.yml: De-couple clang-format from Clang .. so that we can bump their versions independently	2024-01-26 16:19:59 +01:00
Sebastian Pipping	7acda8d16a	clang-tidy.yml: Upgrade to clang-tidy 18	2024-01-26 16:19:02 +01:00
Sebastian Pipping	737e8ea183	tests/misc_tests.c: Address clang-tidy 18 warning EnumCastOutOfRange clang-tidy output was: > [..]/libexpat/expat/tests/misc_tests.c:112:23: note: The value '-1' provided to the cast expression is not in the valid range of values for 'XML_Error' > 112 | if (XML_ErrorString((enum XML_Error) - 1) != NULL) > | ^~~~~~~~~~~~~~~~~~~~ > [..]/libexpat/expat/tests/misc_tests.c:114:23: error: The value '100' provided to the cast expression is not in the valid range of values for 'XML_Error' [clang-analyzer-optin.core.EnumCastOutOfRange,-warnings-as-errors] > 114 | if (XML_ErrorString((enum XML_Error)100) != NULL) > | ^~~~~~~~~~~~~~~~~~~	2024-01-26 16:19:02 +01:00
Sebastian Pipping	abd9542b32	Merge pull request #806 from libexpat/dependabot/github_actions/actions/upload-artifact-4.2.0 Actions(deps): Bump actions/upload-artifact from 4.0.0 to 4.2.0	2024-01-22 14:52:20 +01:00
dependabot[bot]	2c37fc7d7d	Actions(deps): Bump actions/upload-artifact from 4.0.0 to 4.2.0 Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4.0.0 to 4.2.0. - [Release notes](https://github.com/actions/upload-artifact/releases) - [Commits](https://github.com/actions/upload-artifact/compare/v4...694cdabd8bdb0f10b2cea11669e1bf5453eed0a6) --- updated-dependencies: - dependency-name: actions/upload-artifact dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>	2024-01-22 12:09:43 +00:00
Sebastian Pipping	86a3623a9a	Merge pull request #801 from catenacyber/fuzzcovstop fuzz: Improve coverage by maybe stopping the parser	2024-01-17 17:36:43 +01:00
Sebastian Pipping	5b70d3ac44	fuzz/xml_parsebuffer_fuzzer.c: Be more robust towards ouf-of-memory	2024-01-17 10:08:42 +01:00
Philippe Antoine	34af886238	fuzz: improve coverage by maybe stopping parser	2024-01-16 11:08:44 +01:00
Sebastian Pipping	2640b1d97c	Merge pull request #799 from libexpat/ci-fuzzing Make CI run fuzzing regression tests (fixes #367)	2024-01-16 02:26:43 +01:00
Sebastian Pipping	c47e191797	Merge pull request #803 from libexpat/fix-cppcheck-ci Fix Cppcheck CI for Cppcheck 2.13.0	2024-01-16 01:14:12 +01:00
Sebastian Pipping	24ffba44bd	Make CI run fuzzing regression tests	2024-01-15 23:57:02 +01:00
Sebastian Pipping	73ebe0bfb3	fuzz: Address warning -Wunused-function with regard to sip24_valid	2024-01-15 23:57:02 +01:00
Sebastian Pipping	ed38687779	mass-cppcheck.sh: Fix for Cppcheck 2.13.0 Cppcheck output was: > expat/lib/xmlparse.c:67:4: error: #error XML_GE (for general entities) must be defined, [..] > # error XML_GE (for general entities) must be defined, [..] > ^	2024-01-15 23:29:19 +01:00
Sebastian Pipping	3ff1d00dc2	cppcheck.yml: Bump to macOS 12 Homebrew output was: > Warning: You are using macOS 11. > We (and Apple) do not provide support for this old version. > [..]	2024-01-15 23:29:19 +01:00
Sebastian Pipping	9e603b35e0	Merge pull request #802 from libexpat/dependabot/github_actions/actions/upload-artifact-4.1.0 Actions(deps): Bump actions/upload-artifact from 4.0.0 to 4.1.0	2024-01-15 17:06:21 +01:00
dependabot[bot]	2d9bc9aec6	Actions(deps): Bump actions/upload-artifact from 4.0.0 to 4.1.0 Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4.0.0 to 4.1.0. - [Release notes](https://github.com/actions/upload-artifact/releases) - [Commits](`c7d193f32e...1eb3cb2b3e`) --- updated-dependencies: - dependency-name: actions/upload-artifact dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>	2024-01-15 12:39:38 +00:00
Sebastian Pipping	19af57f2dd	Merge pull request #800 from libexpat/clang-tidy-more clang-tidy: Address warnings `readability-avoid-const-params-in-decls` and `readability-named-parameter`	2024-01-13 01:59:22 +01:00
Sebastian Pipping	226a1527cf	clang-tidy: Address warning readability-named-parameter	2024-01-12 23:27:19 +01:00
Sebastian Pipping	225ebd45e1	clang-tidy: Address warning readability-avoid-const-params-in-decls clang-tidy output was: > [..]/tests/handlers.h:502:64: error: parameter 'index' is const-qualified in the function declaration; const-qualification of parameters only has an effect in function definitions [readability-avoid-const-params-in-decls,-warnings-as-errors] > 502 | _handler_record_get(const struct handler_record_list *storage, const int index, > | ^~~~~ > [..]/tests/handlers.h:503:39: error: parameter 'line' is const-qualified in the function declaration; const-qualification of parameters only has an effect in function definitions [readability-avoid-const-params-in-decls,-warnings-as-errors] > 503 | const char *file, const int line); > | ^~~~~	2024-01-12 22:19:05 +01:00
Sebastian Pipping	7664ecdbae	Merge pull request #798 from libexpat/clang-tidy Make GitHub Actions enforce clang-tidy clean code + address current clang-tidy warnings	2024-01-12 21:38:35 +01:00
Sebastian Pipping	f832f7b981	Make GitHub Actions enforce clang-tidy clean code	2024-01-12 17:26:50 +01:00
Sebastian Pipping	10cded2493	tests/basic_tests.c: Address clang-tidy warning clang-analyzer-core.NullDereference clang-tidy output was: > [..]/tests/basic_tests.c:2083:19: warning: Dereference of null pointer [clang-analyzer-core.NullDereference] > 2083 | errorFlags |= ((model[0].type == XML_CTYPE_SEQ) ? 0 : (1u << 2)); > | ^~~~~~~~~~~~~	2024-01-12 17:25:27 +01:00
Sebastian Pipping	e23c300f25	tests/acc_tests.c: Address clang-tidy warning clang-analyzer-core.NonNullParamChecker clang-tidy output was: > [..]/tests/acc_tests.c:368:9: warning: Null pointer passed to 1st parameter expecting 'nonnull' [clang-analyzer-core.NonNullParamChecker] > 368 | if (strlen(printable) < (size_t)1) > | ^ ~~~~~~~~~ Note: It was harmless because fail(..) right before catches that case.	2024-01-12 17:25:27 +01:00
Sebastian Pipping	0b424cb9ae	examples/element_declarations.c: Simplify first call to stackPushMalloc .. where stackTop is NULL anyway	2024-01-12 17:25:27 +01:00
Sebastian Pipping	0ebca2b10f	examples/element_declarations.c: Fix memleak in dumpContentModel on OOM clang-tidy output was: > [..]/examples/element_declarations.c:163:16: warning: Potential leak of memory pointed to by 'stackTop' [clang-analyzer-unix.Malloc] > 163 | return false; > | ^	2024-01-12 04:46:47 +01:00
Sebastian Pipping	716fd10bd4	Merge pull request #797 from catenacyber/fuzzcov fuzz: improve coverage	2024-01-10 23:07:00 +01:00
Philippe Antoine	bb58abd4e0	fuzz: improve coverage	2024-01-10 22:06:37 +01:00
Sebastian Pipping	be47f6d5e8	Merge pull request #796 from libexpat/ci-control-flow-integrity Make CI cover Clang's Control Flow Integrity sanitizer	2023-12-19 18:39:44 +01:00
Sebastian Pipping	64912b70fb	Merge pull request #795 from libexpat/autotools-install-shipped-xmlwf-manpage Autotools: Make installation of shipped `doc/xmlwf.1` independent of docbook2man availability	2023-12-19 18:38:44 +01:00
Sebastian Pipping	18b44c980e	linux.yml: Cover Clang's Control Flow Integrity sanitizer	2023-12-19 01:31:10 +01:00
Sebastian Pipping	9495cefd94	qa.sh: Fix dropping of QA_SANITIZER	2023-12-19 01:31:10 +01:00
Sebastian Pipping	4b878938bb	qa.sh: Support Clang's Control Flow Integrity sanitizer https://clang.llvm.org/docs/ControlFlowIntegrity.html	2023-12-19 01:31:10 +01:00
Sebastian Pipping	7384c88f9a	configure.ac: Make installation of shipped doc/xmlwf.1 independent of docbook2man availability	2023-12-18 23:59:25 +01:00
Sebastian Pipping	822d1706b2	Merge pull request #794 from libexpat/dependabot/github_actions/actions/upload-artifact-4.0.0 Actions(deps): Bump actions/upload-artifact from 3.1.3 to 4.0.0	2023-12-18 17:48:34 +01:00
dependabot[bot]	8c87ca470d	Actions(deps): Bump actions/upload-artifact from 3.1.3 to 4.0.0 Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 3.1.3 to 4.0.0. - [Release notes](https://github.com/actions/upload-artifact/releases) - [Commits](`a8a3f3ad30...c7d193f32e`) --- updated-dependencies: - dependency-name: actions/upload-artifact dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com>	2023-12-18 12:12:08 +00:00
Sebastian Pipping	b9fcca0aaa	Merge pull request #793 from libexpat/fix-bug-report-target CMake|Autotools: Fix `PACKAGE_BUGREPORT` variable to something working	2023-12-17 23:09:46 +01:00
Sebastian Pipping	5a3c419e6a	CMake|Autotools: Fix PACKAGE_BUGREPORT variable to something working	2023-12-17 03:34:27 +01:00
Sebastian Pipping	85ee77d31f	Merge pull request #792 from libexpat/autotools-sync-cmake-files autotools: Sync CMake templates with CMake 3.26	2023-12-16 16:50:45 +01:00
Sebastian Pipping	141cdab714	autotools: Sync CMake templates with CMake 3.26	2023-12-15 05:02:23 +01:00
Sebastian Pipping	fb702e6c0e	Merge pull request #790 from libexpat/cmake-build-benchmark-also CMake: Build `tests/benchmark/benchmark.c` for `EXPAT_BUILD_TESTS`	2023-11-22 13:04:23 +01:00
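
The arithmetic in commit 9cdf9b8d77 ("Skip parsing after repeated partials on the same token") can be checked with a small model. The sketch below is not Expat's implementation: the function names `reparse_cost_naive`, `reparse_cost_doubling` and `check_commit_example` are hypothetical, and the model assumes a single token of T bytes, refills of exactly X bytes, and a starting threshold of X. It only counts how many bytes get scanned in total under each strategy.

```c
#include <assert.h>

/* Simplified model, NOT Expat code. One unfinished token of `token` bytes
   arrives in refills of `chunk` bytes (chunk must be > 0). Every refill
   triggers a parse attempt that restarts at the beginning of the token.
   Returns the total number of bytes scanned over all attempts. */
static unsigned long long
reparse_cost_naive(unsigned long long token, unsigned long long chunk) {
  unsigned long long buffered = 0, scanned = 0;
  for (;;) {
    buffered += chunk;  /* client refills the buffer */
    scanned += buffered; /* attempt scans everything buffered so far */
    if (buffered >= token)
      return scanned;   /* token finally complete */
  }
}

/* Same model, but an attempt is only made once the buffered amount reaches
   a threshold, and the threshold doubles after each attempt that stops on
   the incomplete token. The starting threshold (one chunk) is an assumption
   of this sketch. */
static unsigned long long
reparse_cost_doubling(unsigned long long token, unsigned long long chunk) {
  unsigned long long buffered = 0, scanned = 0, threshold = chunk;
  for (;;) {
    buffered += chunk;
    if (buffered < threshold)
      continue;         /* heuristic: not enough data, skip parsing */
    scanned += buffered;
    if (buffered >= token)
      return scanned;   /* token finally complete */
    threshold *= 2;     /* stopped on a partial token: double the threshold */
  }
}

/* Reproduces the figures quoted in the commit message for T = 8X. */
static void
check_commit_example(void) {
  assert(reparse_cost_naive(8, 1) == 36);    /* (m^2+m)X/2 with m = 8 */
  assert(reparse_cost_doubling(8, 1) == 15); /* 2T - X = 15X */
}
```

Under these assumptions, a 1 MiB token fed in 4096-byte refills costs 134742016 scanned bytes in the first model and 2093056 (that is, 2T-X) in the second. Real-world speedups, such as the ones in the benchmark output quoted in the commit, depend on factors this model ignores.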

4087 commits