Commit graph

4140 commits

Author SHA1 Message Date
Sebastian Pipping
dfe043fe6a Bump version to 2.6.1 2024-02-28 23:41:31 +01:00
Sebastian Pipping
fbe7b9345b Bump version info from 10:0:9 to 10:1:9
See https://verbump.de/ for what these numbers do
2024-02-28 23:41:31 +01:00
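As a rough illustration of what such a triple encodes: libtool reads it as CURRENT:REVISION:AGE, and on Linux the shared library file suffix is conventionally (CURRENT - AGE).AGE.REVISION. The sketch below is not part of the commit, and the helper name `linux_so_suffix` is hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of Expat or libtool.
   Writes "(current - age).age.revision" into out, which is the suffix
   libtool conventionally gives a shared library on Linux.
   Returns 0 on success, -1 on invalid input or a too-small buffer. */
int
linux_so_suffix(char *out, size_t out_size, unsigned current,
                unsigned revision, unsigned age) {
  if (out == NULL || out_size == 0 || age > current)
    return -1;
  int written
      = snprintf(out, out_size, "%u.%u.%u", current - age, age, revision);
  return (written < 0 || (size_t)written >= out_size) ? -1 : 0;
}
```

Under that convention, 10:1:9 maps to the suffix 1.9.1 and the previous 10:0:9 maps to 1.9.0.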
Sebastian Pipping
3dc137ea05 Changes: Document changes in release Expat 2.6.1 2024-02-28 23:41:29 +01:00
Sebastian Pipping
ea52834709 doc/reference.html: Drop inaccurate statement about XML_* macros
The statement is falsified by these macros:
- XML_ATTR_INFO
- XML_DTD
- XML_GE
2024-02-28 20:47:45 +01:00
Sebastian Pipping
1e028f2ef7 lib/expat.h: Expose billion laughs API for XML_DTD without XML_GE
Regression from commit caa2719863.
2024-02-28 20:47:45 +01:00
Sebastian Pipping
a387201ca4
Merge pull request #833 from libexpat/configure-ac-protect-multilib
`configure.ac`: Protect against `expat_config.h.in` defining `SIZEOF_VOID_P`
2024-02-28 00:55:34 +01:00
Sebastian Pipping
0106682ea6 configure.ac: Protect against expat_config.h.in defining SIZEOF_VOID_P 2024-02-27 00:33:53 +01:00
Sebastian Pipping
9dcb74f552
Merge pull request #829 from libexpat/hide-test-only-code-behind-new-macro
Hide test-only code behind new (internal) macro `XML_TESTING` (alternative to #826)
2024-02-26 21:41:30 +01:00
Sebastian Pipping
7e2a0da9ba lib: Hide some test-only code behind new macro XML_TESTING 2024-02-21 13:07:35 +01:00
Sebastian Pipping
a4a420eedc Autotools: Turn libexpatinternal.la into standalone library
.. so that we can now have code in, say, xmlparse.c that does not
end up in libexpat.so but still runs when executing the test suite.
2024-02-21 12:53:03 +01:00
Sebastian Pipping
5b940f4a65
Merge pull request #824 from libexpat/issue-821-improve-make-clean-for-configure-without-docbook
Autotools: Re-work handling of xmlwf.1 (fixes #821)
2024-02-20 20:40:41 +01:00
Sebastian Pipping
0f6b39d2f5 Autotools: Re-work handling of xmlwf.1
File "doc/xmlwf.1" should not be cleaned when building with
"./configure --without-docbook", and re-compilation of the file
should take precedence over a pre-built copy where available.

Also, variable CLEANFILES can be used to simplify things a bit
in Makefile.am.
2024-02-13 20:12:15 +01:00
Sebastian Pipping
b7e1a11011
Merge pull request #817 from SonyMobile/clockless-test
tests: Replace clock counting with counting scanned bytes
2024-02-13 18:30:35 +01:00
Snild Dolkow
dc8499f295 tests: Replace clock counting with scanned bytes in linear-time test
This removes the dependency on CLOCKS_PER_SEC that prevented this test
from running properly on some platforms, as well as the inherent
flakiness of time measurements.

Since later commits have introduced g_bytesScanned (and before that,
g_parseAttempts), we can use that value as a proxy for parse time
instead of clock().
2024-02-13 14:05:44 +01:00
Snild Dolkow
fe0177cd3f tests: Replace g_parseAttempts with g_bytesScanned
This was used to estimate the number of scanned bytes. Just exposing
that number directly will be more precise.
2024-02-13 13:57:35 +01:00
Sebastian Pipping
4ff4c544aa
Merge pull request #820 from libexpat/dependabot/github_actions/actions/upload-artifact-4.3.1
Actions(deps): Bump actions/upload-artifact from 4.3.0 to 4.3.1
2024-02-12 14:52:18 +01:00
dependabot[bot]
aed1ed769d
Actions(deps): Bump actions/upload-artifact from 4.3.0 to 4.3.1
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4.3.0 to 4.3.1.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](26f96dfa69...5d5d22a312)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-02-12 12:09:58 +00:00
Sebastian Pipping
226201d10d
Merge pull request #819 from th1722/patch-1
Fix compiler warnings
2024-02-11 16:45:16 +01:00
Taichi Haradaguchi
3f60a47cb5 Fix compiler warnings
> In file included from ./../lib/internal.h:149,
>                  from codepage.c:38:
> ./../lib/expat.h:1045:5: warning: "XML_GE" is not defined, evaluates to 0 [-Wundef]
>  1045 | #if XML_GE == 1
>       |     ^~~~~~
> ./../lib/internal.h:158:5: warning: "XML_GE" is not defined, evaluates to 0 [-Wundef]
>   158 | #if XML_GE == 1
>       |     ^~~~~~
2024-02-10 23:08:03 +09:00
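The warning fires because an undefined macro inside an `#if` expression silently evaluates to 0. One common way to keep `-Wundef` quiet is to test `defined(...)` first. The sketch below uses the hypothetical macro `SKETCH_GE` in place of `XML_GE` and is not necessarily the fix this commit applied.

```c
/* SKETCH_GE is a hypothetical stand-in for a feature macro such as XML_GE.
   Checking defined() first means the comparison is only evaluated when
   the macro actually exists, so -Wundef has nothing to report. */
#if defined(SKETCH_GE) && SKETCH_GE == 1
#  define SKETCH_GE_ENABLED 1
#else
#  define SKETCH_GE_ENABLED 0
#endif

/* Reports whether the (hypothetical) feature was enabled at build time. */
int
sketch_ge_enabled(void) {
  return SKETCH_GE_ENABLED;
}
```

Built without `-DSKETCH_GE=1`, the function returns 0 and the compiler emits no `-Wundef` warning.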
Sebastian Pipping
4033d6dc57
Merge pull request #818 from libexpat/fix-clang-format-ci
Get clang-format CI back in sync
2024-02-08 17:24:26 +01:00
clang-format 18.1.0
d4f958e345 Mass-apply clang-format 18.1.0 2024-02-08 15:21:53 +01:00
Sebastian Pipping
849da3e3fe
Merge pull request #776 from libexpat/issue-775-prepare-release
Prepare release 2.6.0 (part of #775, ETA is 2024-02-07)
2024-02-06 17:49:41 +01:00
Sebastian Pipping
2a10e173ab Sync file headers 2024-02-06 14:13:00 +01:00
Sebastian Pipping
92f10eb800 .mailmap: Add Joyce Brum and Owain Davies 2024-02-06 14:08:05 +01:00
Sebastian Pipping
b5ae2481b0 Set release date for 2.6.0 2024-02-06 14:08:05 +01:00
Sebastian Pipping
310a1977f4 Bump version to 2.6.0 2024-02-06 14:08:05 +01:00
Sebastian Pipping
b9fd465231 Bump version info from 9:10:8 to 10:0:9
See https://verbump.de/ for what these numbers do
2024-02-06 14:08:05 +01:00
Sebastian Pipping
ae06168b64 Changes: Document changes in release Expat 2.6.0 2024-02-06 14:08:05 +01:00
Sebastian Pipping
8198e4bfed
Merge pull request #815 from libexpat/fix-pkg-config-file-for-static-build-on-windows
pkg-config: Add missing `-DXML_STATIC` for Windows (alternative to #805)
2024-02-06 11:09:42 +01:00
Sebastian Pipping
9c16d1c5b4 pkg-config: Add missing -DXML_STATIC (for Windows)
This affects the output of command "pkg-config --cflags --static expat".
2024-02-06 00:17:30 +01:00
Sebastian Pipping
9944b71234
Merge pull request #813 from libexpat/issue-812-protect-against-closing-entities-out-of-order
Protect against closing entities out of order (fixes #812)
2024-02-06 00:16:23 +01:00
Sebastian Pipping
b6243248a9
Merge pull request #814 from libexpat/fix-make-check-for-arm64-freebsd
tests: Fix `CLOCKS_PER_SEC` guard for arm64 FreeBSD reality
2024-02-06 00:00:00 +01:00
Sebastian Pipping
aba268e2c0 tests/basic_tests.c: Fix CLOCKS_PER_SEC guard for arm64 FreeBSD reality
CLOCKS_PER_SEC turned out to be as small as 128 in practice
on machine cfarm240.cfarm.net.
2024-02-02 18:11:12 +01:00
Sebastian Pipping
127aa340d3
Merge pull request #809 from libexpat/clang-format-18
CI: Upgrade to clang-format 18
2024-01-31 01:49:59 +01:00
Sebastian Pipping
7352d3035b clang-*.yml: Fix accidental trailing whitespace 2024-01-30 22:58:48 +01:00
Sebastian Pipping
37d0184781 clang-format.yml: Bump to clang-format 18 2024-01-30 22:57:10 +01:00
clang-format 18.1.0
137a578087 Mass-apply clang-format 18.1.0 2024-01-30 22:57:09 +01:00
Sebastian Pipping
c594eedfa8 apply-clang-format.sh: Drop workaround for lib/siphash.h
Does not seem needed anymore (or running the script would
produce a diff).
2024-01-30 22:57:09 +01:00
Sebastian Pipping
5d2a438af2 apply-clang-format.sh: Use "git ls-files" rather than "find"
.. and reduce difference with sibling script apply-clang-tidy.sh.
2024-01-30 22:57:09 +01:00
Sebastian Pipping
34b598c5f5
Merge pull request #789 from SonyMobile/partial-token-perf
Speed up parsing of big tokens
2024-01-30 22:54:37 +01:00
Sebastian Pipping
bc7490a4a7 tests/misc_tests.c: Add regression test for closing entities out of order 2024-01-30 03:39:46 +01:00
Sebastian Pipping
c4208e7fd1 lib/xmlparse.c: Protect against closing entities out of order 2024-01-30 02:40:31 +01:00
Sebastian Pipping
d5b02e96ab xmlwf: Document argument "-q"
Rebased-and-adapted-by: Snild Dolkow <snild@sony.com>
2024-01-29 19:59:18 +01:00
Sebastian Pipping
09fdf998e7 xmlwf: Support disabling reparse deferral
Rebased-and-adapted-by: Snild Dolkow <snild@sony.com>
2024-01-29 19:59:18 +01:00
Snild Dolkow
8f8aaf5c8e tests: Check heuristic bypass with varying buffer fill sizes
The bypass works on the assumption that the application uses a
consistent fill size. Let's make some assertions about what should
happen when the application doesn't do that -- most importantly,
that parsing does happen eventually, and that the number of scanned
bytes doesn't explode.
2024-01-29 19:59:18 +01:00
Snild Dolkow
182bbc350e tests: Make it clear to clang-tidy that assert_true may not return
The key is to have __attribute__((noreturn)) somewhere that clang-tidy
can see it. In this case, this is the _fail() function, which is
conditionally called from the assert_true() macro.

This will ensure that clang-tidy doesn't complain about NULL values
that we've asserted against in tests.
2024-01-29 19:57:54 +01:00
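A minimal sketch of the pattern described in this commit message, assuming GCC/Clang attribute syntax. The names `sketch_fail`, `sketch_assert_true` and `checked_length` are hypothetical stand-ins, and `abort()` is used here only to keep the example self-contained; the real `_fail()` may leave the test differently.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the test suite's _fail(). The attribute
   tells compilers and analyzers that control never comes back. */
__attribute__((noreturn)) void
sketch_fail(const char *file, int line, const char *msg) {
  fprintf(stderr, "%s:%d: %s\n", file, line, msg);
  abort();
}

/* Hypothetical stand-in for assert_true(): calls the noreturn function
   only when the condition is false. */
#define sketch_assert_true(cond)                                   \
  do {                                                             \
    if (! (cond))                                                  \
      sketch_fail(__FILE__, __LINE__, "check failed: " #cond);     \
  } while (0)

/* After the assertion, an analyzer can assume that s is non-NULL. */
size_t
checked_length(const char *s) {
  sketch_assert_true(s != NULL);
  return strlen(s);
}
```

Because the failing branch ends in a function marked noreturn, the analyzer sees that `strlen(s)` is only reachable with a non-NULL `s`.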
Sebastian Pipping
2becc8a81d
Merge pull request #811 from libexpat/dependabot/github_actions/actions/upload-artifact-4.3.0
Actions(deps): Bump actions/upload-artifact from 4.2.0 to 4.3.0
2024-01-29 17:58:59 +01:00
Snild Dolkow
3d8141d26a Bypass partial token heuristic when nearing full buffer
...instead of only when approaching the maximum buffer size INT_MAX/2+1.

We'd like to give applications a chance to finish parsing a large token
before buffer reallocation, in case the reallocation fails.

By bypassing the reparse deferral heuristic when getting close to
filling the buffer, we give them this chance -- if the whole token is
present in the buffer, it will be parsed at that time.

This may come at the cost of some extra reparse attempts. For a token
of n bytes, these extra parses cause us to scan over a maximum of
2n bytes (... + n/8 + n/4 + n/2 + n). Therefore, parsing of big tokens
remains O(n) in regard to how many bytes we scan in attempts to parse. The
cost in reality is lower than that, since the reparses that happen due
to the bypass will affect m_partialTokenBytesBefore, delaying the next
ratio-based reparse. Furthermore, only the first token that "breaks
through" a buffer ceiling takes that extra reparse attempt; subsequent
large tokens will only bypass the heuristic if they manage to hit the
new buffer ceiling.

Note that this cost analysis depends on the assumption that Expat grows
its buffer by doubling it (or, more generally, grows it exponentially).
If this changes, the cost of this bypass may increase. Hopefully, this
would be caught by test_big_tokens_take_linear_time or the new test.

The bypass logic assumes that the application uses a consistent fill.
If the app increases its fill size, it may miss the bypass (and the
normal heuristic will apply). If the app decreases its fill size, the
bypass may be hit multiple times for the same buffer size. The very
worst case would be to always fill half of the remaining buffer space,
in which case parsing of a large n-byte token becomes O(n log n).

As an added bonus, the new test case should be faster than the old one,
since it doesn't have to go all the way to 1GiB to check the behavior.

Finally, this change necessitated a small modification to two existing
tests related to reparse deferral. These tests are testing the deferral
enabled setting, and assume that reparsing will not happen for any other
reason. By pre-growing the buffer, we make sure that this new deferral
does not affect those test cases.
2024-01-29 17:09:36 +01:00
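The 2n bound in the cost analysis above can be checked with a small model. This is a simplified sketch, not Expat code: it assumes the buffer doubles when full and that each buffer ceiling below the token size triggers exactly one extra reparse attempt that scans a full buffer. The function name `extra_scanned_bytes` is hypothetical.

```c
#include <stddef.h>

/* Hypothetical model, not Expat code. A token of token_size bytes arrives
   while the buffer starts at initial_capacity bytes and doubles whenever
   it is full. Every ceiling smaller than the token costs one extra
   reparse attempt that scans the whole buffer. Returns the total number
   of bytes scanned by those extra attempts. */
size_t
extra_scanned_bytes(size_t token_size, size_t initial_capacity) {
  size_t scanned = 0;
  size_t capacity = initial_capacity;
  if (capacity == 0)
    return 0; /* a zero-sized buffer would never grow by doubling */
  while (capacity < token_size) {
    scanned += capacity; /* bypass fires at this ceiling: scan it all */
    capacity *= 2;       /* exponential growth keeps the sum geometric */
  }
  return scanned;
}
```

For a 1,000,000-byte token and a 1024-byte initial buffer, the ceilings hit are 1024, 2048, ..., 524288, which adds up to 1,047,552 extra scanned bytes and stays below 2n.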
Snild Dolkow
60b7420989 Bypass partial token heuristic when close to maximum buffer size
For huge tokens, we may end up in a situation where the partial token
parse deferral heuristic demands more bytes than Expat's maximum buffer
size (currently ~half of INT_MAX) could fit.

INT_MAX/2 is 1024 MiB on most systems. Clearly, a token of 950 MiB could
fit in that buffer, but the reparse threshold might be such that
callProcessor() will defer it, allowing the app to keep filling the
buffer until XML_GetBuffer() eventually returns a memory error.

By bypassing the heuristic when we're getting close to the maximum
buffer size, it will once again be possible to parse tokens in the size
range INT_MAX/2/ratio < size < INT_MAX/2 reliably.

We subtract the last buffer fill size as a way to detect that the next
XML_GetBuffer() call has a risk of returning a memory error -- assuming
that the application is likely to keep using the same (or smaller) fill.

We subtract XML_CONTEXT_BYTES because that's the maximum amount of bytes
that could remain at the start of the buffer, preceding the partial
token. Technically, it could be fewer bytes, but XML_CONTEXT_BYTES is
normally small relative to INT_MAX, and is much simpler to use.

Co-authored-by: Sebastian Pipping <sebastian@pipping.org>
2024-01-29 17:09:36 +01:00
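The threshold arithmetic described above can be sketched as follows. This is an illustration under stated assumptions, not the code from the commit: all names are hypothetical, `SKETCH_CONTEXT_BYTES` stands in for `XML_CONTEXT_BYTES` with an assumed value of 1024, and the deferral ratio of 2 is likewise an assumption.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* All names below are hypothetical stand-ins, not Expat identifiers. */
#define SKETCH_MAX_BUFFER ((size_t)INT_MAX / 2 + 1) /* ~half of INT_MAX */
#define SKETCH_CONTEXT_BYTES ((size_t)1024) /* assumed XML_CONTEXT_BYTES */
#define SKETCH_RATIO ((size_t)2)            /* assumed deferral ratio */

/* Returns true when a parse attempt should go ahead: either the buffered
   partial token grew by the assumed ratio since the last attempt, or one
   more fill of the same size could push the buffer past its maximum. */
bool
sketch_should_parse(size_t have_now, size_t had_before, size_t last_fill) {
  size_t near_max = SKETCH_MAX_BUFFER - SKETCH_CONTEXT_BYTES;
  /* Subtract the last fill size, clamping at zero to avoid wrap-around. */
  near_max -= (last_fill < near_max) ? last_fill : near_max;
  const bool grew_enough = have_now >= SKETCH_RATIO * had_before;
  return grew_enough || have_now > near_max;
}
```

With these assumptions and a 32-bit `int`, a 950 MiB partial token whose previous parse attempt saw 600 MiB is still deferred when the application fills 1 MiB at a time, but is parsed when it fills 128 MiB at a time, because the next fill could then exceed the maximum buffer size.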
Snild Dolkow
ad9c01be8e Make external entity parser inherit partial token heuristic setting
The test is essentially a copy of the existing test for the setter,
adapted to run on the external parser instead of the original one.

Suggested-by: Sebastian Pipping <sebastian@pipping.org>
CI-fighting-assistance-by: Sebastian Pipping <sebastian@pipping.org>
2024-01-29 17:09:36 +01:00