Commit graph

902 commits

Author SHA1 Message Date
Sebastian Pipping
fe04a7f0ff lib/xmlparse.c: Address clang-tidy warning misc-no-recursion 2025-03-30 20:54:46 +02:00
Sebastian Pipping
004e55a7e1 lib/xmlparse.c: Make function getRootParserOf available to XML_GE != 1 2025-03-30 20:54:46 +02:00
Sebastian Pipping
25ec4f1b29 lib/xmlparse.c: Address clang-tidy warning bugprone-narrowing-conversions
The symptom was:
> [..]/expat/lib/xmlparse.c:826:9: error: narrowing conversion from 'ssize_t' (aka 'long') to signed type 'int' is implementation-defined [bugprone-narrowing-conversions,-warnings-as-errors]
>   826 |         getrandom(currentTarget, bytesToWrite, getrandomFlags);
>       |         ^
> [..]/expat/lib/xmlparse.c:2765:19: error: narrowing conversion from 'unsigned long' to signed type 'int' is implementation-defined [bugprone-narrowing-conversions,-warnings-as-errors]
>  2765 |     int nameLen = sizeof(XML_Char) * (tag->name.strLen + 1);
>       |                   ^
> [..]/expat/lib/xmlparse.c:3734:16: error: narrowing conversion from 'unsigned int' to signed type 'int' is implementation-defined [bugprone-narrowing-conversions,-warnings-as-errors]
>  3734 |       for (j = nsAttsSize; j != 0;)
>       |                ^
> [..]/expat/lib/xmlparse.c:3800:15: error: narrowing conversion from 'unsigned long' to signed type 'int' is implementation-defined [bugprone-narrowing-conversions,-warnings-as-errors]
>  3800 |           j = uriHash & mask; /* index into hash table */
>       |               ^
> [..]/expat/lib/xmlparse.c:3814:30: error: narrowing conversion from 'unsigned int' to signed type 'int' is implementation-defined [bugprone-narrowing-conversions,-warnings-as-errors]
>  3814 |             j < step ? (j += nsAttsSize - step) : (j -= step);
>       |                              ^
> [..]/expat/lib/xmlparse.c:6309:13: error: narrowing conversion from 'int' to signed type 'char' is implementation-defined [bugprone-narrowing-conversions,-warnings-as-errors]
>  6309 |             parser->m_prologState.documentEntity &&
>       |             ^
> [..]/expat/lib/xmlparse.c:6314:27: error: narrowing conversion from 'int' to signed type 'char' is implementation-defined [bugprone-narrowing-conversions,-warnings-as-errors]
>  6314 |         checkEntityDecl = ! dtd->hasParamEntityRefs || dtd->standalone;
>       |                           ^
> [..]/expat/lib/xmlparse.c:7897:10: error: narrowing conversion from 'unsigned int' to signed type 'int' is implementation-defined [bugprone-narrowing-conversions,-warnings-as-errors]
>  7897 |   next = dtd->scaffCount++;
>       |          ^
> [..]/expat/lib/xmlparse.c:8096:16: error: narrowing conversion from 'XmlBigCount' (aka 'unsigned long long') to 'float' [bugprone-narrowing-conversions,-warnings-as-errors]
>  8096 |             ? (countBytesOutput
>       |                ^
> [..]/expat/lib/xmlparse.c:8098:16: error: narrowing conversion from 'XmlBigCount' (aka 'unsigned long long') to 'float' [bugprone-narrowing-conversions,-warnings-as-errors]
>  8098 |             : ((lenOfShortestInclude
>       |                ^
2025-03-30 18:52:09 +02:00
Sebastian Pipping
2b2a24691a Sync file headers 2025-03-27 17:45:25 +01:00
Sebastian Pipping
a6497c3004 Bump version to 2.7.1 2025-03-27 17:45:25 +01:00
Berkay Eren Ürün
bcf353990c Make parser->m_eventPtr handling clearer 2025-03-25 18:30:47 +01:00
Berkay Eren Ürün
89a9c6807c Stop updating m_eventPtr on exit for reentry
The fix for recursive entity processing introduced a reenter flag that
returns the execution from the current function and switches to entity
processing.

The same fix also updates the m_eventPtr during this switch. However
this update changes the behaviour in certain cases as the older version
does not update the m_eventPtr while recursing into entity processing.

This commit removes the pointer update and restores the old behaviour.
2025-03-25 18:30:47 +01:00
Sebastian Pipping
ea3b852acf lib/xmlparse.c: Address warning from frama-c 30.0
The symptom was:
> [variadic:typing] lib/xmlparse.c:8242: Warning:
>   Incorrect type for argument 7. The argument will be cast from unsigned int to int.
2025-03-21 23:02:05 +01:00
Sebastian Pipping
41fcb44549 lib/internal.h: Fix printf format specifiers for 32bit mode Emscripten
When compiling with Emscripten 3.1.6, the symptom was:
> [..]
> /usr/bin/emcc  @CMakeFiles/expat.dir/includes_C.rsp -fno-strict-aliasing -fvisibility=hidden -std=c99 -MD -MT CMakeFiles/expat.dir/lib/xmlparse.c.o -MF CMakeFiles/expat.dir/lib/xmlparse.c.o.d -o CMakeFiles/expat.dir/lib/xmlparse.c.o -c /libexpat/expat/lib/xmlparse.c
> /libexpat/expat/lib/xmlparse.c:8132:11: warning: format specifies type 'int' but the argument has type 'ptrdiff_t' (aka 'long') [-Wformat]
>           bytesMore, (account == XML_ACCOUNT_DIRECT) ? "DIR" : "EXP",
>           ^~~~~~~~~
> 1 warning generated.
> [..]
> /usr/bin/emcc -DXML_TESTING @CMakeFiles/runtests.dir/includes_C.rsp -fno-strict-aliasing -fvisibility=hidden -std=c99 -MD -MT CMakeFiles/runtests.dir/tests/acc_tests.c.o -MF CMakeFiles/runtests.dir/tests/acc_tests.c.o.d -o CMakeFiles/runtests.dir/tests/acc_tests.c.o -c /libexpat/expat/tests/acc_tests.c
> /libexpat/expat/tests/acc_tests.c:279:11: warning: format specifies type 'unsigned int' but the argument has type 'unsigned long' [-Wformat]
>           u + 1, countCases, expectedCountBytesDirect, actualCountBytesDirect);
>           ^~~~~
> /libexpat/expat/tests/acc_tests.c:279:18: warning: format specifies type 'unsigned int' but the argument has type 'size_t' (aka 'unsigned long') [-Wformat]
>           u + 1, countCases, expectedCountBytesDirect, actualCountBytesDirect);
>                  ^~~~~~~~~~
> /libexpat/expat/tests/acc_tests.c:288:11: warning: format specifies type 'unsigned int' but the argument has type 'unsigned long' [-Wformat]
>           u + 1, countCases, expectedCountBytesIndirect,
>           ^~~~~
> /libexpat/expat/tests/acc_tests.c:288:18: warning: format specifies type 'unsigned int' but the argument has type 'size_t' (aka 'unsigned long') [-Wformat]
>           u + 1, countCases, expectedCountBytesIndirect,
>                  ^~~~~~~~~~
> 4 warnings generated.
> [..]
> /usr/bin/emcc -DXML_TESTING @CMakeFiles/runtests.dir/includes_C.rsp -fno-strict-aliasing -fvisibility=hidden -std=c99 -MD -MT CMakeFiles/runtests.dir/lib/xmlparse.c.o -MF CMakeFiles/runtests.dir/lib/xmlparse.c.o.d -o CMakeFiles/runtests.dir/lib/xmlparse.c.o -c /libexpat/expat/lib/xmlparse.c
> /libexpat/expat/lib/xmlparse.c:8132:11: warning: format specifies type 'int' but the argument has type 'ptrdiff_t' (aka 'long') [-Wformat]
>           bytesMore, (account == XML_ACCOUNT_DIRECT) ? "DIR" : "EXP",
>           ^~~~~~~~~
> 1 warning generated.
> [..]
> /usr/bin/em++ -DXML_TESTING @CMakeFiles/runtests_cxx.dir/includes_CXX.rsp -fno-strict-aliasing -fvisibility=hidden -std=c++11 -MD -MT CMakeFiles/runtests_cxx.dir/tests/acc_tests_cxx.cpp.o -MF CMakeFiles/runtests_cxx.dir/tests/acc_tests_cxx.cpp.o.d -o CMakeFiles/runtests_cxx.dir/tests/acc_tests_cxx.cpp.o -c /libexpat/expat/tests/acc_tests_cxx.cpp
> In file included from /libexpat/expat/tests/acc_tests_cxx.cpp:32:
> /libexpat/expat/tests/acc_tests.c:279:11: warning: format specifies type 'unsigned int' but the argument has type 'unsigned long' [-Wformat]
>           u + 1, countCases, expectedCountBytesDirect, actualCountBytesDirect);
>           ^~~~~
> /libexpat/expat/tests/acc_tests.c:279:18: warning: format specifies type 'unsigned int' but the argument has type 'size_t' (aka 'unsigned long') [-Wformat]
>           u + 1, countCases, expectedCountBytesDirect, actualCountBytesDirect);
>                  ^~~~~~~~~~
> /libexpat/expat/tests/acc_tests.c:288:11: warning: format specifies type 'unsigned int' but the argument has type 'unsigned long' [-Wformat]
>           u + 1, countCases, expectedCountBytesIndirect,
>           ^~~~~
> /libexpat/expat/tests/acc_tests.c:288:18: warning: format specifies type 'unsigned int' but the argument has type 'size_t' (aka 'unsigned long') [-Wformat]
>           u + 1, countCases, expectedCountBytesIndirect,
>                  ^~~~~~~~~~
> 4 warnings generated.
> [..]
> /usr/bin/emcc -DXML_TESTING @CMakeFiles/runtests_cxx.dir/includes_C.rsp -fno-strict-aliasing -fvisibility=hidden -std=c99 -MD -MT CMakeFiles/runtests_cxx.dir/lib/xmlparse.c.o -MF CMakeFiles/runtests_cxx.dir/lib/xmlparse.c.o.d -o CMakeFiles/runtests_cxx.dir/lib/xmlparse.c.o -c /libexpat/expat/lib/xmlparse.c
> /libexpat/expat/lib/xmlparse.c:8132:11: warning: format specifies type 'int' but the argument has type 'ptrdiff_t' (aka 'long') [-Wformat]
>           bytesMore, (account == XML_ACCOUNT_DIRECT) ? "DIR" : "EXP",
>           ^~~~~~~~~
> 1 warning generated.
2025-03-17 03:49:10 +01:00
Sebastian Pipping
2ae4c8afd3 Sync file headers 2025-03-13 20:41:33 +01:00
Sebastian Pipping
0a6cbff62c Bump version to 2.7.0 2025-03-13 20:41:33 +01:00
Sebastian Pipping
bbd413a808 Sync file headers 2025-03-13 14:01:31 +01:00
Berkay Eren Ürün
f2edeaaece Delete the check that prevents reentry
The early return in case of zero open internal entities and matching
end/nextPtr pointers cause the parser to miss XML_ERROR_NO_ELEMENTS
error.

The reason is that the internalEntityProcessor does not set the
m_reenter flag in such a case, which results in skipping the
prologProcessor or contentProcessor depending on wheter is_param is set
or not. However, this last skipped call to mentioned processors can
detect the non-existence of elements when some are expected.
2025-03-13 14:01:31 +01:00
Berkay Eren Ürün
7b9758517b Remove unnecessary triggerReenter calls
callStoreEntityValue and storeAttributeValue call triggerReenter just
before continuing with their main loop. This call does not have any
use for the these functions as the continuity of their loop is already
achieved by the continue key word.

Only side effect these triggerReenter calls bring is that they cause a
return to the the callProcessor, only to reenter to the same point again,
wasting some time.

This commit removes these unnecessary calls.
2025-03-13 14:01:31 +01:00
Berkay Eren Ürün
c25f0cef93 Handle unreachable return locations
At some points we check m_reenter flag for return. However this flag can
never be true at these points. Therefore body of the check is never
executed. This commit excludes the body from test coverage, removes the
nextPtr update (since we faced an error, no need to update it) and
lastly makes them return XML_ERROR_UNEXPECTED_STATE as a safety net.
2025-03-13 14:01:31 +01:00
Berkay Eren Ürün
e41a398e3f Remove unnecessary checks before entity removal
After the commit "Fix entity debug order", the interaction with the open
entity lists has been changed.

Before the commit, during processing of an entity, if an inner entity was
found, it was pushed to the head of the list. This made the entity we were
processing the second in the list. So, in order the remove this entity,
we either remove the head or the second element, depending on if an inner
entity is found during processing.

After the commit, since we delay the removal of entities until their
inner entity references are resolved, the entity we want to remove will
always be on the head of the list. Thus the removal of the check.
2025-03-13 14:01:31 +01:00
Berkay Eren Ürün
92f66bb50e Fix up on commit "Add next pointer to appendAttributeValue"
Remove unrequired nextPtr assignments
2025-03-13 14:01:31 +01:00
Berkay Eren Ürün
495fb53b16 Fix infinite loop with indirectly recursive entities
Detection of recursive entity references are currently failing because
we process and close entities before their inner references are
processed. Since the detection works by checking wheter the referenced
entity is already open, this early close leads to wrong results. This
commit delays closing entities until their inner entities are processed
and closed. This is achieved by postponing the unsetting of the open
flag and using a new hasMore flag to check if the entity has more
elements to process.
2025-03-13 14:01:31 +01:00
Berkay Eren Ürün
c20ce3aaa3 Fix entity debug order
The fix for entity processing changes the order of opening and closing
of entities. The reason is the new iterative approach that closes
and removes entities as soon as they are fully processed, unlike
the recursive approach that, as a side effect of the recursion, waits
the inner entities before closing and removal.

This commit delays the removal and the call to entityTrackingOnClose
until the current entities inner entities are processed, which in turn
allows us to have the correct debug order again.
2025-03-13 14:01:31 +01:00
Berkay Eren Ürün
16a3b9d356 Merge entity processors 2025-03-13 14:01:31 +01:00
Berkay Eren Ürün
fabae41d28 Fix storeEntityValue recursion 2025-03-13 14:01:31 +01:00
Berkay Eren Ürün
bf97ac5081 Add next pointer to storeEntityValue
This commit introduces a new nextPtr parameter to storeEntityValue.
After finishing its execution, storeEntityValue function sets this
parameter in way that it points to the next token to process.

This is useful when we want to leave and reenter storeEntityValue during
its execution since nextPtr will point where we left.

This commit is base to the following commit.
2025-03-13 14:01:31 +01:00
Berkay Eren Ürün
da10ca2328 Break cyclic appendAttributeValue recursion
During processing attributes with entity references,
appendAttributeValue can reach high recursion depths that can lead to
a crash.

This commit switches the processing to an iterative approach similar to
the fix for internal entity processing. A new m_openAttributeEntities
list is introduced to keep track of entity references that need
processing. When a new entity reference is detected, instead of calling
appendAttributeValue recursively, the entity will be added to open
entities list and the execution will return to storeAttributeValue,
where newly added entity will be handled. After the entity processing
is done, appendAttributeValue will be called by using the next token.
2025-03-13 14:01:31 +01:00
Berkay Eren Ürün
74308916d9 Add next pointer to appendAttributeValue
This commits extends appendAttributeValue by introducing a new parameter
that will be set to the next token to process.

Having such a parameter allows us to reenter the function after an exit
and continue from the last token pointed by the pointer.
2025-03-13 14:01:31 +01:00
Berkay Eren Ürün
a910fbc0e1 Fix internal entity processing
Co-authored-by: Jann Horn <jannh@google.com>

This avoids unbounded recursion in internal entity processing
2025-03-13 14:01:31 +01:00
Berkay Eren Ürün
6edca2c37e Switch allowClosingDoctype
This change of allowClosingDoctype has no effect and only serves as a
preparation for the upcoming changes.
2025-03-13 14:01:31 +01:00
Berkay Eren Ürün
5e16cd6d07 Introduce reenter flag
Co-authored-by: Jann Horn <jannh@google.com>

Add a new reenter flag. This flag acts like XML_SUSPENDED,
except that instead of returning out of the library, we
only return back to the main parse function, then re-enter
the processor function.
2025-03-13 14:01:31 +01:00
Jann Horn
dd982e3950 Refactor guards against busy parser reconfiguration
Rebased-and-adapted-by: Berkay Eren Ürün <berkay.ueruen@siemens.com>
2025-03-13 14:01:31 +01:00
Sebastian Pipping
03966546fa lib: Mark dead code in unsignedCharToPrintable as dead to LCOV 2024-12-09 23:33:57 +01:00
Sebastian Pipping
750c985f11 Sync file headers 2024-11-06 17:42:20 +01:00
Sebastian Pipping
c15ac3b307 Bump version to 2.6.4 2024-11-06 17:42:20 +01:00
Sebastian Pipping
ef485e96a6
Merge pull request #915 from libexpat/stop-resumeparser-from-crashing
[CVE-2024-50602] Stop `XML_ResumeParser` from crashing
2024-10-28 15:14:02 +01:00
Sebastian Pipping
5fb89e7b3a lib: Be explicit about XML_PARSING in XML_StopParser 2024-10-21 18:27:46 +02:00
Sebastian Pipping
51c7019069 lib: Make XML_StopParser refuse to stop/suspend an unstarted parser 2024-10-21 18:27:46 +02:00
Hanno Böck
424dd12400 Fix signedness of format strings
Format strings used %d to print variables with unsigned values.
Changing to %u to match signedness.
Fixes "clang -Wformat-signedness" warnings.
2024-10-20 08:45:59 +02:00
Sebastian Pipping
f9cfbb7fce Sync file headers 2024-09-03 18:19:25 +02:00
Sebastian Pipping
8707e02e1f Bump version to 2.6.3 2024-09-03 18:19:25 +02:00
Sebastian Pipping
29ef43a0ba
Merge pull request #892 from libexpat/taiyou-nextscaffoldpart-overflow
[CVE-2024-45492] lib: Detect integer overflow in function `nextScaffoldPart` (fixes #889)
2024-09-03 18:18:03 +02:00
Sebastian Pipping
b8a7dca467
Merge pull request #891 from libexpat/taiyou-dtdcopy-malloc-overflow
[CVE-2024-45491] lib: Detect integer overflow in `dtdCopy` (fixes #888)
2024-09-03 18:17:46 +02:00
Sebastian Pipping
e5d6bf015e
Merge pull request #890 from libexpat/taiyou-xml-parsebuffer-len
[CVE-2024-45490] lib: Reject negative len for `XML_ParseBuffer` (fixes #887)
2024-09-03 18:17:32 +02:00
Sebastian Pipping
8e439a9947 lib: Detect integer overflow in dtdCopy
Reported by TaiYou
2024-08-26 22:35:54 +02:00
Sebastian Pipping
5c1a31642e lib: Reject negative len for XML_ParseBuffer
Reported by TaiYou
2024-08-26 22:25:19 +02:00
Sebastian Pipping
9bf0f2c16e lib: Detect integer overflow in function nextScaffoldPart
Reported by TaiYou
2024-08-26 22:25:15 +02:00
Berkay Eren Ürün
c158a62e57 Remove XML_DTD guards before is_param accesses
As a part of the ENTITY struct, is_param is correctly initialized even
when XML_DTD is not defined. This can be seen in the 'lookup' function,
which sets all the ENTITY memory, including the is_param flag, to zero
during the ENTITY creation. Additionally, is_param can only be assigned
XML_TRUE when XML_DTD is defined, which makes XML_DTD checks before
is_param accesses not necessary.

Currently, some of the is_param accesses are guarded by the XML_DTD and
some not. This commit removes all XML_DTD guards that are meant for
is_param accesses.
2024-08-22 13:38:04 +02:00
Sebastian Pipping
35753a8ccc lib: Fix typo in a code comment 2024-08-20 21:16:57 +02:00
Sebastian Pipping
b0e673830e lib/siphash.h: Apply clang-format 18.1.5 2024-05-13 22:00:56 +02:00
Sebastian Pipping
13cff445fa Bump version to 2.6.2 2024-03-13 14:37:05 +01:00
Sebastian Pipping
5026213864
Merge pull request #842 from libexpat/issue-839-billion-laughs-isolated-external-parser
Prevent billion laughs attacks in isolated external parser (part of #839)
2024-03-07 22:14:09 +01:00
Sebastian Pipping
1d50b80cf3 lib/xmlparse.c: Detect billion laughs attack with isolated external parser
When parsing DTD content with code like ..

  XML_Parser parser = XML_ParserCreate(NULL);
  XML_Parser ext_parser = XML_ExternalEntityParserCreate(parser, NULL, NULL);
  enum XML_Status status = XML_Parse(ext_parser, doc, (int)strlen(doc), XML_TRUE);

.. there are 0 bytes accounted as direct input and all input from `doc` accounted
as indirect input.  Now function accountingGetCurrentAmplification cannot calculate
the current amplification ratio as "(direct + indirect) / direct", and it did refuse
to divide by 0 as one would expect, but it returned 1.0 for this case to indicate
no amplification over direct input.  As a result, billion laughs attacks from
DTD-only input were not detected with this isolated way of using an external parser.

The new approach is to assume direct input of length not 0 but 22 -- derived from
ghost input "<!ENTITY a SYSTEM 'b'>", the shortest possible way to include an external
DTD --, and do the usual "(direct + indirect) / direct" math with "direct := 22".

GitHub issue #839 has more details on this issue and its origin in ClusterFuzz
finding 66812.
2024-03-06 23:41:07 +01:00
Sebastian Pipping
a4c86a395e lib/xmlparse.c: Reject directly recursive parameter entities 2024-03-06 22:34:26 +01:00