The symptom was:
> [..]/expat/lib/xmlparse.c:826:9: error: narrowing conversion from 'ssize_t' (aka 'long') to signed type 'int' is implementation-defined [bugprone-narrowing-conversions,-warnings-as-errors]
> 826 | getrandom(currentTarget, bytesToWrite, getrandomFlags);
> | ^
> [..]/expat/lib/xmlparse.c:2765:19: error: narrowing conversion from 'unsigned long' to signed type 'int' is implementation-defined [bugprone-narrowing-conversions,-warnings-as-errors]
> 2765 | int nameLen = sizeof(XML_Char) * (tag->name.strLen + 1);
> | ^
> [..]/expat/lib/xmlparse.c:3734:16: error: narrowing conversion from 'unsigned int' to signed type 'int' is implementation-defined [bugprone-narrowing-conversions,-warnings-as-errors]
> 3734 | for (j = nsAttsSize; j != 0;)
> | ^
> [..]/expat/lib/xmlparse.c:3800:15: error: narrowing conversion from 'unsigned long' to signed type 'int' is implementation-defined [bugprone-narrowing-conversions,-warnings-as-errors]
> 3800 | j = uriHash & mask; /* index into hash table */
> | ^
> [..]/expat/lib/xmlparse.c:3814:30: error: narrowing conversion from 'unsigned int' to signed type 'int' is implementation-defined [bugprone-narrowing-conversions,-warnings-as-errors]
> 3814 | j < step ? (j += nsAttsSize - step) : (j -= step);
> | ^
> [..]/expat/lib/xmlparse.c:6309:13: error: narrowing conversion from 'int' to signed type 'char' is implementation-defined [bugprone-narrowing-conversions,-warnings-as-errors]
> 6309 | parser->m_prologState.documentEntity &&
> | ^
> [..]/expat/lib/xmlparse.c:6314:27: error: narrowing conversion from 'int' to signed type 'char' is implementation-defined [bugprone-narrowing-conversions,-warnings-as-errors]
> 6314 | checkEntityDecl = ! dtd->hasParamEntityRefs || dtd->standalone;
> | ^
> [..]/expat/lib/xmlparse.c:7897:10: error: narrowing conversion from 'unsigned int' to signed type 'int' is implementation-defined [bugprone-narrowing-conversions,-warnings-as-errors]
> 7897 | next = dtd->scaffCount++;
> | ^
> [..]/expat/lib/xmlparse.c:8096:16: error: narrowing conversion from 'XmlBigCount' (aka 'unsigned long long') to 'float' [bugprone-narrowing-conversions,-warnings-as-errors]
> 8096 | ? (countBytesOutput
> | ^
> [..]/expat/lib/xmlparse.c:8098:16: error: narrowing conversion from 'XmlBigCount' (aka 'unsigned long long') to 'float' [bugprone-narrowing-conversions,-warnings-as-errors]
> 8098 | : ((lenOfShortestInclude
> | ^
The fix for recursive entity processing introduced a reenter flag that
returns the execution from the current function and switches to entity
processing.
The same fix also updates the m_eventPtr during this switch. However
this update changes the behaviour in certain cases as the older version
does not update the m_eventPtr while recursing into entity processing.
This commit removes the pointer update and restores the old behaviour.
The symptom was:
> [variadic:typing] lib/xmlparse.c:8242: Warning:
> Incorrect type for argument 7. The argument will be cast from unsigned int to int.
When compiling with Emscripten 3.1.6, the symptom was:
> [..]
> /usr/bin/emcc @CMakeFiles/expat.dir/includes_C.rsp -fno-strict-aliasing -fvisibility=hidden -std=c99 -MD -MT CMakeFiles/expat.dir/lib/xmlparse.c.o -MF CMakeFiles/expat.dir/lib/xmlparse.c.o.d -o CMakeFiles/expat.dir/lib/xmlparse.c.o -c /libexpat/expat/lib/xmlparse.c
> /libexpat/expat/lib/xmlparse.c:8132:11: warning: format specifies type 'int' but the argument has type 'ptrdiff_t' (aka 'long') [-Wformat]
> bytesMore, (account == XML_ACCOUNT_DIRECT) ? "DIR" : "EXP",
> ^~~~~~~~~
> 1 warning generated.
> [..]
> /usr/bin/emcc -DXML_TESTING @CMakeFiles/runtests.dir/includes_C.rsp -fno-strict-aliasing -fvisibility=hidden -std=c99 -MD -MT CMakeFiles/runtests.dir/tests/acc_tests.c.o -MF CMakeFiles/runtests.dir/tests/acc_tests.c.o.d -o CMakeFiles/runtests.dir/tests/acc_tests.c.o -c /libexpat/expat/tests/acc_tests.c
> /libexpat/expat/tests/acc_tests.c:279:11: warning: format specifies type 'unsigned int' but the argument has type 'unsigned long' [-Wformat]
> u + 1, countCases, expectedCountBytesDirect, actualCountBytesDirect);
> ^~~~~
> /libexpat/expat/tests/acc_tests.c:279:18: warning: format specifies type 'unsigned int' but the argument has type 'size_t' (aka 'unsigned long') [-Wformat]
> u + 1, countCases, expectedCountBytesDirect, actualCountBytesDirect);
> ^~~~~~~~~~
> /libexpat/expat/tests/acc_tests.c:288:11: warning: format specifies type 'unsigned int' but the argument has type 'unsigned long' [-Wformat]
> u + 1, countCases, expectedCountBytesIndirect,
> ^~~~~
> /libexpat/expat/tests/acc_tests.c:288:18: warning: format specifies type 'unsigned int' but the argument has type 'size_t' (aka 'unsigned long') [-Wformat]
> u + 1, countCases, expectedCountBytesIndirect,
> ^~~~~~~~~~
> 4 warnings generated.
> [..]
> /usr/bin/emcc -DXML_TESTING @CMakeFiles/runtests.dir/includes_C.rsp -fno-strict-aliasing -fvisibility=hidden -std=c99 -MD -MT CMakeFiles/runtests.dir/lib/xmlparse.c.o -MF CMakeFiles/runtests.dir/lib/xmlparse.c.o.d -o CMakeFiles/runtests.dir/lib/xmlparse.c.o -c /libexpat/expat/lib/xmlparse.c
> /libexpat/expat/lib/xmlparse.c:8132:11: warning: format specifies type 'int' but the argument has type 'ptrdiff_t' (aka 'long') [-Wformat]
> bytesMore, (account == XML_ACCOUNT_DIRECT) ? "DIR" : "EXP",
> ^~~~~~~~~
> 1 warning generated.
> [..]
> /usr/bin/em++ -DXML_TESTING @CMakeFiles/runtests_cxx.dir/includes_CXX.rsp -fno-strict-aliasing -fvisibility=hidden -std=c++11 -MD -MT CMakeFiles/runtests_cxx.dir/tests/acc_tests_cxx.cpp.o -MF CMakeFiles/runtests_cxx.dir/tests/acc_tests_cxx.cpp.o.d -o CMakeFiles/runtests_cxx.dir/tests/acc_tests_cxx.cpp.o -c /libexpat/expat/tests/acc_tests_cxx.cpp
> In file included from /libexpat/expat/tests/acc_tests_cxx.cpp:32:
> /libexpat/expat/tests/acc_tests.c:279:11: warning: format specifies type 'unsigned int' but the argument has type 'unsigned long' [-Wformat]
> u + 1, countCases, expectedCountBytesDirect, actualCountBytesDirect);
> ^~~~~
> /libexpat/expat/tests/acc_tests.c:279:18: warning: format specifies type 'unsigned int' but the argument has type 'size_t' (aka 'unsigned long') [-Wformat]
> u + 1, countCases, expectedCountBytesDirect, actualCountBytesDirect);
> ^~~~~~~~~~
> /libexpat/expat/tests/acc_tests.c:288:11: warning: format specifies type 'unsigned int' but the argument has type 'unsigned long' [-Wformat]
> u + 1, countCases, expectedCountBytesIndirect,
> ^~~~~
> /libexpat/expat/tests/acc_tests.c:288:18: warning: format specifies type 'unsigned int' but the argument has type 'size_t' (aka 'unsigned long') [-Wformat]
> u + 1, countCases, expectedCountBytesIndirect,
> ^~~~~~~~~~
> 4 warnings generated.
> [..]
> /usr/bin/emcc -DXML_TESTING @CMakeFiles/runtests_cxx.dir/includes_C.rsp -fno-strict-aliasing -fvisibility=hidden -std=c99 -MD -MT CMakeFiles/runtests_cxx.dir/lib/xmlparse.c.o -MF CMakeFiles/runtests_cxx.dir/lib/xmlparse.c.o.d -o CMakeFiles/runtests_cxx.dir/lib/xmlparse.c.o -c /libexpat/expat/lib/xmlparse.c
> /libexpat/expat/lib/xmlparse.c:8132:11: warning: format specifies type 'int' but the argument has type 'ptrdiff_t' (aka 'long') [-Wformat]
> bytesMore, (account == XML_ACCOUNT_DIRECT) ? "DIR" : "EXP",
> ^~~~~~~~~
> 1 warning generated.
The early return in case of zero open internal entities and matching
end/nextPtr pointers cause the parser to miss XML_ERROR_NO_ELEMENTS
error.
The reason is that the internalEntityProcessor does not set the
m_reenter flag in such a case, which results in skipping the
prologProcessor or contentProcessor depending on wheter is_param is set
or not. However, this last skipped call to mentioned processors can
detect the non-existence of elements when some are expected.
callStoreEntityValue and storeAttributeValue call triggerReenter just
before continuing with their main loop. This call does not have any
use for the these functions as the continuity of their loop is already
achieved by the continue key word.
Only side effect these triggerReenter calls bring is that they cause a
return to the the callProcessor, only to reenter to the same point again,
wasting some time.
This commit removes these unnecessary calls.
At some points we check m_reenter flag for return. However this flag can
never be true at these points. Therefore body of the check is never
executed. This commit excludes the body from test coverage, removes the
nextPtr update (since we faced an error, no need to update it) and
lastly makes them return XML_ERROR_UNEXPECTED_STATE as a safety net.
After the commit "Fix entity debug order", the interaction with the open
entity lists has been changed.
Before the commit, during processing of an entity, if an inner entity was
found, it was pushed to the head of the list. This made the entity we were
processing the second in the list. So, in order the remove this entity,
we either remove the head or the second element, depending on if an inner
entity is found during processing.
After the commit, since we delay the removal of entities until their
inner entity references are resolved, the entity we want to remove will
always be on the head of the list. Thus the removal of the check.
Detection of recursive entity references are currently failing because
we process and close entities before their inner references are
processed. Since the detection works by checking wheter the referenced
entity is already open, this early close leads to wrong results. This
commit delays closing entities until their inner entities are processed
and closed. This is achieved by postponing the unsetting of the open
flag and using a new hasMore flag to check if the entity has more
elements to process.
The fix for entity processing changes the order of opening and closing
of entities. The reason is the new iterative approach that closes
and removes entities as soon as they are fully processed, unlike
the recursive approach that, as a side effect of the recursion, waits
the inner entities before closing and removal.
This commit delays the removal and the call to entityTrackingOnClose
until the current entities inner entities are processed, which in turn
allows us to have the correct debug order again.
This commit introduces a new nextPtr parameter to storeEntityValue.
After finishing its execution, storeEntityValue function sets this
parameter in way that it points to the next token to process.
This is useful when we want to leave and reenter storeEntityValue during
its execution since nextPtr will point where we left.
This commit is base to the following commit.
During processing attributes with entity references,
appendAttributeValue can reach high recursion depths that can lead to
a crash.
This commit switches the processing to an iterative approach similar to
the fix for internal entity processing. A new m_openAttributeEntities
list is introduced to keep track of entity references that need
processing. When a new entity reference is detected, instead of calling
appendAttributeValue recursively, the entity will be added to open
entities list and the execution will return to storeAttributeValue,
where newly added entity will be handled. After the entity processing
is done, appendAttributeValue will be called by using the next token.
This commits extends appendAttributeValue by introducing a new parameter
that will be set to the next token to process.
Having such a parameter allows us to reenter the function after an exit
and continue from the last token pointed by the pointer.
Co-authored-by: Jann Horn <jannh@google.com>
Add a new reenter flag. This flag acts like XML_SUSPENDED,
except that instead of returning out of the library, we
only return back to the main parse function, then re-enter
the processor function.
As a part of the ENTITY struct, is_param is correctly initialized even
when XML_DTD is not defined. This can be seen in the 'lookup' function,
which sets all the ENTITY memory, including the is_param flag, to zero
during the ENTITY creation. Additionally, is_param can only be assigned
XML_TRUE when XML_DTD is defined, which makes XML_DTD checks before
is_param accesses not necessary.
Currently, some of the is_param accesses are guarded by the XML_DTD and
some not. This commit removes all XML_DTD guards that are meant for
is_param accesses.
When parsing DTD content with code like ..
XML_Parser parser = XML_ParserCreate(NULL);
XML_Parser ext_parser = XML_ExternalEntityParserCreate(parser, NULL, NULL);
enum XML_Status status = XML_Parse(ext_parser, doc, (int)strlen(doc), XML_TRUE);
.. there are 0 bytes accounted as direct input and all input from `doc` accounted
as indirect input. Now function accountingGetCurrentAmplification cannot calculate
the current amplification ratio as "(direct + indirect) / direct", and it did refuse
to divide by 0 as one would expect, but it returned 1.0 for this case to indicate
no amplification over direct input. As a result, billion laughs attacks from
DTD-only input were not detected with this isolated way of using an external parser.
The new approach is to assume direct input of length not 0 but 22 -- derived from
ghost input "<!ENTITY a SYSTEM 'b'>", the shortest possible way to include an external
DTD --, and do the usual "(direct + indirect) / direct" math with "direct := 22".
GitHub issue #839 has more details on this issue and its origin in ClusterFuzz
finding 66812.