After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 719350 - Stack overflow in vfprintf on a very large fuzzed ods file
Stack overflow in vfprintf on a very large fuzzed ods file
Status: RESOLVED OBSOLETE
Product: libxml2
Classification: Platform
Component: general
2.7.8
Other Linux
: Normal normal
: ---
Assigned To: Daniel Veillard
libxml QA maintainers
: 749238 (view as bug list)
Depends on:
Blocks:
 
 
Reported: 2013-11-26 12:41 UTC by jutaky
Modified: 2021-07-05 13:27 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description jutaky 2013-11-26 12:41:32 UTC
Stack overflow in vfprintf on a very large fuzzed ods file

Stack overflow in vfprintf on a very large fuzzed ods file.

Git versions of glib, goffice, gnumeric, libgsf and libxml2.

Test case: http://jutaky.com/fuzzing/gnumeric_case_21934_5407.ods

==12789== Stack overflow in thread 1: can't grow stack to 0xffe801fd8
==12789== 
==12789== Process terminating with default action of signal 11 (SIGSEGV)
==12789==  Access not within mapped region at address 0xFFE801FD8
==12789==    at 0x686B69D: vfprintf (in /usr/lib/libc-2.18.so)

Backtrace is alternating between two entries:

  • #9064 xmlParseContent__internal_alias
    at parser.c line 9885
  • #9065 xmlParseElement__internal_alias
    at parser.c line 10058

The test case seems to be extremely large when unzipped.

--
Juha Kylmänen
Research Assistant, OUSPG
Comment 1 Morten Welinder 2013-11-26 13:22:39 UTC
I somehow cannot trigger this via Gnumeric, but I can trigger it with
xmllint.

    valgrind xmllint --huge content.xml

(where content.xml is the big file inside the ods archive).

--> libxml2
Comment 2 Gaurav 2014-01-22 10:03:28 UTC
Can you share sample code of your xml usage?
Comment 3 Morten Welinder 2014-01-22 14:33:43 UTC
Juha: do you still have them, or can you recreate?
I *might* have them at home, but not here.
Comment 4 Nick Wellnhofer 2017-06-05 20:47:20 UTC
Since libxml2 uses recursion to parse nested elements, this can probably be reproduced with a file containing a couple of thousand <a>s, like

<a><a><a><a><a><a><a><a><a><a><a><a><a><a><a><a><a><a>

Possible solutions:

- Use an iterative approach. We'd have to store the element's QNames in our own stack on the heap.
- Limit the nesting depth.
Comment 5 Morten Welinder 2017-06-05 21:20:46 UTC
The file is there right now.  I would attach content.xml here, but at
36M it's too big.

# valgrind xmllint --huge content.xml
[...]
==26855== Stack overflow in thread #1: can't grow stack to 0xffe801000
==26855== 
==26855== Process terminating with default action of signal 11 (SIGSEGV)
==26855==  Access not within mapped region at address 0xFFE801FE8
==26855== Stack overflow in thread #1: can't grow stack to 0xffe801000
==26855==    at 0x5242203: __find_specmb (printf-parse.h:108)
==26855==    by 0x5242203: vfprintf (vfprintf.c:1312)
==26855==  If you believe this happened as a result of a stack
==26855==  overflow in your program's main thread (unlikely but
==26855==  possible), you can try to increase the size of the
==26855==  main thread stack using the --main-stacksize= flag.
==26855==  The main thread stack size used in this run was 8388608.
==26855== Stack overflow in thread #1: can't grow stack to 0xffe801000
Comment 6 Nick Wellnhofer 2017-06-10 17:05:17 UTC
OK, it seems that libxml2 already limits the recursion depth in this case. See the global variable 'xmlParserMaxDepth' which defaults to 256. But the '--huge' option disables these checks, so the result is completely expected.

It also depends on the maximum stack size on your particular system, compilation flags, sanitizers like ASan etc. which explains why these bugs are sometimes hard to reproduce. In this case, simply don't set XML_PARSE_HUGE.

Unfortunately, there are other recursive function calls where the recursion depth isn't checked like 'xmlParseConditionalSections' or 'xmlSnprintfElementContent'. It's probably only a matter of time until someone comes up with some fuzzed input resulting in more stack overflows errors.

Maybe we should make this a tracking bug for this kind of issue. Otherwise, I'm inclined to close as NOTABUG.
Comment 7 Morten Welinder 2017-06-10 19:32:27 UTC
The problem here is that "--huge" is one giant monolithic option.

It would be much less of a problem if libxml allowed setting the limits
individually instead of "--huge, take it or leave it".

At the time it was added, Gnumeric already was using xml files that didn't
fit the new limits.  We had the option of using XML_PARSE_HUGE or not
being able to read people's existing files.

Actually, we didn't have that choice.  One day we woke up and libxml was
patched, i.e., we couldn't read our files.  That's a bit like disabling
printf in libc -- not great customer service.
Comment 8 Nick Wellnhofer 2017-06-10 23:33:01 UTC
From a quick look, these are the limits that XML_PARSE_HUGE lifts:

- XML_MAX_TEXT_LENGTH. Maximum size for text, comment, and PI nodes. Hardcoded to 10 million bytes. This is a limit that many people have issues with and that could be easily made configurable at runtime.

- xmlParserMaxDepth. Maximum nesting level of XML elements. This is configurable but doesn't help when you really run out of stack. OTOH, setting xmlParserMaxDepth to say 1000000 shouldn't change libxml2's behavior compared to older version. Should be a per-parser setting, though.

- A few other hardcoded recursion depth limits.  Probably not a concern for you.

- Some limits regarding size and nesting of entities. Probably not a concern for you.

- Some limits regarding maximum length of XML Names. Probably not a concern for you.

- XML_MAX_LOOKUP_LIMIT. Probably not a concern for you.

Would you be happy if you could set XML_MAX_TEXT_LENGTH and xmlParserMaxDepth per parser context or do you have a different problem? The original report is about a fuzzed file which is hardly representative.
Comment 9 Nick Wellnhofer 2017-06-10 23:47:45 UTC
*** Bug 749238 has been marked as a duplicate of this bug. ***
Comment 10 Morten Welinder 2017-06-11 02:27:25 UTC
XML_MAX_TEXT_LENGTH (formerly XML_MAX_TEXT_LENGHT) is precisely what
bit us.

> Would you be happy if you could set XML_MAX_TEXT_LENGTH and xmlParserMaxDepth
> per parser context or do you have a different problem? 

It'd be very happy.  I'd simply set it to file size and avoid the HUGE flag.
Comment 11 Nick Wellnhofer 2017-06-11 15:34:54 UTC
Here's my proposal for making XML_MAX_TEXT_LENGTH a runtime setting:

https://github.com/nwellnhof/libxml2/commit/7343130
Comment 12 Morten Welinder 2017-06-11 16:32:10 UTC
Looks visually good.  Are you sure that limit is never (explicitly or
implicitly) cast to an signed type?  That would make it -1 which would
be bad.
Comment 13 Nick Wellnhofer 2017-06-17 22:01:28 UTC
> Are you sure that limit is never (explicitly or
implicitly) cast to an signed type?

In the C language, unsigned types like size_t are never implicitly cast to signed types that can't represent all values of the unsigned type. There is also no explicit cast to a signed type (the result of which would be implementation-defined, not -1).
Comment 14 Morten Welinder 2017-06-17 22:43:51 UTC
Maybe my terminology isn't right, but you can definitely get surprises
with a function taking a signed argument yet given an unsigned value.
The situation also arises with assignment of an unsigned value to a signed
variable.

> value.implementation-defined, not -1

Indeed.  -1 just happens to be the most likely implementation-defined
result.

Anyway, I just wanted to make sure you thought about it.
Comment 15 GNOME Infrastructure Team 2021-07-05 13:27:05 UTC
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org.
As part of that, we are mass-closing older open tickets in bugzilla.gnome.org
which have not seen updates for a longer time (resources are unfortunately
quite limited so not every ticket can get handled).

If you can still reproduce the situation described in this ticket in a recent
and supported software version, then please follow
  https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines
and create a new ticket at
  https://gitlab.gnome.org/GNOME/libxml2/-/issues/

Thank you for your understanding and your help.