Bug 690202 - Buffer overflow errors originating from xmlBufGetInputBase in 2.9.0, ToT
Product: libxml2
Classification: Platform
Component: general
Version: git master
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: Daniel Veillard
QA Contact: libxml QA maintainers
Depends on:
Reported: 2012-12-14 10:18 UTC by Zan Dobersek
Modified: 2021-07-05 13:25 UTC
See Also:
GNOME target: ---
GNOME version: ---

GDB output (7.44 KB, text/plain)
2012-12-14 10:19 UTC, Zan Dobersek
Standalone reproducible case (68.57 KB, application/octet-stream)
2013-01-31 09:48 UTC, Mark Rowe

Description Zan Dobersek 2012-12-14 10:18:10 UTC
WebKitGTK+ recently bumped the libxml2 dependency to 2.9.0, and immediately buffer overflow errors started appearing, signalled by the following stderr output in some test cases:
internal buffer error : No error message provided

The error was traced back into xmlBufGetInputBase. I'll upload the backtrace and memory examination shortly.
Comment 1 Zan Dobersek 2012-12-14 10:19:37 UTC
Created attachment 231560 [details]
GDB output
Comment 3 Daniel Veillard 2012-12-21 03:54:21 UTC
Hum, it seems that somehow WebKitGTK+ is messing with the
parser input, probably doing so using the old xmlBuffer APIs.
As such the gdb output doesn't really help, the data were modified
before as part of WebKitGTK handling for those XSS cases.

As a result some of the additional checks added to the new buffer
code in xmlbuf.c fail.

Double check where and how WebKitGTK+ is modifying the current XML
parser input, the main change at the libxml2 level is:

if you are touching ctxt->input->buf->buffer or ctxt->input->buf->raw
directly, well you probably should not modify the parser internal data
to handle those denied requests.

The following commit switching the XML parser internals to the new structure

can probably give you hints on how to fix that code.

Comment 4 Dan Winship 2012-12-23 17:43:02 UTC
(In reply to comment #3)
> Double check where and how WebKitGTK+ is modifying the current XML
> parser input, the main change at the libxml2 level is:
> if you are touching ctxt->input->buf->buffer or ctxt->input->buf->raw
> directly, well you probably should not modify the parser internal data
> to handle those denied requests.

Hm. Nope, doesn't seem to be doing that.

The weirdest thing I see in the code is that apparently WebKit will have always already converted the XML to UTF-16 before libxml2 ever sees it, but the xml may still have some encoding="UTF-8" or whatever declaration in it, which libxml2 will try to take into account, and so WebKit hacks around this by calling xmlSwitchEncoding(ctxt, XML_CHAR_ENCODING_UTF16LE) before every xmlParseChunk() call. Could that cause problems? (And is there any saner way of dealing with this issue?)
Comment 5 Mark Rowe 2013-01-31 09:48:06 UTC
I saw this while I was investigating bug 692915. I've managed to construct a standalone, reproducible case of this outside of WebKit. It does appear to be related to the use of "xmlSwitchEncoding(ctxt, XML_CHAR_ENCODING_UTF16LE);", though in my standalone test case this is only called a single time before the initial chunk is passed to xmlParseChunk.

The attached standalone test case can be compiled with a command like:

cc -g -lxml2 -I/usr/include/libxml2 -o libxml2-decoding-bug-2 libxml2-decoding-bug-2.c

The output with libxml2 v2.9.0 and TOT looks like so:

Character count: 64400
startElement: root
xmlParseChunk result: 0
internal buffer error : No error message provided
Extra content at the end of the document
xmlParseChunk result: 5

With a debug malloc implementation (OS X's Guard Malloc) enabled, I reproducibly see a crash inside xmlParseGetLasts prior to any overflow error being detected by libxml2.

-> 10930		while ((tmp >= ctxt->input->base) && (*tmp != '<')) tmp--;

ctxt->input->base and ctxt->input->end both appear to point to deallocated memory.

There's clearly some form of serious memory management bug here.
Comment 6 Mark Rowe 2013-01-31 09:48:33 UTC
Created attachment 234891 [details]
Standalone reproducible case
Comment 7 Mark Rowe 2013-01-31 11:37:58 UTC
It looks like the input buffer is being grown by a call to xmlBufGrow within xmlCharEncInput. This results in realloc moving the buffer, while the base / end pointers still point into the old buffer. I think the fix may look something like:

> diff --git a/parser.c b/parser.c
> index 31f90d6..08f1a6b 100644
> --- a/parser.c
> +++ b/parser.c
> @@ -12146,16 +12146,17 @@ xmldecl_done:
>                 nbchars = xmlCharEncInput(in);
>                 if (nbchars < 0) {
>                     /* TODO 2.6.0 */
>                     xmlGenericError(xmlGenericErrorContext,
>                                     "xmlParseChunk: encoder error\n");
>                     return(XML_ERR_INVALID_ENCODING);
>                 }
> +               xmlBufResetInput(in->buffer, ctxt->input);
>             }
>         }
>      }
>      if (remain != 0) {
>          xmlParseTryOrFinish(ctxt, 0);
>      } else {
>          if ((ctxt->input != NULL) && (ctxt->input->buf != NULL))
>              avail = xmlBufUse(ctxt->input->buf->buffer);

There's an almost identical codepath in HTMLparser.c that probably needs the same fix. There are a few other calls to xmlCharEncInput that aren't obviously followed by a call to xmlBufResetInput, which suggests they may have the same problem.
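
For readers unfamiliar with this class of bug, here is a minimal, self-contained sketch of the failure mode described above: a growable buffer whose storage can move on realloc, plus a cursor holding raw pointers into it. All names (demo_buf, demo_input, demo_append, demo_reset_input, demo_append_and_refresh) are hypothetical stand-ins invented for illustration; they are not libxml2's real types or functions, and the refresh step is only in the spirit of the xmlBufResetInput call in the proposed patch, not a description of what that function actually does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for a parser buffer and its input cursor.
 * These are NOT libxml2's real structures. */
typedef struct {
    char  *content;  /* heap storage; may move when the buffer grows */
    size_t use;      /* bytes currently stored */
    size_t size;     /* bytes allocated */
} demo_buf;

typedef struct {
    const char *base; /* start of the buffered data */
    const char *cur;  /* current read position */
    const char *end;  /* one past the last stored byte */
} demo_input;

/* Re-derive all three pointers from the buffer, keeping the read offset. */
static void demo_reset_input(const demo_buf *buf, demo_input *in,
                             size_t cur_off)
{
    assert(cur_off <= buf->use);
    in->base = buf->content;
    in->cur  = buf->content + cur_off;
    in->end  = buf->content + buf->use;
}

/* Append data, growing the storage with realloc when needed.
 * Returns 0 on success, -1 on allocation failure. */
static int demo_append(demo_buf *buf, const char *data, size_t len)
{
    if (buf->use + len > buf->size) {
        size_t new_size = (buf->use + len) * 2;
        char *moved = realloc(buf->content, new_size);
        if (moved == NULL)
            return -1;
        buf->content = moved; /* any pointer into the old block is now stale */
        buf->size = new_size;
    }
    memcpy(buf->content + buf->use, data, len);
    buf->use += len;
    return 0;
}

/* Append while a cursor is live: record the offset BEFORE the buffer can
 * move, then rebuild the pointers afterwards. Skipping the rebuild would
 * leave base/cur/end pointing into freed memory. */
static int demo_append_and_refresh(demo_buf *buf, demo_input *in,
                                   const char *data, size_t len)
{
    size_t cur_off = (size_t)(in->cur - in->base);
    if (demo_append(buf, data, len) != 0)
        return -1;
    demo_reset_input(buf, in, cur_off);
    return 0;
}
```

A caller would allocate a small demo_buf, append an initial chunk, set up the cursor with demo_reset_input, and then use demo_append_and_refresh for every later chunk. Tracking an offset rather than a raw pointer across the growth step is the design choice that keeps the cursor valid regardless of whether realloc moves the block.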
Comment 8 Mark Rowe 2013-02-15 22:13:25 UTC
This looks to have been fixed by <>.
Comment 9 Dan Winship 2013-02-16 16:17:29 UTC
That patch fixes your test case, but I still see "internal buffer error : No error message provided" when running the tests from comment 2, so it doesn't seem to fix everything.
Comment 10 Daniel Veillard 2013-03-27 03:30:26 UTC
My problem is that I don't see how to reproduce them outside of the
whole WebKit. Looking at the trace, I have tried to make the same input
and feed it to the xmllint push parser, but I don't get any trouble with it:

thinkpad:~/XML -> iconv -f UTF-8 -t UTF-16 < webkit.xml > webkit_16.xml
thinkpad:~/XML -> xmllint --push webkit_16.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="resources/xsl-using-document-redirect.xsl"?>
FAIL: XML stylesheet did not run.
thinkpad:~/XML -> xmllint --stream webkit_16.xml
thinkpad:~/XML -> ls -l webkit_16.xml
-rw-rw-r--. 1 veillard veillard 342 Mar 27 11:24 webkit_16.xml
thinkpad:~/XML -> ls -l webkit.xml
-rw-rw-r--. 1 veillard veillard 170 Mar 27 11:21 webkit.xml
thinkpad:~/XML ->

I'm afraid that without some kind of test case based on libxml2
I will have a hard time debugging this. This actually calls for some kind of
instrumentation of libxml2 that would allow reproducing the set of calls made
by an application, but I can't guarantee I will get time to do this considering
my current crazy schedule!

Comment 11 Gilboa Davara 2013-05-26 16:29:56 UTC
Not sure if it's the same bug, but at least in my case, the same error was triggered when faced with large (>64KB) XSL files. Breaking the large XSL into smaller ones solved the problem.
Comment 12 GNOME Infrastructure Team 2021-07-05 13:25:57 UTC
GNOME is going to shut down in favor of
As part of that, we are mass-closing older open tickets in
which have not seen updates for a longer time (resources are unfortunately
quite limited so not every ticket can get handled).

If you can still reproduce the situation described in this ticket in a recent
and supported software version, then please follow
and create a new ticket at

Thank you for your understanding and your help.