GNOME Bugzilla – Bug 690202
Buffer overflow errors originating from xmlBufGetInputBase in 2.9.0, ToT
Last modified: 2021-07-05 13:25:57 UTC
WebKitGTK+ recently bumped the libxml2 dependency to 2.9.0, and immediately buffer overflow errors started appearing, signalled by the following stderr output in some test cases: internal buffer error : No error message provided The error was traced back into xmlBufGetInputBase. I'll upload the backtrace and memory examination shortly.
Created attachment 231560 [details] GDB output
The error is currently reported in two layout tests: http://trac.webkit.org/browser/trunk/LayoutTests/http/tests/security/xss-DENIED-xsl-document-redirect.xml http://trac.webkit.org/browser/trunk/LayoutTests/http/tests/security/xss-DENIED-xsl-external-entity-redirect.xml
Hum, it seems that somehow WebKitGTK+ is messing up with the parser input, probably doing so using the old xmlBuffer APIs. As such the gdb output doesn't really help, the data were modified before as part of WebKitGTK handling for those XSS cases. As a result some of the additional checks added to the new buffer code in xmlbuf.c fail. Double check where and how WebKitGTK+ is modifying the current XML parser input, the main change at the libxml2 level is: http://git.gnome.org/browse/libxml2/commit/?id=65c7d3b2e6506283eecd19a23dcf0122fbcdac33 if you are touching ctxt->input->buf->buffer or ctxt->input->buf->raw directly, well you probably should not modify the parser internal data to handle those denied requests. The following commit switching the XML parser internals to the new structure http://git.gnome.org/browse/libxml2/commit/?id=768eb3b82d25f9635a68cc06385d9e64b3c4dd18 can probably give you hints on how to fix that code. Daniel
(In reply to comment #3) > Double check where and how WebKitGTK+ is modifying the current XML > parser input, the main change at the libxml2 level is: > > http://git.gnome.org/browse/libxml2/commit/?id=65c7d3b2e6506283eecd19a23dcf0122fbcdac33 > > if you are touching ctxt->input->buf->buffer or ctxt->input->buf->raw > directly, well you probably should not modify the parser internal data > to handle those denied requests. Hm. Nope, doesn't seem to be doing that. The weirdest thing I see in the code is that apparently WebKit will have always already converted the XML to UTF-16 before libxml2 ever sees it, but the xml may still have some encoding="UTF-8" or whatever declaration in it, which libxml2 will try to take into account, and so WebKit hacks around this by calling xmlSwitchEncoding(ctxt, XML_CHAR_ENCODING_UTF16LE) before every xmlParseChunk() call. Could that cause problems? (And is there any saner way of dealing with this issue?)
I saw this while I was investigating bug 692915. I've managed to construct a standalone, reproducible case of this outside of WebKit. It does appear to be related to the use of "xmlSwitchEncoding(ctxt, XML_CHAR_ENCODING_UTF16LE);", though in my standalone test case this is only called a single time before the intial chunk is passed to xmlParseChunk. The attached standlone test case can be compiled with a command like: cc -g -lxml2 -I/usr/include/libxml2 -o libxml2-decoding-bug-2 libxml2-decoding-bug-2.c The output with libxml2 v2.9.0 and TOT looks like so: Character count: 64400 startElement: root xmlParseChunk result: 0 internal buffer error : No error message provided Extra content at the end of the document xmlParseChunk result: 5 With a debug malloc implementation, OS X's guard malloc, enabled I reproducibly see a crash inside xmlParseGetLasts prior to any overflow error being detected by libxml2. -> 10930 while ((tmp >= ctxt->input->base) && (*tmp != '<')) tmp--; ctxt->input->base and ctxt->input->end both appear to point to deallocated memory. There's clearly some form of serious memory management bug here.
Created attachment 234891 [details] Standalone reproducible case
It looks like the input buffer is being grown by a call to xmlBufGrow within xmlCharEncInput. This results in realloc moving the buffer, while the base / end pointers still point in to the old buffer. I think the fix may look something like: > diff --git a/parser.c b/parser.c > index 31f90d6..08f1a6b 100644 > --- a/parser.c > +++ b/parser.c > @@ -12146,16 +12146,17 @@ xmldecl_done: > > nbchars = xmlCharEncInput(in); > if (nbchars < 0) { > /* TODO 2.6.0 */ > xmlGenericError(xmlGenericErrorContext, > "xmlParseChunk: encoder error\n"); > return(XML_ERR_INVALID_ENCODING); > } > + xmlBufResetInput(in->buffer, ctxt->input); > } > } > } > if (remain != 0) { > xmlParseTryOrFinish(ctxt, 0); > } else { > if ((ctxt->input != NULL) && (ctxt->input->buf != NULL)) > avail = xmlBufUse(ctxt->input->buf->buffer); > There's an almost identical codepath in HTMLparser.c that probably needs the same fix. There are a few other calls to xmlCharEncInput that aren't obviously followed by a call to xmlBufResetInput, which suggests they may be have the same problem.
This looks to have been fixed by <http://git.gnome.org/browse/libxml2/commit/?id=de0cc20c29cb3f056062925395e0f68d2250a46f>.
That patch fixes your test case, but I still see "internal buffer error : No error message provided" when running the tests from comment 2, so it doesn't seem to fix everything.
My problem is that i don't see how to reproduce them outside of the whole webkit, looking at the trace, i have tried to make the same input and feed it to xmllint push parser but i don't get any trouble with it: thinkpad:~/XML -> iconv -f UTF-8 -t UTF-16 < webkit.xml > webkit_16.xml thinkpad:~/XML -> xmllint --push webkit_16.xml <?xml version="1.0" encoding="UTF-8"?> <?xml-stylesheet type="text/xsl" href="resources/xsl-using-document-redirect.xsl"?> <xml> FAIL: XML stylesheet did not run. </xml> thinkpad:~/XML -> xmllint --stream webkit_16.xml thinkpad:~/XML -> ls -l webkit_16.xml -rw-rw-r--. 1 veillard veillard 342 Mar 27 11:24 webkit_16.xml thinkpad:~/XML -> ls -l webkit.xml -rw-rw-r--. 1 veillard veillard 170 Mar 27 11:21 webkit.xml thinkpad:~/XML -> I'm afraid that without some kind of test case based on libxml2 I will have a hard time debugging this. This actually call for kind of instrumentation of libxml2 allowing to reproduce the set of calls made by an application but I can't guarantee I will get time to do this considering my current crazy schedule ! Daniel
Not sure if it's the same bug, but at least in my case, the same error was triggered when faced with large (>64KB) XSL files. Breaking the large XSL into smaller ones solved the problem.
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org. As part of that, we are mass-closing older open tickets in bugzilla.gnome.org which have not seen updates for a longer time (resources are unfortunately quite limited so not every ticket can get handled). If you can still reproduce the situation described in this ticket in a recent and supported software version, then please follow https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines and create a new ticket at https://gitlab.gnome.org/GNOME/libxml2/-/issues/ Thank you for your understanding and your help.