GNOME Bugzilla – Bug 756372
Buffer overead with HTML parser in push mode in xmlSAX2TextNode, causes segfault when compiled with ASAN
Last modified: 2015-11-21 04:25:37 UTC
This was found on git master, commit cf77e60515045bdd66f2c59c69a06e603b470eae When compiled with ASAN, it gives the following crash $ ./xmllint --html --push ~/fuzzing/xml-html-push-crashes-tmin/id\:000000* /root/fuzzing/xml-html-push-crashes-tmin/id:000000,sig:06,src:000000,op:flip1,pos:5:1: HTML parser error : htmlParseStartTag: invalid element name <>0 ^ ================================================================= ==17867== ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffe56c100c1 at pc 0xb6d62c bp 0x7ffe56c0fdd0 sp 0x7ffe56c0fdc8 READ of size 1 at 0x7ffe56c100c1 thread T0 #0 0xb6d62b in xmlSAX2TextNode /root/libxml2/SAX2.c:1868 #1 0xb7159a in xmlSAX2Characters /root/libxml2/SAX2.c:2550 #2 0x6b53ee in htmlParseTryOrFinish /root/libxml2/HTMLparser.c:5740 #3 0x6b53ee in htmlParseChunk /root/libxml2/HTMLparser.c:6097 #4 0x416aae in parseAndPrintFile /root/libxml2/xmllint.c:2215 #5 0x40e119 in main /root/libxml2/xmllint.c:3759 #6 0x7fa7dec00ec4 (/lib/x86_64-linux-gnu/libc.so.6+0x21ec4) #7 0x40fc0c in _start (/root/libxml2/xmllint+0x40fc0c) Address 0x7ffe56c100c1 is located at offset 33 in frame <htmlParseChunk> of T0's stack: This frame has 3 object(s): [32, 33) 'cur' [96, 136) 'node_info' [192, 194) 'chr' HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext (longjmp and C++ exceptions *are* supported) SUMMARY: AddressSanitizer: stack-buffer-overflow /root/libxml2/SAX2.c:1880 xmlSAX2TextNode Shadow bytes around the buggy address: 0x10004ad79fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x10004ad79fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x10004ad79fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x10004ad79ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x10004ad7a000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 =>0x10004ad7a010: 00 00 00 00 f1 f1 f1 f1[01]f4 f4 f4 f2 f2 f2 f2 0x10004ad7a020: 00 00 00 00 00 f4 f4 f4 f2 f2 f2 f2 02 f4 f4 f4 0x10004ad7a030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x10004ad7a040: f1 f1 f1 f1 04 f4 f4 f4 f2 f2 f2 f2 00 f4 f4 f4 0x10004ad7a050: f2 f2 f2 f2 00 00 00 00 00 00 00 00 00 00 00 00 0x10004ad7a060: 00 00 00 00 00 00 f4 f4 f2 f2 f2 f2 00 00 00 00 Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Heap righ redzone: fb Freed Heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack partial redzone: f4 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 ASan internal: fe ==17867== ABORTING Aborted The testfile I used was generated with afl-fuzz, and then minimized. It's hex representation is as follows: $ xxd -g 1 ~/fuzzing/xml-html-push-crashes-tmin/id\:000000* 0000000: 3c 3e 30 <>0 Bisected back to the following commit commit 826bc320206f70fccd2941a77d363e95e8076898 Author: Arnold Hendriks <a.hendriks@b-lex.nl> Date: Fri Nov 29 14:12:12 2013 +0800 Fix HTML push parser to accept HTML_PARSE_NODEFDTD For https://bugzilla.gnome.org/show_bug.cgi?id=719515 fixes htmlParseTryOrFinish to interpret HTML_PARSE_NODEFDTD, and updates xmllint to actually pass --nodefdtd to the push version of the HTML parser diff --git a/HTMLparser.c b/HTMLparser.c index dd0c1ea..7694962 100644 --- a/HTMLparser.c +++ b/HTMLparser.c @@ -5991,7 +5991,7 @@ done: ctxt->sax->endDocument(ctxt->userData); } } - if ((ctxt->myDoc != NULL) && + if ((!(ctxt->options & HTML_PARSE_NODEFDTD)) && (ctxt->myDoc != NULL) && ((terminate) || (ctxt->instate == XML_PARSER_EOF) || (ctxt->instate == XML_PARSER_EPILOG))) { xmlDtdPtr dtd; diff --git a/xmllint.c b/xmllint.c index d69722c..ebd36ab 100644 --- a/xmllint.c +++ b/xmllint.c @@ -2206,6 +2206,7 @@ static void parseAndPrintFile(char *filename, xmlParserCtxtPtr if (res > 0) { ctxt = htmlCreatePushParserCtxt(NULL, NULL, chars, res, filename, XML_CHAR_ENCODING_NONE); + xmlCtxtUseOptions(ctxt, options); while ((res = fread(chars, 1, pushsize, f)) > 0) { htmlParseChunk(ctxt, chars, res, 0); } Running under gdb: $ gdb -ex "run" -ex "backtrace" --args ./xmllint --html --push ~/fuzzing/xml-html-push-crashes-tmin/id\:000000* GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1 Copyright (C) 2014 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from ./xmllint...done. Starting program: /root/libxml2/xmllint --html --push /root/fuzzing/xml-html-push-crashes-tmin/id:000000,sig:06,src:000000,op:flip1,pos:5 [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". /root/fuzzing/xml-html-push-crashes-tmin/id:000000,sig:06,src:000000,op:flip1,pos:5:1: HTML parser error : htmlParseStartTag: invalid element name <>0 ^ ================================================================= ==18254== ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fffffffc811 at pc 0xb6d62c bp 0x7fffffffc520 sp 0x7fffffffc518 READ of size 1 at 0x7fffffffc811 thread T0 #0 0xb6d62b in xmlSAX2TextNode /root/libxml2/SAX2.c:1868 #1 0xb7159a in xmlSAX2Characters /root/libxml2/SAX2.c:2550 #2 0x6b53ee in htmlParseTryOrFinish /root/libxml2/HTMLparser.c:5740 #3 0x6b53ee in htmlParseChunk /root/libxml2/HTMLparser.c:6097 #4 0x416aae in parseAndPrintFile /root/libxml2/xmllint.c:2215 #5 0x40e119 in main /root/libxml2/xmllint.c:3759 #6 0x7ffff4588ec4 (/lib/x86_64-linux-gnu/libc.so.6+0x21ec4) #7 0x40fc0c in _start (/root/libxml2/xmllint+0x40fc0c) Address 0x7fffffffc811 is located at offset 33 in frame <htmlParseChunk> of T0's stack: This frame has 3 object(s): [32, 33) 'cur' [96, 136) 'node_info' [192, 194) 'chr' HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext (longjmp and C++ exceptions *are* supported) SUMMARY: AddressSanitizer: stack-buffer-overflow /root/libxml2/SAX2.c:1880 xmlSAX2TextNode Shadow bytes around the buggy address: 0x10007fff78b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x10007fff78c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x10007fff78d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x10007fff78e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x10007fff78f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 =>0x10007fff7900: f1 f1[01]f4 f4 f4 f2 f2 f2 f2 00 00 00 00 00 f4 0x10007fff7910: f4 f4 f2 f2 f2 f2 02 f4 f4 f4 00 00 00 00 00 00 0x10007fff7920: 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 04 f4 0x10007fff7930: f4 f4 f2 f2 f2 f2 00 f4 f4 f4 f2 f2 f2 f2 00 00 0x10007fff7940: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x10007fff7950: f4 f4 f2 f2 f2 f2 00 00 00 00 00 00 00 00 00 00 Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Heap righ redzone: fb Freed Heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack partial redzone: f4 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 ASan internal: fe ==18254== ABORTING Program received signal SIGABRT, Aborted. 0x00007ffff459dcc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
+ Trace 235568
Looking at htmlParseTryOrFinish, line 5740, we see that it calls a function saved in the sax struct, which happens to be xmlSAX2Characters in this case. Note that the "length" parameter is hard coded to 1. There are similar entries on lines 5730 and 5734 which could be taken on different input paths. If there was not a "lastchild", then the xmlSAX2TextNode function is called, with same length (hard coded to 1). That then tries to read str[len], which is out of bounds if the input has reached end of string. This causes a crash with libxml2 compiled with ASAN. Without ASAN, there is potential to get input that could cause out of bounds memory to be returned to userspace through the use of libxml2, which could be used to cause denial of service attacks, or gain sensitive information. The fix for this is to test whether reading len characters will overrun the buffer in xmlSAX2Characters, which will then stop this crash and potential security issues. A patch to do so is included below: diff --git a/SAX2.c b/SAX2.c index ffef3e1..e711bbc 100644 --- a/SAX2.c +++ b/SAX2.c @@ -2523,6 +2523,13 @@ xmlSAX2Characters(void *ctx, const xmlChar *ch, int len) xmlGenericError(xmlGenericErrorContext, "SAX.xmlSAX2Characters(%.30s, %d)\n", ch, len); #endif + if (ch + len >= ctxt->input->end) { +#ifdef DEBUG_SAX_TREE + xmlGenericError(xmlGenericErrorContext, + "add chars: ctxt->input->cur + len >= ctxt->input->end !\n"); +#endif + return; + } /* * Handle the data if any. If there is no child * add it as content, otherwise if the last child is text,
Sorry, can't even get an option to limit visibility to libxml devs after creating one!
Created attachment 313038 [details] Input file causing crash when compiled with ASAN Added minimized input causing a crash. Required flags on xmllint are --html --push. Also ASAN compiled binary is required.
Created attachment 313039 [details] [review] Patch to fix Added patch to fix
This patch makes tests fail, working out why.
Ok, so most of the time, it is not coming from ctxt->input, so a patch should check it in the places it calls it with hard coded 1 to see that there is indeed another character! Patch to follow in a bit.
New patch (replaces old one): diff --git a/HTMLparser.c b/HTMLparser.c index 19c10c3..5d1ed27 100644 --- a/HTMLparser.c +++ b/HTMLparser.c @@ -5723,7 +5723,8 @@ htmlParseTryOrFinish(htmlParserCtxtPtr ctxt, int terminate) { if ((avail == 1) && (terminate)) { cur = in->cur[0]; if ((cur != '<') && (cur != '&')) { - if (ctxt->sax != NULL) { + if (ctxt->sax != NULL && + &cur + 1 < in->end) { if (IS_BLANK_CH(cur)) { if (ctxt->keepBlanks) { if (ctxt->sax->characters != NULL)
Created attachment 313046 [details] [review] Patch to fix (attempt #2)
Created attachment 313047 [details] Input file causing crash when compiled with ASAN Whoops, obsoleted the wrong one!
Ok, so now this fixes a crashing test input where it now no longer puts in a new line. Looking into this. $ ./xmllint --html --push test/HTML/autoclose.html <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd"> <html><body><hr></body></html> $ cat result/HTML/autoclose.html <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd"> <html><body> <hr> </body></html>
Played round a bit, and found that this patch didn't crash, and lets the test cases pass. I guess ASAN just needs to know that the input is in->cur instead of copying to a different scope. diff --git a/HTMLparser.c b/HTMLparser.c index 19c10c3..0cc8225 100644 --- a/HTMLparser.c +++ b/HTMLparser.c @@ -5728,17 +5728,17 @@ htmlParseTryOrFinish(htmlParserCtxtPtr ctxt, int terminate) { if (ctxt->keepBlanks) { if (ctxt->sax->characters != NULL) ctxt->sax->characters( - ctxt->userData, &cur, 1); + ctxt->userData, &in->cur[0], 1); } else { if (ctxt->sax->ignorableWhitespace != NULL) ctxt->sax->ignorableWhitespace( - ctxt->userData, &cur, 1); + ctxt->userData, &in->cur[0], 1); } } else { htmlCheckParagraph(ctxt); if (ctxt->sax->characters != NULL) ctxt->sax->characters( - ctxt->userData, &cur, 1); + ctxt->userData, &in->cur[0], 1); } } ctxt->token = 0;
Created attachment 313049 [details] [review] Patch to fix (attempt #3) Should be the final patch, as tests work, and no crash.
Hey Daniel, Just checking you've seen this. I can't limit the visibility and it still is showing to the public. Got a few more I want to file as well. Cheers, Hugh
CVE Assigned - CVE-2015-8242
Fixed upstream, in commit 8fb4a770075628d6441fb17a1e435100e2f3b1a2 and part of 2.9.3 security release. Patch unmodified as #3 is really the correct way to fix it. thanks Hugh ! Daniel