After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 756372 - Buffer overead with HTML parser in push mode in xmlSAX2TextNode, causes segfault when compiled with ASAN
Buffer overead with HTML parser in push mode in xmlSAX2TextNode, causes segfa...
Status: RESOLVED FIXED
Product: libxml2
Classification: Platform
Component: general
git master
Other Linux
: Normal normal
: ---
Assigned To: Daniel Veillard
libxml QA maintainers
Depends on:
Blocks:
 
 
Reported: 2015-10-11 03:18 UTC by Hugh Davenport
Modified: 2015-11-21 04:25 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Input file causing crash when compiled with ASAN (3 bytes, text/plain)
2015-10-11 03:26 UTC, Hugh Davenport
  Details
Patch to fix (1.12 KB, patch)
2015-10-11 03:27 UTC, Hugh Davenport
none Details | Review
Patch to fix (attempt #2) (1.03 KB, patch)
2015-10-11 06:33 UTC, Hugh Davenport
none Details | Review
Input file causing crash when compiled with ASAN (3 bytes, text/plain)
2015-10-11 06:34 UTC, Hugh Davenport
  Details
Patch to fix (attempt #3) (1.30 KB, patch)
2015-10-11 07:12 UTC, Hugh Davenport
none Details | Review

Description Hugh Davenport 2015-10-11 03:18:11 UTC
This was found on git master, commit cf77e60515045bdd66f2c59c69a06e603b470eae

When compiled with ASAN, it gives the following crash

 $ ./xmllint --html --push ~/fuzzing/xml-html-push-crashes-tmin/id\:000000*
/root/fuzzing/xml-html-push-crashes-tmin/id:000000,sig:06,src:000000,op:flip1,pos:5:1: HTML parser error : htmlParseStartTag: invalid element name
<>0
 ^
=================================================================
==17867== ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffe56c100c1 at pc 0xb6d62c bp 0x7ffe56c0fdd0 sp 0x7ffe56c0fdc8
READ of size 1 at 0x7ffe56c100c1 thread T0
    #0 0xb6d62b in xmlSAX2TextNode /root/libxml2/SAX2.c:1868
    #1 0xb7159a in xmlSAX2Characters /root/libxml2/SAX2.c:2550
    #2 0x6b53ee in htmlParseTryOrFinish /root/libxml2/HTMLparser.c:5740
    #3 0x6b53ee in htmlParseChunk /root/libxml2/HTMLparser.c:6097
    #4 0x416aae in parseAndPrintFile /root/libxml2/xmllint.c:2215
    #5 0x40e119 in main /root/libxml2/xmllint.c:3759
    #6 0x7fa7dec00ec4 (/lib/x86_64-linux-gnu/libc.so.6+0x21ec4)
    #7 0x40fc0c in _start (/root/libxml2/xmllint+0x40fc0c)
Address 0x7ffe56c100c1 is located at offset 33 in frame <htmlParseChunk> of T0's stack:
  This frame has 3 object(s):
    [32, 33) 'cur'
    [96, 136) 'node_info'
    [192, 194) 'chr'
HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow /root/libxml2/SAX2.c:1880 xmlSAX2TextNode
Shadow bytes around the buggy address:
  0x10004ad79fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10004ad79fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10004ad79fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10004ad79ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10004ad7a000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x10004ad7a010: 00 00 00 00 f1 f1 f1 f1[01]f4 f4 f4 f2 f2 f2 f2
  0x10004ad7a020: 00 00 00 00 00 f4 f4 f4 f2 f2 f2 f2 02 f4 f4 f4
  0x10004ad7a030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10004ad7a040: f1 f1 f1 f1 04 f4 f4 f4 f2 f2 f2 f2 00 f4 f4 f4
  0x10004ad7a050: f2 f2 f2 f2 00 00 00 00 00 00 00 00 00 00 00 00
  0x10004ad7a060: 00 00 00 00 00 00 f4 f4 f2 f2 f2 f2 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:     fa
  Heap righ redzone:     fb
  Freed Heap region:     fd
  Stack left redzone:    f1
  Stack mid redzone:     f2
  Stack right redzone:   f3
  Stack partial redzone: f4
  Stack after return:    f5
  Stack use after scope: f8
  Global redzone:        f9
  Global init order:     f6
  Poisoned by user:      f7
  ASan internal:         fe
==17867== ABORTING
Aborted

The testfile I used was generated with afl-fuzz, and then minimized. It's hex representation is as follows:

 $ xxd -g 1 ~/fuzzing/xml-html-push-crashes-tmin/id\:000000*
0000000: 3c 3e 30                                         <>0

Bisected back to the following commit

commit 826bc320206f70fccd2941a77d363e95e8076898
Author: Arnold Hendriks <a.hendriks@b-lex.nl>
Date:   Fri Nov 29 14:12:12 2013 +0800

    Fix HTML push parser to accept HTML_PARSE_NODEFDTD
    
    For https://bugzilla.gnome.org/show_bug.cgi?id=719515
    
    fixes htmlParseTryOrFinish to interpret HTML_PARSE_NODEFDTD,
    and updates xmllint to actually pass --nodefdtd to the push
    version of the HTML parser

diff --git a/HTMLparser.c b/HTMLparser.c
index dd0c1ea..7694962 100644
--- a/HTMLparser.c
+++ b/HTMLparser.c
@@ -5991,7 +5991,7 @@ done:
                ctxt->sax->endDocument(ctxt->userData);
        }
     }
-    if ((ctxt->myDoc != NULL) &&
+    if ((!(ctxt->options & HTML_PARSE_NODEFDTD)) && (ctxt->myDoc != NULL) &&
        ((terminate) || (ctxt->instate == XML_PARSER_EOF) ||
         (ctxt->instate == XML_PARSER_EPILOG))) {
        xmlDtdPtr dtd;
diff --git a/xmllint.c b/xmllint.c
index d69722c..ebd36ab 100644
--- a/xmllint.c
+++ b/xmllint.c
@@ -2206,6 +2206,7 @@ static void parseAndPrintFile(char *filename, xmlParserCtxtPtr
             if (res > 0) {
                 ctxt = htmlCreatePushParserCtxt(NULL, NULL,
                             chars, res, filename, XML_CHAR_ENCODING_NONE);
+                xmlCtxtUseOptions(ctxt, options);
                 while ((res = fread(chars, 1, pushsize, f)) > 0) {
                     htmlParseChunk(ctxt, chars, res, 0);
                 }

Running under gdb:

 $ gdb -ex "run" -ex "backtrace" --args ./xmllint --html --push ~/fuzzing/xml-html-push-crashes-tmin/id\:000000*
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./xmllint...done.
Starting program: /root/libxml2/xmllint --html --push /root/fuzzing/xml-html-push-crashes-tmin/id:000000,sig:06,src:000000,op:flip1,pos:5
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
/root/fuzzing/xml-html-push-crashes-tmin/id:000000,sig:06,src:000000,op:flip1,pos:5:1: HTML parser error : htmlParseStartTag: invalid element name
<>0
 ^
=================================================================
==18254== ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fffffffc811 at pc 0xb6d62c bp 0x7fffffffc520 sp 0x7fffffffc518
READ of size 1 at 0x7fffffffc811 thread T0
    #0 0xb6d62b in xmlSAX2TextNode /root/libxml2/SAX2.c:1868
    #1 0xb7159a in xmlSAX2Characters /root/libxml2/SAX2.c:2550
    #2 0x6b53ee in htmlParseTryOrFinish /root/libxml2/HTMLparser.c:5740
    #3 0x6b53ee in htmlParseChunk /root/libxml2/HTMLparser.c:6097
    #4 0x416aae in parseAndPrintFile /root/libxml2/xmllint.c:2215
    #5 0x40e119 in main /root/libxml2/xmllint.c:3759
    #6 0x7ffff4588ec4 (/lib/x86_64-linux-gnu/libc.so.6+0x21ec4)
    #7 0x40fc0c in _start (/root/libxml2/xmllint+0x40fc0c)
Address 0x7fffffffc811 is located at offset 33 in frame <htmlParseChunk> of T0's stack:
  This frame has 3 object(s):
    [32, 33) 'cur'
    [96, 136) 'node_info'
    [192, 194) 'chr'
HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow /root/libxml2/SAX2.c:1880 xmlSAX2TextNode
Shadow bytes around the buggy address:
  0x10007fff78b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fff78c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fff78d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fff78e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fff78f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1
=>0x10007fff7900: f1 f1[01]f4 f4 f4 f2 f2 f2 f2 00 00 00 00 00 f4
  0x10007fff7910: f4 f4 f2 f2 f2 f2 02 f4 f4 f4 00 00 00 00 00 00
  0x10007fff7920: 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 04 f4
  0x10007fff7930: f4 f4 f2 f2 f2 f2 00 f4 f4 f4 f2 f2 f2 f2 00 00
  0x10007fff7940: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fff7950: f4 f4 f2 f2 f2 f2 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:     fa
  Heap righ redzone:     fb
  Freed Heap region:     fd
  Stack left redzone:    f1
  Stack mid redzone:     f2
  Stack right redzone:   f3
  Stack partial redzone: f4
  Stack after return:    f5
  Stack use after scope: f8
  Global redzone:        f9
  Global init order:     f6
  Poisoned by user:      f7
  ASan internal:         fe
==18254== ABORTING

Program received signal SIGABRT, Aborted.
0x00007ffff459dcc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
  • #0 __GI_raise
    at ../nptl/sysdeps/unix/sysv/linux/raise.c line 56
  • #1 __GI_abort
    at abort.c line 89
  • #2 ??
    from /usr/lib/x86_64-linux-gnu/libasan.so.0
  • #3 ??
    from /usr/lib/x86_64-linux-gnu/libasan.so.0
  • #4 ??
    from /usr/lib/x86_64-linux-gnu/libasan.so.0
  • #5 __asan_report_error
    from /usr/lib/x86_64-linux-gnu/libasan.so.0
  • #6 __asan_report_load1
    from /usr/lib/x86_64-linux-gnu/libasan.so.0
  • #7 xmlSAX2TextNode
    at SAX2.c line 1868
  • #8 xmlSAX2Characters
    at SAX2.c line 2550
  • #9 htmlParseTryOrFinish
    at HTMLparser.c line 5740
  • #10 htmlParseChunk
    at HTMLparser.c line 6097
  • #11 parseAndPrintFile
    at xmllint.c line 2215
  • #12 main
    at xmllint.c line 3759


Looking at htmlParseTryOrFinish, line 5740, we see that it calls a function saved in the sax struct, which happens to be xmlSAX2Characters in this case. Note that the "length" parameter is hard coded to 1. There are similar entries on lines 5730 and 5734 which could be taken on different input paths. If there was not a "lastchild", then the xmlSAX2TextNode function is called, with same length (hard coded to 1). That then tries to read str[len], which is out of bounds if the input has reached end of string. This causes a crash with libxml2 compiled with ASAN. Without ASAN, there is potential to get input that could cause out of bounds memory to be returned to userspace through the use of libxml2, which could be used to cause denial of service attacks, or gain sensitive information.

The fix for this is to test whether reading len characters will overrun the buffer in xmlSAX2Characters, which will then stop this crash and potential security issues. A patch to do so is included below:

diff --git a/SAX2.c b/SAX2.c
index ffef3e1..e711bbc 100644
--- a/SAX2.c
+++ b/SAX2.c
@@ -2523,6 +2523,13 @@ xmlSAX2Characters(void *ctx, const xmlChar *ch, int len)
     xmlGenericError(xmlGenericErrorContext,
            "SAX.xmlSAX2Characters(%.30s, %d)\n", ch, len);
 #endif
+    if (ch + len >= ctxt->input->end) {
+#ifdef DEBUG_SAX_TREE
+       xmlGenericError(xmlGenericErrorContext,
+               "add chars: ctxt->input->cur + len >= ctxt->input->end !\n");
+#endif
+        return;
+    }
     /*
      * Handle the data if any. If there is no child
      * add it as content, otherwise if the last child is text,
Comment 1 Hugh Davenport 2015-10-11 03:19:52 UTC
Sorry, can't even get an option to limit visibility to libxml devs after creating one!
Comment 2 Hugh Davenport 2015-10-11 03:26:57 UTC
Created attachment 313038 [details]
Input file causing crash when compiled with ASAN

Added minimized input causing a crash. Required flags on xmllint are --html --push. Also ASAN compiled binary is required.
Comment 3 Hugh Davenport 2015-10-11 03:27:21 UTC
Created attachment 313039 [details] [review]
Patch to fix

Added patch to fix
Comment 4 Hugh Davenport 2015-10-11 05:39:14 UTC
This patch makes tests fail, working out why.
Comment 5 Hugh Davenport 2015-10-11 06:08:48 UTC
Ok, so most of the time, it is not coming from ctxt->input, so a patch should check it in the places it calls it with hard coded 1 to see that there is indeed another character! Patch to follow in a bit.
Comment 6 Hugh Davenport 2015-10-11 06:31:29 UTC
New patch (replaces old one):

diff --git a/HTMLparser.c b/HTMLparser.c
index 19c10c3..5d1ed27 100644
--- a/HTMLparser.c
+++ b/HTMLparser.c
@@ -5723,7 +5723,8 @@ htmlParseTryOrFinish(htmlParserCtxtPtr ctxt, int terminate) {
                if ((avail == 1) && (terminate)) {
                    cur = in->cur[0];
                    if ((cur != '<') && (cur != '&')) {
-                       if (ctxt->sax != NULL) {
+                       if (ctxt->sax != NULL &&
+                            &cur + 1 < in->end) {
                            if (IS_BLANK_CH(cur)) {
                                if (ctxt->keepBlanks) {
                                    if (ctxt->sax->characters != NULL)
Comment 7 Hugh Davenport 2015-10-11 06:33:03 UTC
Created attachment 313046 [details] [review]
Patch to fix (attempt #2)
Comment 8 Hugh Davenport 2015-10-11 06:34:00 UTC
Created attachment 313047 [details]
Input file causing crash when compiled with ASAN

Whoops, obsoleted the wrong one!
Comment 9 Hugh Davenport 2015-10-11 07:06:21 UTC
Ok, so now this fixes a crashing test input where it now no longer puts in a new line. Looking into this.

$ ./xmllint --html --push test/HTML/autoclose.html 
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><hr></body></html>

$ cat result/HTML/autoclose.html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<hr>
</body></html>
Comment 10 Hugh Davenport 2015-10-11 07:12:26 UTC
Played round a bit, and found that this patch didn't crash, and lets the test cases pass. I guess ASAN just needs to know that the input is in->cur instead of copying to a different scope.

diff --git a/HTMLparser.c b/HTMLparser.c
index 19c10c3..0cc8225 100644
--- a/HTMLparser.c
+++ b/HTMLparser.c
@@ -5728,17 +5728,17 @@ htmlParseTryOrFinish(htmlParserCtxtPtr ctxt, int terminate) {
                                if (ctxt->keepBlanks) {
                                    if (ctxt->sax->characters != NULL)
                                        ctxt->sax->characters(
-                                               ctxt->userData, &cur, 1);
+                                               ctxt->userData, &in->cur[0], 1);
                                } else {
                                    if (ctxt->sax->ignorableWhitespace != NULL)
                                        ctxt->sax->ignorableWhitespace(
-                                               ctxt->userData, &cur, 1);
+                                               ctxt->userData, &in->cur[0], 1);
                                }
                            } else {
                                htmlCheckParagraph(ctxt);
                                if (ctxt->sax->characters != NULL)
                                    ctxt->sax->characters(
-                                           ctxt->userData, &cur, 1);
+                                           ctxt->userData, &in->cur[0], 1);
                            }
                        }
                        ctxt->token = 0;
Comment 11 Hugh Davenport 2015-10-11 07:12:59 UTC
Created attachment 313049 [details] [review]
Patch to fix (attempt #3)

Should be the final patch, as tests work, and no crash.
Comment 12 Hugh Davenport 2015-10-14 22:04:20 UTC
Hey Daniel,

Just checking you've seen this. I can't limit the visibility and it still is showing to the public. Got a few more I want to file as well.

Cheers,

Hugh
Comment 13 Gaurav 2015-11-19 02:20:00 UTC
CVE Assigned - CVE-2015-8242
Comment 14 Daniel Veillard 2015-11-21 04:25:37 UTC
Fixed upstream, in commit 8fb4a770075628d6441fb17a1e435100e2f3b1a2
and part of 2.9.3 security release. Patch unmodified as #3 is really
the correct way to fix it.

  thanks Hugh !

Daniel