After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 764615 - libxml2 heap-buffer-overflow in htmlCurrentChar
libxml2 heap-buffer-overflow in htmlCurrentChar
Status: RESOLVED FIXED
Product: libxml2
Classification: Platform
Component: htmlparser
git master
Other All
: Normal normal
: ---
Assigned To: Daniel Veillard
libxml QA maintainers
: 763686 765468 769185 769186 769187 775065 (view as bug list)
Depends on:
Blocks: 765977
 
 
Reported: 2016-04-04 20:49 UTC by David Kilzer
Modified: 2018-08-17 15:40 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
PoC (not reduced) (3.83 KB, text/html)
2016-04-05 03:10 UTC, David Kilzer
  Details
Patch v1 (2.92 KB, patch)
2016-04-05 15:37 UTC, David Kilzer
none Details | Review
PoC #1 triggering assert() with Patch v1 (38.78 KB, text/html)
2016-04-28 20:52 UTC, David Kilzer
  Details
PoC #2 triggering assert() with Patch v1 (128 bytes, text/html)
2016-04-28 20:52 UTC, David Kilzer
  Details
Patch v2 (1.41 KB, patch)
2016-05-03 04:13 UTC, David Kilzer
none Details | Review
Patch v2 combined with fix for Bug 765468 (12.09 KB, patch)
2016-05-17 12:18 UTC, David Kilzer
none Details | Review

Description David Kilzer 2016-04-04 20:49:50 UTC
Placeholder for security bug.
Comment 1 David Kilzer 2016-04-05 03:09:36 UTC
=================================================================
==48940==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x62100001a100 at pc 0x0001000feffc bp 0x7fff5fbfed30 sp 0x7fff5fbfed28
READ of size 1 at 0x62100001a100 thread T0
    #0 0x1000feffb in htmlCurrentChar (libxml_internal/libxml2/.libs/libxml2.2.dylib+0xfbffb)
    #1 0x1000ffe60 in htmlParseCharDataInternal (libxml_internal/libxml2/.libs/libxml2.2.dylib+0xfce60)
    #2 0x1000eafe3 in htmlParseContentInternal (libxml_internal/libxml2/.libs/libxml2.2.dylib+0xe7fe3)
    #3 0x1000ee273 in htmlParseDocument (libxml_internal/libxml2/.libs/libxml2.2.dylib+0xeb273)
    #4 0x1000fdaf6 in htmlDoRead (libxml_internal/libxml2/.libs/libxml2.2.dylib+0xfaaf6)
    #5 0x100000c0c in main libxml_internal/libxml2/parse2.cpp:34:11
LLVMSymbolizer: error reading file: No object file for requested architecture.
    #6 0x7fff992735ac  (/usr/lib/system/libdyld.dylib+0x35ac)
    #7 0x1  (libxml_internal/libxml2/./parse2cpp+0x1)

0x62100001a100 is located 0 bytes to the right of 4096-byte region [0x621000019100,0x62100001a100)
allocated by thread T0 here:
    #0 0x1004cab20 in wrap_malloc (/usr/local/lib/clang/3.9.0/lib/darwin/libclang_rt.asan_osx_dynamic.dylib+0x4ab20)
    #1 0x1001ae4a3 in xmlBufCreate (libxml_internal/libxml2/.libs/libxml2.2.dylib+0x1ab4a3)
    #2 0x10001dccc in xmlSwitchInputEncodingInt (libxml_internal/libxml2/.libs/libxml2.2.dylib+0x1accc)
    #3 0x10001e04e in xmlSwitchToEncoding (libxml_internal/libxml2/.libs/libxml2.2.dylib+0x1b04e)
    #4 0x1000fee81 in htmlCurrentChar (libxml_internal/libxml2/.libs/libxml2.2.dylib+0xfbe81)
    #5 0x1000ffe60 in htmlParseCharDataInternal (libxml_internal/libxml2/.libs/libxml2.2.dylib+0xfce60)
    #6 0x1000eafe3 in htmlParseContentInternal (libxml_internal/libxml2/.libs/libxml2.2.dylib+0xe7fe3)
    #7 0x1000ee273 in htmlParseDocument (libxml_internal/libxml2/.libs/libxml2.2.dylib+0xeb273)
    #8 0x1000fdaf6 in htmlDoRead (libxml_internal/libxml2/.libs/libxml2.2.dylib+0xfaaf6)
    #9 0x100000c0c in main libxml_internal/libxml2/parse2.cpp:34:11
LLVMSymbolizer: error reading file: No object file for requested architecture.
    #10 0x7fff992735ac  (/usr/lib/system/libdyld.dylib+0x35ac)
    #11 0x1  (libxml_internal/libxml2/./parse2cpp+0x1)
Comment 2 David Kilzer 2016-04-05 03:10:24 UTC
Created attachment 325399 [details]
PoC (not reduced)
Comment 3 David Kilzer 2016-04-05 15:08:08 UTC
Steps to reproduce (with ASan):
./xmllint --html --memory crash-75c7037618af3ab444fe6562deec83ebb5502fbf.html

Does not reproduce with:
./xmllint --html crash-75c7037618af3ab444fe6562deec83ebb5502fbf.html
./xmllint --html --push crash-75c7037618af3ab444fe6562deec83ebb5502fbf.html
./xmllint --html --pushsmall crash-75c7037618af3ab444fe6562deec83ebb5502fbf.html

NOTE: Parsed output for "xmllint --html" vs "xmlllint --html --push[small]" differs, indicating that there is a bug in the HTML parser that causes this fuzzer document to be parsed differently depending on the mode.  For that reason, we can't land a regression test in test/HTML/764615.html for this because one test result can't match two different expected outputs.
Comment 4 David Kilzer 2016-04-05 15:09:40 UTC
Note that runtest doesn't currently have a mode to test htmlReadMemory(), so I filed:

Bug 764651: runtest should test htmlReadMemory() via "HTML regression tests on memory" suite
<https://bugzilla.gnome.org/show_bug.cgi?id=764651>

That bug has a patch attached that implements this test mode, but there are one or two additional bugs that need to be fixed before it lands (since two tests fail in that test mode).
Comment 5 David Kilzer 2016-04-05 15:37:15 UTC
Created attachment 325441 [details] [review]
Patch v1

Patch without regression test.

The PoC can't be used as a regression test because parsing results differ between "HTML regression tests" and "Push HTML regression tests" in runtest (see Comment #3).

Also, runtest won't exercise the broken code path because it doesn't test htmlReadMemory() (see Comment #4).

I don't think Pranjal or I attempted to reduce this particular test case.
Comment 6 David Kilzer 2016-04-28 20:51:47 UTC
(In reply to David Kilzer from comment #5)
> Created attachment 325441 [details] [review] [review]
> Patch v1
> 
> Patch without regression test.
> 
> The PoC can't be used as a regression test because parsing results differ
> between "HTML regression tests" and "Push HTML regression tests" in runtest
> (see Comment #3).
> 
> Also, runtest won't exercise the broken code path because it doesn't test
> htmlReadMemory() (see Comment #4).
> 
> I don't think Pranjal or I attempted to reduce this particular test case.

This fix is not a complete fix, an may have caused regressions (or simply shifted the issue).

We have additional fuzzer test cases that still trigger the assert() added with this patch, and leads to an infinite loop when parsing.  In a nutshell, xmlParserInputGrow() does not manage xmlParserInputPtr state properly when xmlParserInputBufferGrow() hits parsing errors.

Pranjal and I are still investigating.
Comment 7 David Kilzer 2016-04-28 20:52:25 UTC
Created attachment 326966 [details]
PoC #1 triggering assert() with Patch v1
Comment 8 David Kilzer 2016-04-28 20:52:44 UTC
Created attachment 326967 [details]
PoC #2 triggering assert() with Patch v1
Comment 9 David Kilzer 2016-05-03 04:13:36 UTC
Created attachment 327191 [details] [review]
Patch v2

Much better patch that addresses the crashes in all the PoCs.
Comment 10 David Kilzer 2016-05-03 04:19:42 UTC
(In reply to David Kilzer from comment #9)
> Created attachment 327191 [details] [review] [review]
> Patch v2
> 
> Much better patch that addresses the crashes in all the PoCs.

Note that these two command-line invocations produce different results, though, which appears to be a different bug (but orthogonal to this issue) with Attachment #325399 [details]:

./xmllint --html crash-75c7037618af3ab444fe6562deec83ebb5502fbf.html
./xmllint --html --memory crash-75c7037618af3ab444fe6562deec83ebb5502fbf.html

Because of this, I can't add crash-75c7037618af3ab444fe6562deec83ebb5502fbf.html as a test case.

Attachment #326966 [details] also exhibits different results using "xmllint --html" vs. "xmllint --html --memory".

Attachment #326967 [details] produces the same results for both.
Comment 11 David Kilzer 2016-05-03 10:35:50 UTC
(In reply to David Kilzer from comment #10)
> (In reply to David Kilzer from comment #9)
> > Created attachment 327191 [details] [review] [review] [review]
> > Patch v2
> > 
> > Much better patch that addresses the crashes in all the PoCs.
> 
> Note that these two command-line invocations produce different results,
> though, which appears to be a different bug (but orthogonal to this issue)
> with Attachment #325399 [details]:
> 
> ./xmllint --html crash-75c7037618af3ab444fe6562deec83ebb5502fbf.html
> ./xmllint --html --memory crash-75c7037618af3ab444fe6562deec83ebb5502fbf.html
> 
> Because of this, I can't add
> crash-75c7037618af3ab444fe6562deec83ebb5502fbf.html as a test case.
> 
> Attachment #326966 [details] also exhibits different results using "xmllint
> --html" vs. "xmllint --html --memory".
> 
> Attachment #326967 [details] produces the same results for both.

I forgot to mention:  I think the reason the results are different between "--html" and "--html --memory" is that one command assumes the XML document being parsed is UTF-8 (until proven otherwise) before it even starts looking at the contents, while the other command does not.  I don't recall which is which, but stepping through the initialization code for each document parser will give the answer.
Comment 12 David Kilzer 2016-05-03 10:39:05 UTC
(In reply to David Kilzer from comment #9)
> Created attachment 327191 [details] [review] [review]
> Patch v2
> 
> Much better patch that addresses the crashes in all the PoCs.

One way that xmlSwitchInputEncodingInt() could be improved is to "undo" the attempt at switching input encoding when it fails.  I'm not sure what the implications of this are, though, and that change is much more risky than this one (especially if you're porting the change back to older versions of libxml2), so we did not attempt such a change.
Comment 13 David Kilzer 2016-05-15 02:44:51 UTC
The follow-up fix is tracked by Bug 765468.

Please hold this fix and Bug 765468 until July.
Comment 14 Daniel Veillard 2016-05-16 02:32:36 UTC
David, w.r.t. Bug 765468 see the fix I just provided, and please comment
on my request to be able to push that one, as the function is broken it
doesn't follow its documented behaviour in case of conversion error,

 thanks,

Daniel
Comment 15 David Kilzer 2016-05-17 11:20:57 UTC
(In reply to David Kilzer from comment #12)
> (In reply to David Kilzer from comment #9)
> > Created attachment 327191 [details] [review] [review] [review]
> > Patch v2
> > 
> > Much better patch that addresses the crashes in all the PoCs.
> 
> One way that xmlSwitchInputEncodingInt() could be improved is to "undo" the
> attempt at switching input encoding when it fails.  I'm not sure what the
> implications of this are, though, and that change is much more risky than
> this one (especially if you're porting the change back to older versions of
> libxml2), so we did not attempt such a change.

As it turns out, Bug 765468 requires us to make the above change (undo-ing the attempt to switch input encoding) to fix another PoC found by fuzzing.

(In reply to David Kilzer from comment #13)
> The follow-up fix is tracked by Bug 765468.

More specifically, the fix for Bug 765468 depends on Attachment 327191 [details] (Patch v2) being applied first.

(In reply to Daniel Veillard from comment #14)
> David, w.r.t. Bug 765468 see the fix I just provided, and please comment
> on my request to be able to push that one, as the function is broken it
> doesn't follow its documented behaviour in case of conversion error,

Okay, it sounds like we should probably dupe Bug 765468 to this one and combine the patches.

I'll post the fix for Bug 765468 in a few hours so you can look, then post a combined one here.
Comment 16 David Kilzer 2016-05-17 12:18:37 UTC
Created attachment 328069 [details] [review]
Patch v2 combined with fix for Bug 765468

This is Patch v2 (Attachment 327191 [details]) combined with fix for Bug 765468 (Attachment 328068 [details]).

I'm not terribly happy with the dual "if (nbchars < 0)" checks.  It may be clearer just to have one check and duplicate the xmlBufResetInput() calls instead.
Comment 17 David Kilzer 2017-02-08 20:52:41 UTC
*** Bug 763686 has been marked as a duplicate of this bug. ***
Comment 18 David Kilzer 2017-02-08 20:54:25 UTC
*** Bug 765468 has been marked as a duplicate of this bug. ***
Comment 19 David Kilzer 2017-02-08 20:57:31 UTC
*** Bug 775065 has been marked as a duplicate of this bug. ***
Comment 20 David Kilzer 2017-02-09 00:50:39 UTC
*** Bug 769185 has been marked as a duplicate of this bug. ***
Comment 21 David Kilzer 2017-02-09 01:14:38 UTC
*** Bug 769187 has been marked as a duplicate of this bug. ***
Comment 22 David Kilzer 2017-03-19 09:25:51 UTC
*** Bug 769186 has been marked as a duplicate of this bug. ***
Comment 23 David Kilzer 2017-06-12 02:09:12 UTC
Nick landed a partial fix for this bug here:

https://git.gnome.org/browse/libxml2/commit/?id=f9e7997e803457b714352c4d51a96104ae298d94
Comment 24 Daniel Veillard 2017-07-25 07:54:03 UTC
David, I tried to reproduce, basically "make asan", then ran the
 ./xmllint  --html --memory crash-75c7037618af3ab444fe6562deec83ebb5502fbf.html

command, no complain from asan as far as I can tell. Same for the two other
reproducers crash-46da6... and crash-c728.

Is Asan just failing on me or did we ended up plugging all holes for this ?

Daniel
Comment 25 Tobias Mueller 2018-08-17 15:40:00 UTC
Let's assume this to be fixed. Otherwise, please reopen.