GNOME Bugzilla – Bug 764615
libxml2 heap-buffer-overflow in htmlCurrentChar
Last modified: 2018-08-17 15:40:00 UTC
Placeholder for security bug.
================================================================= ==48940==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x62100001a100 at pc 0x0001000feffc bp 0x7fff5fbfed30 sp 0x7fff5fbfed28 READ of size 1 at 0x62100001a100 thread T0 #0 0x1000feffb in htmlCurrentChar (libxml_internal/libxml2/.libs/libxml2.2.dylib+0xfbffb) #1 0x1000ffe60 in htmlParseCharDataInternal (libxml_internal/libxml2/.libs/libxml2.2.dylib+0xfce60) #2 0x1000eafe3 in htmlParseContentInternal (libxml_internal/libxml2/.libs/libxml2.2.dylib+0xe7fe3) #3 0x1000ee273 in htmlParseDocument (libxml_internal/libxml2/.libs/libxml2.2.dylib+0xeb273) #4 0x1000fdaf6 in htmlDoRead (libxml_internal/libxml2/.libs/libxml2.2.dylib+0xfaaf6) #5 0x100000c0c in main libxml_internal/libxml2/parse2.cpp:34:11 LLVMSymbolizer: error reading file: No object file for requested architecture. #6 0x7fff992735ac (/usr/lib/system/libdyld.dylib+0x35ac) #7 0x1 (libxml_internal/libxml2/./parse2cpp+0x1) 0x62100001a100 is located 0 bytes to the right of 4096-byte region [0x621000019100,0x62100001a100) allocated by thread T0 here: #0 0x1004cab20 in wrap_malloc (/usr/local/lib/clang/3.9.0/lib/darwin/libclang_rt.asan_osx_dynamic.dylib+0x4ab20) #1 0x1001ae4a3 in xmlBufCreate (libxml_internal/libxml2/.libs/libxml2.2.dylib+0x1ab4a3) #2 0x10001dccc in xmlSwitchInputEncodingInt (libxml_internal/libxml2/.libs/libxml2.2.dylib+0x1accc) #3 0x10001e04e in xmlSwitchToEncoding (libxml_internal/libxml2/.libs/libxml2.2.dylib+0x1b04e) #4 0x1000fee81 in htmlCurrentChar (libxml_internal/libxml2/.libs/libxml2.2.dylib+0xfbe81) #5 0x1000ffe60 in htmlParseCharDataInternal (libxml_internal/libxml2/.libs/libxml2.2.dylib+0xfce60) #6 0x1000eafe3 in htmlParseContentInternal (libxml_internal/libxml2/.libs/libxml2.2.dylib+0xe7fe3) #7 0x1000ee273 in htmlParseDocument (libxml_internal/libxml2/.libs/libxml2.2.dylib+0xeb273) #8 0x1000fdaf6 in htmlDoRead (libxml_internal/libxml2/.libs/libxml2.2.dylib+0xfaaf6) #9 0x100000c0c in main libxml_internal/libxml2/parse2.cpp:34:11 LLVMSymbolizer: error reading file: No object file for requested architecture. #10 0x7fff992735ac (/usr/lib/system/libdyld.dylib+0x35ac) #11 0x1 (libxml_internal/libxml2/./parse2cpp+0x1)
Created attachment 325399 [details] PoC (not reduced)
Steps to reproduce (with ASan): ./xmllint --html --memory crash-75c7037618af3ab444fe6562deec83ebb5502fbf.html Does not reproduce with: ./xmllint --html crash-75c7037618af3ab444fe6562deec83ebb5502fbf.html ./xmllint --html --push crash-75c7037618af3ab444fe6562deec83ebb5502fbf.html ./xmllint --html --pushsmall crash-75c7037618af3ab444fe6562deec83ebb5502fbf.html NOTE: Parsed output for "xmllint --html" vs "xmlllint --html --push[small]" differs, indicating that there is a bug in the HTML parser that causes this fuzzer document to be parsed differently depending on the mode. For that reason, we can't land a regression test in test/HTML/764615.html for this because one test result can't match two different expected outputs.
Note that runtest doesn't currently have a mode to test htmlReadMemory(), so I filed: Bug 764651: runtest should test htmlReadMemory() via "HTML regression tests on memory" suite <https://bugzilla.gnome.org/show_bug.cgi?id=764651> That bug has a patch attached that implements this test mode, but there are one or two additional bugs that need to be fixed before it lands (since two tests fail in that test mode).
Created attachment 325441 [details] [review] Patch v1 Patch without regression test. The PoC can't be used as a regression test because parsing results differ between "HTML regression tests" and "Push HTML regression tests" in runtest (see Comment #3). Also, runtest won't exercise the broken code path because it doesn't test htmlReadMemory() (see Comment #4). I don't think Pranjal or I attempted to reduce this particular test case.
(In reply to David Kilzer from comment #5) > Created attachment 325441 [details] [review] [review] > Patch v1 > > Patch without regression test. > > The PoC can't be used as a regression test because parsing results differ > between "HTML regression tests" and "Push HTML regression tests" in runtest > (see Comment #3). > > Also, runtest won't exercise the broken code path because it doesn't test > htmlReadMemory() (see Comment #4). > > I don't think Pranjal or I attempted to reduce this particular test case. This fix is not a complete fix, an may have caused regressions (or simply shifted the issue). We have additional fuzzer test cases that still trigger the assert() added with this patch, and leads to an infinite loop when parsing. In a nutshell, xmlParserInputGrow() does not manage xmlParserInputPtr state properly when xmlParserInputBufferGrow() hits parsing errors. Pranjal and I are still investigating.
Created attachment 326966 [details] PoC #1 triggering assert() with Patch v1
Created attachment 326967 [details] PoC #2 triggering assert() with Patch v1
Created attachment 327191 [details] [review] Patch v2 Much better patch that addresses the crashes in all the PoCs.
(In reply to David Kilzer from comment #9) > Created attachment 327191 [details] [review] [review] > Patch v2 > > Much better patch that addresses the crashes in all the PoCs. Note that these two command-line invocations produce different results, though, which appears to be a different bug (but orthogonal to this issue) with Attachment #325399 [details]: ./xmllint --html crash-75c7037618af3ab444fe6562deec83ebb5502fbf.html ./xmllint --html --memory crash-75c7037618af3ab444fe6562deec83ebb5502fbf.html Because of this, I can't add crash-75c7037618af3ab444fe6562deec83ebb5502fbf.html as a test case. Attachment #326966 [details] also exhibits different results using "xmllint --html" vs. "xmllint --html --memory". Attachment #326967 [details] produces the same results for both.
(In reply to David Kilzer from comment #10) > (In reply to David Kilzer from comment #9) > > Created attachment 327191 [details] [review] [review] [review] > > Patch v2 > > > > Much better patch that addresses the crashes in all the PoCs. > > Note that these two command-line invocations produce different results, > though, which appears to be a different bug (but orthogonal to this issue) > with Attachment #325399 [details]: > > ./xmllint --html crash-75c7037618af3ab444fe6562deec83ebb5502fbf.html > ./xmllint --html --memory crash-75c7037618af3ab444fe6562deec83ebb5502fbf.html > > Because of this, I can't add > crash-75c7037618af3ab444fe6562deec83ebb5502fbf.html as a test case. > > Attachment #326966 [details] also exhibits different results using "xmllint > --html" vs. "xmllint --html --memory". > > Attachment #326967 [details] produces the same results for both. I forgot to mention: I think the reason the results are different between "--html" and "--html --memory" is that one command assumes the XML document being parsed is UTF-8 (until proven otherwise) before it even starts looking at the contents, while the other command does not. I don't recall which is which, but stepping through the initialization code for each document parser will give the answer.
(In reply to David Kilzer from comment #9) > Created attachment 327191 [details] [review] [review] > Patch v2 > > Much better patch that addresses the crashes in all the PoCs. One way that xmlSwitchInputEncodingInt() could be improved is to "undo" the attempt at switching input encoding when it fails. I'm not sure what the implications of this are, though, and that change is much more risky than this one (especially if you're porting the change back to older versions of libxml2), so we did not attempt such a change.
The follow-up fix is tracked by Bug 765468. Please hold this fix and Bug 765468 until July.
David, w.r.t. Bug 765468 see the fix I just provided, and please comment on my request to be able to push that one, as the function is broken it doesn't follow its documented behaviour in case of conversion error, thanks, Daniel
(In reply to David Kilzer from comment #12) > (In reply to David Kilzer from comment #9) > > Created attachment 327191 [details] [review] [review] [review] > > Patch v2 > > > > Much better patch that addresses the crashes in all the PoCs. > > One way that xmlSwitchInputEncodingInt() could be improved is to "undo" the > attempt at switching input encoding when it fails. I'm not sure what the > implications of this are, though, and that change is much more risky than > this one (especially if you're porting the change back to older versions of > libxml2), so we did not attempt such a change. As it turns out, Bug 765468 requires us to make the above change (undo-ing the attempt to switch input encoding) to fix another PoC found by fuzzing. (In reply to David Kilzer from comment #13) > The follow-up fix is tracked by Bug 765468. More specifically, the fix for Bug 765468 depends on Attachment 327191 [details] (Patch v2) being applied first. (In reply to Daniel Veillard from comment #14) > David, w.r.t. Bug 765468 see the fix I just provided, and please comment > on my request to be able to push that one, as the function is broken it > doesn't follow its documented behaviour in case of conversion error, Okay, it sounds like we should probably dupe Bug 765468 to this one and combine the patches. I'll post the fix for Bug 765468 in a few hours so you can look, then post a combined one here.
Created attachment 328069 [details] [review] Patch v2 combined with fix for Bug 765468 This is Patch v2 (Attachment 327191 [details]) combined with fix for Bug 765468 (Attachment 328068 [details]). I'm not terribly happy with the dual "if (nbchars < 0)" checks. It may be clearer just to have one check and duplicate the xmlBufResetInput() calls instead.
*** Bug 763686 has been marked as a duplicate of this bug. ***
*** Bug 765468 has been marked as a duplicate of this bug. ***
*** Bug 775065 has been marked as a duplicate of this bug. ***
*** Bug 769185 has been marked as a duplicate of this bug. ***
*** Bug 769187 has been marked as a duplicate of this bug. ***
*** Bug 769186 has been marked as a duplicate of this bug. ***
Nick landed a partial fix for this bug here: https://git.gnome.org/browse/libxml2/commit/?id=f9e7997e803457b714352c4d51a96104ae298d94
David, I tried to reproduce, basically "make asan", then ran the ./xmllint --html --memory crash-75c7037618af3ab444fe6562deec83ebb5502fbf.html command, no complain from asan as far as I can tell. Same for the two other reproducers crash-46da6... and crash-c728. Is Asan just failing on me or did we ended up plugging all holes for this ? Daniel
Let's assume this to be fixed. Otherwise, please reopen.