GNOME Bugzilla – Bug 749115
Heap-buffer overread in libxml2/dict.c on fuzzed html input
Last modified: 2016-05-27 22:04:06 UTC
Git versions of gtk, glib, goffice, gnumeric, libgsf and libxml2.

Test case: http://jutaky.com/fuzzing/gnumeric_case_24050_1738.html

The HTML file is the test case, use wget or similar to download it. Xmllint doesn't seem to crash or misbehave on this file.

ssconvert gnumeric_case_24050_1738.html /tmp/out.gnumeric

==25770==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x62100007d0fc at pc 0x7fbdb6bae1a9 bp 0x7fff49ff9000 sp 0x7fff49ff8ff0
READ of size 1 at 0x62100007d0fc thread T0
    #0 0x7fbdb6bae1a8 in xmlDictComputeFastKey /gnumeric/libxml2/dict.c:448
    #1 0x7fbdb6bb06b2 in xmlDictLookup__internal_alias /gnumeric/libxml2/dict.c:848
    #2 0x7fbdb69af56b in htmlParseNameComplex /gnumeric/libxml2/HTMLparser.c:2517
    #3 0x7fbdb69aeb9d in htmlParseName /gnumeric/libxml2/HTMLparser.c:2483
    #4 0x7fbdb69b99b1 in htmlParseDocTypeDecl /gnumeric/libxml2/HTMLparser.c:3398
    #5 0x7fbdb69c8264 in htmlParseTryOrFinish /gnumeric/libxml2/HTMLparser.c:5370
    #6 0x7fbdb69cccd4 in htmlParseChunk__internal_alias /gnumeric/libxml2/HTMLparser.c:6070
    #7 0x7fbd930defa3 in html_file_open /gnumeric/gnumeric/plugins/html/html_read.c:553
    #8 0x7fbdb7c063af in go_plugin_loader_module_func_file_open app/go-plugin-loader-module.c:282
    #9 0x7fbdb7c0c4fa in go_plugin_file_opener_open app/go-plugin-service.c:685
    #10 0x7fbdb7c14550 in go_file_opener_open app/file.c:417
    #11 0x7fbdb8af229a in workbook_view_new_from_input /gnumeric/gnumeric/src/workbook-view.c:1278
    #12 0x7fbdb8af2734 in workbook_view_new_from_uri /gnumeric/gnumeric/src/workbook-view.c:1337
    #13 0x4080cb in convert /gnumeric/gnumeric/src/ssconvert.c:715
    #14 0x409439 in main /gnumeric/gnumeric/src/ssconvert.c:903
    #15 0x7fbdb122c7ff in __libc_start_main (/usr/lib/libc.so.6+0x207ff)
    #16 0x4040f8 in _start (/apps/bin/ssconvert+0x4040f8)

0x62100007d0fc is located 4 bytes to the left of 4096-byte region [0x62100007d100,0x62100007e100)
allocated by thread T0 here:
    #0 0x7fbdb95dd7a7 in malloc (/usr/lib/libasan.so.1+0x577a7)
    #1 0x7fbdb6a88271 in xmlBufCreate /gnumeric/libxml2/buf.c:137
    #2 0x7fbdb68b5197 in xmlSwitchInputEncodingInt /gnumeric/libxml2/parserInternals.c:1196
    #3 0x7fbdb68b555a in xmlSwitchToEncodingInt /gnumeric/libxml2/parserInternals.c:1272
    #4 0x7fbdb68b4a95 in xmlSwitchEncoding__internal_alias /gnumeric/libxml2/parserInternals.c:1100
    #5 0x7fbdb69a7e40 in htmlCurrentChar /gnumeric/libxml2/HTMLparser.c:518
    #6 0x7fbdb69af2a3 in htmlParseNameComplex /gnumeric/libxml2/HTMLparser.c:2515
    #7 0x7fbdb69aeb9d in htmlParseName /gnumeric/libxml2/HTMLparser.c:2483
    #8 0x7fbdb69b99b1 in htmlParseDocTypeDecl /gnumeric/libxml2/HTMLparser.c:3398
    #9 0x7fbdb69c8264 in htmlParseTryOrFinish /gnumeric/libxml2/HTMLparser.c:5370
    #10 0x7fbdb69cccd4 in htmlParseChunk__internal_alias /gnumeric/libxml2/HTMLparser.c:6070
    #11 0x7fbd930defa3 in html_file_open /gnumeric/gnumeric/plugins/html/html_read.c:553
    #12 0x7fbdb7c063af in go_plugin_loader_module_func_file_open app/go-plugin-loader-module.c:282
    #13 0x7fbdb7c0c4fa in go_plugin_file_opener_open app/go-plugin-service.c:685
    #14 0x7fbdb7c14550 in go_file_opener_open app/file.c:417
    #15 0x7fbdb8af229a in workbook_view_new_from_input /gnumeric/gnumeric/src/workbook-view.c:1278
    #16 0x7fbdb8af2734 in workbook_view_new_from_uri /gnumeric/gnumeric/src/workbook-view.c:1337
    #17 0x4080cb in convert /gnumeric/gnumeric/src/ssconvert.c:715
    #18 0x409439 in main /gnumeric/gnumeric/src/ssconvert.c:903
    #19 0x7fbdb122c7ff in __libc_start_main (/usr/lib/libc.so.6+0x207ff)

--
Juha Kylmänen
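Reading the two stacks in the report together: the 4096-byte block is allocated from htmlParseNameComplex (HTMLparser.c:2515, via htmlCurrentChar and xmlSwitchEncoding), and the bad read happens two lines later (HTMLparser.c:2517, via xmlDictLookup), 4 bytes before that block. One reading, offered only as a hypothesis, is that a name length counted against the old input buffer is subtracted from a read position that now points into the new one. The sketch below models that pattern in isolation. The struct and function names are made up for illustration and are not libxml2's, and it assumes a 4-byte name with the read position at the start of the fresh buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a parser input stream; NOT libxml2's struct. */
struct input {
    const unsigned char *base; /* start of the current buffer */
    const unsigned char *cur;  /* read position inside that buffer */
    size_t size;               /* bytes available from base */
};

/* Offset of the name start relative to base when computed as cur - len.
 * A negative result means the name would start before the buffer. */
static ptrdiff_t name_start_offset(const struct input *in, size_t len)
{
    assert(in != NULL && in->base != NULL && in->cur != NULL);
    return (in->cur - in->base) - (ptrdiff_t)len;
}

/* Returns 1 if the span [cur - len, cur) lies inside the current buffer. */
static int name_span_is_valid(const struct input *in, size_t len)
{
    ptrdiff_t start = name_start_offset(in, len);
    return start >= 0 && (size_t)(in->cur - in->base) <= in->size;
}

/* Models a buffer switch (assumption: the read position restarts at the
 * beginning of the fresh buffer while the caller keeps its old len). */
static void switch_buffer(struct input *in, const unsigned char *fresh,
                          size_t size)
{
    in->base = fresh;
    in->cur = fresh;
    in->size = size;
}
```

With a 4-byte name already consumed from a first buffer, name_span_is_valid() holds. After switch_buffer() the same length gives name_start_offset() == -4, which lines up with the "4 bytes to the left" in the report. A check of this kind before the dictionary lookup is one possible guard; whether it matches the eventual upstream fix is not established in this thread.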
Created attachment 303125
foo.c

Small test program
It turns out that this can indeed be provoked by xmllint:

# valgrind xmllint --push --html ~/Downloads/gnumeric_case_24050_1738.html
==9637== Memcheck, a memory error detector
==9637== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==9637== Using Valgrind-3.10.0.SVN and LibVEX; rerun with -h for copyright info
==9637== Command: xmllint --push --html /home/welinder/Downloads/gnumeric_case_24050_1738.html
==9637==
==9637== Invalid read of size 1
==9637==    at 0x51798C5: xmlDictComputeFastKey.isra.2 (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x517A4BA: xmlDictLookup (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x50F1229: htmlParseName (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x50AAB72: htmlParseDocTypeDecl (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x50F6C3C: htmlParseChunk (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x111D56: ??? (in /usr/bin/xmllint)
==9637==    by 0x10F329: ??? (in /usr/bin/xmllint)
==9637==    by 0x5404EC4: (below main) (libc-start.c:287)
==9637==  Address 0x631a9ec is 4 bytes before a block of size 4,096 alloc'd
==9637==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==9637==    by 0x512AA81: xmlBufCreate (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x50B2733: xmlSwitchInputEncodingInt (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x50B3B18: xmlSwitchEncoding (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x50F0C44: htmlCurrentChar (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x50F0FE4: htmlParseName (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x50AAB72: htmlParseDocTypeDecl (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x50F6C3C: htmlParseChunk (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x111D56: ??? (in /usr/bin/xmllint)
==9637==    by 0x10F329: ??? (in /usr/bin/xmllint)
==9637==    by 0x5404EC4: (below main) (libc-start.c:287)
==9637==
==9637== Invalid read of size 1
==9637==    at 0x5179906: xmlDictComputeFastKey.isra.2 (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x517A4BA: xmlDictLookup (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x50F1229: htmlParseName (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x50AAB72: htmlParseDocTypeDecl (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x50F6C3C: htmlParseChunk (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x111D56: ??? (in /usr/bin/xmllint)
==9637==    by 0x10F329: ??? (in /usr/bin/xmllint)
==9637==    by 0x5404EC4: (below main) (libc-start.c:287)
==9637==  Address 0x631a9ef is 1 bytes before a block of size 4,096 alloc'd
==9637==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==9637==    by 0x512AA81: xmlBufCreate (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x50B2733: xmlSwitchInputEncodingInt (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x50B3B18: xmlSwitchEncoding (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x50F0C44: htmlCurrentChar (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x50F0FE4: htmlParseName (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x50AAB72: htmlParseDocTypeDecl (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x50F6C3C: htmlParseChunk (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x111D56: ??? (in /usr/bin/xmllint)
==9637==    by 0x10F329: ??? (in /usr/bin/xmllint)
==9637==    by 0x5404EC4: (below main) (libc-start.c:287)
==9637==
==9637== Invalid read of size 1
==9637==    at 0x517990D: xmlDictComputeFastKey.isra.2 (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x517A4BA: xmlDictLookup (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x50F1229: htmlParseName (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x50AAB72: htmlParseDocTypeDecl (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x50F6C3C: htmlParseChunk (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x111D56: ??? (in /usr/bin/xmllint)
==9637==    by 0x10F329: ??? (in /usr/bin/xmllint)
==9637==    by 0x5404EC4: (below main) (libc-start.c:287)
==9637==  Address 0x631a9ee is 2 bytes before a block of size 4,096 alloc'd
==9637==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==9637==    by 0x512AA81: xmlBufCreate (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x50B2733: xmlSwitchInputEncodingInt (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x50B3B18: xmlSwitchEncoding (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x50F0C44: htmlCurrentChar (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x50F0FE4: htmlParseName (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x50AAB72: htmlParseDocTypeDecl (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x50F6C3C: htmlParseChunk (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x111D56: ??? (in /usr/bin/xmllint)
==9637==    by 0x10F329: ??? (in /usr/bin/xmllint)
==9637==    by 0x5404EC4: (below main) (libc-start.c:287)
==9637==
==9637== Invalid read of size 1
==9637==    at 0x5179914: xmlDictComputeFastKey.isra.2 (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x517A4BA: xmlDictLookup (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x50F1229: htmlParseName (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x50AAB72: htmlParseDocTypeDecl (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x50F6C3C: htmlParseChunk (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x111D56: ??? (in /usr/bin/xmllint)
==9637==    by 0x10F329: ??? (in /usr/bin/xmllint)
==9637==    by 0x5404EC4: (below main) (libc-start.c:287)
==9637==  Address 0x631a9ed is 3 bytes before a block of size 4,096 alloc'd
==9637==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==9637==    by 0x512AA81: xmlBufCreate (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x50B2733: xmlSwitchInputEncodingInt (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x50B3B18: xmlSwitchEncoding (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x50F0C44: htmlCurrentChar (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x50F0FE4: htmlParseName (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x50AAB72: htmlParseDocTypeDecl (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x50F6C3C: htmlParseChunk (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x111D56: ??? (in /usr/bin/xmllint)
==9637==    by 0x10F329: ??? (in /usr/bin/xmllint)
==9637==    by 0x5404EC4: (below main) (libc-start.c:287)
==9637==
==9637== Invalid read of size 2
==9637==    at 0x4C2F7E0: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==9637==    by 0x517983D: xmlDictAddString.isra.0 (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x517A3F1: xmlDictLookup (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x50F1229: htmlParseName (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x50AAB72: htmlParseDocTypeDecl (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x50F6C3C: htmlParseChunk (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x111D56: ??? (in /usr/bin/xmllint)
==9637==    by 0x10F329: ??? (in /usr/bin/xmllint)
==9637==    by 0x5404EC4: (below main) (libc-start.c:287)
==9637==  Address 0x631a9ec is 4 bytes before a block of size 4,096 alloc'd
==9637==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==9637==    by 0x512AA81: xmlBufCreate (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x50B2733: xmlSwitchInputEncodingInt (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x50B3B18: xmlSwitchEncoding (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x50F0C44: htmlCurrentChar (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x50F0FE4: htmlParseName (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x50AAB72: htmlParseDocTypeDecl (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x50F6C3C: htmlParseChunk (in /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.1)
==9637==    by 0x111D56: ??? (in /usr/bin/xmllint)
==9637==    by 0x10F329: ??? (in /usr/bin/xmllint)
==9637==    by 0x5404EC4: (below main) (libc-start.c:287)
==9637==
/home/welinder/Downloads/gnumeric_case_24050_1738.html:1: HTML parser error : DOCTYPE improperly terminated
â </</body></html>
^
<!DOCTYPE >
<html><body><p>/</p></body></html>
==9637==
==9637== HEAP SUMMARY:
==9637==     in use at exit: 0 bytes in 0 blocks
==9637==   total heap usage: 64 allocs, 64 frees, 31,510 bytes allocated
==9637==
==9637== All heap blocks were freed -- no leaks are possible
==9637==
==9637== For counts of detected and suppressed errors, rerun with: -v
==9637== ERROR SUMMARY: 6 errors from 5 contexts (suppressed: 0 from 0)
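The four one-byte reads in xmlDictComputeFastKey are reported at 0x631a9ec, 0x631a9ef, 0x631a9ee and 0x631a9ed, that is, at 4, 1, 2 and 3 bytes before the block. Treating 0x631a9ec as the start of the key, that is index order 0, 3, 2, 1, which would be consistent with a hash over a 4-byte key that reads the first byte and then walks the remaining bytes from the end. The toy hash below only illustrates that access order. It is not libxml2's implementation, and every name in it is made up for the example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TRACE 16

/* Records which key indices were read, in the order they were read. */
struct read_trace {
    size_t idx[MAX_TRACE];
    size_t count;
};

/* Reads key[i] and logs the index so the access order can be inspected. */
static unsigned char traced_read(const unsigned char *key, size_t i,
                                 struct read_trace *t)
{
    assert(t->count < MAX_TRACE);
    t->idx[t->count++] = i;
    return key[i];
}

/* Toy hash (NOT libxml2's): take the first byte, shift it, then add the
 * remaining bytes from the last one down to index 1. */
static unsigned long toy_fast_key(const unsigned char *key, size_t len,
                                  struct read_trace *t)
{
    unsigned long value;
    size_t i;

    assert(key != NULL && len >= 1 && len <= MAX_TRACE);
    value = traced_read(key, 0, t);
    value <<= 5;
    for (i = len - 1; i >= 1; i--)
        value += traced_read(key, i, t);
    return value;
}
```

For a 4-byte key the recorded indices are 0, 3, 2, 1. Added to a key start of 0x631a9ec they give the four addresses in the order Memcheck reports them. The later "Invalid read of size 2" inside memcpy starts at the same address, 0x631a9ec, so the copy appears to read from the same span.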
Created attachment 303126
Test file (bogus html)

A copy of the fuzzed html file.
Ah, I failed with the xmllint command. I didn't use the --html switch.

$ xmllint --html gnumeric_case_24050_1738.html
=================================================================
==32204==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x62100001b8fc at pc 0x7f5e2180f1a9 bp 0x7ffc71785490 sp 0x7ffc71785480
READ of size 1 at 0x62100001b8fc thread T0
    #0 0x7f5e2180f1a8 in xmlDictComputeFastKey gnumeric/libxml2/dict.c:448
    #1 0x7f5e218116b2 in xmlDictLookup__internal_alias gnumeric/libxml2/dict.c:848
    #2 0x7f5e2161056b in htmlParseNameComplex gnumeric/libxml2/HTMLparser.c:2517
    #3 0x7f5e2160fb9d in htmlParseName gnumeric/libxml2/HTMLparser.c:2483
    #4 0x7f5e2161a9b1 in htmlParseDocTypeDecl gnumeric/libxml2/HTMLparser.c:3398
    #5 0x7f5e21625591 in htmlParseDocument__internal_alias gnumeric/libxml2/HTMLparser.c:4717
    #6 0x7f5e21630e90 in htmlDoRead gnumeric/libxml2/HTMLparser.c:6707
    #7 0x7f5e21631082 in htmlReadFile__internal_alias gnumeric/libxml2/HTMLparser.c:6765
    #8 0x40bde0 in parseAndPrintFile gnumeric/libxml2/xmllint.c:2248
    #9 0x413c39 in main gnumeric/libxml2/xmllint.c:3759
    #10 0x7f5e207aa7ff in __libc_start_main (/usr/lib/libc.so.6+0x207ff)
    #11 0x404bd8 in _start (apps/bin/xmllint+0x404bd8)
valgrind complains too, I should be able to reproduce and fix !

thinkpad2:~/XML -> valgrind xmllint --html gnumeric_case_24050_1738.html
==25765== Invalid read of size 1
==25765==    at 0x4CAAD5: xmlDictComputeFastKey.isra.1 (dict.c:448)
==25765==    by 0x4CB6BC: xmlDictLookup (dict.c:848)
==25765==    by 0x44A7EA: htmlParseNameComplex (HTMLparser.c:2517)
==25765==    by 0x44A7EA: htmlParseName (HTMLparser.c:2483)
==25765==    by 0x402E02: htmlParseDocTypeDecl (HTMLparser.c:3398)
==25765==    by 0x44E6CB: htmlParseDocument (HTMLparser.c:4717)
==25765==    by 0x450C3E: htmlDoRead (HTMLparser.c:6707)
==25765==    by 0x406992: parseAndPrintFile (xmllint.c:2248)
==25765==    by 0x403EFC: main (xmllint.c:3759)
==25765==  Address 0x4c4083c is 4 bytes before a block of size 4,096 alloc'd

thanks !

Daniel
Has any work been done on this? I've been trying to perform some fuzz-testing of libxml2 on my own lately, but this issue is triggered extremely frequently and may be masking other bugs. If necessary, I can provide more test cases.
Hi Jurczyk, how are you generating the fuzz input files?
FTR, this has been assigned CVE-2015-8806 (https://marc.info/?l=oss-security&m=145451956918740&w=2)
I tested, and this is a duplicate of https://bugzilla.gnome.org/show_bug.cgi?id=758605, which already has its own CVE, CVE-2016-1839, so you may want to cancel that new one.

thanks,

Daniel

*** This bug has been marked as a duplicate of bug 758605 ***
I cannot confirm that as bug 758605 is restricted. Given the age of this bug that restriction does not seem proper.