GNOME Bugzilla – Bug 642065
Crashes caused by character encoding issues using HTML push parser
Last modified: 2021-07-05 13:22:47 UTC
Created attachment 180620
Test program and sample input to cause this bug

I was seeing sporadic crashes on certain HTML documents that happened to be encoded in gb2312. The crashes appear to be caused by dangling pointer references that result from switching the character encoding: the parser correctly detects the gb2312 encoding, but later switches back to another encoding and fails badly from then on, including the memory corruption shown in the log below.

I'm attaching a sample program along with a document that shows this problem. Note that, based on the valgrind log and given the number of function pointers floating around, I wouldn't be too shocked if this were exploitable with enough heap spraying.

Compile with:

gcc -I/usr/local/include/libxml2 -L/usr/local/lib xmltest.c -lxml2 -o xmltest

Valgrind log from running the attached example program:

$ valgrind ./xmltest
==3492== Memcheck, a memory error detector
==3492== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==3492== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for copyright info
==3492== Command: ./xmltest
==3492==
encoding error : input conversion failed due to input error, bytes 0xF8 0xA3 0xAD 0xCD
==3492== Invalid read of size 1
==3492==    at 0x40A68D5: htmlParseTryOrFinish (HTMLparser.c:5250)
==3492==    by 0x40A82CF: htmlParseChunk (HTMLparser.c:5947)
==3492==    by 0x80487E1: main (in /home/readams/xmltest/xmltest)
==3492==  Address 0x4352bca is 3,626 bytes inside a block of size 32,640 free'd
==3492==    at 0x40257ED: free (vg_replace_malloc.c:366)
==3492==    by 0x41C45BE: __gconv_close (gconv_close.c:56)
==3492==    by 0x41C3AFB: iconv_close (iconv_close.c:36)
==3492==    by 0x404BB0A: xmlCharEncCloseFunc (encoding.c:2375)
==3492==    by 0x40501B1: xmlSwitchInputEncodingInt (parserInternals.c:1183)
==3492==    by 0x405051C: xmlSwitchToEncodingInt (parserInternals.c:1313)
==3492==    by 0x4050137: xmlSwitchEncoding (parserInternals.c:1134)
==3492==    by 0x409D74B: htmlCurrentChar (HTMLparser.c:515)
==3492==    by 0x40A1427: htmlParseCharData (HTMLparser.c:2962)
==3492==    by 0x40A7B55: htmlParseTryOrFinish (HTMLparser.c:5713)
==3492==    by 0x40A82CF: htmlParseChunk (HTMLparser.c:5947)
==3492==    by 0x80487E1: main (in /home/readams/xmltest/xmltest)
==3492==
==3492== Invalid read of size 1
==3492==    at 0x40A76D9: htmlParseTryOrFinish (HTMLparser.c:5602)
==3492==    by 0x40A82CF: htmlParseChunk (HTMLparser.c:5947)
==3492==    by 0x80487E1: main (in /home/readams/xmltest/xmltest)
==3492==  Address 0x4359d34 is not stack'd, malloc'd or (recently) free'd
==3492==
==3492== Invalid read of size 1
==3492==    at 0x40A76E8: htmlParseTryOrFinish (HTMLparser.c:5603)
==3492==    by 0x40A82CF: htmlParseChunk (HTMLparser.c:5947)
==3492==    by 0x80487E1: main (in /home/readams/xmltest/xmltest)
==3492==  Address 0x4359d35 is not stack'd, malloc'd or (recently) free'd
==3492==
==3492== Invalid read of size 1
==3492==    at 0x409D30F: htmlCurrentChar (HTMLparser.c:436)
==3492==    by 0x40A109D: htmlParseCharData (HTMLparser.c:2928)
==3492==    by 0x40A7B55: htmlParseTryOrFinish (HTMLparser.c:5713)
==3492==    by 0x40A82CF: htmlParseChunk (HTMLparser.c:5947)
==3492==    by 0x80487E1: main (in /home/readams/xmltest/xmltest)
==3492==  Address 0x4359d34 is not stack'd, malloc'd or (recently) free'd
==3492==
==3492== Invalid read of size 1
==3492==    at 0x409D330: htmlCurrentChar (HTMLparser.c:438)
==3492==    by 0x40A109D: htmlParseCharData (HTMLparser.c:2928)
==3492==    by 0x40A7B55: htmlParseTryOrFinish (HTMLparser.c:5713)
==3492==    by 0x40A82CF: htmlParseChunk (HTMLparser.c:5947)
==3492==    by 0x80487E1: main (in /home/readams/xmltest/xmltest)
==3492==  Address 0x4359d35 is not stack'd, malloc'd or (recently) free'd
==3492==
==3492== Invalid read of size 1
==3492==    at 0x409D36B: htmlCurrentChar (HTMLparser.c:442)
==3492==    by 0x40A109D: htmlParseCharData (HTMLparser.c:2928)
==3492==    by 0x40A7B55: htmlParseTryOrFinish (HTMLparser.c:5713)
==3492==    by 0x40A82CF: htmlParseChunk (HTMLparser.c:5947)
==3492==    by 0x80487E1: main (in /home/readams/xmltest/xmltest)
==3492==  Address 0x4359d35 is not stack'd, malloc'd or (recently) free'd
==3492==
==3492== Invalid read of size 1
==3492==    at 0x409D85E: htmlCurrentChar (HTMLparser.c:547)
==3492==    by 0x40A109D: htmlParseCharData (HTMLparser.c:2928)
==3492==    by 0x40A7B55: htmlParseTryOrFinish (HTMLparser.c:5713)
==3492==    by 0x40A82CF: htmlParseChunk (HTMLparser.c:5947)
==3492==    by 0x80487E1: main (in /home/readams/xmltest/xmltest)
==3492==  Address 0x4359d37 is not stack'd, malloc'd or (recently) free'd
==3492==
==3492== Invalid read of size 1
==3492==    at 0x409D873: htmlCurrentChar (HTMLparser.c:547)
==3492==    by 0x40A109D: htmlParseCharData (HTMLparser.c:2928)
==3492==    by 0x40A7B55: htmlParseTryOrFinish (HTMLparser.c:5713)
==3492==    by 0x40A82CF: htmlParseChunk (HTMLparser.c:5947)
==3492==    by 0x80487E1: main (in /home/readams/xmltest/xmltest)
==3492==  Address 0x4359d36 is not stack'd, malloc'd or (recently) free'd
==3492==
==3492== Invalid read of size 1
==3492==    at 0x409D888: htmlCurrentChar (HTMLparser.c:546)
==3492==    by 0x40A109D: htmlParseCharData (HTMLparser.c:2928)
==3492==    by 0x40A7B55: htmlParseTryOrFinish (HTMLparser.c:5713)
==3492==    by 0x40A82CF: htmlParseChunk (HTMLparser.c:5947)
==3492==    by 0x80487E1: main (in /home/readams/xmltest/xmltest)
==3492==  Address 0x4359d35 is not stack'd, malloc'd or (recently) free'd
==3492==
==3492== Invalid read of size 1
==3492==    at 0x409D89A: htmlCurrentChar (HTMLparser.c:546)
==3492==    by 0x40A109D: htmlParseCharData (HTMLparser.c:2928)
==3492==    by 0x40A7B55: htmlParseTryOrFinish (HTMLparser.c:5713)
==3492==    by 0x40A82CF: htmlParseChunk (HTMLparser.c:5947)
==3492==    by 0x80487E1: main (in /home/readams/xmltest/xmltest)
==3492==  Address 0x4359d34 is not stack'd, malloc'd or (recently) free'd
==3492==
==3492== Invalid read of size 1
==3492==    at 0x409D962: htmlCurrentChar (HTMLparser.c:558)
==3492==    by 0x40A109D: htmlParseCharData (HTMLparser.c:2928)
==3492==    by 0x40A7B55: htmlParseTryOrFinish (HTMLparser.c:5713)
==3492==    by 0x40A82CF: htmlParseChunk (HTMLparser.c:5947)
==3492==    by 0x80487E1: main (in /home/readams/xmltest/xmltest)
==3492==  Address 0x4359d34 is not stack'd, malloc'd or (recently) free'd
==3492==
==3492== Invalid read of size 1
==3492==    at 0x40A12BA: htmlParseCharData (HTMLparser.c:2955)
==3492==    by 0x40A7B55: htmlParseTryOrFinish (HTMLparser.c:5713)
==3492==    by 0x40A82CF: htmlParseChunk (HTMLparser.c:5947)
==3492==    by 0x80487E1: main (in /home/readams/xmltest/xmltest)
==3492==  Address 0x4359d34 is not stack'd, malloc'd or (recently) free'd
==3492==
==3492== Invalid read of size 1
==3492==    at 0x409D6FB: htmlCurrentChar (HTMLparser.c:503)
==3492==    by 0x40A1427: htmlParseCharData (HTMLparser.c:2962)
==3492==    by 0x40A7B55: htmlParseTryOrFinish (HTMLparser.c:5713)
==3492==    by 0x40A82CF: htmlParseChunk (HTMLparser.c:5947)
==3492==    by 0x80487E1: main (in /home/readams/xmltest/xmltest)
==3492==  Address 0x4359d35 is not stack'd, malloc'd or (recently) free'd
==3492==
==3492== Invalid read of size 1
==3492==    at 0x409D70E: htmlCurrentChar (HTMLparser.c:504)
==3492==    by 0x40A1427: htmlParseCharData (HTMLparser.c:2962)
==3492==    by 0x40A7B55: htmlParseTryOrFinish (HTMLparser.c:5713)
==3492==    by 0x40A82CF: htmlParseChunk (HTMLparser.c:5947)
==3492==    by 0x80487E1: main (in /home/readams/xmltest/xmltest)
==3492==  Address 0x4359d35 is not stack'd, malloc'd or (recently) free'd
==3492==
==3492== Invalid read of size 1
==3492==    at 0x409D6FB: htmlCurrentChar (HTMLparser.c:503)
==3492==    by 0x40A1501: htmlParseCharData (HTMLparser.c:2966)
==3492==    by 0x40A7B55: htmlParseTryOrFinish (HTMLparser.c:5713)
==3492==    by 0x40A82CF: htmlParseChunk (HTMLparser.c:5947)
==3492==    by 0x80487E1: main (in /home/readams/xmltest/xmltest)
==3492==  Address 0x4359d36 is not stack'd, malloc'd or (recently) free'd
==3492==
==3492== Invalid read of size 1
==3492==    at 0x409D70E: htmlCurrentChar (HTMLparser.c:504)
==3492==    by 0x40A1501: htmlParseCharData (HTMLparser.c:2966)
==3492==    by 0x40A7B55: htmlParseTryOrFinish (HTMLparser.c:5713)
==3492==    by 0x40A82CF: htmlParseChunk (HTMLparser.c:5947)
==3492==    by 0x80487E1: main (in /home/readams/xmltest/xmltest)
==3492==  Address 0x4359d36 is not stack'd, malloc'd or (recently) free'd
==3492==
==3492== Invalid read of size 1
==3492==    at 0x409D6FB: htmlCurrentChar (HTMLparser.c:503)
==3492==    by 0x40A109D: htmlParseCharData (HTMLparser.c:2928)
==3492==    by 0x40A7B55: htmlParseTryOrFinish (HTMLparser.c:5713)
==3492==    by 0x40A82CF: htmlParseChunk (HTMLparser.c:5947)
==3492==    by 0x80487E1: main (in /home/readams/xmltest/xmltest)
==3492==  Address 0x4359d38 is not stack'd, malloc'd or (recently) free'd
==3492==
==3492== Invalid read of size 1
==3492==    at 0x404F072: xmlCurrentChar (parserInternals.c:625)
==3492==    by 0x409D81C: htmlCurrentChar (HTMLparser.c:531)
==3492==    by 0x40A109D: htmlParseCharData (HTMLparser.c:2928)
==3492==    by 0x40A7B55: htmlParseTryOrFinish (HTMLparser.c:5713)
==3492==    by 0x40A82CF: htmlParseChunk (HTMLparser.c:5947)
==3492==    by 0x80487E1: main (in /home/readams/xmltest/xmltest)
==3492==  Address 0x4359d38 is not stack'd, malloc'd or (recently) free'd
==3492==
==3492== Invalid read of size 1
==3492==    at 0x404F085: xmlCurrentChar (parserInternals.c:625)
==3492==    by 0x409D81C: htmlCurrentChar (HTMLparser.c:531)
==3492==    by 0x40A109D: htmlParseCharData (HTMLparser.c:2928)
==3492==    by 0x40A7B55: htmlParseTryOrFinish (HTMLparser.c:5713)
==3492==    by 0x40A82CF: htmlParseChunk (HTMLparser.c:5947)
==3492==    by 0x80487E1: main (in /home/readams/xmltest/xmltest)
==3492==  Address 0x4359d38 is not stack'd, malloc'd or (recently) free'd
==3492==
==3492== Invalid read of size 1
==3492==    at 0x404F0DC: xmlCurrentChar (parserInternals.c:645)
==3492==    by 0x409D81C: htmlCurrentChar (HTMLparser.c:531)
==3492==    by 0x40A109D: htmlParseCharData (HTMLparser.c:2928)
==3492==    by 0x40A7B55: htmlParseTryOrFinish (HTMLparser.c:5713)
==3492==    by 0x40A82CF: htmlParseChunk (HTMLparser.c:5947)
==3492==    by 0x80487E1: main (in /home/readams/xmltest/xmltest)
==3492==  Address 0x4359d38 is not stack'd, malloc'd or (recently) free'd
==3492==
==3492== Invalid read of size 1
==3492==    at 0x404F645: xmlCurrentChar (parserInternals.c:757)
==3492==    by 0x409D81C: htmlCurrentChar (HTMLparser.c:531)
==3492==    by 0x40A109D: htmlParseCharData (HTMLparser.c:2928)
==3492==    by 0x40A7B55: htmlParseTryOrFinish (HTMLparser.c:5713)
==3492==    by 0x40A82CF: htmlParseChunk (HTMLparser.c:5947)
==3492==    by 0x80487E1: main (in /home/readams/xmltest/xmltest)
==3492==  Address 0x4359d3b is not stack'd, malloc'd or (recently) free'd
==3492==
==3492== Invalid read of size 1
==3492==    at 0x404F65A: xmlCurrentChar (parserInternals.c:757)
==3492==    by 0x409D81C: htmlCurrentChar (HTMLparser.c:531)
==3492==    by 0x40A109D: htmlParseCharData (HTMLparser.c:2928)
==3492==    by 0x40A7B55: htmlParseTryOrFinish (HTMLparser.c:5713)
==3492==    by 0x40A82CF: htmlParseChunk (HTMLparser.c:5947)
==3492==    by 0x80487E1: main (in /home/readams/xmltest/xmltest)
==3492==  Address 0x4359d3a is not stack'd, malloc'd or (recently) free'd
==3492==
==3492== Invalid read of size 1
==3492==    at 0x404F66F: xmlCurrentChar (parserInternals.c:756)
==3492==    by 0x409D81C: htmlCurrentChar (HTMLparser.c:531)
==3492==    by 0x40A109D: htmlParseCharData (HTMLparser.c:2928)
==3492==    by 0x40A7B55: htmlParseTryOrFinish (HTMLparser.c:5713)
==3492==    by 0x40A82CF: htmlParseChunk (HTMLparser.c:5947)
==3492==    by 0x80487E1: main (in /home/readams/xmltest/xmltest)
==3492==  Address 0x4359d39 is not stack'd, malloc'd or (recently) free'd
==3492==
==3492== Invalid read of size 1
==3492==    at 0x404F681: xmlCurrentChar (parserInternals.c:756)
==3492==    by 0x409D81C: htmlCurrentChar (HTMLparser.c:531)
==3492==    by 0x40A109D: htmlParseCharData (HTMLparser.c:2928)
==3492==    by 0x40A7B55: htmlParseTryOrFinish (HTMLparser.c:5713)
==3492==    by 0x40A82CF: htmlParseChunk (HTMLparser.c:5947)
==3492==    by 0x80487E1: main (in /home/readams/xmltest/xmltest)
==3492==  Address 0x4359d38 is not stack'd, malloc'd or (recently) free'd
==3492==
==3492== Invalid read of size 1
==3492==    at 0x404F711: xmlCurrentChar (parserInternals.c:764)
==3492==    by 0x409D81C: htmlCurrentChar (HTMLparser.c:531)
==3492==    by 0x40A109D: htmlParseCharData (HTMLparser.c:2928)
==3492==    by 0x40A7B55: htmlParseTryOrFinish (HTMLparser.c:5713)
==3492==    by 0x40A82CF: htmlParseChunk (HTMLparser.c:5947)
==3492==    by 0x80487E1: main (in /home/readams/xmltest/xmltest)
==3492==  Address 0x4359d38 is not stack'd, malloc'd or (recently) free'd
==3492==
==3492== Invalid read of size 1
==3492==    at 0x409D70E: htmlCurrentChar (HTMLparser.c:504)
==3492==    by 0x40A109D: htmlParseCharData (HTMLparser.c:2928)
==3492==    by 0x40A7B55: htmlParseTryOrFinish (HTMLparser.c:5713)
==3492==    by 0x40A82CF: htmlParseChunk (HTMLparser.c:5947)
==3492==    by 0x80487E1: main (in /home/readams/xmltest/xmltest)
==3492==  Address 0x4359d58 is 8 bytes inside a block of size 120 free'd
==3492==    at 0x40257ED: free (vg_replace_malloc.c:366)
==3492==    by 0x41CBC73: __gconv_release_cache (gconv_cache.c:460)
==3492==    by 0x41C4906: __gconv_close_transform (gconv_db.c:799)
==3492==    by 0x41C45A6: __gconv_close (gconv_close.c:64)
==3492==    by 0x41C3AFB: iconv_close (iconv_close.c:36)
==3492==    by 0x404BADD: xmlCharEncCloseFunc (encoding.c:2370)
==3492==    by 0x40501B1: xmlSwitchInputEncodingInt (parserInternals.c:1183)
==3492==    by 0x405051C: xmlSwitchToEncodingInt (parserInternals.c:1313)
==3492==    by 0x4050137: xmlSwitchEncoding (parserInternals.c:1134)
==3492==    by 0x409D74B: htmlCurrentChar (HTMLparser.c:515)
==3492==    by 0x40A1427: htmlParseCharData (HTMLparser.c:2962)
==3492==    by 0x40A7B55: htmlParseTryOrFinish (HTMLparser.c:5713)
==3492==
==3492== Invalid read of size 1
==3492==    at 0x40A68D5: htmlParseTryOrFinish (HTMLparser.c:5250)
==3492==    by 0x40A82CF: htmlParseChunk (HTMLparser.c:5947)
==3492==    by 0x8048817: main (in /home/readams/xmltest/xmltest)
==3492==  Address 0x4359d5c is 12 bytes inside a block of size 120 free'd
==3492==    at 0x40257ED: free (vg_replace_malloc.c:366)
==3492==    by 0x41CBC73: __gconv_release_cache (gconv_cache.c:460)
==3492==    by 0x41C4906: __gconv_close_transform (gconv_db.c:799)
==3492==    by 0x41C45A6: __gconv_close (gconv_close.c:64)
==3492==    by 0x41C3AFB: iconv_close (iconv_close.c:36)
==3492==    by 0x404BADD: xmlCharEncCloseFunc (encoding.c:2370)
==3492==    by 0x40501B1: xmlSwitchInputEncodingInt (parserInternals.c:1183)
==3492==    by 0x405051C: xmlSwitchToEncodingInt (parserInternals.c:1313)
==3492==    by 0x4050137: xmlSwitchEncoding (parserInternals.c:1134)
==3492==    by 0x409D74B: htmlCurrentChar (HTMLparser.c:515)
==3492==    by 0x40A1427: htmlParseCharData (HTMLparser.c:2962)
==3492==    by 0x40A7B55: htmlParseTryOrFinish (HTMLparser.c:5713)
==3492==
==3492==
==3492== HEAP SUMMARY:
==3492==     in use at exit: 933 bytes in 22 blocks
==3492==   total heap usage: 117 allocs, 95 frees, 180,798 bytes allocated
==3492==
==3492== LEAK SUMMARY:
==3492==    definitely lost: 0 bytes in 0 blocks
==3492==    indirectly lost: 0 bytes in 0 blocks
==3492==      possibly lost: 0 bytes in 0 blocks
==3492==    still reachable: 933 bytes in 22 blocks
==3492==         suppressed: 0 bytes in 0 blocks
==3492== Rerun with --leak-check=full to see details of leaked memory
==3492==
==3492== For counts of detected and suppressed errors, rerun with: -v
==3492== ERROR SUMMARY: 29122 errors from 27 contexts (suppressed: 25 from 10)
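The report attributes the crashes to dangling pointers left behind by an encoding switch, and in the log the parser reads from blocks that were freed under xmlSwitchEncoding, xmlCharEncCloseFunc and iconv_close. The C sketch below is a minimal, self-contained illustration of that class of bug and of one common way to avoid it: carry a position across any call that may replace the input buffer as an offset, not as a raw pointer, and re-derive the pointer afterwards. All names in it (toy_input, toy_input_init, toy_switch_encoding, toy_peek_across_switch) are hypothetical; this is not libxml2 code and not a proposed patch for it. It assumes the bytes before the cursor keep the same length after the switch, and a plain byte copy stands in for real transcoding.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified parser input; NOT libxml2's xmlParserInput. */
typedef struct {
    unsigned char *base; /* start of the decoded buffer (owned) */
    unsigned char *cur;  /* read position inside base */
    size_t len;          /* number of valid bytes in base */
} toy_input;

/* Hypothetical helper: build a toy_input from a C string.
 * The caller frees in->base when done. */
static int toy_input_init(toy_input *in, const char *text) {
    in->len = strlen(text);
    in->base = malloc(in->len + 1);
    if (in->base == NULL)
        return -1;
    memcpy(in->base, text, in->len + 1);
    in->cur = in->base;
    return 0;
}

/* Hypothetical encoding switch. It replaces the buffer with a freshly
 * allocated one and frees the old storage, so any pointer a caller saved
 * into the old buffer is dangling afterwards.
 * Assumption: bytes before cur keep the same length in the new encoding,
 * so cur can be carried over as a plain offset. A byte copy stands in for
 * real transcoding. */
static int toy_switch_encoding(toy_input *in) {
    assert(in->cur >= in->base && in->cur <= in->base + in->len);
    size_t offset = (size_t)(in->cur - in->base);

    unsigned char *fresh = malloc(in->len + 1);
    if (fresh == NULL)
        return -1;
    memcpy(fresh, in->base, in->len);
    fresh[in->len] = '\0';

    free(in->base);           /* every saved pointer into base is now invalid */
    in->base = fresh;
    in->cur = fresh + offset; /* rebase the input's own cursor */
    return 0;
}

/* Safe lookahead across a call that may switch encodings: save an offset,
 * not a pointer, and re-derive the address from the current base afterwards.
 * Returns the byte at cur + ahead, or -1 if out of range or on error. */
static int toy_peek_across_switch(toy_input *in, size_t ahead) {
    size_t pos = (size_t)(in->cur - in->base) + ahead;

    /* Unsafe alternative (do not do this):
     *   const unsigned char *p = in->cur + ahead;
     *   toy_switch_encoding(in);
     *   return *p;   reads freed memory
     */
    if (toy_switch_encoding(in) != 0)
        return -1;
    if (pos >= in->len)
        return -1;
    return in->base[pos];
}
```

With a toy_input initialised from "<p>abc</p>" and cur advanced by three bytes, toy_peek_across_switch(&in, 2) returns 'c' and in.cur still points at 'a', now inside the new buffer. The unsafe variant in the comment would instead dereference freed memory, which is the kind of access Memcheck reports as an invalid read inside a free'd block.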
One additional comment: this problem does not occur with libxml2 2.6.26 from CentOS. I see it in 2.7.8, which I compiled from source on an Ubuntu 10.10 box, and in the 2.7.7 version currently shipping with Ubuntu 10.10.
probably related to bug 706952
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org. As part of that, we are mass-closing older open tickets in bugzilla.gnome.org which have not seen updates for a long time (resources are unfortunately quite limited, so not every ticket can get handled). If you can still reproduce the situation described in this ticket in a recent and supported software version, then please follow https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines and create a new ticket at https://gitlab.gnome.org/GNOME/libxml2/-/issues/
Thank you for your understanding and your help.