GNOME Bugzilla – Bug 319715
HTMLparser bug in libxml 2.6.22 with large CDATA elements
Last modified: 2005-10-25 12:36:37 UTC
When a CDATA element such as <style> or <script> contains too much text, HTMLparser produces the following error message:

HTML parser error : Invalid char in CDATA 0x0

This happens because the following test on line 2685 fails:

if (((NXT(2) >= 'A') && (NXT(2) <= 'Z')) || ((NXT(2) >= 'a') && (NXT(2) <= 'z')))

The problem is that the end of the buffer has been reached, so NXT(2) actually returns a null terminator or some other uninitialised value past the end of the buffer.

I'm not sure why libxml 2.6.17 does not seem to exhibit this behaviour, even though the code is reasonably similar. However, it seems clear that there needs to be a check for whether the end of the buffer has been reached, refilling it if necessary; perhaps one of the other character-reading macros performs this?
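To make the failure mode concrete, here is a minimal sketch of the same letter test applied to a buffer that ends right after "</". It is not libxml code: PEEK and looks_like_end_tag are hypothetical names invented for this illustration, and the buffer is an ordinary NUL-terminated C string rather than the parser's input structure.

```c
/* Hypothetical stand-in for a lookahead macro: reads the byte n positions
 * ahead of the cursor with no bounds check and no refill. */
#define PEEK(cur, n) ((cur)[(n)])

/* Returns 1 if cur points at "</" followed by an ASCII letter, else 0.
 * The letter test mirrors the condition quoted in the report. */
static int looks_like_end_tag(const char *cur)
{
    if (PEEK(cur, 0) != '<' || PEEK(cur, 1) != '/')
        return 0;
    return ((PEEK(cur, 2) >= 'A') && (PEEK(cur, 2) <= 'Z')) ||
           ((PEEK(cur, 2) >= 'a') && (PEEK(cur, 2) <= 'z'));
}
```

If only "p{}</" has been loaded, the byte at offset 2 from the '<' is the terminator, so the test fails even though the complete input "p{}</style>" does contain an end tag at that position.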
Created attachment 53867
HTML file that exhibits the bug in libxml 2.6.22

xmllint --html bug.html
I think the loop needs a GROW statement to make sure the parsing buffer is refilled as needed.

Tested; that was indeed the problem:

paphio:~/XML -> xmllint --noout --html ../53867.html
paphio:~/XML ->

Added the test to the regression suite and committed in CVS, thanks!

Daniel
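The effect of refilling inside the loop can be sketched with a toy model. This is not the actual libxml change: toy_input, toy_nxt, toy_grow and toy_scan are hypothetical names, the 8-byte CHUNK is chosen only to make the boundary easy to hit, and returning -1 stands in for the "Invalid char in CDATA 0x0" error.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK 8  /* deliberately tiny so the boundary is easy to hit */

/* Hypothetical input window: only the first `avail` bytes of src are loaded. */
typedef struct {
    const char *src;   /* complete document */
    size_t      len;   /* total length of src */
    size_t      avail; /* number of bytes currently loaded */
    size_t      pos;   /* cursor position */
} toy_input;

/* Byte n positions ahead of the cursor, or 0 if it is not loaded yet. */
static char toy_nxt(const toy_input *in, size_t n)
{
    return (in->pos + n < in->avail) ? in->src[in->pos + n] : '\0';
}

/* Stand-in for a GROW-style refill: load one more chunk when fewer than
 * three bytes of lookahead remain and more input exists. */
static void toy_grow(toy_input *in)
{
    if (in->avail - in->pos < 3 && in->avail < in->len) {
        in->avail += CHUNK;
        if (in->avail > in->len)
            in->avail = in->len;
    }
}

static int is_letter(char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

/* Scan raw text until "</" followed by a letter. Returns the offset of the
 * end tag, or -1 if a zero byte is seen first. */
static long toy_scan(const char *doc, int use_grow)
{
    toy_input in = { doc, strlen(doc), CHUNK, 0 };
    if (in.avail > in.len)
        in.avail = in.len;
    for (;;) {
        if (use_grow)
            toy_grow(&in);  /* refill before looking ahead */
        char c = toy_nxt(&in, 0);
        if (c == '<' && toy_nxt(&in, 1) == '/' && is_letter(toy_nxt(&in, 2)))
            return (long)in.pos;
        if (c == '\0')
            return -1;
        in.pos++;
    }
}
```

With the input "abcdef</style>", the "</" sits at offsets 6 and 7, which are the last two bytes of the first chunk. Calling toy_scan with use_grow set to 0 misses the end tag and then runs into a zero byte, while calling it with use_grow set to 1 loads the next chunk first and finds the end tag at offset 6.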