GNOME Bugzilla – Bug 347708
HTML chunked parsing failure when chunk ends "</"
Last modified: 2006-10-16 09:32:23 UTC
If a chunk passed to htmlParseChunk ends with "</" while the parser is inside a script element, the parser reports "HTML parser error : Invalid char in CDATA 0x0", and the remainder of the document is then parsed incorrectly. I think the problem is in htmlParseScript (the error above is raised from line 2711), but I don't know how to fix it. I will attach a test case showing this bug. Thanks.
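The boundary condition described above can be illustrated with a small standalone sketch. This is not libxml2's code and not the fix that was committed. The type `script_scan` and the function `scan_script_text` are hypothetical names, and the rule that script text ends at "</" followed by a letter is an assumption borrowed from the HTML 4 definition of script data. The sketch shows one way a push parser could hold back a trailing "<" or "</" until the next chunk arrives, instead of reading past the end of the buffered data and seeing a 0x0 byte.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Result of scanning buffered script text (hypothetical type, not libxml2's). */
typedef struct {
    size_t consumed;   /* bytes that are definitely script text */
    int    end_found;  /* 1 if "</" + letter starts at buf[consumed] */
} script_scan;

/*
 * Scan buf[0..len) as the content of a script element.
 * terminate is nonzero when no further chunk will arrive.
 * A trailing "<" or "</" is left unconsumed unless terminate is set,
 * because the byte that decides its meaning is not available yet.
 */
script_scan scan_script_text(const char *buf, size_t len, int terminate)
{
    script_scan r = {0, 0};
    size_t i = 0;

    assert(buf != NULL || len == 0);

    while (i < len) {
        if (buf[i] != '<') { i++; continue; }

        /* '<' is the last buffered byte: wait for more data. */
        if (i + 1 >= len) { if (!terminate) break; i++; continue; }
        if (buf[i + 1] != '/') { i++; continue; }

        /* "</" ends the buffer: the case from this report. */
        if (i + 2 >= len) { if (!terminate) break; i += 2; continue; }

        /* "</" followed by a letter closes the script text. */
        if (isalpha((unsigned char)buf[i + 2])) { r.end_found = 1; break; }
        i += 2;
    }
    r.consumed = i;
    return r;
}
```

With the chunks from this report, scanning "abc</" without terminate consumes only "abc" and leaves "</" buffered. Once "script><p>def" is appended, scanning the buffered "</script><p>def" finds the end of the script text at offset 0.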
Created attachment 69010: Test-case for htmlParseChunk
Okay, thanks to the test case I could reproduce and fix the bug relatively easily:

paphio:~/XML -> ./tst
htmlParseChunk 13 "<script>abc</"
htmlParseChunk 13 "script><p>def"
htmlParseChunk 5 "</p>"
htmlParseChunk 0
HTML DOCUMENT
standalone=true
DTD(html), PUBLIC -//W3C//DTD HTML 4.0 Transitional//EN, SYSTEM http://www.w3.org/TR/REC-html40/loose.dtd
ELEMENT html
  ELEMENT head
    ELEMENT script
      CDATA_SECTION
        content=abc
  ELEMENT body
    ELEMENT p
      TEXT
        content=def
paphio:~/XML ->

Fixed in CVS, thanks a lot for the detailed report!

Daniel