GNOME Bugzilla – Bug 495213
Change in HTML "embed" handling breaks parser in 2.6.29+
Last modified: 2008-01-11 06:24:51 UTC
Please describe the problem:
I noticed a problem with the new way libxml2 2.6.29+ handles the HTML "embed" tag. It serialises the element without its closing tag, which causes subsequent attempts to parse the document to fail, because the information about where the tag is closed is lost.

Steps to reproduce:
$ cat embed.html
<html><body>
<embed src="http://www.youtube.com/v/183tVH1CZpA" type="application/x-shockwave-flash"></embed>
<embed src="http://anothersite.com/v/another"></embed>
<script src="http://www.youtube.com/example.js"></script>
<script src="/something-else.js"></script>
</body></html>

$ xmllint --html embed.html > embed2.html
$ cat embed2.html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<embed src="http://www.youtube.com/v/183tVH1CZpA" type="application/x-shockwave-flash"><embed src="http://anothersite.com/v/another"><script src="http://www.youtube.com/example.js"></script><script src="/something-else.js"></script>
</body></html>

$ xmllint --html embed2.html > embed3.html
$ cat embed3.html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<embed src="http://www.youtube.com/v/183tVH1CZpA" type="application/x-shockwave-flash"><embed src="http://anothersite.com/v/another"><script src="http://www.youtube.com/example.js"></script><script src="/something-else.js"></script></embed></embed>
</body></html>

Actual results:
The "script" tags have moved into the "embed" tag, although originally they were siblings.

Expected results:
A parse-serialise-parse cycle should not alter the structure.

Does this happen every time?
Yes.

Other information:
Created attachment 99565
Patch to fix the serialisation of <embed> tags

I attached a patch that fixes the problem. It instructs the serialiser to always include a closing tag for the <embed> tag, even if no content is provided.
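The failure mode and the effect of always writing the closing tag can be illustrated with a small toy model in Python. This is only a sketch and not libxml2's code: the names Node, TreeBuilder, parse, serialise, shape and round_trip_is_stable are hypothetical, and the parser is a deliberately simplified one built on the standard html.parser module. It assumes, as the transcripts above suggest, that the parser keeps an "embed" element open until it sees a closing tag, while the serialiser may drop the closing tag of an "embed" element that has no children.

```python
from html.parser import HTMLParser


class Node:
    """Minimal element node: tag name, attribute list, child elements."""

    def __init__(self, tag, attrs=()):
        self.tag = tag
        self.attrs = list(attrs)
        self.children = []


class TreeBuilder(HTMLParser):
    """Simplified parser: an element stays open until its end tag is seen."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def serialise(node, omit_end_if_empty=frozenset()):
    """Serialise the children of node.

    Tags listed in omit_end_if_empty lose their closing tag when they have
    no children (the behaviour reported in this bug). An empty set models
    the patched behaviour: the closing tag is always written.
    """
    out = []
    for child in node.children:
        attrs = "".join(
            f" {k}" if v is None else f' {k}="{v}"' for k, v in child.attrs
        )
        out.append(f"<{child.tag}{attrs}>")
        out.append(serialise(child, omit_end_if_empty))
        if child.children or child.tag not in omit_end_if_empty:
            out.append(f"</{child.tag}>")
    return "".join(out)


def shape(node):
    """Nested tuple of tag names, used to compare tree structure."""
    return (node.tag, tuple(shape(c) for c in node.children))


def round_trip_is_stable(html, omit_end_if_empty=frozenset()):
    first = parse(html)
    second = parse(serialise(first, omit_end_if_empty))
    return shape(first) == shape(second)


SOURCE = (
    "<html><body>"
    '<embed src="http://www.youtube.com/v/183tVH1CZpA"'
    ' type="application/x-shockwave-flash"></embed>'
    '<embed src="http://anothersite.com/v/another"></embed>'
    '<script src="http://www.youtube.com/example.js"></script>'
    '<script src="/something-else.js"></script>'
    "</body></html>"
)

if __name__ == "__main__":
    # Reported behaviour: closing tag of an empty "embed" is dropped.
    print(round_trip_is_stable(SOURCE, frozenset({"embed"})))
    # Patched behaviour: closing tag is always written.
    print(round_trip_is_stable(SOURCE))
```

In this model the first call reports an unstable round trip, because the "script" elements end up nested inside the second "embed", and the second call reports a stable one. It says nothing about how the attached patch is implemented inside libxml2.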
Okay, this makes perfect sense. Applied and committed to SVN revision 3671, thanks!

Daniel