GNOME Bugzilla – Bug 795343
<noscript> is implying end </p>
Last modified: 2018-04-18 14:07:20 UTC
When the HTML document contains the following:

    <p><img><noscript></noscript></p>

it gets parsed as:

    <p><img></p><noscript></noscript>

Notice that the </p> appears before the <noscript> tag. I think this is an error. If you review the Mozilla docs (https://developer.mozilla.org/en-US/docs/Web/HTML/Element/p), you will see that the <p> tag is permitted to contain "Phrasing Content", and <noscript> belongs to this category.

I think the code that controls this behaviour is: https://git.gnome.org/browse/libxml2/tree/HTMLparser.c#n1099

    static const char * const htmlStartClose[] = {
        ...
        "noscript", "p", NULL,
        ...
    }
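
For readers unfamiliar with this kind of table, here is a minimal sketch of how an entry like the one quoted above could be consulted. It assumes each group is laid out as the starting tag, then the open tags it implicitly closes, then a NULL terminator, with one extra NULL ending the table. The table contents and the names `sample_start_close` and `start_closes` are illustrative only; this is not libxml2's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down table in the assumed layout:
 * starting tag, tags it implicitly closes, NULL; a final NULL ends the table. */
static const char * const sample_start_close[] = {
    "head",     "p", NULL,
    "noscript", "p", NULL,
    NULL
};

/* Return 1 if starting new_tag implies closing an open open_tag, else 0. */
static int start_closes(const char * const *table,
                        const char *new_tag, const char *open_tag)
{
    size_t i = 0;

    assert(table != NULL && new_tag != NULL && open_tag != NULL);

    while (table[i] != NULL) {
        /* First string of the group names the starting tag. */
        int group_matches = (strcmp(table[i], new_tag) == 0);
        i++;
        /* Remaining strings, up to NULL, are the tags that get closed. */
        while (table[i] != NULL) {
            if (group_matches && strcmp(table[i], open_tag) == 0)
                return 1;
            i++;
        }
        i++; /* step over the group's NULL terminator */
    }
    return 0;
}
```

With this sample table, `start_closes(sample_start_close, "noscript", "p")` returns 1, which corresponds to the reported behaviour: an open <p> is ended as soon as <noscript> starts. Removing "p" from the "noscript" group would make the same call return 0.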
Hum, this is a bit difficult to infer from the HTML 4 spec itself. However, when considering the scenario of using noscript for its intended use, like

    <p>
    <script> document.write("Hello World!") </script>
    <noscript>Your browser does not support JavaScript!</noscript>
    </p>

then yes, the current behaviour doesn't look proper: it should not close <p>, but by its very nature it should close <script>, so I pushed that as https://git.gnome.org/browse/libxml2/commit/?id=35e83488505d501864826125cfe6a7950d6cba78

It seems to be doing the right thing:

    thinkpad2:~/XML -> echo "<p><img><noscript></noscript></p>" | xmllint --html -
    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
    <html><body>
    <p><img><noscript></noscript></p>
    </body></html>
    thinkpad2:~/XML ->

thanks, Daniel