Bug 795343 - <noscript> is implying end </p>
Status: RESOLVED FIXED
Product: libxml2
Classification: Platform
Component: htmlparser
Version: git master
Hardware: Other Windows
Importance: Normal normal
Target Milestone: ---
Assigned To: Daniel Veillard
Depends on:
Blocks:
 
 
Reported: 2018-04-18 05:24 UTC by Tom Kaminski
Modified: 2018-04-18 14:07 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Tom Kaminski 2018-04-18 05:24:31 UTC
When an HTML document contains the following:

<p><img><noscript></noscript></p>

It gets parsed as:

<p><img></p><noscript></noscript>

Notice that the </p> appears before the <noscript> tag. I think this is an error. If you review the Mozilla docs (https://developer.mozilla.org/en-US/docs/Web/HTML/Element/p), you will see that the <p> tag is permitted to contain "Phrasing Content", and <noscript> belongs to this category.

I think the code that controls this behaviour is: https://git.gnome.org/browse/libxml2/tree/HTMLparser.c#n1099

static const char * const htmlStartClose[] = {
...
"noscript",	"p", NULL,
...
}
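
As a rough illustration of how a table in this shape could produce the behaviour described above, here is a minimal, self-contained C sketch. It assumes each group lists a starting tag followed by the open tags it implicitly closes, with NULL ending the group and a second NULL ending the table. The names table_reported, table_alternative and starts_close are hypothetical and are not libxml2 API; the real lookup in HTMLparser.c may differ.

```c
#include <string.h>

/* Simplified stand-in for the rule quoted above:
 * opening <noscript> implicitly closes an open <p>. */
static const char *const table_reported[] = {
    "noscript", "p", NULL,
    NULL
};

/* Hypothetical alternative for comparison only:
 * opening <noscript> closes <script> and leaves <p> open. */
static const char *const table_alternative[] = {
    "noscript", "script", NULL,
    NULL
};

/* Returns 1 if opening new_tag should auto-close open_tag
 * according to the given table, 0 otherwise. */
static int starts_close(const char *const *table,
                        const char *new_tag, const char *open_tag)
{
    size_t i = 0;

    while (table[i] != NULL) {
        /* First entry of each group is the starting tag. */
        int match = (strcmp(table[i], new_tag) == 0);
        i++;
        /* Remaining entries are the tags that group closes. */
        while (table[i] != NULL) {
            if (match && strcmp(table[i], open_tag) == 0)
                return 1;
            i++;
        }
        i++; /* step over the NULL that ends this group */
    }
    return 0;
}
```

In this model, starts_close(table_reported, "noscript", "p") returns 1, which corresponds to the </p> being emitted before <noscript>, while the same query against table_alternative returns 0.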
Comment 1 Daniel Veillard 2018-04-18 14:07:20 UTC
Hum, this is a bit difficult to infer from the HTML 4 spec itself.
However, when considering the scenario of using noscript for its intended
use, like

<p>
<script>
document.write("Hello World!")
</script>
<noscript>Your browser does not support JavaScript!</noscript> 
</p>

then yes, the current behaviour doesn't look proper: <noscript> should not
close <p>, but by its very nature it should close <script>, so I pushed that
as
https://git.gnome.org/browse/libxml2/commit/?id=35e83488505d501864826125cfe6a7950d6cba78

it seems to be doing the right thing:


thinkpad2:~/XML -> echo "<p><img><noscript></noscript></p>" | xmllint --html -
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<p><img><noscript></noscript></p>
</body></html>
thinkpad2:~/XML -> 

  thanks,

Daniel