Bug 495213 - Change in HTML "embed" handling breaks parser in 2.6.29+
Status: RESOLVED FIXED
Product: libxml2
Classification: Platform
Component: general
Version: 2.6.30
Hardware: Other All
Importance: Normal normal
Target Milestone: ---
Assigned To: Daniel Veillard
QA Contact: libxml QA maintainers
Depends on:
Blocks:
 
 
Reported: 2007-11-09 07:13 UTC by Stefan Behnel
Modified: 2008-01-11 06:24 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Patch to fix the serialisation of <embed> tags (461 bytes, patch)
2007-11-24 10:57 UTC, Stefan Behnel

Description Stefan Behnel 2007-11-09 07:13:18 UTC
Please describe the problem:
I noticed a problem with the new way libxml2 2.6.29+ handles the HTML "embed"
tag. It serialises the tag without its closing tag, which breaks subsequent
attempts to parse the document, because the information about where the tag
is closed is lost.

Steps to reproduce:
$ cat embed.html
<html><body>
<embed src="http://www.youtube.com/v/183tVH1CZpA"
type="application/x-shockwave-flash"></embed>
<embed src="http://anothersite.com/v/another"></embed>
<script src="http://www.youtube.com/example.js"></script>
<script src="/something-else.js"></script>
</body></html>

$ xmllint --html embed.html > embed2.html

$ cat embed2.html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<embed src="http://www.youtube.com/v/183tVH1CZpA"
type="application/x-shockwave-flash"><embed
src="http://anothersite.com/v/another"><script
src="http://www.youtube.com/example.js"></script><script
src="/something-else.js"></script>
</body></html>

$ xmllint --html embed2.html > embed3.html

$ cat embed3.html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<embed src="http://www.youtube.com/v/183tVH1CZpA"
type="application/x-shockwave-flash"><embed
src="http://anothersite.com/v/another"><script
src="http://www.youtube.com/example.js"></script><script
src="/something-else.js"></script></embed></embed>
</body></html>


Actual results:
The second "embed" tag and the "script" tags have moved into the first "embed" tag, although originally they were all siblings.

Expected results:
A parse-serialise-parse cycle should not alter the structure.
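The round-trip expectation can be checked mechanically by comparing the element nesting before and after serialisation. The following is only a rough sketch using Python's standard `html.parser`, not libxml2 itself. The `Outline` class, the `outline` helper, and the shortened sample documents are made up for illustration, and the naive open-element stack only approximates how a parser that expects a closing `</embed>` would nest the following tags.

```python
from html.parser import HTMLParser


class Outline(HTMLParser):
    """Record (depth, tag) pairs using a naive open-element stack."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open elements
        self.items = []   # (nesting depth, tag name) in document order

    def handle_starttag(self, tag, attrs):
        self.items.append((len(self.stack), tag))
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Close everything up to and including the nearest matching element;
        # ignore stray end tags that match nothing.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def outline(markup):
    parser = Outline()
    parser.feed(markup)
    parser.close()
    return parser.items


# Shortened stand-in for embed.html (hypothetical attribute values).
original = ('<html><body><embed src="a"></embed><embed src="b"></embed>'
            '<script src="c.js"></script></body></html>')
# Same document with the </embed> closing tags dropped, as in embed2.html.
reserialised = ('<html><body><embed src="a"><embed src="b">'
                '<script src="c.js"></script></body></html>')

print(outline(original) == outline(reserialised))  # prints False
```

In the first outline both "embed" tags and the "script" tag sit at the same depth; in the second, each one is nested inside the previous "embed", which matches the shape seen in embed3.html.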

Does this happen every time?
yes

Other information:
Comment 1 Stefan Behnel 2007-11-24 10:57:41 UTC
Created attachment 99565
Patch to fix the serialisation of <embed> tags

I attached a patch that fixes the problem. It instructs the serialiser to always include a closing tag for the <embed> tag, even if no content is provided.
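The idea behind the fix can be illustrated with a toy serialiser. This is not the attached patch and not libxml2 code, only a sketch under assumptions: the `VOID_TAGS` set, the `serialise` function, and the tuple-based node format are all hypothetical. The point is that "embed" is deliberately left out of the set of tags written without a closing tag, so it is always closed even when it has no content.

```python
# Hypothetical set of tags that are written without a closing tag.
# "embed" is intentionally absent, so it always gets "</embed>".
VOID_TAGS = {"br", "hr", "img"}


def serialise(node):
    """Serialise a (tag, attrs, children) tuple to an HTML string.

    Attribute values are written as-is; escaping is omitted for brevity.
    """
    tag, attrs, children = node
    attr_text = "".join(f' {name}="{value}"' for name, value in attrs.items())
    out = f"<{tag}{attr_text}>"
    if tag in VOID_TAGS:
        return out
    out += "".join(serialise(child) for child in children)
    return out + f"</{tag}>"


body = ("body", {}, [
    ("embed", {"src": "a"}, []),
    ("script", {"src": "c.js"}, []),
])
print(serialise(body))
```

With "embed" closed explicitly, the "script" tag that follows it stays a sibling when the output is parsed again.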
Comment 2 Daniel Veillard 2008-01-11 06:24:51 UTC
Okay, this makes perfect sense. Applied and committed to SVN as revision 3671,

 thanks!

Daniel