Bug 654146 - HTML parser strips pseudo-namespaces (fb:like, g:plusone etc)
Status: RESOLVED OBSOLETE
Product: libxml2
Classification: Platform
Component: htmlparser
Version: 2.7.7
Hardware: Other
OS: All
Importance: Normal normal
Target Milestone: ---
Assigned To: Daniel Veillard
QA Contact: libxml QA maintainers
Duplicates: 711670
Depends on:
Blocks:
Reported: 2011-07-07 10:11 UTC by Sergey Schetinin
Modified: 2021-07-05 13:20 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Keep 'prefixes' in HTML (642 bytes, patch)
2015-04-14 05:53 UTC, Ben Schmidt

Description Sergey Schetinin 2011-07-07 10:11:15 UTC
When parsing HTML documents that contain XFBML (Facebook markup), the parser strips the namespace prefix that is required as part of that markup, so <fb:like> turns into <like>. (The same thing happens to <g:plusone></g:plusone>.)

This makes processing pages that incorporate XFBML with lxml much harder.

I'm not 100% positive, but it looks like it's impossible to preserve that information when parsing with HTMLParser. Unfortunately, XMLParser is not an option: those tags are used in all kinds of real-world HTML documents, and for that reason I hope you can add a way to preserve those pseudo-namespaces (not necessarily enabled by default).

Thank you.

See also: http://stackoverflow.com/questions/6597271/how-to-preserve-namespace-information-when-parsing-html-with-lxml



@ubuntu:~$ echo '<fb:like/>'|xmllint --html -
-:1: namespace warning : Namespace prefix fb is not defined
<fb:like/>
        ^
-:1: HTML parser error : Tag fb:like invalid
<fb:like/>
        ^
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><like></like></body></html>

@ubuntu:~$ echo '<!DOCTYPE html><html xmlns="http://www.w3.org/1999/xhtml" xmlns:fb="http://www.facebook.com/2008/fbml"><fb:like/></html>'|xmllint --html -
-:1: namespace warning : Namespace prefix fb is not defined
p://www.w3.org/1999/xhtml" xmlns:fb="http://www.facebook.com/2008/fbml"><fb:like
                                                                               ^
-:1: HTML parser error : Tag fb:like invalid
p://www.w3.org/1999/xhtml" xmlns:fb="http://www.facebook.com/2008/fbml"><fb:like
                                                                               ^
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:fb="http://www.facebook.com/2008/fbml"><body><like></like></body></html>
Comment 1 Ben Schmidt 2015-04-14 05:50:40 UTC
I came across this undesirable behaviour too, in version 2.9.2. The attached patch fixes it. Since attributes starting with xmlns are not parsed in HTML mode (cf. SAX2.c:1699 and SAX2.c:1740), it makes sense to use the full name, not just the local part, as the element name in the DOM tree.
Comment 2 Ben Schmidt 2015-04-14 05:53:36 UTC
Created attachment 301506
Keep 'prefixes' in HTML
Comment 3 Ben Schmidt 2017-06-12 21:43:02 UTC
Would there be any chance of getting this patch included? Is there any way I can help?
Comment 4 Nick Wellnhofer 2017-06-17 10:58:40 UTC
*** Bug 711670 has been marked as a duplicate of this bug. ***
Comment 5 GNOME Infrastructure Team 2021-07-05 13:20:59 UTC
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org.
As part of that, we are mass-closing older open tickets in bugzilla.gnome.org
which have not seen updates for a longer time (resources are unfortunately
quite limited so not every ticket can get handled).

If you can still reproduce the situation described in this ticket in a recent
and supported software version, then please follow
  https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines
and create a new ticket at
  https://gitlab.gnome.org/GNOME/libxml2/-/issues/

Thank you for your understanding and your help.