GNOME Bugzilla – Bug 319964
HTMLparser bug in which attributes with quotes are parsed as namespaces
Last modified: 2007-06-12 15:15:56 UTC
Please describe the problem: HTMLparser treats "foo:bar" as an XML namespaced attribute, when it should just treat it as a simple attribute containing a colon.
I don't understand. It seems to work fine for me:

  paphio:~/XML -> cat tst.html
  <html>
  <body>
  <img src="foo.gif" alt="foo:bar">
  </body>
  </html>
  paphio:~/XML -> xmllint --html --debug tst.html
  HTML DOCUMENT
  URL=tst.html
  standalone=true
  DTD(html), PUBLIC -//W3C//DTD HTML 4.0 Transitional//EN, SYSTEM http://www.w3.org/TR/REC-html40/loose.dtd
  ELEMENT html
    ELEMENT body
      TEXT
        content=
      ELEMENT img
        ATTRIBUTE src
          TEXT
            content=foo.gif
        ATTRIBUTE alt
          TEXT
            content=foo:bar
  paphio:~/XML ->

please provide an example,

Daniel
Sorry, I wasn't clear: the problem is that if the *name* of the attribute contains a colon, the parser tries to treat it as a namespaced attribute, but in an HTML document there is no such thing as a namespace.

Example:

  <p foo:bar="meep">Hello, world!</p>

  foo.html:3: namespace error : Namespace prefix foo of attribute bar is not defined

Of course, if you actually try to declare the namespace prefix, that won't work either:

  <p xmlns:foo="blah" foo:bar="meep">Hello, world!</p>

  foo.html:3: namespace error : Namespace prefix xmlns of attribute foo is not defined

Ideally the HTMLparser should treat attribute names as tokens that may contain colons, just as the XML 1.0 spec did before namespaces were introduced.
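For comparison, the behaviour requested above can be sketched with Python's standard library rather than libxml2. This is only an illustration under stated assumptions: `AttrCollector`, `html_attrs` and `xml_accepts` are hypothetical helper names, and Python's `html.parser` and `xml.etree.ElementTree` stand in for an HTML tokenizer and a namespace-aware XML parser respectively. It says nothing about how libxml2 is implemented.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class AttrCollector(HTMLParser):
    """Hypothetical helper: records (tag, attribute name, value) triples."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # The HTML tokenizer hands over attribute names as plain strings;
        # a colon in the name gets no special treatment.
        for name, value in attrs:
            self.seen.append((tag, name, value))


def html_attrs(markup):
    """Parse markup as HTML and return the attributes that were seen."""
    parser = AttrCollector()
    parser.feed(markup)
    parser.close()
    return parser.seen


def xml_accepts(markup):
    """Return True if a namespace-aware XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # An undeclared prefix such as "foo" is rejected here.
        return False


SNIPPET = '<p foo:bar="meep">Hello, world!</p>'
print(html_attrs(SNIPPET))   # attribute name is kept whole as "foo:bar"
print(xml_accepts(SNIPPET))  # the XML parser refuses the undeclared prefix
```

The contrast is the point of the sketch: the same markup is a namespace error for a namespace-aware XML parser, while an HTML parser is expected to keep `foo:bar` as a single attribute name.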
Makes sense, fixed in SVN:

  paphio:~/XML -> xmllint --html test.html
  <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
  <html><body><p foo:bar="meep">Hello, world!</p></body></html>
  paphio:~/XML -> xmllint --debug --html test.html
  HTML DOCUMENT
  URL=test.html
  standalone=true
  DTD(html), PUBLIC -//W3C//DTD HTML 4.0 Transitional//EN, SYSTEM http://www.w3.org/TR/REC-html40/loose.dtd
  ELEMENT html
    ELEMENT body
      ELEMENT p
        ATTRIBUTE foo:bar
          TEXT
            content=meep
        TEXT
          content=Hello, world!
  paphio:~/XML ->

thanks for the report !

Daniel