GNOME Bugzilla – Bug 332124
libxml2 html parser mishandles minimized attributes
Last modified: 2006-10-17 16:20:26 UTC
Please describe the problem:
When using xmllint --html --xmlout ..., <input disabled> becomes <input disabled=""/> instead of <input disabled="disabled"/>, and the same happens for other minimized attributes.

Other information:
I wouldn't even expect the parser to know which HTML attributes are boolean. In fact, I would find it more useful if it did this with any attribute that didn't have a value.
Concerning 2.4.13: it is WAY too old, so try a more recent version first. --xmlout makes a direct translation using the XML serializer. The XML serializer has no knowledge of HTML markup semantics (unless you use the XHTML-1.0 DOCTYPE, in which case a specific serializer is used). The only way around this would be for xmllint to change (or add) the XHTML-1 DOCTYPE on the document before calling the serializer in that case, which might be a good idea or a bad one depending on people's expectations.

Daniel
Oops, that GNOME version was bogus; I only entered it to fill a required field. I'm using a Mac. I meant to include this:

    xmllint --version
    xmllint: using libxml version 20616
       compiled with: DTDValid FTP HTTP HTML C14N Catalog XPath XPointer XInclude Iconv Unicode Regexps Automata Schemas

I have no idea when Apple last updated that. The problem isn't with the serializer. It's the parser, which should report a value for a minimized attribute that is the same as the attribute's name.
Okay, easy enough, fixed in CVS:

    paphio:~/XML -> ./xmllint --html tst.html
    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
    <html><body>
    <input disabled>
    </body></html>
    paphio:~/XML -> ./xmllint --html --debug tst.html
    HTML DOCUMENT
    URL=tst.html
    standalone=true
    DTD(html), PUBLIC -//W3C//DTD HTML 4.0 Transitional//EN, SYSTEM http://www.w3.org/TR/REC-html40/loose.dtd
    ELEMENT html
      ELEMENT body
        TEXT
          content=
        ELEMENT input
          ATTRIBUTE disabled
            TEXT
              content=disabled
    paphio:~/XML ->

The output still shows the attribute in minimized form, but internally an attribute value equal to the name is generated, and this shows up at the SAX level. Should be fixed in CVS now, thanks.

Daniel
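
The behaviour requested in this report, where a minimized attribute is reported with its own name as its value, can be illustrated outside libxml2. The following is only a minimal sketch using Python's standard html.parser module, not libxml2's actual code; the names MinimizedAttrParser and parse_start_tags are hypothetical and chosen for this illustration. html.parser reports a valueless attribute with the value None, so the sketch substitutes the attribute name in that case and leaves an explicit empty value untouched.

```python
from html.parser import HTMLParser


class MinimizedAttrParser(HTMLParser):
    """Collect start tags, giving valueless attributes their own name as value.

    Illustration only: this is not how libxml2 is implemented.
    """

    def __init__(self):
        super().__init__()
        self.tags = []  # list of (tag, [(name, value), ...])

    def handle_starttag(self, tag, attrs):
        # html.parser passes None as the value of a minimized attribute
        # such as `disabled`; replace None with the attribute name.
        normalized = [(name, name if value is None else value)
                      for name, value in attrs]
        self.tags.append((tag, normalized))


def parse_start_tags(markup):
    """Return the start tags found in `markup` with normalized attributes."""
    parser = MinimizedAttrParser()
    parser.feed(markup)
    parser.close()
    return parser.tags


if __name__ == "__main__":
    print(parse_start_tags('<input disabled><input type="text" value="">'))
```

In this sketch, <input disabled> yields the pair ('disabled', 'disabled'), while value="" keeps its empty string, so an explicitly empty attribute stays distinguishable from a minimized one. Like the suggestion in the original report, it applies to any attribute without a value and does not consult a list of boolean HTML attributes.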