GNOME Bugzilla – Bug 596184
RELAX NG validation fails due to default attribute value
Last modified: 2017-06-12 19:06:18 UTC
Created attachment 143907
testcase (archive containing debbug288149.rng and debbug288149.xml)

Note: this bug has been reported against Debian here:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=288149

When the DTD has a default attribute value, xmllint ignores it for RELAX NG validation. According to the RELAX NG spec (see below), this is incorrect. With the attached testcase:

    $ --relaxng debbug288149.rng debbug288149.xml
    <?xml version="1.0"?>
    <!DOCTYPE root [
    <!ELEMENT root (#PCDATA)>
    <!ATTLIST root type (text | number) "text">
    ]>
    <root>Test</root>
    debbug288149.xml:8: element root: Relax-NG validity error : Element root failed to validate attributes
    debbug288149.xml fails to validate
    zsh: exit 3     xmllint --relaxng debbug288149.rng debbug288149.xml

Note that jing validates the XML file.

The explanations I gave in the Debian bug report:

Yes, I think it's a bug in xmllint for the following reason. The question is whether an attribute given in the DTD with a default value (but not in the start tag of an element) is regarded as present or not for RELAX NG validation.

According to the RELAX NG spec, the data model is based on the infoset obtained when all declarations of the DTD are processed. According to the infoset spec[*], attributes that have a default value are part of the infoset. The information concerning a default value is provided by the [specified] property:

    [specified] A flag indicating whether this attribute was actually
    specified in the start-tag of its element, or was defaulted from
    the DTD.

Let's get back to RELAX NG. Its spec doesn't say anything concerning this [specified] property. This means that the origin of an attribute (start tag or default value in DTD) is ignored, i.e. an attribute specified in a start tag and an attribute with a default value in the DTD are regarded as equivalent for RELAX NG.

[*] http://www.w3.org/TR/xml-infoset/
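As an illustration of the [specified] property discussed above, here is a minimal sketch using Python's standard expat binding rather than libxml2 or jing. It parses the testcase document twice, once with all attributes reported and once with only those written in the start-tag, and derives a flag per attribute. The helper names `start_tag_attributes` and `attributes_with_specified_flag` are made up for this example, and the result only loosely mirrors the infoset property.

```python
from xml.parsers import expat

# The test document from the report: "type" has a default value in the DTD
# and is not written in the start-tag of <root>.
DOC = """<?xml version="1.0"?>
<!DOCTYPE root [
<!ELEMENT root (#PCDATA)>
<!ATTLIST root type (text | number) "text">
]>
<root>Test</root>
"""


def start_tag_attributes(document, specified_only):
    """Return the attributes expat reports for the first start-tag."""
    seen = []
    parser = expat.ParserCreate()
    # When true, expat reports only attributes written in the document
    # instance and skips those derived from attribute declarations.
    parser.specified_attributes = specified_only
    parser.StartElementHandler = lambda name, attrs: seen.append(dict(attrs))
    parser.Parse(document, True)
    return seen[0]


def attributes_with_specified_flag(document):
    """Map each attribute name to (value, specified).

    Hypothetical helper: "specified" is True when the attribute was written
    in the start-tag and False when it was defaulted from the DTD.
    """
    everything = start_tag_attributes(document, specified_only=False)
    written = start_tag_attributes(document, specified_only=True)
    return {name: (value, name in written) for name, value in everything.items()}


print(attributes_with_specified_flag(DOC))  # {'type': ('text', False)}
```

The attribute is present in the parsed data either way; only the flag records where it came from, which is the distinction the argument above says RELAX NG ignores.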
The command above should be: xmllint --relaxng debbug288149.rng debbug288149.xml
Not a bug. If you want to fetch the DTD, that's ultimately your choice. Relax-NG validates a tree, and the tree is the result of your parsing. Contrary to jing, xmllint lets you choose whether or not to look up attributes coming from the DTD; just use the appropriate option!

    --dtdattr : loaddtd + populate the tree with inherited attributes

The fact that all Java parsers actually tend to force you to always load the DTD is a nuisance rather than anything else. An XML parser can operate in both modes; libxml2 defaults to the less intrusive and less dangerous one. I won't change this, but the user can.

Forcing --dtdattr by default when processing Relax-NG could mean a lot of disruption in existing processes, and the user has the choice, programmatically or on the command line.

Feature, not bug.

Daniel
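The point that a validator only sees the tree the parser produced can be sketched as follows. This is an assumption-laden illustration in Python's standard library, not libxml2 code: `parse_tree` is a hypothetical helper that builds a tree with or without DTD attribute defaults, and `has_required_type` is a hand-written stand-in for a schema that requires a `type` attribute on `root`. It is not a RELAX NG implementation, and the actual schema in the attachment is not reproduced here.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat

# Same document as in the report's testcase.
DOC = """<?xml version="1.0"?>
<!DOCTYPE root [
<!ELEMENT root (#PCDATA)>
<!ATTLIST root type (text | number) "text">
]>
<root>Test</root>
"""


def parse_tree(document, apply_dtd_defaults):
    """Build an ElementTree from expat events, with or without DTD defaults."""
    builder = ET.TreeBuilder()
    parser = expat.ParserCreate()
    # Reporting only specified attributes leaves defaulted ones out of the tree.
    parser.specified_attributes = not apply_dtd_defaults
    parser.StartElementHandler = builder.start
    parser.EndElementHandler = builder.end
    parser.CharacterDataHandler = builder.data
    parser.Parse(document, True)
    return builder.close()


def has_required_type(root):
    """Hypothetical stand-in for a schema requiring type="text" or "number"."""
    return root.tag == "root" and root.get("type") in ("text", "number")


# Same document, same check, different tree depending on the parsing mode.
print(has_required_type(parse_tree(DOC, apply_dtd_defaults=True)))   # True
print(has_required_type(parse_tree(DOC, apply_dtd_defaults=False)))  # False
```

The check itself never changes; the outcome depends entirely on which parsing mode produced the tree, which is roughly the role --dtdattr plays on the xmllint command line.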
Reopening because xmllint doesn't behave strictly as documented. The man page says:

    --relaxng SCHEMA
        Use RelaxNG file named SCHEMA for validation.

No more. As documented, if xmllint doesn't use the full infoset (with attribute default values) by default, it breaks the RELAX NG specification, and that's a bug.

I can understand that you may want to force the user to use --dtdattr in such a case, but then this must be explicitly documented. The user can't guess that. Something like:

    In order to conform to the RELAX NG specification, the user may
    need to use the --dtdattr option too.

Actually, I think it would be better to make --relaxng imply --dtdattr by default and have --nodtdattr to disable that (a bit like xsltproc, which has --nodtdattr instead of requiring the user to add --dtdattr for full conformance). I wonder why there is an inconsistency between xmllint --relaxng and xsltproc. Anyway, just completing the documentation as above is OK, IMHO.

Two other points:

1. The API documentation may need to be completed.
2. I think it should be made clear that it is not an error to use --dtdattr when there is no DTD.
No. In addition to the reasons given previously:

- Using the DTD for attribute defaulting in a Relax-NG environment means people will validate the instance and find it fine, but then, using a parser in non-validating mode, the attribute will be missing. This is bug-inducing behaviour and a bad practice; it increases the gap between how validating and non-validating parsers will process a document.
- xmllint, and libxml2 in general, default to not loading the external subset. It is a *good* thing, I stand by it, and I prefer a small deviation in the face of a nonsensical use of DTD and RNG to changing this default. There is still a way to do the opposite.
- libxml2 is designed as an editing toolkit; by default it won't modify the document when processing it. This is unlike SAX and many other tools, which just didn't care about this case. Sorry, libxml2 won't do this. nodtdattr is the default; it's the rule of least surprise and the best way to preserve the data.
- People have been using libxml2/xmllint for years to validate with Relax-NG without loading the DTD, and I don't want to break this for obscure and misleading reasons.

If you want to argue that libxml2 is not RelaxNG compliant, fine: use something else, and lobby for people to use something else if you feel so. But I stand where I am; I do feel it's the best for a large majority of my users.

Daniel
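The concern about modifying the document can be made concrete with a small sketch. It uses Python's `xml.etree.ElementTree`, whose underlying expat parser applies defaults declared in the internal DTD subset, as a stand-in for any toolkit that defaults attributes while parsing; it says nothing about how libxml2 itself serializes. Note that ElementTree also drops the DOCTYPE on output, so this is only an illustration of the effect, not a faithful round trip.

```python
import xml.etree.ElementTree as ET

# Same document as in the report's testcase: "type" is only declared in the
# DTD and does not appear in the start-tag.
DOC = """<?xml version="1.0"?>
<!DOCTYPE root [
<!ELEMENT root (#PCDATA)>
<!ATTLIST root type (text | number) "text">
]>
<root>Test</root>
"""

# Parsing applies the DTD default, so the attribute becomes part of the tree.
root = ET.fromstring(DOC)

# Writing the tree back out turns the defaulted attribute into an explicit one.
serialized = ET.tostring(root, encoding="unicode")
print(serialized)  # <root type="text">Test</root>
```

The serialized element now carries an attribute the author never wrote, which is the kind of silent change an editing toolkit would want to avoid by default.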
I still don't understand why you don't want to document xmllint properly.
Well, if it's just about changing the docs, fine: just suggest a patch on the mailing list or here. But there is no --nodtdattr option, and I don't plan to change --relaxng and add it as you suggested in comment #3. I.e. changing the behaviour, which is what this bug is about, is WONTFIX; clarifying the docs is fine (but probably too verbose for --help, more for the man page).

Daniel
As I've said, in the xmllint man page, after

    --relaxng SCHEMA
        Use RelaxNG file named SCHEMA for validation.

you could add:

    In order to conform to the RELAX NG specification, the user may
    need to use the --dtdattr option too.