GNOME Bugzilla – Bug 656534
--nodtdattr xsltproc option doesn't work on xmlns
Last modified: 2021-07-05 10:59:40 UTC
Consider the following files:

tst.dtd:

<!ELEMENT root (elem)>
<!ELEMENT elem (#PCDATA)>
<!ATTLIST elem xmlns CDATA #FIXED '' >

tst.xml:

<?xml version="1.0"?>
<!DOCTYPE root [
<!ELEMENT root (elem)>
<!ELEMENT elem (#PCDATA)>
<!ATTLIST elem xmlns CDATA #FIXED '' >
]>
<root>
  <elem>foo</elem>
</root>

tst-ext.xml:

<?xml version="1.0"?>
<!DOCTYPE root SYSTEM "tst.dtd">
<root>
  <elem>foo</elem>
</root>

tst.xsl:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml"/>
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

If I run

xsltproc --nodtdattr tst.xsl tst.xml

or

xsltproc --nodtdattr tst.xsl tst-ext.xml

I get in both cases:

<?xml version="1.0"?>
<root>
  <elem xmlns="">foo</elem>
</root>

instead of:

<?xml version="1.0"?>
<root>
  <elem>foo</elem>
</root>
Actually it seems that --nodtdattr fails only for xmlns (other fixed attributes are dropped as expected), as shown below. Note that in the above examples, the default namespace is still the null namespace, so that the added xmlns="" attributes are useless.

The files:

$ cat fixed-ext.xml
<?xml version="1.0"?>
<!DOCTYPE root SYSTEM "fixed.dtd">
<root>
  <elem>foo</elem>
</root>

$ cat fixed-int.xml
<?xml version="1.0"?>
<!DOCTYPE root [
<!ELEMENT root (elem)>
<!ELEMENT elem (#PCDATA)>
<!ATTLIST elem
  xmlns CDATA #FIXED ''
  fixed CDATA #FIXED 'value'
>
]>
<root>
  <elem>foo</elem>
</root>

$ cat fixed.dtd
<!ELEMENT root (elem)>
<!ELEMENT elem (#PCDATA)>
<!ATTLIST elem
  xmlns CDATA #FIXED ''
  fixed CDATA #FIXED 'value'
>

$ cat fixed.xsl
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml"/>
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

And the tests:

$ xsltproc --nodtdattr fixed.xsl fixed-ext.xml
<?xml version="1.0"?>
<root>
  <elem xmlns="">foo</elem>
</root>

$ xsltproc --nodtdattr fixed.xsl fixed-int.xml
<?xml version="1.0"?>
<root>
  <elem xmlns="">foo</elem>
</root>

$ xsltproc fixed.xsl fixed-ext.xml
<?xml version="1.0"?>
<root>
  <elem xmlns="" fixed="value">foo</elem>
</root>

$ xsltproc fixed.xsl fixed-int.xml
<?xml version="1.0"?>
<root>
  <elem xmlns="" fixed="value">foo</elem>
</root>

Similar problem with xmllint from libxml2 2.8:

$ xmllint fixed-ext.xml
<?xml version="1.0"?>
<!DOCTYPE root SYSTEM "fixed.dtd">
<root>
  <elem>foo</elem>
</root>

$ xmllint --loaddtd fixed-ext.xml
<?xml version="1.0"?>
<!DOCTYPE root SYSTEM "fixed.dtd">
<root>
  <elem xmlns="">foo</elem>
</root>

$ xmllint fixed-int.xml
<?xml version="1.0"?>
<!DOCTYPE root [
<!ELEMENT root (elem)>
<!ELEMENT elem (#PCDATA)>
<!ATTLIST elem xmlns CDATA #FIXED "">
<!ATTLIST elem fixed CDATA #FIXED "value">
]>
<root>
  <elem xmlns="">foo</elem>
</root>
This is probably related to libxml2.
I think it's related to libxslt and libxml2, but I'm not sure it's a bug. Basically, namespace usage by the elem element means that this namespace declaration must be put in place for XSLT (it's a requirement from the spec; the prefix used could be any non-conflicting one).

For libxml2, it's a serialization rule hardcoded in the code: if a namespace is detected by the parser (in this case, if it loads the external subset), then the namespace will be output, because it is an essential part of the interpretation of the document.

Daniel
(In reply to comment #3)
> I think it's related to libxslt and libxml2, but i'm not sure it's a bug.
> Basically namespace usage by the elem element means that this namespace
> declaration must be put in place for XSLT (it's a requirement from the
> spec, the prefix used could be any non-conflicting one).

I would have initially thought that copying the other fixed attributes was also a requirement from the spec, and that the goal of the --nodtdattr option was to change this behavior.

> For libxml2
> it's a serialization rule hardcoded in the code that if a namespace is
> detected by the parser (in that case if it loads the external subset)
> then the namespace will be output because it is an essential part of
> the interpretation of the document.

Do you mean that the namespace is more important (e.g. if the namespace of the element changes, then you can consider that the whole element changes), and that's why it's treated differently?

Now, what about the fact that in this particular case, the xmlns="" attribute is useless? I mean, the documents

<root>
  <elem xmlns="">foo</elem>
</root>

and

<root>
  <elem>foo</elem>
</root>

are equivalent (elem is in the empty namespace in both cases), aren't they?
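The equivalence question above can be checked with any namespace-aware parser. Here is a minimal sketch using Python's standard xml.etree.ElementTree rather than libxml2 (an assumption made only to keep the example self-contained; the helper name `names` is hypothetical). It parses the two standalone documents and compares the expanded element names.

```python
import xml.etree.ElementTree as ET

# The two standalone documents from the comment above.
WITH_DECL = '<root><elem xmlns="">foo</elem></root>'
WITHOUT_DECL = '<root><elem>foo</elem></root>'


def names(xml_text):
    """Return the expanded name of every element, in document order.

    ElementTree writes a namespaced name as '{uri}local' and a
    no-namespace name as plain 'local'.
    """
    return [el.tag for el in ET.fromstring(xml_text).iter()]


print(names(WITH_DECL))     # ['root', 'elem']
print(names(WITHOUT_DECL))  # ['root', 'elem']
```

As standalone documents, both give elem the same expanded name, which supports the point that xmlns="" adds nothing when no default namespace is in scope.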
Actually xmlns="" is a weird example. xmlns="" means cancelling the default namespace in the scope of the element.

http://www.w3.org/TR/REC-xml-names/#nsc-NoPrefixUndecl
http://www.w3.org/TR/REC-xml-names/#defaulting

"The attribute value in a default namespace declaration MAY be empty. This has the same effect, within the scope of the declaration, of there being no default namespace."

So what you are suggesting doesn't have the semantics you expect. It's not that there is an empty namespace, it's that there is no namespace: it removes it ... if it exists. Defaulting the removal of something which doesn't exist doesn't change much :-)

So, right, there are bugs on the default namespace, but possibly not exactly for the expected reasons, or I didn't understand your point :-)

Now the problem is that XSLT can be used to serialize chunks of a document, and there it becomes impossible for the processor to know whether or not there is a default namespace in scope. Blocking the xmlns="" from being serialized could make embedding of an XSLT-generated chunk fail, because the generated chunk would now inherit the defaulted namespace.

Daniel
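The embedding concern can be illustrated with a small sketch, again using Python's xml.etree.ElementTree instead of libxml2/libxslt. The host namespace URI and the `embed` helper are hypothetical, introduced only for this illustration. The same chunk is pasted into a host document that declares a default namespace, once with and once without xmlns="".

```python
import xml.etree.ElementTree as ET

# Hypothetical default namespace of the document receiving the chunk.
HOST_NS = "http://example.org/host"


def embed(chunk):
    """Paste `chunk` into a host element that declares a default
    namespace, and return the expanded names of the host's children."""
    doc = f'<host xmlns="{HOST_NS}">{chunk}</host>'
    return [child.tag for child in ET.fromstring(doc)]


# Without xmlns="", the chunk inherits the host's default namespace.
print(embed('<elem>foo</elem>'))
# ['{http://example.org/host}elem']

# With xmlns="", the chunk stays in no namespace.
print(embed('<elem xmlns="">foo</elem>'))
# ['elem']
```

So the declaration is redundant in a standalone document, but it changes the element's name as soon as the serialized chunk is placed under a default namespace.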
Well, when I'm mentioning an element with an "empty namespace", I mean an element that has a null namespace URI. But this is equivalent to saying that the element has no namespace, isn't it?

Perhaps this will be more clear with a new example (here with xmllint, but when I initially reported this bug, I got a similar problem with libxslt, where each output element had a xmlns="", even when the parent had already one):

<?xml version="1.0"?>
<!DOCTYPE root [
<!ELEMENT root (a+)>
<!ELEMENT a (b1|b2)>
<!ATTLIST a xmlns CDATA #IMPLIED>
<!ELEMENT b1 (#PCDATA)>
<!ATTLIST b1 xmlns CDATA #FIXED "http://localhost/">
<!ATTLIST b1 fixed CDATA #FIXED "value">
<!ELEMENT b2 (#PCDATA)>
<!ATTLIST b2 xmlns CDATA #FIXED "">
<!ATTLIST b2 fixed CDATA #FIXED "value">
]>
<root>
  <a xmlns="http://localhost/"><b1>foo</b1></a>
  <a xmlns=""><b2>foo</b2></a>
</root>

$ xmllint test.xml
<?xml version="1.0"?>
<!DOCTYPE root [
<!ELEMENT root (a)+>
<!ELEMENT a (b1 | b2)>
<!ATTLIST a xmlns CDATA #IMPLIED>
<!ELEMENT b1 (#PCDATA)>
<!ATTLIST b1 xmlns CDATA #FIXED "http://localhost/">
<!ATTLIST b1 fixed CDATA #FIXED "value">
<!ELEMENT b2 (#PCDATA)>
<!ATTLIST b2 xmlns CDATA #FIXED "">
<!ATTLIST b2 fixed CDATA #FIXED "value">
]>
<root>
  <a xmlns="http://localhost/"><b1>foo</b1></a>
  <a xmlns=""><b2 xmlns="">foo</b2></a>
</root>

I don't see why the handling of xmlns is different between b1 and b2.

> Now the problem is that XSLT can be used to serialize chunk of a document,
> and there it becomes impossible for the processor to know if there is or
> not a default namespace in scope, blocking the xmlns="" from being serialized
> could make embedding of XSLT generated chunk fail, because the generated
> chunk would now inherit the defaulted namespace.

OK, so my initial example was incorrect (that's the problem when trying to simplify too complex examples for a bug report; but the one just above is closer to the real problem I was seeing).
Actually, for libxslt, this is a bit different. Consider the above XML example transformed by the following XSLT stylesheet:

<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml"/>
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="root">
    <xsl:copy>
      <xsl:attribute name="attr">rootattr</xsl:attribute>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

I get:

<?xml version="1.0"?>
<root attr="rootattr">
  <a xmlns="http://localhost/"><b1 fixed="value">foo</b1></a>
  <a xmlns=""><b2 fixed="value">foo</b2></a>
</root>

There is no xmlns="" on b2, which is fine. But the xmlns="" on the "a" is useless, or it should have been on the "root" element, for the following reason.

Since the xsl:template match="root" matches, it is assumed that the "root" element has no namespace (= it is in the namespace with a null URI, using the XPath terminology), as the concept of default namespace doesn't apply to XPath expressions. So, there is no reason to add a xmlns="" to the second "a". If it is assumed that the generated XML text could be interpreted with any default namespace, then the "root" element should have been generated with a xmlns="" attribute (since this is a "root" element with no namespace that was copied by xsltproc).
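The claim that an unprefixed name test such as match="root" only selects an element in no namespace can be approximated outside XSLT. The sketch below uses the limited path syntax of Python's xml.etree.ElementTree, which is only an analogy for XPath 1.0 name tests and not libxslt's pattern matcher; the `has_child` helper and the wrapper element `doc` are hypothetical.

```python
import xml.etree.ElementTree as ET

PLAIN = '<doc><root/></doc>'
DEFAULTED = '<doc xmlns="http://localhost/"><root/></doc>'


def has_child(xml_text, path):
    """True if `path` selects a child of the document element."""
    return ET.fromstring(xml_text).find(path) is not None


# An unprefixed name only matches an element in no namespace.
print(has_child(PLAIN, 'root'))      # True
print(has_child(DEFAULTED, 'root'))  # False

# Under a default namespace, the expanded name has to be spelled out.
print(has_child(DEFAULTED, '{http://localhost/}root'))  # True
```

By the same reasoning, a template with match="root" firing on the input means that "root" was parsed as a no-namespace element, which is the premise of the argument above.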
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org. As part of that, we are mass-closing older open tickets in bugzilla.gnome.org which have not seen updates for a longer time (resources are unfortunately quite limited so not every ticket can get handled). If you can still reproduce the situation described in this ticket in a recent and supported software version, then please follow https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines and create a new ticket at https://gitlab.gnome.org/GNOME/libxslt/-/issues/ Thank you for your understanding and your help.