Bug 656534 – --nodtdattr xsltproc option doesn't work on xmlns


Summary:	--nodtdattr xsltproc option doesn't work on xmlns


Status:	RESOLVED OBSOLETE

Product:	libxslt
Classification:	Platform
Component:	general
Version:	1.1.26
Hardware:	Other Mac OS

Importance:	Normal normal
Target Milestone:	---
Assigned To:	Daniel Veillard
QA Contact:	libxml QA maintainers

Reported:	2011-08-14 21:58 UTC by Vincent Lefevre
Modified:	2021-07-05 10:59 UTC

Description Vincent Lefevre 2011-08-14 21:58:01 UTC

Consider the following files:

tst.dtd

<!ELEMENT root (elem)>
<!ELEMENT elem (#PCDATA)>
<!ATTLIST elem
  xmlns CDATA #FIXED ''
>

tst.xml

<?xml version="1.0"?>
<!DOCTYPE root [
<!ELEMENT root (elem)>
<!ELEMENT elem (#PCDATA)>
<!ATTLIST elem
  xmlns CDATA #FIXED ''
>
]>
<root>
<elem>foo</elem>
</root>

tst-ext.xml

<?xml version="1.0"?>
<!DOCTYPE root SYSTEM "tst.dtd">
<root>
<elem>foo</elem>
</root>

tst.xsl

<?xml version="1.0"?>

<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml"/>

<xsl:template match="node()|@*">
  <xsl:copy>
    <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>

If I run
  xsltproc --nodtdattr tst.xsl tst.xml
or
  xsltproc --nodtdattr tst.xsl tst-ext.xml
I get in both cases:

<?xml version="1.0"?>
<root>
<elem xmlns="">foo</elem>
</root>

instead of:

<?xml version="1.0"?>
<root>
<elem>foo</elem>
</root>

Comment 1 Vincent Lefevre 2012-09-06 13:56:36 UTC

Actually, it seems that --nodtdattr fails only on xmlns, as shown below: other #FIXED attributes are correctly omitted. Note that in the above examples, the default namespace is still the null namespace, so the added xmlns="" attributes are useless.

The files:

$ cat fixed-ext.xml 
<?xml version="1.0"?>
<!DOCTYPE root SYSTEM "fixed.dtd">
<root>
<elem>foo</elem>
</root>

$ cat fixed-int.xml 
<?xml version="1.0"?>
<!DOCTYPE root [
<!ELEMENT root (elem)>
<!ELEMENT elem (#PCDATA)>
<!ATTLIST elem
  xmlns CDATA #FIXED ''
  fixed CDATA #FIXED 'value'
>
]>
<root>
<elem>foo</elem>
</root>

$ cat fixed.dtd 
<!ELEMENT root (elem)>
<!ELEMENT elem (#PCDATA)>
<!ATTLIST elem
  xmlns CDATA #FIXED ''
  fixed CDATA #FIXED 'value'
>

$ cat fixed.xsl 
<?xml version="1.0"?>

<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml"/>

<xsl:template match="node()|@*">
  <xsl:copy>
    <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>

And the tests:

$ xsltproc --nodtdattr fixed.xsl fixed-ext.xml
<?xml version="1.0"?>
<root>
<elem xmlns="">foo</elem>
</root>

$ xsltproc --nodtdattr fixed.xsl fixed-int.xml
<?xml version="1.0"?>
<root>
<elem xmlns="">foo</elem>
</root>

$ xsltproc fixed.xsl fixed-ext.xml
<?xml version="1.0"?>
<root>
<elem xmlns="" fixed="value">foo</elem>
</root>

$ xsltproc fixed.xsl fixed-int.xml
<?xml version="1.0"?>
<root>
<elem xmlns="" fixed="value">foo</elem>
</root>

There is a similar problem with xmllint from libxml2 2.8:

$ xmllint fixed-ext.xml
<?xml version="1.0"?>
<!DOCTYPE root SYSTEM "fixed.dtd">
<root>
<elem>foo</elem>
</root>

$ xmllint --loaddtd fixed-ext.xml
<?xml version="1.0"?>
<!DOCTYPE root SYSTEM "fixed.dtd">
<root>
<elem xmlns="">foo</elem>
</root>

$ xmllint fixed-int.xml
<?xml version="1.0"?>
<!DOCTYPE root [
<!ELEMENT root (elem)>
<!ELEMENT elem (#PCDATA)>
<!ATTLIST elem xmlns CDATA #FIXED "">
<!ATTLIST elem fixed CDATA #FIXED "value">
]>
<root>
<elem xmlns="">foo</elem>
</root>

Comment 2 Nick Wellnhofer 2012-09-16 21:41:31 UTC

This is probably related to libxml2.

Comment 3 Daniel Veillard 2012-09-17 01:29:07 UTC

I think it's related to both libxslt and libxml2, but I'm not sure it's a bug.
Basically, the use of a namespace by the elem element means that this namespace
declaration must be put in place for XSLT (it's a requirement of the
spec; the prefix used could be any non-conflicting one). For libxml2,
it's a serialization rule hardcoded in the code: if a namespace is
detected by the parser (in this case, if it loads the external subset),
then the namespace will be output, because it is an essential part of
the interpretation of the document.

Daniel
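Here is one way to see the distinction Daniel draws. After a document is parsed with namespace processing, a default namespace declaration is no longer an attribute. It becomes part of each element's expanded name. An ordinary #FIXED attribute such as fixed stays an attribute. The minimal sketch below uses Python's standard xml.etree.ElementTree, a stand-in chosen for illustration rather than libxml2 or libxslt. It writes the declaration out explicitly instead of defaulting it from a DTD:

```python
import xml.etree.ElementTree as ET

# Mirrors b1 from the report, with xmlns and fixed written explicitly (no DTD).
elem = ET.fromstring('<b1 xmlns="http://localhost/" fixed="value">foo</b1>')

# The namespace is folded into the expanded name, in Clark notation {uri}local...
print(elem.tag)     # {http://localhost/}b1
# ...and only the ordinary attribute remains in the attribute dictionary.
print(elem.attrib)  # {'fixed': 'value'}
```

This only illustrates the namespace data model. The report does not establish whether libxslt's internal tree behaves the same way.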

Comment 4 Vincent Lefevre 2012-09-17 08:01:25 UTC

(In reply to comment #3)
> I think it's related to libxslt and libxml2, but I'm not sure it's a bug.
> Basically namespace usage by the elem element means that this namespace
> declaration must be put in place for XSLT (it's a requirement from the
> spec, the prefix used could be any non-conflicting one).

I would initially have thought that copying the other fixed attributes was also a requirement of the spec, and that the goal of the --nodtdattr option was to change this behavior.

> For libxml2
> it's a serialization rule hardcoded in the code that if a namespace is
> detected by the parser (in that case if it loads the external subset)
> then the namespace will be output because it is an essential part of
> the interpretation of the document.

Do you mean that the namespace is more important (e.g. if the namespace of an element changes, then the whole element can be considered to have changed), and that's why it's treated differently?

Now, what about the fact that in this particular case, the xmlns="" attribute is useless? I mean, the documents

<root>
<elem xmlns="">foo</elem>
</root>

and

<root>
<elem>foo</elem>
</root>

are equivalent (elem is in the empty namespace in both cases), aren't they?
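The equivalence asked about here can be checked by comparing the expanded names of the elements rather than the serialized text. Here is a minimal sketch with Python's xml.etree.ElementTree, chosen for illustration. The expanded_names helper is hypothetical, and the two inputs are the documents above:

```python
import xml.etree.ElementTree as ET

def expanded_names(xml_text):
    """Hypothetical helper: expanded name ({uri}local, or just local) of every element, in document order."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]

with_undecl = '<root>\n<elem xmlns="">foo</elem>\n</root>'
without_undecl = '<root>\n<elem>foo</elem>\n</root>'

# No default namespace is in scope on <root>, so undeclaring it on <elem> changes nothing.
print(expanded_names(with_undecl))     # ['root', 'elem']
print(expanded_names(without_undecl))  # ['root', 'elem']
```

Both documents parse to the same names. The difference only matters when the text is later placed inside a context that declares a default namespace.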

Comment 5 Daniel Veillard 2012-09-17 08:24:40 UTC

Actually xmlns="" is a weird example.
xmlns="" means cancelling the default namespace in the scope of the element.

http://www.w3.org/TR/REC-xml-names/#nsc-NoPrefixUndecl

http://www.w3.org/TR/REC-xml-names/#defaulting
"The attribute value in a default namespace declaration MAY be empty. This has the same effect, within the scope of the declaration, of there being no default namespace."

So what you are suggesting doesn't have the semantics you expect. It's not
that there is an empty namespace; it's that there is no namespace: xmlns=""
removes the default namespace ... if one exists.
Defaulting the removal of something which doesn't exist doesn't change much :-)

So, right, there are bugs with the default namespace, but possibly not exactly
for the expected reasons, or I didn't understand your point :-) .
Now, the problem is that XSLT can be used to serialize chunks of a document,
and then it becomes impossible for the processor to know whether or not a
default namespace is in scope. Blocking the xmlns="" from being serialized
could make embedding of an XSLT-generated chunk fail, because the generated
chunk would then inherit the default namespace of the context it is embedded in.

Daniel
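The embedding failure described in this comment can be reproduced without XSLT. The sketch below uses Python's xml.etree.ElementTree, with a hypothetical embed helper and wrapper element. It pastes the same chunk under a parent that declares http://localhost/ as its default namespace, once without xmlns="" and once with it:

```python
import xml.etree.ElementTree as ET

def embed(chunk, default_ns):
    """Hypothetical helper: paste a serialized chunk inside a wrapper declaring default_ns,
    then return the expanded name of the embedded element."""
    doc = ET.fromstring(f'<wrapper xmlns="{default_ns}">{chunk}</wrapper>')
    return doc[0].tag

ns = "http://localhost/"
print(embed('<elem>foo</elem>', ns))           # {http://localhost/}elem: namespace inherited
print(embed('<elem xmlns="">foo</elem>', ns))  # elem: stays in no namespace
```

Only the chunk that carries xmlns="" keeps elem out of the surrounding namespace. That is why a serializer that cannot see the final context may choose to emit it.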

Comment 6 Vincent Lefevre 2012-09-17 11:30:28 UTC

Well, when I mention an element with an "empty namespace", I mean an element that has a null namespace URI. But this is equivalent to saying that the element has no namespace, isn't it?

Perhaps this will be clearer with a new example (here with xmllint, but when I initially reported this bug, I got a similar problem with libxslt, where each output element had an xmlns="", even when the parent already had one):

<?xml version="1.0"?>
<!DOCTYPE root [
<!ELEMENT root (a+)>
<!ELEMENT a (b1|b2)>
<!ATTLIST a xmlns CDATA #IMPLIED>
<!ELEMENT b1 (#PCDATA)>
<!ATTLIST b1 xmlns CDATA #FIXED "http://localhost/">
<!ATTLIST b1 fixed CDATA #FIXED "value">
<!ELEMENT b2 (#PCDATA)>
<!ATTLIST b2 xmlns CDATA #FIXED "">
<!ATTLIST b2 fixed CDATA #FIXED "value">
]>
<root>
<a xmlns="http://localhost/"><b1>foo</b1></a>
<a xmlns=""><b2>foo</b2></a>
</root>

$ xmllint test.xml
<?xml version="1.0"?>
<!DOCTYPE root [
<!ELEMENT root (a)+>
<!ELEMENT a (b1 | b2)>
<!ATTLIST a xmlns CDATA #IMPLIED>
<!ELEMENT b1 (#PCDATA)>
<!ATTLIST b1 xmlns CDATA #FIXED "http://localhost/">
<!ATTLIST b1 fixed CDATA #FIXED "value">
<!ELEMENT b2 (#PCDATA)>
<!ATTLIST b2 xmlns CDATA #FIXED "">
<!ATTLIST b2 fixed CDATA #FIXED "value">
]>
<root>
<a xmlns="http://localhost/"><b1>foo</b1></a>
<a xmlns=""><b2 xmlns="">foo</b2></a>
</root>

I don't see why the handling of xmlns is different between b1 and b2.

> Now the problem is that XSLT can be used to serialize chunks of a document,
> and there it becomes impossible for the processor to know if there is or
> not a default namespace in scope, blocking the xmlns="" from being serialized
> could make embedding of an XSLT-generated chunk fail, because the generated
> chunk would now inherit the defaulted namespace.

OK, so my initial example was incorrect (that's the problem with trying to simplify overly complex examples for a bug report; but the one just above is closer to the real problem I was seeing).

Comment 7 Vincent Lefevre 2012-09-17 12:37:20 UTC

Actually, for libxslt, this is a bit different. Consider the above XML example transformed by the following XSLT stylesheet:

<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml"/>

<xsl:template match="node()|@*">
  <xsl:copy>
    <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
</xsl:template>

<xsl:template match="root">
  <xsl:copy>
    <xsl:attribute name="attr">rootattr</xsl:attribute>
    <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>

I get:

<?xml version="1.0"?>
<root attr="rootattr">
<a xmlns="http://localhost/"><b1 fixed="value">foo</b1></a>
<a xmlns=""><b2 fixed="value">foo</b2></a>
</root>

There is no xmlns="" on b2, which is fine. But the xmlns="" on the second "a" element is useless; if it were really needed, it should have been put on the "root" element instead, for the following reason.

Since the xsl:template match="root" matches, it is assumed that the "root" element has no namespace (i.e. it is in the namespace with a null URI, in XPath terminology), as the concept of a default namespace doesn't apply to XPath expressions. So there is no reason to add an xmlns="" to the second "a". If it is assumed that the generated XML text could be interpreted under any default namespace, then the "root" element should have been generated with an xmlns="" attribute (since this is a "root" element with no namespace that was copied by xsltproc).
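The argument above boils down to a scope check. A default namespace declaration is redundant when it repeats the value already in scope. The answer depends on what is assumed to be in scope around the top-level element, which is Daniel's point in comment 5. The minimal sketch below uses Python's xml.parsers.expat without namespace processing, so that xmlns shows up as an ordinary attribute. The helper name and its inherited parameter are hypothetical, and this is not how libxslt or libxml2 decide what to serialize:

```python
import xml.parsers.expat

def redundant_default_ns_decls(xml_text, inherited=""):
    """Hypothetical helper: list (element, value) for each xmlns="..." that merely
    re-declares the default namespace already in scope. `inherited` is the default
    namespace assumed around the top-level element ("" means none)."""
    in_scope = [inherited]
    found = []

    def start(name, attrs):
        current = in_scope[-1]
        if "xmlns" in attrs:
            if attrs["xmlns"] == current:
                found.append((name, attrs["xmlns"]))
            in_scope.append(attrs["xmlns"])
        else:
            in_scope.append(current)

    def end(name):
        in_scope.pop()

    # No namespace_separator given, so expat reports xmlns as a plain attribute.
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.Parse(xml_text, True)
    return found

# The xsltproc output shown above, without the XML declaration.
output = ('<root attr="rootattr">\n'
          '<a xmlns="http://localhost/"><b1 fixed="value">foo</b1></a>\n'
          '<a xmlns=""><b2 fixed="value">foo</b2></a>\n'
          '</root>')

print(redundant_default_ns_decls(output))
# [('a', '')]: redundant if <root> itself is in no default namespace
print(redundant_default_ns_decls(output, inherited="http://example.org/"))
# []: under some other default namespace, the xmlns="" is needed
```

Run on the xmllint output from comment 6, the same check also flags the xmlns="" on b2.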

Comment 8 GNOME Infrastructure Team 2021-07-05 10:59:40 UTC

GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org.
As part of that, we are mass-closing older open tickets in bugzilla.gnome.org
which have not seen updates for a longer time (resources are unfortunately
quite limited so not every ticket can get handled).

If you can still reproduce the situation described in this ticket in a recent
and supported software version, then please follow
  https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines
and create a new ticket at
  https://gitlab.gnome.org/GNOME/libxslt/-/issues/

Thank you for your understanding and your help.