Bug 632210 – Incorrect handling of whitespace data




Summary:	Incorrect handling of whitespace data


Status:	RESOLVED NOTABUG

Product:	libxslt
Classification:	Platform
Component:	general
Version:	1.1.26
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	Daniel Veillard
QA Contact:	libxml QA maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2010-10-15 11:54 UTC by Éric Bischoff
Modified:	2010-10-16 12:51 UTC

See Also:
GNOME target:	---
GNOME version:	---

Description Éric Bischoff 2010-10-15 11:54:14 UTC

The following stylesheet

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" />
<xsl:template match="text()">[<xsl:value-of select="." />]</xsl:template>
</xsl:stylesheet>

applied to the following data

<water>
  <calcium>20 mg/l</calcium>
  <magnesium>50 mg/l</magnesium>
</water>

produces the following output:
[
  ][20 mg/l][
  ][50 mg/l][
]

That is incorrect. It should produce:

[20 mg/l][50 mg/l]

Other products, such as Altova XMLSpy, produce the correct output.

Quoting http://www.w3.org/TR/xslt#strip

"After the tree for a source document or stylesheet document has been constructed, but before it is otherwise processed by XSLT, some text nodes are stripped. A text node is never stripped unless it contains only whitespace characters. Stripping the text node removes the text node from the tree."

The text between <water> and <calcium> contains only whitespace, and therefore should be stripped, according to the standard. libxslt does not strip it.

Comment 1 Daniel Veillard 2010-10-15 16:26:35 UTC

Haha! What leads you to believe that Altova XMLSpy produced the right
output and libxslt the wrong one?

Your input with <water> is a *source* document. So...

-----------
For source documents, the set of whitespace-preserving element names is specified by xsl:strip-space and xsl:preserve-space top-level elements.
....
Initially, the set of whitespace-preserving element names contains all element names.
-----------

Since you have no declaration in the stylesheet modifying the set of
whitespace-preserving elements, <water> is certainly in that set, and the
whitespace in that element MUST NOT be stripped.
I firmly believe the actual output of libxslt is *right*!

  Please go report a bug against Altova (and tell me if/when they fix it ;-)

Daniel
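Under the rule quoted above, the reporter's expected output can be requested explicitly from the stylesheet. A minimal sketch, assuming the same <water> input as in the description: a top-level xsl:strip-space element removes the listed names from the set of whitespace-preserving element names, so whitespace-only text nodes are dropped from the source tree before the templates run. An XSLT 1.0 processor such as libxslt should then print [20 mg/l][50 mg/l].

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- Remove every element name ("*") from the set of whitespace-preserving
     element names, so whitespace-only text nodes inside <water> are
     stripped from the source tree before processing. -->
<xsl:strip-space elements="*" />
<xsl:output method="text" />
<!-- Wrap each remaining text node in brackets, as in the original stylesheet. -->
<xsl:template match="text()">[<xsl:value-of select="." />]</xsl:template>
</xsl:stylesheet>
```

For this particular input, elements="water" would be enough, since only <water> has whitespace-only children; "*" applies the stripping to every element in the source document.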

Comment 2 Éric Bischoff 2010-10-16 11:33:45 UTC

Hi Daniel,


OK, you convinced me: the standard seems to make libxslt right for the document nodes.

I missed the "Initially, the set of whitespace-preserving element names contains all element names." sentence. My bad.

I don't see why I would file a bug report against a proprietary product. Did they bring me a present like you did? But feel free to file it (and tell me if/when they fix it ;-).


Eric

Comment 3 Éric Bischoff 2010-10-16 12:04:17 UTC

Daniel,

It might be a bit less simple than it seems:

A quote from the XSLT 2.0 Recommendation:


J.1.1 Tree construction: whitespace stripping

Both in XSLT 1.0 and in XSLT 2.0, the XSLT specification places no constraints on the way in which source trees are constructed. For XSLT 2.0, however, the [Data Model] specification describes explicit processes for constructing a tree from an Infoset or a PSVI, while also permitting other processes to be used. The process described in [Data Model] has the effect of stripping whitespace text nodes from elements declared to have element-only content. Although the XSLT 1.0 specification did not preclude such behavior, it differs from the way that most existing XSLT 1.0 implementations work.


The sentence "XSLT 1.0 specification did not preclude such behavior" suggests that stripping whitespace-only text nodes from elements with an element-only content model (no mixed content) may be permitted behaviour. That would make both products behave correctly.

That also raises another problem: how do you distinguish mixed-content elements from element-only elements that happen to contain no text in the instance document? Without access to the DTD or schema, I don't see how to do that from the stylesheet.


Eric
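On distinguishing mixed content from element-only content without a DTD or schema: in XSLT 1.0 the stylesheet cannot infer the content model, but its author can declare it. A sketch, assuming the author knows which elements of the vocabulary are mixed-content: strip whitespace-only text nodes everywhere with xsl:strip-space, then re-enable preservation for the mixed-content elements with xsl:preserve-space. The element name para below is hypothetical, standing in for any mixed-content element.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- Treat every element as element-only by default: strip its
     whitespace-only text nodes. -->
<xsl:strip-space elements="*" />
<!-- Hypothetical mixed-content element: keep its whitespace-only text nodes.
     When both declarations match, the conflict is resolved as for template
     rules; the name test "para" (priority 0) beats "*" (priority -0.5). -->
<xsl:preserve-space elements="para" />
<xsl:output method="text" />
<xsl:template match="text()">[<xsl:value-of select="." />]</xsl:template>
</xsl:stylesheet>
```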

Comment 4 Daniel Veillard 2010-10-16 12:51:50 UTC

Well, the XSLT-1.0 standard is defined by the XSLT-1.0 specification.
I don't see any reason why those nodes should be stripped out based
on the spec itself. That the XSLT-2.0 spec (written in large part
by a different set of people) may have a different perspective is fine,
but they can't retroactively modify an older spec.

Based on the spec, I think the current behaviour is the only correct one.

Daniel