Bug 632210 - Incorrect handling of whitespace data
Status: RESOLVED NOTABUG
Product: libxslt
Classification: Platform
Component: general
Version: 1.1.26
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: Daniel Veillard
QA Contact: libxml QA maintainers
Depends on:
Blocks:
Reported: 2010-10-15 11:54 UTC by Éric Bischoff
Modified: 2010-10-16 12:51 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Éric Bischoff 2010-10-15 11:54:14 UTC
The following stylesheet

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" />
<xsl:template match="text()">[<xsl:value-of select="." />]</xsl:template>
</xsl:stylesheet>

applied on the following data

<water>
  <calcium>20 mg/l</calcium>
  <magnesium>50 mg/l</magnesium>
</water>

produces the following data
[
  ][20 mg/l][
  ][50 mg/l][
]

That is incorrect. It should produce:

[20 mg/l][50 mg/l]

Other products, such as Altova XMLSpy, produce the correct output.

Quoting http://www.w3.org/TR/xslt#strip

"After the tree for a source document or stylesheet document has been constructed, but before it is otherwise processed by XSLT, some text nodes are stripped. A text node is never stripped unless it contains only whitespace characters. Stripping the text node removes the text node from the tree."

The text between <water> and <calcium> contains only whitespace, and therefore should be stripped, according to the standard. libxslt does not strip it.
Comment 1 Daniel Veillard 2010-10-15 16:26:35 UTC
Haha! What leads you to believe that Altova XMLSpy produced the right
output and libxslt the wrong one?

Your input with <water> is a *source* document. So...

-----------
For source documents, the set of whitespace-preserving element names is specified by xsl:strip-space and xsl:preserve-space top-level elements.
....
Initially, the set of whitespace-preserving element names contains all element names.
-----------

Since you have no declaration in the stylesheet to modify the set of
whitespace-preserving element names, <water> is certainly in that set, and
the whitespace in the element MUST NOT be stripped.
I firmly believe the actual output of libxslt is *right*!

Please go report a bug to Altova (and tell me if/when they fix it ;-)

Daniel
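
For illustration, a minimal sketch of the kind of declaration Comment 1 refers to, assuming the reporter's stylesheet is otherwise unchanged. xsl:strip-space is a top-level XSLT 1.0 element; elements="*" removes every element name from the whitespace-preserving set, so the whitespace-only text nodes under <water> are stripped before processing.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" />
<!-- Strip whitespace-only text nodes from all source elements. -->
<xsl:strip-space elements="*" />
<xsl:template match="text()">[<xsl:value-of select="." />]</xsl:template>
</xsl:stylesheet>
```

Applied to the <water> document from the description, this should produce [20 mg/l][50 mg/l], the output the reporter expected.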
Comment 2 Éric Bischoff 2010-10-16 11:33:45 UTC
Salut Daniel,


OK, you convinced me: the standard seems to make libxslt right for the document nodes.

I missed the "Initially, the set of whitespace-preserving element names contains all element names." sentence. My bad.

I don't see why I would file a bug report against a proprietary product. Did they give me a present like you do? But feel free to file it (and tell me if/when they fix it ;-).


Eric
Comment 3 Éric Bischoff 2010-10-16 12:04:17 UTC
Daniel,

It might be a bit less simple than it seems:

A quote from the XSLT 2.0 recommendation:


J.1.1 Tree construction: whitespace stripping

Both in XSLT 1.0 and in XSLT 2.0, the XSLT specification places no constraints on the way in which source trees are constructed. For XSLT 2.0, however, the [Data Model] specification describes explicit processes for constructing a tree from an Infoset or a PSVI, while also permitting other processes to be used. The process described in [Data Model] has the effect of stripping whitespace text nodes from elements declared to have element-only content. Although the XSLT 1.0 specification did not preclude such behavior, it differs from the way that most existing XSLT 1.0 implementations work.


The phrase "the XSLT 1.0 specification did not preclude such behavior" suggests that stripping whitespace-only text nodes from elements with an element-only content model (no mixed content) might be permitted behaviour. That would make both products behave correctly.

That also raises another problem: how do you tell mixed-content elements apart from element-only elements that happen to contain no text in the instance document? Without access to the DTD or schema, I don't see how to do that from the stylesheet.


Eric
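
One way to approach the question in Comment 3 without a DTD or schema, sketched here as an assumption rather than anything proposed in this thread: the stylesheet author names the elements they know to be element-only. Treating <water> as element-only is the author's own knowledge of the data, not something the processor infers.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" />
<!-- Assumption: only <water> has element-only content; list further names separated by spaces. -->
<xsl:strip-space elements="water" />
<xsl:template match="text()">[<xsl:value-of select="." />]</xsl:template>
</xsl:stylesheet>
```

Whitespace-only text nodes in any element not listed stay in the tree, so mixed-content elements are unaffected.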
Comment 4 Daniel Veillard 2010-10-16 12:51:50 UTC
Well, the XSLT 1.0 standard is defined by the XSLT 1.0 specification.
I don't see any reason in the spec itself why those nodes should be
stripped. If the XSLT 2.0 spec (written in large part by a different set
of people) has a different perspective, fine, but they can't modify an
older spec after the fact.

Based on the spec, I think the current behaviour is the only correct one.

Daniel