GNOME Bugzilla – Bug 632210
Incorrect handling of whitespace data
Last modified: 2010-10-16 12:51:50 UTC
The following stylesheet:

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" />
  <xsl:template match="text()">[<xsl:value-of select="." />]</xsl:template>
</xsl:stylesheet>

applied to the following data:

<water> <calcium>20 mg/l</calcium> <magnesium>50 mg/l</magnesium> </water>

produces the following output:

[ ][20 mg/l][ ][50 mg/l][ ]

That is incorrect. It should produce:

[20 mg/l][50 mg/l]

Other products like Altova XMLspy produce the correct output. Quoting http://www.w3.org/TR/xslt#strip:

"After the tree for a source document or stylesheet document has been constructed, but before it is otherwise processed by XSLT, some text nodes are stripped. A text node is never stripped unless it contains only whitespace characters. Stripping the text node removes the text node from the tree."

The text between <water> and <calcium> contains only whitespace and should therefore be stripped, according to the standard. libxslt does not strip it.
Haha! What leads you to believe that Altova XMLspy produced the right output and libxslt the wrong one? Your input with <water> is a *source* document. So:

-----------
For source documents, the set of whitespace-preserving element names is specified by xsl:strip-space and xsl:preserve-space top-level elements. .... Initially, the set of whitespace-preserving element names contains all element names.
-----------

Since your stylesheet has no declaration modifying the set of whitespace-preserving element names, <water> is certainly in that set, and the whitespace in that element MUST NOT be stripped. I firmly believe the actual output of libxslt is *right*!

Please go report a bug to Altova (and tell me if/when they fix it ;-)

Daniel
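To make the quoted rule concrete, here is a minimal Python sketch (standard library only) that models XSLT 1.0 whitespace stripping for source documents and emulates the reported stylesheet. It is an illustrative model, not libxslt's code: the helper names (`preserves`, `text_nodes`, `run_text_template`) are hypothetical, name tests are limited to plain names and `*`, and import precedence is ignored. The `strip_space` and `preserve_space` arguments stand in for `xsl:strip-space` and `xsl:preserve-space`. In a real stylesheet, adding the top-level element `<xsl:strip-space elements="*"/>` is the standard XSLT 1.0 way to get the output the reporter expected.

```python
import xml.etree.ElementTree as ET

# XML whitespace is only #x20, #x9, #xD, #xA; str.isspace() is broader.
XML_WS = set(" \t\r\n")
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def is_whitespace_only(text):
    return all(c in XML_WS for c in text)


def preserves(name, strip_space, preserve_space):
    """Is `name` in the set of whitespace-preserving element names?

    Simplified model: the set starts with every name, xsl:strip-space removes
    matches, xsl:preserve-space adds them back. An exact name beats "*";
    import precedence and prefixed name tests are ignored (assumption).
    """
    if name in preserve_space:
        return True
    if name in strip_space:
        return False
    if "*" in preserve_space:
        return True
    return "*" not in strip_space


def text_nodes(elem, strip_space, preserve_space, xml_space_preserve=False):
    """Yield the text nodes that survive stripping, in document order."""
    mode = elem.get(XML_SPACE)
    if mode is not None:
        # The nearest xml:space wins: "preserve" turns it on, "default" off.
        xml_space_preserve = mode == "preserve"
    keep_ws = xml_space_preserve or preserves(elem.tag, strip_space, preserve_space)

    def survives(text):
        # A text node survives if it has non-whitespace or its parent keeps whitespace.
        return bool(text) and (keep_ws or not is_whitespace_only(text))

    if survives(elem.text):
        yield elem.text
    for child in elem:
        yield from text_nodes(child, strip_space, preserve_space, xml_space_preserve)
        # In ElementTree, child.tail is a text node whose parent is `elem`.
        if survives(child.tail):
            yield child.tail


def run_text_template(xml, strip_space=(), preserve_space=()):
    """Emulate the bug's stylesheet: the built-in templates recurse, and the
    text() template wraps each surviving text node in brackets."""
    root = ET.fromstring(xml)
    nodes = text_nodes(root, set(strip_space), set(preserve_space))
    return "".join(f"[{t}]" for t in nodes)


SOURCE = "<water> <calcium>20 mg/l</calcium> <magnesium>50 mg/l</magnesium> </water>"

print(run_text_template(SOURCE))                     # → [ ][20 mg/l][ ][50 mg/l][ ]
print(run_text_template(SOURCE, strip_space={"*"}))  # → [20 mg/l][50 mg/l]
```

With no strip-space declaration, every element name stays in the whitespace-preserving set, so the three whitespace-only text nodes survive, matching libxslt's output. Declaring `*` as strip-space removes them.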
Hi Daniel,

OK, you convinced me: the standard does seem to make libxslt right for source documents. I missed the sentence "Initially, the set of whitespace-preserving element names contains all element names." My bad.

I don't see why I would file a bug report against a proprietary product. Did they bring me a present like you do? But feel free to file it yourself (and tell me if/when they fix it ;-).

Eric
Daniel,

It might be a bit less simple than it seems. A quote from the XSLT 2.0 recommendation, J.1.1 Tree construction: whitespace stripping:

"Both in XSLT 1.0 and in XSLT 2.0, the XSLT specification places no constraints on the way in which source trees are constructed. For XSLT 2.0, however, the [Data Model] specification describes explicit processes for constructing a tree from an Infoset or a PSVI, while also permitting other processes to be used. The process described in [Data Model] has the effect of stripping whitespace text nodes from elements declared to have element-only content. Although the XSLT 1.0 specification did not preclude such behavior, it differs from the way that most existing XSLT 1.0 implementations work."

The sentence "XSLT 1.0 specification did not preclude such behavior" suggests that stripping whitespace-only nodes from elements with an element-only content model (no mixed content) might be authorized behaviour. That would make both products behave correctly.

That also raises another problem: how do you tell mixed-content elements apart from pure complex-content (element-only) elements when the instance document has no non-whitespace text in them? Without access to the DTD or schema, I don't see how to do that from the stylesheet.

Eric
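Eric's last question can be illustrated with a small Python sketch (standard library only). The hypothetical `strip_element_only` models the effect the quoted J.1.1 text attributes to the [Data Model] process, stripping whitespace-only text children of elements declared element-only, with a plain set of names standing in for the DTD or schema declarations. The equally hypothetical `looks_element_only` is the best guess available from the instance alone. Neither function is part of any XSLT processor's API.

```python
import xml.etree.ElementTree as ET

XML_WS = set(" \t\r\n")  # XML whitespace characters


def is_whitespace_only(text):
    return all(c in XML_WS for c in text)


def strip_element_only(elem, element_only):
    """Drop whitespace-only text children of elements whose declared content
    model is element-only. `element_only` is a stand-in for the DTD/schema."""
    if elem.tag in element_only:
        if elem.text and is_whitespace_only(elem.text):
            elem.text = None
        for child in elem:
            # child.tail is a text child of `elem`, not of `child`.
            if child.tail and is_whitespace_only(child.tail):
                child.tail = None
    for child in elem:
        strip_element_only(child, element_only)
    return elem


def looks_element_only(elem):
    """Guess from the instance alone: has child elements and no
    non-whitespace text among its own text children."""
    texts = [elem.text] + [child.tail for child in elem]
    return len(elem) > 0 and all(t is None or is_whitespace_only(t) for t in texts)


water = ET.fromstring("<water> <calcium>20 mg/l</calcium> </water>")
para = ET.fromstring("<p> <b>bold</b> </p>")

# Both instances have the same shape, so the guess cannot tell them apart.
print(looks_element_only(water), looks_element_only(para))  # → True True

# Hypothetical declarations that would settle it:
#   <!ELEMENT water (calcium)>    element-only: whitespace is stripped
#   <!ELEMENT p (#PCDATA | b)*>   mixed content: whitespace is kept
declared_element_only = {"water"}

print(ET.tostring(strip_element_only(water, declared_element_only), encoding="unicode"))
# → <water><calcium>20 mg/l</calcium></water>
print(ET.tostring(strip_element_only(para, declared_element_only), encoding="unicode"))
# → <p> <b>bold</b> </p>
```

The instance-based guess returns True for both elements, yet only `water` is (hypothetically) declared element-only, so the two trees end up different. Without the declarations, neither the processor nor the stylesheet has anything to base that decision on.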
Well, the XSLT 1.0 standard is defined by the XSLT 1.0 specification. Based on the spec itself, I don't see any reason why those nodes should be stripped. The XSLT 2.0 spec (written in large part by a different set of people) may have a different perspective, fine, but they can't just modify an older spec a posteriori. Based on the spec, I think the current behaviour is the only correct one.

Daniel