GNOME Bugzilla – Bug 337565
XPath conversion of numbers into string is incorrect
Last modified: 2021-07-05 13:22:19 UTC
Please describe the problem:

In the conversion of a "large" number into a string, the printf "style e" (%e) representation is used, which does not conform to the XSLT recommendation (and this leads to completely wrong results in some XSLT stylesheets). This can be seen with xsltproc and the following number2string.xsl stylesheet:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:param name="n"/>
  <xsl:template match="/">
    <xsl:value-of select="string(number($n))"/>
    <xsl:text> </xsl:text>
  </xsl:template>
</xsl:stylesheet>

Steps to reproduce:
1. xsltproc --param n 21474836478 number2string.xsl number2string.xsl

Actual results:
2.1474836478e+10

Expected results:
21474836478

Does this happen every time?
Yes.

Other information:
xalan does the conversion correctly, e.g.

xalan -xsl number2string.xsl -in number2string.xsl -param n 123456789012345678901235465460506540646484474

outputs 123456789012345669380138225650680065035337728, which is correct because the source number is rounded to 53 bits of precision.
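To make the expected output concrete, here is a minimal C sketch of exponent-free formatting for integer-valued doubles. The helper name format_integral is hypothetical and is not a libxml2 function; it assumes the caller has already checked that the value is finite and has no fractional part, and it is only an illustration, not the attached patch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of libxml2): writes a finite,
 * integer-valued double without an exponent.
 * Returns the length reported by snprintf. */
int format_integral(double v, char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    if (v == 0.0)                       /* also catches -0.0, which "%.0f" prints as "-0" */
        return snprintf(buf, size, "0");
    return snprintf(buf, size, "%.0f", v);  /* plain digits, no "e+NN" suffix */
}
```

With a 64-byte buffer, format_integral(21474836478.0, buf, sizeof buf) leaves "21474836478" in buf. Whether very large values such as the 45-digit xalan example print with exact digits depends on the C library's printf implementation.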
This was already discussed on the libxml2 list, and it is very unlikely to be changed without discussion there. The section on number formatting in XSLT 1.0 refers to a vanished Java-related page; this is a big screwup in that spec. If you want any change on that front, raise it on the mailing list. It won't be changed just by adding a bugzilla entry, as this was already discussed on the list and the majority was for the status quo.

Daniel
Reading through the spec, it looks to me as though this report is correct and the code should be fixed. I agree that it would be best to allow for some discussion on the mailing list before implementing this change, but I have already coded and tested a fix that seems (to me) to take care of it. I'm attaching the proposed fix to this report, and if there is no substantial objection on the mailing list, I will commit it after a while.
Created attachment 62897: patch to xpath.c in libxml2 for changing behaviour of number format
[Concerning where the discussion should take place: wouldn't it be better to have it here in bugzilla, where anyone can Cc without subscribing to the mailing list? It is also more visible to anyone who looks for libxml2 bugs. Of course, this bug could be pointed out to the libxml2 ML members.]

There are 2 problems with the current behavior:

1. A number that is written by xsltproc must be readable by any XSLT/XPath processor, and the "style e" form is not a number according to the XPath recommendation. More precisely, converting a string like 1.23456789012e+11 into a number should produce NaN (that libxml2 does not is another bug, and xalan does it right). So data produced by xsltproc won't be readable by other XSLT/XPath processors like xalan, because libxml2 breaks the specs. For an interchange format (one of the main goals of XML), this is not acceptable.

2. Even inside xsltproc, the current behavior may break things when one wants to do string manipulations like digit extraction (this is my case: I had integers between 0 and 2^53).

It seems that the patch corrects integers only. There is the same problem with large non-integers (between 10^9 and 2^52?). Also, the generated strings do not always distinguish different numbers:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:param name="x"/>
  <xsl:param name="y"/>
  <xsl:template match="/">
    <xsl:variable name="nx" select="number($x)"/>
    <xsl:variable name="ny" select="number($y)"/>
    <xsl:variable name="cmp">
      <xsl:choose>
        <xsl:when test="$nx = $ny">equal</xsl:when>
        <xsl:otherwise>different</xsl:otherwise>
      </xsl:choose>
    </xsl:variable>
    <xsl:value-of select="concat('The following numbers: ', $nx, ' ', $ny, ' are ', $cmp, '.
')"/>
  </xsl:template>
</xsl:stylesheet>

$ xsltproc --param x 4503599627370497 --param y 4503599627370498 diff-numbers.xsl diff-numbers.xsl
The following numbers: 4.5035996273705e+15 4.5035996273705e+15 are different.

Two different numbers must have different strings, as the XPath spec says: "beyond the one required digit after the decimal point there must be as many, but only as many, more digits as are needed to uniquely distinguish the number from all other IEEE 754 numeric values."
As I stated, this was debated previously, and people argued in favor of the status quo. Those people *MUST* get the discussion, and they won't see it if it's held in bugzilla. If people use libxslt/libxml2 and care about it, subscribing to the list should not be a problem. There are hundreds of people watching the list, while only half a dozen see the mail exchange on this bugzilla => discussion goes on the list.

http://mail.gnome.org/archives/xslt/2006-April/msg00019.html

Daniel
Ref Comment #4 - seems you are mixing apples and oranges. Your example is being run without the patch (still using 'e' format). I support your argument that "integer" values shouldn't be output in exponential format. In light of the previous (5 years ago) discussion about exponential format in general, I am not particularly supportive of the remainder of your points, but am still willing to be convinced. You are correct in respect to the fact that my patch only corrects the treatment of integers - that was intentional.
(In reply to comment #6)
> Ref Comment #4 - seems you are mixing apples and oranges. Your example is
> being run without the patch (still using 'e' format).

Yes, I meant that without the patch, libxml2 is buggy for *two* reasons. But the patch does not fix the second problem:

xsltproc --param x 1 --param y 1.000000000000001 diff-numbers.xsl diff-numbers.xsl

gives:

The following numbers: 1 1 are different.
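The digit rule quoted from the XPath spec earlier in this thread amounts to a round-trip requirement: print just enough significant digits that parsing the string gives back the same double. A minimal C sketch of that search follows. The helper name shortest_roundtrip is hypothetical, the code assumes a finite input and correctly rounded printf/strtod, and it uses "%g", which can still emit an exponent for very large or very small magnitudes, so it only illustrates the digit count and is not a complete XPath formatter.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: finds the smallest number of significant digits
 * (1..17) whose "%g" rendering parses back to exactly v.
 * Leaves that rendering in buf and returns the digit count. */
int shortest_roundtrip(double v, char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    for (int digits = 1; digits <= 17; digits++) {
        snprintf(buf, size, "%.*g", digits, v);
        if (strtod(buf, NULL) == v)
            return digits;              /* first precision that round-trips */
    }
    return 17;                          /* 17 digits are always enough for a double */
}
```

For the values in this thread, the 15 digits that libxml2 prints are not enough: 4503599627370497 and 1.000000000000001 each need 16 digits, and 1.0999999999999999 needs 17.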
I have a comment about the patch posted in comment #3. What happens if buffersize is too small? The function is used at least here in xpath.c (xmlXPathCastNumberToString):

    char buf[100];
    xmlXPathFormatNumber(val, buf, 99);
    buf[99] = 0;
    ret = xmlStrdup((const xmlChar *) buf);

For integers, the number can have at most 310 characters (sign + 309 digits), so a 311-byte buffer is sufficient in this case.
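The size bound can be checked with snprintf, which reports the length it would need even when given no buffer. The sketch below uses hypothetical helper names and plain "%.0f" formatting as a stand-in for the patched code path; it does not call xmlXPathFormatNumber.

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>

/* Hypothetical helper: number of characters (excluding the terminating
 * NUL) that "%.0f" needs for v. */
int integral_length(double v)
{
    return snprintf(NULL, 0, "%.0f", v);
}

/* Hypothetical helper: formats v into buf and reports truncation
 * instead of silently cutting digits. Returns 0 on success, -1 otherwise. */
int format_checked(double v, char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    int n = snprintf(buf, size, "%.0f", v);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

The worst case is -DBL_MAX, for which integral_length reports 310 characters. A 100-byte buffer would therefore truncate the output, while 311 bytes leave room for the terminating NUL.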
The bug still occurs in libxml2 2.9.10:

$ echo '<a/>' | xmllint --xpath "string(1.0999999999999999)" -
1.1

which would mean that 1.0999999999999999 and 1.1 are converted to the same floating-point number. But this is not the case, as their difference is non-zero:

$ echo '<a/>' | xmllint --xpath "string(1.1 - 1.0999999999999999)" -
2.22044604925031e-16

This is confirmed by atof() in C:

1.0999999999999999 gives 0x1.1999999999999p+0
1.1 gives 0x1.199999999999ap+0

i.e. two different double-precision numbers.
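The C check mentioned above can be reproduced with a few lines. This is a sketch with a hypothetical helper name; it uses strtod rather than atof (both parse to double) and the C99 "%a" conversion, whose exact output spelling may vary slightly between C libraries.

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: parses two decimal strings, prints both doubles in
 * hexadecimal floating-point form, and returns their difference. */
double neighbor_gap(const char *lo_text, const char *hi_text)
{
    assert(lo_text != NULL && hi_text != NULL);
    double lo = strtod(lo_text, NULL);
    double hi = strtod(hi_text, NULL);

    printf("%s gives %a\n", lo_text, lo);
    printf("%s gives %a\n", hi_text, hi);
    /* DBL_EPSILON is the spacing between adjacent doubles in [1, 2) */
    printf("gap = %g (%g x DBL_EPSILON)\n", hi - lo, (hi - lo) / DBL_EPSILON);
    return hi - lo;
}
```

Calling neighbor_gap("1.0999999999999999", "1.1") returns exactly DBL_EPSILON, so the two strings denote adjacent but distinct doubles and should not both be printed as 1.1.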
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org. As part of that, we are mass-closing older open tickets in bugzilla.gnome.org which have not seen updates for a longer time (resources are unfortunately quite limited so not every ticket can get handled). If you can still reproduce the situation described in this ticket in a recent and supported software version, then please follow https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines and create a new ticket at https://gitlab.gnome.org/GNOME/libxml2/-/issues/ Thank you for your understanding and your help.