Bug 437179 – xmlXPathCastStringToNumber(xmlXPathCastNumberToString(v)) != v

Summary:	xmlXPathCastStringToNumber(xmlXPathCastNumberToString(v)) != v


Status:	RESOLVED INCOMPLETE

Product:	libxml2
Classification:	Platform
Component:	general
Version:	2.6.26
Hardware:	Other All

Importance:	Normal normal
Target Milestone:	---
Assigned To:	Daniel Veillard
QA Contact:	libxml QA maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2007-05-09 15:07 UTC by Kevin Puetz
Modified:	2010-05-05 10:48 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Patch removing MAX_FRAC limitation in xmlXPathStringEvalNumber (3.46 KB, patch) 2007-05-11 16:38 UTC, Kevin Puetz
Patch removing MAX_FRAC limitation in xmlXPathStringEvalNumber (2.14 KB, patch) 2007-05-11 17:02 UTC, Kevin Puetz

Description Kevin Puetz 2007-05-09 15:07:36 UTC

Please describe the problem:
The verbiage about decimal form at http://www.w3.org/TR/xpath#section-String-Functions seems to imply that number(string(v)) == v should hold, and these are the functions implementing that transform. I asked about this in IRC, and bbhk asked me to file it.

xmlXPathCastNumberToString seems to be basing its decision about how many digits to print on DBL_DIG, which is inadequate for two reasons:
1) DBL_DIG indicates the number of digits for which all values can be distinguished by a double; this is not the same thing as the number of digits adequate to distinguish all doubles. 
   First, it is highly likely that there are *some* values that can be distinguished with more than DBL_DIG places, just not all of them; if a double can represent 1.0{14}00 and 1.0{14}05 and 0.0{14}10, DBL_DIG has to be 15 since it cannot distinguish 1.0{14}02 from the above. 
   Second, the radix mismatch between decimal and binary floats can also require another digit. 0.999...x is nearly 10 times larger than 0.1000...xy, so the latter has >3 bits of additional precision to work with and may need an additional decimal digit y to uniquely distinguish the smaller values.
   In total, for IEEE-754 floats, you can need somewhere between DBL_DIG and DBL_DIG+2 places to recoverably represent the value.

2) Prior to switching to exponential notation (which XPath doesn't really permit, but I've already seen the argument that libxml2's behavior is a deliberate spec deviation; it doesn't bother me), xmlXPathCastNumberToString does not increase the number of digits it prints to compensate for leading decimal 0s (abs(v) < 1). So we can be losing up to 5 additional digits to leading 0s.
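
Both points can be checked outside libxml2 with the standard C library. The following is a minimal sketch, assuming IEEE-754 doubles and a C library whose printf/strtod round correctly; the helper names roundtrips_sig and roundtrips_fixed are hypothetical and are not libxml2 functions.

```c
#include <float.h>
#include <stdio.h>
#include <stdlib.h>

/* Format v with `digits` significant digits, parse the text back, and
 * report whether the original double was recovered exactly. */
static int roundtrips_sig(double v, int digits)
{
    char buf[64];
    snprintf(buf, sizeof buf, "%.*g", digits, v);
    return strtod(buf, NULL) == v;
}

/* Same check in fixed notation with `places` digits after the decimal
 * point, so leading zeros eat into the significant digits printed. */
static int roundtrips_fixed(double v, int places)
{
    char buf[512];
    snprintf(buf, sizeof buf, "%.*f", places, v);
    return strtod(buf, NULL) == v;
}
```

For example, 0x1.0000000000001p0 (the double just above 1.0) is not recovered from DBL_DIG significant digits but is recovered from DBL_DIG + 2, and 0x1.0000000000001p-14 (about 0.000061) needs its four leading zeros added on top of that when printed in fixed notation.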

Steps to reproduce:
double v1 = 0.000030999999580672004; // 0x3F0040BFDFFFFFFF
double v2 = xmlXPathCastStringToNumber(xmlXPathCastNumberToString(v1));


Actual results:
v1 != v2

Expected results:
v1 == v2

Does this happen every time?
yes

Other information:

Comment 1 William M. Brack 2007-05-11 14:47:16 UTC

I think a more precise definition of DBL_DIG is:
  The maximum number of decimal digits such that any floating point number
  with DBL_DIG (base 10) digits can be rounded to a floating point number
  on the underlying architecture, and back again, without change to the
  DBL_DIG digits.
For libxml2, this is either defined by the underlying libraries, or (if not
defined there) as 16.  For my particular OS's (Fedora FC6 and Gentoo), the
definition comes from the libraries as 15.  Therefore, as long as the number of
significant digits is less than or equal to 15, the equality which you desire
should hold.  For the current released version of libxml2 it doesn't hold, so
there is certainly some validity in your report.

On the other hand, your "test case" involves a constant with 17 digits of
precision.  Since that exceeds DBL_DIG, there can be no guarantee that your
desired equality will hold; in fact, it will almost certainly fail (since the
string representation is rounded to meet the DBL_DIG requirement).

Your point (2) is valid, and that was where the library was in error.  I have
coded an enhancement to xpath.c for that part which is now present in SVN.

Please carefully consider what I have put down above.  If you agree, we can
close the bug.  If you disagree, then we can discuss it further :-).
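
The DBL_DIG guarantee described above runs in the decimal -> double -> decimal direction. A minimal sketch of that check, assuming IEEE-754 doubles and correctly rounded printf/strtod (the helper name decimal_survives is hypothetical, and the input is assumed to already be in the form "%.*g" would print):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse a decimal string to double, print it back with `digits`
 * significant digits, and report whether the text is unchanged. */
static int decimal_survives(const char *s, int digits)
{
    char buf[64];
    snprintf(buf, sizeof buf, "%.*g", digits, strtod(s, NULL));
    return strcmp(buf, s) == 0;
}
```

A 15-digit decimal such as "0.123456789012345" survives, while a 16-digit decimal such as "9.999999999999992" does not, because no double prints back as that text.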

Comment 2 Kevin Puetz 2007-05-11 15:22:14 UTC

My test case is a constant exactly representable as a double (which is why I showed the 64bit hex value I had in mind in the comment). Not all 17 digit values are (since 17 > DBL_DIG), but this one is.

DBL_DIG is the length for which one can confidently go from decimal->double->decimal and know you will recover the original decimal. XPath does not have a spec constraint that would be based on DBL_DIG, as it does not have a decimal type, and there are lots of cases where string(number(s)) != s (s could have trailing 0s, or not be a number at all, or similar).

I'm going to give single-precision examples since the numbers are more manageable; the same logic applies to double of course.

For IEEE754 32bit float, FLT_DIG is 6; all numbers with 6 significant figures are distinguishable.

0x3FFFFFEF == 1.9999989
0x3FFFFFF0 == 1.9999981
0x3FFFFFF1 == 1.9999982
0x3FFFFFF2 == 1.9999983
0x3FFFFFF3 == 1.9999985
0x3FFFFFF4 == 1.9999986
0x3FFFFFF5 == 1.9999987
0x3FFFFFF6 == 1.9999988
0x3FFFFFF7 == 1.999999f

So 



However, IMO the spec requires that number(string(n)) == n, and this is not met by printing DBL_DIG.

Comment 3 Kevin Puetz 2007-05-11 16:07:36 UTC

blast, I hit submit early by accident. Ignore everything past "I'm going to give single-precision examples...". As I worked out my numbers, I discovered the first case I discussed doesn't happen for singles, more or less by chance (but it still does for doubles). So I guess the examples will be doubles after all.

For IEEE754 64bit double, DBL_DIG is 15; all numbers with 15 significant figures
are distinguishable.
...
0x4023FFFFFFFFFFF5 == 9.999999999999980
...
0x4023FFFFFFFFFFFA == 9.999999999999989
0x4023FFFFFFFFFFFB == 9.999999999999991
0x4023FFFFFFFFFFFC == 9.999999999999993
0x4023FFFFFFFFFFFD == 9.999999999999995
0x4023FFFFFFFFFFFE == 9.999999999999996
0x4023FFFFFFFFFFFF == 9.999999999999998

DBL_DIG is correctly 15; it can't distinguish all 16-digit numbers (9.9...2, 9.9...4, and 9.9...7 have no representation and will round). However, we need 16 digits to represent all doubles, since *some* of them are distinct; this relation is not fully symmetric.

Moving down to smaller 16-digit numbers...

0x3FF0000000000000 == 1.0000000000000000
0x3FF0000000000001 == 1.0000000000000002
0x3FF0000000000002 == 1.0000000000000004
...

So down near 1 we can represent all 16 digit numbers, and additionally some 17-digit ones.

Hence it takes more than DBL_DIG to guarantee that num(string(v)) == v for doubles.
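
The digit counts in these examples can be found by brute force. This is a minimal sketch, assuming IEEE-754 doubles and correctly rounded printf/strtod; min_roundtrip_digits is a hypothetical helper, not part of libxml2.

```c
#include <stdio.h>
#include <stdlib.h>

/* Smallest number of significant decimal digits whose printed form
 * parses back to exactly v (v must be finite). */
static int min_roundtrip_digits(double v)
{
    char buf[64];
    int digits;

    for (digits = 1; digits < 17; digits++) {
        snprintf(buf, sizeof buf, "%.*g", digits, v);
        if (strtod(buf, NULL) == v)
            return digits;
    }
    return 17; /* 17 significant digits always suffice for IEEE-754 doubles */
}
```

With this helper, 0x4023FFFFFFFFFFFF (written 0x1.3ffffffffffffp3 as a C99 hex float) needs 16 digits and 0x3FF0000000000001 (0x1.0000000000001p0) needs 17, both more than DBL_DIG.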

Comment 4 Kevin Puetz 2007-05-11 16:38:02 UTC

Created attachment 88024
Patch removing MAX_FRAC limitation in xmlXPathStringEvalNumber

To fix the leading 0s, we also need to avoid getting trapped by MAX_FRAC in xmlXPathStringEvalNumber: either raise it to 22 or just not require it at all. The attached patch fixes that so we'll parse anything.
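
To illustrate why a cap on parsed fractional digits matters once more digits are printed, here is a minimal sketch. It is not the libxml2 parser: parse_capped is a hypothetical helper that simply drops fractional digits beyond max_frac before calling strtod, and it assumes plain decimal input without an exponent.

```c
#include <stdlib.h>
#include <string.h>

/* Parse s after discarding any fractional digits beyond max_frac,
 * mimicking a parser that stops reading the fraction at a fixed cap. */
static double parse_capped(const char *s, size_t max_frac)
{
    char buf[128];
    const char *dot = strchr(s, '.');
    size_t len = strlen(s);

    if (dot != NULL) {
        size_t keep = (size_t)(dot - s) + 1 + max_frac;
        if (keep < len)
            len = keep;
    }
    if (len >= sizeof buf)
        len = sizeof buf - 1;
    memcpy(buf, s, len);
    buf[len] = '\0';
    return strtod(buf, NULL);
}
```

For the double 0x1.0000000000001p-14, whose 21-place fixed form is "0.000061035156250000014", a cap of 19 fractional digits yields the neighbouring double 0x1p-14 instead, while a cap of 22 recovers the original value.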

Comment 5 Kevin Puetz 2007-05-11 17:02:42 UTC

Created attachment 88028
Patch removing MAX_FRAC limitation in xmlXPathStringEvalNumber

gah, I attached my whole test patch, and I only meant to attach the xmlXPathStringEvalNumber part. Sorry about the confusion.

Comment 6 Kevin Puetz 2007-05-17 03:14:25 UTC

I put some more thought into generalizing the cases I showed above. Luckily, the solution was simpler than I was thinking it would be; the minimum number of needed digits is a straightforward function of the resolution at any point, and may be calculated as ceil(-log10(ulp)), which will range from 15 to 17 for double.

ulp, in turn, is epsilon scaled by the same exponent as v; this is not hard to calculate, though portability is a bit tricky.

In C99, we can find ulp regardless of the machine's radix with:

int exp = ilogb(v);
if(exp < DBL_MIN_EXP) exp = DBL_MIN_EXP; // clamp subnormals;
double ulp = scalbn(DBL_EPSILON,exp);

Likewise in SUS, 4.3BSD, or X/Open, though with a different function than the one C99 standardized:

double exp = logb(v);
if(exp < DBL_MIN_EXP) exp = DBL_MIN_EXP; // clamp subnormals
double ulp = scalb(DBL_EPSILON,exp);

In pure C89, though, we can't count on scalb or logb in either form (though they are generally available). Without them the best we can do is to assume (or at least accept approximating) a machine with base-2 floating-point, since that's all C89's math lib really supports. There we do:

int exp;
frexp(v,&exp);
if(exp < DBL_MIN_EXP) exp = DBL_MIN_EXP; // clamp subnormals
double ulp = ldexp(DBL_EPSILON,exp);

This is almost certainly good enough; XPath specifically mandates IEEE754, so binary floating point is a pretty solid assumption. In any case, I suspect slightly wrong rounding in string() is probably not the only nonconformance if compiling libxml2 on a machine that does something completely different...

Comment 7 William M. Brack 2007-05-29 00:29:22 UTC

I did a little experimentation with your suggestions.  It seems to be an improvement, but still not always producing the results you desire.  I prepared a test program, together with the relevant routines extracted from xpath.c, with your proposed changes included.  Perhaps you could download and try it?
   http://bbrack.org/~bill/xp_num_str2.tgz

Comment 8 Daniel Veillard 2007-08-23 12:03:00 UTC

At that point we are really waiting for further feedback
based on the test program, so input welcome !

Daniel

Comment 9 Tobias Mueller 2009-02-05 20:12:54 UTC

Hm. What are we going to do about this issue?

If it's recognized as a bug which has to be fixed, I'd set this to NEW. If not, close as WONTFIX.

Comment 10 Kevin Puetz 2009-02-05 20:57:00 UTC

I've actually gone back and improved this some more recently (it now passes all the samples that were here, and I think it's finally right in general), but we have introduced a new "open source contribution" process that the change is struggling through. So there is an updated patch coming whenever it finally clears...

Comment 11 Daniel Veillard 2009-08-11 15:32:19 UTC

Kevin, any news on that last patch ?

Daniel

Comment 12 Tobias Mueller 2010-01-25 21:47:45 UTC

Kevin? Thank you so much for your efforts so far, could you please comment on Daniel's question in comment 11? Thanks in advance!

Comment 13 Tobias Mueller 2010-03-19 16:51:16 UTC

Kevin, Ping. I'm afraid that we have to close it as INCOMPLETE if we don't receive feedback :( That'd be very unfortunate given the amount of work that's already been done.

Comment 14 Tobias Mueller 2010-05-05 10:48:44 UTC

Closing this bug report as no further information has been provided. Please feel free to reopen this bug if you can provide the information asked for.
Thanks!