GNOME Bugzilla – Bug 437179
xmlXPathCastStringToNumber(xmlXPathCastNumberToString(v)) != v
Last modified: 2010-05-05 10:48:44 UTC
Please describe the problem:
The verbiage about decimal form at http://www.w3.org/TR/xpath#section-String-Functions seems to imply that number(string(v)) == v should hold, and these are the functions implementing that transform. I asked about this in IRC, and bbhk asked me to file it.

xmlXPathCastNumberToString seems to be basing its decision about how many digits to print on DBL_DIG, which is inadequate for two reasons:

1) DBL_DIG indicates the number of digits for which all values can be distinguished by a double; this is not the same thing as the number of digits adequate to distinguish all doubles. First, it is highly likely that there are *some* values that can be distinguished with more than DBL_DIG places, just not all of them: if a double can represent 1.0{14}00, 1.0{14}05 and 1.0{14}10, DBL_DIG still has to be 15, since it cannot distinguish 1.0{14}02 from the values above. Second, the radix mismatch between decimal and binary floats can also require another digit. 0.999...x is nearly 10 times larger than 0.1000...xy, so the latter has more than 3 bits of additional precision to work with and may need an additional decimal digit y to uniquely distinguish the smaller values. In total, for IEEE 754 floats, you can need anywhere from DBL_DIG to DBL_DIG+2 places to represent the value recoverably.

2) Prior to switching to exponential notation (which XPath doesn't really permit, but I've already seen the argument that libxml2's behavior is a deliberate spec deviation; it doesn't bother me), xmlXPathCastNumberToString is not compensating the number of digits it prints for leading decimal 0s (abs(v) < 1). So we can be losing up to 5 additional digits to leading 0s.

Steps to reproduce:
    double v1 = 0.000030999999580672004; /* 0x3F0040BFDFFFFFFF */
    double v2 = xmlXPathCastStringToNumber(xmlXPathCastNumberToString(v1));

Actual results: v1 != v2
Expected results: v1 == v2
Does this happen every time? yes
Other information:
I think a more precise definition of DBL_DIG is: the maximum number of decimal digits such that any decimal number with DBL_DIG significant (base-10) digits can be rounded to a floating-point number on the underlying architecture, and back again, without change to those DBL_DIG digits. For libxml2, this is either defined by the underlying libraries or, if not defined there, as 16. For my particular OSes (Fedora FC6 and Gentoo), the definition comes from the libraries as 15. Therefore, as long as the number of significant digits is less than or equal to 15, the equality which you desire should hold. For the current released version of libxml2 it doesn't hold, so there is certainly some validity in your report.

On the other hand, your "test case" involves a constant with 17 digits of precision. Since that exceeds DBL_DIG, there can be no guarantee that your desired equality will hold; in fact, it will almost certainly fail (since the string representation is rounded to meet the DBL_DIG requirement).

Your point (2) is valid, and that was where the library was in error. I have coded an enhancement to xpath.c for that part, which is now present in SVN.

Please carefully consider what I have put down above. If you agree, we can close the bug. If you disagree, then we can discuss it further :-).
My test case is a constant exactly representable as a double (which is why I showed the 64-bit hex value I had in mind in the comment). Not all 17-digit values are (since 17 > DBL_DIG), but this one is. DBL_DIG is the length for which one can confidently go from decimal->double->decimal and know you will recover the original decimal. XPath has no spec constraint that would be based on DBL_DIG, as it does not have a decimal type, and there are lots of cases where string(number(s)) != s (s could have trailing 0s, or not be a number at all, or similar).

I'm going to give single-precision examples since the numbers are more manageable; the same logic applies to double, of course. For IEEE 754 32-bit float, FLT_DIG is 6; all numbers with 6 significant figures are distinguishable.

0x3FFFFFEF == 1.9999980
0x3FFFFFF0 == 1.9999981
0x3FFFFFF1 == 1.9999982
0x3FFFFFF2 == 1.9999983
0x3FFFFFF3 == 1.9999985
0x3FFFFFF4 == 1.9999986
0x3FFFFFF5 == 1.9999987
0x3FFFFFF6 == 1.9999988
0x3FFFFFF7 == 1.9999989

So ...

However, IMO the spec requires that number(string(n)) == n, and this is not met by printing DBL_DIG digits.
Blast, I hit submit early by accident. Ignore everything past "I'm going to give single-precision examples...". As I worked out my numbers, I discovered that the first case I discussed doesn't happen for singles, more or less by chance (but it still does for doubles). So I guess the examples will be doubles after all.

For IEEE 754 64-bit double, DBL_DIG is 15; all numbers with 15 significant figures are distinguishable.

...
0x4023FFFFFFFFFFF5 == 9.999999999999980
...
0x4023FFFFFFFFFFFA == 9.999999999999990
0x4023FFFFFFFFFFFB == 9.999999999999991
0x4023FFFFFFFFFFFC == 9.999999999999993
0x4023FFFFFFFFFFFD == 9.999999999999995
0x4023FFFFFFFFFFFE == 9.999999999999996
0x4023FFFFFFFFFFFF == 9.999999999999998

DBL_DIG is correctly 15; it can't distinguish all 16-digit numbers (9.9...2, 9.9...4 and 9.9...7 have no representation and will round). However, we need 16 digits to represent all doubles, since *some* of them differ only in the 16th digit; this relation is not fully symmetric.

Moving down to smaller 16-digit numbers...

0x3FF0000000000000 == 1.0000000000000000
0x3FF0000000000001 == 1.0000000000000002
0x3FF0000000000002 == 1.0000000000000004
...

So down near 1 we can represent all 16-digit numbers, and additionally some 17-digit ones. Hence it takes more than DBL_DIG digits to guarantee that number(string(v)) == v for doubles.
Created attachment 88024
Patch removing MAX_FRAC limitation in xmlXPathStringEvalNumber
To fix the leading 0s, we also need to avoid getting trapped by MAX_FRAC in xmlXPathStringEvalNumber; either raise it to 22 or just not require it at all. The attached patch fixes that so we'll parse anything.
Created attachment 88028
Patch removing MAX_FRAC limitation in xmlXPathStringEvalNumber
Gah, I attached my whole test patch, and I only meant to attach the xmlXPathStringEvalNumber part. Sorry about the confusion.
I put some more thought into generalizing the cases I showed above. Luckily, the solution was simpler than I was thinking it would be: the minimum number of needed digits is a straightforward function of the resolution at any point, and may be calculated as ceil(-log10(ulp)), which will range from 15 to 17 for double. ulp, in turn, is DBL_EPSILON scaled to the exponent of v; this is not hard to calculate, though portability is a bit tricky.

In C99, we can find ulp regardless of the machine's radix with:

    int e = ilogb(v);
    if (e < DBL_MIN_EXP - 1)
        e = DBL_MIN_EXP - 1; /* clamp subnormals (and zero) */
    double ulp = scalbn(DBL_EPSILON, e);

Likewise in SUS, 4.3BSD, or X/Open, though with a different function than C99 standardized:

    double e = logb(v);
    if (e < DBL_MIN_EXP - 1)
        e = DBL_MIN_EXP - 1; /* clamp subnormals (and zero) */
    double ulp = scalb(DBL_EPSILON, e);

In pure C89, though, we can't count on scalb or logb in either form (though they are generally available). Without them the best we can do is to assume (or at least accept approximating) a machine with base-2 floating point, since that's all C89's math lib really supports. There we do the following (frexp normalizes the mantissa to [0.5, 1), so its exponent is one larger than ilogb's):

    int e;
    double ulp;
    frexp(v, &e);
    if (v == 0.0 || e < DBL_MIN_EXP)
        e = DBL_MIN_EXP; /* clamp subnormals (and zero) */
    ulp = ldexp(DBL_EPSILON, e - 1);

This is almost certainly good enough; XPath specifically mandates IEEE 754, so binary floating point is a pretty solid assumption. In any case, I suspect slightly wrong rounding in string() is probably not the only nonconformance if compiling libxml2 on a machine that does something completely different...
I did a little experimentation with your suggestions. It seems to be an improvement, but it still doesn't always produce the results you desire. I prepared a test program, together with the relevant routines extracted from xpath.c, with your proposed changes included. Perhaps you could download and try it? http://bbrack.org/~bill/xp_num_str2.tgz
At this point we are really waiting for further feedback based on the test program, so input is welcome!
Daniel
Hm. What are we going to do about this issue? If it's recognized as a bug which has to be fixed, I'd set this to NEW. If not, close as WONTFIX.
I've actually gone back and improved this some more recently (it now passes all the samples that were here, and I think it's finally right in general), but we have a newly introduced "open source contribution" process that the change is struggling through. So there is an updated patch coming whenever it finally clears...
Kevin, any news on that last patch?
Daniel
Kevin? Thank you so much for your efforts so far; could you please comment on Daniel's question in comment 11? Thanks in advance!
Kevin, Ping. I'm afraid that we have to close it as INCOMPLETE if we don't receive feedback :( That'd be very unfortunate given the amount of work that's already been done.
Closing this bug report as no further information has been provided. Please feel free to reopen this bug if you can provide the information asked for. Thanks!