GNOME Bugzilla – Bug 663712
strtod sometimes off by 1 ULP
Last modified: 2012-12-26 17:59:39 UTC
I used the spreadsheet function VALUE of gnumeric 1.10.17 on Ubuntu 10.04 x64 in order to convert numbers given as strings to doubles. When comparing the results to those of self-written routines using extended precision, in 46 cases out of a sample of 1M data gnumeric yielded doubles that were systematically too small by just 1 ULP. I then used the strtod function in C to convert all the numbers directly and found no deviation at all from the results obtained by the self-written conversion routines. These 46 strings are contained in the first row of the attached file. The doubles are represented in hex notation in little-endian order (obtained using a C union in order to access the 8 bytes of each double directly; the essential part of the C code is given in the file). Some other, hopefully helpful, information is given as well. Regards Schorsch
Created attachment 201086: XLS file providing the problematic text strings
This would happen in libgoffice:

Breakpoint 1, go_strtod (s=0x873a190 "5.24695020914077743636723377400E-237",
    end=0xbfffdc28) at go-math.c:362

I find it somewhat strange, though, that you seem to be interested in excessive precision but choose to use the single-precision version of Gnumeric provided by Ubuntu rather than the long-double version of Gnumeric.
Note that go_strtod just uses C's strtod function (after providing some setup services):

Breakpoint 1, go_strtod (s=0x87e6d58 "5.24695020914077743636723377400E-237",
    end=0xbfffdc28) at go-math.c:362
362             int maxlen = strtod_helper (s);
(gdb) n
367             if (maxlen == INT_MAX) {
(gdb)
368                     errno = 0;
(gdb)
369                     return strtod (s, end);
(gdb) p s
$1 = 0x87e6d58 "5.24695020914077743636723377400E-237"
(gdb)

I think we need some more details of how you obtained your results.
How did you retrieve the number that you think Gnumeric parsed from the string? Did you use the value as displayed? So perhaps the difference only lies in the rounding of the displayed number.
Gnumeric git now has a HEXREP function useful for debugging things like this.
I can confirm the numbers in the C column, i.e., the result that Gnumeric produces. However, I see the same numbers when I go via C:

welinder@sequoia:~> cat foo.c
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char **argv)
{
  union { double d; unsigned char b[8]; } u;
  int i;

  u.d = strtod ("5.24695020914077743636723377400E-237", NULL);
  for (i = 7; i >= 0; i--) {
    printf ("%02x ", u.b[i]);
  }
  printf ("\n");
  return 0;
}
welinder@sequoia:~> gcc -Wall foo.c
welinder@sequoia:~> ./a.out
0e e1 15 57 ab cf e3 9a
This appears to be a known libc problem: http://sources.redhat.com/bugzilla/show_bug.cgi?id=3479
Note that you should not rely on gcc getting this right when you embed the constant directly: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21718
glibc isn't likely to be fixed within a decade, so we might have to rely on something like this: http://www.netlib.org/fp/dtoa.c
Thank you very much, Morten and Andreas! I think the values in column D of the XLS file clearly show that the very numbers which are incorrectly parsed are only slightly bigger than the midpoint between two doubles. It seems to me that it is impossible to state that a number is "even", i.e. located exactly in the middle between two doubles, without checking infinitely many bits... I'd rather round up if the 54th bit is 1, but I'm not IEEE. I used gcc_x32 4.6.1 on Win7 x64 without any compiler switches, and that version got it right. All the extended-precision and hex stuff in Gnumeric was done using my Python extension functions once announced on the Gnumeric mailing list. In Excel, I used the FOSS add-in xNumbers v6.
This raises the question of which version of gcc was used that did not get it right. Perhaps the issue is already fixed in gcc 4.6 and newer.
Neither gcc [compile-time constants; less important] nor libc [runtime via strtod] is fixed as far as I can tell. Nor do they have a hint of a proposed fix.

> I used gcc_x32 4.6.1 on Win7x64 without any compiler switches and that
> version got it right.

Was that a more comprehensive search or just for the values in the file?
Not a single deviation within my 1M sample. This sample uses a "uniformly random" decimal mantissa between 1 and 10 and a "uniformly random" exponent between -308 and 308. In case the "random" exponent was found to be +/-308, the range of mantissa values was adjusted in order to avoid normalised DP over-/underflow conditions. I will check whether the 46 cases listed in the XLS file are the 46 samples closest to a "tie" situation present in the 1M sample. I find it surprising that the largest exponent among the 46 deviating conversions is only +14. The "uniformly random" numbers were created using the xRandD function of the xNumbers add-in. On Wikipedia (http://en.wikipedia.org/wiki/IEEE_754_revision) I found that IEEE 754-2008 has a rounding mode called "round-to-nearest, ties away from zero" that would imho allow basing the rounding decision solely on the 54th bit. I still cannot imagine how to decide what is a "tie" using only a finite number of additional bits without violating the "round to nearest" rule with a much higher probability (all numbers that have a single one of infinitely many trailing bits set) than avoiding a bias in case of an exact tie (the single number that has none of infinitely many trailing bits set).
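For what it's worth, hardware and careful software implementations decide round-to-nearest, ties-to-even with finite state: one guard bit plus a "sticky" bit, where the sticky bit is simply the OR of all remaining trailing bits. A minimal sketch of that decision (the function name and interface are mine, not from any particular library):

```c
#include <stdint.h>

/* Round a truncated significand, round-to-nearest, ties-to-even.
   'kept'   = the bits being kept (truncated significand),
   'guard'  = the first dropped bit,
   'sticky' = the OR of all further dropped bits -- however many there
              are, they collapse into this single bit. */
static uint64_t round_nearest_even (uint64_t kept, int guard, int sticky)
{
    if (guard && (sticky || (kept & 1)))
        return kept + 1;  /* above the midpoint, or an exact tie at an odd value */
    return kept;          /* below the midpoint, or an exact tie at an even value */
}
```

A tie is exactly the case guard == 1 && sticky == 0; everything beyond the guard bit is compressed into the sticky bit, which is why infinitely many bits never need to be examined individually.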
Sorry for the double post; it was caused by a loss of internet connection on the move.

---

The fact that deviations occurred only for exponents smaller than +15 is not caused by the sample. The two numbers in the sample that have the largest number of zero mantissa bits between the 54th bit and the next trailing bit that is 1 are:

x3.84845072031021156163156105719E+139
IIOOIIIOIIOIOOIIIIOOIOIIOIOIOOOIIIOIOOOIIOIOIOOOOIOOIVIOOOOOOOOOOOOOOOOOOIOIIIIOO
18x"0"; would be rounded up to "even"

x3.93696951866149878343763053445E+161
IIIOOOOOOOOOOIIOOOOIIIIOOOOOIIIIOOIOIOIIIIIOOOIOOIOOOVIOOOOOOOOOOOOOOOOOOIIOIOOIO
18x"0"; would be rounded up to "odd"

The "V" has always been inserted after the 53rd mantissa bit in order to increase "readability".
You do not need infinite precision to decide ties. If you have

    x = strtod (str, ...)

and worry about whether x is the right value, then you can compute

    str2 = dtostr (x)   // exact operation

because any binary floating-point number, x, has a finite decimal string representation. (2^-n needs exactly n decimals.) You can then subtract str and str2 using the decimal arithmetic we all learned in school. Finally you compare the difference to half of the value of the lowest bit in x. (There are probably some tricky special cases near powers of 2 and zero.)

This just shows it can be done, not that anyone would do it this way! In practice, http://www.netlib.org/fp/dtoa.c seems to be what people are pointing to.
Thank you again, Morten. In the libquadmath path of the gcc 4.6.1 source tree I found a strtod_l.c file that was changed this February. This might explain why the gcc version 4.6.1 published in June gets things right. I have not yet looked into the diffs... and, to be honest, that code seems to be above my C horizon.

We agree on the fact that IEEE doubles correspond to a finite decimal fraction. I once did a linear regression of the number of significant decimal figures (NSDF) on the leading power of 2 (LPof2) of an IEEE DP (sdev of the regression coefficients in brackets):

If LPof2 >= 52 Then
  NSDF = 3.01035E-1(3.3E-5) · LPof2 + 7.98E-1(2.0E-2)
Else
  NSDF = -6.98974E-1(3.1E-5) · LPof2 + 5.2799E+1(1.7E-2)

For example, this yields NSDF = 767 for LPof2 = -1022 (leading exponent of the smallest normalised DP) and NSDF = 803 for LPof2 = -1074 (least significant bit of the smallest normalised DP). So n digits are more than sufficient to hold 2^(-n). I double-checked the NSDF values by calculating them in XL/xNumbers as well as in Python using the mpmath package.

The problem imho is the conversion from a decimal fraction into a binary one, because a finite decimal fraction may correspond to an infinite binary fraction. I do not know whether number theory can show that only binary periods of finite length might occur, so that a finite binary fraction contains all the information of the true infinite one.
Any rational number can be written as a finite or repeating fraction independent of the base. The proof that you always get a repeating fraction (possibly repeating 0s, i.e. finite) does not depend on the base used for representation.
Thank you, Andreas. I now understand that it is possible to decide whether a finite decimal number is a "tie" even in the binary system using only a finite number of bits. I noticed that somewhere in the source code the number extracted from a string was transformed into a rational number having a power of 10 as denominator. Of course, a hardware or software implementation specified by IEEE then need only operate on bits and is thus much faster than the "brute force" software implementation using extended decimal numbers which I applied to obtain the 80-bit mantissa.
The newest glibc release fixes the underlying issue, see http://sourceware.org/bugzilla/show_bug.cgi?id=3479