GNOME Bugzilla – Bug 663712
strtod sometimes off by 1 ULP
Last modified: 2012-12-26 17:59:39 UTC
I used the spreadsheet function VALUE of gnumeric 1.10.17 on Ubuntu 10.04 x64 in order to convert numbers given as strings to doubles. When comparing the results to those of self-written routines using extended precision, in 46 cases out of a sample of 1M data gnumeric yielded doubles that were systematically too small by just 1 ULP. I then used the strtod function in C to convert all the numbers directly and found no deviation at all from the results obtained by the self-written conversion routines. These 46 strings are contained in the first row of the attached file. The doubles are represented in hex notation in little-endian order (obtained using a C union in order to access the 8 bytes of each double directly; the essential part of the C code is given in the file). Some other, hopefully helpful, information is given as well. Regards Schorsch
Created attachment 201086: XLS file providing the problematic text strings
This would happen in libgoffice:

Breakpoint 1, go_strtod (s=0x873a190 "5.24695020914077743636723377400E-237",
    end=0xbfffdc28) at go-math.c:362

I find it somewhat strange, though, that you seem to be interested in excessive precision but choose to use the single-precision version of Gnumeric provided by Ubuntu rather than the long-double version of Gnumeric.
Note that go_strtod just uses C's strtod function (after providing some setup services):

Breakpoint 1, go_strtod (s=0x87e6d58 "5.24695020914077743636723377400E-237",
    end=0xbfffdc28) at go-math.c:362
362             int maxlen = strtod_helper (s);
(gdb) n
367             if (maxlen == INT_MAX) {
(gdb)
368                     errno = 0;
(gdb)
369                     return strtod (s, end);
(gdb) p s
$1 = 0x87e6d58 "5.24695020914077743636723377400E-237"
(gdb)

I think we need some more details of how you obtained your results.
How did you retrieve the number that you think Gnumeric parsed from the string? Did you use the value as displayed? So perhaps the difference only lies in the rounding of the displayed number.
Gnumeric git now has a HEXREP function useful for debugging things like this.
I can confirm the numbers in the C column, i.e., the result that Gnumeric produces. However, I see the same numbers when I go via C:

welinder@sequoia:~> cat foo.c
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char **argv)
{
  union { double d; unsigned char b[8]; } u;
  int i;

  u.d = strtod ("5.24695020914077743636723377400E-237", NULL);
  for (i = 7; i >= 0; i--) {
    printf ("%02x ", u.b[i]);
  }
  printf ("\n");
  return 0;
}
welinder@sequoia:~> gcc -Wall foo.c
welinder@sequoia:~> ./a.out
0e e1 15 57 ab cf e3 9a
This appears to be a known libc problem: http://sources.redhat.com/bugzilla/show_bug.cgi?id=3479
Note that you should not rely on gcc getting this right when you embed the constant directly: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21718
glibc isn't likely to be fixed within a decade, so we might have to rely on something like this: http://www.netlib.org/fp/dtoa.c
Thank you very much, Morten and Andreas! I think the values in column D of the XLS file clearly show that the very numbers which are incorrectly parsed are only slightly bigger than the midpoint between two doubles. It seems to me that it is impossible to state that a number is "even", i.e. located exactly in the middle between two doubles, without checking infinitely many bits... I'd rather round up if the 54th bit is 1, but I'm not IEEE. I used gcc_x32 4.6.1 on Win7 x64 without any compiler switches, and that version got it right. All the extended-precision and hex stuff in Gnumeric was done using my Python extension functions once announced on the Gnumeric mailing list. In Excel, I used the FOSS add-in xNumbers v6.
This raises the question of which version of gcc was used that did not get it right. Perhaps the issue is already fixed in gcc 4.6 and newer.
Neither gcc [compile-time constants; less important] nor libc [runtime via strtod] is fixed as far as I can tell. Nor do they have a hint of a proposed fix.

> I used gcc_x32 4.6.1 on Win7x64 without any compiler switches and that
> version got it right.

Was that a more comprehensive search or just for the values in the file?
Not a single deviation within my 1M sample. This sample uses a "uniformly random" decimal mantissa between 1 and 10 and a "uniformly random" exponent between -308 and 308. In case the "random" exponent was found to be +/-308, the range of mantissa values was adjusted in order to avoid normalised DP over-/underflow conditions. I will check whether the 46 cases listed in the XLS file are the 46 samples closest to a "tie" situation present in the 1M sample. I find it surprising that the largest exponent among the 46 deviating conversions is only +14. The "uniformly random" numbers were created using the xRandD function of the xNumbers add-in. On Wikipedia (http://en.wikipedia.org/wiki/IEEE_754_revision) I found that IEEE 754-2008 has a rounding mode called "round-to-nearest, ties away from zero" that would imho allow basing the rounding decision solely on the 54th bit. I still cannot imagine how to decide what is a "tie" using only a finite number of additional bits without violating the "round to nearest" rule with a much higher probability (all numbers that have a single one of infinitely many trailing bits set) than avoiding a bias in case of an exact tie (the single number that has none of infinitely many trailing bits set).
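For what it's worth, hardware and careful software implementations decide round-to-nearest, ties-to-even with finite state: one guard bit plus a "sticky" bit, where the sticky bit is simply the OR of all remaining trailing bits. A minimal sketch of that decision (the function name and interface are mine, not from any particular library):

```c
#include <stdint.h>

/* Round a truncated significand, round-to-nearest, ties-to-even.
   'kept'   = the bits being kept (truncated significand),
   'guard'  = the first dropped bit,
   'sticky' = the OR of all further dropped bits -- however many there
              are, they collapse into this single bit. */
static uint64_t round_nearest_even (uint64_t kept, int guard, int sticky)
{
    if (guard && (sticky || (kept & 1)))
        return kept + 1;  /* above the midpoint, or an exact tie at an odd value */
    return kept;          /* below the midpoint, or an exact tie at an even value */
}
```

A tie is exactly the case guard == 1 && sticky == 0; everything beyond the guard bit is compressed into the sticky bit, which is why infinitely many bits never need to be examined individually.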
Sorry for the double post; it was caused by a loss of internet connection on the move.

---

The fact that deviations occurred only for exponents smaller than +15 is not caused by the sample. The two numbers in the sample that have the largest number of zero mantissa bits between the 54th bit and the next trailing bit that is 1 are:

x3.84845072031021156163156105719E+139
IIOOIIIOIIOIOOIIIIOOIOIIOIOIOOOIIIOIOOOIIOIOIOOOOIOOIVIOOOOOOOOOOOOOOOOOOIOIIIIOO
18x"0"; would be rounded up to "even"

x3.93696951866149878343763053445E+161
IIIOOOOOOOOOOIIOOOOIIIIOOOOOIIIIOOIOIOIIIIIOOOIOOIOOOVIOOOOOOOOOOOOOOOOOOIIOIOOIO
18x"0"; would be rounded up to "odd"

The "V" has always been inserted after the 53rd mantissa bit in order to increase "readability".
You do not need infinite precision to decide ties. If you have

    x = strtod (str, ...)

and worry about whether x is the right value, then you can compute

    str2 = dtostr (x)   // exact operation

because any binary floating-point number, x, has a finite decimal string representation. (2^-n needs exactly n decimals.) You can then subtract str and str2 using the decimal arithmetic we all learned in school. Finally you compare the difference to half of the value of the lowest bit in x. (There are probably some tricky special cases near powers of 2 and zero.)

This just shows it can be done, not that anyone would do it this way! In practice, http://www.netlib.org/fp/dtoa.c seems to be what people are pointing to.
Thank you again, Morten. In the libquadmath path of the gcc 4.6.1 source tree I found a strtod_l.c file that was changed this February. This might explain why the gcc version 4.6.1 published in June gets things right. I have not yet looked into the diffs... and, to be honest, that code seems to be above my C horizon.

We agree on the fact that IEEE doubles correspond to a finite decimal fraction. I once did a linear regression of the number of significant decimal figures (NSDF) on the leading power of 2 (LPof2) of an IEEE DP (sdev of the regression coefficients in brackets):

If LPof2 >= 52 Then
  NSDF = 3.01035E-1(3.3E-5) · LPof2 + 7.98E-1(2.0E-2)
Else
  NSDF = -6.98974E-1(3.1E-5) · LPof2 + 5.2799E+1(1.7E-2)

For example, this yields NSDF = 767 for LPof2 = -1022 (leading exponent of the smallest normalised DP) and NSDF = 803 for LPof2 = -1074 (least significant bit of the smallest normalised DP). So n digits are more than sufficient to hold 2^(-n). I double-checked the NSDF values by calculating them in XL/xNumbers as well as in Python using the mpmath package.

The problem imho is the conversion from a decimal fraction into a binary one, because a finite decimal fraction may correspond to an infinite binary fraction. I do not know whether number theory can show that only binary periods of finite length might occur, so that a finite binary fraction contains all the information of the true infinite one.
Any rational number can be written as a finite or repeating fraction independent of the base. The proof that you always get a repeating fraction (possibly repeating 0s, i.e. finite) does not depend on the base used for representation.
Thank you, Andreas. I now understand that it is possible to decide whether a finite decimal number is a "tie" even in the binary system using only a finite number of bits. I noticed that somewhere in the source code the number extracted from a string was transformed into a rational number having a power of 10 as denominator. Of course, a hardware or software implementation specified by IEEE then need only operate on bits and is thus much faster than the "brute force" software implementation using extended decimal numbers which I applied to obtain the 80-bit mantissa.
The newest glibc release fixes the underlying issue, see http://sourceware.org/bugzilla/show_bug.cgi?id=3479