GNOME Bugzilla – Bug 615777
Gnumeric mangles dates from Excel .xlsx files
Last modified: 2010-04-15 04:42:38 UTC
Created attachment 158750 [details] Spreadsheet showing the problem I've been given an excel file in .xlsx format, and some dates are off by about four years. See the attached file: on worksheet 1 (Soil sample), cells D3:D28; the dates should be in 2007, but instead appear as 2003. Oddly enough, the dates on worksheet 2 (DW-Extraction-Quantification), cells G2:I29 have the correct year. Opening the file with Openoffice 3.1.1 gives the correct dates. This is on a Fedora 11 system, which is kept up-to-date with yum nightly. The rpm is gnumeric-1.8.4-5.fc11.i586.rpm.
Are you sure that Openoffice gives the correct dates? With OOo 3.1.1 I see: on "Soil Samples" D3: 22/10/2007 on "...Extraction..." G2: 11-Aug-11 while current Gnumeric yields: on "Soil Samples" D3: 10-21-03 on "...Extraction..." G2: 21-Aug-07 In both sstems the serial numbers of those cells are 37915 and 39315 respectively. So the difference is the null date (ie. the date with serial number 0.) Considering that the file really stores the serial numbers and I suspect that these dates are supposed to be in th e2007, Excel appears to use different null dates on different sheets (perhaps these sheets originated on different machines.) OOo uses a null date for these files of 1994/1/1 while gnumeric uses a null date of 1899/12/31 with a skip of 1 after 1900/2/28. Note that by default OOo uses a null date of 1899/12/30 but adjusts it for these files. Gnumeric's adjustment is not triggered.
Note that rather than 11-Aug-11 it should have been 22-Aug-11. If I open this file in Excel 2007 I get the same dates as OOo. Are those 2011 dates correct?
To cite from the "Open Office XML" standard: " 3.16.9.1 Date Representation (...) There are two different bases for serial values: In the 1900 date base system, the lower limit is January 1, 1900, which has serial value 1. The upper-limit is December 31, 9999, which has serial value 2,958,465. In the 1904 date base system, the lower limit is January 1, 1904, which has serial value 0. The upper-limit is December 31, 9999, which has serial value 2,957,003. As to which date base system an implementation uses by default or whether it allows its users to switch between date base systems, is unspecified. " So unless there is some specification in the file of which date system is used (which I haven't found yet), this should be considered an "Open Office XML" bug.
Okay: Part 4/3.2.28 of "Open Office XML" gives a workbook property: date1904 (Date 1904) which defaults to FALSE. If it is set to TRUE we should interpret serial numbers relative to 1904/1/1. We apparently are ignoring this setting for this document.
This problem has been fixed in the development version. The fix will be available in the next major software release. Thank you for your bug report.