GNOME Bugzilla – Bug 128936
Text Export should permit selection of number, cell, etc formats
Last modified: 2005-04-05 01:35:06 UTC
1. Enter a valid date in a cell 2. Change the format to Date (mine is mm/dd/yyyy) 3. Save As CSV (Comma Separated Values) 4. Open file in a text editor 5. Voila, you get an "answer" instead of a Date Gnumeric should export Date format cells as strings, not mathematical equations. For "General" cells, perhaps the current behavior is acceptable... I don't know what Excel does. Thanks for all the hard work guys, -s
Created attachment 22278 [details] gnumeric format original version
Created attachment 22279 [details] CSV result of date problem
If your date is 12/12/2003 your `answer' is 37967 which is not the result of calculating that expression but simply the normal way to store that day (number of days since January 1, 1900).
I don't think this is a bug, since what are the options? We could export the files using the displayed (formatted) strings, but hten we run into the problem that we may loose information (eg. a date formatted as mm/yyyy would lose the day, numbers may loose precision). Perhaps this is really an enhancement request that users should be able to choose to export the formatted strings rather than the raw numbers.
Well, the main problems with the current behavior are: 1. Invalid dates, such as "08/00/2004" are, obviously, left as strings. Yes, I have bad data like that I have to deal with. It *SUCKS*. 2. It breaks reimportation of CSV data. "08/04/2003" is more likely to be understood by a parser as a date without user intervention. That's my impression, anyway. Thanks for replying so promptly!
I just noticed that it does the conversion, even if you're using .txt export! Ouch! Even if that's the default for CSV, please allow the configurable Text Export to export dates as string. Thank you!
Note that OOo's behavior is to export dates as strings. Just tried it, so I have a temporary solution -- but I still like Gnumeric more than OOo. ;)
I imagine we could export as 10-Dec-2003 for any date format. I wonder what OO does, precisely. I don't see exporting as 12/10/2003 as an option. That is bound to be misinterpreted. =DATE(2003,12,10) might work too. For non-integers (or large values) for which such a date format would not produce exact results, we might need to use the current number format.
I just meant that Gnumeric should export the date "as entered" unless told to do otherwise. Perhaps in the text export screen, a dropdown with "date export format" would be helpful.
"As entered" can cause lots of problems: Imagine I am running gnumeric in locale en_CA and enter 22/1/1959. I'll save the file and later open it in locale C to enter more data, let's say 1/22/1959. When I then export, sould the exported file really contain both 22/1/1959 and 1/22/1959? (Of course practically that doesn't work anyways since we don't remember what was typed in since the parsing depends on things like locale.) Personally I would like the possibility of selecting the output format as we do when we import.
Andreas : agreed. Selecting export formats would be a nice enhancement. In the interim I don't see alot of choices other than what we're doing now.
*** Bug 129398 has been marked as a duplicate of this bug. ***
i just posted (accidentally) a duplicate (oops, bugzilla doesn't like plurals 'date' != 'dates') and I have one thing to add. Excel exports dates as 'mm/dd/yyyy'. I think Gnumeric should do this too. Pehaps the locale could be consulted instead. But the current behaviour is USELESS. Oracle, for example, doesn't have a date conversion for this 'days since 1899' crap AFAIK, which makes using the data very difficult for real uses.
I have added a checkbutton tot eh export dialog that when selected causes the export to use the rendered strings as shown on screen. This gives the user the option to export in the format of their choice. It would still be nice to be able to select the formats directly in the export druid so I am keeping this open as an enhancement.
*** Bug 133991 has been marked as a duplicate of this bug. ***
*** Bug 152203 has been marked as a duplicate of this bug. ***
I reported the latest dup of this bug. I think if any format were to be chosen as a "default" for text export, it should be the ISO standard of yyyy/mm/dd. The reason for this is that it is impossible to confuse the month and day in this format, because *nobody* uses yyyy/dd/mm. We can thank America [ :o) ] for the idiosyncratic mm/dd/yyyy (which doesn't make any lexicographic sense at all -- only dd/mm/yyyy and yyyy/mm/dd really are logical). Plus the ISO standard keeps the month and day in the same order as the US ordering of mm/dd, meaning that America can continue to be blissfully unaware of why mm/dd/yyyy is illogical, as long as they can get used to seeing the year written first! The ISO date system also specifies that year will always use 4 digits, preventing any Y3k problems. The final, and most important reason why ISO dates make sense is that they sort into chronological order using a standard lexicographic (dictionary) sort ordering. That means that if your first field is a date, you can use emacs or commandline 'sort' or some other tool to sort the lines of your CSV file, and they will actually end up in the correct chronological order. No other date system works like that. However, having said that, Andreas mentioned in my dup that you can just check "preserve formats" in Text Export, and dates are written correctly. I think that that may be a useful option to be able to *disable*, but I don't think it makes sense to have this disabled by default -- especially for CSV, which has a set of sensible and fairly-standard defaults already so that you don't have to set any options in the Text Export dialog. It would be nice if Save As CSV, and perhaps also Text Export, preserved formats by default -- just because text-only formats are typically used to store values and not formulae, so they are more presentation-oriented.
There is a reason for defaulting to not preserving formats: Preserving the formats used on the sheet may cause you to loose information. For example if the display format is mm/dd then you will loose the year info.
Not to start a flame war, but the American style of numeric dates evolved directly from the textual "December Tenth, Nineteen Forty-one" -> "Dec 10, 1941" -> "12/10/1941". So it is perfectly logical in a linguistic context. Furthermore, many databases, such as Informix, default to MM/DD/YYYY. This is why I need data to work in that format. No, I can't go around changing 20 years of proprietary code to "fix" the "problem", much as I'd like to. (Actually, I'd like to dump the system altogether, but that's even more off-topic...)
There is a reason why the rest of the world says "the 10th [day] of December, 1941". Anyway, a default format of mm/dd/yyyy is plain and simple not an option. That means immediate data loss for the dd/mm/yyyy people. And vice versa. If you need either, specify it.
Right, and the problem is that as of today, you can't specify that. :)
Hmm, question -- what does Informix do if it gets data in yyyy/mm/dd format? I just tested Gnumeric, and it handles it fine (i.e. automatically parses it correctly as a date). I would guess that Excel does too. If Informix has had a release in the last 5 years that increased standardization of things like dates, it probably handles it too. As I pointed out, the ISO standard is always unambiguous, and it is very easy to detect that you have data in ISO standard date format.
Our ancient version of Informix has to be told via environment variable DBDATE what date format is being used. This gets incredibly hairy when doing Informix -> CSV -> Spreadsheet -> CSV -> Informix when you have bad data as well as conversion issues. This is why I would prefer a nice text export for mm/dd/yyyy instead of days since 1900.
Stephen, What do you mean with "as of today, you can't specify that.". Format the date on the sheet in the format of your choice, then export using the configurable export and check `preserve formats', and you get what you want.
I just confirmed that this works. Ticket can be closed. Thanks everyone. :)
Nope, we should keep it open: ideally you should be able to select the used cell format in the export dialog and the formats should be classified as lossy or not, so if a column uses a non-lossy format it can become the default.
Right now we have two options: 1. Raw numbers; dates as 34567. <--- default 2. Formatted. I am tempted to change this to: 1. Raw numbers. 2. Automatic. <--- default 3. Formatted. "Auto" would re-format date/time-formattted cells in "yyyy/mm/dd" or "yyyy/mm/dd hh:mm:ss.s" format as appropritate.
So "Automatic" would use the presentation format if it is non-lossy, and a default (standards-compliant) format if it is lossy? Looks like a good solution to me, although it only really makes sense if "Automatic" is the default. That way, both people that explicitly want data in the presentation format and those that want the raw numbers have to explicitly choose a non-default option, while those that just want their data out in a sensible format that they can parse (I would guess, most people in most situations) can get it by default. By the way Excel doesn't have a way of specifying formatting options when saving to a CSV, and it has caused a great deal of havoc where I work (dates seemingly randomly get converted to integers, but it doesn't even seem to happen consistently; regardless, every time it does happen, it renders the date useless for our purposes).
Fixed in cvs.