GNOME Bugzilla – Bug 161403
CSV import is broken with french locale
Last modified: 2008-09-19 04:39:50 UTC
My locale is fr_FR@euro. If I open a CSV text file, the commas are not used as field separators.
Please show the output of running "locale". And please supply a sample file and let us know *how* you import it. Thanks.
laurent@stan:~/tmp$ locale
LANG=fr_FR@euro
LC_CTYPE="fr_FR@euro"
LC_NUMERIC="fr_FR@euro"
LC_TIME="fr_FR@euro"
LC_COLLATE="fr_FR@euro"
LC_MONETARY="fr_FR@euro"
LC_MESSAGES="fr_FR@euro"
LC_PAPER="fr_FR@euro"
LC_NAME="fr_FR@euro"
LC_ADDRESS="fr_FR@euro"
LC_TELEPHONE="fr_FR@euro"
LC_MEASUREMENT="fr_FR@euro"
LC_IDENTIFICATION="fr_FR@euro"
LC_ALL=fr_FR@euro
To import, I used "File/Open" from the menu. A sample file will follow.
Created attachment 34883: Sample CSV file
The file is UTF-8 encoded, but that does not seem to be the problem.
This works for me in the C locale. (I don't have fr_FR installed.) By default it seems to guess the wrong encoding, but if you select the configurable text importer as "file type", you can override that. Someone with more locales available is going to have to debug this.
Confirmed with version 1.2.13. I don't have a 1.4 available to test with. All of the data ends up in column 1, with the commas left inside the data fields. I tested this from scratch by creating a spreadsheet, entering some data, and saving it both in Gnumeric's XML format and as .csv. The file opens fine in the C locale (it loses cell formatting information, but that is normal).
I can replicate this with locale de_DE. I think we discussed this issue before (unfortunately I couldn't find the bug report). It seems that gnumeric may be looking for ';' as the field separator and then falls back to ' '.
I guess the comma is not used because it's supposed to be used as the decimal separator. It would be nice to be able to configure the field and decimal separators when you import text files.
The problem is here: format_get_arg_sep () in format.c:

    char
    format_get_arg_sep (void)
    {
            if (format_get_decimal ()->str[0] == ',')
                    return ';';
            return ',';
    }

So since the decimal separator in French is ",", the CSV separator is expected to be ";". But when writing the CSV file, even in French, "," is used as the separator. In stf_write_csv () in stf.c, we have

    437 stf_export_options_set_terminator_type (config, TERMINATOR_TYPE_LINEFEED);
    438 stf_export_options_set_cell_separator (config, ',');
    439 stf_export_options_set_quoting_mode (config, QUOTING_MODE_AUTO);
    440 stf_export_options_set_quoting_char (config, '"');

In short, a bunch of quoting, separating and line terminating characters are hard-coded once and for all. It's hard to see how to fix this properly though. One step forward would be to consistently write the same separator character within the same language. You get that by changing line 438 above to

    stf_export_options_set_cell_separator (config, format_get_arg_sep ());

but doing it in a way that allows proper interchangeability of the files (which I guess isn't really an issue) requires specifying the separator to use in a file header or something (I think this is what MS Excel CSV files do).
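To make the mismatch concrete, here is a minimal sketch, not gnumeric code. The helper csv_sep_for_decimal is a hypothetical stand-in for format_get_arg_sep () that takes the decimal separator as a parameter instead of reading it from the locale, and count_fields is a hypothetical, quote-unaware field counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for format_get_arg_sep (): same rule, but the
 * decimal separator is passed in rather than taken from the locale. */
static char
csv_sep_for_decimal (char decimal)
{
	return decimal == ',' ? ';' : ',';
}

/* Count the fields of one line for a given separator.
 * Simplification: quoted fields are not handled. */
static size_t
count_fields (const char *line, char sep)
{
	size_t n = 1;

	assert (line != NULL);
	for (; *line != '\0'; line++)
		if (*line == sep)
			n++;
	return n;
}
```

With a ',' decimal separator the importer looks for ';', so a line written with a hard-coded ',' comes back as a single field, while the same line written with csv_sep_for_decimal (',') splits as expected.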
I guess I didn't say the "format_get_arg_sep" function is called to find out what separator to use for fields when loading a CSV file.
Created attachment 35085: Trivial patch to use ; when exporting to a locale that uses , as a numeric separator
As a workaround, the original author can use the text import rather than the autodetect feature, and choose , as the field separator. Dave.
I am not sure that patching to use ";" is the right way to do this. CSV seems to mean "comma separated", and in fact the translation in de_DE specifies "Komma" rather than "Semikolon". I think we should probably always use "," for CSV. (It just means that decimal numbers will always be surrounded by "".)
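For illustration, always writing "," stays unambiguous only if any field containing the separator is quoted. The following is a minimal sketch of that quoting rule under the usual CSV convention of doubling embedded quotes. It is not the gnumeric exporter, and csv_needs_quotes and csv_write_field are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the field must be quoted for the given separator. */
static int
csv_needs_quotes (const char *field, char sep)
{
	assert (field != NULL);
	for (; *field != '\0'; field++)
		if (*field == sep || *field == '"' ||
		    *field == '\n' || *field == '\r')
			return 1;
	return 0;
}

/* Writes field to out, quoting it if needed and doubling embedded
 * quotes. Returns the number of characters written (excluding the
 * terminating NUL), or -1 if out cannot hold the worst case. */
static int
csv_write_field (const char *field, char sep, char *out, size_t out_size)
{
	size_t n = 0;
	int quote = csv_needs_quotes (field, sep);

	/* Worst case: every character doubled, two quotes, one NUL. */
	if (out_size < 2 * strlen (field) + 3)
		return -1;
	if (quote)
		out[n++] = '"';
	for (; *field != '\0'; field++) {
		if (*field == '"')
			out[n++] = '"';
		out[n++] = *field;
	}
	if (quote)
		out[n++] = '"';
	out[n] = '\0';
	return (int) n;
}
```

With "," as separator, the French decimal 1,1 is written as "1,1" including the quotes, while a plain field such as abc is written unchanged.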
In fact, using ";" as a column separator is a good thing because:
a. gnumeric could read back a CSV table it previously saved. Currently, creating a new table, saving it as CSV and then reloading it as CSV doesn't work if LC_NUMERIC="fr_FR".
b. it will ease data exchange with other programs, like R (http://www.r-project.org/), which handle the two most common CSV variants:
   "." as decimal separator + "," as column separator
   "," as decimal separator + ";" as column separator
As long as we assume the same field separator, (a) will work. For data interchange with other programs one should probably use the configurable text importer anyway. The CSV import just guesses at the right settings.
Fixed typo in subject
This bug has been open much too long! We should import and export with the same character (assuming the same locale). Since the translation to German for CSV is "komma-separated...", I am going to change this to always use a comma unless somebody protests.
this CSV comma-versus-period-versus-semicolon issue still remains in gnumeric 1.6.3. i run in this locale:
$ locale
LANG=fr_CA.UTF-8
LANGUAGE=fr_CA.UTF-8
LC_CTYPE="fr_CA.UTF-8"
LC_NUMERIC="fr_CA.UTF-8"
LC_TIME="fr_CA.UTF-8"
LC_COLLATE="fr_CA.UTF-8"
LC_MONETARY="fr_CA.UTF-8"
LC_MESSAGES="fr_CA.UTF-8"
LC_PAPER="fr_CA.UTF-8"
LC_NAME="fr_CA.UTF-8"
LC_ADDRESS="fr_CA.UTF-8"
LC_TELEPHONE="fr_CA.UTF-8"
LC_MEASUREMENT="fr_CA.UTF-8"
LC_IDENTIFICATION="fr_CA.UTF-8"
LC_ALL=
saving a CSV file from the fr_CA locale uses commas as the separator and thus i cannot open the CSV file i just created. further, if i open a CSV file from someone who was running in english (en_CA for example) gnumeric doesn't parse the data properly. perhaps this is related to bug #359269.
Is this problem still reproducible for someone with gnumeric 1.8.x or newer? Testing with the sample file (comment #3) using gnumeric 1.8.3 on Debian sid through env LC_ALL=fr_FR.UTF-8 gnumeric /tmp/metiers.csv (or fr_FR@euro or de_DE), the commas are being used as field separators.
No idea - haven't tried it in ages. Dave.
i gave this a try with the sample file from comment #3. under a french locale (fr_CA in my case), gnumeric 1.8.2 (1.8.2-1ubuntu1 under ubuntu 8.04) opens the file correctly and saves the file correctly. the resulting file, the one just saved, will open correctly too. if i change the ++ data in the sample file with gnumeric to a number such as 1,1 (that's one and one-tenth since french uses commas for the decimal marker) and then save the file as a csv file, the number is stored as "1,1". this is fine since gnumeric opens the file correctly and interprets the "1,1" as the number 1,1. the only remaining tricky issue is how to open that file with "1,1" when running an english locale and have the "1,1" interpreted as "1.1". jason.
If you need to switch locales you can (must) use the configurable text importer. There you can select the source locale and everything should be converted correctly. There is no way to do this automatically since even in a locale using period as decimal separator the string "1,1" might be used (and clearly "1,234" could mean various things depending on locale and convention).
>> the only remaining tricky issue is how to open that file with "1,1" when running an english locale and have the "1,1" interpreted as "1.1". << this is trivial with the configurable text importer. Just select a locale with , as decimal point.
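The ambiguity can be sketched in a few lines of C. This is an illustration only, not the gnumeric importer: parse_number is a hypothetical helper that takes the decimal and grouping characters explicitly, which is roughly what choosing a locale in the configurable text importer supplies.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Parse s using explicit decimal and grouping characters (pass '\0'
 * for "no grouping"). Returns 1 and stores the value on success,
 * 0 if s is not a number under that convention.
 * Simplification: the position of grouping characters is not checked. */
static int
parse_number (const char *s, char decimal, char grouping, double *result)
{
	double value = 0.0, scale = 1.0;
	int seen_decimal = 0, seen_digit = 0, negative = 0;

	assert (s != NULL && result != NULL);
	if (*s == '-' || *s == '+')
		negative = (*s++ == '-');
	for (; *s != '\0'; s++) {
		if (isdigit ((unsigned char) *s)) {
			seen_digit = 1;
			if (seen_decimal) {
				scale /= 10.0;
				value += (*s - '0') * scale;
			} else
				value = value * 10.0 + (*s - '0');
		} else if (*s == decimal && !seen_decimal)
			seen_decimal = 1;
		else if (grouping != '\0' && *s == grouping &&
			 !seen_decimal && seen_digit)
			continue;	/* skip a thousands separator */
		else
			return 0;
	}
	if (!seen_digit)
		return 0;
	*result = negative ? -value : value;
	return 1;
}
```

The same text "1,234" parses as roughly 1.234 with ',' as the decimal character and as 1234 with ',' as the grouping character, which is why the importer has to be told the convention rather than guess it.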