Bug 161403 - CSV import is broken with french locale
Status: RESOLVED FIXED
Product: Gnumeric
Classification: Applications
Component: import/export Text
Version: 1.4.x
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: Jody Goldberg
Jody Goldberg
Depends on:
Blocks:
 
 
Reported: 2004-12-15 19:33 UTC by Laurent Martelli
Modified: 2008-09-19 04:39 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Sample CSV file (1.00 KB, text/plain)
2004-12-16 02:23 UTC, Laurent Martelli
Trivial patch to use ; when exporting to a locale that uses , as a numeric separator (742 bytes, patch)
2004-12-21 15:34 UTC, Dave Neary

Description Laurent Martelli 2004-12-15 19:33:58 UTC
My locale is fr_FR@euro. If I open a CSV text file, the commas are not used as
field separators.
Comment 1 Morten Welinder 2004-12-15 22:09:08 UTC
Please show the output of running "locale".
And please supply a sample file and let us know *how* you import it.

Thanks.
Comment 2 Laurent Martelli 2004-12-16 02:16:43 UTC
laurent@stan:~/tmp$ locale
LANG=fr_FR@euro
LC_CTYPE="fr_FR@euro"
LC_NUMERIC="fr_FR@euro"
LC_TIME="fr_FR@euro"
LC_COLLATE="fr_FR@euro"
LC_MONETARY="fr_FR@euro"
LC_MESSAGES="fr_FR@euro"
LC_PAPER="fr_FR@euro"
LC_NAME="fr_FR@euro"
LC_ADDRESS="fr_FR@euro"
LC_TELEPHONE="fr_FR@euro"
LC_MEASUREMENT="fr_FR@euro"
LC_IDENTIFICATION="fr_FR@euro"
LC_ALL=fr_FR@euro

To import, I used "File/Open" from the menu. A sample file will follow.
Comment 3 Laurent Martelli 2004-12-16 02:23:32 UTC
Created attachment 34883
Sample CSV file

The file is UTF-8 encoded, but it does not seem to be the problem.
Comment 4 Morten Welinder 2004-12-16 14:34:07 UTC
This works for me in C locale.  (I don't have fr_FR installed.)
By default it seems to guess the wrong encoding, but if you select the
configurable text importer as "file type", you can override that.

Someone with more locales available is going to have to debug this.
Comment 5 Dave Neary 2004-12-21 10:04:07 UTC
Confirmed with version 1.2.13. 

I don't have a 1.4 available to test with that.

All of the data ends up in column 1, with commas in the data fields. I have
tested this from scratch by creating a spreadsheet, entering some data, and
saving both in Gnumeric's XML format and in .csv. The file opens fine with the
C locale (it loses cell formatting information, but that is normal).

Comment 6 Andreas J. Guelzow 2004-12-21 14:11:05 UTC
I can replicate this with locale de_DE. 

I think we discussed this issue before (unfortunately I couldn't find the bug
report). It seems that gnumeric may be looking for ';' as field separator and
then falls back to ' '.
Comment 7 Laurent Martelli 2004-12-21 14:18:47 UTC
I guess the comma is not used because it's supposed to be used as the decimal separator.

It would be nice to be able to configure field and decimal separators when you
import text files.
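
A minimal sketch, in C, of how a program can ask the C library which decimal separator a locale uses. The helper name `decimal_char_for_locale` is hypothetical and not part of Gnumeric; it relies only on the standard `setlocale` and `localeconv` calls, and it assumes the requested locale is installed on the system.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a Gnumeric function): return the first character
 * of the decimal separator for the named LC_NUMERIC locale, or '\0' if the
 * locale cannot be selected on this system. */
char decimal_char_for_locale(const char *locale_name)
{
    char saved[256];
    const char *current;
    char decimal;

    assert(locale_name != NULL);

    /* Remember the current LC_NUMERIC setting so it can be restored. */
    current = setlocale(LC_NUMERIC, NULL);
    if (current == NULL || strlen(current) >= sizeof saved)
        return '\0';
    strcpy(saved, current);

    /* Switch to the requested locale; NULL means it is not available. */
    if (setlocale(LC_NUMERIC, locale_name) == NULL)
        return '\0';

    decimal = localeconv()->decimal_point[0];

    /* Put the previous setting back before returning. */
    setlocale(LC_NUMERIC, saved);
    return decimal;
}
```

Called with "C" this reports a period. With a French or German locale, where one is installed, it should report a comma, which is the situation described in this report.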
Comment 8 Dave Neary 2004-12-21 15:07:23 UTC
The problem is here:

format_get_arg_sep () in format.c:
char
format_get_arg_sep (void)
{
        if (format_get_decimal ()->str[0] == ',')
                return ';';
        return ',';
}


So, since the decimal separator in French is ",", the CSV separator should be ";".

But when writing the CSV file, even in French, "," is used as the separator.

In stf_write_csv () in stf.c, we have 
437             stf_export_options_set_terminator_type (config, TERMINATOR_TYPE_LINEFEED);
438             stf_export_options_set_cell_separator  (config, ',');
439             stf_export_options_set_quoting_mode    (config, QUOTING_MODE_AUTO);
440             stf_export_options_set_quoting_char    (config, '"');

In short, the quoting, separator and line-terminator characters are hard-coded
once and for all.

It's hard to see how to fix this properly though - one step forward would be to
consistently write the same separator character within the same language. You
get that by changing line 438 above to 

            stf_export_options_set_cell_separator  (config, format_get_arg_sep ());

but doing it in a way that allows proper interchangeability of the files (which
I guess isn't really an issue) requires specifying the separator to use in a
file header or something (I think this is what MS Excel CSV files do).
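
A small sketch of why the data ends up in a single column. It assumes a reader that simply splits on the separator chosen from the decimal character, as `format_get_arg_sep` does, while the writer always emits ",". The names `arg_sep_for_decimal` and `count_fields` are hypothetical stand-ins, not Gnumeric functions, and quoting is deliberately ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Same rule as the quoted format_get_arg_sep(), but with the locale's
 * decimal character passed in explicitly so it can be tried in isolation. */
char arg_sep_for_decimal(char decimal)
{
    return decimal == ',' ? ';' : ',';
}

/* Count the fields a naive reader would see when splitting `line` on `sep`.
 * Quoted fields are not handled; this is only an illustration. */
size_t count_fields(const char *line, char sep)
{
    size_t n = 1;
    const char *p;

    assert(line != NULL);
    for (p = line; *p != '\0'; p++)
        if (*p == sep)
            n++;
    return n;
}
```

With a comma as decimal character, `count_fields("a,b,c", arg_sep_for_decimal(','))` sees one field, because the reader is looking for ";". With a period as decimal character the same line yields three fields. Writing with the same separator the reader expects removes the mismatch within one locale.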


Comment 9 Dave Neary 2004-12-21 15:27:38 UTC
I guess I didn't say the "format_get_arg_sep" function is called to find out
what separator to use for fields when loading a CSV file.

Comment 10 Dave Neary 2004-12-21 15:34:33 UTC
Created attachment 35085
Trivial patch to use ; when exporting to a locale that uses , as a numeric separator


As a workaround, the original author can use the text import rather than the
autodetect feature, and choose , as the field separator.

Dave.
Comment 11 Andreas J. Guelzow 2004-12-27 16:54:25 UTC
I am not sure that patching to use ";" is the right way to do this. CSV seems to
mean "comma separated", and in fact the translation in de_DE specifies "Komma"
rather than "Semikolon". I think we should probably always use "," for CSV. (It
just means that decimal numbers will always be surrounded by "".)
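
A sketch of the quoting rule this implies: a field is wrapped in double quotes whenever it contains the separator, so a number such as 1,1 survives a comma-separated file. The function name `csv_quote_field` and its signature are hypothetical; this is not the Gnumeric exporter, only the common CSV convention of quoting fields and doubling embedded quotes.

```c
#include <assert.h>
#include <stddef.h>

/* Append one character, always leaving room for the terminating NUL. */
static int put_char(char *out, size_t out_size, size_t *n, char c)
{
    if (*n + 1 >= out_size)
        return -1;
    out[(*n)++] = c;
    return 0;
}

/* Hypothetical helper: copy `field` into `out`, quoting it if it contains
 * the separator, a double quote, or a line break. Embedded double quotes
 * are doubled. Returns 0 on success, -1 if `out` is too small. */
int csv_quote_field(const char *field, char sep, char *out, size_t out_size)
{
    const char *p;
    size_t n = 0;
    int needs_quotes = 0;

    assert(field != NULL && out != NULL);
    if (out_size == 0)
        return -1;

    for (p = field; *p != '\0'; p++) {
        if (*p == sep || *p == '"' || *p == '\n' || *p == '\r') {
            needs_quotes = 1;
            break;
        }
    }

    if (needs_quotes && put_char(out, out_size, &n, '"') != 0)
        return -1;
    for (p = field; *p != '\0'; p++) {
        if (needs_quotes && *p == '"' && put_char(out, out_size, &n, '"') != 0)
            return -1;
        if (put_char(out, out_size, &n, *p) != 0)
            return -1;
    }
    if (needs_quotes && put_char(out, out_size, &n, '"') != 0)
        return -1;

    out[n] = '\0';
    return 0;
}
```

With "," as separator the field 1,1 is written as "1,1" including the quotes; with ";" as separator it is written unquoted.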
Comment 12 Cyril Humbert 2005-04-13 09:53:30 UTC
In fact, using ";" as a column separator is a good thing because:

a. gnumeric could read a previously saved CSV table again.
Currently, creating a new table, saving it as CSV and then reloading it as
CSV doesn't work if LC_NUMERIC="fr_FR".

b. it will ease data exchange with other programs, like R
(http://www.r-project.org/), which handle the two
most common CSV variants:
"." as decimal separator + "," as column separator
"," as decimal separator + ";" as column separator.
Comment 13 Andreas J. Guelzow 2005-04-13 12:53:45 UTC
As long as we assume the same field separator, (a) will work.

For data interchange with other programs one should probably use the
configurable text importer anyway.

The CSV import just guesses at the right settings.
Comment 14 Jon Kåre Hellan 2005-04-14 07:14:35 UTC
Fixed typo in subject
Comment 15 Andreas J. Guelzow 2006-06-27 23:21:54 UTC
This bug has been open much too long! We should import and export with the same character (assuming the same locale). Since the German translation of CSV is "komma-separated...", I am going to change this to always use a comma unless somebody protests.
Comment 16 jjvenkit 2006-10-17 20:43:53 UTC
this CSV comma-versus-period-versus-semicolon issue still remains in gnumeric 1.6.3.

i run in this locale:
$ locale
LANG=fr_CA.UTF-8
LANGUAGE=fr_CA.UTF-8
LC_CTYPE="fr_CA.UTF-8"
LC_NUMERIC="fr_CA.UTF-8"
LC_TIME="fr_CA.UTF-8"
LC_COLLATE="fr_CA.UTF-8"
LC_MONETARY="fr_CA.UTF-8"
LC_MESSAGES="fr_CA.UTF-8"
LC_PAPER="fr_CA.UTF-8"
LC_NAME="fr_CA.UTF-8"
LC_ADDRESS="fr_CA.UTF-8"
LC_TELEPHONE="fr_CA.UTF-8"
LC_MEASUREMENT="fr_CA.UTF-8"
LC_IDENTIFICATION="fr_CA.UTF-8"
LC_ALL=

saving a CSV file from the fr_CA locale uses commas as the separator and thus i cannot open the CSV file i just created.  further, if i open a CSV file from someone who was running in english (en_CA for example) gnumeric doesn't parse the data properly.

perhaps this is related to bug #359269.
Comment 17 J.H.M. Dassen (Ray) 2008-08-29 17:48:48 UTC
Is this problem still reproducible for someone with gnumeric 1.8.x or newer?

Testing with the sample file (comment #3) using gnumeric 1.8.3 on Debian sid
through
	env LC_ALL=fr_FR.UTF-8 gnumeric /tmp/metiers.csv
(or fr_FR@euro or de_DE), the commas are being used as field separators.
Comment 18 Dave Neary 2008-09-01 18:33:51 UTC
No idea - haven't tried it in ages.

Dave.
Comment 19 jjvenkit 2008-09-02 18:24:15 UTC
i gave this a try with the sample file from comment #3.  under a french locale (fr_CA in my case), gnumeric 1.8.2 (1.8.2-1ubuntu1 under ubuntu 8.04) opens the file correctly and saves the file correctly.  the resulting file, the one just saved, will open correctly too.

if i change the ++ data in the sample file with gnumeric to a number such as 1,1 (that's one and one-tenth since french uses commas for the decimal marker) and then save the file as a csv file, the number is stored as "1,1".  this is fine since gnumeric opens the file correctly and interprets the "1,1" as the number 1,1.

the only remaining tricky issue is how to open that file with "1,1" when running an english locale and have the "1,1" interpreted as "1.1".

jason.
Comment 20 Andreas J. Guelzow 2008-09-02 19:36:08 UTC
If you need to switch locales you can (must) use the configurable text importer. There you can select the source locale and everything should be converted correctly.

There is no way to do this automatically since even in a locale using period as decimal separator the string "1,1" might be used (and clearly "1,234"  could mean various things depending on locale and convention).
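
To illustrate the ambiguity, here is a sketch of a number parser that takes the decimal and grouping characters as explicit arguments instead of guessing them. The function `parse_number` is hypothetical and far simpler than a real importer (no exponents, no check of digit group sizes); it only shows that the same text gives different values under different conventions.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: parse `text` using `decimal` as the decimal character
 * and `grouping` as the thousands separator ('\0' for none).
 * Returns 0 and stores the result in *value, or -1 if the text is not a
 * number under that convention. */
int parse_number(const char *text, char decimal, char grouping, double *value)
{
    const char *p;
    double sign = 1.0, result = 0.0, scale = 1.0;
    int digits = 0, seen_decimal = 0;

    assert(text != NULL && value != NULL);

    p = text;
    if (*p == '-' || *p == '+') {
        if (*p == '-')
            sign = -1.0;
        p++;
    }

    for (; *p != '\0'; p++) {
        if (isdigit((unsigned char)*p)) {
            int d = *p - '0';
            if (seen_decimal) {
                scale /= 10.0;          /* next fractional place */
                result += d * scale;
            } else {
                result = result * 10.0 + d;
            }
            digits++;
        } else if (*p == decimal && !seen_decimal) {
            seen_decimal = 1;
        } else if (grouping != '\0' && *p == grouping
                   && !seen_decimal && digits > 0) {
            /* grouping character inside the integer part: skip it */
        } else {
            return -1;                  /* not valid under this convention */
        }
    }

    if (digits == 0)
        return -1;
    *value = sign * result;
    return 0;
}
```

Under a comma-decimal convention the text 1,234 parses as a little over one; under a period-decimal convention with comma grouping it parses as one thousand two hundred thirty-four. Nothing in the text itself says which reading was intended, which is why the importer needs to be told the source locale.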
Comment 21 Andreas J. Guelzow 2008-09-19 04:39:50 UTC
>> the only remaining tricky issue is how to open that file with "1,1" when
running an english locale and have the "1,1" interpreted as "1.1". <<

This is trivial with the configurable text importer. Just select a locale with , as decimal point.