Bug 359269 - Import csv mixes up fields
Status: RESOLVED FIXED
Product: Gnumeric
Classification: Applications
Component: import/export Text
Version: git master
Hardware: Other All
Importance: Normal normal
Target Milestone: ---
Assigned To: Morten Welinder
Jody Goldberg
Depends on:
Blocks:
 
 
Reported: 2006-10-03 13:02 UTC by samverstraete
Modified: 2006-10-18 16:01 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description samverstraete 2006-10-03 13:02:10 UTC
When importing a csv file with corrupted headers, Gnumeric seems to mix up fields. Please see
http://merlin.ugent.be/gamv_ver.csv for an example.

I'm not sure about this, but to me a csv file is just a comma-separated file:
one comma separates one column from the next, and a newline means we should start a new row.
Comment 1 Morten Welinder 2006-10-03 13:22:57 UTC
"10","5.000","1.17969","4","-0.77303","-0.77303"
"11","5.426","0.47679","6","-0.62350","-0.62350"
"12","6.003","1.24442","12","-0.83859","-0.83859"
Covariance","tail:Normal","Score","head:Normal","Score","direction","1",""
"1","0.000","0.00370","532","-0.00043","-0.00043"
"2","0.000","0.00000","0","0.00000","0.00000"

The file is decidedly broken -- odd number of quotes on some lines -- but
Gnumeric's reaction to that might not be the best.

Note: an odd number of quotes means that Gnumeric cannot know what is
inside and what is outside quotes.  That's why the string in A1 is a
mile long.
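
As a rough illustration of the "odd number of quotes" check, here is a minimal Python sketch that flags such lines in an excerpt of the sample above. The helper name has_unbalanced_quotes is made up for this example and is not Gnumeric code. A per-line count is only a heuristic: a quoted field that legitimately contains a newline also produces lines with an odd count.

```python
# Three lines taken from the excerpt quoted in this comment.
lines = [
    '"12","6.003","1.24442","12","-0.83859","-0.83859"',
    'Covariance","tail:Normal","Score","head:Normal","Score","direction","1",""',
    '"1","0.000","0.00370","532","-0.00043","-0.00043"',
]


def has_unbalanced_quotes(line: str) -> bool:
    """Return True when the line contains an odd number of double quotes."""
    return line.count('"') % 2 == 1


# 1-based numbers of the lines whose quotes cannot all be paired up.
suspect = [n for n, line in enumerate(lines, start=1) if has_unbalanced_quotes(line)]
print(suspect)
```

Only the "Covariance" line is flagged, which matches the observation that the header line is the broken one.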
Comment 2 Morten Welinder 2006-10-03 13:36:37 UTC
Argh.  That code sure looks like it needs some tender loving care.
Comment 3 Andreas J. Guelzow 2006-10-03 15:32:09 UTC
Please note that it is important to be able to include commas inside quotes so that these commas are not considered field separators. Similarly, it should be possible to include a newline inside a quoted string.
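
For reference, this is how Python's standard csv module treats a comma and a newline inside quoted fields. It is shown only as an example of the behaviour described here, not as a description of what Gnumeric does; the sample data is invented.

```python
import csv
import io

# The second record has a comma inside one quoted field and a newline inside another.
data = 'name,comment\n"Smith, John","line one\nline two"\n'

rows = list(csv.reader(io.StringIO(data)))
for row in rows:
    print(row)
```

The reader returns two rows of two fields each: the quoted comma stays part of the first field, and the quoted newline stays part of the second.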
Comment 4 Morten Welinder 2006-10-03 15:39:23 UTC
Andreas: right.  But the current code does really crazy things while
tokenizing.  When it sees

    foo","bar",

it makes the tokens

    f o o "," ...

i.e., it thinks that the quote after foo starts a string.  I'm pretty certain
that if an item did not start with a quote, we should not consider quotes
special.
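
A minimal Python sketch of the rule proposed here: quotes are special only when an item starts with one. The function split_fields and its handling of edge cases (doubled quotes, text after a closing quote) are assumptions made for illustration, not the actual Gnumeric tokenizer.

```python
def split_fields(line: str, sep: str = ",", quote: str = '"') -> list:
    """Split one line into items; quotes matter only at the start of an item."""
    fields = []
    i, n = 0, len(line)
    while True:
        if i < n and line[i] == quote:
            # Quoted item: read up to the closing quote.
            i += 1
            buf = []
            while i < n:
                if line[i] == quote:
                    if i + 1 < n and line[i + 1] == quote:
                        buf.append(quote)  # doubled quote means one literal quote
                        i += 2
                        continue
                    i += 1  # closing quote
                    break
                buf.append(line[i])
                i += 1
            # Assumption: text between the closing quote and the next separator is kept as-is.
            while i < n and line[i] != sep:
                buf.append(line[i])
                i += 1
            fields.append("".join(buf))
        else:
            # Unquoted item: quotes are ordinary characters here.
            j = line.find(sep, i)
            if j == -1:
                j = n
            fields.append(line[i:j])
            i = j
        if i >= n:
            break
        i += 1  # skip the separator
    return fields


print(split_fields('foo","bar",'))
print(split_fields('"a,b",c'))
```

With this rule the first item of the example is foo" (quote kept as a plain character), and the quoted item that follows is read as bar.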

And don't get me started on the way it deals with double quotes!
Comment 5 Andreas J. Guelzow 2006-10-03 17:34:02 UTC
Morten:
I am not defending the original code, but I seem to recall a discussion that
foo","bar
was supposed to be equivalent to
"foo,bar"
Personally I don't see the point in that, but the
foo","bar
is either nonsense or interpretable as
"foo,bar"
of course
"foo""bar"
in that case looks like
"foobar"
when it probably should contain a literal ".
Comment 6 Morten Welinder 2006-10-03 18:39:08 UTC
The closest thing to a normative reference for csv I can find is...

    http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm

It seems to say that if you want the seven-character item foo,bar then
you must enclose the whole thing in quotes.  If you use foo","bar then you
get a nine-character item with internal quotes.
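
To make the character counts concrete, the sketch below checks the two item lengths and shows how Python's csv module (used here only as one convenient implementation of the quoting convention, not as the reference linked above) would write both items so that they read back unchanged.

```python
import csv
import io

items = ['foo,bar', 'foo","bar']
lengths = [len(s) for s in items]
print(lengths)

# Write both items as one record; embedded quotes are doubled, and any item
# containing a comma or a quote is enclosed in quotes.
buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerow(items)
encoded = buf.getvalue()
print(encoded, end="")

# Reading the record back recovers the original items.
decoded = next(csv.reader(io.StringIO(encoded)))
print(decoded == items)
```

Under this convention the nine-character item is written as "foo"",""bar", that is, enclosed in quotes with each internal quote doubled.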
Comment 7 jjvenkit 2006-10-17 20:45:11 UTC
Perhaps this is related to bug #161403.

...

I run in this locale:
$ locale
LANG=fr_CA.UTF-8
LANGUAGE=fr_CA.UTF-8
LC_CTYPE="fr_CA.UTF-8"
LC_NUMERIC="fr_CA.UTF-8"
LC_TIME="fr_CA.UTF-8"
LC_COLLATE="fr_CA.UTF-8"
LC_MONETARY="fr_CA.UTF-8"
LC_MESSAGES="fr_CA.UTF-8"
LC_PAPER="fr_CA.UTF-8"
LC_NAME="fr_CA.UTF-8"
LC_ADDRESS="fr_CA.UTF-8"
LC_TELEPHONE="fr_CA.UTF-8"
LC_MEASUREMENT="fr_CA.UTF-8"
LC_IDENTIFICATION="fr_CA.UTF-8"
LC_ALL=

Saving a CSV file from the fr_CA locale uses commas as the separator, and thus I cannot open the CSV file I just created.  Further, if I open a CSV file from someone who was running in English (en_CA, for example), Gnumeric doesn't parse the data properly.
Comment 8 Morten Welinder 2006-10-18 01:27:04 UTC
No, that's a different problem.  (Namely that "csv" is not a well-defined
standard.)

This problem is that we don't react the right way when seeing a clearly
bogus file such as http://merlin.ugent.be/gamv_ver.csv
Comment 9 Morten Welinder 2006-10-18 16:01:35 UTC
Fixed in the development version. The fix will be available in the next major release. Thank you for your bug report.

In the example from above, there will be a cell containing Covariance"
including the terminating quote.

This also fixes the problem that space-trimming (as selected from the format
page in the gui) removed spaces inside quotes.
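
The intended trimming behaviour can be sketched as follows. This is a simplified Python illustration, not the actual fix: trim_item is a hypothetical helper, and it ignores doubled quotes inside the item.

```python
def trim_item(raw: str, quote: str = '"') -> str:
    """Trim spaces around an item, but leave spaces inside quotes alone."""
    stripped = raw.strip(" ")
    if len(stripped) >= 2 and stripped[0] == quote and stripped[-1] == quote:
        # Quoted item: drop the enclosing quotes and keep the inner spaces.
        return stripped[1:-1]
    return stripped


print(repr(trim_item('  " padded "  ')))
print(repr(trim_item('  plain  ')))
```

The quoted item keeps its leading and trailing spaces, while the unquoted item is trimmed.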