GNOME Bugzilla – Bug 359269
Import csv mixes up fields
Last modified: 2006-10-18 16:01:35 UTC
When importing a csv file with corrupted headers, gnumeric seems to mix up fields. Please see http://merlin.ugent.be/gamv_ver.csv for an example. I'm not sure about this, but for me a csv file is just a comma separated file: one comma separates one column from the next, and a new line means we should start a new row.
"10","5.000","1.17969","4","-0.77303","-0.77303"
"11","5.426","0.47679","6","-0.62350","-0.62350"
"12","6.003","1.24442","12","-0.83859","-0.83859"
Covariance","tail:Normal","Score","head:Normal","Score","direction","1",""
"1","0.000","0.00370","532","-0.00043","-0.00043"
"2","0.000","0.00000","0","0.00000","0.00000"

The file is decidedly broken -- odd number of quotes on some lines -- but Gnumeric's reaction to that might not be the best. Note: an odd number of quotes means that Gnumeric cannot know what is inside and what is outside quotes. That's why the string in A1 is a mile long.
Argh. That code sure looks like it needs some tender loving care.
Please note that it is important to be able to include commas inside quotes so that these commas are not considered field separators. Similarly, it should be possible to include a newline inside a quoted string.
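For illustration only, here is how one existing parser treats commas and newlines inside quotes. This uses Python's standard csv module, not Gnumeric's importer, and the sample text is made up for the example.

```python
import csv
import io

# One record with two fields: the first contains a comma,
# the second contains a newline. Both are protected by quotes.
sample = '"a,b","line1\nline2"\n'

rows = list(csv.reader(io.StringIO(sample)))

# The comma and the newline inside quotes are kept as data,
# so the whole input is a single row with two fields.
print(rows)
```
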
Andreas: right. But the current code does really crazy things while tokenizing. When it sees foo","bar", it makes the tokens f o o "," ... i.e., it thinks that the quote after foo starts a string. I'm pretty certain that if an item did not start with a quote, we should not consider quotes special. And don't get me started on the way it deals with double quotes!
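A minimal sketch of the rule proposed in this comment (quotes are special only when they start an item), written in Python. This is not Gnumeric's code: the function name split_csv_record and the choice to treat a doubled quote inside a quoted item as one literal quote are assumptions made for the example.

```python
def split_csv_record(text, sep=",", quote='"'):
    """Split one record into fields.

    A quote is special only as the first character of a field;
    anywhere else in an unquoted field it is ordinary data.
    """
    fields = []
    i, n = 0, len(text)
    while True:
        if i < n and text[i] == quote:
            # Quoted field: separators and newlines inside are data.
            i += 1
            buf = []
            while i < n:
                ch = text[i]
                if ch == quote:
                    if i + 1 < n and text[i + 1] == quote:
                        buf.append(quote)  # doubled quote: literal quote
                        i += 2
                        continue
                    i += 1  # closing quote
                    break
                buf.append(ch)
                i += 1
            # Keep any stray text between the closing quote and the separator.
            while i < n and text[i] != sep:
                buf.append(text[i])
                i += 1
            fields.append("".join(buf))
        else:
            # Unquoted field: read up to the next separator, quotes included.
            j = text.find(sep, i)
            if j == -1:
                j = n
            fields.append(text[i:j])
            i = j
        if i >= n:
            break
        i += 1  # skip the separator
    return fields


# The broken header from the example file: the first item does not start
# with a quote, so its trailing quote is kept as part of the cell.
print(split_csv_record('Covariance","tail:Normal","Score"'))
```

Under this rule the stray quote stays inside the first cell instead of swallowing the rest of the file.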
Morten: I am not defending the original code, but I seem to recall a discussion in which foo","bar was supposed to be equivalent to "foo,bar". Personally I don't see the point in that, but foo","bar is either nonsense or interpretable as "foo,bar". Of course, "foo""bar" in that case looks like "foobar" when it probably should contain a literal ".
The closest thing to a normative reference for csv I can find is http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm. It seems to say that if you want the seven-character item foo,bar then you must enclose the whole thing in quotes. If you use foo","bar then you get a nine-character item with internal quotes.
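As a quick check of the fully quoted case and of doubled quotes, here is what Python's standard csv module does with them. This only shows that one parser's behavior; it says nothing about how Gnumeric or the article above treat the unquoted form foo","bar.

```python
import csv

# The whole item is enclosed in quotes, so the comma is data.
quoted = next(csv.reader(['"foo,bar"']))

# A doubled quote inside a quoted item stands for one literal quote.
doubled = next(csv.reader(['"foo""bar"']))

print(quoted, len(quoted[0]))
print(doubled)
```
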
Perhaps this is related to bug #161403. ... I run in this locale:

$ locale
LANG=fr_CA.UTF-8
LANGUAGE=fr_CA.UTF-8
LC_CTYPE="fr_CA.UTF-8"
LC_NUMERIC="fr_CA.UTF-8"
LC_TIME="fr_CA.UTF-8"
LC_COLLATE="fr_CA.UTF-8"
LC_MONETARY="fr_CA.UTF-8"
LC_MESSAGES="fr_CA.UTF-8"
LC_PAPER="fr_CA.UTF-8"
LC_NAME="fr_CA.UTF-8"
LC_ADDRESS="fr_CA.UTF-8"
LC_TELEPHONE="fr_CA.UTF-8"
LC_MEASUREMENT="fr_CA.UTF-8"
LC_IDENTIFICATION="fr_CA.UTF-8"
LC_ALL=

Saving a CSV file from the fr_CA locale uses commas as the separator, and thus I cannot open the CSV file I just created. Further, if I open a CSV file from someone who was running in English (en_CA for example), gnumeric doesn't parse the data properly.
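To make the clash concrete: in a locale where the decimal mark is a comma, a value such as 1,5 collides with a comma separator unless it is quoted or a different separator is used. The sketch below uses Python's standard csv module with made-up values; the helper name write_rows is hypothetical and this is not a description of what Gnumeric does on export.

```python
import csv
import io


def write_rows(rows, delimiter):
    """Serialize rows with the given field delimiter and return the text."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=delimiter).writerows(rows)
    return buf.getvalue()


rows = [["1,5", "2,25"]]  # numbers formatted with a decimal comma

# With a comma separator the writer has to quote each value.
comma_text = write_rows(rows, ",")

# With a semicolon separator no quoting is needed.
semi_text = write_rows(rows, ";")

# Either form reads back unchanged when the reader uses the same separator.
back = list(csv.reader(io.StringIO(comma_text)))
print(repr(comma_text), repr(semi_text), back)
```
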
No, that's a different problem. (Namely that "csv" is not a well defined standard.) This problem is that we don't react the right way when seeing a clearly bogus file such as http://merlin.ugent.be/gamv_ver.csv
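One rough way to flag a file as bogus in the sense discussed in this report is to look for lines with an odd number of quote characters. The Python sketch below does only that; the function name is hypothetical, and a valid file whose quoted items span several lines would also be flagged, so treat the result as a hint rather than a verdict.

```python
def lines_with_odd_quotes(text, quote='"'):
    """Return the 1-based numbers of lines holding an odd number of quotes."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if line.count(quote) % 2 == 1
    ]


sample = (
    '"12","6.003","1.24442","12","-0.83859","-0.83859"\n'
    'Covariance","tail:Normal","Score","head:Normal","Score","direction","1",""\n'
    '"1","0.000","0.00370","532","-0.00043","-0.00043"\n'
)

# Only the broken header line has an unbalanced quote.
print(lines_with_odd_quotes(sample))
```
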
Fixed in the development version. The fix will be available in the next major release. Thank you for your bug report. In the example above, there will be a cell containing Covariance" including the terminating quote. This also fixes the problem that space-trimming (as selected from the format page in the GUI) removed spaces inside quotes.