GNOME Bugzilla – Bug 167449
Selecting TSV format does not import file correctly
Last modified: 2005-02-18 16:15:28 UTC
Please describe the problem: I have a TSV (tab-separated values) file, saved with the extension .tsv. If I open it in Gnumeric and select "Comma or Tab separated values (CSV/TSV)" as the format, it is imported as plain text: one row per line, no separators, and the whole content of each line in the first cell. I will attach the file.
Created attachment 37488 [details] TSV file which is not imported correctly
Confirmed. The configurable text importer can handle it, though.
oh boy, the problem seems to be in the empty rows: when we try to guess the separation character we use count_character with a quantile of 0.2. So we count the number of tabs in each line, then sort the resulting values and then choose the 20th percentile. Since for this data half the rows have no tabs, we end up with approximately no tabs. :-)
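The effect described above can be reproduced with a small sketch. This is not the code in src/stf-parse.c; the function name `quantile_count`, the sample data, and the way the index into the sorted counts is computed are assumptions made for illustration only.

```python
def quantile_count(lines, ch, quantile=0.2):
    """Count `ch` in every line, sort the counts, return the value at `quantile`.

    Illustrative only: the index computation here is an assumption and
    may differ from what Gnumeric's count_character does.
    """
    counts = sorted(line.count(ch) for line in lines)
    if not counts:
        return 0
    return counts[int(len(counts) * quantile)]


# Hypothetical data shaped like the attachment: every other row is empty.
lines = ["a\tb\tc", "", "d\te\tf", "", "g\th\ti", "", "j\tk\tl", ""]

# Sorted counts are [0, 0, 0, 0, 2, 2, 2, 2]; the 20th percentile lands
# on one of the zeros, so tab does not look like a separator.
print(quantile_count(lines, "\t"))
```

With half of the rows contributing a count of zero, any quantile below 0.5 picks a zero, which matches the "approximately no tabs" result.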
This seems problematic: if you go to a much lower percentile, you might find the letter "a" as a separator in this file. Can Gnumeric take a hint from the file extension? This file specifically had the extension ".tsv", which could give Gnumeric a hint if the mathematical guessing doesn't work. The situation is similar for CSV. After all, I selected the CSV/TSV import. I'm guessing that a mathematical analysis can't decide between the two, but since the file name ends in .tsv, perhaps the extension can? In this file format, Gnumeric only needs to decide between two separators, right?
Of course it should be able to figure out `\t' as the separator. `a' isn't even ever in competition so that's not really an issue. Considering what all is advertised as a tsv or csv file, it is tricky to depend on the extension.
When I mentioned 'a', I did so because the letter repeats quite a lot in my file, that's all. I suggested relying on the extension if and *only* if the heuristics didn't guess any separator. If there are a lot of strange files out there that end with ".tsv", I guess this idea won't work.
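For reference, the suggestion in this comment could look roughly like the sketch below. It is only the reporter's proposal, not something Gnumeric is known to do; `guess_separator`, the candidate list, and the extension table are all hypothetical.

```python
import os


def guess_separator(lines, filename, candidates=("\t", ","), quantile=0.2):
    """Guess a separator from the data; fall back to the extension only
    when the heuristic finds nothing. Hypothetical sketch of the proposal."""
    best, best_count = None, 0
    for ch in candidates:
        counts = sorted(line.count(ch) for line in lines)
        count = counts[int(len(counts) * quantile)] if counts else 0
        if count > best_count:
            best, best_count = ch, count
    if best is not None:
        return best
    # The heuristic found no separator: take a hint from the file name.
    ext = os.path.splitext(filename)[1].lower()
    return {".tsv": "\t", ".csv": ","}.get(ext)
```

The extension is consulted last, so a mislabelled file whose data clearly uses commas would still be parsed with commas.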
2005-02-18  Morten Welinder  <terra@gnome.org>

	* src/stf-parse.c (count_character): Ignore completely empty
	lines.  They aren't telling us anything about separators.
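The idea behind the fix can be sketched as follows. This is an illustration, not the committed C code; `quantile_count_skip_empty`, the sample data, and the index computation are assumptions.

```python
def quantile_count_skip_empty(lines, ch, quantile=0.2):
    """Like a plain quantile count, but completely empty lines are left
    out before sorting, since they say nothing about separators."""
    counts = sorted(
        line.count(ch) for line in lines if line.rstrip("\r\n") != ""
    )
    if not counts:
        return 0
    return counts[int(len(counts) * quantile)]


# Hypothetical data with every other row empty.
rows = ["a\tb\tc", "", "d\te\tf", "", "g\th\ti", "", "j\tk\tl", ""]

# Only the four non-empty rows are counted, giving [2, 2, 2, 2], so the
# 20th percentile is 2 and tab is a plausible separator again.
print(quantile_count_skip_empty(rows, "\t"))
```

A line that contains only separators is not empty and still counts, so only rows with no content at all are ignored.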