Bug 167449 - Selecting TSV format does not import file correctly
Status: RESOLVED FIXED
Product: Gnumeric
Classification: Applications
Component: import/export Text
Version: git master
Hardware: Other All
Importance: Normal minor
Target Milestone: ---
Assigned To: Jody Goldberg
QA Contact: Jody Goldberg
Depends on:
Blocks:
 
 
Reported: 2005-02-15 10:05 UTC by Uri David Akavia
Modified: 2005-02-18 16:15 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
TSV file which is not imported correctly (74.35 KB, text/plain)
2005-02-15 10:06 UTC, Uri David Akavia

Description Uri David Akavia 2005-02-15 10:05:06 UTC
Please describe the problem:
I have a TSV (tab-separated values) file, saved with the extension .tsv.
If I open it in Gnumeric and select "Comma or Tab separated values (CSV/TSV)" as
the format, it is imported as plain text: one row per line, no separators,
with the entire line in the first cell.

I will attach the file.

Comment 1 Uri David Akavia 2005-02-15 10:06:02 UTC
Created attachment 37488
TSV file which is not imported correctly
Comment 2 Morten Welinder 2005-02-15 14:38:32 UTC
Confirmed.  The configurable text importer can handle it, though.
Comment 3 Andreas J. Guelzow 2005-02-15 23:43:12 UTC
Oh boy, the problem seems to be the empty rows:

When we try to guess the separator character, we use count_character with a
quantile of 0.2. That is, we count the number of tabs in each line, sort the
resulting values, and then take the 20th percentile.

Since half the rows in this data have no tabs, the 20th percentile comes out as
zero tabs, so tab is not picked up as the separator.

:-)
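
The effect described in this comment can be reproduced with a small sketch. This is Python rather than the C in src/stf-parse.c, and `count_at_quantile` is a hypothetical stand-in for the idea behind count_character, not its actual code. The sample rows are invented; the only property that matters is that every other row is empty.

```python
def count_at_quantile(lines, ch, quantile):
    """Count `ch` in every line, sort the counts, and return the
    value found at the requested quantile (hypothetical helper)."""
    counts = sorted(line.count(ch) for line in lines)
    if not counts:
        return 0
    # Position of the quantile in the sorted list, clamped to a valid index.
    index = min(int(len(counts) * quantile), len(counts) - 1)
    return counts[index]


# Invented sample: five data rows with two tabs each, alternating with empty rows.
lines = ["a\tb\tc", "", "d\te\tf", "", "g\th\ti", "", "j\tk\tl", "", "m\tn\to", ""]

with_empty = count_at_quantile(lines, "\t", 0.2)
without_empty = count_at_quantile([l for l in lines if l], "\t", 0.2)
print(with_empty, without_empty)
```

With the empty rows included, the sorted counts start with five zeros, so the 20th percentile is zero. With the empty rows dropped, every remaining row has two tabs and the same quantile yields two.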

Comment 4 Uri David Akavia 2005-02-16 08:30:59 UTC
This seems problematic: if you go to a much lower percentile, you might find
the letter "a" as a separator in this file.
Can Gnumeric take a hint from the file extension? This file specifically had the
extension ".tsv", which could give Gnumeric a hint when the
mathematical guessing doesn't work. The same applies to CSV.

After all, I selected the CSV/TSV import. I'm guessing that a mathematical
analysis can't decide between them, but since the file ends in .tsv, perhaps the
extension can? For this file format, Gnumeric only needs to decide between two
separators, right?
Comment 5 Andreas J. Guelzow 2005-02-16 13:45:08 UTC
Of course it should be able to figure out `\t' as the separator. `a' is never
even in competition, so that's not really an issue. Considering everything that
gets advertised as a tsv or csv file, depending on the extension is tricky.
Comment 6 Uri David Akavia 2005-02-16 13:57:38 UTC
When I mentioned 'a', I did so because the letter repeats quite a lot in my
file, that's all.

I suggested relying on the extension if and *only* if the heuristics didn't
guess any separator. If there are a lot of strange files out there that end with
".tsv", I guess this idea won't work.
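
The proposal in this comment (use the extension only when the heuristic finds nothing) could look roughly like the sketch below. It is purely illustrative Python: `guess_separator`, `EXTENSION_HINTS`, and the two-candidate list are assumptions made for the example, and nothing here is taken from Gnumeric's source.

```python
import os

# Hypothetical mapping from file extension to the separator it suggests.
EXTENSION_HINTS = {".tsv": "\t", ".csv": ","}


def guess_separator(filename, lines, quantile=0.2):
    """Try a per-line count heuristic first; consult the file
    extension only if no candidate separator was detected."""
    best, best_count = None, 0
    for ch in ("\t", ","):
        counts = sorted(line.count(ch) for line in lines)
        if not counts:
            continue
        index = min(int(len(counts) * quantile), len(counts) - 1)
        if counts[index] > best_count:
            best, best_count = ch, counts[index]
    if best is not None:
        return best
    # The heuristic found nothing: fall back to the extension, if it is known.
    ext = os.path.splitext(filename)[1].lower()
    return EXTENSION_HINTS.get(ext)


# Invented sample in which every other row is empty, so the heuristic finds nothing.
rows = ["a\tb\tc", "", "d\te\tf", "", "g\th\ti", "", "j\tk\tl", "", "m\tn\to", ""]
print(repr(guess_separator("data.tsv", rows)))
print(repr(guess_separator("data.txt", rows)))
```

The fallback only fires when the count-based guess returns nothing, so a mislabelled file with clearly detectable separators would still be parsed according to its contents.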
Comment 7 Morten Welinder 2005-02-18 16:15:28 UTC
2005-02-18  Morten Welinder  <terra@gnome.org>

	* src/stf-parse.c (count_character): Ignore completely empty
	lines.  They aren't telling us anything about separators.
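
A minimal sketch of the idea in this ChangeLog entry, written in Python rather than C. `count_at_quantile_skip_empty` is a hypothetical name, "completely empty" is assumed to mean a zero-length line, and the sample rows are invented; the real change lives in count_character in src/stf-parse.c and may differ in detail.

```python
def count_at_quantile_skip_empty(lines, ch, quantile):
    """Per-line count of `ch` at the given quantile, ignoring
    completely empty lines (hypothetical helper)."""
    # Empty lines carry no information about separators, so leave them out.
    counts = sorted(line.count(ch) for line in lines if line != "")
    if not counts:
        return 0
    # Position of the quantile in the sorted list, clamped to a valid index.
    index = min(int(len(counts) * quantile), len(counts) - 1)
    return counts[index]


# Invented sample: rows with two tabs each, alternating with empty rows.
sample = ["a\tb\tc", "", "d\te\tf", "", "g\th\ti", "", "j\tk\tl", "", "m\tn\to", ""]
print(count_at_quantile_skip_empty(sample, "\t", 0.2))
```

Only zero-length lines are skipped in this sketch; a line that contains text but no tabs still contributes a count of zero, so it continues to weigh against tab as the separator.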