GNOME Bugzilla – Bug 83794
Autodetection of tab-delimited data
Last modified: 2009-08-15 18:40:50 UTC
Originally filed as http://bugs.debian.org/147947

From: Matt Zimmerman <mdz@debian.org>
Package: gnumeric
Version: 1.0.6-1
Severity: wishlist
Tags: patch

Gnumeric nicely detects CSV data in a file specified on the command line, and imports it. However, folks around here use tab-delimited data a lot. Here is a patch which implements a crude heuristic for guessing as to whether a text input file contains comma-delimited or tab-delimited data. Seems to work well enough for our purposes.

-- 
 - mdz

--- gnumeric-1.0.6/src/stf.c Mon Feb 11 22:23:52 2002
+++ stf.c Thu May 23 14:05:25 2002
@@ -239,16 +239,16 @@
 }
 
 /**
- * stf_read_workbook_default_csv
+ * stf_read_workbook_auto_csvtab
  * @fo : file opener
  * @context : command context
  * @book : workbook
  * @filename : file to read from+convert
  *
- * Automatic importing of a comma delimited csv file.
+ * Attempt to auto-detect CSV or tab-delimited file
  **/
 static void
-stf_read_workbook_default_csv (GnumFileOpener const *fo, IOContext *context,
+stf_read_workbook_auto_csvtab (GnumFileOpener const *fo, IOContext *context,
                                WorkbookView *wbv, char const *filename)
 {
     Sheet *sheet;
@@ -256,11 +256,22 @@
     char *name, *data;
     StfParseOptions_t *po;
 
+    char *pos;
+    unsigned int comma = 0, tab = 0, lines = 0;
+
     book = wb_view_workbook (wbv);
     data = stf_preparse (context, filename);
     if (!data)
         return;
 
+    for( pos = data ; *pos ; ++pos )
+        if (*pos == ',')
+            ++comma;
+        else if (*pos == '\t')
+            ++tab;
+        else if (*pos == '\n')
+            ++lines;
+
     name = g_strdup_printf (_("Imported %s"), g_basename (filename));
     sheet = sheet_new (book, name);
     g_free (name);
@@ -273,11 +284,15 @@
     stf_parse_options_set_trim_spaces (po, TRIM_TYPE_LEFT | TRIM_TYPE_RIGHT);
     stf_parse_options_set_lines_to_parse (po, -1);
 
-    stf_parse_options_csv_set_separators (po, ",", NULL);
     stf_parse_options_csv_set_stringindicator (po, '"');
     stf_parse_options_csv_set_indicator_2x_is_single (po, FALSE);
     stf_parse_options_csv_set_duplicates (po, FALSE);
-
+
+    /* Guess */
+    stf_parse_options_csv_set_separators (po, ",", NULL);
+    if (tab >= lines && tab > comma)
+        stf_parse_options_csv_set_separators (po, "\t", NULL);
+
     if (!stf_parse_sheet (po, data, sheet)) {
         workbook_sheet_detach (book, sheet);
@@ -435,9 +450,9 @@
 stf_init (void)
 {
     register_file_opener (gnum_file_opener_new (
-        "Gnumeric_stf:stf_csv",
-        _("Text import (defaults to csv)"),
-        stf_read_default_probe, stf_read_workbook_default_csv), 0);
+        "Gnumeric_stf:stf_csvtab",
+        _("Text import (auto-detect CSV or tab-delimited)"),
+        stf_read_default_probe, stf_read_workbook_auto_csvtab), 0);
     register_file_opener_as_importer_as_default (gnum_file_opener_new (
         "Gnumeric_stf:stf_druid",
         _("Text import (configurable)"),
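
For readers who want to try the heuristic outside of Gnumeric, the counting rule from the patch can be sketched as a standalone function. This is only an illustration: `guess_separator` is a hypothetical name that does not exist in Gnumeric, and it returns the chosen character instead of calling `stf_parse_options_csv_set_separators`. The decision rule itself (`tab >= lines && tab > comma`) is taken directly from the patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of Gnumeric): guess whether a
 * NUL-terminated text buffer is comma- or tab-delimited.
 * Returns '\t' or ','.
 */
char
guess_separator (const char *data)
{
	unsigned int comma = 0, tab = 0, lines = 0;
	const char *pos;

	assert (data != NULL);

	/* One pass over the buffer, counting candidate separators
	 * and newlines. */
	for (pos = data; *pos; ++pos) {
		if (*pos == ',')
			++comma;
		else if (*pos == '\t')
			++tab;
		else if (*pos == '\n')
			++lines;
	}

	/* Same rule as the patch: choose tab only if there is at
	 * least one tab per line on average and tabs outnumber
	 * commas; otherwise fall back to comma. */
	return (tab >= lines && tab > comma) ? '\t' : ',';
}
```

Under this rule an empty buffer, or one with equal numbers of tabs and commas, falls back to comma, which matches the pre-patch default. The sketch, like the loop in the patch, counts every comma and tab in the buffer, including any that appear inside quoted fields.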
The patch looks interesting. Can you attach it to this bug report so that it does not get butchered by the mail system? If I can study it this evening, it may make it into 1.0.7 tomorrow. Thanks
Created attachment 8931
non-butchered version of the patch
Looks good. Applied in 1.0, and forward ported to 1.1