GNOME Bugzilla – Bug 311816
Collections of [VH]LOOKUP on a single table are slow
Last modified: 2008-04-11 21:54:46 UTC
When opening an MS-Excel .xls file needing formula recalculation, Gnumeric takes an extremely long time to recalculate. Jody Goldberg <jody@gnome.org> writes: ----- Ahh, I see the problem. We're using to the more expensive case insensitive comparison immediately which requires alot of string conversions to handle the collation. I'll look into checking straight equality first. ----- Other information: A rather large file that exposes the poor performance can be found at http://wfltest.mr-paradox.net/players.zip . The zipped spreadsheet was created with the perl module Spreadsheet::WriteExcel v2.14 and contains one formula per row per worksheet (all in the "WFL" column on each sheet). Due to the way Spreadsheet::WriteExcel handles formulas, each formula cell requires recalculation upon opening by a spreadsheet reader (Excel, Gnumeric, etc.).
Running a value_compare (case_insensitive = TRUE) on large data sets is very expensive. We're doing a pair of casefolds and a collate.
The collate alone is 76% of total time: 100.00% main 99.07% wb_view_new_from_uri 99.07% wb_view_new_from_input 97.43% workbook_recalc 97.43% dependent_eval 97.43% cell_eval_content 97.43% gnm_expr_eval 97.43% function_call_with_list 97.42% gnumeric_vlookup 97.42% find_index_linear 96.44% value_compare 72.62% g_utf8_collate 40.08% g_convert 30.32% open_converter 30.15% g_utf8_normalize 26.83% _g_gnulib_sprintf 26.77% _g_gnulib_vsprintf 25.63% _g_gnulib_vasnprintf 25.60% _g_utf8_normalize_wc 22.81% g_utf8_casefold 15.20% g_realloc 15.10% _g_gnulib_printf_parse 14.36% realloc 12.08% _malloc_unlocked 11.86% g_string_append_unichar 10.45% g_string_insert_unichar 9.98% g_malloc 8.99% malloc 7.65% g_string_insert_len 6.63% cleanfree 6.53% combine 6.40% g_convert_with_iconv 6.28% g_free 5.29% g_string_maybe_expand 4.90% g_unicode_canonical_ordering 4.70% memcpy 4.20% free 3.99% sprintf 3.92% g_iconv 3.89% g_ucs4_to_utf8 3.82% libiconv 3.68% _doprnt 3.67% _free_unlocked 3.53% unicode_loop_convert 3.38% g_unichar_tolower 3.32% g_string_new 3.25% g_string_sized_new 3.24% g_utf8_get_char 3.23% realfree 3.15% close_converter[libglib-2.0.so.0/103] 2.92% g_hash_table_lookup 2.88% mutex_unlock 2.53% t_delete 2.16% g_unichar_to_utf8 1.94% find_decomposition 1.68% mutex_lock 1.64% go_file_opener_open 1.64% go_plugin_file_opener_open 1.64% go_plugin_loader_module_func_file_open 1.64% excel_file_open 1.64% excel_read_workbook 1.64% excel_read_BOF 1.64% excel_read_sheet 1.58% t_splay 1.55% strlen 1.28% g_string_free 1.13% g_hash_table_remove 1.10% g_hash_table_insert
See also bug 172406.
In case it becomes necessary for anyone to download this spreadsheet from this point forward, it is now available only at http://www.wildfootball.org/players.zip . The 'wfltest' hostname in the initial problem report has ceased to exist. And I have one question, as I'm not much of a source diver: would this problem also cause poor performance when updating a cell in a worksheet referenced by vlookup formulas, or is that a separate issue? Gnumeric goes out to lunch for quite some time, similar to recalculation upon opening, if I change seemingly any cell value in the 2004WFL_Players worksheet of the above workbook.
Same problem. The problem is that we are going to re-evaluate all cells that depend on the cell you change. That means that all the lookup stuff gets re-evaluated, just as it does on load for this sheet. The killer is that we are not noticing and using that a number of lookups on the same table is being done.
Accidentally fixed. Thanks for taking the time to report this bug. This particular bug has already been reported into our bug tracking system, but we are happy to tell you that the problem has already been fixed. It should be solved in the next software version. You may want to check for a software upgrade. *** This bug has been marked as a duplicate of 525875 ***