After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 525875 - Optimize evaluation involving large ranges
Optimize evaluation involving large ranges
Status: RESOLVED FIXED
Product: Gnumeric
Classification: Applications
Component: General
git master
Other All
: Normal normal
: ---
Assigned To: Jody Goldberg
Jody Goldberg
: 311816 (view as bug list)
Depends on:
Blocks:
 
 
Reported: 2008-04-03 00:25 UTC by Morten Welinder
Modified: 2008-04-11 21:54 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Massive [VH]LOOKUP speed improvement (7.78 KB, patch)
2008-04-08 01:57 UTC, Morten Welinder
none Details | Review
Patch to try (3.91 KB, patch)
2008-04-08 02:23 UTC, Morten Welinder
none Details | Review
Improved patch to try (8.03 KB, patch)
2008-04-08 13:26 UTC, Morten Welinder
none Details | Review

Description Morten Welinder 2008-04-03 00:25:49 UTC
Functions like VLOOKUP are often used a large number of
times for the same very large area.  When the linear-
search form of VLOOKUP is used, that is pretty horrible.

1. In collect.c, cache common, large areas.
   Patch in hand for gnm_float; still needs strings.

2. Add collect flag to use NaN for would-be ignored items.
   (NULL for strings.)

3. Optimize VLOOKUP's and HLOOKUP's linear modes to evaluate
   the whole range -- now cached -- and search in the result.

4. If that is not fast enough, we might have to cache a stable
   sort of the ranges.  Then we can binary search in there.
Comment 1 Jean Bréfort 2008-04-06 12:49:33 UTC
the strip_missing routine in src/tools/analysis_tools.c has serious performance issues too, to see that, just use a whole column.
Comment 2 Morten Welinder 2008-04-06 18:04:08 UTC
My efforts is a bit of a bust.  The solution might have to become local to
[VH]LOOKUP.

Are you sure that strip_missing is to blame?  I would guess any performance
problems here would be because the collect callback is called for each and
every empty cell.
Comment 3 Jean Bréfort 2008-04-07 20:04:42 UTC
I was wrong, the slow one is union_of_int_sets.
Comment 4 Morten Welinder 2008-04-07 20:54:22 UTC
Oh.  That's horrible.  I can make things linear easily.
Comment 5 Morten Welinder 2008-04-08 01:57:11 UTC
Created attachment 108834 [details] [review]
Massive [VH]LOOKUP speed improvement

This speeds up the linear string equality case of VLOOKUP and HLOOKUP,
and possibly some MATCH case too.

Complexity is changed from linear search with piles of g_utd8_casefold
calls to a simple hash lookup.  This is achieved by caching the result
of converting the range the first time it is observed.  (This patch
will thus only give a speedup if the same range is used multiple times.
That is quite common, though.)

Recalc of OO's i81336_vlookup.ods sample goes from ~300s to <1s on my
machine.
Comment 6 Morten Welinder 2008-04-08 02:23:57 UTC
Created attachment 108835 [details] [review]
Patch to try

Jean, this untested patch ought to make union_of_int_sets as fast
as it is ever going to be.  strip_missing also looked like it was
O(n^2), so I fixed that too.

The code assumes that things are sorted and keeps them that way.
Comment 7 Jean Bréfort 2008-04-08 04:30:21 UTC
That's the assumption I made when I fixed the union_of_int_sets in collect.c last year (as far as I remember, I also changed the name), but I did not touch the analysis-tool.c instance because I was not certain that data were always sorted there.
Comment 8 Morten Welinder 2008-04-08 13:26:07 UTC
Created attachment 108860 [details] [review]
Improved patch to try

There is no need for two copies of that code.  This uses the best versions
of the two functions.
Comment 9 Morten Welinder 2008-04-08 17:29:17 UTC
This problem has been fixed in our software repository. The fix will go into the next software release. Thank you for your bug report.
Comment 10 Morten Welinder 2008-04-11 21:54:46 UTC
*** Bug 311816 has been marked as a duplicate of this bug. ***