GNOME Bugzilla – Bug 654538
gnumeric slow on certain operations
Last modified: 2011-07-20 21:14:31 UTC
I provide a .gnumeric file where the speed of certain operations can be tested: inserting/deleting cells, inserting/deleting sheets, etc. http://parrenin.frederic.free.fr/download/bugzilla/Ice-synchros-2011-07-07.gnumeric It seems that gnumeric sometimes unnecessarily recalculates everything.
This is a huge file. Why are you asking for these recalculations if you don't need them? You have the option to switch off automatic recalculation, do all your insertions/deletions, and then recalculate again. While in retrospect some recalculations may be unnecessary, it is not obvious how to identify them without essentially doing the calculation.
Is there any other spreadsheet program that can handle this spreadsheet and allows you to insert rows or columns in less than twice gnumeric's time?
Hmm, there is something strange going on: in the attached file, on sheet EDC, select row 8 and choose insert row. It takes suspiciously long for the warning "Would split array G5:G68863" to appear. Note that no recalculation should be happening here!
These sheets could be sped up by just using more appropriate formulae. For example, on EDC-LD, F10:F13 contain individual interpolation formulae that really should be a single array formula. Similarly for G10:G11 (perhaps even G12) on the same sheet.
On EDC-SIPLE all the formulae in column A should be replaced with a single array formula. The fact that calculations in this workbook take a very long time is definitely due to performing the same calculations and data retrieval a large number of times when they could be done in a more time-efficient fashion.
Regarding my comment #3: inserting the rows of course causes cmd_insert_rows to be called. In cmd_insert_rows we have r = sheet_get_extent (sheet, TRUE), and this call takes a long time. (We should probably be using sheet_get_cells_extent here.)
Of course, for row/column insertion we should not need to find the extent at all. We only need to know that the last row(s)/column(s) is/are empty.
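The observation above can be sketched with a toy sheet model (a Python sketch with hypothetical names; gnumeric's real C data structures differ): to validate a row insertion you only need to know whether the rows that would be pushed off the bottom of the sheet are empty, which does not require computing the full extent of the sheet's content.

```python
MAX_ROWS = 65536  # toy row cap, analogous to a spreadsheet's row limit

def can_insert_rows(cells, count):
    """cells: dict mapping (row, col) -> value for the non-empty cells only.

    Inserting `count` rows is safe iff no non-empty cell sits in the last
    `count` rows, i.e. nothing would be pushed past the row cap. This is a
    single pass over the occupied cells, with no extent computation.
    """
    return all(row < MAX_ROWS - count for row, _col in cells)
```

A full-extent computation would additionally track the maximum column, maximum row, and so on, none of which is needed to answer this yes/no question.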
I have fixed the issues from comments #3, #6, and #7.
@Andreas:
- Thank you very much for having fixed the issue you described above.
- I sometimes avoid array formulas because I often need to insert/delete rows, and array areas would be broken.
- I still find it strange that after changing some basic values (e.g. in sheet EDC-VK, columns G, H) everything is recalculated quickly (only a few seconds), but some other operations like inserting/deleting cells are very slow (as slow as opening the file for the first time).
- I cannot compare with other spreadsheet programs because the interpolation function does not exist in them.
INTERPOLATION is written as an array function. It:
1) retrieves the data (which I suspect takes the longest), and
2) linearly compares the targets with the abscissae to determine the corresponding interpolation values. (If the number of target points is of the same magnitude as, or larger than, the number of abscissae, this is faster than searching for the appropriate interval one target at a time.)
As a consequence, if you split an interpolation formula that could use a 12000-entry array into 12000 individual formulae, you increase the calculation time by a factor of 6000 to 12000; i.e. if the array function would have taken 1/100 sec, this becomes 1 to 2 minutes.
If you change a value, only the values directly or indirectly depending on that value will be recalculated. If you change the location of a formula, for example by inserting a new column A, you change the location of all formulae on that sheet, and then those formulae (and whatever depends on them) will be recalculated. Essentially, for some formulae the value depends on the location. Normally it is much faster to recalculate than to try to figure out whether the value could have changed. Your interpolation formulae are an example where that is clearly not the case.
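The cost asymmetry described above can be illustrated with a small Python sketch (illustrative only, not gnumeric's C implementation): interpolating n sorted targets against m sorted abscissae in one lockstep pass costs O(n + m) comparisons, whereas n separate single-target formulae, each doing its own scan, cost O(n * m).

```python
from bisect import bisect_right

def interpolate_all(xs, ys, targets):
    """Array-style call: walk sorted targets and sorted abscissae in
    lockstep, so the whole batch costs O(n + m) comparisons."""
    out = []
    i = 0
    for t in targets:
        # advance to the rightmost interval whose left edge is <= t
        while i < len(xs) - 2 and xs[i + 1] <= t:
            i += 1
        x0, x1 = xs[i], xs[i + 1]
        out.append(ys[i] + (ys[i + 1] - ys[i]) * (t - x0) / (x1 - x0))
    return out

def interpolate_one(xs, ys, t):
    """Single-target call: locates its interval from scratch each time,
    so n such formulae repeat the lookup n times."""
    i = max(0, min(bisect_right(xs, t) - 1, len(xs) - 2))
    x0, x1 = xs[i], xs[i + 1]
    return ys[i] + (ys[i + 1] - ys[i]) * (t - x0) / (x1 - x0)
```

Splitting one array formula into thousands of `interpolate_one`-style cells also repeats the data-retrieval step per cell, which the comment suspects is the dominant cost.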
Wouldn't it be possible to figure out whether the values could have changed, as an option?
Users already have the option to switch off recalculation, perform all their insertions, deletions and rearrangements, and then recalculate once. In my opinion it does not make sense to add checks to the regular recalculation that would slow it down to address some rare situation. I suspect that you would not see any significant slowdown if you used the array version of the function. Note that if you have a column containing an array function, you can still insert partial rows to the left (or right) of that column.
Please note that I didn't close this bug, since there is clearly room for improvement: we currently cache the retrieval of single arrays of floats. INTERPOLATION benefits from that cache with respect to the third argument. The first two arguments are retrieved as pairs (since we drop incomplete pairs to ensure that a single missing value doesn't offset the numbers). We do not cache those retrievals. This workbook would benefit greatly from such a cache. If the third argument is a single value, we may also want to use a binary search for the appropriate interval, but that would yield a much smaller improvement than the caching. So I do think there is lots of room for improvement in the speed of this workbook.
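The pair-retrieval cache proposed above might look roughly like the following Python sketch (the names and the range-identifier keys are assumptions for illustration; gnumeric's actual cache lives in its C evaluation engine): the incomplete pairs are dropped once, and every INTERPOLATION call sharing the same two ranges reuses the filtered result.

```python
# Hypothetical cache keyed by the identity of the two source ranges.
_pair_cache = {}

def get_pairs(xs_range_id, xs, ys_range_id, ys):
    """Return (abscissae, ordinates) with incomplete pairs dropped.

    Dropping a pair whenever either value is missing ensures that a single
    missing value doesn't offset the two series against each other. The
    result is cached per (range, range) key so repeated calls are cheap.
    """
    key = (xs_range_id, ys_range_id)
    if key not in _pair_cache:
        pairs = [(x, y) for x, y in zip(xs, ys)
                 if x is not None and y is not None]
        _pair_cache[key] = (tuple(x for x, _ in pairs),
                            tuple(y for _, y in pairs))
    return _pair_cache[key]
```

A real implementation would also have to invalidate the cache when either source range changes; this sketch omits that.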
I unfortunately need to insert partial rows inside the arrays.
If you provide a large enough array initially, you should not need to insert inside the array itself, I think (unless there is some charting issue when you include extra cells that show an error).
I have implemented caching of the retrieval of abscissae/ordinates. On my machine this improves the loading speed of your file by a factor of 5 (from 12.5 minutes to 2.5 minutes). A similar improvement should be seen in recalculation after inserting columns or rows, etc. I may still consider modifying the INTERPOLATION call itself to be faster in the case of a single target.
When, for the case of a single target in an INTERPOLATION call, I implement a binary search for the correct interval rather than the current linear search, the opening time of the file in fact becomes slightly longer. It seems that the overhead of the search in this case offsets the reduction in the number of comparisons. Since we are just comparing floats, this is not really that surprising. So we will not do that. I consider this bug fixed now. This problem has been fixed in the development version. The fix will be available in the next major software release. Thank you for your bug report.
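As an aside, the trade-off described in the final comment can be sketched in Python (illustrative only, not the C code that was benchmarked): a binary search finds the interval in O(log m) steps instead of O(m), but each step carries more bookkeeping than a bare float comparison, so for this workload the simple linear scan won in practice.

```python
from bisect import bisect_right

def interval_linear(xs, t):
    """O(m): scan forward until the next abscissa would exceed t."""
    i = 0
    while i < len(xs) - 2 and xs[i + 1] <= t:
        i += 1
    return i

def interval_binary(xs, t):
    """O(log m): rightmost i with xs[i] <= t, clamped to a valid interval.
    Fewer comparisons, but more per-step overhead than the scan above."""
    return max(0, min(bisect_right(xs, t) - 1, len(xs) - 2))
```

Both return the index of the interval [xs[i], xs[i+1]] to interpolate in; which is faster depends on m and on the constant factors, which is exactly what the benchmark above measured.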