GNOME Bugzilla – Bug 654538
gnumeric slow on certain operations
Last modified: 2011-07-20 21:14:31 UTC
I provide a .gnumeric file where the speed of certain operations can be tested: inserting/deleting cells, inserting/deleting sheets, etc. http://parrenin.frederic.free.fr/download/bugzilla/Ice-synchros-2011-07-07.gnumeric It seems that gnumeric sometimes unnecessarily recalculates everything.
This is a huge file. Why are you asking for these recalculations if you don't need them? You have the option to switch off automatic recalculation, do all your insertions/deletions, and then recalculate again. While in retrospect some recalculations may be unnecessary, it is not obvious how to identify them without essentially doing the calculation.
Is there any other spreadsheet program that can handle this spreadsheet and allows you to insert rows or columns in less than twice gnumeric's time?
Hmm, there is something strange going on: in the attached file, on sheet EDC, select row 8 and choose insert row. It takes suspiciously long for the warning "Would split array G5:G68863" to appear. Note that no recalculation should be happening here!
These sheets could be sped up by just using more appropriate formulae. For example, on EDC-LD, F10:F13 contain individual interpolation formulae that really should be a single array formula. Similarly for G10:G11 (perhaps even G12) on the same sheet.
On EDC-SIPLE all the formulae in column A should be replaced with a single array formula. The fact that calculations in this workbook take a very long time is definitely due to performing the same calculations and data retrieval a large number of times when they could be done in a more time-efficient fashion.
Regarding my comment #3: inserting the rows of course causes cmd_insert_rows to be called. In cmd_insert_rows we have r = sheet_get_extent (sheet, TRUE), and this call takes a long time. (We should probably be using sheet_get_cells_extent here.)
Of course, for row/column insertion we should not need to find the extent at all. We only need to know that the last row(s)/column(s) is/are empty.
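The observation above can be sketched with a toy sheet model (a Python sketch with hypothetical names; gnumeric's real C data structures differ): to validate a row insertion you only need to know whether the rows that would be pushed off the bottom of the sheet are empty, which does not require computing the full extent of the sheet's content.

```python
MAX_ROWS = 65536  # toy row cap, analogous to a spreadsheet's row limit

def can_insert_rows(cells, count):
    """cells: dict mapping (row, col) -> value for the non-empty cells only.

    Inserting `count` rows is safe iff no non-empty cell sits in the last
    `count` rows, i.e. nothing would be pushed past the row cap. This is a
    single pass over the occupied cells, with no extent computation.
    """
    return all(row < MAX_ROWS - count for row, _col in cells)
```

A full-extent computation would additionally track the maximum column, maximum row, and so on, none of which is needed to answer this yes/no question.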
I have fixed the issues from comments #3, #6, and #7.
@Andreas:
- Thank you very much for having fixed the issue you described above.
- I sometimes avoid array formulas because I often need to insert/delete rows, and array areas would be broken.
- I still find it strange that after changing some basic values (e.g. in sheet EDC-VK, columns G, H) everything is recalculated quickly (only a few seconds), but some other operations like inserting/deleting cells are very slow (as slow as opening the file for the first time).
- I cannot compare with other spreadsheet programs because the interpolation function does not exist in them.
INTERPOLATION is written as an array function. It:
1) retrieves the data (which I suspect takes the longest), and
2) linearly compares the targets with the abscissae to determine the corresponding interpolation values. (If the number of target points is of the same magnitude as, or larger than, the number of abscissae, this is faster than searching for the appropriate interval one target at a time.)
As a consequence, if you split an interpolation formula that could use a 12000-entry array into 12000 individual formulae, you increase the calculation time by a factor of 6000 to 12000; i.e. if the array function would have taken 1/100 sec, this becomes 1 to 2 minutes.
If you change a value, only the values directly or indirectly depending on that value will be recalculated. If you change the location of a formula, for example by inserting a new column A, you change the location of all formulae on that sheet, and then those formulae (and whatever depends on them) will be recalculated. Essentially, for some formulae the value depends on the location. Normally it is much faster to recalculate than to try to figure out whether the value could have changed. Your interpolation formulae are an example where that is clearly not the case.
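The cost asymmetry described above can be illustrated with a small Python sketch (illustrative only, not gnumeric's C implementation): interpolating n sorted targets against m sorted abscissae in one lockstep pass costs O(n + m) comparisons, whereas n separate single-target formulae, each doing its own scan, cost O(n * m).

```python
from bisect import bisect_right

def interpolate_all(xs, ys, targets):
    """Array-style call: walk sorted targets and sorted abscissae in
    lockstep, so the whole batch costs O(n + m) comparisons."""
    out = []
    i = 0
    for t in targets:
        # advance to the rightmost interval whose left edge is <= t
        while i < len(xs) - 2 and xs[i + 1] <= t:
            i += 1
        x0, x1 = xs[i], xs[i + 1]
        out.append(ys[i] + (ys[i + 1] - ys[i]) * (t - x0) / (x1 - x0))
    return out

def interpolate_one(xs, ys, t):
    """Single-target call: locates its interval from scratch each time,
    so n such formulae repeat the lookup n times."""
    i = max(0, min(bisect_right(xs, t) - 1, len(xs) - 2))
    x0, x1 = xs[i], xs[i + 1]
    return ys[i] + (ys[i + 1] - ys[i]) * (t - x0) / (x1 - x0)
```

Splitting one array formula into thousands of `interpolate_one`-style cells also repeats the data-retrieval step per cell, which the comment suspects is the dominant cost.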
Wouldn't it be possible to figure out whether the values could have changed, as an option?
Users already have the option to switch off recalculation, perform all their insertions, deletions and rearrangements, and then recalculate once. In my opinion it does not make sense to add checks to the regular recalculation that would slow it down to address some rare situation. I suspect that you would not see any significant slowdown if you used the array version of the function. Note that if you have a column containing an array function, you can still insert partial rows to the left (or right) of that column.
Please note that I didn't close this bug, since there is clearly room for improvement: we currently cache the retrieval of single arrays of floats. INTERPOLATION benefits from that cache with respect to the third argument. The first two arguments are retrieved as pairs (since we drop incomplete pairs to ensure that a single missing value doesn't offset the numbers). We do not cache those retrievals. This workbook would benefit greatly from such a cache. If the third argument is a single value, we may also want to use a binary search for the appropriate interval, but that would yield a much smaller improvement than the caching. So I do think there is lots of room for improvement in the speed of this workbook.
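The pair-retrieval cache proposed above might look roughly like the following Python sketch (the names and the range-identifier keys are assumptions for illustration; gnumeric's actual cache lives in its C evaluation engine): the incomplete pairs are dropped once, and every INTERPOLATION call sharing the same two ranges reuses the filtered result.

```python
# Hypothetical cache keyed by the identity of the two source ranges.
_pair_cache = {}

def get_pairs(xs_range_id, xs, ys_range_id, ys):
    """Return (abscissae, ordinates) with incomplete pairs dropped.

    Dropping a pair whenever either value is missing ensures that a single
    missing value doesn't offset the two series against each other. The
    result is cached per (range, range) key so repeated calls are cheap.
    """
    key = (xs_range_id, ys_range_id)
    if key not in _pair_cache:
        pairs = [(x, y) for x, y in zip(xs, ys)
                 if x is not None and y is not None]
        _pair_cache[key] = (tuple(x for x, _ in pairs),
                            tuple(y for _, y in pairs))
    return _pair_cache[key]
```

A real implementation would also have to invalidate the cache when either source range changes; this sketch omits that.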
I unfortunately need to insert partial rows inside the arrays.
If you provide a large enough array initially, you should not need to insert inside the array itself, I think (unless there is some charting issue when you include extra cells that show an error).
I have implemented caching of the retrieval of abscissae/ordinates. On my machine this improves the loading speed of your file by a factor of 5 (from 12.5 minutes to 2.5 minutes). A similar improvement should be seen in recalculation after inserting columns or rows, etc. I may still consider modifying the INTERPOLATION call itself to be faster in the case of a single target.
When, for the case of a single target in an INTERPOLATION call, I implement a binary search for the correct interval rather than the current linear search, the opening time of the file in fact becomes slightly longer. It seems that the overhead of the search in this case offsets the reduction in the number of comparisons. Since we are just comparing floats, this is not really that surprising. So we will not do that. I consider this bug fixed now. This problem has been fixed in the development version. The fix will be available in the next major software release. Thank you for your bug report.
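As an aside, the trade-off described in the final comment can be sketched in Python (illustrative only, not the C code that was benchmarked): a binary search finds the interval in O(log m) steps instead of O(m), but each step carries more bookkeeping than a bare float comparison, so for this workload the simple linear scan won in practice.

```python
from bisect import bisect_right

def interval_linear(xs, t):
    """O(m): scan forward until the next abscissa would exceed t."""
    i = 0
    while i < len(xs) - 2 and xs[i + 1] <= t:
        i += 1
    return i

def interval_binary(xs, t):
    """O(log m): rightmost i with xs[i] <= t, clamped to a valid interval.
    Fewer comparisons, but more per-step overhead than the scan above."""
    return max(0, min(bisect_right(xs, t) - 1, len(xs) - 2))
```

Both return the index of the interval [xs[i], xs[i+1]] to interpolate in; which is faster depends on m and on the constant factors, which is exactly what the benchmark above measured.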