GNOME Bugzilla – Bug 723894
File with many circular referencing calculations crashes Gnumeric
Last modified: 2014-07-18 17:45:25 UTC
Created attachment 268487 [details] The test file, still working Hello Gnumeric developers, I'm working on a .gnumeric spreadsheet with big underlaying calculations needed. It uses much Vlookup() and Indirect() functions and circular reference. But now my file has got too big and I got problems. I attached a file which is exactly in the limits and is working. It's name is test.gnumeric. To get a crash open this file and copy the area from H3 to IV3. Then paste it inside the area from H3 to IV1437(or more). After this, Gnumeric crashes on my computer. When you start it in console you get this german error message: "Gleitkomma-Ausnahme (Speicherabzug geschrieben)" I think it means something like floating point exception in English. And to get a save file you can't load anymore you have to do this: Copy the area from H3 to IV3 again. Paste in the area from H1437 to IV1437. Then save. There's no error message, but when you try to load the file, Gnumeric crashes with this floating point exeption error message or it hangs forever.
Created attachment 268490 [details] The result of my second instructions. I can't load this file.
Program received signal SIGFPE, Arithmetic exception. g_hash_table_lookup_node (hash_return=<synthetic pointer>, key=0x12af9c0, hash_table=0xf00f00) at /build/buildd/glib2.0-2.38.1/./glib/ghash.c:371 371 /build/buildd/glib2.0-2.38.1/./glib/ghash.c: No such file or directory. (gdb) bt
+ Trace 233147
A wild guess says this is caused by the caches we employ for vlookup and friends.
==3269== Invalid read of size 8 ==3269== at 0x639C19E: g_hash_table_lookup_extended (in /lib/x86_64-linux-gnu/libglib-2.0.so.0.3800.1) ==3269== by 0x14DE22FE: find_index_linear (functions.c:503) ==3269== by 0x14DE2FC7: gnumeric_vlookup (functions.c:990) ==3269== by 0x4F02E79: function_call_with_exprs (func.c:1991) ==3269== by 0x4EF4D68: gnm_expr_eval (expr.c:1453) ==3269== by 0x4F023D4: function_call_with_exprs (func.c:1823) ==3269== by 0x4EF4D68: gnm_expr_eval (expr.c:1453) ==3269== by 0x4F03FB3: gnumeric_if2 (func-builtin.c:326) ==3269== by 0x4F02207: function_call_with_exprs (func.c:1784) ==3269== by 0x4EF4D68: gnm_expr_eval (expr.c:1453) ==3269== by 0x4F040A4: gnumeric_if2 (func-builtin.c:335) ==3269== by 0x4F02207: function_call_with_exprs (func.c:1784) ==3269== by 0x4EF4D68: gnm_expr_eval (expr.c:1453) ==3269== by 0x4EF8E3B: gnm_expr_top_eval (expr.c:3210) ==3269== by 0x4EED4A3: gnm_cell_eval_content (dependent.c:1641) ==3269== by 0x4EEC815: cell_dep_eval (dependent.c:1228) ==3269== by 0x4EED747: dependent_eval (dependent.c:1731) ==3269== by 0x4EF08B7: workbook_recalc (dependent.c:2835) ==3269== by 0x4FFEAD4: workbook_view_new_from_input (workbook-view.c:1296) ==3269== by 0x4FFEC09: workbook_view_new_from_uri (workbook-view.c:1341) ==3269== by 0x40496E: main (main-application.c:322) ==3269== Address 0x28b51470 is 48 bytes inside a block of size 88 free'd ==3269== at 0x4C2B60C: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==3269== by 0x639B752: ??? (in /lib/x86_64-linux-gnu/libglib-2.0.so.0.3800.1) ==3269== by 0x639C490: g_hash_table_remove_all (in /lib/x86_64-linux-gnu/libglib-2.0.so.0.3800.1) ==3269== by 0x639C4FD: g_hash_table_destroy (in /lib/x86_64-linux-gnu/libglib-2.0.so.0.3800.1) ==3269== by 0x14DE10AB: clear_caches (functions.c:145) ==3269== by 0x14DE206C: get_linear_lookup_cache.isra.5 (functions.c:269) ==3269== by 0x14DE220C: find_index_linear (functions.c:487) ==3269== by 0x14DE2FC7: gnumeric_vlookup (functions.c:990) ==3269== by 0x4F02E79: function_call_with_exprs (func.c:1991) ==3269== by 0x4EF4D68: gnm_expr_eval (expr.c:1453)
This problem has been fixed in our software repository. The fix will go into the next software release. Thank you for the bug report. We really like reports with files exhibiting the problem.
Hi, I think I need to reopen this bug. I worked again on the file which caused this bug and gnumeric crashed again with a floating point exeption. I just added one new column and after adding a second column it crashes. Sorry, but the test file was too big to upload it as an attachment, so please download it here: http://www.file-upload.net/download-8736636/TestFile-Steff2.gnumeric.html The save game is after adding the first column and it should crash gnumeric when loading it. Adding the first column took almost 10 minutes, I think, but gnumeric responded then again. Tested with gnumeric 1.12.12. Ok, good luck. Bye.
I don't understand the instructions about how you make it crash. Please spell it out in detail.
Nevermind, I see the problem. We basically have two different code paths for the lookup function: those that are linear search (fixed in February) and those that are bisection searches (fixed in my tree).
This problem has been fixed in our software repository. The fix will go into the next software release. Thank you for your bug report.
The testfile, already mentioned in comment #6, still crashes gnumeric when loading it. Link: http://www.file-upload.net/download-8736636/TestFile-Steff2.gnumeric.html I downloaded and build the latest version of gnumeric I got here: https://git.gnome.org/browse/gnumeric/snapshot/gnumeric-1a45b7203e3f67b871e7f6d9585057c355b86acb.tar.gz I think the testfile might be a broken file, because it's much bigger than previous versions of the file. I made it when I added one full new column before column A. But I did other changes before, I don't remember exactly. The testfile is AFTER adding a new column. So if you need it, I will try to reconstruct a file which is just like it was before I added a new column. But it's hard because all is very slow when working with this file. > I don't understand the instructions about how you make it crash. Please > spell it out in detail. The instruction is: load the testfile and wait until it crashes. Two other little questions: 1. I have to copy the file libspreadsheet.1.12.xx.so from '/usr/local/lib/' to '/usr/lib/' everytime when I compile gnumeric. Otherwise I can't start gnumeric and I get this error: "gnumeric: error while loading shared libraries: libspreadsheet-1.12.14.so: cannot open shared object file: No such file or directory" My OS is Ubuntu 12.04 Precise Pangolin, but I modified it and use Gnome as Desktop (should be version 2.24). 2. Why are new versions of gnumeric not mentioned on https://projects.gnome.org/gnumeric/ ? Last gnumeric version there is 1.12.9 from November 2013.
I'm fairly certain it's fixed. You have two options on compilation. A. /usr/local/. This is the default. You need to make sure that you run the binary in /usr/local/bin if you do that. B. /usr/. Supply --prefix=/usr to configure. In both cases, you need to run "ldconfig" after the install (as root). As to 2, the home page is at www.gnumeric.org. I'll see if I can cancel the old one.
Note: uninstalling the system version of gnumeric might be the best way to ensure that you are running the newly compiled verson.
Created attachment 272696 [details] Console window showing the crash Ok, more testing shows the test file crashes gnumeric only sometimes immediatly when loading with newest gnumeric version. But if you are able to load it successfully, a crash could be triggered in my tests by pressing F9 (to manual recalculate the sheet). I added a screenshot for proofing the crash. Maybe the problem is completely different to the Vlookup problem... And thanks for answering my other questions. I will remove all pre-installed system packages when I compile Gnumeric the next time.
REcalc does not crash for me. The file is too big to run under valgrind on my laptop. I might have more luck at work.
Created attachment 272705 [details] backtrace I read how to make a backtrace. I cut the middle part of it because it's extreme big.
ok, I see how this can happen. It's string related this time.
Created attachment 272708 [details] [review] Tentative patch Tentative patch. It's a bid of a punt, but we'll live.
This problem has been fixed in our software repository. The fix will go into the next software release. Thank you for your bug report. The caching code is complicated. I'd simplify if it wasn't so absolutely critical for performance of collections of lookup calls.
Created attachment 273260 [details] new backtrace, crashed after successful saving Hi, I was able to produce another Gnumeric crash when working with this file. My Gnumeric version had the patch attached some comments above added. What I did to get a crash: In sheet2 (the one with heavy calculations) I removed the full third column. This took almost 20 minutes. After responding again, I saved. After some minutes gnumeric then crashed. There is a saved file after this, but it is much larger then it should be. I tried to load it, but I cancelled because it takes quite long, so I'm not sure if it is loadable. I could upload it somewhere when needed. But I have a backtrace attached, too. I have a little suspection, that the indirect cell addressing causes gnumeric to hang so much when adding or removing columns. Normal cell addresses point to the same cell as before, just the name of the cell address has changed. But the indirect cell addressing now points to another cell as before because the string with the cell name isn't changed to the new position of the moved cell...
There isn't anything useful in the stack trace. You description sounds like it might have run out of memory, but it's hard to tell. "indirect" definitely causes Gnumeric to use much more memory. I have never seen a use of "indirect" that couldn't be improved by using something else, "index" for example.
Created attachment 278256 [details] complete new testfile Hi, I managed to get gnumeric to hang forever and I'm quite sure some cells with hlookup functions caused this. It happened on a complete different new file, but I add it to this bug report here because I thought it could be related to this problems here. To reproduce the hang you need to open the new test file attached here. Then go to cell K3, copy it, and paste it to the area from K3 to K150. Gnumeric should hang forever now. Tested with 1.12.16 + 1.12.17 and also with windows-build 1.12.9 with wine.
PS: It seems gnumeric don't hangs forever, after almost 10 minutes it responds again. But I started another test and now it crashed with a segmentation fault when it was almost ready to respond again.
PPS: I recognised now that I made a mistake in my formula, and that the range where the hlookup function needs to search is much bigger then intented. So maybe not as much a bug as a thought, just too much calculations as the computer can handle.
I see no problems with that file -- other than things taking a while. For this kind of file, setting calculation to "manual" might be a good idea.
For future reference: 4 cpu hours under valgrind for this sample.
Created attachment 278289 [details] Testfile with the large formula area empty Hi, calculation is set to manual already. And recalculation alone is quite fast, it takes maybe one second. The problem is when you edit one of cells which contains calculations. Then it takes often over 5-10 minutes until gnumeric responds anymore. Even when I correct the mistake in my formula which I mentioned above, it is not much better. I don't know what gnumeric calculates then, but I think gnumeric tries to rebuild something where it stores which cells depend on other cells?? And this takes definitely too long. When you have the cell area from L4 to SR150 in my test file first empty and then paste the formula which is in this cells in my test file, Gnumeric responds much much faster, maybe after 10 seconds. And after this it's fully useable, too. So the performance of gnumeric is quite good when it builds up cell dependencies completely new, but the performance is bad when something needs to be changed in the cell dependencies, even if it's just a minimal change. This is just my theory, but I thought I need to explain it, because recalculation alone is not a problem, just editing cells is a problem. I added another testfile with the problem causing formulas removed. You can use it to reproduce this steps: Paste formula from L1 to the area from L1 to SR150 Should be quite fast, although this are over 75.000 cells. Then you can test recalc, no problem, all works fast. Then edit any cell with a formula inside and it takes forever...
It clear what is going on (using the file of comment 21): when you change a cell or range of cells, they are marked "dirty". Then everything depending on the changed cells are market dirty recursively. The formulas you have are insane. You have roughly 75k different ranges on which something depends, for example: ... Bucket 0 (rows 1-1024): Range hash size 73688: range over which cells in list depend ... K4:DW109 -> (DX109) K4:DT109 -> (DU109) K4:DU109 -> (DV109) K4:DV109 -> (DW109) K4:BH107 -> (BI107) K4:DX109 -> (DY109) K4:EH109 -> (EI109) K4:EI109 -> (EJ109) K4:EJ109 -> (EK109) K4:EB109 -> (EC109) K4:EL109 -> (EM109) K4:EO107 -> (EP107) K4:BC107 -> (BD107) K4:EM109 -> (EN109) K4:EK109 -> (EL109) K4:ER109 -> (ES109) K4:ES109 -> (ET109) K4:ET109 -> (EU109) K4:EU109 -> (EV109) K4:EV109 -> (EW109) K4:EW109 -> (EX109) K4:EX109 -> (EY109) K4:EY109 -> (EZ109) K4:EZ109 -> (FA109) K4:FA109 -> (FB109) K4:FB109 -> (FC109) K4:FC109 -> (FD109) K4:FD109 -> (FE109) K4:FE109 -> (FF109) K4:FF109 -> (FG109) K4:FG109 -> (FH109) K4:FH109 -> (FI109) K4:FI109 -> (FJ109) K4:FK107 -> (FL107) K4:FK109 -> (FL109) K4:FL109 -> (FM109) K4:FM109 -> (FN109) K4:FN109 -> (FO109) K4:FO109 -> (FP109) K4:FP109 -> (FQ109) K4:FQ109 -> (FR109) K4:FR109 -> (FS109) K4:FS109 -> (FT109) K4:FT109 -> (FU109) K4:FU109 -> (FV109) K4:FV109 -> (FW109) K4:FW109 -> (FX109) K4:FX109 -> (FY109) K4:FY109 -> (FZ109) K4:FZ109 -> (GA109) K4:GA109 -> (GB109) K4:GB109 -> (GC109) K4:GC109 -> (GD109) K4:GD109 -> (GE109) K4:FJ109 -> (FK109) K4:GE109 -> (GF109) K4:GF109 -> (GG109) K4:GG109 -> (GH109) K4:GH109 -> (GI109) K4:GJ109 -> (GK109) K4:GK109 -> (GL109) K4:GI109 -> (GJ109) K4:GM109 -> (GN109) K4:GN109 -> (GO109) K4:GL109 -> (GM109) K4:GP109 -> (GQ109) K4:GQ109 -> (GR109) K4:GO109 -> (GP109) K4:GR109 -> (GS109) K4:GT109 -> (GU109) K4:CT109 -> (CU109) K4:FZ111 -> (GA111) ... (To see these, start with GNM_DEBUG=deps and click the lightbulb.) The amount of work implied by these formulas strongly suggests that you should rethink the setup.
Hi Morten, > The formulas you have are insane. You have roughly 75k different ranges > on which something depends, for example: > The amount of work implied by these formulas strongly suggests that you > should rethink the setup. This test file isn't important for anything, it was just a little testing of number sequences I wanted to play around with. So I don't did much optimisation or something. It was made that every row has a starting number and it calculates automatically the sequence of number for the first row, then the second,... just with pressing recalc, just that you don't need to wonder what's the purpose of all this formulas. So, yes all is linked together and some cells have circular reference. And I can say it works well and quite fast, when you don't change any cells. BUT: This is all not the problem: The problem is: why can gnumeric handle this formulas when you insert them for the first time in a cell range, and why not when it has to make some changes to an existing big cell range of formulas. I think gnumeric would handle files with such complex calculations much better, if it wouldn't even try to change the existing data, which stores the cell dependencies inside. Just calculate the cell dependencies new everytime they change, especially for files with very complex cell dependencies. This isn't maybe not optimal, but far more effective then current behavior, when it calculates for minutes just for one little cell changed. Another little test: Open the test file of comment #21 Press CTRL+A, then CTRL+C to copy all cells of the sheet. Then start gnumeric a second time with an empty sheet open. Goto Cell A1 of the empty sheet and press CTRL+V. The time gnumeric calculates until it responds again is minimal, not 5 seconds I think. So it would be able to calculate the data of cell dependencies in 5 seconds and not in more then 5 minutes like when you edit a existing formula in a cell.
I am not going to act on this. The sample file does not reflect sane use of a spreadsheet, so the fact that it takes forever to change doesn't bother me a lot. In this case, the dependency tracking gets upset over having a dense graph of 70k+ nodes. Too bad. It's "Doctor, when I bang my head on the wall, it hurts!" Note: this isn't Gnumeric specific; I can do the similar thing with Excel/LO also.
Hi, Another question is, why a recalculation is triggered when you make changes in a cell and manual recalculation is on. Why can it not be recalculated the next time when the user presses F9? I tried to make a patch myself, but I don't have enough knowledge to do it. But I found that commenting out line 1908 ( dependent_queue_recalc_main(work) ) in depentend.c seems to give what I want, it doesn't trigger a recalc when you edit cells. Also inserting lines and columns works now much much faster, but it is still slow when inserting lines/columns and then click undo. And it seems it does calcalute one less iteration as normal. If this two disadvantages would not be, it would be a huge help for big, complex files, because performance of gnumeric is now much better. Maybe someone can look at it again when you have time. Thanks.
I am not going to act on this.