After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 723894 - File with many circular referencing calculations crashes Gnumeric
File with many circular referencing calculations crashes Gnumeric
Status: RESOLVED FIXED
Product: Gnumeric
Classification: Applications
Component: Main System
1.12.x
Other Linux
: Normal normal
: ---
Assigned To: Morten Welinder
Jody Goldberg
Depends on:
Blocks:
 
 
Reported: 2014-02-08 13:03 UTC by Steff
Modified: 2014-07-18 17:45 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
The test file, still working (1.04 MB, application/x-gnumeric)
2014-02-08 13:03 UTC, Steff
  Details
The result of my second instructions. I can't load this file. (1.03 MB, application/x-gnumeric)
2014-02-08 13:05 UTC, Steff
  Details
Console window showing the crash (41.83 KB, image/png)
2014-03-23 15:45 UTC, Steff
  Details
backtrace (70.03 KB, text/plain)
2014-03-23 18:47 UTC, Steff
  Details
Tentative patch (3.70 KB, patch)
2014-03-23 20:06 UTC, Morten Welinder
none Details | Review
new backtrace, crashed after successful saving (1.33 KB, text/plain)
2014-03-29 23:09 UTC, Steff
  Details
complete new testfile (194.12 KB, application/x-gnumeric)
2014-06-11 09:21 UTC, Steff
  Details
Testfile with the large formula area empty (3.41 KB, application/x-gnumeric)
2014-06-11 20:33 UTC, Steff
  Details

Description Steff 2014-02-08 13:03:00 UTC
Created attachment 268487 [details]
The test file, still working

Hello Gnumeric developers,

I'm working on a .gnumeric spreadsheet with big underlaying calculations needed.
It uses much Vlookup() and  Indirect() functions and circular reference.

But now my file has got too big and I got problems.
I attached a file which is exactly in the limits and is working.
It's name is test.gnumeric.

To get a crash open this file and copy the area from H3 to IV3.
Then paste it inside the area from H3 to IV1437(or more).
After this, Gnumeric crashes on my computer.
When you start it in console you get this german error message:
"Gleitkomma-Ausnahme (Speicherabzug geschrieben)"
I think it means something like floating point exception in English.

And to get a save file you can't load anymore you have to do this:
Copy the area from H3 to IV3 again.
Paste in the area from H1437 to IV1437.
Then save. There's no error message, but when you try to load the file, Gnumeric crashes with this floating point exeption error message or it hangs forever.
Comment 1 Steff 2014-02-08 13:05:42 UTC
Created attachment 268490 [details]
The result of my second instructions. I can't load this file.
Comment 2 Andreas J. Guelzow 2014-02-08 22:35:34 UTC
Program received signal SIGFPE, Arithmetic exception.
g_hash_table_lookup_node (hash_return=<synthetic pointer>, key=0x12af9c0, 
    hash_table=0xf00f00) at /build/buildd/glib2.0-2.38.1/./glib/ghash.c:371
371	/build/buildd/glib2.0-2.38.1/./glib/ghash.c: No such file or directory.
(gdb) bt
  • #0 g_hash_table_lookup_node
    at /build/buildd/glib2.0-2.38.1/./glib/ghash.c line 371
  • #1 g_hash_table_lookup_extended
    at /build/buildd/glib2.0-2.38.1/./glib/ghash.c line 1112
  • #2 find_index_linear_equal_string
    at functions.c line 503
  • #3 find_index_linear
    at functions.c line 569
  • #4 gnumeric_vlookup
    at functions.c line 990
  • #5 function_call_with_exprs
    at func.c line 1991
  • #6 gnm_expr_eval
    at expr.c line 1453
  • #7 function_call_with_exprs
    at func.c line 1823
  • #8 gnm_expr_eval
    at expr.c line 1453
  • #9 gnumeric_if2
    at func-builtin.c line 326
  • #10 function_call_with_exprs
    at func.c line 1784
  • #11 gnm_expr_eval
    at expr.c line 1453
  • #12 gnumeric_if2
    at func-builtin.c line 335
  • #13 function_call_with_exprs
    at func.c line 1784
  • #14 gnm_expr_eval
    at expr.c line 1453
  • #15 gnm_expr_top_eval
    at expr.c line 3210
  • #16 gnm_cell_eval_content
    at dependent.c line 1641
  • #17 cell_dep_eval
    at dependent.c line 1228
  • #18 dependent_eval
    at dependent.c line 1731
  • #19 workbook_recalc
    at dependent.c line 2835
  • #20 workbook_view_new_from_input
    at workbook-view.c line 1296
  • #21 workbook_view_new_from_uri
    at workbook-view.c line 1341
  • #22 gui_file_read
    at gui-file.c line 120
  • #21 workbook_view_new_from_uri
    at workbook-view.c line 1341
  • #22 gui_file_read
    at gui-file.c line 120

Comment 3 Morten Welinder 2014-02-08 22:53:26 UTC
A wild guess says this is caused by the caches we employ for vlookup and
friends.
Comment 4 Morten Welinder 2014-02-09 15:43:41 UTC
==3269== Invalid read of size 8
==3269==    at 0x639C19E: g_hash_table_lookup_extended (in /lib/x86_64-linux-gnu/libglib-2.0.so.0.3800.1)
==3269==    by 0x14DE22FE: find_index_linear (functions.c:503)
==3269==    by 0x14DE2FC7: gnumeric_vlookup (functions.c:990)
==3269==    by 0x4F02E79: function_call_with_exprs (func.c:1991)
==3269==    by 0x4EF4D68: gnm_expr_eval (expr.c:1453)
==3269==    by 0x4F023D4: function_call_with_exprs (func.c:1823)
==3269==    by 0x4EF4D68: gnm_expr_eval (expr.c:1453)
==3269==    by 0x4F03FB3: gnumeric_if2 (func-builtin.c:326)
==3269==    by 0x4F02207: function_call_with_exprs (func.c:1784)
==3269==    by 0x4EF4D68: gnm_expr_eval (expr.c:1453)
==3269==    by 0x4F040A4: gnumeric_if2 (func-builtin.c:335)
==3269==    by 0x4F02207: function_call_with_exprs (func.c:1784)
==3269==    by 0x4EF4D68: gnm_expr_eval (expr.c:1453)
==3269==    by 0x4EF8E3B: gnm_expr_top_eval (expr.c:3210)
==3269==    by 0x4EED4A3: gnm_cell_eval_content (dependent.c:1641)
==3269==    by 0x4EEC815: cell_dep_eval (dependent.c:1228)
==3269==    by 0x4EED747: dependent_eval (dependent.c:1731)
==3269==    by 0x4EF08B7: workbook_recalc (dependent.c:2835)
==3269==    by 0x4FFEAD4: workbook_view_new_from_input (workbook-view.c:1296)
==3269==    by 0x4FFEC09: workbook_view_new_from_uri (workbook-view.c:1341)
==3269==    by 0x40496E: main (main-application.c:322)
==3269==  Address 0x28b51470 is 48 bytes inside a block of size 88 free'd
==3269==    at 0x4C2B60C: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==3269==    by 0x639B752: ??? (in /lib/x86_64-linux-gnu/libglib-2.0.so.0.3800.1)
==3269==    by 0x639C490: g_hash_table_remove_all (in /lib/x86_64-linux-gnu/libglib-2.0.so.0.3800.1)
==3269==    by 0x639C4FD: g_hash_table_destroy (in /lib/x86_64-linux-gnu/libglib-2.0.so.0.3800.1)
==3269==    by 0x14DE10AB: clear_caches (functions.c:145)
==3269==    by 0x14DE206C: get_linear_lookup_cache.isra.5 (functions.c:269)
==3269==    by 0x14DE220C: find_index_linear (functions.c:487)
==3269==    by 0x14DE2FC7: gnumeric_vlookup (functions.c:990)
==3269==    by 0x4F02E79: function_call_with_exprs (func.c:1991)
==3269==    by 0x4EF4D68: gnm_expr_eval (expr.c:1453)
Comment 5 Morten Welinder 2014-02-10 00:09:02 UTC
This problem has been fixed in our software repository. The fix will go into the next software release.

Thank you for the bug report.  We really like reports with files exhibiting
the problem.
Comment 6 Steff 2014-03-22 08:27:48 UTC
Hi,

I think I need to reopen this bug.

I worked again on the file which caused this bug and gnumeric crashed again with a floating point exeption.

I just added one new column and after adding a second column it crashes.

Sorry, but the test file was too big to upload it as an attachment, so please download it here:
http://www.file-upload.net/download-8736636/TestFile-Steff2.gnumeric.html

The save game is after adding the first column and it should crash gnumeric when loading it.

Adding the first column took almost 10 minutes, I think, but gnumeric responded then again.

Tested with gnumeric 1.12.12.

Ok, good luck. Bye.
Comment 7 Morten Welinder 2014-03-22 14:09:59 UTC
I don't understand the instructions about how you make it crash.  Please
spell it out in detail.
Comment 8 Morten Welinder 2014-03-22 15:12:40 UTC
Nevermind, I see the problem.

We basically have two different code paths for the lookup function: those
that are linear search (fixed in February) and those that are bisection
searches (fixed in my tree).
Comment 9 Morten Welinder 2014-03-23 01:37:07 UTC
This problem has been fixed in our software repository. The fix will go into the next software release. Thank you for your bug report.
Comment 10 Steff 2014-03-23 08:12:27 UTC
The testfile, already mentioned in comment #6, still crashes gnumeric when loading it. Link: http://www.file-upload.net/download-8736636/TestFile-Steff2.gnumeric.html

I downloaded and build the latest version of gnumeric I got here:
https://git.gnome.org/browse/gnumeric/snapshot/gnumeric-1a45b7203e3f67b871e7f6d9585057c355b86acb.tar.gz

I think the testfile might be a broken file, because it's much bigger than previous versions of the file.
I made it when I added one full new column before column A.
But I did other changes before, I don't remember exactly.
The testfile is AFTER adding a new column.

So if you need it, I will try to reconstruct a file which is just like it was before I added a new column. But it's hard because all is very slow when working with this file.

> I don't understand the instructions about how you make it crash.  Please
> spell it out in detail.

The instruction is: load the testfile and wait until it crashes.


Two other little questions:

1. I have to copy the file libspreadsheet.1.12.xx.so from '/usr/local/lib/' to '/usr/lib/' everytime when I compile gnumeric.
Otherwise I can't start gnumeric and I get this error:
   "gnumeric: error while loading shared libraries: libspreadsheet-1.12.14.so:
    cannot open shared object file: No such file or directory"

My OS is Ubuntu 12.04 Precise Pangolin, but I modified it and use Gnome as Desktop (should be version 2.24).

2. Why are new versions of gnumeric not mentioned on https://projects.gnome.org/gnumeric/ ?
Last gnumeric version there is 1.12.9 from November 2013.
Comment 11 Morten Welinder 2014-03-23 14:30:59 UTC
I'm fairly certain it's fixed.

You have two options on compilation.

A. /usr/local/.  This is the default.  You need to make sure that you
run the binary in /usr/local/bin if you do that.

B. /usr/.  Supply --prefix=/usr to configure.

In both cases, you need to run "ldconfig" after the install (as root).



As to 2, the home page is at www.gnumeric.org.  I'll see if I can cancel
the old one.
Comment 12 Morten Welinder 2014-03-23 14:50:55 UTC
Note: uninstalling the system version of gnumeric might be the best way to
ensure that you are running the newly compiled verson.
Comment 13 Steff 2014-03-23 15:45:11 UTC
Created attachment 272696 [details]
Console window showing the crash

Ok,
more testing shows the test file crashes gnumeric only sometimes immediatly when loading with newest gnumeric version.

But if you are able to load it successfully, a crash could be triggered in my tests by pressing F9 (to manual recalculate the sheet).
I added a screenshot for proofing the crash.

Maybe the problem is completely different to the Vlookup problem...

And thanks for answering my other questions. I will remove all pre-installed system packages when I compile Gnumeric the next time.
Comment 14 Morten Welinder 2014-03-23 16:04:44 UTC
REcalc does not crash for me.

The file is too big to run under valgrind on my laptop.  I might have more
luck at work.
Comment 15 Steff 2014-03-23 18:47:41 UTC
Created attachment 272705 [details]
backtrace

I read how to make a backtrace. I cut the middle part of it because it's extreme big.
Comment 16 Morten Welinder 2014-03-23 19:33:56 UTC
ok, I see how this can happen.

It's string related this time.
Comment 17 Morten Welinder 2014-03-23 20:06:36 UTC
Created attachment 272708 [details] [review]
Tentative patch

Tentative patch.  It's a bid of a punt, but we'll live.
Comment 18 Morten Welinder 2014-03-24 14:42:58 UTC
This problem has been fixed in our software repository. The fix will go into the next software release. Thank you for your bug report.



The caching code is complicated.  I'd simplify if it wasn't so absolutely
critical for performance of collections of lookup calls.
Comment 19 Steff 2014-03-29 23:09:06 UTC
Created attachment 273260 [details]
new backtrace, crashed after successful saving

Hi,

I was able to produce another Gnumeric crash when working with this file.
My Gnumeric version had the patch attached some comments above added.

What I did to get a crash: In sheet2 (the one with heavy calculations) I removed the full third column. This took almost 20 minutes. After responding again, I saved. After some minutes gnumeric then crashed.

There is a saved file after this, but it is much larger then it should be. I tried to load it, but I cancelled because it takes quite long, so I'm not sure if it is loadable.
I could upload it somewhere when needed.

But I have a backtrace attached, too.


I have a little suspection, that the indirect cell addressing causes gnumeric to hang so much when adding or removing columns.
Normal cell addresses point to the same cell as before, just the name of the cell address has changed.
But the indirect cell addressing now points to another cell as before because the string with the cell name isn't changed to the new position of the moved cell...
Comment 20 Morten Welinder 2014-03-30 19:48:41 UTC
There isn't anything useful in the stack trace.  You description sounds
like it might have run out of memory, but it's hard to tell.

"indirect" definitely causes Gnumeric to use much more memory.  I have
never seen a use of "indirect" that couldn't be improved by using something
else, "index" for example.
Comment 21 Steff 2014-06-11 09:21:04 UTC
Created attachment 278256 [details]
complete new testfile

Hi,
I managed to get gnumeric to hang forever and I'm quite sure some cells with hlookup functions caused this.
It happened on a complete different new file, but I add it to this bug report here because I thought it could be related to this problems here.

To reproduce the hang you need to open the new test file attached here.
Then go to cell K3, copy it, and paste it to the area from K3 to K150.
Gnumeric should hang forever now.

Tested with 1.12.16 + 1.12.17 and also with windows-build 1.12.9 with wine.
Comment 22 Steff 2014-06-11 10:07:01 UTC
PS: It seems gnumeric don't hangs forever, after almost 10 minutes it responds again.
But I started another test and now it crashed with a segmentation fault when it was almost ready to respond again.
Comment 23 Steff 2014-06-11 10:22:50 UTC
PPS: I recognised now that I made a mistake in my formula, and that the range where the hlookup function needs to search is much bigger then intented.

So maybe not as much a bug as a thought, just too much calculations as the computer can handle.
Comment 24 Morten Welinder 2014-06-11 18:31:52 UTC
I see no problems with that file -- other than things taking a while.
For this kind of file, setting calculation to "manual" might be a good
idea.
Comment 25 Morten Welinder 2014-06-11 18:32:22 UTC
For future reference: 4 cpu hours under valgrind for this sample.
Comment 26 Steff 2014-06-11 20:33:01 UTC
Created attachment 278289 [details]
Testfile with the large formula area empty

Hi,
calculation is set to manual already.
And recalculation alone is quite fast, it takes maybe one second.
The problem is when you edit one of cells which contains calculations. Then it takes often over 5-10 minutes until gnumeric responds anymore.
Even when I correct the mistake in my formula which I mentioned above, it is not much better.
I don't know what gnumeric calculates then, but I think gnumeric tries to rebuild something where it stores which cells depend on other cells?? And this takes definitely too long.
When you have the cell area from L4 to SR150 in my test file first empty and then paste the formula which is in this cells in my test file, Gnumeric responds much much faster, maybe after 10 seconds.

And after this it's fully useable, too.

So the performance of gnumeric is quite good when it builds up cell dependencies completely new, but the performance is bad when something needs to be changed in the cell dependencies, even if it's just a minimal change.

This is just my theory, but I thought I need to explain it, because recalculation alone is not a problem, just editing cells is a problem.


I added another testfile with the problem causing formulas removed.

You can use it to reproduce this steps:

Paste formula from L1 to the area from L1 to SR150
Should be quite fast, although this are over 75.000 cells.

Then you can test recalc, no problem, all works fast.

Then edit any cell with a formula inside and it takes forever...
Comment 27 Morten Welinder 2014-06-13 18:47:18 UTC
It clear what is going on (using the file of comment 21): when you change
a cell or range of cells, they are marked "dirty".  Then everything depending
on the changed cells are market dirty recursively.

The formulas you have are insane.  You have roughly 75k different ranges
on which something depends, for example:

...
  Bucket 0 (rows 1-1024): Range hash size 73688: range over which cells in list depend
...
    K4:DW109 -> (DX109)
    K4:DT109 -> (DU109)
    K4:DU109 -> (DV109)
    K4:DV109 -> (DW109)
    K4:BH107 -> (BI107)
    K4:DX109 -> (DY109)
    K4:EH109 -> (EI109)
    K4:EI109 -> (EJ109)
    K4:EJ109 -> (EK109)
    K4:EB109 -> (EC109)
    K4:EL109 -> (EM109)
    K4:EO107 -> (EP107)
    K4:BC107 -> (BD107)
    K4:EM109 -> (EN109)
    K4:EK109 -> (EL109)
    K4:ER109 -> (ES109)
    K4:ES109 -> (ET109)
    K4:ET109 -> (EU109)
    K4:EU109 -> (EV109)
    K4:EV109 -> (EW109)
    K4:EW109 -> (EX109)
    K4:EX109 -> (EY109)
    K4:EY109 -> (EZ109)
    K4:EZ109 -> (FA109)
    K4:FA109 -> (FB109)
    K4:FB109 -> (FC109)
    K4:FC109 -> (FD109)
    K4:FD109 -> (FE109)
    K4:FE109 -> (FF109)
    K4:FF109 -> (FG109)
    K4:FG109 -> (FH109)
    K4:FH109 -> (FI109)
    K4:FI109 -> (FJ109)
    K4:FK107 -> (FL107)
    K4:FK109 -> (FL109)
    K4:FL109 -> (FM109)
    K4:FM109 -> (FN109)
    K4:FN109 -> (FO109)
    K4:FO109 -> (FP109)
    K4:FP109 -> (FQ109)
    K4:FQ109 -> (FR109)
    K4:FR109 -> (FS109)
    K4:FS109 -> (FT109)
    K4:FT109 -> (FU109)
    K4:FU109 -> (FV109)
    K4:FV109 -> (FW109)
    K4:FW109 -> (FX109)
    K4:FX109 -> (FY109)
    K4:FY109 -> (FZ109)
    K4:FZ109 -> (GA109)
    K4:GA109 -> (GB109)
    K4:GB109 -> (GC109)
    K4:GC109 -> (GD109)
    K4:GD109 -> (GE109)
    K4:FJ109 -> (FK109)
    K4:GE109 -> (GF109)
    K4:GF109 -> (GG109)
    K4:GG109 -> (GH109)
    K4:GH109 -> (GI109)
    K4:GJ109 -> (GK109)
    K4:GK109 -> (GL109)
    K4:GI109 -> (GJ109)
    K4:GM109 -> (GN109)
    K4:GN109 -> (GO109)
    K4:GL109 -> (GM109)
    K4:GP109 -> (GQ109)
    K4:GQ109 -> (GR109)
    K4:GO109 -> (GP109)
    K4:GR109 -> (GS109)
    K4:GT109 -> (GU109)
    K4:CT109 -> (CU109)
    K4:FZ111 -> (GA111)
...


(To see these, start with GNM_DEBUG=deps and click the lightbulb.)


The amount of work implied by these formulas strongly suggests that you
should rethink the setup.
Comment 28 Steff 2014-06-14 11:53:35 UTC
Hi Morten,

> The formulas you have are insane.  You have roughly 75k different ranges
> on which something depends, for example:

> The amount of work implied by these formulas strongly suggests that you
> should rethink the setup.

This test file isn't important for anything, it was just a little testing of number sequences I wanted to play around with. So I don't did much optimisation or something.

It was made that every row has a starting number and it calculates automatically the sequence of number for the first row, then the second,...
just with pressing recalc, just that you don't need to wonder what's the purpose of all this formulas.
So, yes all is linked together and some cells have circular reference.

And I can say it works well and quite fast, when you don't change any cells.


BUT:
This is all not the problem:

The problem is: why can gnumeric handle this formulas when you insert them for the first time in a cell range, and why not when it has to make some changes to an existing big cell range of formulas.


I think gnumeric would handle files with such complex calculations much better, if it wouldn't even try to change the existing data, which stores the cell dependencies inside.
Just calculate the cell dependencies new everytime they change, especially for files with very complex cell dependencies.
This isn't maybe not optimal, but far more effective then current behavior, when it calculates for minutes just for one little cell changed.


Another little test:
Open the test file of comment #21

Press CTRL+A, then CTRL+C to copy all cells of the sheet.

Then start gnumeric a second time with an empty sheet open.

Goto Cell A1 of the empty sheet and press CTRL+V.

The time gnumeric calculates until it responds again is minimal, not 5 seconds I think.

So it would be able to calculate the data of cell dependencies in 5 seconds and not in more then 5 minutes like when you edit a existing formula in a cell.
Comment 29 Morten Welinder 2014-06-14 17:21:02 UTC
I am not going to act on this.

The sample file does not reflect sane use of a spreadsheet, so the fact
that it takes forever to change doesn't bother me a lot.  In this case,
the dependency tracking gets upset over having a dense graph of 70k+
nodes.  Too bad.  It's "Doctor, when I bang my head on the wall, it hurts!"

Note: this isn't Gnumeric specific; I can do the similar thing with
Excel/LO also.
Comment 30 Steff 2014-07-03 06:21:45 UTC
Hi,
Another question is, why a recalculation is triggered when you make changes in a cell and manual recalculation is on. Why can it not be recalculated the next time when the user presses F9?

I tried to make a patch myself, but I don't have enough knowledge to do it.
But I found that commenting out line 1908 ( dependent_queue_recalc_main(work) ) in depentend.c seems to give what I want, it doesn't trigger a recalc when you edit cells.
Also inserting lines and columns works now much much faster, but it is still slow when inserting lines/columns and then click undo.
And it seems it does calcalute one less iteration as normal.

If this two disadvantages would not be, it would be a huge help for big, complex files, because performance of gnumeric is now much better. Maybe someone can look at it again when you have time. Thanks.
Comment 31 Morten Welinder 2014-07-18 17:45:25 UTC
I am not going to act on this.