GNOME Bugzilla – Bug 301127
Expression leaks
Last modified: 2005-12-14 15:02:27 UTC
Distribution: Fedora Core release 3 (Heidelberg)
Package: Gnumeric
Severity: critical
Version: GNOME2.10.0 1.4.x
Gnome-Distributor: Red Hat, Inc
Synopsis: Crash on close
Bugzilla-Product: Gnumeric
Bugzilla-Component: General
Bugzilla-Version: 1.4.x
BugBuddy-GnomeVersion: 2.0 (2.10.0)
Description:
I just closed Gnumeric after editing a spreadsheet, and it crashed. Is the stacktrace helpful?

Debugging Information:
Backtrace was generated from '/usr/bin/gnumeric'
(no debugging symbols found)
Using host libthread_db library "/lib/libthread_db.so.1".
(no debugging symbols found)
[...........many times............]
[Thread debugging using libthread_db enabled]
[New Thread -1208301888 (LWP 8999)]
[New Thread -1226032208 (LWP 9009)]
[New Thread -1225765968 (LWP 9008)]
[New Thread -1215276112 (LWP 9007)]
(no debugging symbols found)
[...........many times............]
0x00b75402 in ?? ()
+ Trace 58366
Thread 1 (Thread -1208301888 (LWP 8999))
------- Bug moved to this database by unknown@bugzilla.gnome.org 2005-04-18 19:23 ------- Unknown platform unknown. Setting to default platform "Other". Unknown milestone "unknown" in product "Gnumeric". Setting to default milestone for this product, '---' Setting to default status "UNCONFIRMED". Setting qa contact to the default for this product. This bug either had no qa contact or an invalid one.
Unfortunately, the stack trace is no good. But I can guess what the secondary cause is: our leak printer crashes on exit from time to time when it encounters partially-freed structures. There should be some indication of leaks on stderr before the crash. Without more information I won't be able to pinpoint what is leaking, so I am closing for now. Feel free to reopen if more information arrives. (Note: the reason this particular crash does not bother us a whole lot is that it never causes data loss.)
Actually, I just discovered some additional info. Three days after reporting this I found that I actually ran gnumeric from the terminal this time. I found the terminal window buried under the windows of all the other stuff I was working on...:
--
[lhutchis@smgf-wks12:~/Queries]$ gnumeric genotype_test.csv
Reading file:///home/lhutchis/Queries/genotype_test.csv
Writing file:///home/lhutchis/Queries/genotype_test.csv
Leaking expression at 0x967d1a8: IV1.
** (bug-buddy:10654): WARNING **: Couldn't load icon for Text Editor
[lhutchis@smgf-wks12:~/Queries]$
--
The only problem is, I never used cell IV1 -- the furthest-over cell in that CSV file is in column AK (I'm assuming IV1 is a cell ref?). What I did in between when I loaded it and when it crashed upon close is:
- Inserted 3 columns
- Put a formula in each cell of each of the 3 columns, which processed the value in the cell to the immediate left of the formula (it was a string formatting formula, involving CONCATENATE(), LEFT(), RIGHT() and LEN())
- Copied each column full of formulae and did paste special -> as values, pasting over the column to the left (overwriting the original strings with the formatted strings)
- Deleted the columns with formulae in
- Saved the file over the top of the original
- Closed the file -> Crash!

Is this more helpful? I don't think I ever went to the far right (column IV), although it is possible. If IV1 is a cell ref it may signify a memory corruption, and not just a leak.
It's helpful. "IV1" is the expression "=IV1", which probably started out as "=A1" or something like that. However, that is not the crash-causing one -- we never see that one.
Presumably though if a formula has become corrupt, that could be the cause of the leak printer crashing?
I'll guess that the problem is related to the column deletion. Could you send or attach the file as it was just before that?
Unfortunately the file is very large and contains sensitive data that my company cannot divulge publicly, but I can describe the nature of it: It is a CSV file containing mostly 3-digit integer data (36 columns, 12000+ rows, 2 to 6 digits per integer). The first row is a header row containing string labels. There is nothing particularly unusual about the label strings: I took a column like "YCAII" containing a number like 223226, and inserted a dash between the first three digits and the last three, for example, "223-226" (hence the use of the functions I described), and created a new column of the same name. I then copied the resulting new YCAII column and pasted as values over the original YCAII column, and then deleted the new YCAII column that contained the formulae. It is possibly one of these last two steps that produced the corruption. Since the original file was just pure CSV data, it may not help to send it anyway. I tried to reproduce the problem and could not, but hopefully the description may allude to what actually happened.
Maybe it's best to close this as incomplete?
I have seen the IV1 leaks again -- all the following were reported as leaks after closing a single spreadsheet:

Writing file:///home/luke/SMGF/Haplotyper%20Benchmark/STAGE_1.gnumeric
Leaking expression at 0x865d8e8: IV1.
Leaking expression at 0x865d938: IV1.
Leaking expression at 0x865d960: "".
Leaking expression at 0x865d988: IV1<>"".
Leaking expression at 0x865d9b0: "".
Leaking expression at 0x865d9d8: .
Leaking expression at 0x865da28: "".
Leaking expression at 0x865da50: IV1<>"".
Leaking expression at 0x865da78: "".
Leaking expression at 0x865daa0: IV1.
Leaking expression at 0x865dac8: IR2.
Leaking expression at 0x865daf0: IV1=IR2.
Leaking expression at 0x865db18: 1.
Leaking expression at 0x865db40: 0.
Leaking expression at 0x865db68: \uffff7\uffffh\uffff(IV1=IR2,1,0).
** (gnumeric:11680): WARNING **: Leaked 15 nodes from expression pool.
** (gnumeric:11680): WARNING **: Leaked 2 nodes from value int/bool pool.
** (gnumeric:11680): WARNING **: Leaked 4 nodes from value string pool.
Leaking string [Logic] with ref_count=2.
Leaking string [] with ref_count=4.
** (gnumeric:11680): WARNING **: Leaked 2 nodes from string pool.
Also, the last comment was with Gnumeric-1.6.
Any ideas or clues as to what kind of editing was taking place and/or what kind of tools/dialogs were being used?
Very similar -- I opened a CSV file, added a bunch of formulas that derive values from the data in the CSV file, and added another sheet. I also copied/pasted between sheets open in different windows. I may have deleted a column. It is possible that some of the above were due to problems when I typed in a formula and its syntax was wrong, and it asked me if I want to Re-Edit/Accept?
> It is possible that some of the above were due to problems when I typed in a
> formula and its syntax was wrong, and it asked me if I want to Re-Edit/Accept?

Very much possible. Can you tell us a few of the formulas and ways you might have mistyped?
A couple of examples I remember:
(1) mismatched parentheses, I don't remember the exact example
(2) =if(M161!="",M161,N160) where I had accidentally substituted "!=" for "<>"
(3) =if(L165<>"",max(offset(I166,0,0,L165,1)),"") this is the correct formula, but I had to play with the params to offset() to get them right -- e.g. I166 was a range I166:I198 or something at one point
More leaks... I just edited a sheet in the following way:
* Opened the sheet that gave the above leaks, now saved as a .gnumeric
* Opened a TSV file, copied data from it, pasted into the first sheet
* Hit undo, inserted a couple of columns, pasted again.
* Copied some formulas down since there was more data pasted than formula rows on the sheet
* Copied some more formulas across 2 more columns
* Edited one formula, had another editing error:
  Q1 was:    =if(abs($P1-$K1)<1e-06,A1,"")
  Edited to: =if(B1<>"",B1,(abs($P1-$K1)<1e-06,A1,""))
                           ^ left out the other "if"
* Hit "Re-Edit" and fixed the problem.
* Saved.
* Quit.

I realize this is not a very reproducible test case, sorry. Terminal warnings below.

$ gnumeric STAGE_1.gnumeric
Reading file:///home/luke/SMGF/Haplotyper%20Benchmark/STAGE_1.gnumeric
Reading file:///home/luke/SMGF/Haplotyper%20Benchmark/out_phase_37.35_pairs
Reading file:///home/luke/SMGF/Haplotyper%20Benchmark/out_phase_37.35_pairs
Writing file:///home/luke/SMGF/Haplotyper%20Benchmark/STAGE_1.gnumeric
** (gnumeric:16644): CRITICAL **: symbol_unref: assertion `sym != NULL' failed
** (gnumeric:16644): CRITICAL **: symbol_unref: assertion `sym != NULL' failed
** (gnumeric:16644): CRITICAL **: symbol_unref: assertion `sym != NULL' failed
** (gnumeric:16644): CRITICAL **: symbol_unref: assertion `sym != NULL' failed
** (gnumeric:16644): CRITICAL **: symbol_unref: assertion `sym != NULL' failed
[...........many times............]
** (gnumeric:16644): CRITICAL **: symbol_unref: assertion `sym != NULL' failed
** (gnumeric:16644): CRITICAL **: gnm_func_free: assertion `func->ref_count == 0' failed
Leaking expression at 0xc7ceef0: 1e-06.
Leaking expression at 0xc7cef18: $P1.
Leaking expression at 0xc7cef40: $K1.
Leaking expression at 0xc7cef68: $P1-$K1.
Leaking expression at 0xc7cef90: \uffff\uffffk \uffffh\uffff($P1-$K1).
Leaking expression at 0xc7cefb8: \uffff\uffffk \uffffh\uffff($P1-$K1)<1e-06.
Leaking expression at 0xc7cefe0: IG1.
Leaking expression at 0xc7cf008: "".
** (gnumeric:16644): WARNING **: Leaked 8 nodes from expression pool.
** (gnumeric:16644): WARNING **: Leaked 1 nodes from value float pool.
** (gnumeric:16644): WARNING **: Leaked 1 nodes from value string pool.
Leaking string [Mathematics] with ref_count=2.
Leaking string [] with ref_count=1.
** (gnumeric:16644): WARNING **: Leaked 2 nodes from string pool.
I just emailed an example sheet (too big for Bugzilla) to Morten, with instructions that reproduce the problem 100% of the time. It turns out this indeed seems to be a problem with erroneous functions and "Re-Edit".
In fact here's a minimalistic test case that doesn't require a special sheet:
- Start gnumeric in the terminal
- In A1, type "=if (2=3,4,5)" (note the space between "if" and "(", which is illegal currently)
- Hit Enter, then Re-Edit, then remove the space, then hit Enter again
- Save the sheet
- Quit gnumeric
- You should see something like this:

$ gnumeric
Writing file:///home/luke/Book1.gnumeric
Leaking expression at 0x9ad0eb0: 2.
Leaking expression at 0x9ad0f00: 3.
Leaking expression at 0x9ad0f28: 2=3.
Leaking expression at 0x9ad0f50: 4.
Leaking expression at 0x9ad0f78: 5.
** (gnumeric:16871): WARNING **: Leaked 5 nodes from expression pool.
** (gnumeric:16871): WARNING **: Leaked 4 nodes from value int/bool pool.
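For anyone triaging similar reports, the exit-time output can be tallied mechanically. The following is only a rough sketch, assuming the stderr lines keep the exact wording quoted in this report; `summarize_leaks` and the two regular expressions are hypothetical helpers, not part of Gnumeric.

```python
import re
from collections import Counter

# Patterns modelled on the stderr lines quoted in this report (assumed stable).
LEAK_EXPR = re.compile(r"Leaking expression at (0x[0-9a-f]+): (.*)\.$")
LEAK_POOL = re.compile(r"Leaked (\d+) nodes from (.+) pool\.$")


def summarize_leaks(stderr_text):
    """Return (expressions, pool_counts) parsed from the exit-time output.

    expressions is a list of (address, expression_text) pairs;
    pool_counts maps a pool name to the number of leaked nodes reported.
    """
    expressions = []
    pools = Counter()
    for line in stderr_text.splitlines():
        line = line.strip()
        match = LEAK_EXPR.search(line)
        if match:
            expressions.append((match.group(1), match.group(2)))
            continue
        match = LEAK_POOL.search(line)
        if match:
            pools[match.group(2)] += int(match.group(1))
    return expressions, pools


# The log from the minimal test case above, one message per line.
SAMPLE = """\
Writing file:///home/luke/Book1.gnumeric
Leaking expression at 0x9ad0eb0: 2.
Leaking expression at 0x9ad0f00: 3.
Leaking expression at 0x9ad0f28: 2=3.
Leaking expression at 0x9ad0f50: 4.
Leaking expression at 0x9ad0f78: 5.
** (gnumeric:16871): WARNING **: Leaked 5 nodes from expression pool.
** (gnumeric:16871): WARNING **: Leaked 4 nodes from value int/bool pool.
"""

if __name__ == "__main__":
    exprs, pools = summarize_leaks(SAMPLE)
    for address, text in exprs:
        print(address, text)
    for pool, count in pools.items():
        print(pool, count)
```

The leaked expressions here are exactly the pieces of the mistyped formula (2, 3, 2=3, 4, 5), which is what points at the Re-Edit path.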
By the way, is there a good reason why spaces are not allowed between function names and parentheses? I get in the habit of writing "if (...)" from other languages, so do this frequently. Spaces are removed elsewhere, I can't see why they should not be here. Should I file a bug, or is this by design?
Ok, now we're talking! I had already found and fixed this in my tree, but I didn't see a plausible way you could have triggered it. I'll have to double-check the sheet you sent tomorrow, but I think this covers it.

Space is, by virtue of MS' inspirational syntax choices, an operator in Excel. (Intersection of references.) That means we have to be quite careful where we ignore spaces as we might otherwise change the meaning of some function. I agree that it is rather irritating.

2005-12-13  Morten Welinder  <terra@gnome.org>

	* src/parser.y (build_set): Make this function unregister argument on success like all the other build_* functions. Fix caller to not leak in error case. Fixes #301127.
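To illustrate the point about space being the reference-intersection operator: in Excel, a formula such as =SUM(A1:C3 B2:D4) sums only the overlap B2:C3. Below is a small sketch of that rectangle intersection. The tuple layout and the `intersect` name are assumptions made for illustration, not Gnumeric's or Excel's internals.

```python
def intersect(a, b):
    """Intersect two rectangular ranges.

    Each range is (first_col, first_row, last_col, last_row), zero-based and
    inclusive. Returns the overlapping range, or None if they do not overlap.
    """
    first_col = max(a[0], b[0])
    first_row = max(a[1], b[1])
    last_col = min(a[2], b[2])
    last_row = min(a[3], b[3])
    if first_col > last_col or first_row > last_row:
        return None
    return (first_col, first_row, last_col, last_row)


if __name__ == "__main__":
    a1_c3 = (0, 0, 2, 2)  # A1:C3
    b2_d4 = (1, 1, 3, 3)  # B2:D4
    print(intersect(a1_c3, b2_d4))  # the overlap B2:C3 as zero-based indices
```

Because whitespace between two references carries this meaning, a parser cannot blindly discard every space without risking a change in what a formula computes.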
Questions:
- Does this explain the weird high-columned refs (IV1 etc.)?
- Is this related to the crasher that I originally reported? It seems that there was a data corruption, not just a leak, although I'm happy the leak is fixed.
- Isn't it possible to always remove spaces before parens, or can intersections between refs be parenthesized? Does Excel remove spaces between function names and parens?
The leak checker runs quite late in the exit phase and things like sheets probably have been deleted at that time. Therefore we get the occasional crash on exit when it tries to print a sheet name. The leak detector is so useful in catching hard-to-find bugs that we choose the risk of a crash on exit (where no data will be lost). Also, relative cell references are stored as, e.g., "the cell one column to the left" and the leak printer prints them relative to A1 because it has no idea where they lived. IV1 is therefore not unlikely. I'll have to check re " (".
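The IV1 effect can be sketched as follows. This is only an illustration under stated assumptions: a grid of 256 columns (A to IV) by 65536 rows, references stored as (column offset, row offset), and wraparound at the sheet edge when the offsets are applied to A1. The names `col_name` and `render_relative` are hypothetical, not Gnumeric functions.

```python
MAX_COLS = 256    # assumed sheet width: columns A..IV
MAX_ROWS = 65536  # assumed sheet height


def col_name(index):
    """Convert a zero-based column index to letters (0 -> A, 255 -> IV)."""
    name = ""
    n = index + 1
    while n > 0:
        n, rem = divmod(n - 1, 26)
        name = chr(ord("A") + rem) + name
    return name


def render_relative(col_offset, row_offset, origin_col=0, origin_row=0):
    """Print a relative reference as seen from an origin cell (default A1).

    Offsets that fall off the sheet edge wrap around (an assumption here),
    so "one column to the left" of A1 lands in the last column.
    """
    col = (origin_col + col_offset) % MAX_COLS
    row = (origin_row + row_offset) % MAX_ROWS
    return col_name(col) + str(row + 1)


if __name__ == "__main__":
    # "The cell one column to the left", printed relative to A1.
    print(render_relative(-1, 0))
    # The same reference printed from its real home, e.g. B1 (col 1, row 0).
    print(render_relative(-1, 0, origin_col=1))
```

Under these assumptions, an offset of one column to the left prints as IV1, five columns left and one row down prints as IR2, and sixteen columns left prints as IG1, matching the references seen in the leak output in this report.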