GNOME Bugzilla – Bug 551038
Reports: Alt-O shortcut in report options dialog crashes app
Last modified: 2018-06-29 22:09:41 UTC
The Alt-O keyboard shortcut is equivalent to clicking "ok" in the report options dialog. Using Alt-O from within a date entry box in the report options dialog crashes the whole app with a segfault. Here is the backtrace: Program received signal SIGSEGV, Segmentation fault.
+ Trace 206374
Thread 3062138624 (LWP 12554)
To reproduce this, open (apparently) any report. Enter a custom date range and, before tabbing out of the date entry box, press Alt-O to run the report. The app will crash as above. I marked the severity as major because of the possibility of data loss. I suspect very few people will encounter it because the steps needed to reproduce it are so specific. But a crasher bug is a crasher bug.
Here is what looks (to my unpracticed eye) to be a better backtrace: Program received signal SIGSEGV, Segmentation fault.
+ Trace 206382
Thread 3062294272 (LWP 13257)
Hopefully that's helpful.
I can reproduce this on WinXP/mingw with the latest SVN (r17493). But the result seems a bit random. For example, one time I got a Scheme backtrace:
  In unknown file:
     ?: 0* [gnc:option-widget-changed-proc -4472901]
  In c:/soft/gnucash/inst/share/gnucash/scm/options.scm:
   136: 1* [vector-ref -4472901 16]
The GNCDateEdit widget does do some special keystroke handling for changing dates, so that might have something to do with it. Maybe this is yet another GNCDateEdit bug I can find and fix along with the others. Pressing Alt-O from any other control seems to work great. On the other hand, I can't reproduce this in the price editor: changing the date and pressing Alt-O there works fine. So maybe it's not the GNCDateEdit widget. The problem may actually lie somewhere in the options code.
Here's yet another gdb trace. Frames 21-24 look interesting.
  Program received signal SIGSEGV, Segmentation fault.
  0x005c7f5c in scm_iprlist (hdr=0x5f447e "(", exp=0x4761dc0, tlr=41, port=0x3cf4888, pstate=0x47a5778) at print.c:823
  823 for (; SCM_ECONSP (exp); exp = SCM_CDR (exp))
  (gdb) where
+ Trace 206383
Here's an idea... may not be what's happening but I'll throw it out there anyway:
1. new date is typed in
2. without leaving the date entry box, press Alt-O
3. the date entry key handler receives the key, but sees that it isn't relevant to changing the date, so it propagates it
4. the key reaches the OK button, causing gnc_options_dialog_response_cb() to get called with response=GTK_RESPONSE_OK
5. custom apply and window closing callbacks get run.
6. ASSUME these callbacks free the memory pointed to by "options"
7. because focus left the date entry box, the date entry callback gets called (the GNCDateEdit control watches the "focus-out-event" signal)
8. the date entry callback processes the date change you entered in step 1
9. the date entry callback issues the "changed" signal
10. the options dialog is listening to the "changed" signal, so its callback of gnc_option_changed_option_cb() gets called.
11. the "option" parameter passed to gnc_option_changed_option_cb() is bad, because the memory it points to was freed in step 6.
12. The infamous "unexpected results" occur because of that bad pointer.
Possible solution: process all gtk events / signals before calling the step 6 callbacks. I might try that and see.
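The hypothesized sequence can be illustrated with a small model in plain C. This is only a sketch under stated assumptions: `Option`, `EventQueue`, `on_date_changed` and `run_ok_response` are hypothetical names invented for the illustration, not GnuCash or GTK code, and an `alive` flag stands in for freed memory so the stale access can be counted instead of crashing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the option data the dialog callbacks use. */
typedef struct {
    bool alive;        /* false once the owner has torn it down */
    int  changes_seen; /* number of "changed" notifications applied */
} Option;

typedef void (*Handler)(Option *opt, int *stale_calls);

/* A tiny FIFO standing in for the toolkit's pending events. */
typedef struct {
    Handler pending[8];
    size_t  count;
} EventQueue;

static void queue_push(EventQueue *q, Handler h)
{
    assert(q->count < 8);
    q->pending[q->count++] = h;
}

/* Run every queued handler in order, then empty the queue. */
static void queue_drain(EventQueue *q, Option *opt, int *stale_calls)
{
    for (size_t i = 0; i < q->count; i++)
        q->pending[i](opt, stale_calls);
    q->count = 0;
}

/* Plays the role of the "changed" handler from steps 9-11. */
static void on_date_changed(Option *opt, int *stale_calls)
{
    if (!opt->alive) {
        (*stale_calls)++;   /* in the real bug: use of freed memory */
        return;
    }
    opt->changes_seen++;
}

/* Returns how many handlers ran against torn-down data. */
int run_ok_response(bool drain_before_teardown)
{
    Option opt = { .alive = true, .changes_seen = 0 };
    EventQueue q = { .count = 0 };
    int stale_calls = 0;

    /* Step 7: focus leaving the date entry queues its handler. */
    queue_push(&q, on_date_changed);

    if (drain_before_teardown)
        queue_drain(&q, &opt, &stale_calls);  /* the "possible solution" */

    opt.alive = false;                        /* step 6: data "freed" */
    queue_drain(&q, &opt, &stale_calls);      /* late handlers, if any */
    return stale_calls;
}
```

Calling `run_ok_response(false)` models the crash path (one handler sees dead data), while `run_ok_response(true)` models draining first. In GTK 2 the usual way to drain pending events is a loop over `gtk_events_pending()` and `gtk_main_iteration()`; whether that alone would be enough in the real dialog is not established here, since the focus-out event may only be generated during teardown itself.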
Hi Charles, thanks. Now that you mention it, there was a brief Scheme backtrace on the console, but I was so mixed up in getting the silly debugging symbols into guile that it slipped my mind. Specifically, I would get *only* this:
  In unknown file:
     ?: 0*
When I enabled debugging in libguile, I no longer got that backtrace, just a straight segfault. That may account for some of the difference in my two backtraces. With what little I know of gtk, I would guess that your idea above is not far off the mark. Clearly the Alt-O is short-circuiting the callbacks associated with the date entry widget.
Re comment 5: Here we have it: the callback in step 7 frees the GNCOptionWin data structure when the user clicks OK (by pressing Alt-O) by calling gnc-options-dialog-destroy. That function frees the data structures immediately, but they shouldn't actually be freed until the dialog box window is destroyed. From line 32 of gnome-utils/gnome-utils.scm:
  (define (close-cb)
    (gnc-options-dialog-destroy optionwin)
    (gnc-option-db-destroy optiondb))
So I think the GNCDateEdit control is not the problem. It looks like a bug in the options dialog code. It destroys its own options data structures while there may be pending signals that will call its own callbacks that rely on that data. I'll see if I can fix it...
Well, I was on the right track. Turns out that a naughty destroy callback in the reporting system was destroying the options database before destroying the options dialog (which depends on it). So any remaining gtk events for that dialog that depend on the existence of that database are doomed.
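The ordering problem described here can be shown with a minimal model in plain C. It is a sketch, not the actual patch: `OptionDb`, `Dialog`, `dialog_destroy`, `db_destroy` and `close_report_options` are hypothetical names, and the model simply assumes that destroying the dialog triggers one last callback that reads the database.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the options database and its dialog. */
typedef struct {
    bool alive;
} OptionDb;

typedef struct {
    OptionDb *db;            /* the dialog's callbacks read through this */
    bool      alive;
    int       bad_callbacks; /* callbacks that saw a destroyed database */
} Dialog;

/* Assumption: tearing down the dialog fires a final focus-out style
 * callback that still consults the database. */
static void dialog_destroy(Dialog *dlg)
{
    assert(dlg->alive);
    if (!dlg->db->alive)
        dlg->bad_callbacks++;   /* would touch freed data in the real bug */
    dlg->alive = false;
}

static void db_destroy(OptionDb *db)
{
    assert(db->alive);
    db->alive = false;
}

/* Returns the number of callbacks that saw a destroyed database. */
int close_report_options(bool dialog_first)
{
    OptionDb db = { .alive = true };
    Dialog dlg = { .db = &db, .alive = true, .bad_callbacks = 0 };

    if (dialog_first) {
        dialog_destroy(&dlg);   /* dependent object goes first */
        db_destroy(&db);
    } else {
        db_destroy(&db);        /* the "naughty" order */
        dialog_destroy(&dlg);
    }
    return dlg.bad_callbacks;
}
```

With `dialog_first` set, no callback ever sees a dead database; with the reversed order, the dialog's last callback does. The general rule the model reflects is to destroy the dependent object before the thing it depends on.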
Created attachment 118186: Proposed patch (first try)
This patch does the trick for me. But I confess to being mostly ignorant about the depths of the reporting system, so I would prefer to have this reviewed before committing. Andrew, you know something about the reporting system (more than me anyway). Does this look alright to you?
Created attachment 118236: Proposed patch (second try)
This is the same as the first patch, except the fix is also duplicated for the style sheet dialog (which also had the wrong order of calls). The style sheet dialog doesn't seem to currently experience the bug, but this is probably only because it doesn't happen to have callbacks tied to widgets that generate signals in response to a "focus-out-event" (in other words, luck).
As I understand what you're saying, the options dialog still exists, so it can still generate callbacks, but the database is gone, so those callbacks are bad. I know almost nothing about the glue between the options and the reports themselves. Really jsled is the one to know about that more than anyone, I guess. That said, based on the info you gathered from the backtrace (widget being a bad pointer), your assessment of the situation, and the fact that the patch works for you, it all lines up nicely for me. Have you been able to reproduce this crash in trunk? I tried and failed several times. I suppose I should check out 2.2.6, build with the patch, and confirm that it works... Regardless, your patch makes perfect sense to me. Destroying the db while the widgets are still there trying to call back into it is *bad*, IMO. P.S. Good work, Charles!
Yes, I can reproduce this in trunk. I tested at SVN revision 17493 on WinXP/mingw. I'll send Josh an email and ask what he can add here.
Fix committed as r17508. Requesting backport for 2.2.x.
Looked good to me, applied to branches/2.2 as r17574 for inclusion in GnuCash 2.2.7. Thanks a lot!
GnuCash bug tracking has moved to a new Bugzilla host. This bug has been copied to https://bugs.gnucash.org/show_bug.cgi?id=551038. Please update any external references or bookmarks.