GNOME Bugzilla – Bug 779792
SIGSEGV on autosave_timeout_cb
Last modified: 2018-06-29 23:55:10 UTC
Created attachment 347530
Diagnostic report

Seg fault occurred twice in one data-entry session.

Workflow:
- Business->Vendor->New Bill...
- Choose vendor and date
- Enter line items
- Post Invoice (whilst in View Bill)
- Pay Invoice (whilst in View Bill)
- Close View Bill
- Repeat

The payment account view (MasterCard) tab was open during data entry.

Here's an excerpt from the crash reports:

~/Library/Logs/DiagnosticReports/Gnucash-bin_2017-03-08-150547_ZakBook.crash
----------------------------------------------------------------------------
Crashed Thread: 0 Dispatch queue: com.apple.main-thread

Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x00000000000001d5
Exception Note: EXC_CORPSE_NOTIFY

Termination Signal: Segmentation fault: 11
Termination Reason: Namespace SIGNAL, Code 0xb
Terminating Process: exc handler [0]

VM Regions Near 0x1d5:
--> __TEXT 0000000000001000-0000000000016000 [ 84K] r-x/rwx SM=COW /Applications/Gnucash 2.6.15-1/Gnucash.app/Contents/MacOS/Gnucash-bin

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 libgncmod-gnome-utils.dylib 0x002d2931 gnc_set_busy_cursor + 97
1 libgncmod-gnome-utils.dylib 0x0030896a gnc_file_save + 186
2 libgncmod-gnome-utils.dylib 0x002f6c21 autosave_timeout_cb + 1009

~/Library/Logs/DiagnosticReports/Gnucash-bin_2017-03-08-150547_ZakBook.crash
----------------------------------------------------------------------------
Crashed Thread: 0 Dispatch queue: com.apple.main-thread

Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x0000000000000041
Exception Note: EXC_CORPSE_NOTIFY

Termination Signal: Segmentation fault: 11
Termination Reason: Namespace SIGNAL, Code 0xb
Terminating Process: exc handler [0]

VM Regions Near 0x41:
--> __TEXT 0000000000001000-0000000000016000 [ 84K] r-x/rwx SM=COW /var/folders/*/Gnucash.app/Contents/MacOS/Gnucash-bin

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 libgncmod-gnome-utils.dylib 0x002d7931 gnc_set_busy_cursor + 97
1 libgncmod-gnome-utils.dylib 0x0030d96a gnc_file_save + 186
2 libgncmod-gnome-utils.dylib 0x002fbc21 autosave_timeout_cb + 1009
:
----------------------------------------------------------------------------

Speculation:
Both appear to be null-pointer issues - dereferencing sizeable structure types with offsets:
0x1d5
0x041
Not null, invalid. The only dereference in gnc_set_busy_cursor is of a node pointer in a linked list of top-level windows, so it would seem that that list is getting corrupted somewhere. Were there any windows other than the main one open at the time? Tabs don't count, just top-level windows.
From memory, no. Do these count:
- New Bill: Business->Vendor->New Bill...
- Process Payment: View Bill->Pay Invoice

The only other windows I remember using are:
- Find Bill: Business->Vendor->Find Bill...
- Find Vendor: Business->Vendor->Find Vendor
- Find Bill: Find Vendor->Vendor's Bills

I'm pretty sure I closed these windows once I'd located the bill. At the time of the crashes, I was bashing the workflow in the original post. There were outstanding payments, which I have noticed cause a "reminder" window to appear (but it usually gets obscured by the main window).

One other (probably unrelated) point - after the first crash, I could not restart the app from the dock, so I used the command line instead. Just had to do the same to flesh out this post.
It's more likely that the invalid node pointer would be caused by closing a window. The mechanism I have in mind would be that the memory for the node is deleted but the previous node's next pointer isn't NULLed so the list iteration goes to the now invalid node and its next pointer has been reused for something else so that *that* next pointer is invalid, leading to the crash. I'm not familiar enough with the business code to know which of your candidates are dialog boxes and which are windows. I'll ask Geert to have a look, he's far more familiar with that code. If something pops up behind it's almost surely a dialog box, but it's odd that it would go behind the main window. Usually they go between the main window and the window that they apply to because they haven't had "transient_for" set on them so they get the default, which is the main window. What happens when you try to restart from the dock? Does it also fail if you launch it from a Finder window? Oh, and what's with the /var/folders/*/Gnucash.app paths in the second crash report? That's a really strange place to put an application bundle.
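The stale-pointer mechanism described above can be illustrated with a small plain-C sketch. This is only an illustration under stated assumptions: `Node`, `node_prepend`, `node_remove`, `node_count` and `node_free_all` are hypothetical names, and the list is a hand-rolled singly linked list, not GLib's GList or anything taken from the GnuCash source. It shows the safe order of operations: the predecessor's next pointer is relinked before the node's memory is freed. If that relink step were skipped, the predecessor would keep pointing at freed memory, and a later iteration would read whatever had been written there since.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a toplevel-window list node (not GLib's GList). */
typedef struct Node {
    int id;
    struct Node *next;
} Node;

/* Prepend a new node and return the new head of the list. */
Node *node_prepend(Node *head, int id)
{
    Node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->id = id;
    n->next = head;
    return n;
}

/* Remove the first node with the given id and return the (possibly new) head.
 * The predecessor's link is updated BEFORE the node is freed, so no pointer
 * to the freed memory remains in the list. */
Node *node_remove(Node *head, int id)
{
    Node **link = &head;            /* points at the pointer that leads to the current node */
    while (*link != NULL) {
        if ((*link)->id == id) {
            Node *dead = *link;
            *link = dead->next;     /* relink: predecessor now skips the dead node */
            free(dead);
            break;
        }
        link = &(*link)->next;
    }
    return head;
}

/* Count the nodes by walking the next pointers until NULL. */
int node_count(const Node *head)
{
    int n = 0;
    for (; head != NULL; head = head->next)
        n++;
    return n;
}

/* Release every node in the list. */
void node_free_all(Node *head)
{
    while (head != NULL) {
        Node *next = head->next;
        free(head);
        head = next;
    }
}
```

Removing the tail of a 3 -> 2 -> 1 list with `node_remove(list, 1)` leaves node 2 with a NULL next pointer, so a walk over the list stops cleanly at node 2 instead of following a pointer into freed memory.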
Ahh, the Dock issue isn't relevant! When I upgraded last (a month or more ago) I cleared out older versions and forgot to update the Dock link - so the link was pointing to a deleted version (DOH).

I'm not sure where the /var/folders... come from. When I add to the dock, the file ~/Library/Preferences/com.apple.dock.plist is updated with a link to:
/private/var/folders/6n/chgbg9gx1nz9_vzr7gjlzc0c0000gn/T/AppTranslocation/3949052A-523B-4888-AFEE-429B6450046A/d/Gnucash.app

I guess OSX generates symlinks to the binaries - for whatever reason. (The plist is in binary format so needs:
> plutil -convert xml1 ~/Library/Preferences/com.apple.dock.plist
to render it readable).

Anyhow, that's all off-topic now! I'm guessing all the "windows" I proposed are actually dialog boxes - but I'm sure Geert will know the facts.
> I'm guessing all the "windows" I proposed are actually dialog boxes - but I'm sure
> Geert will know the facts.

He didn't offhand so I'm grepping through them now.
Nope. All dialogs. But I'm mistaken about dialogs not being toplevels, so that list could get corrupted by the incorrect destruction of a dialog as well as of a window. Do you remember which step in the workflow you were in when it crashed? Is it crashing regularly so that you can easily attach a post-crash gnucash.trace (`sudo find /var/folders -name gnucash.trace` is the easiest way to find it on a Mac)?
I don't remember it being a specific step - but then it's triggered by a timeout so is asynchronous to what I was doing. During the second occurrence, I was redoing transactions I'd lost due to the first crash (3-4 cycles of the workflow) so that narrows it down to "New Bill" and "Process Payments".

My original post has identical .crash filenames. That's incorrect, the two files are:
~/Library/Logs/DiagnosticReports/Gnucash-bin_2017-03-08-143401_ZakBook.crash
~/Library/Logs/DiagnosticReports/Gnucash-bin_2017-03-08-150547_ZakBook.crash

From the .log files:
- 14:33:05 posted a bill invoice
- 14:33:18 posted a bill payment
crash
- 14:42:12 posted a bill invoice
- 14:42:29 posted a bill payment

I still have the checkpoint and log files so can try to recreate the crash using the data:
*.gnucash.20170308142826.gnucash
*.gnucash.20170308142827.log
*.gnucash.20170308143810.log
*.gnucash.20170308144548.gnucash

I'll find some time over the weekend to give this a shot.
The trigger is asynchronous but the state of the toplevel list isn't. It's a list of the toplevels that you have open at the moment the function is called, plus apparently one invalid one.

But I was thinking backwards. The convention for adding items to a list is to *prepend*, so if you're creating and destroying dialogs in a cycle the behavior will be like pushing and popping a stack, and the other end of the list--where the corruption is most likely because of an un-nulled next pointer--isn't going to be affected. That window should be the splash screen or the tip-of-the-day dialog, both of which are created and usually destroyed before the main window. That's not a likely scenario for an intermittent crash. That suggests that it's some other heap corruption that happens to be stomping on the list.

When you try to recreate the crash, please set the following environment variables (most easily done by running from the command line and prepending them, and yes, they're really camel case):

MallocLogFile=$HOME/crashmalloc.log
MallocStackLoggingNoCompact=1
MallocScribble=1
MallocCorruptionAbort=1

After the crash there will be at least one stack log file in /tmp. Please attach it here along with crashmalloc.log.
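The push/pop argument above can be checked with a small plain-C sketch. It is an illustration only, under the assumption that every dialog is prepended on creation and is still at the head when it is destroyed; `Win`, `win_prepend`, `win_pop_head` and `win_tail` are hypothetical names, not GLib or GnuCash functions. Under that assumption, repeated create/destroy cycles only ever touch the head of the list, so the oldest node at the far end keeps the same address and the same NULL next pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one entry in a list of toplevel windows. */
typedef struct Win {
    const char *name;
    struct Win *next;
} Win;

/* "Create" a toplevel: prepend it, so the newest entry is always the head. */
Win *win_prepend(Win *head, const char *name)
{
    Win *w = malloc(sizeof *w);
    assert(w != NULL);
    w->name = name;
    w->next = head;
    return w;
}

/* "Destroy" the most recently created toplevel: pop the head. */
Win *win_pop_head(Win *head)
{
    if (head == NULL)
        return NULL;
    Win *rest = head->next;
    free(head);
    return rest;
}

/* Walk to the last node, i.e. the oldest surviving toplevel. */
Win *win_tail(Win *head)
{
    while (head != NULL && head->next != NULL)
        head = head->next;
    return head;
}
```

With a list built as "oldest" then "main", any number of `win_prepend` / `win_pop_head` cycles leaves `win_tail` returning the same node, which is why a create/destroy cycle at the head would not by itself explain a bad pointer at the far end of the list.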
No luck recreating the crash.
- performed 32 cycles of the workflow
- 7 checkpoint files created

The GUI response was slugged by the additional instrumentation - not sure if that had an effect. Will have another stab over the weekend if time permits.
Had another crash - but not under instrumented conditions unfortunately.

Crashed Thread: 0 Dispatch queue: com.apple.main-thread

Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x000000000000017d
Exception Note: EXC_CORPSE_NOTIFY

Termination Signal: Segmentation fault: 11
Termination Reason: Namespace SIGNAL, Code 0xb
Terminating Process: exc handler [0]

VM Regions Near 0x17d:
--> __TEXT 0000000000001000-0000000000016000 [ 84K] r-x/rwx SM=COW /var/folders/*/Gnucash.app/Contents/MacOS/Gnucash-bin

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 libgncmod-gnome-utils.dylib 0x002d4931 gnc_set_busy_cursor + 97
1 libgncmod-gnome-utils.dylib 0x0030a96a gnc_file_save + 186
2 libgncmod-gnome-utils.dylib 0x002f8c21 autosave_timeout_cb + 1009
3 libglib-2.0.0.dylib 0x02eaaebc g_timeout_dispatch + 28
4 libglib-2.0.0.dylib 0x02eaa4f1 g_main_context_dispatch + 209
5 libglib-2.0.0.dylib 0x02eac72a g_main_context_iterate + 410
6 libglib-2.0.0.dylib 0x02ead7b7 g_main_loop_run + 263
7 libgtk-quartz-2.0.0.dylib 0x02674e91 gtk_main + 177
8 libgncmod-gnome-utils.dylib 0x0030fe51 gnc_ui_start_event_loop + 81
9 Gnucash-bin 0x000145b5 inner_main + 693
10 libguile-2.0.22.dylib 0x023a9cb9 invoke_main_func + 57
11 libguile-2.0.22.dylib 0x0237ddb2 c_body + 18
12 libguile-2.0.22.dylib 0x0241b164 vm_regular_engine + 27060
13 libguile-2.0.22.dylib 0x024131a2 scm_c_vm_run + 114
14 libguile-2.0.22.dylib 0x023872e1 scm_call_4 + 65
15 libguile-2.0.22.dylib 0x0240ea4b scm_catch_with_pre_unwind_handler + 91
16 libguile-2.0.22.dylib 0x0237dd7a scm_i_with_continuation_barrier + 138
17 libguile-2.0.22.dylib 0x0237de5f scm_c_with_continuation_barrier + 79
18 libguile-2.0.22.dylib 0x0240bb4b with_guile_and_parent + 203
19 libgc.1.dylib 0x0254921f GC_call_with_stack_base + 31
20 libguile-2.0.22.dylib 0x0240bbbb scm_i_with_guile_and_parent + 43
21 libguile-2.0.22.dylib 0x023a9c5a scm_boot_guile + 58
22 Gnucash-bin 0x0001419d main + 1037
23 Gnucash-bin 0x000131e6 start + 54

I had just requested a "Reports->Income & Expense->Equity Statement". No bill invoices or payments.

I do leave the app running - on a laptop that sleeps when not in use - but that wouldn't have influenced the second reported crash (2017-03-08-150547).

Unlikely but...could the open menu dropdown be the culprit? I was just about to click, or had just clicked, on "Equity Statement" in the drop-down (can't say for sure) when the crash happened.

Will bite the bullet and use instrumented configuration for future use!
I leave GnuCash running for weeks at a time with no ill effects, but I don't use the business features and I seldom run reports so I'm not likely to encounter the problem you're having. On MacOS the menus are delegated to the OS, so not only are they not toplevel windows (which they wouldn't be anyway) they're not even Gtk widgets. Remember, though, that I don't think that the problem is an incorrect list operation or improper destruction of a window. Rather I think that something else is stomping on the memory allocated for one of the list nodes, corrupting its pointers. Did you remember to preserve the gnucash.trace from the crash before restarting GnuCash?
DOH!
GnuCash bug tracking has moved to a new Bugzilla host. The new URL for this bug is https://bugs.gnucash.org/show_bug.cgi?id=779792. Please continue processing the bug there and please update any external references or bookmarks.