GNOME Bugzilla – Bug 690118
Crash when closing last tab of a window using Ctrl-w
Last modified: 2013-01-17 14:00:36 UTC
Program received signal SIGSEGV, Segmentation fault. invalid_closure_notify (instance=0x8f2a80, closure=0x7b70a0) at gsignal.c:3741 3741 handler_unref_R (signal_id, instance, handler);
+ Trace 231287
Created attachment 231408 [details] [review] Patch
Created attachment 231410 [details] [review] Misc mangling of dispose for the lols
FWIW, it seems to be unrelated to bug 689919.
Review of attachment 231408 [details] [review]: ::: src/ephy-window.c @@ +3486,3 @@ + gtk_window_remove_accel_group (GTK_WINDOW (window), + gtk_ui_manager_get_accel_group (priv->manager)); + Why do we need to do this at all? Shouldn't GTK+ take care of it?
I'll move this to GTK+. It's silly to think that all apps adding a accel group need to manually remove it in dispose. If anything, this should be done in gtk_window_dispose().
Looks like I'm wrong, and this is indeed another instance of bug 689919. I was bisecting gtk+ with no success, then I tried going back to before the closure changes in glib, and it stopped crashing.
And yeah, by reverting all of the closure invalidation commits up to and including "[gsignal] disconnect invalidated closures" I get rid of the crash (for that only desrt commits need to be reverted) and assertions.
*** Bug 691306 has been marked as a duplicate of this bug. ***
*** Bug 689919 has been marked as a duplicate of this bug. ***
Ryan? People keep reporting this bug and it seems that it is indeed caused by a patch of yours.
Note that the crash goes away if one removes : handler->has_invalid_closure_notify = 0; in glib invalid_closure_notify and uses remove_invalid_closure_notify instead. But critical warnings ensue (which are not visible in the UI , ie no crash still references leaks. (epiphany:19148): GLib-GObject-CRITICAL **: handler_unref_R: assertion `handler->ref_count > 0' failed and a: (epiphany:19148): GLib-GObject-WARNING **: /home/prahal/checkout/gnome/glib/gobject/gclosure.c:697: unable to remove uninstalled invalidation notifier: 0x7ffff1025dad (0x7fff7801f000) I made a rough summary of my workaroud in bug 691306 . Though I admit my patches was not much better than a simple workaround. Also if the accelgroup is removed before the ui manager is finalized in ephy_window_dispose the issue will be hidden the same way as with moving the ui manager finalize in ephy_window_finalize (the accelgroup will be removed before ui manager fnalization). My quest ended at ui manager finalization which ends up removing the signal handler for accel "<Primary>w" at (gdb) bt
+ Trace 231354
I will give it another shot. Thanks for the hint that "references have to be dropped in dispose, not finalize". The fact that it crashes with the glib patch: commit a36028f386708bb10d7c3817b5da5201777dc16c Author: Ryan Lortie <desrt@desrt.ca> Date: Sat Oct 13 12:16:32 2012 -0400 gsignal: really fix closure invalidation Commit 66b0d95f0ba1939882368b47b01f93289c42ae07 missed this part of the patch. diff --git a/gobject/gsignal.c b/gobject/gsignal.c index 70167af..de95fcb 100644 --- a/gobject/gsignal.c +++ b/gobject/gsignal.c @@ -3737,6 +3737,7 @@ invalid_closure_notify (gpointer instance, handler = handler_lookup (instance, 0, closure, &signal_id); /* GClosure removes our notifier when we're done */ + handler->has_invalid_closure_notify = 0; handler_unref_R (signal_id, instance, handler); SIGNAL_UNLOCK (); is easely fixed by checking the return value from handler_lookup though this would really be a harsh workaround and render any attempt to fix the reference count issue of the handler quite an achievement .
Ref counting steps leading to crash 1) Breakpoint 13, g_signal_emit (instance=0x7fff782a1180, signal_id=220, detail=2375) at /home/prahal/checkout/gnome/glib/gobject/gsignal.c:3370 3370 va_start (var_args, detail); (gdb) bt
+ Trace 231356
Created attachment 233004 [details] refcounting steps before the free and NULL dereference bugzilla seems to cleanup this too much so here is the intial paste non mangled.
The commit introducing the crash fixes an obviously-correct testcase failure and I don't believe it's causing a true regression in this case (ie: it's really just upgrading a critical to an outright segfault -- the code is clearly broken either way). I still suspect the real bug is in Gtk (UI manager?) somewhere but I don't dismiss the idea of it being in GLib. I'd absolutely love a simple testcase that demonstrates it.
Created attachment 233019 [details] group refcount lifecycle the closure is attached to The closure is invalidated when the group is finalized that is when gtk_ui_manager_finalize (ephy_window_dispose). The previous "233004: refcounting steps before the free and NULL dereference" backtraces shows invalidation of this group :
+ Trace 231357
I still do not know where the accel is attached to the gtkactiongroup and gtkaction end of life.
Well sorry again for the bugzilla not fu to me. "The closure is invalidated when the group is finalized that is when gtk_ui_manager_finalize (ephy_window_dispose). The previous "233004: refcounting steps before the free and NULL dereference" backtraces shows invalidation of this group . leads to call to the invalidate notifiers (step "5)") when the action is finalized in remove_action (gtkactiongroup.c). And step "6)" when the gtkactiongroup is finalized in signal_emit_unlocked_R. - The latter is an artefact demonstrated in the attached gdb session accel, ie the actiongroup is in fact unref in the gtk ui manager finalization once but as _gtk_action_emit_activate took a reference on it before gtk ui manager was finalized, it ends up freed when _gtk_action_emit_activate remove this reference. As _gtk_action_emit_activate is called from closure_accel_activate."
step 5 of 233004: refcounting steps before the free and NULL dereference. was awkward as the closure was not paired with the handler which was unrefed. It turns out that: 3729 static void 3730 invalid_closure_notify (gpointer instance, 3731 GClosure *closure) 3732 { 3733 Handler *handler; 3734 guint signal_id; 3735 3736 SIGNAL_LOCK (); (gdb) l 3737 3738 handler = handler_lookup (instance, 0, closure, &signal_id); 3739 /* GClosure removes our notifier when we're done */ 3740 handler->has_invalid_closure_notify = 0; 3741 //remove_invalid_closure_notify (handler, instance); 3742 handler_unref_R (signal_id, instance, handler); 3743 3744 SIGNAL_UNLOCK (); 3745 } 3746 as previous g_signal_handler_disconnect set the sequential_number of the handler to 0 , handler_lookup matches our handler even for non matching closure when passed the handler_id 0 in invalid_closure_notify. This fixes the segfault but still leaves two critical messages: (epiphany:7280): GLib-GObject-CRITICAL **: handler_unref_R: assertion `handler->ref_count > 0' failed
Created attachment 233629 [details] small demo program Here is something like the smallest possible use of GtkUIManager that demonstrates the issue.
Created attachment 233639 [details] [review] gtkaccelgroup: ref the action and actiongroup before activating Fixes the two : (epiphany:7280): GLib-GObject-CRITICAL **: handler_unref_R: assertion `handler->ref_count > 0' failed left after fixing the segfault in glib via a hack, that I will follow. This gtk+ fix looks clean while the glib one I will attach is not.
Created attachment 233641 [details] [review] signal: do not set the sequential number to 0 on disconnect. the glib side that fix the segfault but leaves crtical warnings. Crude hack it seems as there might be code that depends on this sequential num equal 0 meaning handler disconnected. Sadly invalid_closure_notify also use it as meaning invalid handler, only select handlers on closure equality. When disconnected a handler matches always in the invalid_closure_notify call to handler_lookup.
Created attachment 233647 [details] [review] gsignal: fix closure invalidation code This is the bug that has been causing segfaults and criticals when accel keys are used to close windows via GtkUIManager. The main cause of this problem was a mistake made in the original patch when modifying the handler_lookup() to take the extra 'closure' parameter. The original check used was: if (handler->sequential_number == handler_id || (closure && handler->closure == closure)) It was called to find a particular closure like so: handler_lookup (instance, 0, closure, &signal_id); The problem is that the check will return if either the signal ID or closure matches (if a closure was given). The calling code assumes 0 to be an invalid signal ID which will match no handlers, but unfortunately the rest of gsignal code uses this to denote a signal that has already been disconnected. The result is that this function was searching for a matching closure _or_ the first already-disconnected handler. When it found the already-disconnected handler, we'd get criticals and crashes. The condition has been corrected; it now ignores the handler_id parameter if the closure parameter is non-NULL. While we're in here, change the lifecycle of the invalidation notify to be easier to understand. Before, the notify was removed when the last reference on the handler dropped. This could happen in very many situations; often at the end of an emission. Instead, we now tie the registration of the notifier to the lifecycle of the signal connection. When the signal is disconnected we remove the notification, even if other references are held (eg: because it is currently being dispatched).
Comment on attachment 233647 [details] [review] gsignal: fix closure invalidation code Many thanks Ryan! That seems to do the trick very nicely.
The Ubuntu guys tried an upload of glib with this patch and ran it against the autopkgtests. It fixed all of the known issues and didn't introduce any new ones. Good enough for me. Attachment 233647 [details] pushed as d89fb7b - gsignal: fix closure invalidation code