GNOME Bugzilla – Bug 642787
MetaWindowActor has a dangling reference to its MetaWindow
Last modified: 2011-04-05 12:28:49 UTC
Occasionally get a crash when exiting the overview; here's backtrace output from one instance of a crash. running on ubuntu 10.10 + xorg-edgers packages for latest open source ati/radeon driver for a "VGA compatible controller: ATI Technologies Inc Broadway PRO [Mobility Radeon HD 5800 Series]" [Thread 0x7fffc4555700 (LWP 31140) exited] Window manager warning: Log level 8: _gather_pid_callback: assertion `data != NULL' failed [New Thread 0x7fffc4555700 (LWP 31186)] (gnome-control-center:31185): user-accounts-cc-panel-DEBUG: calling 'ListCachedUsers' (gnome-control-center:31185): user-accounts-cc-panel-WARNING **: Failed to run apg: Failed to execute child process "apg" (No such file or directory) (gnome-control-center:31185): user-accounts-cc-panel-WARNING **: Failed to load faces: Error opening directory '/home/jbarnett/gnome-shell/install/share/pixmaps/faces': No such file or directory (gnome-control-center:31185): user-accounts-cc-panel-DEBUG: Got 0 users [Thread 0x7fffc4555700 (LWP 31186) exited] Window manager warning: Log level 16: invalid uninstantiatable type `(null)' in cast to `GObject' Window manager warning: Log level 8: g_object_get_data: assertion `G_IS_OBJECT (object)' failed Window manager warning: Log level 8: g_ascii_strncasecmp: assertion `s2 != NULL' failed Program received signal SIGSEGV, Segmentation fault. 0x00007ffff1b78ae4 in ?? () from /lib/libc.so.6 (gdb) t a a bt
+ Trace 226033
Thread 1 (Thread 0x7ffff7fba960 (LWP 30578))
(gdb) call gjs_dumpstack() == Stack trace for context 0xd73450 == 0 [native frame] 1 anonymous(animate = true, actor = [0xd497c0 MetaWindowActor]) ["/home/jbarnett/gnome-shell/source/gnome-shell/js/ui/windowManager.js":272] 2 anonymous([object Object]) ["/home/jbarnett/gnome-shell/source/gnome-shell/js/ui/windowManager.js":123] 3 anonymous([object Object]) ["/home/jbarnett/gnome-shell/install/share/gjs-1.0/lang.js":110] 4 _emit(name = "showing") ["/home/jbarnett/gnome-shell/install/share/gjs-1.0/signals.js":124] 5 anonymous() ["/home/jbarnett/gnome-shell/source/gnome-shell/js/ui/overview.js":584] 6 anonymous() ["/home/jbarnett/gnome-shell/source/gnome-shell/js/ui/overview.js":513] 7 anonymous() ["/home/jbarnett/gnome-shell/source/gnome-shell/js/ui/overview.js":638] 8 anonymous([0x10c6830 ClutterRectangle], [object _private_Clutter_Event]) ["/home/jbarnett/gnome-shell/source/gnome-shell/js/ui/panel.js":863] 9 anonymous([0x10c6830 ClutterRectangle], [object _private_Clutter_Event]) ["/home/jbarnett/gnome-shell/install/share/gjs-1.0/lang.js":110] (gdb) q
It seems that actor.get_meta_window() is returning something that is not a GObject, probably a finalized one, thus making GJS believe it is a new GType (G_TYPE_INVALID, most likely), and looking up introspection info fails with a segfault. Currently MetaWindowActor does not reference its MetaWindow. It is ever possible that outside code finalizes MetaWindow without notifying MetaWindowActor? As I read the code, priv->window is only set when setting the meta-window GObject property, so it is indeed possible, if the lifetime of MetaWindowActor is not predictable enough. If that changed, would it create a reference cycle? Should it add a weak pointer / weak reference?
Joe: does this happen after using the (X) icon to close windows from the overview maybe? (In reply to comment #1) > Currently MetaWindowActor does not reference its MetaWindow. It is ever > possible that outside code finalizes MetaWindow without notifying > MetaWindowActor? It seems like if you have a non-instantaneous window-destroy animation (which we do), then during the animation, the MetaWindow will have been finalized, but the MetaWindowActor's priv->window will still point to the finalized memory. > If that changed, would it create a reference cycle? Should it add a weak > pointer / weak reference? I don't think we have to worry about reference cycles here, since the object lifetime is controlled by external events; as long as MetaWindowActor unrefs its priv->window when its destroy effect completes (rather than from dispose/finalize), then there shouldn't be a leak. (Alternatively, we could just set priv->window to NULL from meta_window_actor_destroy(), but that would probably break code elsewhere (like in the stack trace above) that assumes that actor.get_meta_window() will never return NULL.)
haven't been able to discern a pattern yet, but possible the (X) is related... will see if I can figure anything out...
(In reply to comment #2) > (Alternatively, we could just set priv->window to NULL from > meta_window_actor_destroy(), but that would probably break code elsewhere (like > in the stack trace above) that assumes that actor.get_meta_window() will never > return NULL.) Though we'd still have the issue that code in MetaWindow is likely not expecting to be called on a destroyed/unmanaged X window... e.g., if you tried to move the window it might go ahead and move a new window that the X id was recycled for. My initial instinct is that it's better to deal with that rather than have priv->window NULL: - you might want to know what window the actor *was* rather than have it be an empty shell - References to a metaWindow can be saved in JS anyways
(In reply to comment #4) > (In reply to comment #2) > > (Alternatively, we could just set priv->window to NULL from > > meta_window_actor_destroy(), but that would probably break code elsewhere (like > > in the stack trace above) that assumes that actor.get_meta_window() will never > > return NULL.) > > Though we'd still have the issue that code in MetaWindow is likely not > expecting to be called on a destroyed/unmanaged X window... e.g., if you tried > to move the window it might go ahead and move a new window that the X id was > recycled for. > > My initial instinct is that it's better to deal with that rather than have > priv->window NULL: > > - you might want to know what window the actor *was* rather than have it be an > empty shell > - References to a metaWindow can be saved in JS anyways meta_window_actor_destroy calls clutter_actor_destroy (at some point), which does g_object_run_dispose. Unless you want to unref the MetaWindow in finalize, which is the opposite of what Dan said, you need to deal with NULL windows at the JS level, or avoid accessing disposed MetaWindowActors.
Another possibility would be to get rid of the "window is destroyed but MetaWindowActor still exists" state, by actually destroying the MetaWindowActor when its MetaWindow is destroyed, but reparenting its contents out into another container first, and letting the destroy animation act on that actor (which it would destroy itself when it was done). This would involve non-MetaWindowActors in the window_group though, so it could create new kinds of lossage.
(In reply to comment #5) > > - you might want to know what window the actor *was* rather than have it be an > > empty shell > > - References to a metaWindow can be saved in JS anyways > > meta_window_actor_destroy calls clutter_actor_destroy (at some point), which > does g_object_run_dispose. Unless you want to unref the MetaWindow in finalize, > which is the opposite of what Dan said, you need to deal with NULL windows at > the JS level, or avoid accessing disposed MetaWindowActors. Well, depends on whether the MetaWindowActor has been destroyed yet or nor. My assumption was that it hadn't. After the MetaWindowActor has been destroyed, I don't think there is any interest in knowing what it pointed to.
(In reply to comment #6) > Another possibility would be to get rid of the "window is destroyed but > MetaWindowActor still exists" state, by actually destroying the MetaWindowActor > when its MetaWindow is destroyed, but reparenting its contents out into another > container first, and letting the destroy animation act on that actor (which it > would destroy itself when it was done). This would involve non-MetaWindowActors > in the window_group though, so it could create new kinds of lossage. I'd eventually like to get rid of MetaWindowActor being a group with contents in favor of it just painting itself. Already, the shadow isn't an actor - it's a non-actor object painted from meta_window_actor_paint().
(In reply to comment #7) > (In reply to comment #5) > > > - you might want to know what window the actor *was* rather than have it be an > > > empty shell > > > - References to a metaWindow can be saved in JS anyways > > > > meta_window_actor_destroy calls clutter_actor_destroy (at some point), which > > does g_object_run_dispose. Unless you want to unref the MetaWindow in finalize, > > which is the opposite of what Dan said, you need to deal with NULL windows at > > the JS level, or avoid accessing disposed MetaWindowActors. > > Well, depends on whether the MetaWindowActor has been destroyed yet or nor. My > assumption was that it hadn't. After the MetaWindowActor has been destroyed, I > don't think there is any interest in knowing what it pointed to. If you don't destroy the actor, it will continue to show. OTOH, if you don't drop all references to the actor (which is difficult to predict, in a GC'ed runtime), you either have NULL, a dangling pointer, or a pointer to an invalid MetaWindow (corresponding to a recycled XID).
I think given the current state of things, allowing the corresponding MetaWindow to be NULL is the least invasive fix here. This only matters in theory during the destroy animation, and I think it's fine if the MetaWindow is no longer available there.
Created attachment 182312 [details] [review] MetaWindowActor: Use a weak pointer for corresponding MetaWindow If the MetaWindow goes away, we'll be pointing to free'd memory if something (e.g. gnome-shell JS code) calls meta_window_actor_get_window().
Review of attachment 182312 [details] [review]: Weak reference seems the worst of both worlds ... it means that actor.meta_window works until the Javascript garbage collector runs and collects the proxy object for the window, which sounds like a recipe for unpredictable bugs. I don't think this patch would fix the crash either - it would just turn the crash into a series of Javascript backtraces that probably would keep the user from entering and leaving the overview. I attached a fix for the problem that was resulting in references destroyed actors to bug 644167 (I think, I couldn't easily reproduce this bug - it was hard to figure out the logic to figure out the exact circumstances we could end up with a destroyed window and easier top just rewrite it.) Let's see how it works to just NULL out priv-.window field when the window is unmanaged, will attach a patch to that effect, but not going to apply it immediately before the release tonight.
Created attachment 182796 [details] [review] MetaWindowActor: keep a reference to the MetaWindow In testing the NULL-out approach didn't even come close to working for windows with a destroy animation since we accessed the window to find out how to draw the shadow. We'd probably need to cache how we last drew the shadow and audit all other uses of priv->window. Here's the simple approach of keeping a strong reference that we break when the actor is destroyed, which it will reliably be at some point. === Until the actor is destroyed, we need to have access to the MetaWindow to access some fields used for painting. Keep a strong reference to the window rather than just hoping the window will not be freed.
Review of attachment 182796 [details] [review]: This seems the best approach to me, except for the part about the MetaWindow kept alive while holding a possibly invalid XID (but it's not a regression, its lifetime was dependent on GC and thus unpredictable anyway).
(In reply to comment #12) > Review of attachment 182312 [details] [review]: > > Weak reference seems the worst of both worlds ... it means that > actor.meta_window works until the Javascript garbage collector runs and > collects the proxy object for the window, which sounds like a recipe for > unpredictable bugs. I don't think this patch would fix the crash either - it > would just turn the crash into a series of Javascript backtraces that probably > would keep the user from entering and leaving the overview. (Just for future reference, my assumption was that we would start testing for null inside the JS code)
Review of attachment 182796 [details] [review]: ::: src/compositor/meta-window-actor.c @@ +542,3 @@ + g_object_ref (priv->window); + if (old_window) + g_object_unref (old_window); Why not write this as: if (priv->window) g_object_unref (priv->window); priv->window = g_value_dup_object (value);
bug 644753 may be the same thing as this?
*** Bug 644753 has been marked as a duplicate of this bug. ***
(In reply to comment #16) > Review of attachment 182796 [details] [review]: > > ::: src/compositor/meta-window-actor.c > @@ +542,3 @@ > + g_object_ref (priv->window); > + if (old_window) > + g_object_unref (old_window); > > Why not write this as: > > if (priv->window) > g_object_unref (priv->window); > priv->window = g_value_dup_object (value); Think it's good habit to be in not to write that because of refcounting problems when set back to the same value. Either something like I did or write: MetaWindow *new_window = g_value_get_object (value); if (new_window != priv->window) { if (priv->window) g_object_unref (priv->window); priv->window = g_object_ref (new_window); } are correct idioms. It doesn't matter here, of course.
Hmm, actually, your variant is fine because the gvalue holds a reference in that case.
Pushed (with suggested shortening). Though more complicated changes to NULL out the field might make some sense, I think this is good enough to close the bug. If problems come up from having a reference to an unmanaged MetaWindow we can fix later. Attachment 182796 [details] pushed as 349fb7c - MetaWindowActor: keep a reference to the MetaWindow
*** Bug 637045 has been marked as a duplicate of this bug. ***