GNOME Bugzilla – Bug 743865
MetaSurfaceActorWayland: unset the surface when it goes away
Last modified: 2015-02-23 21:25:17 UTC
In some cases we outlive the wayland surface and end up trying to access an already free'd structure resulting in crashes. This patch provides a way to unset the surface and calls it when the wayland surface gets destroyed.
Created attachment 295956 [details] [review] MetaSurfaceActorWayland: unset the surface when it goes away
There's probably a better way to deal with this but this patch fixes the following crash for me: Breakpoint 1, meta_surface_actor_get_window (self=0x320b100) at compositor/meta-surface-actor.c:363 363 return META_SURFACE_ACTOR_GET_CLASS (self)->get_window (self); (gdb) bt
+ Trace 234609
$7 = {parent = {parent_instance = {g_type_instance = {g_class = 0x2f11890}, ref_count = 2, qdata = 0x4bf6ca0}, flags = 22, private_flags = 0, priv = 0x320ae00}, priv = 0x320ade0} (gdb) s [Thread 0x7fb117ffe700 (LWP 10917) exited] [Thread 0x7fb1239c6700 (LWP 10916) exited] meta_surface_actor_wayland_get_window (actor=0x320b100) at compositor/meta-surface-actor-wayland.c:140 140 MetaSurfaceActorWaylandPrivate *priv = meta_surface_actor_wayland_get_instance_private (META_SURFACE_ACTOR_WAYLAND (actor)); (gdb) n 142 return priv->surface->window; (gdb) p priv->surface $8 = (MetaWaylandSurface *) 0x4c7dd20 (gdb) p *priv->surface $9 = {resource = 0x4c3ddd0, compositor = 0x0, surface_actor = 0xaaaaaaaaaaaaaaaa, window = 0xaaaaaaaaaaaaaaaa, buffer = 0xaaaaaaaaaaaaaaaa, buffer_destroy_listener = {link = {prev = 0xaaaaaaaaaaaaaaaa, next = 0xaaaaaaaaaaaaaaaa}, notify = 0xaaaaaaaaaaaaaaaa}, scale = -1431655766, offset_x = -1431655766, offset_y = -1431655766, subsurfaces = 0xaaaaaaaaaaaaaaaa, pending = {newly_attached = -1431655766, buffer = 0xaaaaaaaaaaaaaaaa, buffer_destroy_listener = {link = {prev = 0xaaaaaaaaaaaaaaaa, next = 0xaaaaaaaaaaaaaaaa}, notify = 0xaaaaaaaaaaaaaaaa}, dx = -1431655766, dy = -1431655766, scale = -1431655766, damage = 0xaaaaaaaaaaaaaaaa, input_region = 0xaaaaaaaaaaaaaaaa, opaque_region = 0xaaaaaaaaaaaaaaaa, frame_callback_list = {prev = 0xaaaaaaaaaaaaaaaa, next = 0xaaaaaaaaaaaaaaaa}, new_geometry = {x = -1431655766, y = -1431655766, width = -1431655766, height = -1431655766}, has_new_geometry = -1431655766}, xdg_surface = 0xaaaaaaaaaaaaaaaa, xdg_popup = 0xaaaaaaaaaaaaaaaa, wl_shell_surface = 0xaaaaaaaaaaaaaaaa, gtk_surface = 0xaaaaaaaaaaaaaaaa, wl_subsurface = 0xaaaaaaaaaaaaaaaa, xdg_shell_resource = 0xaaaaaaaaaaaaaaaa, acked_configure_serial = {set = -1431655766, value = 2863311530}, has_set_geometry = -1431655766, sub = {parent = 0xaaaaaaaaaaaaaaaa, parent_destroy_listener = {link = {prev = 0xaaaaaaaaaaaaaaaa, next = 0xaaaaaaaaaaaaaaaa}, notify = 0xaaaaaaaaaaaaaaaa}, synchronous = -1431655766, pending = {newly_attached = -1431655766, buffer = 0xaaaaaaaaaaaaaaaa, buffer_destroy_listener = {link = {prev = 0xaaaaaaaaaaaaaaaa, next = 0xaaaaaaaaaaaaaaaa}, notify = 0xaaaaaaaaaaaaaaaa}, dx = -1431655766, dy = -1431655766, scale = -1431655766, damage = 0xaaaaaaaaaaaaaaaa, input_region = 0xaaaaaaaaaaaaaaaa, opaque_region = 0xaaaaaaaaaaaaaaaa, frame_callback_list = {prev = 0xaaaaaaaaaaaaaaaa, next = 0xaaaaaaaaaaaaaaaa}, new_geometry = {x = -1431655766, y = -1431655766, width = -1431655766, height = -1431655766}, has_new_geometry = -1431655766}, pending_x = -1431655766, pending_y = -1431655766, pending_pos = -1431655766, pending_placement_ops = 0xaaaaaaaaaaaaaaaa}}
Do you have a repro?
(In reply to comment #3) > Do you have a repro? In g-c-c's Displays panel, open one of the output configuration dialogs and just dismiss it with Escape. Crashes reliably for me that way. I tried to do a standalone gtk+ test with the same kind of dialog but then it doesn't crash. Either a timing thing or I'm not mimicing exactly what g-c-c is doing.
See, the patch doesn't really make sense to me when I think about the proper protocol semantics. While I don't want to leave an open crasher out there if the client does something against protocol, we should identify what's going wrong, fix GTK+ and have mutter send out an error to the client if they use the protocol wrong.
Ok, here's what I found: - xdg_surface_destructor() and thus meta_window_unmanage() and meta_stack_tracker_queue_sync_stack() are called - wl_surface_destructor() is called and free()'s the MetaWaylandSurface - the queued meta_stack_tracker_sync_stack() is called from the main loop and emits MetaScreen::restacked which gnome-shell's windowManager.js handles by calling global.sync_pointer() which pushes a CLUTTER_MOTION event - that motion event, gets turned into a CLUTTER_LEAVE event in _clutter_input_device_set_actor() for the MetaWindowActor corresponding to the MetaWaylandSurface that's already destroyed - mutter's handling of this leave event ends up in get_window_for_event() and meta_surface_actor_get_window() for the MetaSurfaceActorWayland which is still alive since we're animating the window unmanagement and this is where we crash because MetaSurfaceActorWayland->surface is already free()'d. So, unless there's something above which we shouldn't be doing (or should do differently), we have to protect against with something like this patch or probably something more elegant which I can't think of at the moment :-)
Created attachment 296043 [details] [review] input-device: Reset the focused actor when it becomes unreactive
Review of attachment 296043 [details] [review]: ::: clutter/clutter-input-device.c @@ +686,3 @@ +{ + if (!clutter_actor_get_reactive (actor)) + _clutter_input_device_unassociate_actor (device, actor, TRUE); shouldn't this be FALSE?
Yes, it should. Nice catch.
Review of attachment 296043 [details] [review]: looks generally good, with the change Rui identified.
Just to leave some context here for future reference, here's how Jasper's patch ended up in this bug: 16:45 < Jasper> rtcm, actually, shouldn't meta_window_unmanage destroy the window actor as well, or at least make it unreactive? 16:50 < rtcm> Jasper: make it unreactive makes sense, let me try 16:51 < Jasper> rtcm, we already do 16:51 < Jasper> rtcm, meta_window_unmanage calls meta_wayland_surface_set_window, which makes it unreactive 16:51 < Jasper> rtcm, note that MetaWindowActor isn't reactive to begin with -- you're probably confusing MetaWindowActor with MetaSurfaceActorWayland 17:22 < rtcm> Jasper: it's complicated. clutter generates a leave event for the actor that was associated with the device without checking if it's reactive: https://git.gnome.org/browse/clutter/tree/clutter/clutter-input-device.c#n692 17:23 < rtcm> so maybe that's a clutter bug? 17:23 < Jasper> rtcm, I had a patch for that somewhere in my local tree
*** Bug 745046 has been marked as a duplicate of this bug. ***