GNOME Bugzilla – Bug 745734
Closing an app may crash gnome wayland session
Last modified: 2015-03-16 17:06:38 UTC
QXL driver, continuous 20150306.2 gnome-shell 3.15.91-12-g986b14fb1b0d70d01adef44e5358ec72407f1a93 mutter 3.15.91-4-g94c3c8f412ce997f1ff5fadb6f9a78f7ba5891c3 Start gda-browser-5.0 and click Cancel button / Quit from gapp menu
+ Trace 234792
Thread 12 (Thread 0x7f4d67fff700 (LWP 1738))
Thread 1 (Thread 0x7f4daaffb980 (LWP 1435))
Other affected apps: nm-connection-editor, epiphany, clocks Unaffected: gpk-application
Looks like it could be a regression from https://bugzilla.gnome.org/show_bug.cgi?id=745163
Hmm, this was amusing. Bug #745163 is not the cause, but apparently introduced an almost sure crash in this situation. What's happening here is that mutter receives wl_client_destroy(), this call attempts to destroy all client resources, although in an unordered manner. When this results in the wl_surface being destroyed before any role, the still living resources are left with a dangling pointer as user data. I'm attaching a patch preparing mutter for that
Created attachment 298986 [details] [review] wayland: Protect against unordered destruction of surface resources If the wl_surface resource happens to be destroyed before any other role resource, the destructor for the latter will attempt to access/modify random memory. Fix this by ensuring the resources pending destruction are left with NULL data after freeing the MetaWaylandSurface, and preparing the destructors for this situation.
Review of attachment 298986 [details] [review]: ::: src/wayland/meta-wayland-surface.c @@ +1625,3 @@ + if (!surface) + return; Looks like a couple of these (subsurface and popup, at least) will leak this way. I think a better method for handling this situation would be to add destroy listeners to surface extensions and handle destruction properly even in this cases.
(In reply to Jonas Ådahl from comment #5) > Review of attachment 298986 [details] [review] [review]: > > ::: src/wayland/meta-wayland-surface.c > @@ +1625,3 @@ > > + if (!surface) > + return; > > Looks like a couple of these (subsurface and popup, at least) will leak this > way. I think a better method for handling this situation would be to add > destroy listeners to surface extensions and handle destruction properly even > in this cases. Oops right, I was expecting further resource destruction to eventually tear down the parent surface and free all child related data. But eg. subsurfaces indeed have additional data that would be left there. I guess that approach would also prepare for misbehaving clients, even if unintended.
Created attachment 299111 [details] [review] I tried the destructor callback approach, but the extra wl_listeners and cross-checking made things somewhat complex. I realized then that we can just destroy the resources (and their data with it) there, AFAICS the resources will be removed before wl_client_destroy() gets to these. With this, if some client destroys the wl_surface before its role, it will surely get an "invalid object" error when destroying the role. Docs generally state "always destroy foo_surface before its wl_surface", so we're doing nothing outside the book here. -- wayland: Protect against unordered destruction of surface resources If the wl_surface resource happens to be destroyed before any other role resource, the destructor for the latter will attempt to access/modify random memory. Fix this by ensuring the associated resources are destroyed on the wl_surface destructor, this will free all associated memory and remove the resources ahead of their imminent destruction.
FWIW, what triggers the bug is: - wl_client_destroy() destroys resources, although not in any special order - wl_surface_destructor() happens to be called before any other role destructor, the MetaWaylandSurface is freed, but the other resources outlive the wl_subsurface/data destruction - When the role that defines this wl_surface is freed, it tries to access the MetaWaylandSurface struct (set as its user data, but not owned by it), and unset the relevant fields from the struct. As the MetaWaylandSurface is already freed, this results in invalid reads/writes, crashes... The patch in comment #7 ensures that the server-side resource for the surface role is destroyed together in wl_surface_destructor(), this ensures xdg_surface_destructor/wl_subsurface_destructor/etc are called before the MetaWaylandSurface is freed. This would happen to cause "invalid object" errors if clients are just destroying things in the wrong order, but AFAICT all surface roles recommend to destroy the role before its supporting wl_surface, so it wouldn't be mutter which started the misbehavior anyway.
The patch is now on master