After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 790932 - [wayland] wl_seat capabilities is racy and can cause a crash in the clients
[wayland] wl_seat capabilities is racy and can cause a crash in the clients
Status: RESOLVED OBSOLETE
Product: mutter
Classification: Core
Component: wayland
3.26.x
Other Linux
: Normal normal
: ---
Assigned To: mutter-maint
mutter-maint
https://bugzilla.redhat.com/show_bug....
Depends on:
Blocks:
 
 
Reported: 2017-11-28 09:52 UTC by Olivier Fourdan
Modified: 2021-07-05 13:44 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
[PATCH] wayland: avoid a race in wl_seat capabilities (7.19 KB, patch)
2017-11-28 09:57 UTC, Olivier Fourdan
none Details | Review
[PATCH] wayland: avoid a race in wl_seat capabilities (6.88 KB, patch)
2017-12-14 09:55 UTC, Olivier Fourdan
none Details | Review

Description Olivier Fourdan 2017-11-28 09:52:30 UTC
Description

The way wl_seat capabilities work, by notifying clients of capabilities changes, and clients consequently requesting the relevant interface objects (pointer, keyboard, touch) is inherently racy.

On quick VT chnages for example, capabilities on the seat will be added and removed, an by the time the client receives the capability notification and requests the relevant keyboard, pointer or touch, another VT switch might have occured and the wl_pointer, wl_keyboard or wl_touch already destroyed, leading to a protocol error which kilsl the client (and when that client is Xwayland, it also kill gnome-shell/mutter).

To avoid this, we need to return a wl_keyboard, wl_pointer or wl_touch when requested even when the capabilities are not present.

This is what westons does.

Steps to reproduce:

1. log on GNOME on Wayland
2. during the GNOME startup, switch VTs back and forth.

References:

This issues was initially reported downstream in https://bugzilla.redhat.com/show_bug.cgi?id=1516859

Pekka pointed out out to avoid this issue in a reply to this thread https://lists.x.org/archives/xorg-devel/2017-November/055340.html

Relevant code in weston as pointed out by Pekka: https://cgit.freedesktop.org/wayland/weston/tree/libweston/input.c?id=3.0.0#n2342
Comment 1 Olivier Fourdan 2017-11-28 09:57:29 UTC
Created attachment 364549 [details] [review]
[PATCH] wayland: avoid a race in wl_seat capabilities

The way wl_seat capabilities work, by notifying clients of capabilities
changes, and clients consequently requesting the relevant interface
objects (pointer, keyboard, touch) is inherently racy.

On quick VT changes for example, capabilities on the seat will be added
and removed, and by the time the client receives the capability change
notification and requests the relevant keyboard, pointer or touch,
another VT switch might have occurred and the wl_pointer, wl_keyboard or
wl_touch already destroyed, leading to a protocol error which kills the
client.

To avoid this, we need to return a wl_keyboard, wl_pointer or wl_touch
when requested even when the capabilities are not present.
Comment 2 Jonas Ådahl 2017-12-11 08:46:31 UTC
Review of attachment 364549 [details] [review]:

::: src/wayland/meta-wayland-keyboard.c
@@ +997,3 @@
+   */
+  if (keyboard->xkb_info.keymap_fd < 0)
+    return;

Adding a _is_enabled() that checks this would make the early-out self explanatory IMHO. The comment is slightly missleading. Sure it's about the compositor sending a new capabilies mask, but it's really about quickly *removing* capabilities. I suggest something like

/*
 * Create an inert object if the corresponding capability was removed since
 * the client created the resource.
 */

::: src/wayland/meta-wayland-pointer.c
@@ +484,3 @@
+    pointer->pointer_clients =
+      g_hash_table_new_full (NULL, NULL, NULL,
+                             (GDestroyNotify) meta_wayland_pointer_client_free);

I don't think this is correct. The pointer client is used for managing focus of a clients pointer (including all the extensions) but resources created in the race this should not result in a pointer client being created (as that'd mean code incorrectly would think there is actually real resource(s) that should be focused).

Instead, the table should be destroyed and created as before, but with the following other changes:

1. Only ensure the pointer client when creating an active wl_pointer resource
2. Add a meta_wayland_pointer_is_inert()
3. Handle the race in the extensions, and create inert resources the above function returns TRUE

::: src/wayland/meta-wayland-touch.c
@@ +566,3 @@
+   * well...), but rather create the resource which will be destroyed
+   * shortly after anyway as the client catches up on capabilities...
+   */

I think a comment like I suggested is clearer, and it should be turned into an early-out before adding the resource to the resource list.
Comment 3 Olivier Fourdan 2017-12-12 15:43:09 UTC
(In reply to Jonas Ådahl from comment #2)
> Review of attachment 364549 [details] [review] [review]:
> [...]
> 
> ::: src/wayland/meta-wayland-pointer.c
> @@ +484,3 @@
> +    pointer->pointer_clients =
> +      g_hash_table_new_full (NULL, NULL, NULL,
> +                             (GDestroyNotify)
> meta_wayland_pointer_client_free);
> 
> I don't think this is correct. The pointer client is used for managing focus
> of a clients pointer (including all the extensions) but resources created in
> the race this should not result in a pointer client being created (as that'd
> mean code incorrectly would think there is actually real resource(s) that
> should be focused).
> 
> Instead, the table should be destroyed and created as before, but with the
> following other changes:
> 
> 1. Only ensure the pointer client when creating an active wl_pointer resource
> 2. Add a meta_wayland_pointer_is_inert()
> 3. Handle the race in the extensions, and create inert resources the above
> function returns TRUE

Sorry I am a bit lost here, “pointer->pointer_clients” is a wayland only API, I don't see it being used anywhere outside of meta-wayland-pointer.[ch], therfore I don;t undersatnd how could gnome-shell extension make use of that?

We can avoid creating an empty hash (i.e. leaving the code as it was) and check for the pointer->pointer_clients being non-NULL (as did commit b64b1591 before), but I do not understand what you mean in point #2 and #3 above...
Comment 4 Jonas Ådahl 2017-12-13 02:55:35 UTC
(In reply to Olivier Fourdan from comment #3)
> (In reply to Jonas Ådahl from comment #2)
> > Review of attachment 364549 [details] [review] [review] [review]:
> > [...]
> > 
> > ::: src/wayland/meta-wayland-pointer.c
> > @@ +484,3 @@
> > +    pointer->pointer_clients =
> > +      g_hash_table_new_full (NULL, NULL, NULL,
> > +                             (GDestroyNotify)
> > meta_wayland_pointer_client_free);
> > 
> > I don't think this is correct. The pointer client is used for managing focus
> > of a clients pointer (including all the extensions) but resources created in
> > the race this should not result in a pointer client being created (as that'd
> > mean code incorrectly would think there is actually real resource(s) that
> > should be focused).
> > 
> > Instead, the table should be destroyed and created as before, but with the
> > following other changes:
> > 
> > 1. Only ensure the pointer client when creating an active wl_pointer resource
> > 2. Add a meta_wayland_pointer_is_inert()
> > 3. Handle the race in the extensions, and create inert resources the above
> > function returns TRUE
> 
> Sorry I am a bit lost here, “pointer->pointer_clients” is a wayland only
> API, I don't see it being used anywhere outside of
> meta-wayland-pointer.[ch], therfore I don;t undersatnd how could gnome-shell
> extension make use of that?
> 
> We can avoid creating an empty hash (i.e. leaving the code as it was) and
> check for the pointer->pointer_clients being non-NULL (as did commit
> b64b1591 before), but I do not understand what you mean in point #2 and #3
> above...

Sorry, was not clear here. I meant wl_pointer Wayland extensions (such as relative pointer and pointer gestures).
Comment 5 Olivier Fourdan 2017-12-14 09:55:06 UTC
Created attachment 365525 [details] [review]
[PATCH] wayland: avoid a race in wl_seat capabilities

Updated patch
Comment 6 Adam Williamson 2017-12-15 03:02:28 UTC
Note, https://bugzilla.redhat.com/show_bug.cgi?id=1523952 is another similar downstream crash report. Not totally sure if it's the same, so didn't dupe it downstream yet.
Comment 7 Adam Williamson 2018-04-12 19:35:59 UTC
Ping? Any movement on getting this merged?
Comment 8 Olivier Fourdan 2018-04-13 06:44:02 UTC
It probably fell through the cracks with gitlab migration, I'll post a new MR in gitlab.
Comment 9 Olivier Fourdan 2018-04-13 07:46:57 UTC
https://gitlab.gnome.org/GNOME/mutter/merge_requests/77
Comment 10 Olivier Fourdan 2018-04-13 08:06:49 UTC
FWIW, I doubt bug 1523952 is the same, it's not the same interface (wl_display not wl_seat) - That other bug should be already addressed upstream though.
Comment 11 GNOME Infrastructure Team 2021-07-05 13:44:38 UTC
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org.
As part of that, we are mass-closing older open tickets in bugzilla.gnome.org
which have not seen updates for a longer time (resources are unfortunately
quite limited so not every ticket can get handled).

If you can still reproduce the situation described in this ticket in a recent
and supported software version, then please follow
  https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines
and create a new ticket at
  https://gitlab.gnome.org/GNOME/mutter/-/issues/

Thank you for your understanding and your help.