GNOME Bugzilla – Bug 745008
Race in releasing devices on VT switch
Last modified: 2021-07-05 13:44:49 UTC
When switching VT, we can hit error if there is get_keyboard/pointer/touch pending event in queue. On VT switch all devices are released and this event then uses invalid resources (There is a bug caused by that: https://bugzilla.redhat.com/show_bug.cgi?id=1193051.) First I thought that we should not release the devices on vt switch, but it is on purpose (bug 733563). So another solution is to ignore such events. The client will get new capabilities right after get_keyboard/pointer/touch and will destroy the object without using it anyway. They do it this way in Weston, too (http://cgit.freedesktop.org/wayland/weston/tree/src/input.c#n1772)
Created attachment 297634 [details] [review] Ignore seat getters with released devices
That is not the right fix. This will just break the wayland's id logic. We must create the resource, but we cannot use invalid stuff in keyboard. The resource will be destroyed right after that, because client will get new capabilities after releasing the devices.
Created attachment 297957 [details] [review] Fix racy crash on VT switch
Review of attachment 297957 [details] [review]: Obsolete
This fix is obsolete now, the same thing was introduced in 2aa6dcd9d. Anyway, the race is still there, just now it won't crash gnome-shell, but break libwayland. There are some attempts to fix this kind of race in libwayland (http://lists.freedesktop.org/archives/wayland-devel/2015-February/020276.html).
While the protocol says that wl_seat.get_pointer/keyboard/seat will only take effect if the corresponding capability is currently set, I wonder whether it would be better to change the spec to require that the requests always succeed, but creating placeholder resources (more or less what your patch does, but for wl_touch and wl_pointer as well, with the 2aa6dcd9d reverted). Wouldn't that fix both breaking libwayland and the crash, while getting rid of the wl_seat.get_* races all together?
(In reply to Jonas Ådahl from comment #6) > While the protocol says that wl_seat.get_pointer/keyboard/seat will only > take effect if the corresponding capability is currently set, I wonder > whether it would be better to change the spec to require that the requests > always succeed, but creating placeholder resources (more or less what your I don't know if it is written somewhere, but AFAIK the object creation in wayland is supposed to always succeed when the client can not get an error back (and always succeed on client-side, with the exception of lack of memory). However, even in Weston are places that do not follow this "rule". (i. .e the get_keyboard/touch/pointer that consequently suffer from this race). > patch does, but for wl_touch and wl_pointer as well, with the 2aa6dcd9d > reverted). Wouldn't that fix both breaking libwayland and the crash, while > getting rid of the wl_seat.get_* races all together? Yes, always creating resource would fix it IMO. Actually, I'm just running mutter with similar patch applied. It is basically this: ... create resource ... + if (!keyboard->display) + { + wl_resource_set_implementation (cr, &keyboard_interface, keyboard, NULL); + return; + } + else + wl_resource_set_implementation (cr, &keyboard_interface, keyboard, unbind_resource); + Generally, the problem with this approach is in implementation. This patch works well with keyboard, because keyboard's interface has just release request that does not touch the keyboard object and only destroys the resource. With other objects (i. e. pointer) this patch is not good enough, because (pointer) has request like set_cursor that uses MetaWaylandPointer object and would use it in invalid state. I think that it is what David's patch with intact objects accounts for.
(In reply to Marek Chalupa from comment #7) > (In reply to Jonas Ådahl from comment #6) > > While the protocol says that wl_seat.get_pointer/keyboard/seat will only > > take effect if the corresponding capability is currently set, I wonder > > whether it would be better to change the spec to require that the requests > > always succeed, but creating placeholder resources (more or less what your > > I don't know if it is written somewhere, but AFAIK the object creation in > wayland is supposed to always succeed when the client can not get an error > back (and always succeed on client-side, with the exception of lack of > memory). However, even in Weston are places that do not follow this "rule". > (i. .e the get_keyboard/touch/pointer that consequently suffer from this > race). > > > patch does, but for wl_touch and wl_pointer as well, with the 2aa6dcd9d > > reverted). Wouldn't that fix both breaking libwayland and the crash, while > > getting rid of the wl_seat.get_* races all together? > > Yes, always creating resource would fix it IMO. Actually, I'm just running > mutter with similar patch applied. It is basically this: > > ... create resource ... > > + if (!keyboard->display) > + { > + wl_resource_set_implementation (cr, &keyboard_interface, keyboard, > NULL); > + return; > + } > + else > + wl_resource_set_implementation (cr, &keyboard_interface, keyboard, > unbind_resource); > + Yes, except I suppose we still would want to add the resource to keyboard->resource_list as well as keep the unbind_resource destructor function when creating the placeholder object. > > Generally, the problem with this approach is in implementation. This patch > works well with keyboard, because keyboard's interface has just release > request that does not touch the keyboard object and only destroys the > resource.With other objects (i. e. pointer) this patch is not good enough, > because (pointer) has request like set_cursor that uses MetaWaylandPointer > object and would use it in invalid state. Creating a resource that eventually would work wouldn't make the situation very much worse than it already is since the situation where wl_pointer.set_cursor won't have any effect can already occur if the client doesn't stop interacting with the wl_pointer after the pointer capability is dropped from the seat. > I think that it is what David's patch with intact objects accounts for. Hmm. The inert object patch by David, doesn't it only handle the case where the server has destroyed an object but the client is still sending events to it, meaning its only relevant in the cases which involves destructor events (like wl_callback.done if I understand it correctly), meaning it won't effect the wl_seat.get_* races.
(In reply to Jonas Ådahl from comment #8) > (In reply to Marek Chalupa from comment #7) > > (In reply to Jonas Ådahl from comment #6) > > > While the protocol says that wl_seat.get_pointer/keyboard/seat will only > > > take effect if the corresponding capability is currently set, I wonder > > > whether it would be better to change the spec to require that the requests > > > always succeed, but creating placeholder resources (more or less what your > > > > I don't know if it is written somewhere, but AFAIK the object creation in > > wayland is supposed to always succeed when the client can not get an error > > back (and always succeed on client-side, with the exception of lack of > > memory). However, even in Weston are places that do not follow this "rule". > > (i. .e the get_keyboard/touch/pointer that consequently suffer from this > > race). > > > > > patch does, but for wl_touch and wl_pointer as well, with the 2aa6dcd9d > > > reverted). Wouldn't that fix both breaking libwayland and the crash, while > > > getting rid of the wl_seat.get_* races all together? > > > > Yes, always creating resource would fix it IMO. Actually, I'm just running > > mutter with similar patch applied. It is basically this: > > > > ... create resource ... > > > > + if (!keyboard->display) > > + { > > + wl_resource_set_implementation (cr, &keyboard_interface, keyboard, > > NULL); > > + return; > > + } > > + else > > + wl_resource_set_implementation (cr, &keyboard_interface, keyboard, > > unbind_resource); > > + > > Yes, except I suppose we still would want to add the resource to > keyboard->resource_list as well as keep the unbind_resource destructor > function when creating the placeholder object. > This would need more changes to code (my patch is just fast workaround until I have something better), because once the keyboard is announced again, it initializes the resource_list, thus removing anything which is there, making unbind_resource use dangling pointers (and eventually mutter crash) > > > > Generally, the problem with this approach is in implementation. This patch > > works well with keyboard, because keyboard's interface has just release > > request that does not touch the keyboard object and only destroys the > > resource.With other objects (i. e. pointer) this patch is not good enough, > > because (pointer) has request like set_cursor that uses MetaWaylandPointer > > object and would use it in invalid state. > > Creating a resource that eventually would work wouldn't make the situation > very much worse than it already is since the situation where > wl_pointer.set_cursor won't have any effect can already occur if the client > doesn't stop interacting with the wl_pointer after the pointer capability is > dropped from the seat. > Yeah, maybe it would work, but you need to add guards to handlers, right? Client interacting with pointer after capability is dropped is basically the same race. We need to make sure that we don't free (memory of) device until client gets new capability and destroys the client-side counterpart object (thus calling release on server-side object, destroying it) to suppress this race as well that we ignore requests to such object. > > I think that it is what David's patch with intact objects accounts for. > > Hmm. The inert object patch by David, doesn't it only handle the case where > the server has destroyed an object but the client is still sending events to > it, meaning its only relevant in the cases which involves destructor events > (like wl_callback.done if I understand it correctly), meaning it won't > effect the wl_seat.get_* races. I thought that something like: if (device_is_released) create_inert_resource() else create_regular_resource() should work, shouldn't? I must confess I haven't read the patch thoroughly, but from the commit message I got the feeling this could work. I'll read it ;)
(In reply to Marek Chalupa from comment #9) > (In reply to Jonas Ådahl from comment #8) > > (In reply to Marek Chalupa from comment #7) > > > (In reply to Jonas Ådahl from comment #6) > > > > While the protocol says that wl_seat.get_pointer/keyboard/seat will only > > > > take effect if the corresponding capability is currently set, I wonder > > > > whether it would be better to change the spec to require that the requests > > > > always succeed, but creating placeholder resources (more or less what your > > > > > > I don't know if it is written somewhere, but AFAIK the object creation in > > > wayland is supposed to always succeed when the client can not get an error > > > back (and always succeed on client-side, with the exception of lack of > > > memory). However, even in Weston are places that do not follow this "rule". > > > (i. .e the get_keyboard/touch/pointer that consequently suffer from this > > > race). > > > > > > > patch does, but for wl_touch and wl_pointer as well, with the 2aa6dcd9d > > > > reverted). Wouldn't that fix both breaking libwayland and the crash, while > > > > getting rid of the wl_seat.get_* races all together? > > > > > > Yes, always creating resource would fix it IMO. Actually, I'm just running > > > mutter with similar patch applied. It is basically this: > > > > > > ... create resource ... > > > > > > + if (!keyboard->display) > > > + { > > > + wl_resource_set_implementation (cr, &keyboard_interface, keyboard, > > > NULL); > > > + return; > > > + } > > > + else > > > + wl_resource_set_implementation (cr, &keyboard_interface, keyboard, > > > unbind_resource); > > > + > > > > Yes, except I suppose we still would want to add the resource to > > keyboard->resource_list as well as keep the unbind_resource destructor > > function when creating the placeholder object. > > > > This would need more changes to code (my patch is just fast workaround until > I have something better), because once the keyboard is announced again, it > initializes the resource_list, thus removing anything which is there, making > unbind_resource use dangling pointers (and eventually mutter crash) I suppose that as long as clients release their pointers/keyboards/... when the capability goes away, and requests new ones when they come back, we could just create the objects, not set an implementation, and forget about it, and it'll work anyway. Whenever the pointer is back "for good" the client would recreate all its pointer objects and they'd work as expected. > > > > > > > Generally, the problem with this approach is in implementation. This patch > > > works well with keyboard, because keyboard's interface has just release > > > request that does not touch the keyboard object and only destroys the > > > resource.With other objects (i. e. pointer) this patch is not good enough, > > > because (pointer) has request like set_cursor that uses MetaWaylandPointer > > > object and would use it in invalid state. > > > > Creating a resource that eventually would work wouldn't make the situation > > very much worse than it already is since the situation where > > wl_pointer.set_cursor won't have any effect can already occur if the client > > doesn't stop interacting with the wl_pointer after the pointer capability is > > dropped from the seat. > > > > Yeah, maybe it would work, but you need to add guards to handlers, right? > Client interacting with pointer after capability is dropped is basically the > same race. We need to make sure that we don't free (memory of) device until > client gets new capability and destroys the client-side counterpart object > (thus calling release on server-side object, destroying it) to suppress this > race as well that we ignore requests to such object. Yes, those things would need to be adapted as well. > > > > I think that it is what David's patch with intact objects accounts for. > > > > Hmm. The inert object patch by David, doesn't it only handle the case where > > the server has destroyed an object but the client is still sending events to > > it, meaning its only relevant in the cases which involves destructor events > > (like wl_callback.done if I understand it correctly), meaning it won't > > effect the wl_seat.get_* races. > > I thought that something like: > > if (device_is_released) > create_inert_resource() > else > create_regular_resource() > > should work, shouldn't? I must confess I haven't read the patch thoroughly, > but from the commit message I got the feeling this could work. I'll read it > ;) To be honest, I didn't read the content of the patch either (would probably rather want to run it and see what it produces), but the commit message confuses me a bit since it begins with "...with object that the server has deleted but that the client is still requesting." which I thought could only happen with destructor events. But then he talks about get_pointer which is racy since the spec, weston and mutter violates the rule "object creation never fails" regarding those requests. I suppose one could change the spec to say that wl_seat.get_pointer/... always creates a an object (instead of say it only takes effect when ...) but the object my be non-functional if the capability has been removed on the server, and that the client is expected to recreate its objects when the capability comes back. Since we don't have any requests creating new objects, we can just say all requests are ineffective, but that'd probably mean that that cannot change in the future. As far as I know the wl_registry.remove_global is not a 'destructor event', and the global needs to be destroyed the way that particular global should be destroyed. Right now there is no way to destroy wl_seat so we'll always leak those no matter what in that case.
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org. As part of that, we are mass-closing older open tickets in bugzilla.gnome.org which have not seen updates for a longer time (resources are unfortunately quite limited so not every ticket can get handled). If you can still reproduce the situation described in this ticket in a recent and supported software version, then please follow https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines and create a new ticket at https://gitlab.gnome.org/GNOME/mutter/-/issues/ Thank you for your understanding and your help.