GNOME Bugzilla – Bug 762618
wayland: Switching between fullscreen and unfullscreen using F11 in google-chrome causes gnome-shell to hang
Last modified: 2021-07-05 13:51:54 UTC
+++ This bug was initially created as a clone of Bug #762468 +++ Description Using F11 to toggle google-chrome fullscreen and back in Wayland cause gnome-shell to hang, calling the fullscreen animations continuously. Steps to reproduce 1. Open google-chrome 2. Press F11 to switch to fullscreen 3. Press F11 to switch back to normal Actual result gnome-shell becomes unresponsive and takes a lot of CPU Expected result google-chrome switches back to normal and gnome-shell remains usable Additional data Atatching gdb to the gnome-shell process shows it's taking a lot of CPU in mesa. Thread 1 "gnome-shell" received signal SIGINT, Interrupt. 0x00007fb44b8a620e in convert_ubyte (void_dst=<optimized out>, num_dst_channels=<optimized out>, void_src=<optimized out>, src_type=<optimized out>, num_src_channels=<optimized out>, swizzle=<optimized out>, normalized=true, count=1024) at main/format_utils.c:1005 1005 SWIZZLE_CONVERT(uint8_t, uint8_t, src); (gdb) bt
+ Trace 235995
Thiss is because shell_util_get_content_for_window_actor() from src/shell-util.c is being called continuously. This leads us to Javascript code in _fullscreenAnimation() in js/ui/windowManager.js Adding traces to _sizeChangeWindow() _fullscreenWindow() _unfullscreenWindow() and _fullscreenAnimation() shows that sizeChangeWindow() is called continuously leading to fullscreen/unfulscreen, thus triggering the fullscreenAnimation continuously: Feb 24 15:22:47 x61 org.gnome.Shell.desktop[4684]: _fullscreenAnimation: [0x564885a00bd0 MetaWindowActor] Feb 24 15:22:47 x61 org.gnome.Shell.desktop[4684]: _unfullscreenWindow: [0x564885a00bd0 MetaWindowActor] Feb 24 15:22:47 x61 org.gnome.Shell.desktop[4684]: _sizeChangeWindow: 3 Feb 24 15:22:47 x61 org.gnome.Shell.desktop[4684]: _fullscreenAnimation: [0x564885a00bd0 MetaWindowActor] Feb 24 15:22:47 x61 org.gnome.Shell.desktop[4684]: _fullscreenWindow: [0x564885a00bd0 MetaWindowActor] Feb 24 15:22:47 x61 org.gnome.Shell.desktop[4684]: _sizeChangeWindow: 2 Feb 24 15:22:47 x61 org.gnome.Shell.desktop[4684]: _fullscreenAnimation: [0x564885a00bd0 MetaWindowActor] Feb 24 15:22:47 x61 org.gnome.Shell.desktop[4684]: _unfullscreenWindow: [0x564885a00bd0 MetaWindowActor] Feb 24 15:22:47 x61 org.gnome.Shell.desktop[4684]: _sizeChangeWindow: 3 Feb 24 15:22:47 x61 org.gnome.Shell.desktop[4684]: _fullscreenAnimation: [0x564885a00bd0 MetaWindowActor] Feb 24 15:22:47 x61 org.gnome.Shell.desktop[4684]: _fullscreenWindow: [0x564885a00bd0 MetaWindowActor] Feb 24 15:22:47 x61 org.gnome.Shell.desktop[4684]: _sizeChangeWindow: 2 Feb 24 15:22:46 x61 org.gnome.Shell.desktop[4684]: _fullscreenAnimation: [0x564885a00bd0 MetaWindowActor] Feb 24 15:22:46 x61 org.gnome.Shell.desktop[4684]: _unfullscreenWindow: [0x564885a00bd0 MetaWindowActor] Feb 24 15:22:46 x61 org.gnome.Shell.desktop[4684]: _sizeChangeWindow: 3 Feb 24 15:22:46 x61 org.gnome.Shell.desktop[4684]: _fullscreenAnimation: [0x564885a00bd0 MetaWindowActor] Feb 24 15:22:46 x61 org.gnome.Shell.desktop[4684]: _fullscreenWindow: [0x564885a00bd0 MetaWindowActor] Feb 24 15:22:46 x61 org.gnome.Shell.desktop[4684]: _sizeChangeWindow: 2 Feb 24 15:22:46 x61 org.gnome.Shell.desktop[4684]: _fullscreenAnimation: [0x564885a00bd0 MetaWindowActor] Feb 24 15:22:46 x61 org.gnome.Shell.desktop[4684]: _unfullscreenWindow: [0x564885a00bd0 MetaWindowActor] Feb 24 15:22:46 x61 org.gnome.Shell.desktop[4684]: _sizeChangeWindow: 3 But this doesn;t seem to be coming from an actual state change because breaking on xdg_surface_set_fullscreen()/xdg_surface_unset_fullscreen() in gdb shows these are not triggered while the Javascript routines are called continuously.
(In reply to Olivier Fourdan from comment #0) > But this doesn;t seem to be coming from an actual state change because > breaking on xdg_surface_set_fullscreen()/xdg_surface_unset_fullscreen() in gdb > shows these are not triggered while the Javascript routines are called > continuously. Isn't google-chrome an X11 application? So it wouldn't go via xdg_surface_* functions. I don't know how X11 applications fullscreen themself, but there is the meta_window_make_fullscreen and friends that might be helpful. Have you only seen it on google-chrome or other clients as well?
Ah right, my bad, obviously X11 aps don't use wayland stuff, they change their state using the EWMH "protocol" (set_net_wm_state() in mutter) Now what's interesting, because listening to events on the root window, it looks like the keypress/keyrelease event is repeated continuously: KeyRelease event, serial 23, synthetic NO, window 0x263, root 0x263, subw 0x200122, time 2969205, (383,484), root:(383,484), state 0x0, keycode 95 (keysym 0xffc8, F11), same_screen YES, XLookupString gives 0 bytes: XFilterEvent returns: False KeyPress event, serial 23, synthetic NO, window 0x263, root 0x263, subw 0x200122, time 2969205, (383,484), root:(383,484), state 0x0, keycode 95 (keysym 0xffc8, F11), same_screen YES, XLookupString gives 0 bytes: XmbLookupString gives 0 bytes: XFilterEvent returns: False KeyRelease event, serial 23, synthetic NO, window 0x263, root 0x263, subw 0x200122, time 2969235, (383,484), root:(383,484), state 0x0, keycode 95 (keysym 0xffc8, F11), same_screen YES, XLookupString gives 0 bytes: XFilterEvent returns: False KeyPress event, serial 23, synthetic NO, window 0x263, root 0x263, subw 0x200122, time 2969235, (383,484), root:(383,484), state 0x0, keycode 95 (keysym 0xffc8, F11), same_screen YES, XLookupString gives 0 bytes: XmbLookupString gives 0 bytes: XFilterEvent returns: False
Same can be observed with kate using ctrl-shift-f for fullscreen, the key press/release events get repeated continuously. Looks like an Xwayland issue.
So here is what I think is happening: 1. Press F11 to switch to fullscreen 2. Press F11 to switch back to normal 3. The HW is pretty slow, shell_util_get_content_for_window_actor() takes a noticeable time to complete 4. the Key release is lost/eaten somehow between gnome-shell and Xwayland 5. Xwayland doesn't get the Key release and therefore performs auto-repeat 6. Xwayland generates Key press/release events sent to the application 7. the application changes its state to/from fullscreen 8. gnome-shell is too busy dealing with the state changes that queue up (as shell_util_get_content_for_window_actor takes a lot of time) and doesn't process input events (the pointer is stuck as well) 9. we're doomed, XWayland keeps sending press/release events faster than gnome-shell can deal wit hteh consequences of the app changing state ... => The question is why don't Xwayland get the Key release in the first place, that's what is causing the key repeat. (hint, disabling key auto-repeat avoids the issue)
Input events need to be processed with a higher priority than protocol events. I don't know if it will fix this bug, but it's the correct thing to do.
(In reply to Jasper St. Pierre from comment #5) > Input events need to be processed with a higher priority than protocol > events. I don't know if it will fix this bug, but it's the correct thing to > do. Yes, you're right, I am now convinced there is nothing that can be done at the xserver/Xwayland level here because key release events are to be processed by the compositor first and mutter/gnome-shell is too busy dealing with wm state changes and animations that it doesn't process the key release events.
Created attachment 322382 [details] annotated log of events So I added traces in gnome-shell when playing the fullscreen animations, in mutter when processing key events and in the xserver when processing xkb events, and reproduced the issue. It clearly show what is happening: 1. once the animation takes longer than the key repeast delay, the xserver starts emitting key press events by itself (auto-repeat) until a key release evenst is received 2. That triggers even more fullscreen transitions in the client 3. more fullscreen transitions cause more animations 4. killing the client doesn't immediately solves the problem because there is a lot of events queued 5. once things settle down, mutter processed the key release events at last
Olivier, can you add the following printfs to clutter and re-post that log? diff --git a/clutter/clutter-stage.c b/clutter/clutter-stage.c index 2c67ca3..5f55190 100644 --- a/clutter/clutter-stage.c +++ b/clutter/clutter-stage.c @@ -1034,6 +1034,8 @@ _clutter_stage_process_queued_events (ClutterStage *stage) clutter_event_free (event); } + fprintf (stderr, "clutter stage events processed\n"); + g_list_free (events); g_object_unref (stage); diff --git a/clutter/evdev/clutter-device-manager-evdev.c b/clutter/evdev/clutter-device-manager-evdev.c index df9cd19..057aa38 100644 --- a/clutter/evdev/clutter-device-manager-evdev.c +++ b/clutter/evdev/clutter-device-manager-evdev.c @@ -295,6 +295,7 @@ static void queue_event (ClutterEvent *event) { _clutter_event_push (event, FALSE); + fprintf (stderr, "pushed libinput event\n"); } static void @@ -916,6 +917,7 @@ clutter_event_dispatch (GSource *g_source, /* forward the event into clutter for emission etc. */ _clutter_stage_queue_event (event->any.stage, event, FALSE); + fprintf (stderr, "_clutter_stage_queue_event()\n"); /* update the device states *after* the event */ event_state = seat->button_state |
Created attachment 322440 [details] annotated log of events (including clutter traces) (In reply to Rui Matos from comment #8) > Olivier, can you add the following printfs to clutter and re-post that log? Sure, here it is.
(In reply to Jasper St. Pierre from comment #5) > Input events need to be processed with a higher priority than protocol > events. I don't know if it will fix this bug, but it's the correct thing to > do. Yeah, we're not doing that unfortunately. We have several event sources in our main loop: a) libinput's fd in clutter's evdev backend b) mutter's X frontend fd (which actually is from gdk's x11 backend) c) mutter's wayland frontend fd d) cogl's kms winsys fd which process drm page flips e) random glib timeouts f) clutter's master clock clutter's master clock uses G_PRIORITY_HIGH_IDLE + 50 and all others use G_PRIORITY_DEFAULT. The problem is that our input event processing happens from within clutter's master clock dispatch function which given its lower priority means that it won't run as long as there are events on any other source which explains the behavior reported here. Note that event reading from libinput just queues events into the clutter stage which wakes the main loop and thus causes the master clock source to dispatch only indirectly. One thing we could do is process clutter input events directly from the evdev backend. But I'm not sure if clutter's internals would break due to that. Emmanuele? Another option is of course to leave this unfixed until we merge clutter?
some pared down discussion from irc: <ofourdan> I meant Shell.util_get_content_for_window_actor() in js/ui/windowManager.js <rtcm_> that's to get a snapshot of the window for the (un)fullscreen animation <ofourdan> yeap <rtcm_> the animation is basically a fade out of the window as it was before resize and a fade in of the current window's contents <halfline> meta_shaped_texture_get_image seems a bit suspect <halfline> it reads the texture into a cairo image surface and then does compositing with cairo <halfline> it looks like it was originally added to implement screenshots <halfline> using cairo for somethign that's going to get thrown in a png makes sense <rtcm_> I'm sure this isn't optimal. as halfline points out, the shell is re-using code that was originally meant for screenshots <ebassi> halfline: Looking at the code, I'm suspecting the reason why it goes through cairo is because of the texture mask on top for shaped windows <ebassi> halfline: We could create a new multi-texturing pipeline content, though <ebassi> It does; and when drawing it does multi-texturing to blend the texture and mask together <ebassi> So it should be possible to just extract the same thing into an object and paint it again, instead of doing GPU → CPU → GPU <halfline> though we'll want to keep the -> cairo for screenshots <ebassi> halfline: Sure, that has to stay there <ebassi> halfline: I started looking at it, but I have been busy <halfline> okay <ebassi> halfline: The idea was to add a MetaWindowContent that would be used by the window actor to paint itself <ebassi> halfline: Then the code that currently returns a ClutterCanvas would return that content object <ebassi> halfline: Which can be used by the animation, and avoid the read-back
FWIW in Wayland, this issue is mitigated in both gtk+ and Xwayland with the following patches: - gtk+: https://git.gnome.org/browse/gtk+/commit/?id=b52818 - Xwayland: https://cgit.freedesktop.org/xorg/xserver/commit/?id=88e981e
I think I'm running into this with Firefox.
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org. As part of that, we are mass-closing older open tickets in bugzilla.gnome.org which have not seen updates for a longer time (resources are unfortunately quite limited so not every ticket can get handled). If you can still reproduce the situation described in this ticket in a recent and supported software version, then please follow https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines and create a new ticket at https://gitlab.gnome.org/GNOME/mutter/-/issues/ Thank you for your understanding and your help.