After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 762618 - wayland: Switching between fullscreen and unfullscreen using F11 in google-chrome causes gnome-shell to hang
wayland: Switching between fullscreen and unfullscreen using F11 in google-ch...
Status: RESOLVED OBSOLETE
Product: mutter
Classification: Core
Component: wayland
3.19.x
Other Linux
: Normal normal
: ---
Assigned To: mutter-maint
mutter-maint
Depends on:
Blocks:
 
 
Reported: 2016-02-24 14:30 UTC by Olivier Fourdan
Modified: 2021-07-05 13:51 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
annotated log of events (40.33 KB, text/plain)
2016-02-25 15:18 UTC, Olivier Fourdan
Details
annotated log of events (including clutter traces) (49.02 KB, text/plain)
2016-02-26 08:17 UTC, Olivier Fourdan
Details

Description Olivier Fourdan 2016-02-24 14:30:50 UTC
+++ This bug was initially created as a clone of Bug #762468 +++

Description

Using F11 to toggle google-chrome fullscreen and back in Wayland cause gnome-shell to hang, calling the fullscreen animations continuously.

Steps to reproduce

1. Open google-chrome
2. Press F11 to switch to fullscreen
3. Press F11 to switch back to normal

Actual result

gnome-shell becomes unresponsive and takes a lot of CPU

Expected result

google-chrome switches back to normal and gnome-shell remains usable

Additional data

Atatching gdb to the gnome-shell process shows it's taking a lot of CPU in mesa.

Thread 1 "gnome-shell" received signal SIGINT, Interrupt.
0x00007fb44b8a620e in convert_ubyte (void_dst=<optimized out>, num_dst_channels=<optimized out>, void_src=<optimized out>, src_type=<optimized out>, 
    num_src_channels=<optimized out>, swizzle=<optimized out>, normalized=true, count=1024) at main/format_utils.c:1005
1005	      SWIZZLE_CONVERT(uint8_t, uint8_t, src);
(gdb) bt
  • #0 convert_ubyte
    at main/format_utils.c line 1005
  • #1 _mesa_format_convert
    at main/format_utils.c line 354
  • #2 _mesa_GetTexSubImage_sw
    at main/texgetimage.c line 544
  • #3 _mesa_GetTexSubImage_sw
    at main/texgetimage.c line 601
  • #4 _mesa_GetTexSubImage_sw
    at main/texgetimage.c line 758
  • #5 _mesa_meta_GetTexSubImage
    at drivers/common/meta.c line 3284
  • #6 intel_get_tex_sub_image
    at intel_tex_image.c line 508
  • #7 get_texture_image
    at main/texgetimage.c line 1337
  • #8 _mesa_GetTexImage
    at main/texgetimage.c line 1405
  • #9 _cogl_texture_driver_gl_get_tex_image
    at driver/gl/gl/cogl-texture-driver-gl.c line 426
  • #10 _cogl_texture_2d_gl_get_data
    at driver/gl/cogl-texture-2d-gl.c line 751
  • #11 _cogl_texture_2d_get_data
    at cogl-texture-2d.c line 641
  • #12 texture_get_cb
    at cogl-texture.c line 989
  • #13 normalize_meta_coords_cb
    at cogl-meta-texture.c line 466
  • #14 padded_grid_repeat_cb
    at cogl-meta-texture.c line 96
  • #15 _cogl_texture_spans_foreach_in_region
    at cogl-texture.c line 1360
  • #16 create_grid_and_repeat_cb
    at cogl-meta-texture.c line 209
  • #17 _cogl_sub_texture_foreach_sub_texture_in_region
    at cogl-sub-texture.c line 173
  • #18 cogl_meta_texture_foreach_in_region
    at cogl-meta-texture.c line 604
  • #19 cogl_texture_get_data
    at cogl-texture.c line 1142
  • #20 meta_shaped_texture_get_image
    at compositor/meta-shaped-texture.c line 843
  • #21 shell_util_get_content_for_window_actor
    at shell-util.c line 448
  • #22 ffi_call_unix64
    at ../src/x86/unix64.S line 76
  • #23 ffi_call
    at ../src/x86/ffi64.c line 525
  • #24 gjs_invoke_c_function(JSContext*, Function*, JSObject*, unsigned int, jsval*, jsval*, GArgument*)
    at gi/function.cpp line 999
  • #25 function_call(JSContext*, unsigned int, jsval*)
    at gi/function.cpp line 1322
  • #26 js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct)
    at /usr/src/debug/mozjs-24.2.0/js/src/jscntxtinlines.h line 321
  • #27 js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct)
    at /usr/src/debug/mozjs-24.2.0/js/src/vm/Interpreter.cpp line 474
  • #28 js::Invoke(JSContext*, JS::Value const&, JS::Value const&, unsigned int, JS::Value*, JS::Value*)
    at /usr/src/debug/mozjs-24.2.0/js/src/vm/Interpreter.cpp line 531
  • #29 js::jit::DoCallFallback(JSContext*, js::jit::BaselineFrame*, js::jit::ICCall_Fallback*, uint32_t, JS::Value*, JS::MutableHandleValue)
    at /usr/src/debug/mozjs-24.2.0/js/src/jit/BaselineIC.cpp line 7007
  • #30 0x00007fb477ab1dfb in
  • #31 0x0000000000000000 in

Thiss is because shell_util_get_content_for_window_actor() from src/shell-util.c is being called continuously.

This leads us to Javascript code in _fullscreenAnimation() in js/ui/windowManager.js 

Adding traces to _sizeChangeWindow() _fullscreenWindow() _unfullscreenWindow() and _fullscreenAnimation() shows that sizeChangeWindow() is called continuously leading to fullscreen/unfulscreen, thus triggering the fullscreenAnimation continuously:

Feb 24 15:22:47 x61 org.gnome.Shell.desktop[4684]: _fullscreenAnimation: [0x564885a00bd0 MetaWindowActor]
Feb 24 15:22:47 x61 org.gnome.Shell.desktop[4684]: _unfullscreenWindow: [0x564885a00bd0 MetaWindowActor]
Feb 24 15:22:47 x61 org.gnome.Shell.desktop[4684]: _sizeChangeWindow: 3
Feb 24 15:22:47 x61 org.gnome.Shell.desktop[4684]: _fullscreenAnimation: [0x564885a00bd0 MetaWindowActor]
Feb 24 15:22:47 x61 org.gnome.Shell.desktop[4684]: _fullscreenWindow: [0x564885a00bd0 MetaWindowActor]
Feb 24 15:22:47 x61 org.gnome.Shell.desktop[4684]: _sizeChangeWindow: 2
Feb 24 15:22:47 x61 org.gnome.Shell.desktop[4684]: _fullscreenAnimation: [0x564885a00bd0 MetaWindowActor]
Feb 24 15:22:47 x61 org.gnome.Shell.desktop[4684]: _unfullscreenWindow: [0x564885a00bd0 MetaWindowActor]
Feb 24 15:22:47 x61 org.gnome.Shell.desktop[4684]: _sizeChangeWindow: 3
Feb 24 15:22:47 x61 org.gnome.Shell.desktop[4684]: _fullscreenAnimation: [0x564885a00bd0 MetaWindowActor]
Feb 24 15:22:47 x61 org.gnome.Shell.desktop[4684]: _fullscreenWindow: [0x564885a00bd0 MetaWindowActor]
Feb 24 15:22:47 x61 org.gnome.Shell.desktop[4684]: _sizeChangeWindow: 2
Feb 24 15:22:46 x61 org.gnome.Shell.desktop[4684]: _fullscreenAnimation: [0x564885a00bd0 MetaWindowActor]
Feb 24 15:22:46 x61 org.gnome.Shell.desktop[4684]: _unfullscreenWindow: [0x564885a00bd0 MetaWindowActor]
Feb 24 15:22:46 x61 org.gnome.Shell.desktop[4684]: _sizeChangeWindow: 3
Feb 24 15:22:46 x61 org.gnome.Shell.desktop[4684]: _fullscreenAnimation: [0x564885a00bd0 MetaWindowActor]
Feb 24 15:22:46 x61 org.gnome.Shell.desktop[4684]: _fullscreenWindow: [0x564885a00bd0 MetaWindowActor]
Feb 24 15:22:46 x61 org.gnome.Shell.desktop[4684]: _sizeChangeWindow: 2
Feb 24 15:22:46 x61 org.gnome.Shell.desktop[4684]: _fullscreenAnimation: [0x564885a00bd0 MetaWindowActor]
Feb 24 15:22:46 x61 org.gnome.Shell.desktop[4684]: _unfullscreenWindow: [0x564885a00bd0 MetaWindowActor]
Feb 24 15:22:46 x61 org.gnome.Shell.desktop[4684]: _sizeChangeWindow: 3

But this doesn;t seem to be coming from an actual state change because breaking on xdg_surface_set_fullscreen()/xdg_surface_unset_fullscreen() in gdb shows these are not triggered while the Javascript routines are called continuously.
Comment 1 Jonas Ådahl 2016-02-24 14:49:55 UTC
(In reply to Olivier Fourdan from comment #0)
> But this doesn;t seem to be coming from an actual state change because
> breaking on xdg_surface_set_fullscreen()/xdg_surface_unset_fullscreen() in gdb
> shows these are not triggered while the Javascript routines are called
> continuously.

Isn't google-chrome an X11 application? So it wouldn't go via xdg_surface_* functions. I don't know how X11 applications fullscreen themself, but there is the meta_window_make_fullscreen and friends that might be helpful.

Have you only seen it on google-chrome or other clients as well?
Comment 2 Olivier Fourdan 2016-02-24 15:03:30 UTC
Ah right, my bad, obviously X11 aps don't use wayland stuff, they change their state using the EWMH "protocol" (set_net_wm_state() in mutter)

Now what's interesting, because listening to events on the root window, it looks like the keypress/keyrelease event is repeated continuously:

KeyRelease event, serial 23, synthetic NO, window 0x263,
    root 0x263, subw 0x200122, time 2969205, (383,484), root:(383,484),
    state 0x0, keycode 95 (keysym 0xffc8, F11), same_screen YES,
    XLookupString gives 0 bytes: 
    XFilterEvent returns: False

KeyPress event, serial 23, synthetic NO, window 0x263,
    root 0x263, subw 0x200122, time 2969205, (383,484), root:(383,484),
    state 0x0, keycode 95 (keysym 0xffc8, F11), same_screen YES,
    XLookupString gives 0 bytes: 
    XmbLookupString gives 0 bytes: 
    XFilterEvent returns: False

KeyRelease event, serial 23, synthetic NO, window 0x263,
    root 0x263, subw 0x200122, time 2969235, (383,484), root:(383,484),
    state 0x0, keycode 95 (keysym 0xffc8, F11), same_screen YES,
    XLookupString gives 0 bytes: 
    XFilterEvent returns: False

KeyPress event, serial 23, synthetic NO, window 0x263,
    root 0x263, subw 0x200122, time 2969235, (383,484), root:(383,484),
    state 0x0, keycode 95 (keysym 0xffc8, F11), same_screen YES,
    XLookupString gives 0 bytes: 
    XmbLookupString gives 0 bytes: 
    XFilterEvent returns: False
Comment 3 Olivier Fourdan 2016-02-24 16:11:49 UTC
Same can be observed with kate using ctrl-shift-f for fullscreen, the key press/release events get repeated continuously.

Looks like an Xwayland issue.
Comment 4 Olivier Fourdan 2016-02-24 17:23:16 UTC
So here is what I think is happening:

1. Press F11 to switch to fullscreen
2. Press F11 to switch back to normal
3. The HW is pretty slow, shell_util_get_content_for_window_actor() takes a noticeable time to complete
4. the Key release is lost/eaten somehow between gnome-shell and Xwayland
5. Xwayland doesn't get the Key release and therefore performs auto-repeat
6. Xwayland generates Key press/release events sent to the application
7. the application changes its state to/from fullscreen
8. gnome-shell is too busy dealing with the state changes that queue up (as shell_util_get_content_for_window_actor takes a lot of time) and doesn't process input events (the pointer is stuck as well)
9. we're doomed, XWayland keeps sending press/release events faster than gnome-shell can deal wit hteh consequences of the app changing state ...

=> The question is why don't Xwayland get the Key release in the first place, that's what is causing the key repeat.

(hint, disabling key auto-repeat avoids the issue)
Comment 5 Jasper St. Pierre (not reading bugmail) 2016-02-24 17:25:25 UTC
Input events need to be processed with a higher priority than protocol events. I don't know if it will fix this bug, but it's the correct thing to do.
Comment 6 Olivier Fourdan 2016-02-25 15:12:30 UTC
(In reply to Jasper St. Pierre from comment #5)
> Input events need to be processed with a higher priority than protocol
> events. I don't know if it will fix this bug, but it's the correct thing to
> do.

Yes, you're right, I am now convinced there is nothing that can be done at the xserver/Xwayland level here because key release events are to be processed by the compositor first and mutter/gnome-shell is too busy dealing with wm state changes and animations that it doesn't process the key release events.
Comment 7 Olivier Fourdan 2016-02-25 15:18:56 UTC
Created attachment 322382 [details]
annotated log of events

So I added traces in gnome-shell when playing the fullscreen animations, in mutter when processing key events and in the xserver when processing xkb events, and reproduced the issue.

It clearly show what is happening:

1. once the animation takes longer than the key repeast delay, the xserver starts emitting key press events by itself (auto-repeat) until a key release evenst is received

2. That triggers even more fullscreen transitions in the client

3. more fullscreen transitions cause more animations

4. killing the client doesn't immediately solves the problem because there is a lot of events queued

5. once things settle down, mutter processed the key release events at last
Comment 8 Rui Matos 2016-02-25 18:30:21 UTC
Olivier, can you add the following printfs to clutter and re-post that log?

diff --git a/clutter/clutter-stage.c b/clutter/clutter-stage.c
index 2c67ca3..5f55190 100644
--- a/clutter/clutter-stage.c
+++ b/clutter/clutter-stage.c
@@ -1034,6 +1034,8 @@ _clutter_stage_process_queued_events (ClutterStage *stage)
       clutter_event_free (event);
     }
 
+  fprintf (stderr, "clutter stage events processed\n");
+
   g_list_free (events);
 
   g_object_unref (stage);
diff --git a/clutter/evdev/clutter-device-manager-evdev.c b/clutter/evdev/clutter-device-manager-evdev.c
index df9cd19..057aa38 100644
--- a/clutter/evdev/clutter-device-manager-evdev.c
+++ b/clutter/evdev/clutter-device-manager-evdev.c
@@ -295,6 +295,7 @@ static void
 queue_event (ClutterEvent *event)
 {
   _clutter_event_push (event, FALSE);
+  fprintf (stderr, "pushed libinput event\n");
 }
 
 static void
@@ -916,6 +917,7 @@ clutter_event_dispatch (GSource     *g_source,
 
       /* forward the event into clutter for emission etc. */
       _clutter_stage_queue_event (event->any.stage, event, FALSE);
+      fprintf (stderr, "_clutter_stage_queue_event()\n");
 
       /* update the device states *after* the event */
       event_state = seat->button_state |
Comment 9 Olivier Fourdan 2016-02-26 08:17:51 UTC
Created attachment 322440 [details]
annotated log of events (including clutter traces)

(In reply to Rui Matos from comment #8)
> Olivier, can you add the following printfs to clutter and re-post that log?

Sure, here it is.
Comment 10 Rui Matos 2016-03-02 20:43:47 UTC
(In reply to Jasper St. Pierre from comment #5)
> Input events need to be processed with a higher priority than protocol
> events. I don't know if it will fix this bug, but it's the correct thing to
> do.

Yeah, we're not doing that unfortunately. We have several event sources in our main loop:

a) libinput's fd in clutter's evdev backend
b) mutter's X frontend fd (which actually is from gdk's x11 backend)
c) mutter's wayland frontend fd
d) cogl's kms winsys fd which process drm page flips
e) random glib timeouts
f) clutter's master clock

clutter's master clock uses G_PRIORITY_HIGH_IDLE + 50 and all others use G_PRIORITY_DEFAULT. The problem is that our input event processing happens from within clutter's master clock dispatch function which given its lower priority means that it won't run as long as there are events on any other source which explains the behavior reported here.

Note that event reading from libinput just queues events into the clutter stage which wakes the main loop and thus causes the master clock source to dispatch only indirectly.

One thing we could do is process clutter input events directly from the evdev backend. But I'm not sure if clutter's internals would break due to that. Emmanuele?

Another option is of course to leave this unfixed until we merge clutter?
Comment 11 Ray Strode [halfline] 2016-03-07 16:16:04 UTC
some pared down discussion from irc:

<ofourdan> I meant Shell.util_get_content_for_window_actor() in js/ui/windowManager.js
<rtcm_> that's to get a snapshot of the window for the (un)fullscreen animation
<ofourdan> yeap
<rtcm_> the animation is basically a fade out of the window as it was before resize and a fade in of the current window's contents
<halfline> meta_shaped_texture_get_image seems a bit suspect
<halfline> it reads the texture into a cairo image surface and then does compositing with cairo 
<halfline> it looks like it was originally added to implement screenshots
<halfline> using cairo for somethign that's going to get thrown in a png makes sense
<rtcm_> I'm sure this isn't optimal. as halfline points out, the shell is re-using code that was originally meant for screenshots
<ebassi> halfline: Looking at the code, I'm suspecting the reason why it goes through cairo is because of the texture mask on top for shaped windows
<ebassi> halfline: We could create a new multi-texturing pipeline content, though
<ebassi> It does; and when drawing it does multi-texturing to blend the texture and mask together
<ebassi> So it should be possible to just extract the same thing into an object and paint it again, instead of doing GPU → CPU → GPU
<halfline> though we'll want to keep the -> cairo for screenshots
<ebassi> halfline: Sure, that has to stay there 
<ebassi> halfline: I started looking at it, but I have been busy
<halfline> okay
<ebassi> halfline: The idea was to add a MetaWindowContent that would be used by the window actor to paint itself
<ebassi> halfline: Then the code that currently returns a ClutterCanvas would return that content object
<ebassi> halfline: Which can be used by the animation, and avoid the read-back
Comment 12 Olivier Fourdan 2016-10-13 09:25:39 UTC
FWIW in Wayland, this issue is mitigated in both gtk+ and Xwayland with the following patches:

 - gtk+: https://git.gnome.org/browse/gtk+/commit/?id=b52818
 - Xwayland: https://cgit.freedesktop.org/xorg/xserver/commit/?id=88e981e
Comment 13 Jan Niklas Hasse (Account disabled) 2017-09-18 09:51:51 UTC
I think I'm running into this with Firefox.
Comment 14 GNOME Infrastructure Team 2021-07-05 13:51:54 UTC
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org.
As part of that, we are mass-closing older open tickets in bugzilla.gnome.org
which have not seen updates for a longer time (resources are unfortunately
quite limited so not every ticket can get handled).

If you can still reproduce the situation described in this ticket in a recent
and supported software version, then please follow
  https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines
and create a new ticket at
  https://gitlab.gnome.org/GNOME/mutter/-/issues/

Thank you for your understanding and your help.