GNOME Bugzilla – Bug 762468
wayland: Switching between fullscreen and unfullscreen too fast may result in a fullscreen->unfullscreen feedback loop
Last modified: 2016-03-11 12:37:09 UTC
Description Using F11 to toggle fullscreen in Wayland cause mutter to hang, wpinning on the state change continuously. Steps to reproduce 1. Open gnome-terminal 2. Press F11 Actual result gnome-terminal blinks, gnome-shell/mutter hangs, session is dead spinning continuously between fullscreen/non-fullscreen state, apparently Expected result gnome-terminal switches to fullscreen Additional data The same can be achieved with Xwayland apps as well, e.g. google-chrome
Created attachment 321870 [details] video capture
Created attachment 321871 [details] wayland_debug capture
I can't reproduce this on current master of gtk+ and mutter. Given the debug output it looks like GTK+ goes back and forth for some reason. Could you check whether the state array sent by mutter in xdg_surface.configure contains the fullscreen enum value? I wonder what happens if mutter denies a GTK+ client going fullscreen. FWIW GTK+ should not draw itself in fullscreen mode until it received a .configure event with the fullscreen enum. I suspect this is what happens although I have no idea why it would reject it. Do you have other Xwayland apps it reproduces with? I tried with Firefox, but don't have google-chrome installed.
A few more info, this happens on both F23 and rawhide (so I reckon not too far from git master), it's not Xwayland specific (gnome-terminal is Wayland native) and it's possibly a race condition as it's not 100% reproducible (seems to be easier to reproduce using multiple monitors).
Created attachment 321957 [details] GDK_DEBUG=events log This is weird, from the gdk wayland backend point of view, it behaves as if mutter was switching continuously between fullscreen and non-fulscreen. At first, gnome-terminal would switch back immediately to fullscreen all by itself (ie, it's fullscreen, press F11, it goes back to normal then switches back immediately to fullscreen), Then, after some retries, it will enter that loop switching continuously between fullscreen and non-fullscreen all by itself. I tried to come up with a simple reproducer (something a lot simpler than gnome-terminal) but failed to reproduce.
Does mutter ever send xdg_surface.configure with XDG_SURFACE_STATE_FULLSCREEN in the state array?
Yes, that message "Gdk-Message: configure, window... fullscreen" is logged in gdk/wayland/gdkwindow-wayland.c xdg_surface_configure() called from the xdg_surface_listener so I guess mutter sends us that, or is my logic flawed?
from here: https://git.gnome.org/browse/gtk+/tree/gdk/wayland/gdkwindow-wayland.c#n1111
Ah, right. No, your logic is sound :) But then again I would assume mutter is just responding to the xdg_surface.set_fullscreen and xdg_surface.unset_fullscreen GTK seems to be sending. Can you see where they come from? That would not explain why Xwayland clients do the same thing though.
Could be a gtk+ bug or a gnome-terminal bug after all... having both gnome-shell and gnome-terminal-server in gdb simultaneously, breaking on xdg_surface_set_fullscreen()/xdg_surface_unset_fullscreen() in mutter and gdk_window_fullscreen()/gdk_window_unfullscreen() in gnome-terminal-server shows that both seem to be playing some sort of ping pong game. Interestingly, in gnome-terminal-server we hit this code path:
+ Trace 235987
$7 = 0 So I suspect both mutter and gnome-terminal are replying each other switching states back and forth. Or something along these lines... Thread 1 "gnome-shell" hit Breakpoint 3, xdg_surface_set_fullscreen (client=0x55e64f807b00, resource=0x55e64f37fd90, output_resource=0x0) at wayland/meta-wayland-surface.c:1439 Thread 1 "gnome-terminal-" hit Breakpoint 2, gdk_window_unfullscreen (window=0x557485b90450 [GdkWaylandWindow]) at gdkwindow.c:10562 Thread 1 "gnome-shell" hit Breakpoint 4, xdg_surface_unset_fullscreen (client=0x55e64f807b00, resource=0x55e64f37fd90) at wayland/meta-wayland-surface.c:1447 Thread 1 "gnome-terminal-" hit Breakpoint 1, gdk_window_fullscreen (window=0x557485b90450 [GdkWaylandWindow]) at gdkwindow.c:10452 Thread 1 "gnome-shell" hit Breakpoint 3, xdg_surface_set_fullscreen (client=0x55e64f807b00, resource=0x55e64f37fd90, output_resource=0x0) at wayland/meta-wayland-surface.c:1439
Can't make this happen with gnome-terminal 3.19.2 and gtk+ 3.19.9
(In reply to Matthias Clasen from comment #11) > Can't make this happen with gnome-terminal 3.19.2 and gtk+ 3.19.9 It's on Fedora rawhide so the same versions as yours: mutter-3.19.90-1.fc24.x86_64 gnome-shell-3.19.90-1.fc24.x86_64 gtk3-3.19.9-1.fc24.x86_64 gnome-terminal-3.19.2-1.fc24.x86_64 It's a race condition somewhere, sometimes not easy to reproduce.
Note: reason I still suspect something dodgy in mutter is because nothing of the sort occurs when using weston, neither with gnome-terminal (Wayland native) nor with google-chrome (running through Xwayland), but as with any race condition, I can't be sure of anything until I can pinpoint the problem precisely...
I think the best way to trigger this is by pressing F11 again while the window animation is still playing. Repeatedly.
And I think google-chrome issue is different, gdb shows it doesn't switch back and forth between fullscreen/unfullscreen like gnome-terminal does (it's just that it ends up spending a lot of time in the same code path in mesa). So something else to investigate...
I just managed to reproduce it by holding down F11 for a while. After some debugging I think the issue is with gtk. What I see happening is that gtk is doing this: * = code run in gtk+ <- = gtk+ sends wayland request -> = gtk+ receives wayland event * window-state event received (fullscreen) <- xdg_surface.set_fullscreen * window-state event received (not fullscreen) <- xdg_surface.unset_fullscreen -> xdg_surface.configure(fullscreen) * queue window state (fullscreen) -> xdg_surface.configure(unfullscreen) * queue window state (not fullscreen) * window-state event received (fullscreen) <- xdg_surface.set_fullscreen * window-state event received (not fullscreen) <- xdg_surface.unset_fullscreen ... then it goes on. In other words, there seems to happen when both a fullscreen and an unfullscreen window-state GDK event gets queued before the other one is dispatched, causing GDK to enter a feedback loop.
Yes, a agree, even more now that I'm convinced the google-chrome is a different issue.
Created attachment 322225 [details] A set of backtraces from gtk+ I'm attaching a log of a series of breakpoint hits in gtk. I set one breakpoint in _gdk_set_window_state, which, if I understand correctly, queues the window state event. Another one in gdk_wayland_window_fullscreen, which sends the xdg_surface.set_fullscreen request A third one in gdk_wayland_window_unfullscreen, which sends the xdg_surface.unset_fullscreen request. Now, how to fix this I'm not so sure. Can we avoid doing these things asynchronously, or maybe peek if there is already another window state event in the queue and drop any earlier, when dispatching?
Moving this to gtk+ then.
what you see in the stacktraces is that we get a window state event which changes the fullscreen state, then terminal-window updates its toggle action to match, which in turn triggers the callback to call gtk_window_[un]fullscreen. Seems fine so far, since we are just calling the function for the state we already changed to, but something ought to notice this and fizzle out, probably on the mutter side.
(In reply to Matthias Clasen from comment #20) > what you see in the stacktraces is that we get a window state event which > changes the fullscreen state, then terminal-window updates its toggle action > to match, which in turn triggers the callback to call > gtk_window_[un]fullscreen. Seems fine so far, since we are just calling the > function for the state we already changed to, but something ought to notice > this and fizzle out, probably on the mutter side. What could mutter do? When it receives the make-fullscreen request, it immediately responds with "ok, go fullscreen then". Then when it receives the unmake-fullscreen request, it immediately responds with "ok, go unfullscreen then". I have a patch which "compresses" GDK_WINDOW_STATE events by dropping any previous event to the same window while updating the "changed_mask". That seems to fix the issue to me (though it triggers another bug, which I suspect is in mutter, causing the window to not display any more, and then display with the wrong size). Another possible solution I intended to try is to only allow dispatching one wayland event at once before dispatching the internal GDK event queue, but this would have a lot larger impact on things.
(In reply to Jonas Ådahl from comment #21) > > I have a patch which "compresses" GDK_WINDOW_STATE events by dropping any > previous event to the same window while updating the "changed_mask". That > seems to fix the issue to me (though it triggers another bug, which I > suspect is in mutter, causing the window to not display any more, and then > display with the wrong size). (It displays again, with the wrong size, when switching workspace back and forth once and other things)
(In reply to Jonas Ådahl from comment #21) > > Another possible solution I intended to try is to only allow dispatching one > wayland event at once before dispatching the internal GDK event queue, but > this would have a lot larger impact on things. No, this is not possible with the current available libwayland-client API. It only exposes dispatch-all.
Created attachment 322435 [details] [review] gdk: Compress window state events If there are already a window state event for a given window queued when the window state is changed, drop that event and queue a new event with a changed_mask based on the state before last event that was queue without compression.
The window-disappears is fixed by bug 762716. The incorrect window size I have yet to debug, so I don't know whether it is the same as bug 762713.
Review of attachment 322435 [details] [review]: Looks fine with me.
Jonas, I assume this is fine to commit ?
Attachment 322435 [details] pushed as 9e2207b - gdk: Compress window state events
+ _gdk_event_queue_remove_link (display, pending_event_link); + g_list_free_1 (pending_event_link); This is leaking the GdkEvent...
You need something like: (untested) a/gdk/gdkevents.c b/gdk/gdkevents.c index a094d4a..e14f9ab 100644 --- a/gdk/gdkevents.c +++ b/gdk/gdkevents.c @@ -2201,6 +2201,7 @@ _gdk_set_window_state (GdkWindow *window, { old = window->old_state; _gdk_event_queue_remove_link (display, pending_event_link); + gdk_event_free ((GdkEvent *)pending_event_link->data); g_list_free_1 (pending_event_link); } else
Oops, thanks!
Created attachment 323680 [details] [review] gdk: Don't leak discarded window state event When compressing window state events, we didn't free the discarded event after removing it from the queue, causing us to leak it. This commit makes sure to free the discarded event after unqueuing it.
(In reply to Alexander Larsson from comment #30) > + _gdk_event_queue_remove_link (display, pending_event_link); > + g_list_free_1 (pending_event_link); > > This is leaking the GdkEvent... Oops indeed. Thanks for noticing!
Attachment 323680 [details] pushed as f8bbbbf - gdk: Don't leak discarded window state event