GNOME Bugzilla – Bug 769835
On Wayland, application containing GtkGLArea stops responding if it's not on current workspace
Last modified: 2017-01-09 18:36:21 UTC
Created attachment 333214 [details]
test application

On Wayland, if an application's window contains a GtkGLArea and is not on the current workspace, the application does not respond to D-Bus messages until the window is moved to the current workspace or otherwise made visible again.

Steps to reproduce:
1. Compile the test application and run it.
2. Run the application again from another terminal. The command should exit immediately, and the application should print "ACTIVATE" in the first terminal, indicating that the activate signal was received.
3. Move the window of the application to a different workspace.
4. Run the application again in the second terminal.
5. Observe that no text is printed in the first terminal, and that the command run in the second terminal doesn't exit.
6. If the window of the application is moved back to the current workspace, the command in the second terminal exits and "ACTIVATE" is printed in the first terminal.
Looks like an issue inside the mesa platform support for wayland.
+ Trace 236566
This is also the source of https://bugs.webkit.org/show_bug.cgi?id=164983, as I can reproduce that copy-and-paste behavior in gtk3-demo after opening the "OpenGL area" demo.
Also https://bugs.webkit.org/show_bug.cgi?id=164985, basically Epiphany is totally broken if you open two different windows on different workspaces. :)
<robclark> my guess is somehow gs doesn't respond to queries from other workspaces somehow??
<mclasen__> it could certainly be the case
<robclark> probably I'd ask someone who knows the g-s wayland side of things.. but from mesa side it looks like we are blocking on compositor
Jonas, can you have a look at this ?
Note that there is some hopefully useful debugging from Carlos Garnacho in https://bugs.webkit.org/show_bug.cgi?id=164983#c2
As far as I can tell, what happens is that after the surface has been moved to another workspace (and isn't showing anymore), it still attempts to draw a frame, causing EGL to swap buffers and trigger the vsync throttler (i.e. ask for a frame callback). When we then try to draw another frame (while still hidden), it gets stuck, because EGL waits for a frame callback that won't come before continuing.

Now, with mesa 13.0 and older, EGL will block on the vsync throttler (i.e. the surface frame callback) as early as when one queries any state related to the next buffer (in this case the buffer age). Commit 9ca6711faa03a9527c0946f2489e42cd9a62235c on mesa master changes this behavior to not block until eglSwapBuffers(), but I don't think that will really help us here: we seem to be drawing anyway, so we will end up in eglSwapBuffers() and be blocked there instead.

I think what we need to do is make sure we don't rely on EGL's vsync throttler, i.e. we should set the swap interval to 0 (don't throttle eglSwapBuffers()) and just rely on GTK+'s own throttling.
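The stall described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToySurface`, `toy_swap_buffers()` and `toy_compositor_tick()` are hypothetical names invented for illustration, not Mesa or GTK+ code, and a real implementation would block where this model returns false.

```c
#include <stdbool.h>

/* Toy model of one Wayland surface as seen by the EGL platform code.
 * All names are hypothetical; this is not Mesa's data structure. */
typedef struct {
    int  swap_interval;          /* 0 = never wait, >0 = throttle on frame callbacks */
    bool frame_callback_pending; /* a callback was requested and has not fired yet */
    bool visible;                /* assumption: callbacks only fire while shown */
} ToySurface;

/* Returns true if the swap completes, false where a real implementation
 * would block waiting for the outstanding frame callback. */
static bool toy_swap_buffers(ToySurface *s)
{
    if (s->swap_interval > 0 && s->frame_callback_pending)
        return false;                     /* would block here */
    if (s->swap_interval > 0)
        s->frame_callback_pending = true; /* throttle the next swap */
    return true;
}

/* Compositor side: deliver the frame callback only for visible surfaces. */
static void toy_compositor_tick(ToySurface *s)
{
    if (s->visible)
        s->frame_callback_pending = false;
}
```

In this model, a hidden surface with a swap interval of 1 completes one swap and then stalls on the next, while the same surface with a swap interval of 0 never waits, which is the behavior the proposed fix relies on.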
I came to the same conclusion yesterday, although I also had a few realizations:

- The window is, however, trying to redraw because of the focus-out animations. If I focus the window I'm going to paste into and move it past the webkit/gl process workspace with ctrl+shift+alt+up/down, pasting still works afterwards, presumably because the window didn't have to do the focus in/out animations during the process.
- Setting eglSwapInterval(..., 0) alone doesn't entirely help, but
- Commenting out the "if (!pending_commit) return;" part in on_frame_clock_after_paint() alone does, even if mesa throttling still applies. As I understand it, this code path is only for cairo-rendering paths, so I don't quite get why scheduling another frame on the gdk side untangles things.
(moving back to gtk+ since this is about frame throttling getting in the way of unrelated IO)

I think that, for whatever reason a frame is scheduled (be it because of some focus-out animation or because an application deliberately asks for it), we should still handle the situation where we actually don't want to paint because of the vsync throttling.

As far as I can see, we use _gdk_frame_clock_freeze() and _gdk_frame_clock_thaw() to do the throttling, but maybe what we really should do is gdk_window_freeze_updates()/gdk_window_thaw_updates()? We handle those here and there in gdkwindow-wayland.c (the checking of window->update_freeze_count).
I don't know if amarok uses GTKGLArea. But in Fedora 25, with GNOME on Wayland, the mouse stops working over any active windows/desktops. The GNOME bars keep working. After minimizing and restoring Amarok using the bottom-left bar, the mouse comes back to life in all desktops/windows.
(In reply to qbox from comment #10)
> I don't know if amarok uses GTKGLArea.

It doesn't. Amarok uses Qt, not GTK+. This is a GTK+ bug, so I suggest you file a new bug report.
Created attachment 342399 [details] [review]
wayland: Disable EGL swap interval

We have a frame clock that ensures rendering is done as per the output vsync. There is no need to have Mesa do the same for us.

This, most notably, ensures Mesa doesn't schedule frame callbacks that will be left unattended if the compositor stops throttling frames for its surface, this is eg. the case if the toplevel is moved to another workspace.
Review of attachment 342399 [details] [review]:

Should we maybe check the EGL_MIN_SWAP_INTERVAL and fail up front if we cannot configure the EGL context to let us handle the frame scheduling ourself?
(In reply to Jonas Ådahl from comment #13)
> Review of attachment 342399 [details] [review]:
>
> Should we maybe check the EGL_MIN_SWAP_INTERVAL and fail up front if we
> cannot configure the EGL context to let us handle the frame scheduling
> ourself?

Yeah, that probably makes sense. It seems Mesa's eglSwapInterval implementations will just clamp and return true, so we might still hit this if the context can't disable frame throttling.
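To make the clamping concern concrete, here is a minimal sketch of the rule, assuming the interval is silently limited to the range given by the config's EGL_MIN_SWAP_INTERVAL and EGL_MAX_SWAP_INTERVAL attributes. The helper names are hypothetical and this is a model of the behavior, not Mesa's implementation.

```c
#include <stdbool.h>

/* Hypothetical model: the value that would actually take effect after a
 * request is clamped to the config's [min, max] swap interval range. */
static int toy_effective_swap_interval(int requested, int min_interval,
                                       int max_interval)
{
    if (requested < min_interval)
        return min_interval;
    if (requested > max_interval)
        return max_interval;
    return requested;
}

/* Throttling can only be switched off when the minimum allowed interval
 * is 0; otherwise a request for 0 is clamped upwards without any error. */
static bool toy_can_disable_throttling(int min_interval)
{
    return min_interval == 0;
}
```

Since a clamped request still reports success, the return value of the call alone cannot reveal whether throttling was really disabled, which is why querying the minimum interval up front is being suggested.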
Created attachment 342410 [details] [review]
wayland: Disable EGL swap interval

We have a frame clock that ensures rendering is done as per the output vsync. There is no need to have Mesa do the same for us.

This, most notably, ensures Mesa doesn't schedule frame callbacks that will be left unattended if the compositor stops throttling frames for its surface, this is eg. the case if the toplevel is moved to another workspace.

Also, given a SwapInterval!=0 will always bring these unexpected side effects, check that it's possible to disable it, and error out if that isn't the case.
On X11 we already disable swap interval when running composited — and since Wayland is using a compositor by default, I think it makes complete sense to rely on the compositor signaling us when to render. The only issue I can think of is that the Wayland compositor may not be synchronised to the vertical refresh rate; or could be a very simple process that just sets up the output and then relies on the toolkit to deal with the timing of submitting frames. Having said that, I seriously doubt this is even a case GTK should contemplate.
Review of attachment 342410 [details] [review]:

::: gdk/wayland/gdkglcontext-wayland.c
@@ +441,3 @@
+static gboolean
+egl_config_can_disable_swap_interval (GdkDisplay *display,
+                                      const EGLConfig *config)

EGLConfig is already a pointer; I don't think you need to pass it as a pointer…

@@ +446,3 @@
+  EGLint interval;
+
+  if (!eglGetConfigAttrib (display_wayland->egl_display, *config,

… especially since you're just dereferencing it here.

@@ +483,3 @@
     return NULL;

+  if (!egl_config_can_disable_swap_interval (display, &config))

I'm not convinced this is an appropriate action to take.

If swap interval cannot be disabled then we should fall back to the existing behaviour; it's not optimal, but saying "you can't have GL because your driver does not allow disabling swap interval" seems to me far more drastic than necessary.

Just save the return value inside the GdkGLContext and emit a warning or a debug message.

@@ +543,3 @@
     eglMakeCurrent(display_wayland->egl_display, EGL_NO_SURFACE,
                                                  EGL_NO_SURFACE,
                                                  EGL_NO_CONTEXT);
+  eglSwapInterval (dpy, 0);

Not sure it's needed, here; we're unbinding everything, which means there's no context and no surface to swap.
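The first review point (pass the handle by value) can be sketched with a stand-in type. EGL declares EGLConfig as an opaque pointer; the sketch below uses a hypothetical `ToyConfig` typedef and a hypothetical `ToyConfigData` struct so that it compiles without EGL headers. It is an illustration of the calling convention only, not the patch itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for an opaque handle type that is already a pointer. */
typedef void *ToyConfig;

/* Hypothetical backing data, used only to make the sketch self-contained. */
typedef struct {
    int min_swap_interval;
} ToyConfigData;

/* Hypothetical attribute query: returns false on failure. */
static bool toy_get_min_swap_interval(ToyConfig config, int *value)
{
    const ToyConfigData *data = config;

    if (data == NULL)
        return false;
    *value = data->min_swap_interval;
    return true;
}

/* The handle is taken by value: no extra level of indirection and no
 * dereference at the call site. */
static bool toy_config_can_disable_swap_interval(ToyConfig config)
{
    int interval;

    if (!toy_get_min_swap_interval(config, &interval))
        return false;
    return interval == 0;
}
```

A caller simply passes the handle it already holds, e.g. `toy_config_can_disable_swap_interval (config)`, instead of `&config`.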
(In reply to Emmanuele Bassi (:ebassi) from comment #17)
> Review of attachment 342410 [details] [review]:
>
> ::: gdk/wayland/gdkglcontext-wayland.c
> @@ +441,3 @@
> +static gboolean
> +egl_config_can_disable_swap_interval (GdkDisplay *display,
> +                                      const EGLConfig *config)
>
> EGLConfig is already a pointer; I don't think you need to pass it as a
> pointer…

Hmm, I see. I was confused by the weird memory management at:
https://git.gnome.org//browse/gtk+/tree/gdk/wayland/gdkglcontext-wayland.c#n410
where it seems to return stuff from a freed block, knowing that EGLConfig is a pointer itself makes sense then.

>
> @@ +446,3 @@
> +  EGLint interval;
> +
> +  if (!eglGetConfigAttrib (display_wayland->egl_display, *config,
>
> … especially since you're just dereferencing it here.
>
> @@ +483,3 @@
>      return NULL;
>
> +  if (!egl_config_can_disable_swap_interval (display, &config))
>
> I'm not convinced this is an appropriate action to take.
>
> If swap interval cannot be disabled then we should fall back to the existing
> behaviour; it's not optimal, but saying "you can't have GL because your
> driver does not allow disabling swap interval" seems to me far more drastic
> than necessary.
>
> Just save the return value inside the GdkGLContext and emit a warning or a
> debug message.

Sure, question is when it's more appropriate to warn then. Before it locks presumably...

>
> @@ +543,3 @@
>      eglMakeCurrent(display_wayland->egl_display, EGL_NO_SURFACE,
>                                                   EGL_NO_SURFACE,
>                                                   EGL_NO_CONTEXT);
> +  eglSwapInterval (dpy, 0);
>
> Not sure it's needed, here; we're unbinding everything, which means there's
> no context and no surface to swap.

Right, I was unsure about this one. Thanks for the review!
Created attachment 342424 [details] [review]
wayland: Disable EGL swap interval

We have a frame clock that ensures rendering is done as per the output vsync. There is no need to have Mesa do the same for us.

This, most notably, ensures Mesa doesn't schedule frame callbacks that will be left unattended if the compositor stops throttling frames for its surface, this is eg. the case if the toplevel is moved to another workspace.

Also, given a SwapInterval!=0 will always bring these unexpected side effects, check that it's possible to disable it, and warn if that isn't the case.
Review of attachment 342424 [details] [review]:

Looks definitely better.

::: gdk/wayland/gdkglcontext-wayland.c
@@ +472,3 @@
+                           EGL_MIN_SWAP_INTERVAL,
+                           &display_wayland->egl_min_swap_interval))
+    return NULL;

You should set the GError if you return NULL here. Something like:

  if (!eglGetConfigAttrib (...))
    {
      g_set_error_literal (error, GDK_GL_ERROR, GDK_GL_ERROR_NOT_AVAILABLE,
                           "Could not retrieve the minimum swap interval");
      return NULL;
    }

But maybe we should move this check inside find_eglconfig_for_window().

@@ +552,3 @@
+    eglSwapInterval (display_wayland->egl_display, 0);
+  else
+    g_warning ("Can't disable GL swap interval");

I'd probably use g_debug(), not a warning.
Created attachment 342427 [details] [review]
wayland: Disable EGL swap interval

We have a frame clock that ensures rendering is done as per the output vsync. There is no need to have Mesa do the same for us.

This, most notably, ensures Mesa doesn't schedule frame callbacks that will be left unattended if the compositor stops throttling frames for its surface, this is eg. the case if the toplevel is moved to another workspace.

Also, given a SwapInterval!=0 will always bring these unexpected side effects, check that it's possible to disable it, and spew a debug message if that isn't the case.
(Just want to add: I hope we can do a 3.22 release with this fix soon, since it's an extremely high-impact issue for WebKit. Thanks Carlos. :)
Going to keep poking about this, since all apps using WebKit are in a really bad state on Wayland right now:

(In reply to Michael Catanzaro from comment #22)
> (Just want to add: I hope we can do a 3.22 release with this fix soon, since
> it's an extremely high-impact issue for WebKit. Thanks Carlos. :)
Review of attachment 342427 [details] [review]: Looks good to me.
\o/ Attachment 342427 [details] pushed as ab66c3d - wayland: Disable EGL swap interval