After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 769835 - On Wayland, application containing GtkGLArea stops responding if it's not on current workspace
On Wayland, application containing GtkGLArea stops responding if it's not on ...
Status: RESOLVED FIXED
Product: gtk+
Classification: Platform
Component: Backend: Wayland
unspecified
Other Linux
: Normal normal
: ---
Assigned To: gtk-bugs
gtk-bugs
Depends on:
Blocks: wayland
 
 
Reported: 2016-08-13 12:59 UTC by kritphong
Modified: 2017-01-09 18:36 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
test application (616 bytes, application/gzip)
2016-08-13 12:59 UTC, kritphong
  Details
wayland: Disable EGL swap interval (1.69 KB, patch)
2016-12-22 18:35 UTC, Carlos Garnacho
none Details | Review
wayland: Disable EGL swap interval (3.05 KB, patch)
2016-12-23 10:15 UTC, Carlos Garnacho
none Details | Review
wayland: Disable EGL swap interval (2.30 KB, patch)
2016-12-23 17:02 UTC, Carlos Garnacho
none Details | Review
wayland: Disable EGL swap interval (3.53 KB, patch)
2016-12-23 17:49 UTC, Carlos Garnacho
committed Details | Review

Description kritphong 2016-08-13 12:59:07 UTC
Created attachment 333214 [details]
test application

On Wayland, if the window of the application contains GtkGLArea and it's not on the current workspace, the application will not respond to D-Bus messages until it is moved to the current workspace or otherwise made visible again.

Steps to reproduce:
1. Compile the test application and run it.
2. Run the application again from another terminal. The command should exit immediately, and the application should print "ACTIVATE" in the first terminal indicating that the activate signal was received.
3. Move the window of the application to a different workspace.
4. Run the application again in the second terminal.
5. Observe that no text is printed in the first terminal, and the command ran in the second terminal doesn't exit.
6. If the window of the application is moved back to the current workspace, the command in the second terminal exits and "ACTIVATE" is printed in the first terminal.
Comment 1 Matthias Clasen 2016-08-20 03:47:36 UTC
Looks like an issue inside the mesa platform support for wayland.

  • #0 poll
    from /lib64/libc.so.6
  • #1 wl_display_poll
    at src/wayland-client.c line 1571
  • #2 wl_display_dispatch_queue
    at src/wayland-client.c line 1644
  • #3 get_back_bo
    from /lib64/libEGL.so.1
  • #4 dri2_wl_query_buffer_age
    from /lib64/libEGL.so.1
  • #5 _eglQuerySurface
    from /lib64/libEGL.so.1
  • #6 eglQuerySurface
    from /lib64/libEGL.so.1
  • #7 gdk_wayland_window_invalidate_for_new_frame
    at gdkglcontext-wayland.c line 65
  • #8 gdk_window_process_updates_internal
    at gdkwindow.c line 3955
  • #9 gdk_window_process_updates_with_mode
    at gdkwindow.c line 4172
  • #10 gdk_window_paint_on_clock
    at gdkwindow.c line 11641

Comment 2 Carlos Garnacho 2016-11-29 14:04:21 UTC
This is also the source of https://bugs.webkit.org/show_bug.cgi?id=164983 , as I can reproduce that c&p behavior on gtk3-demo after opening the "OpenGL area" demo.
Comment 3 Michael Catanzaro 2016-11-29 14:47:25 UTC
Also https://bugs.webkit.org/show_bug.cgi?id=164985, basically Epiphany is totally broken if you open two different windows on different workspaces. :)
Comment 4 Matthias Clasen 2016-11-30 19:47:28 UTC
robclark	my guess is somehow gs doesn't respond to queries from other workspaces somehow??
mclasen__	it could certainly be the case
robclark	probably I'd ask someone who knows the g-s wayland side of things.. but from mesa side it looks like we are blocking on compositor
Comment 5 Matthias Clasen 2016-11-30 19:48:36 UTC
Jonas, can you have a look at this ?
Comment 6 Michael Catanzaro 2016-11-30 19:53:50 UTC
Note that there is some hopefully useful debugging from Carlos Garnacho in https://bugs.webkit.org/show_bug.cgi?id=164983#c2
Comment 7 Jonas Ådahl 2016-12-01 06:33:56 UTC
As far as I can tell, what happens is that after the surface has been moved to another workspace (and isn't showing anymore), it'll still attempt to draw a frame, causing EGL to swap buffers and trigger the vsync throttler (ask for a frame callback). When we then try to draw another frame (while still hidden), it'll get stuck because EGL will wait for the frame callback that won't come before continuing.

Now, with mesa 13.0 and older, EGL will block on the vsync throttler (i.e. the surface frame callback) as early as when one queries any state related to the next buffer (in this case the buffer age). Commit 9ca6711faa03a9527c0946f2489e42cd9a62235c on mesa master changes this behavior to instead not block until eglSwapBuffers() but I don't think that will help us here really, since we seem to be drawing anyway thus will end up in eglSwapBuffers thus being blocked anyway.

I think what we need to do is make sure we don't rely on EGL's vsync throttler, i.e. we should set the swap interval to 0 (don't throttle eglSwapBuffers), and just rely on GTK+s own throttling.
Comment 8 Carlos Garnacho 2016-12-01 10:48:56 UTC
I came to the same thinking yesterday, although I also had a few realizations:

- The window is however trying to redraw because of the focus out animations. If I focus the window I'm going to paste on, and move it past the webkit/gl process workspace with ctrl+shift+alt+up/down, pasting still works afterwards, presumably because during the process the window didn't have to do the focus in/out animations.

- Setting eglSwapInterval(..., 0) alone doesn't entirely help, but

- Commenting out the "if (!pending_commit) return;" part on on_frame_clock_after_paint() alone does. Even if mesa throttling still applies. I get this code path is only for cairo-rendering paths, so I don't quite fully get why scheduling another frame on the gdk side untangles things.
Comment 9 Jonas Ådahl 2016-12-01 13:39:37 UTC
(moving back to gtk+ since this is about frame throttling getting away of unrelated IO)

I think that, for whatever reason a frame is scheduled (be it because some focus-out animation or an application deliberately asking for it for whatever reason), we should still handle the situation where we actually don't want to paint because of the vsync throttling.

As far as I can see, we use _gdk_frame_clock_freeze() and _gdk_frame_clock_thaw() to do the throttling, but maybe what we really should do is gdk_window_freeze_updates()/gdk_window_thaw_updates()? We handle those here and there in gdkwindow-wayland.c (the checking of window->update_freeze_count).
Comment 10 qbox 2016-12-04 09:56:55 UTC
I don't know if amarok uses GTKGLArea.

But, in Fedora 25, with Gnome-Wayland mouse stop working over any active windows/desktops. Gnome bars keeps working.

After minimize and restore Amarok using letf-bottom bar, mouse come to life again in all desktops/windows.
Comment 11 Michael Catanzaro 2016-12-04 16:23:10 UTC
(In reply to qbox from comment #10)
> I don't know if amarok uses GTKGLArea.

It doesn't. Amarok uses Qt, not GTK+. This is a GTK+ bug, so I suggest you file a new bug report.
Comment 12 Carlos Garnacho 2016-12-22 18:35:28 UTC
Created attachment 342399 [details] [review]
wayland: Disable EGL swap interval

We have a frame clock that ensures rendering is done as per the
output vsync. There is no need to have Mesa do the same for us.

This, most notably, ensures Mesa doesn't schedule frame callbacks
that will be left unattended if the compositor stops throttling
frames for its surface, this is eg. the case if the toplevel is
moved to another workspace.
Comment 13 Jonas Ådahl 2016-12-23 03:29:38 UTC
Review of attachment 342399 [details] [review]:

Should we maybe check the EGL_MIN_SWAP_INTERVAL and fail up front if we cannot configure the EGL context to let us handle the frame scheduling ourself?
Comment 14 Carlos Garnacho 2016-12-23 09:55:12 UTC
(In reply to Jonas Ådahl from comment #13)
> Review of attachment 342399 [details] [review] [review]:
> 
> Should we maybe check the EGL_MIN_SWAP_INTERVAL and fail up front if we
> cannot configure the EGL context to let us handle the frame scheduling
> ourself?

Yeah, probably makes sense, it seems Mesa eglSwapInterval implementations will just clamp and return true, so we might still hit this if the context can't disable frame throttling.
Comment 15 Carlos Garnacho 2016-12-23 10:15:37 UTC
Created attachment 342410 [details] [review]
wayland: Disable EGL swap interval

We have a frame clock that ensures rendering is done as per the
output vsync. There is no need to have Mesa do the same for us.

This, most notably, ensures Mesa doesn't schedule frame callbacks
that will be left unattended if the compositor stops throttling
frames for its surface, this is eg. the case if the toplevel is
moved to another workspace.

Also, given a SwapInterval!=0 will always bring these unexpected
side effects, check that it's possible to disable it, and error
out if that isn't the case.
Comment 16 Emmanuele Bassi (:ebassi) 2016-12-23 13:33:31 UTC
On X11 we already disable swap interval when running composited — and since Wayland is using a compositor by default, I think it makes complete sense to rely on the compositor signaling us when to render.

The only issue I can think of is that the Wayland compositor may not be synchronised to the vertical refresh rate; or could be a very simple process that just sets up the output and then relies on the toolkit to deal with the timing of submitting frames. Having said that, I seriously doubt this is even a case GTK should contemplate.
Comment 17 Emmanuele Bassi (:ebassi) 2016-12-23 13:38:45 UTC
Review of attachment 342410 [details] [review]:

::: gdk/wayland/gdkglcontext-wayland.c
@@ +441,3 @@
+static gboolean
+egl_config_can_disable_swap_interval (GdkDisplay      *display,
+                                      const EGLConfig *config)

EGLConfig is already a pointer; I don't think you need to pass it as a pointer…

@@ +446,3 @@
+  EGLint interval;
+
+  if (!eglGetConfigAttrib (display_wayland->egl_display, *config,

… especially since you're just dereferencing it here.

@@ +483,3 @@
     return NULL;
 
+  if (!egl_config_can_disable_swap_interval (display, &config))

I'm not convinced this is an appropriate action to take.

If swap interval cannot be disabled then we should fall back to the existing behaviour; it's not optimal, but saying "you can't have GL because your driver does not allow disabling swap interval" seems to me far more drastic than necessary.

Just save the return value inside the GdkGLContext and emit a warning or a debug message.

@@ +543,3 @@
       eglMakeCurrent(display_wayland->egl_display, EGL_NO_SURFACE, EGL_NO_SURFACE,
                      EGL_NO_CONTEXT);
+      eglSwapInterval (dpy, 0);

Not sure it's needed, here; we're unbinding everything, which means there's no context and no surface to swap.
Comment 18 Carlos Garnacho 2016-12-23 14:25:51 UTC
(In reply to Emmanuele Bassi (:ebassi) from comment #17)
> Review of attachment 342410 [details] [review] [review]:
> 
> ::: gdk/wayland/gdkglcontext-wayland.c
> @@ +441,3 @@
> +static gboolean
> +egl_config_can_disable_swap_interval (GdkDisplay      *display,
> +                                      const EGLConfig *config)
> 
> EGLConfig is already a pointer; I don't think you need to pass it as a
> pointer…

Hmm, I see. I was confused by the weird memory management at:

https://git.gnome.org//browse/gtk+/tree/gdk/wayland/gdkglcontext-wayland.c#n410

where it seems to return stuff from a freed block, knowing that EGLConfig is a pointer itself makes sense then.

> 
> @@ +446,3 @@
> +  EGLint interval;
> +
> +  if (!eglGetConfigAttrib (display_wayland->egl_display, *config,
> 
> … especially since you're just dereferencing it here.
> 
> @@ +483,3 @@
>      return NULL;
>  
> +  if (!egl_config_can_disable_swap_interval (display, &config))
> 
> I'm not convinced this is an appropriate action to take.
> 
> If swap interval cannot be disabled then we should fall back to the existing
> behaviour; it's not optimal, but saying "you can't have GL because your
> driver does not allow disabling swap interval" seems to me far more drastic
> than necessary.
> 
> Just save the return value inside the GdkGLContext and emit a warning or a
> debug message.

Sure, question is when it's more appropriate to warn then. Before it locks presumably...

> 
> @@ +543,3 @@
>        eglMakeCurrent(display_wayland->egl_display, EGL_NO_SURFACE,
> EGL_NO_SURFACE,
>                       EGL_NO_CONTEXT);
> +      eglSwapInterval (dpy, 0);
> 
> Not sure it's needed, here; we're unbinding everything, which means there's
> no context and no surface to swap.

Right, I was unsure about this one.

Thanks for the review!
Comment 19 Carlos Garnacho 2016-12-23 17:02:12 UTC
Created attachment 342424 [details] [review]
wayland: Disable EGL swap interval

We have a frame clock that ensures rendering is done as per the
output vsync. There is no need to have Mesa do the same for us.

This, most notably, ensures Mesa doesn't schedule frame callbacks
that will be left unattended if the compositor stops throttling
frames for its surface, this is eg. the case if the toplevel is
moved to another workspace.

Also, given a SwapInterval!=0 will always bring these unexpected
side effects, check that it's possible to disable it, and warn
if that isn't the case.
Comment 20 Emmanuele Bassi (:ebassi) 2016-12-23 17:33:16 UTC
Review of attachment 342424 [details] [review]:

Looks definitely better.

::: gdk/wayland/gdkglcontext-wayland.c
@@ +472,3 @@
+                           EGL_MIN_SWAP_INTERVAL,
+                           &display_wayland->egl_min_swap_interval))
+    return NULL;

You should set the GError if you return NULL here.

Something like:

  if (!eglGetConfigAttrib (...))
    {
      g_set_error_literal (error, GDK_GL_ERROR, GDK_GL_ERROR_NOT_AVAILABLE,
                           "Could not retrieve the minimum swap interval");
      return NULL;
    }

But maybe we should move this check inside find_eglconfig_for_window().

@@ +552,3 @@
+    eglSwapInterval (display_wayland->egl_display, 0);
+  else
+    g_warning ("Can't disable GL swap interval");

I'd probably use g_debug(), not a warning.
Comment 21 Carlos Garnacho 2016-12-23 17:49:33 UTC
Created attachment 342427 [details] [review]
wayland: Disable EGL swap interval

We have a frame clock that ensures rendering is done as per the
output vsync. There is no need to have Mesa do the same for us.

This, most notably, ensures Mesa doesn't schedule frame callbacks
that will be left unattended if the compositor stops throttling
frames for its surface, this is eg. the case if the toplevel is
moved to another workspace.

Also, given a SwapInterval!=0 will always bring these unexpected
side effects, check that it's possible to disable it, and spew
a debug message if that isn't the case.
Comment 22 Michael Catanzaro 2016-12-23 18:24:00 UTC
(Just want to add: I hope we can do a 3.22 release with this fix soon, since it's an extremely high-impact issue for WebKit. Thanks Carlos. :)
Comment 23 Michael Catanzaro 2017-01-09 15:42:46 UTC
Going to keep poking about this, since all apps using WebKit are in a really bad state on Wayland right now:

(In reply to Michael Catanzaro from comment #22)
> (Just want to add: I hope we can do a 3.22 release with this fix soon, since
> it's an extremely high-impact issue for WebKit. Thanks Carlos. :)
Comment 24 Emmanuele Bassi (:ebassi) 2017-01-09 15:55:17 UTC
Review of attachment 342427 [details] [review]:

Looks good to me.
Comment 25 Carlos Garnacho 2017-01-09 18:36:14 UTC
\o/

Attachment 342427 [details] pushed as ab66c3d - wayland: Disable EGL swap interval