Bug 768190 – Use a separate thread to wait for VBlank for NVIDIA drivers

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 768190 - Use a separate thread to wait for VBlank for NVIDIA drivers


Summary:	Use a separate thread to wait for VBlank for NVIDIA drivers


Status:	RESOLVED DUPLICATE of bug 779039

Product:	cogl
Classification:	Platform
Component:	GLX
Version:	unspecified
Hardware:	Other All

Importance:	Normal normal
Target Milestone:	---
Assigned To:	Cogl maintainer(s)
QA Contact:	Cogl maintainer(s)

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2016-06-29 14:43 UTC by Owen Taylor
Modified:	2017-02-22 16:13 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
examples/cogl-crate.c: fix bug when waiting for next frame (898 bytes, patch) 2016-06-29 14:43 UTC, Owen Taylor	accepted-commit_now	Details \| Review
CoglGPUInfo - fix check for NVIDIA (984 bytes, patch) 2016-06-29 14:43 UTC, Owen Taylor	accepted-commit_now	Details \| Review
CoglWinsysGLX: factor out some duplicated code (2.64 KB, patch) 2016-06-29 14:43 UTC, Owen Taylor	none	Details \| Review
Usability of SGI_video_sync is per-display not per-renderer (7.16 KB, patch) 2016-06-29 14:43 UTC, Owen Taylor	accepted-commit_now	Details \| Review
Fix the get_clock_time() without GLX_OML_sync_control (1.12 KB, patch) 2016-06-29 14:43 UTC, Owen Taylor	accepted-commit_now	Details \| Review
For NVIDIA proprietary drivers, implement sync events with a thread (13.41 KB, patch) 2016-06-29 14:43 UTC, Owen Taylor	none	Details \| Review
Add cogl_xlib_renderer_set_threaded_swap_wait_enabled() (6.57 KB, patch) 2016-06-30 15:35 UTC, Owen Taylor	accepted-commit_now	Details \| Review
clutter: add an api to enable swap wait (2.57 KB, patch) 2016-06-30 15:36 UTC, Owen Taylor	accepted-commit_now	Details \| Review
mutter: enable threaded swap waiting (1010 bytes, patch) 2016-06-30 15:36 UTC, Owen Taylor	accepted-commit_now	Details \| Review
CoglWinsysGLX: factor out some duplicated code (2.64 KB, patch) 2016-06-30 18:10 UTC, Owen Taylor	accepted-commit_now	Details \| Review
For NVIDIA proprietary drivers, implement sync events with a thread (14.52 KB, patch) 2016-06-30 18:11 UTC, Owen Taylor	accepted-commit_now	Details \| Review

Description Owen Taylor 2016-06-29 14:43:08 UTC

There's a general problem with using Cogl with the NVIDIA drivers vs.
using it with free drivers: if you have a constantly rendering application,
then with the NVIDIA drivers, g_idle_add() will never run because the
application will always be blocked in glXSwapBuffers(), while with the free
drivers (that support the INTEL_swap_event extension), the application
will go idle waiting for the next frame.

This turns out to trigger a range of subtle (and not so subtle) bugs in
GNOME Shell and it would be better to wait for new frames asynchronously.
This patch does that by after calling glXSwapBuffers(), using a separate
thread with a shared GL content that waits for the next vblank and says
that the frame is complete at that point.

(I've discussed this with NVIDIA and they think that this method should
work correctly and there isn't a better way to do it.)

The current version enables this behavior by default with the NVIDIA drivers, but
that probably isn't OK:

 * The application needs to call XInitThreads() with the standard caveat
   that it has to be called before all other X calls.
 * This will effectively disable triple buffering if the driver is doing
   that, since we only ever will have one frame in flight. That might
   not be right for an arbitrary Cogl applications.
 * There is some performance overhead from adding glFinish() before glXSwapBuffers()
   since that causes a CPU busy-wait for GPU completion.

So this probably needs a:

 cogl_xlib_renderer_enable_swap_wait()

or something that GNOME Shell can call.

Comment 1 Owen Taylor 2016-06-29 14:43:11 UTC

Created attachment 330593 [details] [review]
examples/cogl-crate.c: fix bug when waiting for next frame

The swap_ready variable was never reset, meaning that we weren't
effectively waiting for SYNC events that signal frame completion.

Comment 2 Owen Taylor 2016-06-29 14:43:15 UTC

Created attachment 330594 [details] [review]
CoglGPUInfo - fix check for NVIDIA

NVIDIA drivers have a vendor of "NVIDIA Corporation" not "NVIDIA".
Check for both in case older drivers did use "NVIDIA"

Comment 3 Owen Taylor 2016-06-29 14:43:19 UTC

Created attachment 330595 [details] [review]
CoglWinsysGLX: factor out some duplicated code

Add a helper function for repeated calls to clock_gettime(CLOCK_MONOTONIC)

Comment 4 Owen Taylor 2016-06-29 14:43:23 UTC

Created attachment 330596 [details] [review]
Usability of SGI_video_sync is per-display not per-renderer

As previously commented in the code, SGI_video_sync is per-display, rather
than per-renderer. The is_direct flag for the renderer was tested before
it was initialized (per-display) and that resulted in SGI_video_sync
never being used.

Comment 5 Owen Taylor 2016-06-29 14:43:28 UTC

Created attachment 330597 [details] [review]
Fix the get_clock_time() without GLX_OML_sync_control

When we don't have GLX_OML_sync_control, we still can set the
frame presentation time, but we always use the system monotonic time,
so return that from get_clock_time().

Comment 6 Owen Taylor 2016-06-29 14:43:32 UTC

Created attachment 330598 [details] [review]
For NVIDIA proprietary drivers, implement sync events with a thread

It's a good guess that the buffer swap will occur at the next vblank,
so use glXWaitVideoSync in a separate thread to deliver a sync event
rather than just letting the client block when frame drawing, which
can signficantly change app logic as compared to the INTEL_swap_event
case.

Comment 7 Rui Matos 2016-06-29 16:32:29 UTC

Review of attachment 330593 [details] [review]:

indeed

Comment 8 Rui Matos 2016-06-29 16:33:38 UTC

Review of attachment 330594 [details] [review]:

yep, I've seen something like this elsewhere too

Comment 9 Rui Matos 2016-06-29 16:34:22 UTC

Review of attachment 330595 [details] [review]:

::: cogl/winsys/cogl-winsys-glx.c
@@ +236,3 @@
   /* This is the time source that the newer (fixed) linux drm
    * drivers use (Linux >= 3.8) */
+  current_monotonic_time = get_monotonic_time_ns ();

This one wants microseconds

Comment 10 Rui Matos 2016-06-29 17:53:44 UTC

Review of attachment 330596 [details] [review]:

looks good, but yeah, this is confusing because CoglDisplay initializes the GLXContext and CoglRenderer just initializes the basic GLX support and the renderer is initialized before the display

Comment 11 Rui Matos 2016-06-29 18:04:27 UTC

Review of attachment 330597 [details] [review]:

right

Comment 12 Rui Matos 2016-06-30 14:41:17 UTC

Review of attachment 330598 [details] [review]:

::: cogl/winsys/cogl-winsys-glx.c
@@ +1927,3 @@
+      ensure_ust_type (display->renderer, drawable);
+
+      if ((pipe (glx_onscreen->swap_wait_pipe) == -1))

pipe2 for O_CLOEXEC ?

@@ +1942,3 @@
+      g_cond_init (&glx_onscreen->swap_wait_cond);
+      glx_onscreen->swap_wait_context =
+         glx_renderer->glXCreateNewContext (xlib_renderer->xdpy,

Does this work if the existing glx_context was created by glXCreateContextAttribs (i.e. by the gl3 code path) ?

@@ +2207,3 @@
+          if (_cogl_has_private_feature (context, COGL_PRIVATE_FEATURE_THREADED_SWAP_WAIT))
+            {
+              _cogl_winsys_wait_for_gpu (onscreen);

is this always needed or only the first time when we create the thread and its glx context ?

@@ +2208,3 @@
+            {
+              _cogl_winsys_wait_for_gpu (onscreen);
+              start_threaded_swap_wait (onscreen, _cogl_winsys_get_vsync_counter (context));

having THREADED_SWAP_WAIT implies can_vblank_wait but that doesn't imply have_vblank_counter does it?

Comment 13 Rui Matos 2016-06-30 14:41:53 UTC

Review of attachment 330595 [details] [review]:

with that comment fixed this is obviously fine

Comment 14 Owen Taylor 2016-06-30 15:35:30 UTC

Created attachment 330670 [details] [review]
Add cogl_xlib_renderer_set_threaded_swap_wait_enabled()

[PATCH] Add cogl_xlib_renderer_set_threaded_swap_wait_enabled()

Because the threaded-swap-wait functionality requires XInitThreads(),
and because it isn't clear that it is a win for all applications,
add a API function to conditionally enable it.

Fix the cogl-crate example not to just have a hard-coded dependency
on libX11.

Comment 15 Owen Taylor 2016-06-30 15:36:25 UTC

Created attachment 330671 [details] [review]
clutter: add an api to enable swap wait

Comment 16 Owen Taylor 2016-06-30 15:36:52 UTC

Created attachment 330672 [details] [review]
mutter: enable threaded swap waiting

Comment 17 Owen Taylor 2016-06-30 15:38:41 UTC

Added patches to wire this up with an optional enablement API. This follows the video memory purge stuff pretty closely, so touches the same areasm and was rebased on top of that to avoid conflicts.

Comment 18 Owen Taylor 2016-06-30 17:15:42 UTC

(In reply to Rui Matos from comment #12)
> Review of attachment 330598 [details] [review] [review]:
> 
> ::: cogl/winsys/cogl-winsys-glx.c
> @@ +1927,3 @@
> +      ensure_ust_type (display->renderer, drawable);
> +
> +      if ((pipe (glx_onscreen->swap_wait_pipe) == -1))
> 
> pipe2 for O_CLOEXEC ?

True - it would be good practice to set CLOEXEC. I'll probably do it with fcntl() to avoid worrying about non-Linux.

> @@ +1942,3 @@
> +      g_cond_init (&glx_onscreen->swap_wait_cond);
> +      glx_onscreen->swap_wait_context =
> +         glx_renderer->glXCreateNewContext (xlib_renderer->xdpy,
> 
> Does this work if the existing glx_context was created by
> glXCreateContextAttribs (i.e. by the gl3 code path) ?

Seems to work fine in combination with the prefer-gl3 patches.

> @@ +2207,3 @@
> +          if (_cogl_has_private_feature (context,
> COGL_PRIVATE_FEATURE_THREADED_SWAP_WAIT))
> +            {
> +              _cogl_winsys_wait_for_gpu (onscreen);
> 
> is this always needed or only the first time when we create the thread and
> its glx context ?

What this is about is trying to avoid the case where:

 We get the value of the vsync counter, and send it to the wait thread
 We call glXSwapBuffers()
 It takes a while for the frame to finish on th GPU and the vsync counter increments
 The wait thread sends the completion event back
 We start the next frame
 The buffer swap actually happens at the next vblank interval

Adding the glFinish() makes this much less likely, though it has the downside of causing GPU usage in gnome-shell as it busy-waits for GPU idle. It's actually the same thing, more or less, as in the commented call to _cogl_winsys_wait_for_gpu() in the no-glXSwapInterval() branch of this function.

This doesn't entirely eliminate the race condition - even with the glFinish() there's a small chance that vblank will happen between getting the value of the vblank counter and swapping the buffers, causing the next frame to start early. But as far as understand it, glFinish() on the NVIDIA drivers actually waits for any preceding glXSwapBuffers() to complete, so the glFinish() provides a secondary function to throttle this case and get things back in sync.

Clearly the code could use a comment here.

> @@ +2208,3 @@
> +            {
> +              _cogl_winsys_wait_for_gpu (onscreen);
> +              start_threaded_swap_wait (onscreen,
> _cogl_winsys_get_vsync_counter (context));
> 
> having THREADED_SWAP_WAIT implies can_vblank_wait but that doesn't imply
> have_vblank_counter does it?

Good catch, I should change:

      if (glx_display->can_vblank_wait &&
          context->display->renderer->xlib_enable_threaded_swap_wait &&
          info->vendor == COGL_GPU_INFO_VENDOR_NVIDIA)

To be have_vblank_counter instead. (Practically harmless with the NVIDIA check, but this is confusing enough without having it be wrong!)

Comment 19 Owen Taylor 2016-06-30 18:10:29 UTC

Created attachment 330676 [details] [review]
CoglWinsysGLX: factor out some duplicated code

Add a helper function for repeated calls to clock_gettime(CLOCK_MONOTONIC)

Comment 20 Owen Taylor 2016-06-30 18:11:02 UTC

Created attachment 330677 [details] [review]
For NVIDIA proprietary drivers, implement sync events with a thread

It's a good guess that the buffer swap will occur at the next vblank,
so use glXWaitVideoSync in a separate thread to deliver a sync event
rather than just letting the client block when frame drawing, which
can signficantly change app logic as compared to the INTEL_swap_event
case.

Comment 21 Rui Matos 2016-07-01 13:28:23 UTC

Review of attachment 330670 [details] [review]:

looks good

Comment 22 Rui Matos 2016-07-01 13:29:36 UTC

Review of attachment 330671 [details] [review]:

ok

Comment 23 Rui Matos 2016-07-01 13:30:29 UTC

Review of attachment 330676 [details] [review]:

++

Comment 24 Rui Matos 2016-07-01 13:52:35 UTC

Review of attachment 330677 [details] [review]:

makes sense. The thread sync code looks correct too AFAICT

Comment 25 Rui Matos 2016-07-01 14:01:08 UTC

Review of attachment 330672 [details] [review]:

ok

Comment 26 Owen Taylor 2017-02-22 16:13:48 UTC

Forgot I filed this, and refiled it against mutter's integrated Cogl. I don't see a point in landing this for standalone Cogl - I've only enabled or tested this with Mutter, so I'm going to mark this as a duplicate.

*** This bug has been marked as a duplicate of bug 779039 ***