GNOME Bugzilla – Bug 768190
Use a separate thread to wait for VBlank for NVIDIA drivers
Last modified: 2017-02-22 16:13:48 UTC
There's a general problem with using Cogl with the NVIDIA drivers vs. using it with free drivers: if you have a constantly rendering application, then with the NVIDIA drivers, g_idle_add() will never run because the application will always be blocked in glXSwapBuffers(), while with the free drivers (that support the INTEL_swap_event extension), the application will go idle waiting for the next frame. This turns out to trigger a range of subtle (and not so subtle) bugs in GNOME Shell and it would be better to wait for new frames asynchronously. This patch does that by after calling glXSwapBuffers(), using a separate thread with a shared GL content that waits for the next vblank and says that the frame is complete at that point. (I've discussed this with NVIDIA and they think that this method should work correctly and there isn't a better way to do it.) The current version enables this behavior by default with the NVIDIA drivers, but that probably isn't OK: * The application needs to call XInitThreads() with the standard caveat that it has to be called before all other X calls. * This will effectively disable triple buffering if the driver is doing that, since we only ever will have one frame in flight. That might not be right for an arbitrary Cogl applications. * There is some performance overhead from adding glFinish() before glXSwapBuffers() since that causes a CPU busy-wait for GPU completion. So this probably needs a: cogl_xlib_renderer_enable_swap_wait() or something that GNOME Shell can call.
Created attachment 330593 [details] [review] examples/cogl-crate.c: fix bug when waiting for next frame The swap_ready variable was never reset, meaning that we weren't effectively waiting for SYNC events that signal frame completion.
Created attachment 330594 [details] [review] CoglGPUInfo - fix check for NVIDIA NVIDIA drivers have a vendor of "NVIDIA Corporation" not "NVIDIA". Check for both in case older drivers did use "NVIDIA"
Created attachment 330595 [details] [review] CoglWinsysGLX: factor out some duplicated code Add a helper function for repeated calls to clock_gettime(CLOCK_MONOTONIC)
Created attachment 330596 [details] [review] Usability of SGI_video_sync is per-display not per-renderer As previously commented in the code, SGI_video_sync is per-display, rather than per-renderer. The is_direct flag for the renderer was tested before it was initialized (per-display) and that resulted in SGI_video_sync never being used.
Created attachment 330597 [details] [review] Fix the get_clock_time() without GLX_OML_sync_control When we don't have GLX_OML_sync_control, we still can set the frame presentation time, but we always use the system monotonic time, so return that from get_clock_time().
Created attachment 330598 [details] [review] For NVIDIA proprietary drivers, implement sync events with a thread It's a good guess that the buffer swap will occur at the next vblank, so use glXWaitVideoSync in a separate thread to deliver a sync event rather than just letting the client block when frame drawing, which can signficantly change app logic as compared to the INTEL_swap_event case.
Review of attachment 330593 [details] [review]: indeed
Review of attachment 330594 [details] [review]: yep, I've seen something like this elsewhere too
Review of attachment 330595 [details] [review]: ::: cogl/winsys/cogl-winsys-glx.c @@ +236,3 @@ /* This is the time source that the newer (fixed) linux drm * drivers use (Linux >= 3.8) */ + current_monotonic_time = get_monotonic_time_ns (); This one wants microseconds
Review of attachment 330596 [details] [review]: looks good, but yeah, this is confusing because CoglDisplay initializes the GLXContext and CoglRenderer just initializes the basic GLX support and the renderer is initialized before the display
Review of attachment 330597 [details] [review]: right
Review of attachment 330598 [details] [review]: ::: cogl/winsys/cogl-winsys-glx.c @@ +1927,3 @@ + ensure_ust_type (display->renderer, drawable); + + if ((pipe (glx_onscreen->swap_wait_pipe) == -1)) pipe2 for O_CLOEXEC ? @@ +1942,3 @@ + g_cond_init (&glx_onscreen->swap_wait_cond); + glx_onscreen->swap_wait_context = + glx_renderer->glXCreateNewContext (xlib_renderer->xdpy, Does this work if the existing glx_context was created by glXCreateContextAttribs (i.e. by the gl3 code path) ? @@ +2207,3 @@ + if (_cogl_has_private_feature (context, COGL_PRIVATE_FEATURE_THREADED_SWAP_WAIT)) + { + _cogl_winsys_wait_for_gpu (onscreen); is this always needed or only the first time when we create the thread and its glx context ? @@ +2208,3 @@ + { + _cogl_winsys_wait_for_gpu (onscreen); + start_threaded_swap_wait (onscreen, _cogl_winsys_get_vsync_counter (context)); having THREADED_SWAP_WAIT implies can_vblank_wait but that doesn't imply have_vblank_counter does it?
Review of attachment 330595 [details] [review]: with that comment fixed this is obviously fine
Created attachment 330670 [details] [review] Add cogl_xlib_renderer_set_threaded_swap_wait_enabled() [PATCH] Add cogl_xlib_renderer_set_threaded_swap_wait_enabled() Because the threaded-swap-wait functionality requires XInitThreads(), and because it isn't clear that it is a win for all applications, add a API function to conditionally enable it. Fix the cogl-crate example not to just have a hard-coded dependency on libX11.
Created attachment 330671 [details] [review] clutter: add an api to enable swap wait
Created attachment 330672 [details] [review] mutter: enable threaded swap waiting
Added patches to wire this up with an optional enablement API. This follows the video memory purge stuff pretty closely, so touches the same areasm and was rebased on top of that to avoid conflicts.
(In reply to Rui Matos from comment #12) > Review of attachment 330598 [details] [review] [review]: > > ::: cogl/winsys/cogl-winsys-glx.c > @@ +1927,3 @@ > + ensure_ust_type (display->renderer, drawable); > + > + if ((pipe (glx_onscreen->swap_wait_pipe) == -1)) > > pipe2 for O_CLOEXEC ? True - it would be good practice to set CLOEXEC. I'll probably do it with fcntl() to avoid worrying about non-Linux. > @@ +1942,3 @@ > + g_cond_init (&glx_onscreen->swap_wait_cond); > + glx_onscreen->swap_wait_context = > + glx_renderer->glXCreateNewContext (xlib_renderer->xdpy, > > Does this work if the existing glx_context was created by > glXCreateContextAttribs (i.e. by the gl3 code path) ? Seems to work fine in combination with the prefer-gl3 patches. > @@ +2207,3 @@ > + if (_cogl_has_private_feature (context, > COGL_PRIVATE_FEATURE_THREADED_SWAP_WAIT)) > + { > + _cogl_winsys_wait_for_gpu (onscreen); > > is this always needed or only the first time when we create the thread and > its glx context ? What this is about is trying to avoid the case where: We get the value of the vsync counter, and send it to the wait thread We call glXSwapBuffers() It takes a while for the frame to finish on th GPU and the vsync counter increments The wait thread sends the completion event back We start the next frame The buffer swap actually happens at the next vblank interval Adding the glFinish() makes this much less likely, though it has the downside of causing GPU usage in gnome-shell as it busy-waits for GPU idle. It's actually the same thing, more or less, as in the commented call to _cogl_winsys_wait_for_gpu() in the no-glXSwapInterval() branch of this function. This doesn't entirely eliminate the race condition - even with the glFinish() there's a small chance that vblank will happen between getting the value of the vblank counter and swapping the buffers, causing the next frame to start early. But as far as understand it, glFinish() on the NVIDIA drivers actually waits for any preceding glXSwapBuffers() to complete, so the glFinish() provides a secondary function to throttle this case and get things back in sync. Clearly the code could use a comment here. > @@ +2208,3 @@ > + { > + _cogl_winsys_wait_for_gpu (onscreen); > + start_threaded_swap_wait (onscreen, > _cogl_winsys_get_vsync_counter (context)); > > having THREADED_SWAP_WAIT implies can_vblank_wait but that doesn't imply > have_vblank_counter does it? Good catch, I should change: if (glx_display->can_vblank_wait && context->display->renderer->xlib_enable_threaded_swap_wait && info->vendor == COGL_GPU_INFO_VENDOR_NVIDIA) To be have_vblank_counter instead. (Practically harmless with the NVIDIA check, but this is confusing enough without having it be wrong!)
Created attachment 330676 [details] [review] CoglWinsysGLX: factor out some duplicated code Add a helper function for repeated calls to clock_gettime(CLOCK_MONOTONIC)
Created attachment 330677 [details] [review] For NVIDIA proprietary drivers, implement sync events with a thread It's a good guess that the buffer swap will occur at the next vblank, so use glXWaitVideoSync in a separate thread to deliver a sync event rather than just letting the client block when frame drawing, which can signficantly change app logic as compared to the INTEL_swap_event case.
Review of attachment 330670 [details] [review]: looks good
Review of attachment 330671 [details] [review]: ok
Review of attachment 330676 [details] [review]: ++
Review of attachment 330677 [details] [review]: makes sense. The thread sync code looks correct too AFAICT
Review of attachment 330672 [details] [review]: ok
Forgot I filed this, and refiled it against mutter's integrated Cogl. I don't see a point in landing this for standalone Cogl - I've only enabled or tested this with Mutter, so I'm going to mark this as a duplicate. *** This bug has been marked as a duplicate of bug 779039 ***