GNOME Bugzilla – Bug 691752
add fence/sync API
Last modified: 2013-05-28 20:41:25 UTC
Right now, it's impossible for external Cogl users to find out when their submitted commands have finished executing on the GPU. This is a particular problem when using the patchset to enable texture_external importing, since the specification allows implementations to not implement any behind-the-scenes synchronisation, and explicitly notes that glFinish() or similar may be required before a texture can be safely reused. The usecase that drove this API is importing OpenMAX buffers into clutter-gst, where they are used as part of a standard Clutter scene, and then returned to the OpenMAX buffer pool when ready. It probably goes without saying that the platform provides no behind-the-scenes synchronsation; as soon as we unref the buffer in clutter-gst, OpenMAX is liable to write to it at any moment. Without a fence API, we have no idea when the scene has completed execution, so we're forced to either copy the textures (which has an unacceptable performance impact) or just guess when the buffer is no longer in use. Note that the EGL_KHR_fence_sync integration is currently untested; I'm adding support for it for Mesa/i965 at the moment. GL_ARB_sync works though.
Created attachment 233487 [details] [review] Add support for fence/sync/completion extensions The only new piece of public API is cogl_fence_mark, which inserts a synchronisation fence into the GL command stream, executing the provided callback after that fence has been signalled by the GL or EGL implementation. Support is currently provided for GL 3.x's GL_ARB_sync extension, and EGL's EGL_KHR_fence_sync (when used with OpenGL ES). Signed-off-by: Daniel Stone <daniel@fooishbar.org>
This might help GLX's _cogl_winsys_onscreen_swap_region too, but I'm not sure of the synchronisation issues, e.g. what happens if rendering is submitted after the swap_region hook, but before the callback for completion, so I haven't prepared a patch. GLES has similar uses too, for flipped copies.
Ah, and with that I just found the CoglFrameCallback thread. Do you think fences would semantically fit in there, or would that just be combining things for the sake of it?
Created attachment 233492 [details] [review] Add synchronisation/fence API Almost entirely based on Cogl's cogl_fence_{new,destroy}, but with an additional fun layer of indirection due to the need to defer posting the fences until after the stage redraw commands have been submitted. Signed-off-by: Daniel Stone <daniel@fooishbar.org>
The above (comment #4) is how I've hooked this up in Clutter: pretty much exactly the same API. I'm not happy with _clutter_stage_get_fences and friends, but that's an implementation detail.
(In reply to comment #1) This looks really cool, thanks for the patch! After discussing with Rob Bragg, here are some thoughts about it: We think it would might make sense to attach the fence to a framebuffer rather than to the context. This would allow us to eventually consider the command streams for separate framebuffers separately. The API could be something like this: CoglFenceClosure * cogl_framebuffer_add_fence (CoglFramebuffer *fb, CoglFenceCallback *callback, void *user_data); void cogl_framebuffer_cancel_fence (CoglFramebuffer *fb, CoglFenceClosure *closure); The CoglFenceClosure* would be an opaque pointer which is just used as an identifier to cancel it again. This is probably worth doing so that an application can identify multiple fences that use the same callback function. The CoglFenceClosure* naming is so that we could make it similar to the CoglFrameCallback API by making that return a CoglFrameClosure* pointer. The ‘cancel’ naming should hopefully make it obvious that you don't need to manually destroy the object if you do wait until the callback is invoked. It doesn't seem to me like we should try to squeeze the event objects into the same callback mechanism as CoglFrameCallback because you are quite likely to want an orthogonal bit of code to listen for a specific fence. It would be annoying to make it listen to all of the frame events and ignore them until it found the specific event for its fence. We need to consider how this relates to the Cogl journal. Currently it looks like it won't behave correctly if there are entries in the journal because GL may consider the fence object to be passed before the journal is flushed. I guess eventually it would be good to always insert the fence object into the journal and then delay creating the GL object until the journal is flushed. For now we could just always flush the journal every time a fence is inserted. This is another incentive to attach the fence objects to the framebuffer rather than the context because each framebuffer has its own journal. Currently the patch relies on cogl_poll_get_info to wake up the main loop to check the fences. Cogl doesn't currently require an application to call that for some winsys backends such as SDL. We should either try to find a way to hook the timeout into SDL's main loop mechanism or maybe just blacklist the feature with the SDL winsys. I'd be happy to land the patch and worry about that later though. The 100ms timeout seems too big. Surely you'd expect anything you submit to GL to complete within the timespan of a single frame, so maybe something in the order of 4 or 8ms would be better? Annoyingly we can't use C99 in the files that aren't Linux-only because we support compiling with Visual Studio. This is being done in cogl_fence_destroy(). If you can be bothered you could make it use the COGL_TAILQ type for the list of fences instead of GSList. Then it wouldn't have to walk the list when removing the fence. This is probably just micro-optimising though so it's fine either way.
Thanks a lot for the review! Appreciate it. For the record, there's a slightly newer version on git://git.collabora.com/git/user/daniels/cogl in the 'fence' branch, but the essentials are unchanged. Your comments all sound good to me, and I'll try to get them fixed up, including taking a look into the journal. The one thing I'll note, though, is the timeout: drivers including Intel can potentially end up calling a blocking ioctl() once for every fence check, so checking too often could actually be reasonably punitive, performance-wise. I chose 100ms because I'm biased towards the video usecases, but now you mention it, I guess it does make it pretty useless as a slightly less punitive version of glFinish. ;) Maybe the thing to do would be to start with an aggressive timeout (e.g. quarter of a frame) and slowly back off if we're repeatedly polling on the one fence? Of course, the ideal here would be to get a DRM event so we don't have to poll at all. I'm working on that, but it would take a while to actually hit mainline kernels/libdrm/Mesa.
I've updated this in the git branch, and will be attaching a new patch in a sec. The patch is actually against cogl-1.12; I'll attach a patch rebased against master when I'm less jetlagged. ;) This should have addressed all the review comments, including properly managing the journal. It makes fences a two-stage thing, much as in the proposed Clutter patch: first we add them to a tailq in the journal, then when we flush the journal, we actually submit the fences and move them to another tailq in the context.
Created attachment 235160 [details] [review] Add support for fence/sync/completion extensions The only new piece of public API is cogl_fence_mark, which inserts a synchronisation fence into the GL command stream, executing the provided callback after that fence has been signalled by the GL or EGL implementation. Support is currently provided for GL 3.x's GL_ARB_sync extension, and EGL's EGL_KHR_fence_sync (when used with OpenGL ES). Signed-off-by: Daniel Stone <daniel@fooishbar.org>
And once again the commit message is wrong - I've fixed that in my branch.
Created attachment 235314 [details] [review] Feedback patch Thanks for the update. Here are some comments in the form of a patch which makes the suggested changes. If it's ok with you we can just squash it back into your patch. I started off just doing some minor style changes but I may have got a bit carried away. I hope I haven't butchered your patch too much. The handling for the journal isn't quite how I was expecting it to work because the sync is inserted after all of the journal commands are complete whereas if the fence was actually inserted into the journal it could insert the sync exactly at the right place in the command stream. I guess it's not too big a deal if the fence is a bit late but it might not be ideal if the journal has a lot of expensive commands. I'm happy to leave it as it is though. The EGL winsys version of the fence calls glFlush whereas the GL3 version uses the GL_SYNC_FLUSH_COMMANDS_BIT flag when polling the fence. However the EGL extension has an equivalent flag called EGL_SYNC_FLUSH_COMMANDS_BIT_KHR. Is there any particular reason why the patch isn't using that? Currently the fence will never be hit unless the journal is flushed. From the user's perspective this will only happen if they do a swap buffers. The user might want to do some stuff such as upload a texture and then wait for that to complete without doing a swap buffers. I think it would be better to force a flush of the journal whenever we're about to go idle to guarantee that the fence will be passed in a finite time. I think it would be nice to make sure the destructor for CoglFramebuffer automatically cancels all of the syncs, otherwise it will probably crash if you free the framebuffer instead of waiting for the sync to complete. It's a bit nasty to invoke the user's callback directly during the journal flush if it failed to create a sync object. We try to guarantee that callbacks will only be invoked at certain points (eg, during cogl_poll_dispatch) to make writing the application simpler. This patch changes it so that _cogl_fence_submit can not fail but instead when it comes to check the fence if there is no actual fence object it will immediately invoke the callback. If the journal is empty it might as well just submit it immediately. The timeout returned from cogl_poll_get_info is in microseconds not milliseconds. It should probably wait for more than 5 microseconds :) Instead of making the tests silently fail if a feature is missing, we have a mechanism to mark that the test requires the feature in test-conform-main. That way the test report can say ‘n/a’ instead of ‘pass’. We don't use gpointer in Cogl but just void* instead because it's more well known. cogl_framebuffer_add_fence should return NULL instead of FALSE.
Review of attachment 235314 [details] [review]: Looks good to me (and thank god for void * rather than gpointer), thanks again for the thorough review. Just a couple of minor nitpicks below, otherwise looks good to me. ::: cogl/cogl-fence.c @@ +55,1 @@ { The test for this branch needs to include !fence->fence_obj, else we'll add both a winsys and a GL fence (if possible), leaking the former. ::: cogl/cogl-poll.c @@ +64,3 @@ out: /* ... unless we have pending fence objects to query. */ if (_cogl_fence_should_check (context) && (*timeout == -1 || *timeout > 5)) Need to change 5 to 5000 here as well. ::: cogl/winsys/cogl-winsys-egl.c @@ +847,3 @@ ret = renderer->pf_eglClientWaitSync (renderer->edpy, fence, + EGL_SYNC_FLUSH_COMMANDS_BIT_KHR, I didn't do this the first time around, because I wasn't sure that we'd still necessarily have the same GL(ES) context bound. If we're guaranteed to have the right context bound then this is OK.
Ugh, that's not particularly useful formatting! The first comment ('the test for this branch ...') was in _cogl_fence_submit: we could potentially hit the if (winsys->fence_add) branch, followed by the if (context->glFenceSync) branch as well.
Created attachment 235320 [details] [review] More changes Ok, here is another patch with some further changes to squash in as per your suggestions. I've moved the timeout handling into a function called _cogl_fence_update_timeout so that it can be handled within cogl-fence.c. I think it should be ok to assume that Cogl's context is bound when it comes to check the fences. Cogl basically binds its context once and assumes that it always remains bound. The only situation where there would be another context is with the CoglGLES2Context support. In that case the application can push a second context but it must pop it back before calling any other Cogl functions. Therefore the context should always have been restored by the time cogl_poll_dispatch is called. I guess the next step is to port it to master.
Created attachment 239266 [details] [review] Add fence API cogl_framebuffer_add_fence creates a synchronisation fence, which will invoke a user-specified callback when the GPU has finished executing all commands provided to it up to that point in time. Support is currently provided for GL 3.x's GL_ARB_sync extension, and EGL's EGL_KHR_fence_sync (when used with OpenGL ES). Signed-off-by: Daniel Stone <daniel@fooishbar.org> Reviewed-by: Neil Roberts <neil@linux.intel.com>
This is the updated patch, with your two sets of changes taken in, rebased on top of current master (9632635a).
Created attachment 239572 [details] [review] Yet more changes to the sync patch Thanks again for the patch. After reviewing it again and discussing it with Rob Bragg, we were a bit worried that we might eventually want to pass more information to the fence callback such as timing information about when the fence was hit. The current callback has no leeway to squeeze in the extra information so we were a bit worried we'd be boxing ourselves into a corner and end up with a weird API if we later add it. This patch changes it so that the callback function has an extra CoglFence* pointer as the first parameter. Currently this is just always set to NULL but the idea is it could eventually be a refable object with getters to query extra information. We also want to leave room to eventually be able to configure the fence object before adding it to a framebuffer. For example we might eventually want to optionally disable the automatic flushing or to link it with an X sync object via the GL_EXT_x11_sync_object extension. To leave room for that, this patch renames the fence functions to: cogl_framebuffer_add_fence_callback and cogl_framebuffer_cancel_fence_callback Then if we eventually want to have a first-class configurable CoglFence object that the user can manipulate we can have: CoglFence * cogl_fence_new (void); CoglFenceClosure * cogl_fence_add_callback (CoglFence *fence, CoglFenceCallback callback, void *user_data); void cogl_fence_set_funky_configuration_thingy (CoglFence *fence, CoglBool value); void cogl_framebuffer_add_fence (CoglFramebuffer *fb, CoglFence *fence); The cogl_fence_add_callback() function will take a reference on the fence object which will be owned by the fence closure object. The closure will automatically be destroyed when it is invoked as before which will also cause the fence to be unref'd. If we do end up with that API then cogl_framebuffer_add_fence_callback would just be like a convenience wrapper for implicitly creating a fence object, adding a callback to it and adding it to the framebuffer. That way we don't have to make a confusing new API that deprecates the old API. In addition to that change, I noticed that the EGL winsys implementation of the fence submission wasn't checking whether the feature was found before trying to call eglCreateSync. This meant that if you have the extension in your headers but not in the implementation and run Cogl with the EGL winsys then it will crash. Also I thought we might as well use the slice allocator for the fence closure objects. If you're happy with these changes then we can just squash it in and merge the patch.
Sorry, I'm really not sure why I didn't reply to this earlier. Your changes look fine to me.
Ok, we had a minor hold up due to how the mainloop integration was moved down to CoglRenderer, but I've rebased the patch now and landed in master and the cogl-1.16 branch. thanks for the good work!