After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 691752 - add fence/sync API
add fence/sync API
Status: RESOLVED FIXED
Product: cogl
Classification: Platform
Component: general
git master
Other Linux
: Normal normal
: ---
Assigned To: Cogl maintainer(s)
Cogl maintainer(s)
Depends on:
Blocks:
 
 
Reported: 2013-01-15 00:19 UTC by Daniel Stone
Modified: 2013-05-28 20:41 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Add support for fence/sync/completion extensions (26.75 KB, patch)
2013-01-15 00:20 UTC, Daniel Stone
none Details | Review
Add synchronisation/fence API (19.14 KB, patch)
2013-01-15 03:18 UTC, Daniel Stone
none Details | Review
Add support for fence/sync/completion extensions (30.62 KB, patch)
2013-02-04 18:19 UTC, Daniel Stone
none Details | Review
Feedback patch (15.15 KB, patch)
2013-02-06 16:29 UTC, Neil Roberts
needs-work Details | Review
More changes (3.77 KB, patch)
2013-02-06 17:38 UTC, Neil Roberts
none Details | Review
Add fence API (31.98 KB, patch)
2013-03-19 15:10 UTC, Daniel Stone
none Details | Review
Yet more changes to the sync patch (11.59 KB, patch)
2013-03-22 18:04 UTC, Neil Roberts
none Details | Review

Description Daniel Stone 2013-01-15 00:19:07 UTC
Right now, it's impossible for external Cogl users to find out when their submitted commands have finished executing on the GPU.  This is a particular problem when using the patchset to enable texture_external importing, since the specification allows implementations to not implement any behind-the-scenes synchronisation, and explicitly notes that glFinish() or similar may be required before a texture can be safely reused.

The usecase that drove this API is importing OpenMAX buffers into clutter-gst, where they are used as part of a standard Clutter scene, and then returned to the OpenMAX buffer pool when ready.  It probably goes without saying that the platform provides no behind-the-scenes synchronsation; as soon as we unref the buffer in clutter-gst, OpenMAX is liable to write to it at any moment.  Without a fence API, we have no idea when the scene has completed execution, so we're forced to either copy the textures (which has an unacceptable performance impact) or just guess when the buffer is no longer in use.

Note that the EGL_KHR_fence_sync integration is currently untested; I'm adding support for it for Mesa/i965 at the moment.  GL_ARB_sync works though.
Comment 1 Daniel Stone 2013-01-15 00:20:28 UTC
Created attachment 233487 [details] [review]
Add support for fence/sync/completion extensions

The only new piece of public API is cogl_fence_mark, which inserts a
synchronisation fence into the GL command stream, executing the provided
callback after that fence has been signalled by the GL or EGL
implementation.

Support is currently provided for GL 3.x's GL_ARB_sync extension, and
EGL's EGL_KHR_fence_sync (when used with OpenGL ES).

Signed-off-by: Daniel Stone <daniel@fooishbar.org>
Comment 2 Daniel Stone 2013-01-15 00:51:48 UTC
This might help GLX's _cogl_winsys_onscreen_swap_region too, but I'm not sure of the synchronisation issues, e.g. what happens if rendering is submitted after the swap_region hook, but before the callback for completion, so I haven't prepared a patch.  GLES has similar uses too, for flipped copies.
Comment 3 Daniel Stone 2013-01-15 00:55:27 UTC
Ah, and with that I just found the CoglFrameCallback thread.  Do you think fences would semantically fit in there, or would that just be combining things for the sake of it?
Comment 4 Daniel Stone 2013-01-15 03:18:31 UTC
Created attachment 233492 [details] [review]
Add synchronisation/fence API

Almost entirely based on Cogl's cogl_fence_{new,destroy}, but with an
additional fun layer of indirection due to the need to defer posting the
fences until after the stage redraw commands have been submitted.

Signed-off-by: Daniel Stone <daniel@fooishbar.org>
Comment 5 Daniel Stone 2013-01-15 03:19:17 UTC
The above (comment #4) is how I've hooked this up in Clutter: pretty much exactly the same API.  I'm not happy with _clutter_stage_get_fences and friends, but that's an implementation detail.
Comment 6 Daniel Stone 2013-01-15 03:19:17 UTC
The above (comment #4) is how I've hooked this up in Clutter: pretty much exactly the same API.  I'm not happy with _clutter_stage_get_fences and friends, but that's an implementation detail.
Comment 7 Neil Roberts 2013-01-15 15:59:15 UTC
(In reply to comment #1)

This looks really cool, thanks for the patch! After discussing with
Rob Bragg, here are some thoughts about it:

We think it would might make sense to attach the fence to a
framebuffer rather than to the context. This would allow us to
eventually consider the command streams for separate framebuffers
separately. The API could be something like this:

CoglFenceClosure *
cogl_framebuffer_add_fence (CoglFramebuffer *fb,
                            CoglFenceCallback *callback,
                            void *user_data);

void
cogl_framebuffer_cancel_fence (CoglFramebuffer *fb,
                               CoglFenceClosure *closure);

The CoglFenceClosure* would be an opaque pointer which is just used as
an identifier to cancel it again. This is probably worth doing so that
an application can identify multiple fences that use the same callback
function.

The CoglFenceClosure* naming is so that we could make it similar to
the CoglFrameCallback API by making that return a CoglFrameClosure*
pointer.

The ‘cancel’ naming should hopefully make it obvious that you don't
need to manually destroy the object if you do wait until the callback
is invoked.

It doesn't seem to me like we should try to squeeze the event objects
into the same callback mechanism as CoglFrameCallback because you are
quite likely to want an orthogonal bit of code to listen for a
specific fence. It would be annoying to make it listen to all of the
frame events and ignore them until it found the specific event for its
fence.

We need to consider how this relates to the Cogl journal. Currently it
looks like it won't behave correctly if there are entries in the
journal because GL may consider the fence object to be passed before
the journal is flushed. I guess eventually it would be good to always
insert the fence object into the journal and then delay creating the
GL object until the journal is flushed. For now we could just always
flush the journal every time a fence is inserted. This is another
incentive to attach the fence objects to the framebuffer rather than
the context because each framebuffer has its own journal.

Currently the patch relies on cogl_poll_get_info to wake up the main
loop to check the fences. Cogl doesn't currently require an
application to call that for some winsys backends such as SDL. We
should either try to find a way to hook the timeout into SDL's main
loop mechanism or maybe just blacklist the feature with the SDL
winsys. I'd be happy to land the patch and worry about that later
though.

The 100ms timeout seems too big. Surely you'd expect anything you
submit to GL to complete within the timespan of a single frame, so
maybe something in the order of 4 or 8ms would be better?

Annoyingly we can't use C99 in the files that aren't Linux-only
because we support compiling with Visual Studio. This is being done in
cogl_fence_destroy().

If you can be bothered you could make it use the COGL_TAILQ type for
the list of fences instead of GSList. Then it wouldn't have to walk
the list when removing the fence. This is probably just
micro-optimising though so it's fine either way.
Comment 8 Daniel Stone 2013-01-15 16:16:36 UTC
Thanks a lot for the review! Appreciate it.  For the record, there's a slightly newer version on git://git.collabora.com/git/user/daniels/cogl in the 'fence' branch, but the essentials are unchanged.

Your comments all sound good to me, and I'll try to get them fixed up, including taking a look into the journal.

The one thing I'll note, though, is the timeout: drivers including Intel can potentially end up calling a blocking ioctl() once for every fence check, so checking too often could actually be reasonably punitive, performance-wise.  I chose 100ms because I'm biased towards the video usecases, but now you mention it, I guess it does make it pretty useless as a slightly less punitive version of glFinish. ;) Maybe the thing to do would be to start with an aggressive timeout (e.g. quarter of a frame) and slowly back off if we're repeatedly polling on the one fence?

Of course, the ideal here would be to get a DRM event so we don't have to poll at all.  I'm working on that, but it would take a while to actually hit mainline kernels/libdrm/Mesa.
Comment 9 Daniel Stone 2013-02-04 18:19:19 UTC
I've updated this in the git branch, and will be attaching a new patch in a sec.  The patch is actually against cogl-1.12; I'll attach a patch rebased against master when I'm less jetlagged. ;)

This should have addressed all the review comments, including properly managing the journal.  It makes fences a two-stage thing, much as in the proposed Clutter patch: first we add them to a tailq in the journal, then when we flush the journal, we actually submit the fences and move them to another tailq in the context.
Comment 10 Daniel Stone 2013-02-04 18:19:42 UTC
Created attachment 235160 [details] [review]
Add support for fence/sync/completion extensions

The only new piece of public API is cogl_fence_mark, which inserts a
synchronisation fence into the GL command stream, executing the provided
callback after that fence has been signalled by the GL or EGL
implementation.

Support is currently provided for GL 3.x's GL_ARB_sync extension, and
EGL's EGL_KHR_fence_sync (when used with OpenGL ES).

Signed-off-by: Daniel Stone <daniel@fooishbar.org>
Comment 11 Daniel Stone 2013-02-04 20:17:29 UTC
And once again the commit message is wrong - I've fixed that in my branch.
Comment 12 Neil Roberts 2013-02-06 16:29:51 UTC
Created attachment 235314 [details] [review]
Feedback patch

Thanks for the update. Here are some comments in the form of a patch
which makes the suggested changes. If it's ok with you we can just
squash it back into your patch. I started off just doing some minor
style changes but I may have got a bit carried away. I hope I haven't
butchered your patch too much.

The handling for the journal isn't quite how I was expecting it to
work because the sync is inserted after all of the journal commands
are complete whereas if the fence was actually inserted into the
journal it could insert the sync exactly at the right place in the
command stream. I guess it's not too big a deal if the fence is a bit
late but it might not be ideal if the journal has a lot of expensive
commands. I'm happy to leave it as it is though.

The EGL winsys version of the fence calls glFlush whereas the GL3
version uses the GL_SYNC_FLUSH_COMMANDS_BIT flag when polling the
fence. However the EGL extension has an equivalent flag called
EGL_SYNC_FLUSH_COMMANDS_BIT_KHR. Is there any particular reason why
the patch isn't using that?

Currently the fence will never be hit unless the journal is flushed.
From the user's perspective this will only happen if they do a swap
buffers. The user might want to do some stuff such as upload a texture
and then wait for that to complete without doing a swap buffers. I
think it would be better to force a flush of the journal whenever
we're about to go idle to guarantee that the fence will be passed in a
finite time.

I think it would be nice to make sure the destructor for
CoglFramebuffer automatically cancels all of the syncs, otherwise it
will probably crash if you free the framebuffer instead of waiting for
the sync to complete.

It's a bit nasty to invoke the user's callback directly during the
journal flush if it failed to create a sync object. We try to
guarantee that callbacks will only be invoked at certain points (eg,
during cogl_poll_dispatch) to make writing the application simpler.
This patch changes it so that _cogl_fence_submit can not fail but
instead when it comes to check the fence if there is no actual fence
object it will immediately invoke the callback.

If the journal is empty it might as well just submit it immediately.

The timeout returned from cogl_poll_get_info is in microseconds not
milliseconds. It should probably wait for more than 5 microseconds :)

Instead of making the tests silently fail if a feature is missing, we
have a mechanism to mark that the test requires the feature in
test-conform-main. That way the test report can say ‘n/a’ instead of
‘pass’.

We don't use gpointer in Cogl but just void* instead because it's more
well known.

cogl_framebuffer_add_fence should return NULL instead of FALSE.
Comment 13 Daniel Stone 2013-02-06 16:43:32 UTC
Review of attachment 235314 [details] [review]:

Looks good to me (and thank god for void * rather than gpointer), thanks again for the thorough review.  Just a couple of minor nitpicks below, otherwise looks good to me.

::: cogl/cogl-fence.c
@@ +55,1 @@
     {

The test for this branch needs to include !fence->fence_obj, else we'll add both a winsys and a GL fence (if possible), leaking the former.

::: cogl/cogl-poll.c
@@ +64,3 @@
 out:
   /* ... unless we have pending fence objects to query. */
   if (_cogl_fence_should_check (context) && (*timeout == -1 || *timeout > 5))

Need to change 5 to 5000 here as well.

::: cogl/winsys/cogl-winsys-egl.c
@@ +847,3 @@
   ret = renderer->pf_eglClientWaitSync (renderer->edpy,
                                         fence,
+                                        EGL_SYNC_FLUSH_COMMANDS_BIT_KHR,

I didn't do this the first time around, because I wasn't sure that we'd still necessarily have the same GL(ES) context bound.  If we're guaranteed to have the right context bound then this is OK.
Comment 14 Daniel Stone 2013-02-06 16:44:35 UTC
Ugh, that's not particularly useful formatting! The first comment ('the test for this branch ...') was in _cogl_fence_submit: we could potentially hit the if (winsys->fence_add) branch, followed by the if (context->glFenceSync) branch as well.
Comment 15 Neil Roberts 2013-02-06 17:38:08 UTC
Created attachment 235320 [details] [review]
More changes

Ok, here is another patch with some further changes to squash in as
per your suggestions.

I've moved the timeout handling into a function called
_cogl_fence_update_timeout so that it can be handled within
cogl-fence.c.

I think it should be ok to assume that Cogl's context is bound when it
comes to check the fences. Cogl basically binds its context once and
assumes that it always remains bound. The only situation where there
would be another context is with the CoglGLES2Context support. In that
case the application can push a second context but it must pop it
back before calling any other Cogl functions. Therefore the context
should always have been restored by the time cogl_poll_dispatch is
called.

I guess the next step is to port it to master.
Comment 16 Daniel Stone 2013-03-19 15:10:07 UTC
Created attachment 239266 [details] [review]
Add fence API

cogl_framebuffer_add_fence creates a synchronisation fence, which will
invoke a user-specified callback when the GPU has finished executing all
commands provided to it up to that point in time.

Support is currently provided for GL 3.x's GL_ARB_sync extension, and
EGL's EGL_KHR_fence_sync (when used with OpenGL ES).

Signed-off-by: Daniel Stone <daniel@fooishbar.org>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Comment 17 Daniel Stone 2013-03-19 15:10:57 UTC
This is the updated patch, with your two sets of changes taken in, rebased on top of current master (9632635a).
Comment 18 Neil Roberts 2013-03-22 18:04:32 UTC
Created attachment 239572 [details] [review]
Yet more changes to the sync patch

Thanks again for the patch. After reviewing it again and discussing it
with Rob Bragg, we were a bit worried that we might eventually want to
pass more information to the fence callback such as timing information
about when the fence was hit. The current callback has no leeway to
squeeze in the extra information so we were a bit worried we'd be
boxing ourselves into a corner and end up with a weird API if we later
add it. This patch changes it so that the callback function has an
extra CoglFence* pointer as the first parameter. Currently this is
just always set to NULL but the idea is it could eventually be a
refable object with getters to query extra information.

We also want to leave room to eventually be able to configure the
fence object before adding it to a framebuffer. For example we might
eventually want to optionally disable the automatic flushing or to
link it with an X sync object via the GL_EXT_x11_sync_object
extension. To leave room for that, this patch renames the fence
functions to:

cogl_framebuffer_add_fence_callback
and cogl_framebuffer_cancel_fence_callback

Then if we eventually want to have a first-class configurable
CoglFence object that the user can manipulate we can have:

CoglFence *
cogl_fence_new (void);

CoglFenceClosure *
cogl_fence_add_callback (CoglFence *fence,
                         CoglFenceCallback callback,
                         void *user_data);

void
cogl_fence_set_funky_configuration_thingy (CoglFence *fence,
                                           CoglBool value);

void
cogl_framebuffer_add_fence (CoglFramebuffer *fb,
                            CoglFence *fence);

The cogl_fence_add_callback() function will take a reference on the
fence object which will be owned by the fence closure object. The
closure will automatically be destroyed when it is invoked as before
which will also cause the fence to be unref'd.

If we do end up with that API then cogl_framebuffer_add_fence_callback
would just be like a convenience wrapper for implicitly creating a
fence object, adding a callback to it and adding it to the
framebuffer. That way we don't have to make a confusing new API that
deprecates the old API.

In addition to that change, I noticed that the EGL winsys
implementation of the fence submission wasn't checking whether the
feature was found before trying to call eglCreateSync. This meant that
if you have the extension in your headers but not in the
implementation and run Cogl with the EGL winsys then it will crash.

Also I thought we might as well use the slice allocator for the fence
closure objects.

If you're happy with these changes then we can just squash it in and
merge the patch.
Comment 19 Daniel Stone 2013-05-06 09:31:32 UTC
Sorry, I'm really not sure why I didn't reply to this earlier.  Your changes look fine to me.
Comment 20 Robert Bragg 2013-05-28 20:41:25 UTC
Ok, we had a minor hold up due to how the mainloop integration was moved down to CoglRenderer, but I've rebased the patch now and landed in master and the cogl-1.16 branch.

thanks for the good work!