GNOME Bugzilla – Bug 796640
qmlglsink: synchronization issue
Last modified: 2018-11-03 15:31:21 UTC
If you want the fix please tell me how I might contribute, I would be happy to push a branch with the fix. //jimmy
You can use git format-patch to produce a patch file, and then attach it to this issue, we will then review and merge when the patch is correct.
Created attachment 372740 [details] [review] Contains a git patch with 2 commits that should resolve the issue The patch contains 2 commits. The first commit fixes the actual stuttering bug. The second commit modifies the qmlsink example to use the moving ball pattern, this pattern is much better for detecting misordered friends and read after write issues.
Created attachment 372741 [details] Patch as a tar.gz attempt #2 Let me try not checking the "patch" checkbox :)
Created attachment 372742 [details] [review] Fixed unsafe asynchronous usage of GstBuffer between CPU and GPU
Created attachment 372743 [details] [review] Changed qmlglsink example to use a moving ball pattern
I've uploaded the patches as 2 separate plain files so they can be more easily reviewed. 2 comments: * The git commit summary should be a single line less than 76 characters, then leave 1 blank line and write any commit log description that needs more than 76 chars. * 0001 contains some extra unnecessary white-space changes Does the mutex actually have any effect, or is the gst_gl_sync_meta_wait_cpu() call sufficient? I gather the bug you're seeing is flashing and out-of-order frames, but how are you reproducing it? I haven't seen such behaviour with qmlglsink myself.
(In reply to Jan Schmidt from comment #6) > I've uploaded the patches as 2 separate plain files so they can be more > easily reviewed. > > 2 comments: > * The git commit summary should be a single line less than 76 characters, > then leave 1 blank line and write any commit log description that needs more > than 76 chars. > * 0001 contains some extra unnecessary white-space changes Do you need me to submit a new patch with modified git commit summaries? > > Does the mutex actually have any effect, or is the > gst_gl_sync_meta_wait_cpu() call sufficient? In my testing the gst_gl_sync_meta_wait_cpu call was sufficient (appears to be the same as glFinish()) BUT From reading the comments it seemed as if the gstreamer thread calls GstQSGTexture::setBuffer while Qt:s render thread calls on GstQSGTexture::bind, this prompted me to add this lock. > > I gather the bug you're seeing is flashing and out-of-order frames, but how > are you reproducing it? I haven't seen such behaviour with qmlglsink myself. Yes flashing and out-of-order frames. The issue is clearly visible on the Tegra X2 platform (most of the time) when running the modified qmlglsink example (moving ball pattern), it is even more obvious when live streaming from a camera. Since this is timing related, it might not be visible or rarely visible on other platforms. I could snap a video of the issue with my cell-phone and send to you if you really want to see it.
(In reply to jimmy from comment #7) > (In reply to Jan Schmidt from comment #6) > > I've uploaded the patches as 2 separate plain files so they can be more > > easily reviewed. > > > > 2 comments: > > * The git commit summary should be a single line less than 76 characters, > > then leave 1 blank line and write any commit log description that needs more > > than 76 chars. > > * 0001 contains some extra unnecessary white-space changes > > Do you need me to submit a new patch with modified git commit summaries? That would be great. FYI, many of us use the git-bz tool to help with uploading patches to Bugzilla (https://wiki.gnome.org/Git/WorkingWithPatches) > > > > Does the mutex actually have any effect, or is the > > gst_gl_sync_meta_wait_cpu() call sufficient? > > In my testing the gst_gl_sync_meta_wait_cpu call was sufficient (appears to > be the same as glFinish()) BUT > > From reading the comments it seemed as if the gstreamer thread calls > GstQSGTexture::setBuffer while Qt:s render thread calls on > GstQSGTexture::bind, this prompted me to add this lock. Which comment are you thinking of? If I remember correctly, the setBuffer call is made while blocking the Qt render thread, preventing ::bind from running simultaneously. > > > > I gather the bug you're seeing is flashing and out-of-order frames, but how > > are you reproducing it? I haven't seen such behaviour with qmlglsink myself. > > Yes flashing and out-of-order frames. > > The issue is clearly visible on the Tegra X2 platform (most of the time) > when running the modified qmlglsink example (moving ball pattern), it is > even more obvious when live streaming from a camera. OK. It would be good to mention that it happens on that platform. > Since this is timing related, it might not be visible or rarely visible on > other platforms. You're quite right. I tend to test on platforms with shared GPU/CPU system memory or simpler GPUs that might not parallelise so much. > I could snap a video of the issue with my cell-phone and send to you if you > really want to see it.
(In reply to Jan Schmidt from comment #8) > (In reply to jimmy from comment #7) > > (In reply to Jan Schmidt from comment #6) > > > I've uploaded the patches as 2 separate plain files so they can be more > > > easily reviewed. > > > > > > 2 comments: > > > * The git commit summary should be a single line less than 76 characters, > > > then leave 1 blank line and write any commit log description that needs more > > > than 76 chars. > > > * 0001 contains some extra unnecessary white-space changes > > > > Do you need me to submit a new patch with modified git commit summaries? > > That would be great. FYI, many of us use the git-bz tool to help with > uploading patches to Bugzilla (https://wiki.gnome.org/Git/WorkingWithPatches) Thanks! > > > > > > > Does the mutex actually have any effect, or is the > > > gst_gl_sync_meta_wait_cpu() call sufficient? > > > > In my testing the gst_gl_sync_meta_wait_cpu call was sufficient (appears to > > be the same as glFinish()) BUT > > > > From reading the comments it seemed as if the gstreamer thread calls > > GstQSGTexture::setBuffer while Qt:s render thread calls on > > GstQSGTexture::bind, this prompted me to add this lock. > > Which comment are you thinking of? If I remember correctly, the setBuffer > call is made while blocking the Qt render thread, preventing ::bind from > running simultaneously. Yes I seem to have misinterpreted the comment, it says that it is indeed blocking the rendering thread. Is my understanding correct that this is the call that eventually invokes the bind() function? : // from qtitem.cc: gst_buffer_replace (&qt_item->priv->buffer, buffer); QMetaObject::invokeMethod(qt_item, "update", Qt::QueuedConnection); g_mutex_unlock (&qt_item->priv->lock); If so, is there really a guarantee that rendering has been invoked by the time the mutex is unlocked? (note there is also Qt::BlockingQueuedConnection). > > > > > > > I gather the bug you're seeing is flashing and out-of-order frames, but how > > > are you reproducing it? I haven't seen such behaviour with qmlglsink myself. > > > > Yes flashing and out-of-order frames. > > > > The issue is clearly visible on the Tegra X2 platform (most of the time) > > when running the modified qmlglsink example (moving ball pattern), it is > > even more obvious when live streaming from a camera. > > OK. It would be good to mention that it happens on that platform. You mean in the git commit message or generally when submitting bugs? :) > > > Since this is timing related, it might not be visible or rarely visible on > > other platforms. > > You're quite right. I tend to test on platforms with shared GPU/CPU system > memory or simpler GPUs that might not parallelise so much. > > > I could snap a video of the issue with my cell-phone and send to you if you > > really want to see it.
Review of attachment 372742 [details] [review]: Maybe improve the commit message a bit ? And maybe two commits (optional, just proposing). qttexture: Protect buffer access using a mutext qttexture: Wait on the gl fence when using imported buffers Don't we need to wait on the fence before the render code ? ::: ext/qt/gstqsgtexture.cc @@ +75,3 @@ + + + I don't think these empty lines makes sense.
Review of attachment 372742 [details] [review]: What you might be seeing is a use-before-release where qmlglsink releases the buffer before Qt has drawn with it and could be written too by upstream. Fixing that involves keeping the buffer and/or gst_video_frame_map () alive between successive bind() calls (or error/destruction). ::: ext/qt/gstqsgtexture.cc @@ +152,3 @@ + * since the memory in buffer_ is used by other threads + */ + gst_gl_sync_meta_wait_cpu (sync_meta, this->qt_context_); This is actually the wrong API to cause a wait and will cause a glFinish() (or equivalent in GL fences) which is way to heavy handed for this. The fact that they're used by other threads/contexts does not matter in GPU space. You only need to wait for the GPU to complete (of which there is already code to do that).
(In reply to jimmy from comment #9) > (In reply to Jan Schmidt from comment #8) > > (In reply to jimmy from comment #7) > > > (In reply to Jan Schmidt from comment #6) > > > > Does the mutex actually have any effect, or is the > > > > gst_gl_sync_meta_wait_cpu() call sufficient? > > > > > > In my testing the gst_gl_sync_meta_wait_cpu call was sufficient (appears to > > > be the same as glFinish()) BUT > > > > > > From reading the comments it seemed as if the gstreamer thread calls > > > GstQSGTexture::setBuffer while Qt:s render thread calls on > > > GstQSGTexture::bind, this prompted me to add this lock. > > > > Which comment are you thinking of? If I remember correctly, the setBuffer > > call is made while blocking the Qt render thread, preventing ::bind from > > running simultaneously. > > Yes I seem to have misinterpreted the comment, it says that it is indeed > blocking the rendering thread. > > Is my understanding correct that this is the call that eventually invokes > the bind() function? : > > // from qtitem.cc: > gst_buffer_replace (&qt_item->priv->buffer, buffer); > QMetaObject::invokeMethod(qt_item, "update", Qt::QueuedConnection); > g_mutex_unlock (&qt_item->priv->lock); > > If so, is there really a guarantee that rendering has been invoked by the > time the mutex is unlocked? (note there is also > Qt::BlockingQueuedConnection). That's the way we force an update to be displayed. Qt can add it's own in there if it likes. Qt scenegraph update happens in two stages which can actually be in two different threads (where supported by drivers et al. A helpful diagram is available from the Qt docs: http://doc.qt.io/qt-5/qtquick-visualcanvas-scenegraph.html#threaded-render-loop-threaded So, there are three parts to GStreamer displaying things using qmlglsink: 1. Receive buffer and send it to the component/item. Rather boring stuff, just atomically updates a pointer atm. 2. Qt calls item::updatePaintNode() which uses the updated buffer and then sets the buffer/caps/information on the associated node that is attached to scenegraph like setBuffer, setCaps. Happens in the updatePaintNode section of the above diagram 3. Qt then calls bind() whenever it wants the texture bound and drawn. Happens in the 'scene graph is rendered' stage in the above docs. With all that, the multithreaded approach is actually not used everywhere (as mentioned in the above link) like on mesa drivers and the threaded vs non-threaded loop can be changed with the QSG_RENDER_LOOP environment variable. Try changing it between 'threaded' and 'basic' and see if that makes a difference.
(In reply to Matthew Waters (ystreet00) from comment #11) > Review of attachment 372742 [details] [review] [review]: > > What you might be seeing is a use-before-release where qmlglsink releases > the buffer before Qt has drawn with it and could be written too by upstream. I think that is exactly what is happening, it looks to me like the buffer is sometimes written to by another thread after the lock has been released. BUT that doesn't really make sense either, aren't all writes to this buffer scheduled to the same stream / command queue? Wouldn't that cause them to be done in order? Another question I have is if I'm correct in assuming that the glupload element is writing to the same GstBuffer? > Fixing that involves keeping the buffer and/or gst_video_frame_map () alive > between successive bind() calls (or error/destruction). > > ::: ext/qt/gstqsgtexture.cc > @@ +152,3 @@ > + * since the memory in buffer_ is used by other threads > + */ > + gst_gl_sync_meta_wait_cpu (sync_meta, this->qt_context_); > > This is actually the wrong API to cause a wait and will cause a glFinish() > (or equivalent in GL fences) which is way to heavy handed for this. The > fact that they're used by other threads/contexts does not matter in GPU > space. > > You only need to wait for the GPU to complete (of which there is already > code to do that). Yes causing a glFinish() is probably very disruptive to the GL command queue (will likely cause sync bubbles). So how do we wait for the GPU to finish appropriately? There are synchronization fences that could be imposed (glFenceSync / glClientFenceSync), but in my view as long as they are done on the same GPU stream in order there should be no need to halt any CPU threads? By the way, making a copy of the incoming GstBuffer also resolves the issue, but I'm sure we would also like to avoid making unnecessary copies. Anyways I'm gonna do some more digging, your input is very helpful. Thanks
(In reply to Nicolas Dufresne (ndufresne) from comment #10) > Review of attachment 372742 [details] [review] [review]: > > Maybe improve the commit message a bit ? And maybe two commits (optional, > just proposing). I was discussing 2 commits, one per file with Jan Schmidt. > > qttexture: Protect buffer access using a mutext > qttexture: Wait on the gl fence when using imported buffers > > Don't we need to wait on the fence before the render code ? There is a discussion on-going regarding the most efficient solution, we might want to avoid CPU/GPU synchronization, but yeah: ensuring all GL calls are done before render calls make sense. > > ::: ext/qt/gstqsgtexture.cc > @@ +75,3 @@ > + > + > + > > I don't think these empty lines makes sense. Agreed! Thanks
(In reply to jimmy from comment #13) > (In reply to Matthew Waters (ystreet00) from comment #11) > > Review of attachment 372742 [details] [review] [review] [review]: > > > > What you might be seeing is a use-before-release where qmlglsink releases > > the buffer before Qt has drawn with it and could be written too by upstream. > > I think that is exactly what is happening, it looks to me like the buffer is > sometimes written to by another thread after the lock has been released. > > BUT that doesn't really make sense either, aren't all writes to this buffer > scheduled to the same stream / command queue? Wouldn't that cause them to be > done in order? In theory what might happen is 1. buffer come in and is pushed up to Qt for display 2. next buffer comes in and the previous one is freed and reused. 3. Problem here is that Qt might not have displayed the previous buffer or is in the process of displaying it while the previous buffer is written to by upstream (as it has been freed). So, either we hold the buffer for an extra frame, or only free after 1. the next frame has arrived and 2. the old frame is not going to be used by Qt. 2. Might be signalled through fences after the next Qt frame update has completed. > Another question I have is if I'm correct in assuming that the glupload > element is writing to the same GstBuffer? Depends on the pipeline as to which element exactly is writing to the buffer that is used by qmlglsink. e.g. videotestsrc ! glupload ! qmlglsink -> videotestsrc uses downstream's buffer pool and is the writing element videotestsrc ! video/x-raw,format=BGRA ! glupload ! glcolorconvert ! qmlglsink -> glcolorconvert converts between BGRA and RGBA so is the writing element for qmlglsink. A case where glupload is the writing element is if upstream of glupload does not us the GLBufferPool and glupload copies/wraps into a GL texture. > > > > ::: ext/qt/gstqsgtexture.cc > > @@ +152,3 @@ > > + * since the memory in buffer_ is used by other threads > > + */ > > + gst_gl_sync_meta_wait_cpu (sync_meta, this->qt_context_); > > > > This is actually the wrong API to cause a wait and will cause a glFinish() > > (or equivalent in GL fences) which is way to heavy handed for this. The > > fact that they're used by other threads/contexts does not matter in GPU > > space. > > > > You only need to wait for the GPU to complete (of which there is already > > code to do that). > > Yes causing a glFinish() is probably very disruptive to the GL command queue > (will likely cause sync bubbles). > > So how do we wait for the GPU to finish appropriately? There are > synchronization fences that could be imposed (glFenceSync / > glClientFenceSync), but in my view as long as they are done on the same GPU > stream in order there should be no need to halt any CPU threads? That's what gst_gl_sync_meta_wait() does behind the hood already :) > By the way, making a copy of the incoming GstBuffer also resolves the issue, > but I'm sure we would also like to avoid making unnecessary copies. > > Anyways I'm gonna do some more digging, your input is very helpful.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gst-plugins-good/issues/485.