After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 789475 - omx: wait for flush complete and buffers being released when flushing
omx: wait for flush complete and buffers being released when flushing
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gst-omx
git master
Other Linux
: Normal normal
: 1.13.1
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2017-10-25 12:08 UTC by Guillaume Desmottes
Modified: 2018-08-29 16:20 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
gst_omx_port_set_flushing: simplify waiting loop (1.68 KB, patch)
2017-10-25 12:09 UTC, Guillaume Desmottes
committed Details | Review
omx: wait for flush complete and buffers being released when flushing (1.15 KB, patch)
2017-10-25 12:09 UTC, Guillaume Desmottes
none Details | Review
omx: wait for flush complete and buffers being released when flushing (1.42 KB, patch)
2017-10-31 10:09 UTC, Guillaume Desmottes
none Details | Review
omx: factor out should_wait_until_flushed() (2.00 KB, patch)
2018-08-23 07:44 UTC, Guillaume Desmottes
none Details | Review
omx: factor out should_wait_until_flushed() (1.90 KB, patch)
2018-08-23 07:44 UTC, Guillaume Desmottes
committed Details | Review
omx: wait for flush complete and buffers being released when flushing (1.95 KB, patch)
2018-08-23 07:44 UTC, Guillaume Desmottes
committed Details | Review

Description Guillaume Desmottes 2017-10-25 12:08:59 UTC
.
Comment 1 Guillaume Desmottes 2017-10-25 12:09:22 UTC
Created attachment 362253 [details] [review]
gst_omx_port_set_flushing: simplify waiting loop

No semantic change so far, I just made the 'while' end condition easier
to understand as a first step before changing it.
- move error/time out checks inside the loop to make it clearer on what
we are actually waiting for.
- group port->buffers checks together with parenthesis as they are part
of the same conceptual check: waiting for all buffers to be released.
Comment 2 Guillaume Desmottes 2017-10-25 12:09:28 UTC
Created attachment 362254 [details] [review]
omx: wait for flush complete and buffers being released when flushing

As stated in the existing comment, when flusing we should wait for OMX
to send the flush command complete event AND all ports being released.
We were stopping as soon as one of those condition was met.

Fix a race between FillThisBufferDone/EmptyBufferDone and the flush
EventCmdComplete messages.
Comment 3 Nicolas Dufresne (ndufresne) 2017-10-30 18:55:48 UTC
Review of attachment 362253 [details] [review]:

Looks good.
Comment 4 Nicolas Dufresne (ndufresne) 2017-10-30 18:58:00 UTC
Review of attachment 362253 [details] [review]:

Looks good.
Comment 5 Nicolas Dufresne (ndufresne) 2017-10-30 18:58:00 UTC
Review of attachment 362253 [details] [review]:

Looks good.
Comment 6 Nicolas Dufresne (ndufresne) 2017-10-30 18:58:00 UTC
Review of attachment 362253 [details] [review]:

Looks good.
Comment 7 Nicolas Dufresne (ndufresne) 2017-10-30 19:09:03 UTC
Review of attachment 362254 [details] [review]:

Wow, that explains all sort of bug I have seen in the past. And it's specially important since the buffer tracking is horribly unsafe in gstomx.c.
Comment 8 Nicolas Dufresne (ndufresne) 2017-10-30 19:18:56 UTC
just before merging, I'd like some feedback. I've read the spec a little, and receiving the command complete event before the buffers has been returns seems like it would be a component bug. And then if you unblock when all buffers comes back, and you didn't get the command complete, you'll endup with the flushed boolean left set.

Overall, the patch makes things safer and more correct, but my impression is that you should only wait for the command complete (flushed boolean), and warn or error out if buffers where not returned by the component properly. Please provide some feedback so I we don't merge this in the wrong form.
Comment 9 Guillaume Desmottes 2017-10-31 10:09:06 UTC
You're right, the component should release the buffer and then complete the flush command. That's what our OMX implementation does but the ordering isn't guaranteed because of threads.

I amend the commit message to make this clearer.
Comment 10 Guillaume Desmottes 2017-10-31 10:09:22 UTC
Created attachment 362611 [details] [review]
omx: wait for flush complete and buffers being released when flushing

As stated in the existing comment, when flusing we should wait for OMX
to send the flush command complete event AND all ports being released.
We were stopping as soon as one of those condition was met.

Fix a race between FillThisBufferDone/EmptyBufferDone and the flush
EventCmdComplete messages. The OMX implementation is supposed to release
its buffers before posting the EventCmdComplete event but the ordering
isn't guaranteed as the FillThisBufferDone/EmptyBufferDone and
EventHandler callbacks can be called from different threads (cf 2.7
'Thread Safety' in the spec).
Comment 11 Olivier Crête 2017-11-28 18:35:41 UTC
Merged patches

Attachment 362253 [details] pushed as 4c0a8a6 - gst_omx_port_set_flushing: simplify waiting loop
Attachment 362611 [details] pushed as 4211e4c - omx: wait for flush complete and buffers being released when flushing
Comment 12 Nicolas Dufresne (ndufresne) 2018-02-28 21:25:51 UTC
Unfortunately, this was wrong. gst_omx_port_set_flushing() now waits for all the buffers in the pipeline to come back, while it was suppose to wait for the flush command to complete. How it works:

 - We send the OMX_CommandFlush
 - Then our handles will receive it and serialize into our message queue
 - We call gst_omx_component_handle_messages()
 - And then check for port->flushed or wait for the message to arrive

All this is protected with the comp->lock. About the && check, it was an optimization, because it's pointless to wait for the message if all buffers are pending. The component is still being flushed, but we can avoid the waiting (OMX stack can be slow).

I'm not too sure why we unblock anything waiting for message, it's seems pointless but harmless. I might be missing something.

To come back to why it's wrong, the code has been changed so now it wait for all buffers from the entire pipeline to come back. This just impossible, buffers stuck in display sink (last-sample or scannout if kmssink) will only be returned if a new buffers replace them or an allocation/drain query is received. As a side effect, we now always reach the timeout.

Though, we should check the the OMX stack have behaved properly and returned the buffers that was in the OMX stack. Currently, we don't track this, we have no idea which outstanding buffers are in the OMX stack and which one are downstream the pipeline.

In my opinion though, if you received OMX_CommandFlush and the buffers were not returned yet, this is an error. What you seems to have fixed here is a bug in the Zynq OMX stack.
Comment 13 Guillaume Desmottes 2018-03-01 07:52:46 UTC
(In reply to Nicolas Dufresne (stormer) from comment #12)
> Though, we should check the the OMX stack have behaved properly and returned
> the buffers that was in the OMX stack. Currently, we don't track this, we
> have no idea which outstanding buffers are in the OMX stack and which one
> are downstream the pipeline.

buf->used = TRUE is supposed to indicate that a buffer is in the OMX stack. Could we just wait for those?

> In my opinion though, if you received OMX_CommandFlush and the buffers were
> not returned yet, this is an error. What you seems to have fixed here is a
> bug in the Zynq OMX stack.

Agreed that not returning buffers would be a bug in the OMX stack. But iirc, and according to the commit message, we were hitting a race between the EventCmdComplete event and the FillThisBufferDone/EmptyBufferDone.
This was causing issues but I don't remember exactly what.
Comment 14 Nicolas Dufresne (ndufresne) 2018-03-02 16:06:39 UTC
Review of attachment 362611 [details] [review]:

commit 5eed1a7eb2aa5c537738df60e6945f6464e20d2d (HEAD -> master)
Author: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Date:   Fri Mar 2 11:06:08 2018 -0500

    Revert "omx: wait for flush complete and buffers being released when flushing"
    
    This reverts commit 4211e4c29a262f110cb92ddf9c06b403ced233ef.
Comment 15 Nicolas Dufresne (ndufresne) 2018-03-02 16:10:02 UTC
I've kept the cleanup patch for the release, it didn't have any side effect. My next step here is to try and create hard ref dependency between the GstOMXBuffer/Port/Comp and the GstBuffer/Memory travelling in the pipeline. This way, if a buffer is not return, we'll keep the OMX objects alive. If that happens, I will also have to seek to find out why these are leaked of course. It's better to leak all this then to risk crashing the OMX stack (which often crash the driver in practice).
Comment 16 Guillaume Desmottes 2018-08-23 07:44:05 UTC
Created attachment 373430 [details] [review]
omx: factor out should_wait_until_flushed()

No semantic change. Makes the code easier to understand and I'm about to
change the waiting condition.

Upstream-Status: Pending
Signed-off-by: Guillaume Desmottes <guillaume.desmottes@collabora.co.uk>
Comment 17 Guillaume Desmottes 2018-08-23 07:44:43 UTC
Created attachment 373431 [details] [review]
omx: factor out should_wait_until_flushed()

No semantic change. Makes the code easier to understand and I'm about to
change the waiting condition.
Comment 18 Guillaume Desmottes 2018-08-23 07:44:48 UTC
Created attachment 373432 [details] [review]
omx: wait for flush complete and buffers being released when flushing

When flusing we should wait for OMX to send the flush command complete event
AND all ports being released.
We were stopping as soon as one of those condition was met.

Fix a race between FillThisBufferDone/EmptyBufferDone and the flush
EventCmdComplete messages. The OMX implementation is supposed to release
its buffers before posting the EventCmdComplete event but the ordering
isn't guaranteed as the FillThisBufferDone/EmptyBufferDone and
EventHandler callbacks can be called from different threads (cf 2.7
'Thread Safety' in the spec).

Only wait for buffers currently used by OMX as some buffers may not be
in the pending queue because they are held downstream.
Comment 19 Guillaume Desmottes 2018-08-23 07:48:52 UTC
(In reply to Nicolas Dufresne (ndufresne) from comment #12)
> Though, we should check the the OMX stack have behaved properly and returned
> the buffers that was in the OMX stack. Currently, we don't track this, we
> have no idea which outstanding buffers are in the OMX stack and which one
> are downstream the pipeline.

With the couple of patches above we now wait for all buffers in OMX to be released.
Comment 20 Nicolas Dufresne (ndufresne) 2018-08-23 14:05:39 UTC
Adding Thibault in CC has he was using gst-omx on RPi really recently.
Comment 21 Nicolas Dufresne (ndufresne) 2018-08-29 16:17:20 UTC
Review of attachment 373431 [details] [review]:

.
Comment 22 Nicolas Dufresne (ndufresne) 2018-08-29 16:17:45 UTC
Review of attachment 373432 [details] [review]:

.
Comment 23 Nicolas Dufresne (ndufresne) 2018-08-29 16:17:50 UTC
Review of attachment 373432 [details] [review]:

.
Comment 24 Nicolas Dufresne (ndufresne) 2018-08-29 16:18:50 UTC
Attachment 373431 [details] pushed as 3d55051 - omx: factor out should_wait_until_flushed()
Attachment 373432 [details] pushed as 40ae47d - omx: wait for flush complete and buffers being released when flushing
Comment 25 Nicolas Dufresne (ndufresne) 2018-08-29 16:20:48 UTC
Picked into 1.14 branch.

2b7ad7f2fe1157c508e3bf6f66be447e3a38d137        
ba3f947f57bd203e4eaaa8193b5ef51d7b495e4b