GNOME Bugzilla – Bug 789475
omx: wait for flush complete and buffers being released when flushing
Last modified: 2018-08-29 16:20:48 UTC
.
Created attachment 362253 [details] [review] gst_omx_port_set_flushing: simplify waiting loop No semantic change so far, I just made the 'while' end condition easier to understand as a first step before changing it. - move error/time out checks inside the loop to make it clearer on what we are actually waiting for. - group port->buffers checks together with parenthesis as they are part of the same conceptual check: waiting for all buffers to be released.
Created attachment 362254 [details] [review] omx: wait for flush complete and buffers being released when flushing As stated in the existing comment, when flusing we should wait for OMX to send the flush command complete event AND all ports being released. We were stopping as soon as one of those condition was met. Fix a race between FillThisBufferDone/EmptyBufferDone and the flush EventCmdComplete messages.
Review of attachment 362253 [details] [review]: Looks good.
Review of attachment 362254 [details] [review]: Wow, that explains all sort of bug I have seen in the past. And it's specially important since the buffer tracking is horribly unsafe in gstomx.c.
just before merging, I'd like some feedback. I've read the spec a little, and receiving the command complete event before the buffers has been returns seems like it would be a component bug. And then if you unblock when all buffers comes back, and you didn't get the command complete, you'll endup with the flushed boolean left set. Overall, the patch makes things safer and more correct, but my impression is that you should only wait for the command complete (flushed boolean), and warn or error out if buffers where not returned by the component properly. Please provide some feedback so I we don't merge this in the wrong form.
You're right, the component should release the buffer and then complete the flush command. That's what our OMX implementation does but the ordering isn't guaranteed because of threads. I amend the commit message to make this clearer.
Created attachment 362611 [details] [review] omx: wait for flush complete and buffers being released when flushing As stated in the existing comment, when flusing we should wait for OMX to send the flush command complete event AND all ports being released. We were stopping as soon as one of those condition was met. Fix a race between FillThisBufferDone/EmptyBufferDone and the flush EventCmdComplete messages. The OMX implementation is supposed to release its buffers before posting the EventCmdComplete event but the ordering isn't guaranteed as the FillThisBufferDone/EmptyBufferDone and EventHandler callbacks can be called from different threads (cf 2.7 'Thread Safety' in the spec).
Merged patches Attachment 362253 [details] pushed as 4c0a8a6 - gst_omx_port_set_flushing: simplify waiting loop Attachment 362611 [details] pushed as 4211e4c - omx: wait for flush complete and buffers being released when flushing
Unfortunately, this was wrong. gst_omx_port_set_flushing() now waits for all the buffers in the pipeline to come back, while it was suppose to wait for the flush command to complete. How it works: - We send the OMX_CommandFlush - Then our handles will receive it and serialize into our message queue - We call gst_omx_component_handle_messages() - And then check for port->flushed or wait for the message to arrive All this is protected with the comp->lock. About the && check, it was an optimization, because it's pointless to wait for the message if all buffers are pending. The component is still being flushed, but we can avoid the waiting (OMX stack can be slow). I'm not too sure why we unblock anything waiting for message, it's seems pointless but harmless. I might be missing something. To come back to why it's wrong, the code has been changed so now it wait for all buffers from the entire pipeline to come back. This just impossible, buffers stuck in display sink (last-sample or scannout if kmssink) will only be returned if a new buffers replace them or an allocation/drain query is received. As a side effect, we now always reach the timeout. Though, we should check the the OMX stack have behaved properly and returned the buffers that was in the OMX stack. Currently, we don't track this, we have no idea which outstanding buffers are in the OMX stack and which one are downstream the pipeline. In my opinion though, if you received OMX_CommandFlush and the buffers were not returned yet, this is an error. What you seems to have fixed here is a bug in the Zynq OMX stack.
(In reply to Nicolas Dufresne (stormer) from comment #12) > Though, we should check the the OMX stack have behaved properly and returned > the buffers that was in the OMX stack. Currently, we don't track this, we > have no idea which outstanding buffers are in the OMX stack and which one > are downstream the pipeline. buf->used = TRUE is supposed to indicate that a buffer is in the OMX stack. Could we just wait for those? > In my opinion though, if you received OMX_CommandFlush and the buffers were > not returned yet, this is an error. What you seems to have fixed here is a > bug in the Zynq OMX stack. Agreed that not returning buffers would be a bug in the OMX stack. But iirc, and according to the commit message, we were hitting a race between the EventCmdComplete event and the FillThisBufferDone/EmptyBufferDone. This was causing issues but I don't remember exactly what.
Review of attachment 362611 [details] [review]: commit 5eed1a7eb2aa5c537738df60e6945f6464e20d2d (HEAD -> master) Author: Nicolas Dufresne <nicolas.dufresne@collabora.com> Date: Fri Mar 2 11:06:08 2018 -0500 Revert "omx: wait for flush complete and buffers being released when flushing" This reverts commit 4211e4c29a262f110cb92ddf9c06b403ced233ef.
I've kept the cleanup patch for the release, it didn't have any side effect. My next step here is to try and create hard ref dependency between the GstOMXBuffer/Port/Comp and the GstBuffer/Memory travelling in the pipeline. This way, if a buffer is not return, we'll keep the OMX objects alive. If that happens, I will also have to seek to find out why these are leaked of course. It's better to leak all this then to risk crashing the OMX stack (which often crash the driver in practice).
Created attachment 373430 [details] [review] omx: factor out should_wait_until_flushed() No semantic change. Makes the code easier to understand and I'm about to change the waiting condition. Upstream-Status: Pending Signed-off-by: Guillaume Desmottes <guillaume.desmottes@collabora.co.uk>
Created attachment 373431 [details] [review] omx: factor out should_wait_until_flushed() No semantic change. Makes the code easier to understand and I'm about to change the waiting condition.
Created attachment 373432 [details] [review] omx: wait for flush complete and buffers being released when flushing When flusing we should wait for OMX to send the flush command complete event AND all ports being released. We were stopping as soon as one of those condition was met. Fix a race between FillThisBufferDone/EmptyBufferDone and the flush EventCmdComplete messages. The OMX implementation is supposed to release its buffers before posting the EventCmdComplete event but the ordering isn't guaranteed as the FillThisBufferDone/EmptyBufferDone and EventHandler callbacks can be called from different threads (cf 2.7 'Thread Safety' in the spec). Only wait for buffers currently used by OMX as some buffers may not be in the pending queue because they are held downstream.
(In reply to Nicolas Dufresne (ndufresne) from comment #12) > Though, we should check the the OMX stack have behaved properly and returned > the buffers that was in the OMX stack. Currently, we don't track this, we > have no idea which outstanding buffers are in the OMX stack and which one > are downstream the pipeline. With the couple of patches above we now wait for all buffers in OMX to be released.
Adding Thibault in CC has he was using gst-omx on RPi really recently.
Review of attachment 373431 [details] [review]: .
Review of attachment 373432 [details] [review]: .
Attachment 373431 [details] pushed as 3d55051 - omx: factor out should_wait_until_flushed() Attachment 373432 [details] pushed as 40ae47d - omx: wait for flush complete and buffers being released when flushing
Picked into 1.14 branch. 2b7ad7f2fe1157c508e3bf6f66be447e3a38d137 ba3f947f57bd203e4eaaa8193b5ef51d7b495e4b