After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 715192 - videodecoder: Cannot accumulate buffer while downstream is blocked
videodecoder: Cannot accumulate buffer while downstream is blocked
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gst-plugins-base
git master
Other Linux
: Normal minor
: 1.15.1
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2013-11-25 19:44 UTC by Alexey Chernov
Modified: 2018-10-27 09:07 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Patch with release of STREAM_LOCK during gst_pad_push for GstVideoDecoder (1.08 KB, patch)
2013-11-25 19:44 UTC, Alexey Chernov
needs-work Details | Review
Patch with release of STREAM_LOCK during gst_pad_push for GstVideoEncoder (1.38 KB, patch)
2013-11-25 19:44 UTC, Alexey Chernov
needs-work Details | Review
videodecoder: Release STREAM_LOCK during gst_pad_push() (1.08 KB, patch)
2018-08-31 20:48 UTC, Nicolas Dufresne (ndufresne)
committed Details | Review
videoencoder: Release STREAM_LOCK during gst_pad_push() (1.44 KB, patch)
2018-08-31 20:48 UTC, Nicolas Dufresne (ndufresne)
committed Details | Review
omxvideodec: don't hold the stream lock when trying to push a frame (1.65 KB, patch)
2018-08-31 21:28 UTC, Nicolas Dufresne (ndufresne)
committed Details | Review
videodecoder: Delete the link before pushing (1.65 KB, patch)
2018-09-08 02:18 UTC, Nicolas Dufresne (ndufresne)
committed Details | Review
omxvideodec: Remove spurious unlock in error case (1.24 KB, patch)
2018-09-08 03:14 UTC, Nicolas Dufresne (ndufresne)
committed Details | Review
omxvideoenc: Remove spurious locking (2.13 KB, patch)
2018-09-08 03:14 UTC, Nicolas Dufresne (ndufresne)
none Details | Review
omxvideoenc: Remove unneeded size check (1.50 KB, patch)
2018-09-08 03:14 UTC, Nicolas Dufresne (ndufresne)
committed Details | Review
omxvideoenc: Remove spurious locking (2.31 KB, patch)
2018-09-08 03:43 UTC, Nicolas Dufresne (ndufresne)
committed Details | Review

Description Alexey Chernov 2013-11-25 19:44:22 UTC
Created attachment 261479 [details] [review]
Patch with release of STREAM_LOCK during gst_pad_push for GstVideoDecoder

STREAM_LOCK is locked too much inside finish_frame() function of GstVideoDecoder as it remains locked during the whole period of gst_pad_push() call. When there's no queue after decoder or this queue is full, the period of time when STREAM_LOCK is actually locked by finish_frame() can be near to frame duration as gst_pad_push() waits for frame to be displayed. That means that upstream thread (i.e. handle_frame()) should wait for STREAM_LOCK been unlocked, but given that it doesn't have corresponding wait condition there's no way to wake up upstream thread and in most cases it simply misses its turn to occupy STREAM_LOCK.

While writing GstVideoDecoder-based decoder we found that there's no harm to unlock STREAM_LOCK before calling gst_pad_push() and lock it again afterwards. It's possible for the following reasons:
1. In single-threaded decoders where finish_frame() is effectively called inside handle_frame() there's no problem as a) STREAM_LOCK is still locked at least once and b) the thread is blocked by gst_pad_push() naturally
2. In multi-threaded case synchronization requires more intelligent approach than blocking input thread out of output thread. This model is often used when decoder isn't capable to deliver every single frame during its duration but could provide enough average framerate, so it would like to consume some input frames even when output frame is being displayed. It surely requires some overfill protection but it can be done using queues with mutexes and wait conditions while it's too dangerous to use locked STREAM_LOCK during gst_pad_push() for it.

We prepared a couple of patches for review, one for GstVideoDecoder and another one for GstVideoEncoder. We release STREAM_LOCK only for gst_pad_push() as gst_pad_push_event() is already called outside STREAM_LOCK so everything's fine with them.
Comment 1 Alexey Chernov 2013-11-25 19:44:54 UTC
Created attachment 261481 [details] [review]
Patch with release of STREAM_LOCK during gst_pad_push for GstVideoEncoder
Comment 2 Nicolas Dufresne (ndufresne) 2013-11-25 19:58:09 UTC
That change seems odd to me considering what a stream lock is. Is there any use-case, sample code that demonstrate an issue, or real contention here ?

Fyi, the stream lock(s) is what protects the stream, there is one on each pad, and few element also have one. They are normally kept locked as long as the pad or the element is streaming. This is a very specialized lock, which can be unlocked by certain event, like flush-start. It's not intended for general protection, though we need to be careful of the locking order, the reason we cannot ignore it.

Finally, by unlocking it before pushing, you open a gap where another push could hypothetically happen (depend on the implementation on top of the baseclass), causing buffer or event miss-ordering. It would be a very hard problem to track.
Comment 3 Alexey Chernov 2013-11-25 20:17:14 UTC
Thanks for detailed information, Nicolas.
Yes, I have a use-case for it: a multithreaded decoder based on GstVideoDecoder with three threads: one input thread which rotates handle_frame(), one thread which rotates hardware decoder itself and an output thread which does finish_frame(). All threads are connected using a queue, a mutex and wait condition (input-decoder and decoder-output). Some variation of this model is used in gst-omx decoder. So, everything works fine until there's no external queue after decoder element or this queue is full. In such case finish_frame() is blocked for fair amount of time by gst_pad_push(), so is STREAM_LOCK. Output thread is blocked by finish_frame() and it's normal, but the problem is that there's also tension between output thread and input thread. As STREAM_LOCK mutex is locked most part of time and has no wait condition there's no guarantee that input thread would see it released when scheduler would make it active. Actually it wouldn't most of times, so handle_frame() can't provide any input frames for decoder, decoder processes all the frames in its input queue and fall asleep. Only after that output thread starts to wait on the wait condition of output queue and stops to occupy STREAM_LOCK too often. Input thread receives a chance to provide some frames, decoder starts to work, in some time it fills out external queue if there's one and the problem repeats. It looks like oscillation.

I see that STREAM_LOCK is quite special instrument by its nature, but it seems that unlocking it for gst_pad_push() can be very helpful for the cases as mentioned above. You're right that it requires more accurate way to do output routines, but this code is usually quite serialized and all such calls are usually in the same thread.
Comment 4 Nicolas Dufresne (ndufresne) 2013-11-25 21:01:33 UTC
I see you're point.

For those who don't like long threads, we take GST_VIDEO_DECODER_STREAM_LOCK() in both _chain(), and hold it when we call push(buffer). Which mean no buffer can come inside the element while pushing.
Comment 5 Alexey Chernov 2013-11-25 21:37:01 UTC
Yes, thanks for short summary, Nicolas)
Comment 6 Olivier Crête 2013-11-25 22:12:33 UTC
I think should not take the decoder stream lock during the chain() function. But that requires going over every data member and making sure that using the object lock is enough (and that is it used correctly).
Comment 7 Sebastian Dröge (slomo) 2013-11-25 22:15:42 UTC
FWIW I recommended these changes to Alexey. The right thing to do here is to only hold the base class' stream lock while in chain(), when handling a serialized event (but not when calling gst_pad_push_event()) and around finish_frame() (but not when calling gst_pad_push()). I'll review these patches tomorrow and fix them up if there's anything to fix up.
Comment 8 Olivier Crête 2013-11-25 22:23:02 UTC
What's the point of the base class stream lock in chain? You're already holding the stream lock of the sink pad.
Comment 9 Sebastian Dröge (slomo) 2013-11-26 13:43:52 UTC
These patches are not complete, will finish them now (hopefully finish before I leave, otherwise later this evening) and also check again if there are any race conditions I missed before.
Comment 10 Alexey Chernov 2013-11-26 14:25:48 UTC
Thanks, Sebastian. I also can check updated patches with our decoder and gst-omx on Raspberry, if it would help.
Comment 11 Sebastian Dröge (slomo) 2013-12-01 21:59:19 UTC
FWIW I'm still on it, got distracted by a related bug that I noticed while looking into this :) In reverse playback we're breaking threading assumptions when having a multi-threaded decoder.

Apart from that Alexey's patch is missing some changes for serialized events, which is related to ↑.
Comment 12 Alexey Chernov 2013-12-02 06:43:42 UTC
Hmmm.. do you mean pushing downstream events serialized with buffer stream?
If so, I think, this issue exists even now, without the patch, because AFAICT all the calls to gst_pad_push_event() isn't protected by STREAM_LOCK, so locking it for gst_pad_push() call doesn't seem to save the game. Unfortunately, I don't have an idea how to solve it easily. We had some thoughts on adding some multi-threaded-aware routines to GstVideoDecoder or to subclass it for multi-threaded decoders, but all these are not too realistic for now, I think.
Comment 13 Sebastian Dröge (slomo) 2013-12-14 17:07:51 UTC
I think it's not that simple. Unfortunately we send serialized events and buffers from different threads for multi-threaded decoders, because it happens based on various videodecoder functions called from various threads.

E.g. we send events from negotiate (called from setcaps for example, or allocate), we send buffers from chain (in reverse playback), some events are sent directly from the event function. OTOH the same happens from everything called from the streaming thread of the srcpad.

I think the way it is currently inside the base class it does not cause any problems and is correct, just confusing. All events and buffers are always kept in the same order and nothing is doing a serialized action downstream at the same time as another thread. However if we remove the locking this is not necessarily true anymore.
Comment 14 Sebastian Dröge (slomo) 2013-12-14 17:09:25 UTC
Ah forgot the example: srcpad streaming thread pushes a buffer or event, at the same time the sinkpad streaming thread allocates a buffer that triggers negotiation.
Comment 15 Nicolas Dufresne (ndufresne) 2013-12-14 19:27:18 UTC
(In reply to comment #14)
> Ah forgot the example: srcpad streaming thread pushes a buffer or event, at the
> same time the sinkpad streaming thread allocates a buffer that triggers
> negotiation.

I'm not fully convince by the explanation. If you are pushing a buffer on the srcpad, you'll have to wait for that call to return before you can make any serialized operation like sending caps event, doing allocation query etc. That is because the srcpad streaming lock has been taken from another process.

What am I missing ?
Comment 16 Sebastian Dröge (slomo) 2013-12-14 19:32:47 UTC
(In reply to comment #15)
> (In reply to comment #14)
> > Ah forgot the example: srcpad streaming thread pushes a buffer or event, at the
> > same time the sinkpad streaming thread allocates a buffer that triggers
> > negotiation.
> 
> I'm not fully convince by the explanation. If you are pushing a buffer on the
> srcpad, you'll have to wait for that call to return before you can make any
> serialized operation like sending caps event, doing allocation query etc. That
> is because the srcpad streaming lock has been taken from another process.
> 
> What am I missing ?

Yes, *because* we take the GST_VIDEO_DECODER_STREAM_LOCK() here it is all fine. If we don't hold it while pushing buffers downstream, it is possible for the sinkpad streaming thread to allocate a buffer, which then would (at the same time as the srcpad streaming thread pushing a buffer) push events downstream if we're unlucky.
Comment 17 Sebastian Dröge (slomo) 2013-12-14 19:37:54 UTC
Ah I see what you mean. Yes, that's actually true however we have two threads fighting for the srcpad's stream lock here then. Depending on how lucky you are with the timing this can mess up the order of things if I'm not mistaken.
Comment 18 Sebastian Dröge (slomo) 2013-12-14 20:15:12 UTC
I'll review all this another time tomorrow
Comment 19 George Kiagiadakis 2015-01-19 13:46:41 UTC
I have recently encountered this problem here. My usage scenario is omxvideodec running on a rpi with the following pipeline:

 filesrc ! queue ! decodebin (typefind ! h264parse ! omxh264dec) ! custom video sink

This video sink is made so that it blocks in render() until the frame has been displayed, so eventually the time that the lock is available for the queue thread to push frames in, is minimal.

Eventually, I found it was causing big delays (big enough to skip frames) on the call to GST_VIDEO_DECODER_STREAM_LOCK(decoder) inside the gst_omx_video_dec_loop() function. Obviously the queue thread also needs some time to run, so it is reasonable to lock the other thread out for a while, however it is a problem to skip frames.

I solved this with the following patches:

http://cgit.collabora.com/git/user/gkiagia/gst-plugins-base.git/commit/?h=sil-1.2.0&id=905f138876c55f8c906925f13971efa49ca66c51
http://cgit.collabora.com/git/user/gkiagia/gst-omx.git/commit/?h=sil-1.0.0.2&id=690e303dc64009d32acd2b7ebee654ee893bfe24
Comment 20 Tim-Philipp Müller 2015-05-12 19:57:52 UTC
There must be some solution to this, because clearly

  (decoder|encoder) ! queue ! ..

must work and works, and it must be possible to replicate that conceptually inside the base class without another thread, or am I missing something?
Comment 21 Nicolas Dufresne (ndufresne) 2015-05-12 20:17:10 UTC
The point is from decoder/encoder perspective. you want upstream buffers as soon as possible into the driver, and certainly don't want to stop doing so while pushing a buffer downstream. Adding a queue afterward is just a workaround. Fixing the interblocking by adding yet another thread is unwanted.
Comment 22 Tim-Philipp Müller 2015-05-12 20:30:01 UTC
Of course, I was not suggesting we add a queue, but saying that if that construct is safe then there must be a solution where we can drop the stream lock while pushing out buffers that's safe as well.
Comment 23 Sebastian Dröge (slomo) 2015-05-13 07:25:00 UTC
Yes, I think the main problem here was that after gst_pad_push() the base class also accesses various internal fields and updates them, and it wasn't clear if it's safe to do them a) before gst_pad_push() or b) unlock the stream lock, potentially accept more data on the sinkpad and then lock again and modify them

Needs someone to analyze that :)
Comment 24 Nicolas Dufresne (ndufresne) 2018-01-11 13:39:24 UTC
I think a safer alternative to these patches would be a pattern like:

  GST_PAD_STREAM_LOCK (srcpad);
  GST_VIDEO_XCODER_STREAM_UNLOCK (xcoder);
  gst_pad_push (...);
  GST_PAD_STREAM_UNLOCK (srcpad);
  GST_VIDEO_XCODER_STREAM_LOCK (xcoder);

What do you think ?
Comment 25 Nicolas Dufresne (ndufresne) 2018-08-31 20:48:22 UTC
Created attachment 373530 [details] [review]
videodecoder: Release STREAM_LOCK during gst_pad_push()

Release STREAM_LOCK before calling gst_pad_push() and take it
back afterward so that upstream isn't blocked while output
buffer is being pushed downstream.
Comment 26 Nicolas Dufresne (ndufresne) 2018-08-31 20:48:28 UTC
Created attachment 373531 [details] [review]
videoencoder: Release STREAM_LOCK during gst_pad_push()

Release STREAM_LOCK before calling gst_pad_push() and take it
back afterward so that upstream isn't blocked while output
buffer is being pushed downstream.
Comment 27 Nicolas Dufresne (ndufresne) 2018-08-31 21:12:07 UTC
While it would also be nice to do the same while pushing serialized event, it does not seems strictly needed to achieve decent performance here.

What was very key here, was to control the case where the frame being _finish() is that last one. Basically, while pushing, if some events comes in, they should be appended to that frame, as we already pushed those pending events. But this was already covered, since we remove the frame from the internal list before we get there. The aliasing of the srcpad stream lock should cover most worries. At the moment we have two decoder/encoder implementation, both mimics each others, and for both, concurrent negotiation and finish_frame is unlikely. The internal data always get drained before such a situation. I also think these patches should hold for a while, at which point we'll probably have to fork it to remove that lock and simply find a better workflow for threaded decoders/encoders.
Comment 28 Nicolas Dufresne (ndufresne) 2018-08-31 21:18:20 UTC
Attachment 373530 [details] pushed as fb49674 - videodecoder: Release STREAM_LOCK during gst_pad_push()
Attachment 373531 [details] pushed as 595dd27 - videoencoder: Release STREAM_LOCK during gst_pad_push()
Comment 29 Nicolas Dufresne (ndufresne) 2018-08-31 21:28:41 UTC
Created attachment 373532 [details] [review]
omxvideodec: don't hold the stream lock when trying to push a frame

The base class methods will lock this properly when needed, there seems
to be no need to lock it explicitly.

This allows the patch in gstvideodec for unlocking the stream lock
when pushing buffers out to work.
Comment 30 Nicolas Dufresne (ndufresne) 2018-08-31 21:30:48 UTC
This one is George OMX patch ported to latest code base. I've read it couple of time, going through the ramification, and indeed it's calling thread safe methods, or stuff that can only ever be called from the srcpad thread. I needed to protect the flow_ret though, just so that it does not get reset accidental.
Comment 31 Nicolas Dufresne (ndufresne) 2018-08-31 21:42:21 UTC
Attachment 373532 [details] pushed as 04e1f76 - omxvideodec: don't hold the stream lock when trying to push a frame
Comment 32 Jan Schmidt 2018-09-03 04:00:46 UTC
This makes gst_video_decoder_clip_and_push_buf() drop the STREAM_LOCK where it didn't before, but gst_video_decoder_flush_parse() calls it assuming that the priv->output_queued list can't be mutated while it's called.

A poorly timed flush in reverse playback could double-unref a queued buffer. It can be fixed by moving the g_list_delete_link() logic to the top of the loop in gst_video_decoder_flush_parse()
Comment 33 Guillaume Desmottes 2018-09-03 06:59:08 UTC
(In reply to Nicolas Dufresne (ndufresne) from comment #30)
> This one is George OMX patch ported to latest code base. I've read it couple
> of time, going through the ramification, and indeed it's calling thread safe
> methods, or stuff that can only ever be called from the srcpad thread. I
> needed to protect the flow_ret though, just so that it does not get reset
> accidental.

Shouldn't we do a similar change in omxvideoenc?
Comment 34 Sebastian Dröge (slomo) 2018-09-03 07:17:56 UTC
Reopening for Jan's remark

(In reply to Guillaume Desmottes from comment #33)
> 
> Shouldn't we do a similar change in omxvideoenc?

And the the androidmedia elements.
Comment 35 Nicolas Dufresne (ndufresne) 2018-09-03 13:57:43 UTC
Ok,I'll check what Jan reports. I had ignored reverse because it didn't work for me neither before or after, but I agree not making it worst.

For the extra possible locking in the implementation, I'll just file bugs. I have not read AMC stuff in this regard, so I trust Sebastian that it is doing same spurious locking.
Comment 36 Sebastian Dröge (slomo) 2018-09-03 14:42:34 UTC
AMC code is basically the same as OMX
Comment 37 Nicolas Dufresne (ndufresne) 2018-09-03 17:16:42 UTC
Ok, I'll check:

 - make reverse playback safe again
 - check if locking need to be removed/moved in OMX encoder
 - check if locking need to be removed in AMC
Comment 38 Nicolas Dufresne (ndufresne) 2018-09-08 02:18:53 UTC
Created attachment 373562 [details] [review]
videodecoder: Delete the link before pushing

The gst_video_decoder_clip_and_push_buf() now drops the internal stream
lock while pushing. This means, the output_queued list could be modififed
during that time. To make the code safe again, we delete the link before
pushing the data. The walk pointer will later be updated with the list
head, which makes it safe in case the list was modififed.
Comment 39 Nicolas Dufresne (ndufresne) 2018-09-08 03:14:02 UTC
Created attachment 373563 [details] [review]
omxvideodec: Remove spurious unlock in error case

This was forgotton in previous patch. We no long hold the lock when goto
invalid_buffer is called.
Comment 40 Nicolas Dufresne (ndufresne) 2018-09-08 03:14:06 UTC
Created attachment 373564 [details] [review]
omxvideoenc: Remove spurious locking

The method we call in the context of pushing a buffer are all thread
safe. Holding a lock would prevent input buffers from being queued while
pushing.
Comment 41 Nicolas Dufresne (ndufresne) 2018-09-08 03:14:11 UTC
Created attachment 373565 [details] [review]
omxvideoenc: Remove unneeded size check

We only enter this branch if nFilledLen > 0, there is not need
to check again.
Comment 42 Nicolas Dufresne (ndufresne) 2018-09-08 03:20:55 UTC
The android part looks much more complicated, or maybe there is code that shoudn't be there (e.g. we call finish_frame() on old frames, without output buffer). We also access a lot more state, and the code using chained helper, rather then sequential helpers function to reduce the function size. This coding habit makes it much harder to verify locking. I think I'll file another bug for this one, and leave it for someone else to optimize.
Comment 43 Nicolas Dufresne (ndufresne) 2018-09-08 03:43:20 UTC
Created attachment 373566 [details] [review]
omxvideoenc: Remove spurious locking

The method we call in the context of pushing a buffer are all thread
safe. Holding a lock would prevent input buffers from being queued while
pushing.
Comment 44 Nicolas Dufresne (ndufresne) 2018-09-08 03:59:41 UTC
These patches has been tested using transcoding pipeline on the Zynq. The processing is much smoother now. We no longer need to massively increase the sender latency to ensure the network transmission isn't too jittery.
Comment 45 Nicolas Dufresne (ndufresne) 2018-09-10 21:08:41 UTC
Comment on attachment 373562 [details] [review]
videodecoder: Delete the link before pushing

Attachment 373562 [details] pushed as 8ffd15a - videodecoder: Delete the link before pushing
Comment 46 Nicolas Dufresne (ndufresne) 2018-09-10 21:09:39 UTC
Attachment 373563 [details] pushed as 35abdd1 - omxvideodec: Remove spurious unlock in error case
Attachment 373565 [details] pushed as d80504a - omxvideoenc: Remove unneeded size check
Attachment 373566 [details] pushed as a84badc - omxvideoenc: Remove spurious locking
Comment 47 rland 2018-10-26 14:13:37 UTC
Out of curiosity, can these modifications be applied with base audiodecoder? :)
Comment 48 Nicolas Dufresne (ndufresne) 2018-10-27 07:44:18 UTC
It is a good question, one would need to read the code to see if the condition exist, and also check if a threaded audio encoder/decoder do exists, since such an optimization only helps when the element runs a streaming thread.
Comment 49 Sebastian Dröge (slomo) 2018-10-27 09:07:08 UTC
(In reply to Nicolas Dufresne (ndufresne) from comment #48)
> also check if a threaded audio encoder/decoder do
> exists, since such an optimization only helps when the element runs a
> streaming thread.

androidmedia and gst-omx audio decoders/encoders. They're basically working the same as the video ones.

Someone want to file a bug for tracking this?