GNOME Bugzilla – Bug 715192
videodecoder: Cannot accumulate buffer while downstream is blocked
Last modified: 2018-10-27 09:07:08 UTC
Created attachment 261479 [details] [review] Patch with release of STREAM_LOCK during gst_pad_push for GstVideoDecoder STREAM_LOCK is locked too much inside finish_frame() function of GstVideoDecoder as it remains locked during the whole period of gst_pad_push() call. When there's no queue after decoder or this queue is full, the period of time when STREAM_LOCK is actually locked by finish_frame() can be near to frame duration as gst_pad_push() waits for frame to be displayed. That means that upstream thread (i.e. handle_frame()) should wait for STREAM_LOCK been unlocked, but given that it doesn't have corresponding wait condition there's no way to wake up upstream thread and in most cases it simply misses its turn to occupy STREAM_LOCK. While writing GstVideoDecoder-based decoder we found that there's no harm to unlock STREAM_LOCK before calling gst_pad_push() and lock it again afterwards. It's possible for the following reasons: 1. In single-threaded decoders where finish_frame() is effectively called inside handle_frame() there's no problem as a) STREAM_LOCK is still locked at least once and b) the thread is blocked by gst_pad_push() naturally 2. In multi-threaded case synchronization requires more intelligent approach than blocking input thread out of output thread. This model is often used when decoder isn't capable to deliver every single frame during its duration but could provide enough average framerate, so it would like to consume some input frames even when output frame is being displayed. It surely requires some overfill protection but it can be done using queues with mutexes and wait conditions while it's too dangerous to use locked STREAM_LOCK during gst_pad_push() for it. We prepared a couple of patches for review, one for GstVideoDecoder and another one for GstVideoEncoder. We release STREAM_LOCK only for gst_pad_push() as gst_pad_push_event() is already called outside STREAM_LOCK so everything's fine with them.
Created attachment 261481 [details] [review] Patch with release of STREAM_LOCK during gst_pad_push for GstVideoEncoder
That change seems odd to me considering what a stream lock is. Is there any use-case, sample code that demonstrate an issue, or real contention here ? Fyi, the stream lock(s) is what protects the stream, there is one on each pad, and few element also have one. They are normally kept locked as long as the pad or the element is streaming. This is a very specialized lock, which can be unlocked by certain event, like flush-start. It's not intended for general protection, though we need to be careful of the locking order, the reason we cannot ignore it. Finally, by unlocking it before pushing, you open a gap where another push could hypothetically happen (depend on the implementation on top of the baseclass), causing buffer or event miss-ordering. It would be a very hard problem to track.
Thanks for detailed information, Nicolas. Yes, I have a use-case for it: a multithreaded decoder based on GstVideoDecoder with three threads: one input thread which rotates handle_frame(), one thread which rotates hardware decoder itself and an output thread which does finish_frame(). All threads are connected using a queue, a mutex and wait condition (input-decoder and decoder-output). Some variation of this model is used in gst-omx decoder. So, everything works fine until there's no external queue after decoder element or this queue is full. In such case finish_frame() is blocked for fair amount of time by gst_pad_push(), so is STREAM_LOCK. Output thread is blocked by finish_frame() and it's normal, but the problem is that there's also tension between output thread and input thread. As STREAM_LOCK mutex is locked most part of time and has no wait condition there's no guarantee that input thread would see it released when scheduler would make it active. Actually it wouldn't most of times, so handle_frame() can't provide any input frames for decoder, decoder processes all the frames in its input queue and fall asleep. Only after that output thread starts to wait on the wait condition of output queue and stops to occupy STREAM_LOCK too often. Input thread receives a chance to provide some frames, decoder starts to work, in some time it fills out external queue if there's one and the problem repeats. It looks like oscillation. I see that STREAM_LOCK is quite special instrument by its nature, but it seems that unlocking it for gst_pad_push() can be very helpful for the cases as mentioned above. You're right that it requires more accurate way to do output routines, but this code is usually quite serialized and all such calls are usually in the same thread.
I see you're point. For those who don't like long threads, we take GST_VIDEO_DECODER_STREAM_LOCK() in both _chain(), and hold it when we call push(buffer). Which mean no buffer can come inside the element while pushing.
Yes, thanks for short summary, Nicolas)
I think should not take the decoder stream lock during the chain() function. But that requires going over every data member and making sure that using the object lock is enough (and that is it used correctly).
FWIW I recommended these changes to Alexey. The right thing to do here is to only hold the base class' stream lock while in chain(), when handling a serialized event (but not when calling gst_pad_push_event()) and around finish_frame() (but not when calling gst_pad_push()). I'll review these patches tomorrow and fix them up if there's anything to fix up.
What's the point of the base class stream lock in chain? You're already holding the stream lock of the sink pad.
These patches are not complete, will finish them now (hopefully finish before I leave, otherwise later this evening) and also check again if there are any race conditions I missed before.
Thanks, Sebastian. I also can check updated patches with our decoder and gst-omx on Raspberry, if it would help.
FWIW I'm still on it, got distracted by a related bug that I noticed while looking into this :) In reverse playback we're breaking threading assumptions when having a multi-threaded decoder. Apart from that Alexey's patch is missing some changes for serialized events, which is related to ↑.
Hmmm.. do you mean pushing downstream events serialized with buffer stream? If so, I think, this issue exists even now, without the patch, because AFAICT all the calls to gst_pad_push_event() isn't protected by STREAM_LOCK, so locking it for gst_pad_push() call doesn't seem to save the game. Unfortunately, I don't have an idea how to solve it easily. We had some thoughts on adding some multi-threaded-aware routines to GstVideoDecoder or to subclass it for multi-threaded decoders, but all these are not too realistic for now, I think.
I think it's not that simple. Unfortunately we send serialized events and buffers from different threads for multi-threaded decoders, because it happens based on various videodecoder functions called from various threads. E.g. we send events from negotiate (called from setcaps for example, or allocate), we send buffers from chain (in reverse playback), some events are sent directly from the event function. OTOH the same happens from everything called from the streaming thread of the srcpad. I think the way it is currently inside the base class it does not cause any problems and is correct, just confusing. All events and buffers are always kept in the same order and nothing is doing a serialized action downstream at the same time as another thread. However if we remove the locking this is not necessarily true anymore.
Ah forgot the example: srcpad streaming thread pushes a buffer or event, at the same time the sinkpad streaming thread allocates a buffer that triggers negotiation.
(In reply to comment #14) > Ah forgot the example: srcpad streaming thread pushes a buffer or event, at the > same time the sinkpad streaming thread allocates a buffer that triggers > negotiation. I'm not fully convince by the explanation. If you are pushing a buffer on the srcpad, you'll have to wait for that call to return before you can make any serialized operation like sending caps event, doing allocation query etc. That is because the srcpad streaming lock has been taken from another process. What am I missing ?
(In reply to comment #15) > (In reply to comment #14) > > Ah forgot the example: srcpad streaming thread pushes a buffer or event, at the > > same time the sinkpad streaming thread allocates a buffer that triggers > > negotiation. > > I'm not fully convince by the explanation. If you are pushing a buffer on the > srcpad, you'll have to wait for that call to return before you can make any > serialized operation like sending caps event, doing allocation query etc. That > is because the srcpad streaming lock has been taken from another process. > > What am I missing ? Yes, *because* we take the GST_VIDEO_DECODER_STREAM_LOCK() here it is all fine. If we don't hold it while pushing buffers downstream, it is possible for the sinkpad streaming thread to allocate a buffer, which then would (at the same time as the srcpad streaming thread pushing a buffer) push events downstream if we're unlucky.
Ah I see what you mean. Yes, that's actually true however we have two threads fighting for the srcpad's stream lock here then. Depending on how lucky you are with the timing this can mess up the order of things if I'm not mistaken.
I'll review all this another time tomorrow
I have recently encountered this problem here. My usage scenario is omxvideodec running on a rpi with the following pipeline: filesrc ! queue ! decodebin (typefind ! h264parse ! omxh264dec) ! custom video sink This video sink is made so that it blocks in render() until the frame has been displayed, so eventually the time that the lock is available for the queue thread to push frames in, is minimal. Eventually, I found it was causing big delays (big enough to skip frames) on the call to GST_VIDEO_DECODER_STREAM_LOCK(decoder) inside the gst_omx_video_dec_loop() function. Obviously the queue thread also needs some time to run, so it is reasonable to lock the other thread out for a while, however it is a problem to skip frames. I solved this with the following patches: http://cgit.collabora.com/git/user/gkiagia/gst-plugins-base.git/commit/?h=sil-1.2.0&id=905f138876c55f8c906925f13971efa49ca66c51 http://cgit.collabora.com/git/user/gkiagia/gst-omx.git/commit/?h=sil-1.0.0.2&id=690e303dc64009d32acd2b7ebee654ee893bfe24
There must be some solution to this, because clearly (decoder|encoder) ! queue ! .. must work and works, and it must be possible to replicate that conceptually inside the base class without another thread, or am I missing something?
The point is from decoder/encoder perspective. you want upstream buffers as soon as possible into the driver, and certainly don't want to stop doing so while pushing a buffer downstream. Adding a queue afterward is just a workaround. Fixing the interblocking by adding yet another thread is unwanted.
Of course, I was not suggesting we add a queue, but saying that if that construct is safe then there must be a solution where we can drop the stream lock while pushing out buffers that's safe as well.
Yes, I think the main problem here was that after gst_pad_push() the base class also accesses various internal fields and updates them, and it wasn't clear if it's safe to do them a) before gst_pad_push() or b) unlock the stream lock, potentially accept more data on the sinkpad and then lock again and modify them Needs someone to analyze that :)
I think a safer alternative to these patches would be a pattern like: GST_PAD_STREAM_LOCK (srcpad); GST_VIDEO_XCODER_STREAM_UNLOCK (xcoder); gst_pad_push (...); GST_PAD_STREAM_UNLOCK (srcpad); GST_VIDEO_XCODER_STREAM_LOCK (xcoder); What do you think ?
Created attachment 373530 [details] [review] videodecoder: Release STREAM_LOCK during gst_pad_push() Release STREAM_LOCK before calling gst_pad_push() and take it back afterward so that upstream isn't blocked while output buffer is being pushed downstream.
Created attachment 373531 [details] [review] videoencoder: Release STREAM_LOCK during gst_pad_push() Release STREAM_LOCK before calling gst_pad_push() and take it back afterward so that upstream isn't blocked while output buffer is being pushed downstream.
While it would also be nice to do the same while pushing serialized event, it does not seems strictly needed to achieve decent performance here. What was very key here, was to control the case where the frame being _finish() is that last one. Basically, while pushing, if some events comes in, they should be appended to that frame, as we already pushed those pending events. But this was already covered, since we remove the frame from the internal list before we get there. The aliasing of the srcpad stream lock should cover most worries. At the moment we have two decoder/encoder implementation, both mimics each others, and for both, concurrent negotiation and finish_frame is unlikely. The internal data always get drained before such a situation. I also think these patches should hold for a while, at which point we'll probably have to fork it to remove that lock and simply find a better workflow for threaded decoders/encoders.
Attachment 373530 [details] pushed as fb49674 - videodecoder: Release STREAM_LOCK during gst_pad_push() Attachment 373531 [details] pushed as 595dd27 - videoencoder: Release STREAM_LOCK during gst_pad_push()
Created attachment 373532 [details] [review] omxvideodec: don't hold the stream lock when trying to push a frame The base class methods will lock this properly when needed, there seems to be no need to lock it explicitly. This allows the patch in gstvideodec for unlocking the stream lock when pushing buffers out to work.
This one is George OMX patch ported to latest code base. I've read it couple of time, going through the ramification, and indeed it's calling thread safe methods, or stuff that can only ever be called from the srcpad thread. I needed to protect the flow_ret though, just so that it does not get reset accidental.
Attachment 373532 [details] pushed as 04e1f76 - omxvideodec: don't hold the stream lock when trying to push a frame
This makes gst_video_decoder_clip_and_push_buf() drop the STREAM_LOCK where it didn't before, but gst_video_decoder_flush_parse() calls it assuming that the priv->output_queued list can't be mutated while it's called. A poorly timed flush in reverse playback could double-unref a queued buffer. It can be fixed by moving the g_list_delete_link() logic to the top of the loop in gst_video_decoder_flush_parse()
(In reply to Nicolas Dufresne (ndufresne) from comment #30) > This one is George OMX patch ported to latest code base. I've read it couple > of time, going through the ramification, and indeed it's calling thread safe > methods, or stuff that can only ever be called from the srcpad thread. I > needed to protect the flow_ret though, just so that it does not get reset > accidental. Shouldn't we do a similar change in omxvideoenc?
Reopening for Jan's remark (In reply to Guillaume Desmottes from comment #33) > > Shouldn't we do a similar change in omxvideoenc? And the the androidmedia elements.
Ok,I'll check what Jan reports. I had ignored reverse because it didn't work for me neither before or after, but I agree not making it worst. For the extra possible locking in the implementation, I'll just file bugs. I have not read AMC stuff in this regard, so I trust Sebastian that it is doing same spurious locking.
AMC code is basically the same as OMX
Ok, I'll check: - make reverse playback safe again - check if locking need to be removed/moved in OMX encoder - check if locking need to be removed in AMC
Created attachment 373562 [details] [review] videodecoder: Delete the link before pushing The gst_video_decoder_clip_and_push_buf() now drops the internal stream lock while pushing. This means, the output_queued list could be modififed during that time. To make the code safe again, we delete the link before pushing the data. The walk pointer will later be updated with the list head, which makes it safe in case the list was modififed.
Created attachment 373563 [details] [review] omxvideodec: Remove spurious unlock in error case This was forgotton in previous patch. We no long hold the lock when goto invalid_buffer is called.
Created attachment 373564 [details] [review] omxvideoenc: Remove spurious locking The method we call in the context of pushing a buffer are all thread safe. Holding a lock would prevent input buffers from being queued while pushing.
Created attachment 373565 [details] [review] omxvideoenc: Remove unneeded size check We only enter this branch if nFilledLen > 0, there is not need to check again.
The android part looks much more complicated, or maybe there is code that shoudn't be there (e.g. we call finish_frame() on old frames, without output buffer). We also access a lot more state, and the code using chained helper, rather then sequential helpers function to reduce the function size. This coding habit makes it much harder to verify locking. I think I'll file another bug for this one, and leave it for someone else to optimize.
Created attachment 373566 [details] [review] omxvideoenc: Remove spurious locking The method we call in the context of pushing a buffer are all thread safe. Holding a lock would prevent input buffers from being queued while pushing.
These patches has been tested using transcoding pipeline on the Zynq. The processing is much smoother now. We no longer need to massively increase the sender latency to ensure the network transmission isn't too jittery.
Comment on attachment 373562 [details] [review] videodecoder: Delete the link before pushing Attachment 373562 [details] pushed as 8ffd15a - videodecoder: Delete the link before pushing
Attachment 373563 [details] pushed as 35abdd1 - omxvideodec: Remove spurious unlock in error case Attachment 373565 [details] pushed as d80504a - omxvideoenc: Remove unneeded size check Attachment 373566 [details] pushed as a84badc - omxvideoenc: Remove spurious locking
Out of curiosity, can these modifications be applied with base audiodecoder? :)
It is a good question, one would need to read the code to see if the condition exist, and also check if a threaded audio encoder/decoder do exists, since such an optimization only helps when the element runs a streaming thread.
(In reply to Nicolas Dufresne (ndufresne) from comment #48) > also check if a threaded audio encoder/decoder do > exists, since such an optimization only helps when the element runs a > streaming thread. androidmedia and gst-omx audio decoders/encoders. They're basically working the same as the video ones. Someone want to file a bug for tracking this?