GNOME Bugzilla – Bug 720584
vaapidecode: hang while changing state from PAUSED to READY
Last modified: 2014-01-14 18:18:14 UTC
vaapidecode may hang sometimes if user interrupt the pipeline directly, e.g Ctrl+C. This is not always happening but in some special pipeline is easy to produce. e.g. gst-launch-1.0 v4l2src ! image/jpeg ! vaapidecode ! vaapisink sync=false. When user break the pipeline with 'ctrl+c'. vaapidecode hung there possible more than >80% times.
Created attachment 264382 [details] [review] 0001-plugin-fix-decode-hang-on-CTRL-C vaapidecode hang when piepline stopped without EOS,e.g: Ctl+C, which made vaapidecode srcpad task running all the time and locked srcpad without release, and caused deadlock on state change from PAUSED to READY
Gwenole, please review and comment. Thanks.
Created attachment 264460 [details] [review] 0002-plugin-fix-encode-hang-on-CTRL-C vaapiencode has the same potential issue. add vaapiencode's fix, it's similar with decode's fix.
(In reply to comment #1) > Created an attachment (id=264382) [details] [review] > 0001-plugin-fix-decode-hang-on-CTRL-C > > vaapidecode hang when piepline stopped without EOS,e.g: Ctl+C, which made > vaapidecode srcpad task running all the time and locked srcpad without release, > and caused deadlock on state change from PAUSED to READY It would be better to just do that in the GstVideoDecoder::stop() hook.
stop() hook is a little late, parent->change_state() was already called which made the dead lock.
(In reply to comment #5) > stop() hook is a little late, parent->change_state() was already called which > made the dead lock. The problem lays down somewhere else and the dead-lock is not due to the GstVideoDecoder stream lock then.
Created attachment 264490 [details] [review] vaapidecode: account for src pad deactivations vaapidecode: account for src pad deactivations. Track src pad deactivations early so that to make sure the decode task is paused on time for GstVideoDecoder's parent change_state() hook.
(In reply to comment #6) > (In reply to comment #5) > > stop() hook is a little late, parent->change_state() was already called which > > made the dead lock. > > The problem lays down somewhere else and the dead-lock is not due to the > GstVideoDecoder stream lock then. Maybe i didn't say clearly. The dead lock comes from GST_OBJECT_LOCK(decode->srcpad); 1st Thread: when gst_pad_start_task(decode->srcpad), srcpad lock was always locked there, without release unless the task stop() or pause(). 2nd Thread: when change_state called, stacks are GST_OBJECT_LOCK (pad); gst_pad_set_active(pad, active) activate_pads(pad, ret, FALSE) gst_element_pads_activate (element, FALSE) gst_element_change_state_func() gst_video_decoder_change_state() in set_active, src_pad try to lock itself but the lock was already locked in the loop.
(In reply to comment #7) > Created an attachment (id=264490) [details] [review] > vaapidecode: account for src pad deactivations > > vaapidecode: account for src pad deactivations. > > Track src pad deactivations early so that to make sure the decode task > is paused on time for GstVideoDecoder's parent change_state() hook. I don't think this patch can work. in dead lock, gst_pad_set_active(pad, FALSE) never finished, then the check always fails.
(In reply to comment #8) > (In reply to comment #6) > > (In reply to comment #5) > > > stop() hook is a little late, parent->change_state() was already called which > > > made the dead lock. > > > > The problem lays down somewhere else and the dead-lock is not due to the > > GstVideoDecoder stream lock then. > > Maybe i didn't say clearly. The dead lock comes from > GST_OBJECT_LOCK(decode->srcpad); > 1st Thread: when gst_pad_start_task(decode->srcpad), srcpad lock was always > locked there, without release unless the task stop() or pause(). > > 2nd Thread: when change_state called, stacks are > GST_OBJECT_LOCK (pad); > gst_pad_set_active(pad, active) > activate_pads(pad, ret, FALSE) > gst_element_pads_activate (element, FALSE) > gst_element_change_state_func() > gst_video_decoder_change_state() > > in set_active, src_pad try to lock itself but the lock was already locked in > the loop. Sorry, it's not the GST_OBJECT_LOCK(pad) but GST_PAD_GET_STREAM_LOCK(pad)
(In reply to comment #8) > (In reply to comment #6) > > (In reply to comment #5) > > > stop() hook is a little late, parent->change_state() was already called which > > > made the dead lock. > > > > The problem lays down somewhere else and the dead-lock is not due to the > > GstVideoDecoder stream lock then. > > Maybe i didn't say clearly. The dead lock comes from > GST_OBJECT_LOCK(decode->srcpad); > 1st Thread: when gst_pad_start_task(decode->srcpad), srcpad lock was always > locked there, without release unless the task stop() or pause(). Do you mean the mutex was always locked in the gst_pad_start_task() and it does not return without unlocking in some cases? If so, this rather looks like a GstPad bug. If you mean the mutex is always locked in the decode_loop() function, then this is not true. The function returns every 100 ms, at most. > 2nd Thread: when change_state called, stacks are > GST_OBJECT_LOCK (pad); > gst_pad_set_active(pad, active) > activate_pads(pad, ret, FALSE) > gst_element_pads_activate (element, FALSE) > gst_element_change_state_func() > gst_video_decoder_change_state() > > in set_active, src_pad try to lock itself but the lock was already locked in > the loop. From where precisely?
The locked mutex is GST_PAD_GET_STREAM_LOCK(pad), this lock would keep there always until task pause() or stop(), check gst_task_func() g_rec_mutex_lock (lock); /* lock is GST_PAD_GET_STREAM_LOCK(pad) */ while (G_LIKELY (GET_TASK_STATE (task) != GST_TASK_STOPPED)) { if (G_UNLIKELY (GET_TASK_STATE (task) == GST_TASK_PAUSED)) { ... } task->func (task->user_data); //gst_vaapidecode_decode_loop } done: g_rec_mutex_unlock (lock); Another thread blocked when trying to deactivate the pad, stacks
+ Trace 232941
(In reply to comment #12) > The locked mutex is > GST_PAD_GET_STREAM_LOCK(pad), this lock would keep there always until task > pause() or stop(), check gst_task_func() > g_rec_mutex_lock (lock); /* lock is GST_PAD_GET_STREAM_LOCK(pad) */ > while (G_LIKELY (GET_TASK_STATE (task) != GST_TASK_STOPPED)) { > if (G_UNLIKELY (GET_TASK_STATE (task) == GST_TASK_PAUSED)) { > ... > } > task->func (task->user_data); //gst_vaapidecode_decode_loop > } > done: > g_rec_mutex_unlock (lock); gst_vaapidecode_decode_loop() precisely returns with the task in paused state if the pad is deactivated, so the STREAM_LOCK is released. If the lock is held through push_decoded_frame() instead, then I need to know where/how. Thanks. Still, the latter situation should not be possible either because, if a GstPad function is called but it is not in an active mode (flushed, flushing), then that function returns with an appropriate error anyway, and we are not blocked through push_decoded_frame().
> gst_vaapidecode_decode_loop() precisely returns with the task in paused state > if the pad is deactivated, so the STREAM_LOCK is released. If the lock is held Yes, if task in paused state, the lock can be released. but the problem is here, gst_vaapidecode_decode_loop() doesn't going into paused. and with your patch, the loop is checking whether pad deactivated.and gst_pad_set_active(pad, false) need lock STREAM_LOCK too. Even your patch can solve(since lock happens in post_active, MODE_NONE was set in pre_active) this issue, it looks much tricky. > through push_decoded_frame() instead, then I need to know where/how. Thanks. > > Still, the latter situation should not be possible either because, if a GstPad > function is called but it is not in an active mode (flushed, flushing), then > that function returns with an appropriate error anyway, and we are not blocked > through push_decoded_frame(). Nothing about push_decoded_frame().
(In reply to comment #14) > > gst_vaapidecode_decode_loop() precisely returns with the task in paused state > > if the pad is deactivated, so the STREAM_LOCK is released. If the lock is held > Yes, if task in paused state, the lock can be released. but the problem is > here, gst_vaapidecode_decode_loop() doesn't going into paused. and with your > patch, the loop is checking whether pad deactivated.and gst_pad_set_active(pad, > false) need lock STREAM_LOCK too. Even your patch can solve(since lock happens > in post_active, MODE_NONE was set in pre_active) this issue, it looks much > tricky. If you think gst_pad_is_active() could be an issue, we could just make it GST_PAD_IS_ACTIVE() instead. The idea is to keep vaapidecode call task pause in a single location, and not add hook at the GstElement level since GstVideoDecoder hooks should be enough. If they aren't, then we need to fix GstVideoDecoder. So, that's why the best we can do is to make the decode_loop() gracefully check whether the pad is still active or not, and shut it down itself if the pad was deactivated.
(In reply to comment #15) > (In reply to comment #14) > > > gst_vaapidecode_decode_loop() precisely returns with the task in paused state > > > if the pad is deactivated, so the STREAM_LOCK is released. If the lock is held > > Yes, if task in paused state, the lock can be released. but the problem is > > here, gst_vaapidecode_decode_loop() doesn't going into paused. and with your > > patch, the loop is checking whether pad deactivated.and gst_pad_set_active(pad, > > false) need lock STREAM_LOCK too. Even your patch can solve(since lock happens > > in post_active, MODE_NONE was set in pre_active) this issue, it looks much > > tricky. > > If you think gst_pad_is_active() could be an issue, we could just make it > GST_PAD_IS_ACTIVE() instead. The idea is to keep vaapidecode call task pause in > a single location, and not add hook at the GstElement level since > GstVideoDecoder hooks should be enough. If they aren't, then we need to fix > GstVideoDecoder. gst_pad_is_active() is ok since multi-thread running. If new bugs found, then consider GST_PAD_IS_ACTIVE later. We can't expect GstVideoDeocder have all the hooks or do all the things, just like sometimes need to hook query, event. > > So, that's why the best we can do is to make the decode_loop() gracefully check > whether the pad is still active or not, and shut it down itself if the pad was > deactivated. I'm ok for this patch.
(In reply to comment #16) > (In reply to comment #15) > > (In reply to comment #14) > > > > gst_vaapidecode_decode_loop() precisely returns with the task in paused state > > > > if the pad is deactivated, so the STREAM_LOCK is released. If the lock is held > > > Yes, if task in paused state, the lock can be released. but the problem is > > > here, gst_vaapidecode_decode_loop() doesn't going into paused. and with your > > > patch, the loop is checking whether pad deactivated.and gst_pad_set_active(pad, > > > false) need lock STREAM_LOCK too. Even your patch can solve(since lock happens > > > in post_active, MODE_NONE was set in pre_active) this issue, it looks much > > > tricky. > > > > If you think gst_pad_is_active() could be an issue, we could just make it > > GST_PAD_IS_ACTIVE() instead. The idea is to keep vaapidecode call task pause in > > a single location, and not add hook at the GstElement level since > > GstVideoDecoder hooks should be enough. If they aren't, then we need to fix > > GstVideoDecoder. > gst_pad_is_active() is ok since multi-thread running. If new bugs found, then > consider GST_PAD_IS_ACTIVE later. > We can't expect GstVideoDeocder have all the hooks or do all the things, just > like sometimes need to hook query, event. I am wondering if we shouldn't better look into making GstPad stop the task itself when it is deactivated. This doesn't sound completely right to allow pad deactivations while still keeping running tasks around. Or, GstVideoDecoder could also be fixed to submit stop() before the parent's change_state() call? > > So, that's why the best we can do is to make the decode_loop() gracefully check > > whether the pad is still active or not, and shut it down itself if the pad was > > deactivated. > I'm ok for this patch. Meanwhile, we could indeed have that in vaapidecode. This works for other GstVideoDecoder based decoders + task probably because they exclusively rely on the (indirect pad) return status from one of the gst_video_decoder_*() function call. And since, in that case, all pad functions readily return GST_FLOW_FLAG_FLUSHING or EOS, the issue did not get noticed.
(In reply to comment #17) > I am wondering if we shouldn't better look into making GstPad stop the task > itself when it is deactivated. This doesn't sound completely right to allow pad > deactivations while still keeping running tasks around. Yes, the task should be stopped before pad deactivations. > Or, GstVideoDecoder could also be fixed to submit stop() before the parent's > change_state() call? +1, agree.
(In reply to comment #17) > > I am wondering if we shouldn't better look into making GstPad stop the task > itself when it is deactivated. This doesn't sound completely right to allow pad It's better stop task when user explicitly calls, not in deactivate function itself. sometimes task may have signal wait and user need to wakeup when something noticed. e.g. void _user_task(gpointer) { ... g_cond_wait(cond, muxtex); ... } User need to stop the task by g_cond_signal(cond); gst_pad_stop_task (pad); So, deactivate function can't stop the thread by itself. > deactivations while still keeping running tasks around. >
(In reply to comment #18) > (In reply to comment #17) > > I am wondering if we shouldn't better look into making GstPad stop the task > > itself when it is deactivated. This doesn't sound completely right to allow pad > > deactivations while still keeping running tasks around. > Yes, the task should be stopped before pad deactivations. > > > Or, GstVideoDecoder could also be fixed to submit stop() before the parent's > > change_state() call? > +1, agree. FYI, I opened bug #720805 to track any of those solutions.
commit d2f6274f31100e9a41dfc91a9064ee1a3c1b5e29 Author: Wind Yuan <feng.yuan@intel.com> Date: Tue Dec 17 10:26:03 2013 +0800 vaapidecode: fix hang on SIGINT. vaapidecode hangs when pipeline is stopped without any EOS, e.g. when <Ctrl>+C is pressed, thus causing the srcpad task to keep running and locked. This fixes a deadlock on state change from PAUSED to READY. https://bugzilla.gnome.org/show_bug.cgi?id=720584 Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com> commit 830566c17e3e2be79a2bbc6caf5b8e53643b3be8 Author: Wind Yuan <feng.yuan@intel.com> Date: Tue Dec 17 04:23:42 2013 -0500 vaapiencode: fix possible hang on SIGINT. vaapiencode might hang when the pipeline is stopped without any EOS, e.g. when <Ctrl>+C is pressed, thus causing the srcpad task to keep running and locked. This fixes a possible deadlock on state change from PAUSED to READY. https://bugzilla.gnome.org/show_bug.cgi?id=720584 Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>