After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 720584 - vaapidecode: hang while changing state from PAUSED to READY
vaapidecode: hang while changing state from PAUSED to READY
Status: RESOLVED FIXED
Product: gstreamer-vaapi
Classification: Other
Component: general
git master
Other Linux
: Normal critical
: ---
Assigned To: Wind Yuan
gstreamer-vaapi maintainer(s)
Depends on:
Blocks: 719412
 
 
Reported: 2013-12-17 02:52 UTC by Wind Yuan
Modified: 2014-01-14 18:18 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
0001-plugin-fix-decode-hang-on-CTRL-C (1.74 KB, patch)
2013-12-17 02:56 UTC, Wind Yuan
none Details | Review
0002-plugin-fix-encode-hang-on-CTRL-C (1.99 KB, patch)
2013-12-18 03:26 UTC, Wind Yuan
none Details | Review
vaapidecode: account for src pad deactivations (1.84 KB, patch)
2013-12-18 17:12 UTC, Gwenole Beauchesne
none Details | Review

Description Wind Yuan 2013-12-17 02:52:59 UTC
vaapidecode may hang sometimes if user interrupt the pipeline directly, e.g Ctrl+C.
This is not always happening but in some special pipeline is easy to produce. e.g. gst-launch-1.0 v4l2src ! image/jpeg ! vaapidecode ! vaapisink sync=false. When user break the pipeline with 'ctrl+c'. vaapidecode hung there possible more than >80% times.
Comment 1 Wind Yuan 2013-12-17 02:56:20 UTC
Created attachment 264382 [details] [review]
0001-plugin-fix-decode-hang-on-CTRL-C

vaapidecode hang when piepline stopped without EOS,e.g: Ctl+C, which made vaapidecode srcpad task running all the time and locked srcpad without release, and caused deadlock on state change from PAUSED to READY
Comment 2 Wind Yuan 2013-12-17 02:58:04 UTC
Gwenole, please review and comment.
Thanks.
Comment 3 Wind Yuan 2013-12-18 03:26:41 UTC
Created attachment 264460 [details] [review]
0002-plugin-fix-encode-hang-on-CTRL-C

vaapiencode has the same potential issue. add vaapiencode's fix, it's similar with decode's fix.
Comment 4 Gwenole Beauchesne 2013-12-18 05:06:37 UTC
(In reply to comment #1)
> Created an attachment (id=264382) [details] [review]
> 0001-plugin-fix-decode-hang-on-CTRL-C
> 
> vaapidecode hang when piepline stopped without EOS,e.g: Ctl+C, which made
> vaapidecode srcpad task running all the time and locked srcpad without release,
> and caused deadlock on state change from PAUSED to READY

It would be better to just do that in the GstVideoDecoder::stop() hook.
Comment 5 Wind Yuan 2013-12-18 05:11:56 UTC
stop() hook is a little late, parent->change_state() was already called which made the dead lock.
Comment 6 Gwenole Beauchesne 2013-12-18 17:11:18 UTC
(In reply to comment #5)
> stop() hook is a little late, parent->change_state() was already called which
> made the dead lock.

The problem lays down somewhere else and the dead-lock is not due to the GstVideoDecoder stream lock then.
Comment 7 Gwenole Beauchesne 2013-12-18 17:12:30 UTC
Created attachment 264490 [details] [review]
vaapidecode: account for src pad deactivations

    vaapidecode: account for src pad deactivations.
    
    Track src pad deactivations early so that to make sure the decode task
    is paused on time for GstVideoDecoder's parent change_state() hook.
Comment 8 Wind Yuan 2013-12-19 02:56:28 UTC
(In reply to comment #6)
> (In reply to comment #5)
> > stop() hook is a little late, parent->change_state() was already called which
> > made the dead lock.
> 
> The problem lays down somewhere else and the dead-lock is not due to the
> GstVideoDecoder stream lock then.

Maybe i didn't say clearly. The dead lock comes from GST_OBJECT_LOCK(decode->srcpad);
1st Thread: when gst_pad_start_task(decode->srcpad), srcpad lock was always locked there, without release unless the task stop() or pause().

2nd Thread: when change_state called, stacks are
   GST_OBJECT_LOCK (pad);
   gst_pad_set_active(pad, active)
   activate_pads(pad, ret, FALSE)
   gst_element_pads_activate (element, FALSE)
   gst_element_change_state_func()
   gst_video_decoder_change_state()
  
 in set_active, src_pad try to lock itself but the lock was already locked in the loop.
Comment 9 Wind Yuan 2013-12-19 02:59:34 UTC
(In reply to comment #7)
> Created an attachment (id=264490) [details] [review]
> vaapidecode: account for src pad deactivations
> 
>     vaapidecode: account for src pad deactivations.
> 
>     Track src pad deactivations early so that to make sure the decode task
>     is paused on time for GstVideoDecoder's parent change_state() hook.

I don't think this patch can work. in dead lock, gst_pad_set_active(pad, FALSE) never finished, then the check always fails.
Comment 10 Wind Yuan 2013-12-19 03:12:14 UTC
(In reply to comment #8)
> (In reply to comment #6)
> > (In reply to comment #5)
> > > stop() hook is a little late, parent->change_state() was already called which
> > > made the dead lock.
> > 
> > The problem lays down somewhere else and the dead-lock is not due to the
> > GstVideoDecoder stream lock then.
> 
> Maybe i didn't say clearly. The dead lock comes from
> GST_OBJECT_LOCK(decode->srcpad);
> 1st Thread: when gst_pad_start_task(decode->srcpad), srcpad lock was always
> locked there, without release unless the task stop() or pause().
> 
> 2nd Thread: when change_state called, stacks are
>    GST_OBJECT_LOCK (pad);
>    gst_pad_set_active(pad, active)
>    activate_pads(pad, ret, FALSE)
>    gst_element_pads_activate (element, FALSE)
>    gst_element_change_state_func()
>    gst_video_decoder_change_state()
> 
>  in set_active, src_pad try to lock itself but the lock was already locked in
> the loop.

Sorry, it's not the GST_OBJECT_LOCK(pad) but GST_PAD_GET_STREAM_LOCK(pad)
Comment 11 Gwenole Beauchesne 2013-12-19 04:59:08 UTC
(In reply to comment #8)
> (In reply to comment #6)
> > (In reply to comment #5)
> > > stop() hook is a little late, parent->change_state() was already called which
> > > made the dead lock.
> > 
> > The problem lays down somewhere else and the dead-lock is not due to the
> > GstVideoDecoder stream lock then.
> 
> Maybe i didn't say clearly. The dead lock comes from
> GST_OBJECT_LOCK(decode->srcpad);
> 1st Thread: when gst_pad_start_task(decode->srcpad), srcpad lock was always
> locked there, without release unless the task stop() or pause().

Do you mean the mutex was always locked in the gst_pad_start_task() and it does not return without unlocking in some cases? If so, this rather looks like a GstPad bug.

If you mean the mutex is always locked in the decode_loop() function, then this is not true. The function returns every 100 ms, at most.

> 2nd Thread: when change_state called, stacks are
>    GST_OBJECT_LOCK (pad);
>    gst_pad_set_active(pad, active)
>    activate_pads(pad, ret, FALSE)
>    gst_element_pads_activate (element, FALSE)
>    gst_element_change_state_func()
>    gst_video_decoder_change_state()
> 
>  in set_active, src_pad try to lock itself but the lock was already locked in
> the loop.

From where precisely?
Comment 12 Wind Yuan 2013-12-19 05:13:16 UTC
The locked mutex is
GST_PAD_GET_STREAM_LOCK(pad), this lock would keep there always until task pause() or stop(), check gst_task_func()
  g_rec_mutex_lock (lock); /* lock is GST_PAD_GET_STREAM_LOCK(pad) */
  while (G_LIKELY (GET_TASK_STATE (task) != GST_TASK_STOPPED)) {
    if (G_UNLIKELY (GET_TASK_STATE (task) == GST_TASK_PAUSED)) {
      ...
    }
    task->func (task->user_data); //gst_vaapidecode_decode_loop
  }
  done:
  g_rec_mutex_unlock (lock);
  
Another thread blocked when trying to deactivate the pad, stacks
  • #0 __lll_lock_wait
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S line 135
  • #1 _L_lock_1035
    from /lib/x86_64-linux-gnu/libpthread.so.0
  • #2 __GI___pthread_mutex_lock
    at pthread_mutex_lock.c line 85
  • #3 post_activate
    at gstpad.c line 907
  • #4 gst_pad_activate_mode
    at gstpad.c line 1091
  • #5 gst_pad_set_active
    at gstpad.c line 970
  • #6 activate_pads
    at gstelement.c line 2687
  • #7 gst_iterator_fold
    at gstiterator.c line 614
  • #8 iterator_activate_fold_with_resync
    at gstelement.c line 2707
  • #9 gst_element_pads_activate
    at gstelement.c line 2743
  • #10 gst_element_change_state_func
    at gstelement.c line 2815
  • #11 gst_video_encoder_change_state
    at gstvideoencoder.c line 1405
  • #12 gst_element_change_state
    at gstelement.c line 2602
  • #13 gst_element_set_state_func
    at gstelement.c line 2558
  • #14 gst_element_set_state
    at gstelement.c line 2459
  • #15 gst_bin_element_set_state
    at gstbin.c line 2325
  • #16 gst_bin_change_state_func
    at gstbin.c line 2648
  • #17 gst_pipeline_change_state
    at gstpipeline.c line 471

Comment 13 Gwenole Beauchesne 2013-12-19 05:26:29 UTC
(In reply to comment #12)
> The locked mutex is
> GST_PAD_GET_STREAM_LOCK(pad), this lock would keep there always until task
> pause() or stop(), check gst_task_func()
>   g_rec_mutex_lock (lock); /* lock is GST_PAD_GET_STREAM_LOCK(pad) */
>   while (G_LIKELY (GET_TASK_STATE (task) != GST_TASK_STOPPED)) {
>     if (G_UNLIKELY (GET_TASK_STATE (task) == GST_TASK_PAUSED)) {
>       ...
>     }
>     task->func (task->user_data); //gst_vaapidecode_decode_loop
>   }
>   done:
>   g_rec_mutex_unlock (lock);

gst_vaapidecode_decode_loop() precisely returns with the task in paused state if the pad is deactivated, so the STREAM_LOCK is released. If the lock is held through push_decoded_frame() instead, then I need to know where/how. Thanks.

Still, the latter situation should not be possible either because, if a GstPad function is called but it is not in an active mode (flushed, flushing), then that function returns with an appropriate error anyway, and we are not blocked through push_decoded_frame().
Comment 14 Wind Yuan 2013-12-19 05:48:34 UTC
 
> gst_vaapidecode_decode_loop() precisely returns with the task in paused state
> if the pad is deactivated, so the STREAM_LOCK is released. If the lock is held
Yes, if task in paused state, the lock can be released. but the problem is here, gst_vaapidecode_decode_loop() doesn't going into paused. and with your patch, the loop is checking whether pad deactivated.and gst_pad_set_active(pad, false) need lock STREAM_LOCK too. Even your patch can solve(since lock happens in post_active, MODE_NONE was set in pre_active) this issue, it looks much tricky.

> through push_decoded_frame() instead, then I need to know where/how. Thanks.
> 
> Still, the latter situation should not be possible either because, if a GstPad
> function is called but it is not in an active mode (flushed, flushing), then
> that function returns with an appropriate error anyway, and we are not blocked
> through push_decoded_frame().
Nothing about push_decoded_frame().
Comment 15 Gwenole Beauchesne 2013-12-19 09:46:33 UTC
(In reply to comment #14)
> > gst_vaapidecode_decode_loop() precisely returns with the task in paused state
> > if the pad is deactivated, so the STREAM_LOCK is released. If the lock is held
> Yes, if task in paused state, the lock can be released. but the problem is
> here, gst_vaapidecode_decode_loop() doesn't going into paused. and with your
> patch, the loop is checking whether pad deactivated.and gst_pad_set_active(pad,
> false) need lock STREAM_LOCK too. Even your patch can solve(since lock happens
> in post_active, MODE_NONE was set in pre_active) this issue, it looks much
> tricky.

If you think gst_pad_is_active() could be an issue, we could just make it GST_PAD_IS_ACTIVE() instead. The idea is to keep vaapidecode call task pause in a single location, and not add hook at the GstElement level since GstVideoDecoder hooks should be enough. If they aren't, then we need to fix GstVideoDecoder.

So, that's why the best we can do is to make the decode_loop() gracefully check whether the pad is still active or not, and shut it down itself if the pad was deactivated.
Comment 16 Wind Yuan 2013-12-19 10:17:30 UTC
(In reply to comment #15)
> (In reply to comment #14)
> > > gst_vaapidecode_decode_loop() precisely returns with the task in paused state
> > > if the pad is deactivated, so the STREAM_LOCK is released. If the lock is held
> > Yes, if task in paused state, the lock can be released. but the problem is
> > here, gst_vaapidecode_decode_loop() doesn't going into paused. and with your
> > patch, the loop is checking whether pad deactivated.and gst_pad_set_active(pad,
> > false) need lock STREAM_LOCK too. Even your patch can solve(since lock happens
> > in post_active, MODE_NONE was set in pre_active) this issue, it looks much
> > tricky.
> 
> If you think gst_pad_is_active() could be an issue, we could just make it
> GST_PAD_IS_ACTIVE() instead. The idea is to keep vaapidecode call task pause in
> a single location, and not add hook at the GstElement level since
> GstVideoDecoder hooks should be enough. If they aren't, then we need to fix
> GstVideoDecoder.
gst_pad_is_active() is ok since multi-thread running. If new bugs found, then consider GST_PAD_IS_ACTIVE later.
We can't expect GstVideoDeocder have all the hooks or do all the things, just like sometimes need to hook query, event.

> 
> So, that's why the best we can do is to make the decode_loop() gracefully check
> whether the pad is still active or not, and shut it down itself if the pad was
> deactivated.
I'm ok for this patch.
Comment 17 Gwenole Beauchesne 2013-12-19 10:50:21 UTC
(In reply to comment #16)
> (In reply to comment #15)
> > (In reply to comment #14)
> > > > gst_vaapidecode_decode_loop() precisely returns with the task in paused state
> > > > if the pad is deactivated, so the STREAM_LOCK is released. If the lock is held
> > > Yes, if task in paused state, the lock can be released. but the problem is
> > > here, gst_vaapidecode_decode_loop() doesn't going into paused. and with your
> > > patch, the loop is checking whether pad deactivated.and gst_pad_set_active(pad,
> > > false) need lock STREAM_LOCK too. Even your patch can solve(since lock happens
> > > in post_active, MODE_NONE was set in pre_active) this issue, it looks much
> > > tricky.
> > 
> > If you think gst_pad_is_active() could be an issue, we could just make it
> > GST_PAD_IS_ACTIVE() instead. The idea is to keep vaapidecode call task pause in
> > a single location, and not add hook at the GstElement level since
> > GstVideoDecoder hooks should be enough. If they aren't, then we need to fix
> > GstVideoDecoder.
> gst_pad_is_active() is ok since multi-thread running. If new bugs found, then
> consider GST_PAD_IS_ACTIVE later.
> We can't expect GstVideoDeocder have all the hooks or do all the things, just
> like sometimes need to hook query, event.

I am wondering if we shouldn't better look into making GstPad stop the task itself when it is deactivated. This doesn't sound completely right to allow pad deactivations while still keeping running tasks around.

Or, GstVideoDecoder could also be fixed to submit stop() before the parent's change_state() call?

> > So, that's why the best we can do is to make the decode_loop() gracefully check
> > whether the pad is still active or not, and shut it down itself if the pad was
> > deactivated.
> I'm ok for this patch.

Meanwhile, we could indeed have that in vaapidecode. This works for other GstVideoDecoder based decoders + task probably because they exclusively rely on the (indirect pad) return status from one of the gst_video_decoder_*() function call. And since, in that case, all pad functions readily return GST_FLOW_FLAG_FLUSHING or EOS, the issue did not get noticed.
Comment 18 Wind Yuan 2013-12-20 05:22:06 UTC
(In reply to comment #17)
> I am wondering if we shouldn't better look into making GstPad stop the task
> itself when it is deactivated. This doesn't sound completely right to allow pad
> deactivations while still keeping running tasks around.
Yes, the task should be stopped before pad deactivations.

> Or, GstVideoDecoder could also be fixed to submit stop() before the parent's
> change_state() call?
+1, agree.
Comment 19 Wind Yuan 2013-12-20 05:37:51 UTC
(In reply to comment #17)
> 
> I am wondering if we shouldn't better look into making GstPad stop the task
> itself when it is deactivated. This doesn't sound completely right to allow pad
It's better stop task when user explicitly calls, not in deactivate function itself. sometimes task may have signal wait and user need to wakeup when something noticed. e.g.
   void _user_task(gpointer) {
       ...
       g_cond_wait(cond, muxtex);
       ...
   }
User need to stop the task by
  g_cond_signal(cond);
  gst_pad_stop_task (pad);

So, deactivate function can't stop the thread by itself.

> deactivations while still keeping running tasks around.
>
Comment 20 Gwenole Beauchesne 2013-12-20 05:45:14 UTC
(In reply to comment #18)
> (In reply to comment #17)
> > I am wondering if we shouldn't better look into making GstPad stop the task
> > itself when it is deactivated. This doesn't sound completely right to allow pad
> > deactivations while still keeping running tasks around.
> Yes, the task should be stopped before pad deactivations.
> 
> > Or, GstVideoDecoder could also be fixed to submit stop() before the parent's
> > change_state() call?
> +1, agree.

FYI, I opened bug #720805 to track any of those solutions.
Comment 21 Gwenole Beauchesne 2014-01-14 18:18:14 UTC
commit d2f6274f31100e9a41dfc91a9064ee1a3c1b5e29
Author: Wind Yuan <feng.yuan@intel.com>
Date:   Tue Dec 17 10:26:03 2013 +0800

    vaapidecode: fix hang on SIGINT.
    
    vaapidecode hangs when pipeline is stopped without any EOS, e.g. when
    <Ctrl>+C is pressed, thus causing the srcpad task to keep running and
    locked. This fixes a deadlock on state change from PAUSED to READY.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=720584
    
    Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>

commit 830566c17e3e2be79a2bbc6caf5b8e53643b3be8
Author: Wind Yuan <feng.yuan@intel.com>
Date:   Tue Dec 17 04:23:42 2013 -0500

    vaapiencode: fix possible hang on SIGINT.
    
    vaapiencode might hang when the pipeline is stopped without any EOS,
    e.g. when <Ctrl>+C is pressed, thus causing the srcpad task to keep
    running and locked. This fixes a possible deadlock on state change
    from PAUSED to READY.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=720584
    
    Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>