After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 783627 - gstvideodecoder.c: excessive locking triggering many deadlocks
gstvideodecoder.c: excessive locking triggering many deadlocks
Status: RESOLVED OBSOLETE
Product: GStreamer
Classification: Platform
Component: gst-plugins-base
git master
Other Linux
: Normal normal
: git master
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2017-06-10 11:41 UTC by minfrin
Modified: 2018-11-03 11:57 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Draft patch to unblock the pipeline (749 bytes, patch)
2017-06-10 13:12 UTC, minfrin
rejected Details | Review

Description minfrin 2017-06-10 11:41:40 UTC
gst_video_decoder_negotiate() is triggering a deadlock during playback:

  GST_VIDEO_DECODER_STREAM_LOCK (decoder);
  gst_pad_check_reconfigure (decoder->srcpad);
  if (klass->negotiate) {
    ret = klass->negotiate (decoder); // deadlock here
    if (!ret)
      gst_pad_mark_reconfigure (decoder->srcpad);
  }
  GST_VIDEO_DECODER_STREAM_UNLOCK (decoder);

After locking the element, we get stop waiting for a condition to be completed inside klass->negotiate() as follows:

Thread 21 (Thread 0x6a1ff460 (LWP 28443))

  • #0 syscall
    at ../ports/sysdeps/unix/sysv/linux/arm/syscall.S line 37
  • #1 g_cond_wait
    at /build/glib2.0-tTvduh/glib2.0-2.42.1/./glib/gthread-posix.c line 1396
  • #2 do_probe_callbacks
    at gstpad.c line 3666
  • #3 gst_pad_peer_query
    at gstpad.c line 4071
  • #4 gst_video_decoder_negotiate_pool
    at gstvideodecoder.c line 3708
  • #5 gst_video_decoder_negotiate_default
    at gstvideodecoder.c line 3859
  • #6 gst_video_decoder_negotiate
    at gstvideodecoder.c line 3900
  • #7 gst_omx_video_dec_reconfigure_output_port
    at gstomxvideodec.c line 974
  • #8 gst_omx_video_dec_loop
    at gstomxvideodec.c line 1289
  • #9 gst_task_func
    at gsttask.c line 332
  • #10 default_func
    at gsttaskpool.c line 69
  • #11 g_thread_pool_thread_proxy
    at /build/glib2.0-tTvduh/glib2.0-2.42.1/./glib/gthreadpool.c line 307
  • #12 g_thread_proxy
    at /build/glib2.0-tTvduh/glib2.0-2.42.1/./glib/gthread.c line 764
  • #13 start_thread
    at pthread_create.c line 311
  • #14 ??
    at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S line 92

Thread 17 (Thread 0x6c5ff460 (LWP 28437))

  • #0 __lll_lock_wait
    at ../ports/sysdeps/unix/sysv/linux/arm/nptl/lowlevellock.c line 46
  • #1 __GI___pthread_mutex_lock
    at pthread_mutex_lock.c line 114
  • #2 gst_video_decoder_chain
    at gstvideodecoder.c line 2430
  • #3 gst_pad_chain_data_unchecked
    at gstpad.c line 4205
  • #4 gst_pad_push_data
    at gstpad.c line 4457
  • #5 gst_pad_push
    at gstpad.c line 4576
  • #6 gst_base_transform_chain
    at gstbasetransform.c line 2312
  • #7 gst_pad_chain_data_unchecked
    at gstpad.c line 4205
  • #8 gst_pad_push_data
    at gstpad.c line 4457
  • #9 gst_pad_push
    at gstpad.c line 4576
  • #10 gst_base_parse_push_frame
    at gstbaseparse.c line 2520
  • #11 gst_base_parse_handle_and_push_frame
    at gstbaseparse.c line 2337
  • #12 gst_base_parse_finish_frame
    at gstbaseparse.c line 2678
  • #13 gst_h264_parse_handle_frame
    at gsth264parse.c line 1275
  • #14 gst_base_parse_handle_buffer
    at gstbaseparse.c line 2145
  • #15 gst_base_parse_chain
    at gstbaseparse.c line 3227
  • #16 gst_pad_chain_data_unchecked
    at gstpad.c line 4205
  • #17 gst_pad_push_data
    at gstpad.c line 4457
  • #18 gst_pad_push
    at gstpad.c line 4576
  • #19 gst_single_queue_push_one
    at gstmultiqueue.c line 1608
  • #20 gst_multi_queue_loop
    at gstmultiqueue.c line 1920
  • #21 gst_task_func
    at gsttask.c line 332
  • #22 default_func
    at gsttaskpool.c line 69
  • #23 g_thread_pool_thread_proxy
    at /build/glib2.0-tTvduh/glib2.0-2.42.1/./glib/gthreadpool.c line 307
  • #24 g_thread_proxy
    at /build/glib2.0-tTvduh/glib2.0-2.42.1/./glib/gthread.c line 764
  • #25 start_thread
    at pthread_create.c line 311
  • #26 ??
    at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S line 92

Comment 1 minfrin 2017-06-10 11:44:12 UTC
Sigh. Is there a way to turn off this ridiculous "stack trace parsing" nonsense that swallows up, discards, and utterly trashes half the bug report?
Comment 2 minfrin 2017-06-10 11:46:01 UTC
Let's try this again.

First stack trace, waiting inside klass->negotiate():

x Thread 21 (Thread 0x6a1ff460 (LWP 28443)):
x #0  syscall () at ../ports/sysdeps/unix/sysv/linux/arm/syscall.S:37
x #1  0x76d7943c in g_cond_wait (cond=0x6de57c54, mutex=0x6de57c04) at /build/glib2.0-tTvduh/glib2.0-2.42.1/./glib/gthread-posix.c:1396
x #2  0x76f052c4 in do_probe_callbacks (pad=0x6de57bf8, info=0x6a1fea78, defaultval=GST_FLOW_OK) at gstpad.c:3666
x #3  0x76f07104 in gst_pad_peer_query (pad=0x6de57bf8, query=0x7190ca00) at gstpad.c:4071
x #4  0x76330cc4 in gst_video_decoder_negotiate_pool (decoder=0x71836150, caps=0x72204ea0) at gstvideodecoder.c:3708
x #5  0x763315c8 in gst_video_decoder_negotiate_default (decoder=0x71836150) at gstvideodecoder.c:3859
x #6  0x763317ac in gst_video_decoder_negotiate (decoder=0x71836150) at gstvideodecoder.c:3900
x #7  0x72e6c28c in gst_omx_video_dec_reconfigure_output_port (self=0x71836150) at gstomxvideodec.c:974
x #8  0x72e6d530 in gst_omx_video_dec_loop (self=0x71836150) at gstomxvideodec.c:1289
x #9  0x76f519e8 in gst_task_func (task=0x6de4c2c8) at gsttask.c:332
x #10 0x76f5305c in default_func (tdata=0x1ca9a0, pool=0x38c58) at gsttaskpool.c:69
x #11 0x76d5945c in g_thread_pool_thread_proxy (data=<optimized out>) at /build/glib2.0-tTvduh/glib2.0-2.42.1/./glib/gthreadpool.c:307
x #12 0x76d588c4 in g_thread_proxy (data=0x718032f0) at /build/glib2.0-tTvduh/glib2.0-2.42.1/./glib/gthread.c:764
x #13 0x76c9ae90 in start_thread (arg=0x6a1ff460) at pthread_create.c:311
x #14 0x76c24598 in ?? () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:92 from /lib/arm-linux-gnueabihf/libc.so.6
x Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Comment 3 minfrin 2017-06-10 11:48:34 UTC
Second stack trace, the pipeline is pumping data, and this gets blocked because the element above is locked, giving us deadlock:

x Thread 17 (Thread 0x6c5ff460 (LWP 28437)):
x #0  0x76ca2780 in __lll_lock_wait (futex=futex@entry=0x71801688, private=<optimized out>) at ../ports/sysdeps/unix/sysv/linux/arm/nptl/lowlevellock.c:46
x #1  0x76c9d3c4 in __GI___pthread_mutex_lock (mutex=0x71801688) at pthread_mutex_lock.c:114
x #2  0x76329b20 in gst_video_decoder_chain (pad=0x6de57aa8, parent=0x71836150, buf=0x75643970) at gstvideodecoder.c:2430
x #3  0x76f07bd0 in gst_pad_chain_data_unchecked (pad=0x6de57aa8, type=4112, data=0x75643970) at gstpad.c:4205
x #4  0x76f08bfc in gst_pad_push_data (pad=0x6de45958, type=4112, data=0x75643970) at gstpad.c:4457
x #5  0x76f094c0 in gst_pad_push (pad=0x6de45958, buffer=0x75643970) at gstpad.c:4576
x #6  0x767c4128 in gst_base_transform_chain (pad=0x6de45808, parent=0x2d5bb0, buffer=0x75643970) at gstbasetransform.c:2312
x #7  0x76f07bd0 in gst_pad_chain_data_unchecked (pad=0x6de45808, type=4112, data=0x75643970) at gstpad.c:4205
x #8  0x76f08bfc in gst_pad_push_data (pad=0x6de456b8, type=4112, data=0x75643970) at gstpad.c:4457
x #9  0x76f094c0 in gst_pad_push (pad=0x6de456b8, buffer=0x75643970) at gstpad.c:4576
x #10 0x7678ef1c in gst_base_parse_push_frame (parse=0x6de4a730, frame=0x71803b68) at gstbaseparse.c:2520
x #11 0x7678dd08 in gst_base_parse_handle_and_push_frame (parse=0x6de4a730, frame=0x71803b68) at gstbaseparse.c:2337
x #12 0x7678f850 in gst_base_parse_finish_frame (parse=0x6de4a730, frame=0x71803b68, size=43130) at gstbaseparse.c:2678
x #13 0x72ec2188 in gst_h264_parse_handle_frame (parse=0x6de4a730, frame=0x71803b68, skipsize=0x6c5fea6c) at gsth264parse.c:1275
x #14 0x7678cee0 in gst_base_parse_handle_buffer (parse=0x6de4a730, buffer=0x756b3a00, skip=0x6c5fea6c, flushed=0x6c5fea68) at gstbaseparse.c:2145
x #15 0x767923d4 in gst_base_parse_chain (pad=0x6de45568, parent=0x6de4a730, buffer=0x75692cb8) at gstbaseparse.c:3227
x #16 0x76f07bd0 in gst_pad_chain_data_unchecked (pad=0x6de45568, type=4112, data=0x75692cb8) at gstpad.c:4205
x #17 0x76f08bfc in gst_pad_push_data (pad=0x6de45418, type=4112, data=0x75692cb8) at gstpad.c:4457
x #18 0x76f094c0 in gst_pad_push (pad=0x6de45418, buffer=0x75692cb8) at gstpad.c:4576
x #19 0x75fdcf78 in gst_single_queue_push_one (mq=0x6de452a8, sq=0x6de48cc8, object=0x75692cb8, allow_drop=0x6c5fed44) at gstmultiqueue.c:1608
x #20 0x75fdede8 in gst_multi_queue_loop (pad=0x6de45418) at gstmultiqueue.c:1920
x #21 0x76f519e8 in gst_task_func (task=0x7561e418) at gsttask.c:332
x #22 0x76f5305c in default_func (tdata=0x1affe0, pool=0x38c58) at gsttaskpool.c:69
x #23 0x76d5945c in g_thread_pool_thread_proxy (data=<optimized out>) at /build/glib2.0-tTvduh/glib2.0-2.42.1/./glib/gthreadpool.c:307
x #24 0x76d588c4 in g_thread_proxy (data=0x75602000) at /build/glib2.0-tTvduh/glib2.0-2.42.1/./glib/gthread.c:764
x #25 0x76c9ae90 in start_thread (arg=0x6c5ff460) at pthread_create.c:311
x #26 0x76c24598 in ?? () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:92 from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Comment 4 minfrin 2017-06-10 13:12:33 UTC
Created attachment 353524 [details] [review]
Draft patch to unblock the pipeline

Here is a draft patch that unblocks the pipeline, is this problem just excessive locking or is it more subtle than this?
Comment 5 minfrin 2017-06-10 17:00:07 UTC
I see this attempted to fix part of the problem by using the object lock instead of the stream lock.

https://bugzilla.gnome.org/show_bug.cgi?id=768361
Comment 6 minfrin 2017-06-11 13:38:54 UTC
Digging through this code, there are many deadlocks, not just the one.

A repeated pattern is code such as the following which locks the stream:

static void
gst_video_decoder_reset (GstVideoDecoder * decoder, gboolean full,
    gboolean flush_hard)
{
  GstVideoDecoderPrivate *priv = decoder->priv;

  GST_DEBUG_OBJECT (decoder, "reset full %d", full);

  GST_VIDEO_DECODER_STREAM_LOCK (decoder);

  ...
}

gst_video_decoder_reset() is then called in various places like this, triggering an attempt to lock the stream twice:

static GstStateChangeReturn
gst_video_decoder_change_state (GstElement * element, GstStateChange transition)
{
  ...

    case GST_STATE_CHANGE_READY_TO_PAUSED:
      GST_VIDEO_DECODER_STREAM_LOCK (decoder);
      gst_video_decoder_reset (decoder, TRUE, TRUE);
      GST_VIDEO_DECODER_STREAM_UNLOCK (decoder);

  ...

      GST_VIDEO_DECODER_STREAM_LOCK (decoder);
      gst_video_decoder_reset (decoder, TRUE, TRUE);
      GST_VIDEO_DECODER_STREAM_UNLOCK (decoder);

  ...
}

This is one example, there are many.

In a non-trivial pipeline (one that includes regegotiations) that processes non-trivial data (live data, complete with discontinuities and dynamic behaviour), we deadlock repeatedly.

This is caused by the gstvideodecoder.c code locking the stream, then calling downstream functions of other elements, which make queries and send events back upstream, which are in turn received by gstvideodecoder.c again, which then tries to relock the already-locked stream - deadlock.

The same pattern exists in gstvideoencoder.c, and the same bugs.
Comment 7 minfrin 2017-06-11 14:58:49 UTC
Just learned that the stream lock is recursive and as I understand can be called multiple times from the same thread.

gstvideodecoder.c in this case is being called from gstomxvideodec.c - in this case the OMX video hardware is processed by two different threads, one thread feeds data into the GPU, and the second thread pulls data out of the GPU.

From what I can see this causes potential problems with locking, as a query from one thread can be blocked on the other, which is in turn waiting for the first thread to be ready to receive data.
Comment 8 Nicolas Dufresne (ndufresne) 2017-06-11 15:10:35 UTC
The decoder stream lock is likely redundant, but we are stuck with it, since it's in the API/Abi.

That lock though should always be dropped when a flush or a transition to ready State occurs. Subclass are responsible for that if these subclass have a separated thread, like omxvideodec and v4l2videodec.
Comment 9 Nicolas Dufresne (ndufresne) 2017-06-11 15:12:40 UTC
Review of attachment 353524 [details] [review]:

That would be an ABI break, drop de stream lock in the sublass instead, only while doing queries, or pushing events.
Comment 10 GStreamer system administrator 2018-11-03 11:57:10 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gst-plugins-base/issues/358.