GNOME Bugzilla – Bug 783627
gstvideodecoder.c: excessive locking triggering many deadlocks
Last modified: 2018-11-03 11:57:10 UTC
gst_video_decoder_negotiate() is triggering a deadlock during playback: GST_VIDEO_DECODER_STREAM_LOCK (decoder); gst_pad_check_reconfigure (decoder->srcpad); if (klass->negotiate) { ret = klass->negotiate (decoder); // deadlock here if (!ret) gst_pad_mark_reconfigure (decoder->srcpad); } GST_VIDEO_DECODER_STREAM_UNLOCK (decoder); After locking the element, we get stop waiting for a condition to be completed inside klass->negotiate() as follows:
+ Trace 237564
Thread 21 (Thread 0x6a1ff460 (LWP 28443))
Thread 17 (Thread 0x6c5ff460 (LWP 28437))
Sigh. Is there a way to turn off this ridiculous "stack trace parsing" nonsense that swallows up, discards, and utterly trashes half the bug report?
Let's try this again. First stack trace, waiting inside klass->negotiate(): x Thread 21 (Thread 0x6a1ff460 (LWP 28443)): x #0 syscall () at ../ports/sysdeps/unix/sysv/linux/arm/syscall.S:37 x #1 0x76d7943c in g_cond_wait (cond=0x6de57c54, mutex=0x6de57c04) at /build/glib2.0-tTvduh/glib2.0-2.42.1/./glib/gthread-posix.c:1396 x #2 0x76f052c4 in do_probe_callbacks (pad=0x6de57bf8, info=0x6a1fea78, defaultval=GST_FLOW_OK) at gstpad.c:3666 x #3 0x76f07104 in gst_pad_peer_query (pad=0x6de57bf8, query=0x7190ca00) at gstpad.c:4071 x #4 0x76330cc4 in gst_video_decoder_negotiate_pool (decoder=0x71836150, caps=0x72204ea0) at gstvideodecoder.c:3708 x #5 0x763315c8 in gst_video_decoder_negotiate_default (decoder=0x71836150) at gstvideodecoder.c:3859 x #6 0x763317ac in gst_video_decoder_negotiate (decoder=0x71836150) at gstvideodecoder.c:3900 x #7 0x72e6c28c in gst_omx_video_dec_reconfigure_output_port (self=0x71836150) at gstomxvideodec.c:974 x #8 0x72e6d530 in gst_omx_video_dec_loop (self=0x71836150) at gstomxvideodec.c:1289 x #9 0x76f519e8 in gst_task_func (task=0x6de4c2c8) at gsttask.c:332 x #10 0x76f5305c in default_func (tdata=0x1ca9a0, pool=0x38c58) at gsttaskpool.c:69 x #11 0x76d5945c in g_thread_pool_thread_proxy (data=<optimized out>) at /build/glib2.0-tTvduh/glib2.0-2.42.1/./glib/gthreadpool.c:307 x #12 0x76d588c4 in g_thread_proxy (data=0x718032f0) at /build/glib2.0-tTvduh/glib2.0-2.42.1/./glib/gthread.c:764 x #13 0x76c9ae90 in start_thread (arg=0x6a1ff460) at pthread_create.c:311 x #14 0x76c24598 in ?? () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:92 from /lib/arm-linux-gnueabihf/libc.so.6 x Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Second stack trace, the pipeline is pumping data, and this gets blocked because the element above is locked, giving us deadlock: x Thread 17 (Thread 0x6c5ff460 (LWP 28437)): x #0 0x76ca2780 in __lll_lock_wait (futex=futex@entry=0x71801688, private=<optimized out>) at ../ports/sysdeps/unix/sysv/linux/arm/nptl/lowlevellock.c:46 x #1 0x76c9d3c4 in __GI___pthread_mutex_lock (mutex=0x71801688) at pthread_mutex_lock.c:114 x #2 0x76329b20 in gst_video_decoder_chain (pad=0x6de57aa8, parent=0x71836150, buf=0x75643970) at gstvideodecoder.c:2430 x #3 0x76f07bd0 in gst_pad_chain_data_unchecked (pad=0x6de57aa8, type=4112, data=0x75643970) at gstpad.c:4205 x #4 0x76f08bfc in gst_pad_push_data (pad=0x6de45958, type=4112, data=0x75643970) at gstpad.c:4457 x #5 0x76f094c0 in gst_pad_push (pad=0x6de45958, buffer=0x75643970) at gstpad.c:4576 x #6 0x767c4128 in gst_base_transform_chain (pad=0x6de45808, parent=0x2d5bb0, buffer=0x75643970) at gstbasetransform.c:2312 x #7 0x76f07bd0 in gst_pad_chain_data_unchecked (pad=0x6de45808, type=4112, data=0x75643970) at gstpad.c:4205 x #8 0x76f08bfc in gst_pad_push_data (pad=0x6de456b8, type=4112, data=0x75643970) at gstpad.c:4457 x #9 0x76f094c0 in gst_pad_push (pad=0x6de456b8, buffer=0x75643970) at gstpad.c:4576 x #10 0x7678ef1c in gst_base_parse_push_frame (parse=0x6de4a730, frame=0x71803b68) at gstbaseparse.c:2520 x #11 0x7678dd08 in gst_base_parse_handle_and_push_frame (parse=0x6de4a730, frame=0x71803b68) at gstbaseparse.c:2337 x #12 0x7678f850 in gst_base_parse_finish_frame (parse=0x6de4a730, frame=0x71803b68, size=43130) at gstbaseparse.c:2678 x #13 0x72ec2188 in gst_h264_parse_handle_frame (parse=0x6de4a730, frame=0x71803b68, skipsize=0x6c5fea6c) at gsth264parse.c:1275 x #14 0x7678cee0 in gst_base_parse_handle_buffer (parse=0x6de4a730, buffer=0x756b3a00, skip=0x6c5fea6c, flushed=0x6c5fea68) at gstbaseparse.c:2145 x #15 0x767923d4 in gst_base_parse_chain (pad=0x6de45568, parent=0x6de4a730, buffer=0x75692cb8) at gstbaseparse.c:3227 x #16 0x76f07bd0 in gst_pad_chain_data_unchecked (pad=0x6de45568, type=4112, data=0x75692cb8) at gstpad.c:4205 x #17 0x76f08bfc in gst_pad_push_data (pad=0x6de45418, type=4112, data=0x75692cb8) at gstpad.c:4457 x #18 0x76f094c0 in gst_pad_push (pad=0x6de45418, buffer=0x75692cb8) at gstpad.c:4576 x #19 0x75fdcf78 in gst_single_queue_push_one (mq=0x6de452a8, sq=0x6de48cc8, object=0x75692cb8, allow_drop=0x6c5fed44) at gstmultiqueue.c:1608 x #20 0x75fdede8 in gst_multi_queue_loop (pad=0x6de45418) at gstmultiqueue.c:1920 x #21 0x76f519e8 in gst_task_func (task=0x7561e418) at gsttask.c:332 x #22 0x76f5305c in default_func (tdata=0x1affe0, pool=0x38c58) at gsttaskpool.c:69 x #23 0x76d5945c in g_thread_pool_thread_proxy (data=<optimized out>) at /build/glib2.0-tTvduh/glib2.0-2.42.1/./glib/gthreadpool.c:307 x #24 0x76d588c4 in g_thread_proxy (data=0x75602000) at /build/glib2.0-tTvduh/glib2.0-2.42.1/./glib/gthread.c:764 x #25 0x76c9ae90 in start_thread (arg=0x6c5ff460) at pthread_create.c:311 x #26 0x76c24598 in ?? () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:92 from /lib/arm-linux-gnueabihf/libc.so.6 Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Created attachment 353524 [details] [review] Draft patch to unblock the pipeline Here is a draft patch that unblocks the pipeline, is this problem just excessive locking or is it more subtle than this?
I see this attempted to fix part of the problem by using the object lock instead of the stream lock. https://bugzilla.gnome.org/show_bug.cgi?id=768361
Digging through this code, there are many deadlocks, not just the one. A repeated pattern is code such as the following which locks the stream: static void gst_video_decoder_reset (GstVideoDecoder * decoder, gboolean full, gboolean flush_hard) { GstVideoDecoderPrivate *priv = decoder->priv; GST_DEBUG_OBJECT (decoder, "reset full %d", full); GST_VIDEO_DECODER_STREAM_LOCK (decoder); ... } gst_video_decoder_reset() is then called in various places like this, triggering an attempt to lock the stream twice: static GstStateChangeReturn gst_video_decoder_change_state (GstElement * element, GstStateChange transition) { ... case GST_STATE_CHANGE_READY_TO_PAUSED: GST_VIDEO_DECODER_STREAM_LOCK (decoder); gst_video_decoder_reset (decoder, TRUE, TRUE); GST_VIDEO_DECODER_STREAM_UNLOCK (decoder); ... GST_VIDEO_DECODER_STREAM_LOCK (decoder); gst_video_decoder_reset (decoder, TRUE, TRUE); GST_VIDEO_DECODER_STREAM_UNLOCK (decoder); ... } This is one example, there are many. In a non-trivial pipeline (one that includes regegotiations) that processes non-trivial data (live data, complete with discontinuities and dynamic behaviour), we deadlock repeatedly. This is caused by the gstvideodecoder.c code locking the stream, then calling downstream functions of other elements, which make queries and send events back upstream, which are in turn received by gstvideodecoder.c again, which then tries to relock the already-locked stream - deadlock. The same pattern exists in gstvideoencoder.c, and the same bugs.
Just learned that the stream lock is recursive and as I understand can be called multiple times from the same thread. gstvideodecoder.c in this case is being called from gstomxvideodec.c - in this case the OMX video hardware is processed by two different threads, one thread feeds data into the GPU, and the second thread pulls data out of the GPU. From what I can see this causes potential problems with locking, as a query from one thread can be blocked on the other, which is in turn waiting for the first thread to be ready to receive data.
The decoder stream lock is likely redundant, but we are stuck with it, since it's in the API/Abi. That lock though should always be dropped when a flush or a transition to ready State occurs. Subclass are responsible for that if these subclass have a separated thread, like omxvideodec and v4l2videodec.
Review of attachment 353524 [details] [review]: That would be an ABI break, drop de stream lock in the sublass instead, only while doing queries, or pushing events.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gst-plugins-base/issues/358.