Bug 702520 – queue: deadlock when reconfigure event

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 702520 - queue: deadlock when reconfigure event


Summary:	queue: deadlock when reconfigure event


Status:	RESOLVED FIXED

Product:	GStreamer
Classification:	Platform
Component:	gstreamer (core)
Version:	1.1.1
Hardware:	Other Linux

Importance:	Normal blocker
Target Milestone:	1.1.2
Assigned To:	GStreamer Maintainers
QA Contact:	GStreamer Maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2013-06-18 00:50 UTC by Aleix Conchillo Flaqué
Modified:	2013-06-19 18:09 UTC

See Also:
GNOME target:	---
GNOME version:	---

Description Aleix Conchillo Flaqué 2013-06-18 00:50:13 UTC

I just switched from 1.0.7 to 1.1.1 and a working program got stuck. It seems something (mutex lock) introduced in this commit:

http://cgit.freedesktop.org/gstreamer/gstreamer/commit/plugins/elements/gstqueue.c?id=c955ddc712f3b4de9ef5d822b95a6f4bd9985eb3

I still don't understand why this happens, I'll keep on checking.

This is the backtrace with the affected threads:

+ Trace 232085

Thread 4 (Thread 0x7f3a73004700 (LWP 26202))

#0 __lll_lock_wait
at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S line 132
#1 _L_lock_1006
from /lib/x86_64-linux-gnu/libpthread.so.0
#2 __pthread_mutex_lock
at pthread_mutex_lock.c line 101
#3 g_mutex_lock
from /opt/oblong/deps/lib/libglib-2.0.so.0
#4 gst_queue_handle_src_event
at gstqueue.c line 1288
#5 gst_pad_send_event_unchecked
at gstpad.c line 4996
#6 gst_pad_push_event_unchecked
at gstpad.c line 4685
#7 gst_pad_push_event
at gstpad.c line 4808
#8 event_forward_func
at gstpad.c line 2741
#9 gst_pad_forward
at gstpad.c line 2695
#10 gst_pad_event_default
at gstpad.c line 2792
#11 gst_pad_send_event_unchecked
at gstpad.c line 4996
#12 gst_pad_push_event_unchecked
at gstpad.c line 4685
#13 gst_pad_push_event
at gstpad.c line 4808
#14 event_forward_func
at gstpad.c line 2741
#15 gst_pad_forward
at gstpad.c line 2695
#16 gst_pad_event_default
at gstpad.c line 2792
#17 gst_pad_send_event_unchecked
at gstpad.c line 4996
#18 gst_pad_push_event_unchecked
at gstpad.c line 4685
#19 gst_pad_push_event
at gstpad.c line 4808
#20 gst_type_find_element_src_event
at gsttypefindelement.c line 519
#21 gst_pad_send_event_unchecked
at gstpad.c line 4996
#22 gst_pad_push_event_unchecked
at gstpad.c line 4685
#23 gst_pad_push_event
at gstpad.c line 4808
#24 event_forward_func
at gstpad.c line 2741
#25 gst_pad_forward
at gstpad.c line 2695
#26 gst_pad_event_default
at gstpad.c line 2792
#27 gst_pad_send_event_unchecked
at gstpad.c line 4996
#28 gst_pad_send_event
at gstpad.c line 5154
#29 gst_pad_link_full
at gstpad.c line 2289

Comment 1 Aleix Conchillo Flaqué 2013-06-18 01:37:58 UTC

(In reply to comment #0)
> 
> This is the backtrace with the affected threads:
> 

Sorry, only thread 4 and 2 from the bt apply to the same queue.

Comment 2 Sebastian Dröge (slomo) 2013-06-18 11:25:12 UTC

Can you provide backtraces of all GStreamer related threads? From these threads there should not be a deadlock as none of them hold that mutex while waiting for the GCond. There must be one thread that is currently blocking this mutex and not releasing it because it waits for something else to finish, namely the allocation query or the pad linking that caused the reconfigure event.

Comment 3 Aleix Conchillo Flaqué 2013-06-18 14:16:11 UTC

(In reply to comment #2)
> Can you provide backtraces of all GStreamer related threads? 

I'll do as soon as I get to the office.

> From these threads
> there should not be a deadlock as none of them hold that mutex while waiting
> for the GCond. There must be one thread that is currently blocking this mutex
> and not releasing it because it waits for something else to finish, namely the
> allocation query or the pad linking that caused the reconfigure event.

It might not be this, but isn't gst_queue_handle_sink_query holding the mutex and waiting on query_handled (thread 2)? The task (queue_loop) seems to start when the reconfigure event is received, but this can't happen because gst_queue_handle_sink_query has the mutex (thread 4).

Comment 4 Aleix Conchillo Flaqué 2013-06-18 16:55:27 UTC

(In reply to comment #2)
> Can you provide backtraces of all GStreamer related threads?

Just rechecked and the backtrace I originally sent contained all the GStreamer related threads.

Comment 5 Aleix Conchillo Flaqué 2013-06-18 17:22:39 UTC

I cut some parts before, here is the full bt.

(gdb) thread apply all bt

+ Trace 232094

Thread 4 (Thread 0x7f3a73004700 (LWP 26202))

#0 __lll_lock_wait
at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S line 132
#1 _L_lock_1006
from /lib/x86_64-linux-gnu/libpthread.so.0
#2 __pthread_mutex_lock
at pthread_mutex_lock.c line 101
#3 g_mutex_lock
from /opt/oblong/deps/lib/libglib-2.0.so.0
#4 gst_queue_handle_src_event
at gstqueue.c line 1288
#5 gst_pad_send_event_unchecked
at gstpad.c line 4996
#6 gst_pad_push_event_unchecked
at gstpad.c line 4685
#7 gst_pad_push_event
at gstpad.c line 4808
#8 event_forward_func
at gstpad.c line 2741
#9 gst_pad_forward
at gstpad.c line 2695
#10 gst_pad_event_default
at gstpad.c line 2792
#11 gst_pad_send_event_unchecked
at gstpad.c line 4996
#12 gst_pad_push_event_unchecked
at gstpad.c line 4685
#13 gst_pad_push_event
at gstpad.c line 4808
#14 event_forward_func
at gstpad.c line 2741
#15 gst_pad_forward
at gstpad.c line 2695
#16 gst_pad_event_default
at gstpad.c line 2792
#17 gst_pad_send_event_unchecked
at gstpad.c line 4996
#18 gst_pad_push_event_unchecked
at gstpad.c line 4685
#19 gst_pad_push_event
at gstpad.c line 4808
#20 gst_type_find_element_src_event
at gsttypefindelement.c line 519
#21 gst_pad_send_event_unchecked
at gstpad.c line 4996
#22 gst_pad_push_event_unchecked
at gstpad.c line 4685
#23 gst_pad_push_event
at gstpad.c line 4808
#24 event_forward_func
at gstpad.c line 2741
#25 gst_pad_forward
at gstpad.c line 2695
#26 gst_pad_event_default
at gstpad.c line 2792
#27 gst_pad_send_event_unchecked
at gstpad.c line 4996
#28 gst_pad_send_event
at gstpad.c line 5154
#29 gst_pad_link_full
at gstpad.c line 2289
#30 XXXXXXXX
#31 new_decoded_pad_trambampoline
#32 g_cclosure_marshal_VOID__OBJECTv
from /opt/oblong/deps/lib/libgobject-2.0.so.0
#33 ??
from /opt/oblong/deps/lib/libgobject-2.0.so.0
#34 g_signal_emit_valist
from /opt/oblong/deps/lib/libgobject-2.0.so.0
#35 g_signal_emit
from /opt/oblong/deps/lib/libgobject-2.0.so.0
#36 gst_element_add_pad
at gstelement.c line 697
#37 gst_decode_bin_expose
at gstdecodebin2.c line 3957
#38 source_pad_blocked_cb
at gstdecodebin2.c line 4129
#39 probe_hook_marshal
at gstpad.c line 3061
#40 g_hook_list_marshal
from /opt/oblong/deps/lib/libglib-2.0.so.0
#41 do_probe_callbacks
at gstpad.c line 3155
#42 gst_pad_push_event_unchecked
at gstpad.c line 4652
#43 push_sticky
at gstpad.c line 3324
#44 events_foreach
at gstpad.c line 530
#45 check_sticky
at gstpad.c line 3380
#46 gst_pad_peer_query
at gstpad.c line 3581
#47 query_forward_func
at gstpad.c line 2931
#48 gst_pad_forward
at gstpad.c line 2695
#49 gst_pad_query_default
at gstpad.c line 2995
#50 gst_pad_query
at gstpad.c line 3465
#51 gst_pad_peer_query
at gstpad.c line 3596
#52 query_forward_func
at gstpad.c line 2931
#53 gst_pad_forward
at gstpad.c line 2695
#54 gst_pad_query_default
at gstpad.c line 2995
#55 gst_pad_query
at gstpad.c line 3465
#56 gst_pad_peer_query
at gstpad.c line 3596
#57 query_forward_func
at gstpad.c line 2931
#58 gst_pad_forward
at gstpad.c line 2695
#59 gst_pad_query_default
at gstpad.c line 2995
#60 gst_pad_query
at gstpad.c line 3465
#61 gst_pad_peer_query
at gstpad.c line 3596
#62 gst_queue_push_one
at gstqueue.c line 1184
#63 gst_queue_loop
at gstqueue.c line 1238
#64 gst_task_func
at gsttask.c line 316
#65 ??
from /opt/oblong/deps/lib/libglib-2.0.so.0
#66 ??
from /opt/oblong/deps/lib/libglib-2.0.so.0
#67 ??
from /usr/lib/libGL.so.1
#68 start_thread
at pthread_create.c line 308
#69 clone
at ../sysdeps/unix/sysv/linux/x86_64/clone.S line 112
#70 ??

Comment 6 Aleix Conchillo Flaqué 2013-06-18 17:34:24 UTC

If I am not wrong the problem is that we are in gst_queue_push_one (#62) that is called from gst_queue_loop (#63) which is holding a mutex. And we end up in gst_queue_handle_src_event (#4) which takes the same mutex.

Comment 7 Aleix Conchillo Flaqué 2013-06-18 19:18:17 UTC

A g_mutex_trylock in gst_queue_handle_src_event seems to solve the issue. But I don't really know this code, so it might have side effects.

Everything works, though.

----------------

diff --git a/plugins/elements/gstqueue.c b/plugins/elements/gstqueue.c
index 765005d..f9bf545 100644
--- a/plugins/elements/gstqueue.c
+++ b/plugins/elements/gstqueue.c
@@ -138,6 +138,8 @@ enum
     goto label;                                                         \
 } G_STMT_END
 
+#define GST_QUEUE_MUTEX_TRYLOCK(q) g_mutex_trylock (&q->qlock)
+
 #define GST_QUEUE_MUTEX_UNLOCK(q) G_STMT_START {                        \
   g_mutex_unlock (&q->qlock);                                            \
 } G_STMT_END
@@ -1283,18 +1285,20 @@ gst_queue_handle_src_event (GstPad * pad, GstObject * parent, GstEvent * event)
 #endif
 
   switch (GST_EVENT_TYPE (event)) {
-    case GST_EVENT_RECONFIGURE:
-      GST_QUEUE_MUTEX_LOCK (queue);
+    case GST_EVENT_RECONFIGURE: {
+      gboolean locked = GST_QUEUE_MUTEX_TRYLOCK (queue);
       if (queue->srcresult == GST_FLOW_NOT_LINKED) {
         /* when we got not linked, assume downstream is linked again now and we
          * can try to start pushing again */
         queue->srcresult = GST_FLOW_OK;
         gst_pad_start_task (pad, (GstTaskFunction) gst_queue_loop, pad, NULL);
       }
-      GST_QUEUE_MUTEX_UNLOCK (queue);
+      if (locked)
+        GST_QUEUE_MUTEX_UNLOCK (queue);
 
       res = gst_pad_push_event (queue->sinkpad, event);
       break;
+    }
     default:
       res = gst_pad_event_default (pad, parent, event);
       break;

Comment 8 Sebastian Dröge (slomo) 2013-06-19 08:54:33 UTC

This should fix it, please test :) Thanks for your analysis

commit 2f8e572887219759cedd3d1cf9a044a6e083c8c7
Author: Sebastian Dröge <slomo@circular-chaos.org>
Date:   Wed Jun 19 10:53:21 2013 +0200

    queue: Don't hold the queue mutex while doing serialized queries downstream
    
    https://bugzilla.gnome.org/show_bug.cgi?id=702520

Comment 9 Aleix Conchillo Flaqué 2013-06-19 18:09:43 UTC

Yes, that fixed it. Thanks!