GNOME Bugzilla – Bug 702520
queue: deadlock when reconfigure event
Last modified: 2013-06-19 18:09:43 UTC
I just switched from 1.0.7 to 1.1.1 and a previously working program now gets stuck. It seems to be caused by something (a mutex lock) introduced in this commit: http://cgit.freedesktop.org/gstreamer/gstreamer/commit/plugins/elements/gstqueue.c?id=c955ddc712f3b4de9ef5d822b95a6f4bd9985eb3

I still don't understand why this happens; I'll keep checking. This is the backtrace with the affected threads:
Trace 232085
Thread 4 (Thread 0x7f3a73004700 (LWP 26202))
(In reply to comment #0)
> This is the backtrace with the affected threads:

Sorry, only threads 4 and 2 from the bt apply to the same queue.
Can you provide backtraces of all GStreamer related threads? From these threads there should not be a deadlock as none of them hold that mutex while waiting for the GCond. There must be one thread that is currently blocking this mutex and not releasing it because it waits for something else to finish, namely the allocation query or the pad linking that caused the reconfigure event.
(In reply to comment #2)
> Can you provide backtraces of all GStreamer related threads?

I'll do so as soon as I get to the office.

> From these threads
> there should not be a deadlock as none of them hold that mutex while waiting
> for the GCond. There must be one thread that is currently blocking this mutex
> and not releasing it because it waits for something else to finish, namely the
> allocation query or the pad linking that caused the reconfigure event.

It might not be this, but isn't gst_queue_handle_sink_query holding the mutex and waiting on query_handled (thread 2)? The task (queue_loop) seems to start when the reconfigure event is received, but this can't happen because gst_queue_handle_sink_query has the mutex (thread 4).
(In reply to comment #2)
> Can you provide backtraces of all GStreamer related threads?

Just rechecked, and the backtrace I originally sent contained all the GStreamer related threads.
I cut some parts before; here is the full bt.

(gdb) thread apply all bt
Trace 232094
If I am not mistaken, the problem is that we are in gst_queue_push_one (#62), which is called from gst_queue_loop (#63) while it holds the queue mutex. From there we end up, on the same thread, in gst_queue_handle_src_event (#4), which tries to take the same mutex.
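The pattern described above can be reproduced outside GStreamer. The following is a minimal sketch, assuming POSIX threads rather than GLib's GMutex; ToyQueue and the toy_* functions are hypothetical stand-ins, not GStreamer code. It uses an error-checking mutex so that the nested lock attempt reports EDEADLK instead of hanging the way a normal mutex would.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the queue element; not GStreamer's GstQueue. */
typedef struct {
  pthread_mutex_t qlock;
  int relock_result;            /* what the nested lock attempt returned */
} ToyQueue;

static void
toy_queue_init (ToyQueue * q)
{
  pthread_mutexattr_t attr;
  int rc;

  rc = pthread_mutexattr_init (&attr);
  assert (rc == 0);
  /* An error-checking mutex reports a same-thread relock instead of hanging. */
  rc = pthread_mutexattr_settype (&attr, PTHREAD_MUTEX_ERRORCHECK);
  assert (rc == 0);
  rc = pthread_mutex_init (&q->qlock, &attr);
  assert (rc == 0);
  pthread_mutexattr_destroy (&attr);
  q->relock_result = 0;
}

/* Plays the role of the src event handler: it wants the queue lock. */
static void
toy_handle_src_event (ToyQueue * q)
{
  q->relock_result = pthread_mutex_lock (&q->qlock);
  if (q->relock_result == 0)
    pthread_mutex_unlock (&q->qlock);
}

/* Plays the role of the streaming loop: it calls "downstream" with the lock
 * held, and downstream calls straight back into the event handler on the
 * same thread. */
static int
toy_loop_iteration (ToyQueue * q)
{
  pthread_mutex_lock (&q->qlock);
  toy_handle_src_event (q);     /* re-entrant call while qlock is held */
  pthread_mutex_unlock (&q->qlock);
  return q->relock_result;
}
```

Calling toy_queue_init and then toy_loop_iteration on a ToyQueue should return EDEADLK. With a default (non-error-checking) mutex the same call sequence would be expected to block forever, which matches the hang seen in the backtrace.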
A g_mutex_trylock in gst_queue_handle_src_event seems to solve the issue. But I don't really know this code, so it might have side effects. Everything works, though.

----------------
diff --git a/plugins/elements/gstqueue.c b/plugins/elements/gstqueue.c
index 765005d..f9bf545 100644
--- a/plugins/elements/gstqueue.c
+++ b/plugins/elements/gstqueue.c
@@ -138,6 +138,8 @@ enum
     goto label; \
 } G_STMT_END
+#define GST_QUEUE_MUTEX_TRYLOCK(q) g_mutex_trylock (&q->qlock)
+
 #define GST_QUEUE_MUTEX_UNLOCK(q) G_STMT_START { \
   g_mutex_unlock (&q->qlock); \
 } G_STMT_END
@@ -1283,18 +1285,20 @@ gst_queue_handle_src_event (GstPad * pad, GstObject * parent, GstEvent * event)
 #endif
   switch (GST_EVENT_TYPE (event)) {
-    case GST_EVENT_RECONFIGURE:
-      GST_QUEUE_MUTEX_LOCK (queue);
+    case GST_EVENT_RECONFIGURE: {
+      gboolean locked = GST_QUEUE_MUTEX_TRYLOCK (queue);
       if (queue->srcresult == GST_FLOW_NOT_LINKED) {
         /* when we got not linked, assume downstream is linked again now and we
          * can try to start pushing again */
         queue->srcresult = GST_FLOW_OK;
         gst_pad_start_task (pad, (GstTaskFunction) gst_queue_loop, pad, NULL);
       }
-      GST_QUEUE_MUTEX_UNLOCK (queue);
+      if (locked)
+        GST_QUEUE_MUTEX_UNLOCK (queue);
       res = gst_pad_push_event (queue->sinkpad, event);
       break;
+    }
     default:
       res = gst_pad_event_default (pad, parent, event);
       break;
This should fix it, please test :) Thanks for your analysis.

commit 2f8e572887219759cedd3d1cf9a044a6e083c8c7
Author: Sebastian Dröge <slomo@circular-chaos.org>
Date:   Wed Jun 19 10:53:21 2013 +0200

    queue: Don't hold the queue mutex while doing serialized queries downstream

    https://bugzilla.gnome.org/show_bug.cgi?id=702520
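The general idea named in the commit subject, releasing a lock before calling into code that may call back, can be sketched as follows. This is not the code from that commit; it is a minimal illustration assuming POSIX threads, and SketchQueue and the sketch_* functions are hypothetical names. The handler uses a trylock only so the sketch can record whether the lock was free when the callback arrived.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the queue element; not GStreamer's GstQueue. */
typedef struct {
  pthread_mutex_t qlock;
  int pending;                  /* state protected by qlock */
  bool handler_got_lock;        /* set by the re-entrant handler */
} SketchQueue;

static void
sketch_queue_init (SketchQueue * q)
{
  int rc = pthread_mutex_init (&q->qlock, NULL);
  assert (rc == 0);
  q->pending = 1;
  q->handler_got_lock = false;
}

/* Re-entrant callback, standing in for the reconfigure event handler. */
static void
sketch_handle_reconfigure (SketchQueue * q)
{
  if (pthread_mutex_trylock (&q->qlock) == 0) {
    q->handler_got_lock = true;
    pthread_mutex_unlock (&q->qlock);
  }
}

/* Stand-in for a downstream query that triggers the callback on this thread. */
static bool
sketch_query_downstream (SketchQueue * q)
{
  sketch_handle_reconfigure (q);
  return true;
}

/* Called with qlock held; returns with qlock held. */
static bool
sketch_push_one (SketchQueue * q)
{
  bool res;

  q->pending--;                 /* take the item while still locked */
  pthread_mutex_unlock (&q->qlock);     /* drop the lock before calling out */
  res = sketch_query_downstream (q);
  pthread_mutex_lock (&q->qlock);       /* re-take it; state may have changed */
  return res;
}

static bool
sketch_loop (SketchQueue * q)
{
  bool res;

  pthread_mutex_lock (&q->qlock);
  res = sketch_push_one (q);
  pthread_mutex_unlock (&q->qlock);
  return res;
}
```

After sketch_queue_init and one call to sketch_loop, handler_got_lock should be true, because the lock was not held while the downstream call ran. The cost of this pattern is that any state read before the unlock has to be treated as stale once the lock is re-taken.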
Yes, that fixed it. Thanks!