GNOME Bugzilla – Bug 745319
queue: can lock up the pipeline on serialized queries when downstream returns errors
Last modified: 2015-03-09 07:28:58 UTC
I have a simple pipeline like this:

  gst-launch-1.0 playbin video-sink=fakesink uri=file:///path/to/file

I am using a custom decoder element, which gets picked up by playbin. The decoder runs a task on its src pad to push decoded buffers downstream, and that push fails because caps negotiation fails. handle_frame then starts to return GST_FLOW_NOT_NEGOTIATED as well. The problem is that vqueue locks up and the pipeline hangs.

I traced it further, and the problem seems to be in gstqueue:

1. The decoder tries to finish a decoded frame.
2. gst_queue_chain gets called with a buffer.
3. gst_queue_loop wakes up because a buffer got pushed to the queue.
4. gst_queue_push_one picks up the buffer and returns a flow error.
5. _loop() goes to out_flushing, which pauses the task.
6. The custom decoder is still not aware of this error and tries to finish a new frame.
7. Since caps negotiation has not succeeded, gstvideodecoder tries to negotiate and sends an ALLOCATION query.
8. That query freezes in gstqueue waiting on query_handled, which will never get signaled because the task has been paused.

The stack trace from gdb indicates that:
Trace 234731
  $1 = {mini_object = {type = 21345144, refcount = 1, lockstate = 0, flags = 0, copy = 0xb6ed7c15 <_gst_query_copy>, dispose = 0x0, free = 0xb6ed7161 <_gst_query_free>, n_qdata = 0, qdata = 0x0}, type = GST_QUERY_ALLOCATION}

I have been reading the code, and the only thing I can think of is a race condition: in gst_queue_handle_sink_query, queue->srcresult is GST_FLOW_OK before g_cond_wait gets invoked (gstqueue.c:909 and 917). Meanwhile gst_queue_loop updates queue->srcresult (line 1299), the condition never gets signaled, and we wait forever.

I tried a few other decoders, and I am not sure why my custom decoder is the one causing this ;-)
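The race can be reduced to a few lines of plain C with POSIX threads. This is a simplified, hypothetical model rather than the gstqueue.c code: `MiniQueue`, `FLOW_OK`, `FLOW_NOT_NEGOTIATED` and `serialized_query_hangs()` are invented names, and a pthread mutex and condition variable stand in for the queue's GMutex and GCond. The sink side passes the srcresult check before waiting, the loop side records its flow error only after the wait has started, and nothing signals the condition, so the waiter would block forever. A timeout is used only so the program can report the hang instead of freezing.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-ins mirroring GstFlowReturn values. */
enum { FLOW_OK = 0, FLOW_NOT_NEGOTIATED = -4 };

/* Simplified model of the state shared by the sinkpad and srcpad threads. */
typedef struct {
  pthread_mutex_t lock;
  pthread_cond_t query_handled;
  int srcresult;    /* last flow return recorded by the loop function */
  bool query_done;  /* set once the loop function has answered the query */
} MiniQueue;

/* Loop-function error path WITHOUT a wake-up: it records the flow error
 * and pauses (here: returns), leaving the queued query untouched. */
static void *loop_error_without_signal(void *data) {
  MiniQueue *q = data;
  pthread_mutex_lock(&q->lock);
  q->srcresult = FLOW_NOT_NEGOTIATED;
  pthread_mutex_unlock(&q->lock);
  return NULL;
}

/* Sink side of a serialized query. Returns true if the waiter gave up
 * after timeout_ms, i.e. it would have blocked forever without a timeout. */
bool serialized_query_hangs(long timeout_ms) {
  MiniQueue q;
  pthread_t loop;
  struct timespec deadline;
  bool timed_out = false;
  int rc;

  pthread_mutex_init(&q.lock, NULL);
  pthread_cond_init(&q.query_handled, NULL);
  q.srcresult = FLOW_OK;
  q.query_done = false;

  pthread_mutex_lock(&q.lock);
  /* srcresult is still FLOW_OK here, so the pre-wait check passes and the
   * query is enqueued. Starting the loop thread while holding the lock
   * means it can only record its error after the wait below releases the
   * lock: the race described above, made deterministic. */
  rc = pthread_create(&loop, NULL, loop_error_without_signal, &q);
  assert(rc == 0);
  (void)rc;

  clock_gettime(CLOCK_REALTIME, &deadline);
  deadline.tv_sec += timeout_ms / 1000;
  deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
  if (deadline.tv_nsec >= 1000000000L) {
    deadline.tv_sec += 1;
    deadline.tv_nsec -= 1000000000L;
  }

  /* Only "query answered" ends this wait, and nobody will answer it. */
  while (!q.query_done) {
    if (pthread_cond_timedwait(&q.query_handled, &q.lock, &deadline) == ETIMEDOUT) {
      timed_out = true;
      break;
    }
  }
  pthread_mutex_unlock(&q.lock);

  pthread_join(loop, NULL);
  pthread_cond_destroy(&q.query_handled);
  pthread_mutex_destroy(&q.lock);
  return timed_out;
}
```

Calling `serialized_query_hangs(200)` returns true after roughly 200 ms: the check before waiting saw FLOW_OK, but the state change that matters happened afterwards and produced no wake-up.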
Can you write a test case for this? Also, maybe in queue (and probably queue2 and multiqueue) we should check srcresult before waiting on the GCond... Might this already fix your problem?
Looking at the code again, it already checks the srcresult before waiting in the macro. So in theory what you described should never happen.
Actually... please test this commit :)

commit a941b4651ce769c87243c84050120c8d244588f6
Author: Sebastian Dröge <sebastian@centricular.com>
Date:   Tue Mar 3 12:48:34 2015 +0100

    queue: Wake up the query function on errors from the loop function

    Otherwise we might wait forever for serialized queries to be handled
    as the loop function is stopped and as such we will never ever dequeue
    the query and handle it.

    https://bugzilla.gnome.org/show_bug.cgi?id=745319
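The idea in this commit can be modelled in plain C with POSIX threads. The sketch is a simplified, hypothetical model, not the patch itself: `MiniQueue`, `FLOW_OK`, `FLOW_NOT_NEGOTIATED`, `loop_error_with_signal()` and `serialized_query_result()` are invented for illustration, and a pthread mutex and condition variable stand in for the queue's GMutex and GCond. The loop function's error path stores the flow error and then wakes the query waiter; the waiter re-checks `srcresult` on every wake-up, so it returns the error instead of waiting for a query that will never be dequeued.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins mirroring GstFlowReturn values. */
enum { FLOW_OK = 0, FLOW_NOT_NEGOTIATED = -4 };

typedef struct {
  pthread_mutex_t lock;
  pthread_cond_t query_handled;
  int srcresult;    /* last flow return recorded by the loop function */
  bool query_done;  /* set once the loop function has answered the query */
} MiniQueue;

/* Loop-function error path WITH a wake-up: store the error, then wake any
 * sinkpad thread blocked on a serialized query before pausing. */
static void *loop_error_with_signal(void *data) {
  MiniQueue *q = data;
  pthread_mutex_lock(&q->lock);
  q->srcresult = FLOW_NOT_NEGOTIATED;
  pthread_cond_broadcast(&q->query_handled);
  pthread_mutex_unlock(&q->lock);
  return NULL;
}

/* Sink side: wait until the query is answered or the loop function reports
 * an error. Returns the flow result the waiter observed. */
int serialized_query_result(void) {
  MiniQueue q;
  pthread_t loop;
  int result, rc;

  pthread_mutex_init(&q.lock, NULL);
  pthread_cond_init(&q.query_handled, NULL);
  q.srcresult = FLOW_OK;
  q.query_done = false;

  pthread_mutex_lock(&q.lock);
  rc = pthread_create(&loop, NULL, loop_error_with_signal, &q);
  assert(rc == 0);
  (void)rc;

  /* Re-check srcresult on every wake-up, not just once before waiting. */
  while (!q.query_done && q.srcresult == FLOW_OK)
    pthread_cond_wait(&q.query_handled, &q.lock);

  result = q.query_done ? FLOW_OK : q.srcresult;
  pthread_mutex_unlock(&q.lock);

  pthread_join(loop, NULL);
  pthread_cond_destroy(&q.query_handled);
  pthread_mutex_destroy(&q.lock);
  return result;
}
```

Keeping srcresult in the `while` predicate also covers the case where the error is recorded before the waiter reaches the wait, and it tolerates spurious wake-ups.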
This one is also relevant, same problem really:

commit bc77a3fa0a610187d468c38ce83ceec2e9a79688
Author: Sebastian Dröge <sebastian@centricular.com>
Date:   Tue Mar 3 12:53:13 2015 +0100

    queue2: Signal the sinkpad thread if a flow error happened

    It might still be waiting for a query to be handled, or the queue to
    become empty again for the next item.

    Also if downstream returns FLUSHING, flush the queue like we do in
    queue and multiqueue.
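The queue2 change follows the same pattern with one extra branch. The sketch below is again a hypothetical model, not the queue2 source: `MiniQueue2`, `loop_handle_flow_error()`, `sink_wait_for_space()`, `full_queue_scenario()` and the `FLOW_*` constants are made-up names (the constants mirror GstFlowReturn values). On any flow error the loop function wakes the sinkpad thread, whether it is waiting for a query to be answered or for room in a full queue, and when the error is FLUSHING it also drops everything still queued.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins mirroring GstFlowReturn values. */
enum { FLOW_OK = 0, FLOW_FLUSHING = -2, FLOW_ERROR = -5 };

typedef struct {
  pthread_mutex_t lock;
  pthread_cond_t item_del;       /* "an item was removed, there is room" */
  pthread_cond_t query_handled;  /* "the serialized query was answered" */
  int srcresult;
  size_t level, max_level;       /* queued items and capacity */
} MiniQueue2;

/* Error path of the modelled loop function; called with the lock held. */
static void loop_handle_flow_error(MiniQueue2 *q, int ret) {
  q->srcresult = ret;
  if (ret == FLOW_FLUSHING)
    q->level = 0;  /* flush: drop everything still queued */
  /* Wake the sinkpad thread, whatever it is waiting for. */
  pthread_cond_broadcast(&q->item_del);
  pthread_cond_broadcast(&q->query_handled);
}

/* Sinkpad side: wait for room for one more item; called with the lock held.
 * Returns the flow result so the caller can stop on errors. */
static int sink_wait_for_space(MiniQueue2 *q) {
  while (q->level >= q->max_level && q->srcresult == FLOW_OK)
    pthread_cond_wait(&q->item_del, &q->lock);
  return q->srcresult;
}

typedef struct { MiniQueue2 *q; int ret; } LoopArgs;

static void *loop_thread(void *data) {
  LoopArgs *a = data;
  pthread_mutex_lock(&a->q->lock);
  loop_handle_flow_error(a->q, a->ret);
  pthread_mutex_unlock(&a->q->lock);
  return NULL;
}

/* A full queue, the sinkpad thread blocked waiting for space, and the loop
 * function failing with `ret`. Stores the remaining level in *level_out and
 * returns the flow result the sinkpad thread saw. */
int full_queue_scenario(int ret, size_t *level_out) {
  MiniQueue2 q;
  LoopArgs args = { &q, ret };
  pthread_t loop;
  int result, rc;

  pthread_mutex_init(&q.lock, NULL);
  pthread_cond_init(&q.item_del, NULL);
  pthread_cond_init(&q.query_handled, NULL);
  q.srcresult = FLOW_OK;
  q.level = q.max_level = 4;  /* the queue is full */

  pthread_mutex_lock(&q.lock);
  rc = pthread_create(&loop, NULL, loop_thread, &args);
  assert(rc == 0);
  (void)rc;
  result = sink_wait_for_space(&q);
  *level_out = q.level;
  pthread_mutex_unlock(&q.lock);

  pthread_join(loop, NULL);
  pthread_cond_destroy(&q.query_handled);
  pthread_cond_destroy(&q.item_del);
  pthread_mutex_destroy(&q.lock);
  return result;
}
```

With FLOW_ERROR the blocked thread wakes up and sees the error while the items stay queued; with FLOW_FLUSHING it wakes up to an emptied queue.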
I tried to create a test case but failed miserably :( I also cannot reproduce it anymore. Apparently the fixes I made in the custom decoder fixed the issue somehow. I will try harder to reproduce it, but it will need some more time.
Let's assume this is fixed then for now. Please reopen otherwise.