GNOME Bugzilla – Bug 755400
splitmuxsink: deadlocks when inserted into running pipeline
Last modified: 2018-11-03 15:04:32 UTC
Created attachment 311839 [details]
test case

Inserting splitmuxsink into a running pipeline causes a deadlock for the configuration described here. The use case is to have a live source of video+audio running continuously, with the possibility of starting a recording using splitmuxsink. A test case is provided as an attachment.

The initial pipeline is as follows:

    audiotestsrc is-live=true ! faac ! aacparse ! audio/mpeg,stream-format=raw ! queue ! fakesink
    videotestsrc is-live=true ! x264enc key-int-max=10 ! h264parse ! video/x-h264,alignment=au,stream-format=avc ! queue ! fakesink

After 10s, blocking pad probes are installed on the src pads of the two queues. The fakesinks are removed from the pipeline, and a new splitmuxsink is created and inserted. The state of the splitmuxsink is set to PLAYING, and the block probes are removed.

A few buffers are received on the sinkpad of the multiqueue inside the splitmuxsink, but then dataflow stops.

Using GDB, I can see that the streaming thread of the audio queue srcpad is waiting in gstsplitmuxsink.c:1071:

            GST_LOG_OBJECT (pad, "Sleeping for GOP start");
    ------> GST_SPLITMUX_WAIT (splitmux);
            GST_LOG_OBJECT (pad, "Done sleeping for GOP start state now %d", splitmux->state);

The thread for the video src pad is waiting in gstmultiqueue.c:1938 with an allocation type query:

    1925   GST_DEBUG_OBJECT (mq,
    1926       "SingleQueue %d : Enqueuing query %p of type %s with id %d",
    1927       sq->id, query, GST_QUERY_TYPE_NAME (query), curid);
    1928   GST_MULTI_QUEUE_MUTEX_UNLOCK (mq);
    1929   res = gst_data_queue_push (sq->queue, (GstDataQueueItem *) item);
    1930   GST_MULTI_QUEUE_MUTEX_LOCK (mq);
    1931   /* it might be that the query has been taken out of the queue
    1932    * while we were unlocked. So, we need to check if the last
    1933    * handled query is the same one than the one we just
    1934    * pushed. If it is, we don't need to wait for the condition
    1935    * variable, otherwise we wait for the condition variable to
    1936    * be signaled. */
    1937   if (sq->last_handled_query != query)
    1938     g_cond_wait (&sq->query_handled, &mq->qlock);

For debugging purposes, I tried dropping any allocation queries on the srcpads of the queues. This changes the behaviour a bit. A few more buffers reach the multiqueue, but dataflow still stalls. The audio streaming thread still stalls waiting for GOP complete. The video thread is waiting in gstdataqueue.c:520, which seems to indicate that the queue is full. Several GOPs are queued, but the output file is still empty.

The tests have been done using 1.5.91. The pipeline works just fine if the splitmuxsink is used from the beginning instead of the fakesinks.
Created attachment 311840 [details] debug-log-1 GST_DEBUG=*sink*:6 log with allocation queries
Created attachment 311841 [details] debug-log-2 GST_DEBUG=*sink*:6 log without allocation queries
It took me a while to realize, but the element you're setting to PLAYING is the containing pipeline rather than the splitmuxsink. The pipeline is already in PLAYING, which is why the splitmuxsink bin never changes state (it blocks in preroll). Once this is found and changed, the pipeline works with your probe dropping the allocation queries. If they're not dropped, it still blocks, and it is not clear to me why.
What splitmuxsink does is rather obscure. The hang is due to splitmuxsink waiting for events such as EOS, flush, and a few others while in the streaming thread (in response to the push of a buffer). However, this is the thread that will be pushing the allocation query that the upstream multiqueue is sending. I feel this should be resolved by the wait ending in a "normal" way (i.e., neither EOS nor flush), leading to the buffer push returning and the (serialized) allocation query going next. Since taking out the allocation query makes it work, however, this is apparently not the case. BTW, the pad block code in the test case needs locking, as it is racy.
Created attachment 325811 [details] [review]
deadlock fix

I could not find a good fix. This is not due to non-keyframes being seen before the first keyframe (the most obvious difference when inserting the element in a running pipeline), as discarding those does not fix anything. In the end, discarding the allocation query in the multiqueue probe does help, even though it is not a great fix.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gst-plugins-good/issues/224.