GNOME Bugzilla – Bug 764939
basesink: deadlock caused by klass->wait_event() which leaves preroll mutex locked
Last modified: 2016-04-15 12:18:05 UTC
I'm seeing a deadlock in certain cases when after calling klass->wait_event() preroll mutex remains locked. This is happening because gst_base_sink_default_wait_event() calls to gst_base_sink_do_sync() which requires preroll lock to be taken before. Sadly gst_base_sink_default_wait_event() doesn't do that and eventually GST_BASE_SINK_PREROLL_WAIT() is called without preroll mutex being taken. After the wait finishes corresponding preroll mutex remains locked causing a deadlock later in the execution (state change in my case). A fix for that doesn't seem to be so obvious as there are other functions (like gst_base_sink_event()) which end up calling gst_base_sink_do_sync() with preroll lock taken. Is this a genuine issue or gstbasesink usage error on my part?
From where is default_wait_event() called in your case? Do you call it, or basesink?
I'm calling klass->wait_event() (GstBaseSink::event) in my event handler.
You shouldn't do that (wait_event is supposed to be called by the base class), but also the way how it's used in the base class is wrong: gst_base_sink_default_wait_event() calls do_sync(), which must be called with the STREAM_LOCK (it is in all cases) and the PREROLL_LOCK (it is not in all cases).
Created attachment 325792 [details] [review] basesink: Take PREROLL_LOCK in wait_event() It is calling do_sync(), which requires the STREAM_LOCK and PREROLL_LOCK to be taken. The STREAM_LOCK is already taken in all callers, the PREROLL_LOCK not.
Attachment 325792 [details] pushed as 828a462 - basesink: Take PREROLL_LOCK in wait_event()
Why exactly do you want to call wait_event() btw? What's your use case?
Actually nevermind: the lock was already taken in gst_base_sink_event() for all relevant events. Reverted that one again commit 00e4499b15b5d1b4c241349c246ba40b4253a39c Author: Sebastian Dröge <sebastian@centricular.com> Date: Tue Apr 12 15:17:36 2016 +0300 Revert "basesink: Take PREROLL_LOCK in wait_event()" This reverts commit 828a4627db0cb6a6706b96d9be97e5e5c7d22215. The lock was already taken elsewhere, in gst_base_sink_event().
I'm getting fifo empty callback from the video decoder and want to send EOS upstream. So in reality, my case is more complex then I've described. Here's the callstack:
+ Trace 236176
So its my funsink calling GST_BASE_SINK_CLASS( gst_nexus_sink_parent_class )->event( &sink->parent, gst_event_new_eos() ); from gst_fun_sink_on_error(), this results in gst_base_sink_default_event() being called, as seen in callstack. I want to send an EOS upstream, and I have a terrible feeling I'm doing it wrong :)
EOS should go downstream, not upstream. You could probably use gst_pad_send_event() with the EOS event on the basesink sinkpad here. There's probably a better solution but I don't completely understand your situation yet.
A downstream element can return GST_FLOW_EOS to signal upstream elements that they shoul EOS (but this would be aggregated if there are multiple branches). Not sure if that's what you want here though, I'm not sure if I understand the exact context.
When GST_EVENT_EOS arrives, we want to let the hardware output drain, so that EOS is not signalled before the output is empty. In the class_init function of our element we override the wait_event virtual method: gstbasesink_class->wait_event = GST_DEBUG_FUNCPTR (gst_fun_sink_wait_event); The gst_fun_sink_wait_event() function calls the base sink wait_event: static GstFlowReturn gst_fun_sink_wait_event(GstBaseSink *basesink, GstEvent *event) { GstFlowReturn ret; GstFunSink* sink = GST_FUN_SINK( basesink ); switch( GST_EVENT_TYPE( event ) ) { case GST_EVENT_EOS: GST_DEBUG_OBJECT (sink, "End of stream"); gst_fun_sink_wait_until_empty (sink); GST_DEBUG_OBJECT (sink, "End of stream - done"); break; default: break; } ret = GST_BASE_SINK_CLASS( gst_fun_sink_parent_class )->wait_event( basesink, event ); return ret; }
But for all serialized events your wait_event() is going to be called with STREAM_LOCK and PREROLL_LOCK. See basesink's event function.
I think the problem is that in our funsink we do not call straight to event handler in the base class. For the reason explained above, we want the decoder to drain first and only then we call base class ->event() with EOS.
But doesn't gst_fun_sink_wait_until_empty() going to wait there, and then from the same thread forwarding the EOS event later?
In our use case, which is HLS, if we called basesink event() straight from our funsink_event() then we wouldn't be able to seek or pause for the remaining few seconds of the stream. This is why we've deferred the call to the base class. When that eventually happens gst_fun_sink_wait_until_empty() will return immediatly becasuse it's already called from decoder fifo empty handler. We shouldn't be calling gst_fun_sink_wait_until_empty() in the example above because it has no effect.
Well, you have to send your EOS event to the pad then so that the normal event handling takes place. Otherwise you have no way of getting basesink do the right thing with all the locks :)
It seems to work fine now. Many thanks for clarifying our incorrect usage.