After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 764939 - basesink: deadlock caused by klass->wait_event() which leaves preroll mutex locked
basesink: deadlock caused by klass->wait_event() which leaves preroll mutex l...
Status: RESOLVED NOTABUG
Product: GStreamer
Classification: Platform
Component: gstreamer (core)
git master
Other Linux
: Normal normal
: git master
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2016-04-12 10:35 UTC by mariuszb
Modified: 2016-04-15 12:18 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
basesink: Take PREROLL_LOCK in wait_event() (1.06 KB, patch)
2016-04-12 12:12 UTC, Sebastian Dröge (slomo)
rejected Details | Review

Description mariuszb 2016-04-12 10:35:10 UTC
I'm seeing a deadlock in certain cases when after calling klass->wait_event() preroll mutex remains locked. This is happening because gst_base_sink_default_wait_event() calls to gst_base_sink_do_sync() which requires preroll lock to be taken before. Sadly gst_base_sink_default_wait_event() doesn't do that and eventually GST_BASE_SINK_PREROLL_WAIT() is called without preroll mutex being taken. After the wait finishes corresponding preroll mutex remains locked causing a deadlock later in the execution (state change in my case). 

A fix for that doesn't seem to be so obvious as there are other functions (like gst_base_sink_event()) which end up calling gst_base_sink_do_sync() with preroll lock taken.

Is this a genuine issue or gstbasesink usage error on my part?
Comment 1 Sebastian Dröge (slomo) 2016-04-12 11:11:27 UTC
From where is default_wait_event() called in your case? Do you call it, or basesink?
Comment 2 mariuszb 2016-04-12 11:39:35 UTC
I'm calling klass->wait_event() (GstBaseSink::event) in my event handler.
Comment 3 Sebastian Dröge (slomo) 2016-04-12 12:12:24 UTC
You shouldn't do that (wait_event is supposed to be called by the base class), but also the way how it's used in the base class is wrong: gst_base_sink_default_wait_event() calls do_sync(), which must be called with the STREAM_LOCK (it is in all cases) and the PREROLL_LOCK (it is not in all cases).
Comment 4 Sebastian Dröge (slomo) 2016-04-12 12:12:32 UTC
Created attachment 325792 [details] [review]
basesink: Take PREROLL_LOCK in wait_event()

It is calling do_sync(), which requires the STREAM_LOCK and PREROLL_LOCK to be
taken. The STREAM_LOCK is already taken in all callers, the PREROLL_LOCK not.
Comment 5 Sebastian Dröge (slomo) 2016-04-12 12:12:47 UTC
Attachment 325792 [details] pushed as 828a462 - basesink: Take PREROLL_LOCK in wait_event()
Comment 6 Sebastian Dröge (slomo) 2016-04-12 12:13:54 UTC
Why exactly do you want to call wait_event() btw? What's your use case?
Comment 7 Sebastian Dröge (slomo) 2016-04-12 12:18:48 UTC
Actually nevermind: the lock was already taken in gst_base_sink_event() for all relevant events. Reverted that one again

commit 00e4499b15b5d1b4c241349c246ba40b4253a39c
Author: Sebastian Dröge <sebastian@centricular.com>
Date:   Tue Apr 12 15:17:36 2016 +0300

    Revert "basesink: Take PREROLL_LOCK in wait_event()"
    
    This reverts commit 828a4627db0cb6a6706b96d9be97e5e5c7d22215.
    
    The lock was already taken elsewhere, in gst_base_sink_event().
Comment 8 mariuszb 2016-04-12 12:42:16 UTC
I'm getting fifo empty callback from the video decoder and want to send EOS upstream. So in reality, my case is more complex then I've described. Here's the callstack:

  • #0 gst_base_sink_wait_preroll
    at gstbasesink.c line 2209
  • #1 gst_base_sink_do_preroll
    at gstbasesink.c line 2303
  • #2 gst_base_sink_do_sync
    at gstbasesink.c line 2505
  • #3 gst_base_sink_default_wait_event
    at gstbasesink.c line 3025
  • #4 gst_fun_sink_wait_event_locked
  • #5 gst_fun_sink_wait_event
  • #6 gst_base_sink_wait_event
    at gstbasesink.c line 3040
  • #7 gst_base_sink_default_event
    at gstbasesink.c line 3082
  • #8 gst_fun_sink_on_error

So its my funsink calling 

GST_BASE_SINK_CLASS( gst_nexus_sink_parent_class )->event( &sink->parent, gst_event_new_eos() );

from gst_fun_sink_on_error(), this results in gst_base_sink_default_event() being called, as seen in callstack.

I want to send an EOS upstream, and I have a terrible feeling I'm doing it wrong :)
Comment 9 Sebastian Dröge (slomo) 2016-04-12 12:50:00 UTC
EOS should go downstream, not upstream. You could probably use gst_pad_send_event() with the EOS event on the basesink sinkpad here. There's probably a better solution but I don't completely understand your situation yet.
Comment 10 Tim-Philipp Müller 2016-04-12 12:58:14 UTC
A downstream element can return GST_FLOW_EOS to signal upstream elements that they shoul EOS (but this would be aggregated if there are multiple branches). Not sure if that's what you want here though, I'm not sure if I understand the exact context.
Comment 11 A Ashley 2016-04-12 12:59:05 UTC
When GST_EVENT_EOS arrives, we want to let the hardware output drain, so that EOS is not signalled before the output is empty.

In the class_init function of our element we override the wait_event virtual method:
 gstbasesink_class->wait_event = GST_DEBUG_FUNCPTR (gst_fun_sink_wait_event);


The gst_fun_sink_wait_event() function calls the base sink wait_event:

static GstFlowReturn
gst_fun_sink_wait_event(GstBaseSink *basesink, GstEvent *event)
{
  GstFlowReturn ret;
  GstFunSink* sink = GST_FUN_SINK( basesink );

  switch( GST_EVENT_TYPE( event ) ) {
  case GST_EVENT_EOS:
       GST_DEBUG_OBJECT (sink, "End of stream");
       gst_fun_sink_wait_until_empty (sink);
       GST_DEBUG_OBJECT (sink, "End of stream - done");
      break;
  default:
       break;
  }
  ret = GST_BASE_SINK_CLASS( gst_fun_sink_parent_class )->wait_event( basesink, event );
  return ret;
}
Comment 12 Sebastian Dröge (slomo) 2016-04-12 13:00:32 UTC
But for all serialized events your wait_event() is going to be called with STREAM_LOCK and PREROLL_LOCK. See basesink's event function.
Comment 13 mariuszb 2016-04-12 13:41:34 UTC
I think the problem is that in our funsink we do not call straight to event handler in the base class. For the reason explained above, we want the decoder to drain first and only then we call base class ->event() with EOS.
Comment 14 Sebastian Dröge (slomo) 2016-04-12 14:29:21 UTC
But doesn't gst_fun_sink_wait_until_empty() going to wait there, and then from the same thread forwarding the EOS event later?
Comment 15 mariuszb 2016-04-12 15:41:22 UTC
In our use case, which is HLS, if we called basesink event() straight from our funsink_event() then we wouldn't be able to seek or pause for the remaining few seconds of the stream. This is why we've deferred the call to the base class.
When that eventually happens gst_fun_sink_wait_until_empty() will return immediatly becasuse it's already called from decoder fifo empty handler.

We shouldn't be calling gst_fun_sink_wait_until_empty() in the example above because it has no effect.
Comment 16 Sebastian Dröge (slomo) 2016-04-13 06:49:51 UTC
Well, you have to send your EOS event to the pad then so that the normal event handling takes place. Otherwise you have no way of getting basesink do the right thing with all the locks :)
Comment 17 mariuszb 2016-04-15 12:18:05 UTC
It seems to work fine now. Many thanks for clarifying our incorrect usage.