After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 783301 - basesrc: Should not hold LIVE_LOCK when calling GstBaseSrcClass::create
basesrc: Should not hold LIVE_LOCK when calling GstBaseSrcClass::create
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gstreamer (core)
1.8.2
Other Linux
: Normal normal
: 1.13.1
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2017-05-31 20:06 UTC by Nicolas Dufresne (ndufresne)
Modified: 2018-03-07 10:44 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Test that demonstrate the issue (2.95 KB, text/plain)
2017-05-31 20:06 UTC, Nicolas Dufresne (ndufresne)
  Details
New test, without a sleep and without aggregator (2.96 KB, text/plain)
2017-06-01 16:06 UTC, Nicolas Dufresne (ndufresne)
  Details
basesrc: Protect access to pool and allocator (2.70 KB, patch)
2017-06-01 16:07 UTC, Nicolas Dufresne (ndufresne)
committed Details | Review
basesrc: Don't hold LIVE_LOCK in create/alloc/fill (3.46 KB, patch)
2017-06-01 16:07 UTC, Nicolas Dufresne (ndufresne)
none Details | Review
basesrc: Don't hold LIVE_LOCK in create/alloc/fill (21.15 KB, patch)
2017-06-21 13:38 UTC, Nicolas Dufresne (ndufresne)
committed Details | Review

Description Nicolas Dufresne (ndufresne) 2017-05-31 20:06:24 UTC
Created attachment 352965 [details]
Test that demonstrate the issue

Inside the virtual method ::create() the live lock is held. As pushing caps may block on a queue, or on some downstream element doing allocation query toward a queue, this can lead to yet another appsrc deadlock.

I'm attaching an example application to demonstrate one way to trigger a deadlock. I do believe this could be reproduce with just one appsrc, two sinks and a queue, but the initial bug was reported to me with a similar setup to this test.
Comment 1 Sebastian Dröge (slomo) 2017-06-01 06:16:44 UTC
Where should it do that instead? AFAIK other basesrc subclasses also set caps from ::create(), e.g. if the format for the current buffer changes.
Comment 2 Nicolas Dufresne (ndufresne) 2017-06-01 13:11:09 UTC
(In reply to Sebastian Dröge (slomo) from comment #1)
> Where should it do that instead? 

It is safe when done from negotiate(), in the case it's called by the base_src_lopp() function. This is called when the reconfigure flag is set on the pad. When that happens, we don't yet hold the live lock. It's quite a limitation of the base class. I'll be fixing this today.

> AFAIK other basesrc subclasses also set
> caps from ::create(), e.g. if the format for the current buffer changes.

It would be nice to share which other live sources does that. This problem only affect live sources, as in non-live, the LIVE_LOCK is only taking back in the PAUSED->READY state, and the pipeline has been unlocked already at this point.
Comment 3 Nicolas Dufresne (ndufresne) 2017-06-01 16:06:32 UTC
Created attachment 353019 [details]
New test, without a sleep and without aggregator

This test also demonstrate the issue, but only with elements from -base. With hope I can turn this into a unit test. It also no longer rely on a sleep.
Comment 4 Nicolas Dufresne (ndufresne) 2017-06-01 16:07:24 UTC
Created attachment 353020 [details] [review]
basesrc: Protect access to pool and allocator

This was only partly protected by the object lock. Always take the
object lock to access the currently configured pool and allocator.
Comment 5 Nicolas Dufresne (ndufresne) 2017-06-01 16:07:28 UTC
Created attachment 353021 [details] [review]
basesrc: Don't hold LIVE_LOCK in create/alloc/fill

Holding this lock on live source prevents the source from changing
the caps in ::create() without risking a deadlock. Note that this
makes the public function gst_base_src_wait_playing() useless as we
will wait after create, when we take back the lock.
Comment 6 Nicolas Dufresne (ndufresne) 2017-06-01 16:14:00 UTC
I took in consideration that decklinkvideosrc is a live source also set the caps inside create() call, and decided to drop the live lock from create/alloc/fill virtual method in basesrc. Overall I'm quite happy with the results, as it's better to not hold any private locks while calling virtual functions. It also remove some limitations that were not documented in basesrc base class. As a side effect though, we should probably deprecate gst_base_src_wait_playing() and remove that call from the audio base class. It's insane to wait in the middle of capturing a buffer, as this endup hiding a time gap. BaseSrc will now wait if needed after create(), which will let the sync code in gst figure-out if the buffer should be dropped or not.
Comment 7 Nicolas Dufresne (ndufresne) 2017-06-14 19:24:59 UTC
Review of attachment 353021 [details] [review]:

That will need a new version. I missed that basesrc was relying on the LIVE_LOCK to determin that the create() virtual have returned. For the gst_basesrc_wait_playing(), that worked, because the function is checking the basesrc flushing state before returning.
Comment 8 Nicolas Dufresne (ndufresne) 2017-06-21 13:38:54 UTC
Created attachment 354154 [details] [review]
basesrc: Don't hold LIVE_LOCK in create/alloc/fill

Holding this lock on live source prevents the source from changing
the caps in ::create() without risking a deadlock. This has consequences
as the LIVE_LOCK was replacing the STREAM_LOCK in many situation. As a
side effect:

- We no longer need to unlock when doing play/pause as the LIVE_LOCK
  isn't held. We then let the create() call finish, but will block if
  the state have changed meanwhile. This has the benefit that
  wait_preroll() calls in subclass is no longer needed.
- We no longer need to change the state to unlock, simplifying the
  set_flushing() interface
- We need different handling for EOS depending if we are in push or pull
  mode.

This patch also document the locking of each private class member and
the locking order.
Comment 9 Nicolas Dufresne (ndufresne) 2017-06-21 18:23:17 UTC
Nothing is simple, this patch demonstrate it. This version now passes gst-validate, I did 10 runs to be sure, I also ran all unit tests I could find. It's quite possible that more reduction can be done (it's most likely possible to replace the LIVE_LOCK with the object lock now, would need someone to try). But this already simplify the base class.
Comment 10 Olivier Crête 2017-06-28 22:30:52 UTC
Review of attachment 353020 [details] [review]:

Looks good
Comment 11 Olivier Crête 2017-06-28 22:46:29 UTC
Review of attachment 354154 [details] [review]:

Looks good!

::: libs/gst/base/gstbasesrc.c
@@ +255,2 @@
   /* QoS *//* with LOCK */
+  gboolean qos_enabled;         /* unused */

Should just remove qos_enabled as it's a private structure anyway.
Comment 12 Nicolas Dufresne (ndufresne) 2017-06-29 14:38:32 UTC
(In reply to Olivier Crête from comment #11)
> Review of attachment 354154 [details] [review] [review]:
> 
> Looks good!
> 
> ::: libs/gst/base/gstbasesrc.c
> @@ +255,2 @@
>    /* QoS *//* with LOCK */
> +  gboolean qos_enabled;         /* unused */
> 
> Should just remove qos_enabled as it's a private structure anyway.

Ah, thanks, I wanted to put that in a seperate patch, but forgot.
Comment 13 Nicolas Dufresne (ndufresne) 2017-06-29 14:56:31 UTC
Attachment 353020 [details] pushed as 946622e - basesrc: Protect access to pool and allocator
Attachment 354154 [details] pushed as 523de1a - basesrc: Don't hold LIVE_LOCK in create/alloc/fill
Comment 14 Nicolas Dufresne (ndufresne) 2017-07-04 01:13:12 UTC
I've simplified further my test, and pushed a unit test.

Author: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Date:   Mon Jul 3 21:08:02 2017 -0400

    test-appsrc: Test state when blocked in caps Event
    
    In GStreamer 1.12 and older, the GstBaseSrc live lock used to be held while
    create() virtual function was called. As appsrc pushes serialized event in
    that virtual function, we ended up with some deadlock while setting the
    state to NULL. This test simulates this situation.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=783301