Bug 766064 - basesink: deadlock waiting prerolling while pipeline goes to NULL state

Status: RESOLVED NOTABUG
Product: GStreamer
Classification: Platform
Component: gstreamer (core)
Version: 1.8.2
Hardware/OS: Other Linux
Importance: Normal normal
Target Milestone: git master
Assigned To: GStreamer Maintainers
QA Contact: GStreamer Maintainers
Depends on:
Blocks:
Reported: 2016-05-06 14:56 UTC by Miguel París Díaz
Modified: 2016-06-23 15:06 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
GDB info about the state of the BaseSink (2.01 KB, text/plain)
2016-05-06 14:57 UTC, Miguel París Díaz
  Details
GDB trace where the streaming_thread is blocked (2.15 KB, text/plain)
2016-05-06 14:58 UTC, Miguel París Díaz
  Details
GDB trace where the app_thread is blocked (2.25 KB, text/plain)
2016-05-06 14:58 UTC, Miguel París Díaz
  Details
Test that reproduces the deadlock (1.37 KB, text/plain)
2016-05-27 12:22 UTC, Miguel París Díaz
  Details
GDB info threads of the deadlock test (5.72 KB, text/plain)
2016-05-27 12:27 UTC, Miguel París Díaz
  Details
Test that reproduces the deadlock (changing the whole pipeline state) (1.67 KB, text/plain)
2016-06-07 16:51 UTC, Miguel París Díaz
  Details
bin: set SINK flag for bin parents when a sink element is added (1.24 KB, patch)
2016-06-07 16:53 UTC, Miguel París Díaz
none Details | Review
bin: set SINK flag for bin parents when a sink element is added (1.38 KB, patch)
2016-06-07 20:35 UTC, Miguel París Díaz
none Details | Review
utils: fix memory leak in find_common_root (896 bytes, patch)
2016-06-08 12:40 UTC, Miguel París Díaz
rejected Details | Review
bin: set SINK flag for bin parents when a sink element is added (1.67 KB, patch)
2016-06-08 14:11 UTC, Miguel París Díaz
reviewed Details | Review
Simple program to force sticky event misordering in tee src's peer pad (3.17 KB, text/plain)
2016-06-23 14:55 UTC, Miguel París Díaz
  Details
pad: simple patch to force race conditions (1.35 KB, patch)
2016-06-23 14:56 UTC, Miguel París Díaz
rejected Details | Review

Description Miguel París Díaz 2016-05-06 14:56:39 UTC
Hello,
there is a deadlock when a pipeline goes to the NULL state while the BaseSink is waiting for preroll.

There are 2 threads:
 streaming_thread: managed by the task of the SrcPad of a BaseSrc (in our case a NiceSrc)
 app_thread: performs the state change of the pipeline to NULL (gst_element_set_state (pipeline, GST_STATE_NULL);)

The deadlock takes place with this race condition:
  1 - [app_thread] when the pipeline is going to NULL, the BaseSink (in our case a FakeSink) is in a state in which it needs to preroll (see [gdb-fakesink_state.txt], which shows the state)
  2 - [streaming_thread] a buffer arrives at the BaseSink
    2.1 - This causes preroll_wait [1] (see [gdb-wait_preroll.txt])
    2.2 - Here, the streaming_thread is blocked, and so are all upstream pads [2]
  3 - [app_thread] tries to change the state of one of the pads blocked in step (2)
    3.1 - it is blocked in [3] because the mutex is locked by [streaming_thread] (see [gdb-activate_mode_internal.txt])

Refs
[1] https://github.com/Kurento/gstreamer/blob/0fb3a083ce04551fbdba7a94f0ac5612515bda67/libs/gst/base/gstbasesink.c#L2212
[2] https://github.com/Kurento/gstreamer/blob/0fb3a083ce04551fbdba7a94f0ac5612515bda67/gst/gstpad.c#L4125
[3] https://github.com/Kurento/gstreamer/blob/0fb3a083ce04551fbdba7a94f0ac5612515bda67/gst/gstpad.c#L1012
Comment 1 Miguel París Díaz 2016-05-06 14:57:27 UTC
Created attachment 327396 [details]
GDB info about the state of the BaseSink
Comment 2 Miguel París Díaz 2016-05-06 14:58:05 UTC
Created attachment 327397 [details]
GDB trace where the streaming_thread is blocked
Comment 3 Miguel París Díaz 2016-05-06 14:58:27 UTC
Created attachment 327398 [details]
GDB trace where the app_thread is blocked
Comment 4 Nicolas Dufresne (ndufresne) 2016-05-06 15:44:24 UTC
Did you implement the unlock() virtual method properly?
Comment 5 Miguel París Díaz 2016-05-08 12:59:26 UTC
Hello @stomer,
I don't see that FakeSink implements the unlock() virtual method [1].
Should FakeSink implement this method?
If so, how?

Refs
[1] https://github.com/Kurento/gstreamer/blob/0fb3a083ce04551fbdba7a94f0ac5612515bda67/plugins/elements/gstfakesink.c
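
For context, a GstBaseSink subclass only needs to implement unlock()/unlock_stop() if it blocks in its own virtual methods (for example in render()); the preroll wait itself is handled inside basesink. Purely as an illustration (MySink and everything in it is hypothetical, not existing code), such a subclass would typically wire those hooks up roughly like this:

#include <gst/gst.h>
#include <gst/base/gstbasesink.h>

/* Hypothetical sink that blocks in render() and therefore needs
 * unlock()/unlock_stop(); plugin registration etc. is omitted. */
typedef struct _MySink {
  GstBaseSink parent;
  GMutex lock;
  GCond cond;
  gboolean flushing;
} MySink;

typedef struct _MySinkClass {
  GstBaseSinkClass parent_class;
} MySinkClass;

G_DEFINE_TYPE (MySink, my_sink, GST_TYPE_BASE_SINK);
#define MY_SINK(obj) ((MySink *) (obj))

static GstFlowReturn
my_sink_render (GstBaseSink * bsink, GstBuffer * buf)
{
  MySink *self = MY_SINK (bsink);
  GstFlowReturn ret = GST_FLOW_OK;

  g_mutex_lock (&self->lock);
  /* Block until there is work to do or until unlock() is called; the real
   * wake-up condition is application specific (placeholder commented out). */
  while (!self->flushing /* && !my_sink_has_work (self) */)
    g_cond_wait (&self->cond, &self->lock);
  if (self->flushing)
    ret = GST_FLOW_FLUSHING;
  g_mutex_unlock (&self->lock);

  return ret;
}

static gboolean
my_sink_unlock (GstBaseSink * bsink)
{
  MySink *self = MY_SINK (bsink);

  /* Called on state changes/flushes: make render() stop waiting. */
  g_mutex_lock (&self->lock);
  self->flushing = TRUE;
  g_cond_broadcast (&self->cond);
  g_mutex_unlock (&self->lock);
  return TRUE;
}

static gboolean
my_sink_unlock_stop (GstBaseSink * bsink)
{
  MySink *self = MY_SINK (bsink);

  /* Clear the flushing flag so render() can block again afterwards. */
  g_mutex_lock (&self->lock);
  self->flushing = FALSE;
  g_mutex_unlock (&self->lock);
  return TRUE;
}

static void
my_sink_class_init (MySinkClass * klass)
{
  GstBaseSinkClass *bsink_class = GST_BASE_SINK_CLASS (klass);

  bsink_class->render = GST_DEBUG_FUNCPTR (my_sink_render);
  bsink_class->unlock = GST_DEBUG_FUNCPTR (my_sink_unlock);
  bsink_class->unlock_stop = GST_DEBUG_FUNCPTR (my_sink_unlock_stop);
}

static void
my_sink_init (MySink * self)
{
  g_mutex_init (&self->lock);
  g_cond_init (&self->cond);
}

FakeSink itself does not block outside of basesink, so it does not need these hooks; the sketch only shows what the unlock() contract looks like.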
Comment 6 Tim-Philipp Müller 2016-05-08 13:14:21 UTC
Could you distill this into a little unit test (in tests/check/libs/basesink.c) that demonstrates the problem?
Comment 7 Nicolas Dufresne (ndufresne) 2016-05-08 13:30:17 UTC
Fakesink only blocks within basesink. I cannot reproduce the issue with your instructions. Can you provide a test that reproduces this (as Tim also proposed)? Looking at your trace, the deadlock occurs while deactivating a ghostpad, which indicates that you may be doing a complex pipeline manipulation; in that case it is likely that the issue comes from something else, which may or may not be related to the base class.

In general, it's better to report the bug rather than trying to report an explanation for that bug, unless you have a patch and trust the solution. For backtraces, you should also provide a complete backtrace instead of only providing the parts that you think mean something.
Comment 8 Miguel París Díaz 2016-05-27 12:22:59 UTC
Created attachment 328634 [details]
Test that reproduces the deadlock
Comment 9 Miguel París Díaz 2016-05-27 12:27:21 UTC
Created attachment 328635 [details]
GDB info threads of the deadlock test
Comment 10 Miguel París Díaz 2016-06-06 08:40:53 UTC
Any idea about how we can deal with this?
Comment 11 Sebastian Dröge (slomo) 2016-06-06 09:28:15 UTC
The state change should've first shut down the fakesink, then deactivated the fakesink sink pad, and only then gone on to deactivate the ghostpad. The first steps would've unblocked wait_preroll() (and returned FLUSHING) so that the stream lock is unlocked all the way upstream.

It's expected that you will deadlock if you just shut down elements upstream while downstream their streaming threads are waiting for something. Something first has to unlock downstream, or you have to wait until downstream is finished waiting for whatever it is waiting for.
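
In code, the ordering described above amounts to something like the following minimal sketch (function and variable names are hypothetical, not taken from the attachments): shutting the sink down first makes wait_preroll() return FLUSHING, so the streaming thread releases the stream locks before anything upstream is touched.

#include <gst/gst.h>

/* "sink" is the downstream GstBaseSink-based element, "upstream" an element
 * whose streaming thread may currently be blocked inside the sink. */
static void
shutdown_branch_sink_first (GstElement * upstream, GstElement * sink)
{
  /* 1. Shut the sink down first: this unblocks wait_preroll(), which returns
   *    GST_FLOW_FLUSHING to the streaming thread and releases the stream
   *    locks all the way upstream. */
  gst_element_set_state (sink, GST_STATE_NULL);

  /* 2. Only now is it safe to shut down / deactivate the upstream elements. */
  gst_element_set_state (upstream, GST_STATE_NULL);
}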
Comment 12 Miguel París Díaz 2016-06-06 20:41:06 UTC
@slomo, I agree with your explanation, and this is what does NOT happen when calling gst_element_set_state (pipeline, GST_STATE_NULL); hence the deadlock ;(.
The test is a simple way of simulating the situation.

If I am not wrong, the root of the problem is that when the pipeline is set to NULL:
 1 - All elements of the pipeline are set to PAUSED
 2 - All elements of the pipeline are set to READY
 3 - All elements of the pipeline are set to NULL
and in step (1) the pipeline deadlocks.

So, what can we do to fix this issue?
Is there a way to check that the BaseSink is going to the NULL state so that it does not wait for preroll?
Comment 13 Sebastian Dröge (slomo) 2016-06-06 20:58:46 UTC
Your problem is that you don't set the pipeline to NULL state, but some element that is upstream of a sink while its streaming thread is busy inside the sink. This can't work and will block the state change until something is shutting down the sink (or flushing it).

If you set the pipeline to NULL state it should work just fine, as the state changes are happening from sink to source.


It all seems to be behaving like it should, no? If you want to shut down just the upstream part of a pipeline you need to ensure that none of the streaming threads of that part are blocked somewhere downstream.
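
As a hedged sketch of the other option mentioned above (flushing instead of shutting the sink down), assuming "boundary_srcpad" is the source pad at the edge of the part that should be shut down (the names are hypothetical):

#include <gst/gst.h>

static void
shutdown_upstream_part (GstElement * upstream_part, GstPad * boundary_srcpad)
{
  /* Flush downstream so that a streaming thread blocked in the sink
   * (e.g. in wait_preroll()) returns GST_FLOW_FLUSHING and releases its
   * stream locks. */
  gst_pad_push_event (boundary_srcpad, gst_event_new_flush_start ());
  gst_pad_push_event (boundary_srcpad, gst_event_new_flush_stop (TRUE));

  /* Now the upstream part can change state without deadlocking. */
  gst_element_set_state (upstream_part, GST_STATE_NULL);
}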
Comment 14 Miguel París Díaz 2016-06-06 21:42:14 UTC
(In reply to Sebastian Dröge (slomo) from comment #13)
> Your problem is that you don't set the pipeline to NULL state, but some
> element that is upstream of a sink while its streaming thread is busy inside
> the sink. This can't work and will block the state change until something is
> shutting down the sink (or flushing it).
No, in the real case I am setting the entire pipeline to NULL (the test is an example that forces the situation in an easy way; in the attached GDB output you can see that the deadlock is the same in both cases).

> If you set the pipeline to NULL state it should work just fine, as the state
> changes are happening from sink to source.
> 
Here is the problem: it happens when all elements of the pipeline are set to the PAUSED state. Take a look at this race condition:
  1 - [app_thread] when the pipeline is going to NULL, the BaseSink (in our case a FakeSink) is set to READY, so it needs to preroll.
  2 - [streaming_thread] a buffer arrives at the BaseSink
    2.1 - This causes preroll_wait [1].
    2.2 - Here, the streaming_thread is blocked, and so are all upstream pads
  3 - [app_thread] tries to change to PAUSED the state of the bin that contains the deadlocked ProxyPad (from 2.2)
    3.1 - it is blocked in [3] because the mutex is locked by [streaming_thread]

Reviewing the GDB traces, I have noticed that the transition to be set is PAUSED_TO_READY [1].
This should've been PLAYING_TO_PAUSED, shouldn't it?
Can the problem be here?

Refs
[1]
  #14 gst_element_change_state at gstelement.c line 2648
  #15 gst_element_set_state_func at gstelement.c line 2602

Comment 15 Sebastian Dröge (slomo) 2016-06-07 06:10:27 UTC
Can you provide a testcase that replicates the real situation?
Comment 16 Miguel París Díaz 2016-06-07 16:51:58 UTC
Created attachment 329313 [details]
Test that reproduces the deadlock (changing the whole pipeline state)
Comment 17 Miguel París Díaz 2016-06-07 16:53:34 UTC
Created attachment 329314 [details] [review]
bin: set SINK flag for bin parents when a sink element is added
Comment 18 Miguel París Díaz 2016-06-07 20:35:54 UTC
Created attachment 329337 [details] [review]
bin: set SINK flag for bin parents when a sink element is added

Fix the bug
Comment 19 Miguel París Díaz 2016-06-08 08:07:18 UTC
Patch #329337 fixes the test case #329313, but does not fix the bug entirely.

What happens if we have more than one bin containing sink elements, and their elements are connected directly to other elements in other bins using gst_pad_link_full (srcpad, sinkpad, GST_PAD_LINK_CHECK_CAPS)? (that is, the bins do not have sink pads; see the sketch after the refs)

In that case the algorithm to set the "degree" used for changing the state of the elements in the correct order does not work properly because of [1] and [2].

Refs
[1] https://github.com/Kurento/gstreamer/blob/0fb3a083ce04551fbdba7a94f0ac5612515bda67/gst/gstbin.c#L2052
[2] https://github.com/Kurento/gstreamer/blob/0fb3a083ce04551fbdba7a94f0ac5612515bda67/gst/gstbin.c#L2078
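
For illustration only, a standalone sketch of the linking pattern described in this comment (element names are arbitrary); the pads end up linked directly across two bins because GST_PAD_LINK_CHECK_CAPS skips the hierarchy check:

#include <gst/gst.h>

int
main (int argc, char **argv)
{
  GstElement *pipeline, *bin_a, *bin_b, *q, *sink;
  GstPad *srcpad, *sinkpad;

  gst_init (&argc, &argv);

  pipeline = gst_pipeline_new (NULL);
  bin_a = gst_bin_new ("bin-a");
  bin_b = gst_bin_new ("bin-b");
  q = gst_element_factory_make ("queue", NULL);
  sink = gst_element_factory_make ("fakesink", NULL);

  gst_bin_add_many (GST_BIN (pipeline), bin_a, bin_b, NULL);
  gst_bin_add (GST_BIN (bin_a), q);
  gst_bin_add (GST_BIN (bin_b), sink);

  /* Direct pad link across bins with the hierarchy check disabled: this
   * "succeeds", but neither bin gets a ghostpad, so GstBin's degree-based
   * state-change ordering cannot see the dependency between the bins. */
  srcpad = gst_element_get_static_pad (q, "src");
  sinkpad = gst_element_get_static_pad (sink, "sink");
  gst_pad_link_full (srcpad, sinkpad, GST_PAD_LINK_CHECK_CAPS);
  gst_object_unref (srcpad);
  gst_object_unref (sinkpad);

  gst_object_unref (pipeline);
  return 0;
}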
Comment 20 Sebastian Dröge (slomo) 2016-06-08 08:18:54 UTC
Review of attachment 329337 [details] [review]:

This will need a closer look; in any case, thanks for the testcase :) I can't see anything obviously wrong in it; that is supposed to work.

::: gst/gstbin.c
@@ +1146,3 @@
+      GST_DEBUG_OBJECT (parent, "Set as sink");
+      GST_OBJECT_FLAG_SET (parent, GST_ELEMENT_FLAG_SINK);
+      parent = GST_OBJECT_PARENT (parent);

This seems racy, both the setting of the flags and the getting of the parent elements
Comment 21 Sebastian Dröge (slomo) 2016-06-08 08:21:13 UTC
Oh actually the testcase *is* wrong. You're linking elements together that are not in the same bin, and this only works because you don't have GST_PAD_LINK_CHECK_HIERARCHY during linking.

This is expected to fail in various interesting ways. You need to use ghostpads to link elements from different bins together.
Comment 22 Miguel París Díaz 2016-06-08 08:43:11 UTC
Yes, I am linking elements in this way because I am using bins to add and remove a set of elements dynamically in an easy way; in other words, these bins have NO logic. This shouldn't be wrong if the API allows it, should it?

So, from your explanation and my answer:
 - Should we fix the algorithm to set the "degree" of the elements taking into account these cases?
 - Should we mandate that users ALWAYS create ghostpads if they use bins? In that case:
    - What is the easiest way of doing this?
    - We have to take into account that this is less efficient (more and more useless pads :()
Comment 23 Sebastian Dröge (slomo) 2016-06-08 09:07:08 UTC
(In reply to Miguel París Díaz from comment #22)
> Yes, I am linking elements in this way because I am using bins to add and
> remove a set of elements dynamically in an easy way, in other words, these
> bins have NO logic. This shouldn't be wrong if the API allow it, should it?

The gst_pad_link_full() API allows you to disable checks for performance reasons, for when you know that what you're doing is correct and the checks don't have to happen. In your case, what you're doing is not correct, so the result after linking is wrong.

> So, from your explanation and my answer:
>  - Should we fix the algorithm to set the "degree" of the elements taking
> into account these cases?

No, the pipeline graph that is created is invalid :)

>  - Should we mandate users to ALWAYS create ghostpads if they use bins? In
> that case:
>     - Which is the easier way of doing this?

gst_element_link() and gst_element_link_pads() will automatically create ghostpads for you if needed, or you can use gst_pad_link_maybe_ghosting().

>     - We have to take into account that this is less efficient (more and
> more useless pads :()

That would have to be optimized instead then :) If ghostpads are too expensive, then something has to be improved there in general. Ghostpads are not doing anything at all other than proxying.
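
Continuing the hypothetical sketch from comment 19, the recommended linking would replace the gst_pad_link_full() block there with calls that create the ghostpads automatically, e.g.:

  /* Element-level linking: creates the needed ghostpads on bin_a/bin_b. */
  gst_element_link (q, sink);

  /* Or, at pad level (where available): */
  srcpad = gst_element_get_static_pad (q, "src");
  sinkpad = gst_element_get_static_pad (sink, "sink");
  gst_pad_link_maybe_ghosting (srcpad, sinkpad);
  gst_object_unref (srcpad);
  gst_object_unref (sinkpad);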
Comment 24 Miguel París Díaz 2016-06-08 12:40:34 UTC
Created attachment 329377 [details] [review]
utils: fix memory leak in find_common_root

This patch fixes a memory leak that I found when using gst_element_link () instead of directly linking the pads of elements in different bins.
Comment 25 Miguel París Díaz 2016-06-08 12:47:13 UTC
I have tried to apply attachment 329377 [details] [review] on the master branch, and I have noticed that it was already fixed by https://bugzilla.gnome.org/show_bug.cgi?id=765961

I should have tested it before, sorry for the noise :(.
Comment 26 Sebastian Dröge (slomo) 2016-06-08 13:16:00 UTC
Comment on attachment 329377 [details] [review]
utils: fix memory leak in find_common_root

No problem, shows that the patch was more than correct :)
Comment 27 Miguel París Díaz 2016-06-08 13:30:12 UTC
Review of attachment 329337 [details] [review]:

::: gst/gstbin.c
@@ +1146,3 @@
+      GST_DEBUG_OBJECT (parent, "Set as sink");
+      GST_OBJECT_FLAG_SET (parent, GST_ELEMENT_FLAG_SINK);
+      parent = GST_OBJECT_PARENT (parent);

Why is it racy?
How can I fix it?
Comment 28 Sebastian Dröge (slomo) 2016-06-08 13:36:48 UTC
The parent could go away while you're in the loop, for example.
Comment 29 Sebastian Dröge (slomo) 2016-06-08 13:45:43 UTC
Note that using an invalid pipeline like this is going to cause you more problems than what you have noticed so far. Better to build valid pipelines and use those in Kurento, and if there are performance problems, fix those instead of creating invalid pipelines.
Comment 30 Miguel París Díaz 2016-06-08 14:11:35 UTC
Created attachment 329386 [details] [review]
bin: set SINK flag for bin parents when a sink element is added
Comment 31 Sebastian Dröge (slomo) 2016-06-08 16:23:15 UTC
Comment on attachment 329386 [details] [review]
bin: set SINK flag for bin parents when a sink element is added

You probably want to do the same for IS_SOURCE... and probably something is needed to *unset* the flag again.

That is, if you have a bin A containing a bin B, and B contains a sink C, then you want A, B and C to have the IS_SINK flag set. But when C is removed from B, neither A nor B should have it set anymore. I think with the current code, only B would get the flag removed, right?
Comment 32 Sebastian Dröge (slomo) 2016-06-08 16:27:31 UTC
It would also be good to add some tests for this flag handling :) And this should all be done in a different bug report, as the flag-handling problems are valid problems, while this bug is about a problem that happened because of an issue in your application/testcase.
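
A hedged sketch of what such a test could look like, in the style of the existing gst-check tests (the test name is made up, suite registration is omitted, and the assertions describe the behaviour the proposed patch aims for rather than current behaviour):

#include <gst/check/gstcheck.h>

GST_START_TEST (test_nested_bin_sink_flag)
{
  GstElement *bin_a = gst_bin_new ("A");
  GstElement *bin_b = gst_bin_new ("B");
  GstElement *sink_c = gst_element_factory_make ("fakesink", "C");

  /* B is already inside A when the sink is added to B. */
  gst_bin_add (GST_BIN (bin_a), bin_b);
  gst_bin_add (GST_BIN (bin_b), sink_c);

  /* The SINK flag should propagate to all bin parents... */
  fail_unless (GST_OBJECT_FLAG_IS_SET (bin_b, GST_ELEMENT_FLAG_SINK));
  fail_unless (GST_OBJECT_FLAG_IS_SET (bin_a, GST_ELEMENT_FLAG_SINK));

  /* ...and removing the sink again should clear it on both A and B. */
  gst_bin_remove (GST_BIN (bin_b), sink_c);
  fail_if (GST_OBJECT_FLAG_IS_SET (bin_b, GST_ELEMENT_FLAG_SINK));
  fail_if (GST_OBJECT_FLAG_IS_SET (bin_a, GST_ELEMENT_FLAG_SINK));

  gst_object_unref (bin_a);
}
GST_END_TEST;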
Comment 33 Miguel París Díaz 2016-06-23 14:55:40 UTC
Created attachment 330265 [details]
Simple program to force sticky event misordering in tee src's peer pad
Comment 34 Miguel París Díaz 2016-06-23 14:56:24 UTC
Created attachment 330266 [details] [review]
pad: simple patch to force race conditions
Comment 35 Miguel París Díaz 2016-06-23 15:01:24 UTC
I am so sorry, I have uploaded attachment 330265 [details] and attachment 330266 [details] [review] here by mistake; could you remove them please?