Bug 765906 - typefind: Regression due to fix for 763491
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gstreamer (core)
Version: git master
Hardware: Other All
Importance: Normal blocker
Target Milestone: 1.8.2
Assigned To: GStreamer Maintainers
QA Contact: GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2016-05-02 14:53 UTC by Xabier Rodríguez Calvar
Modified: 2016-05-11 13:42 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Pipeline (76.86 KB, image/png)
2016-05-02 14:53 UTC, Xabier Rodríguez Calvar
GST_DEBUG="*:5" (917.71 KB, text/x-log)
2016-05-02 14:54 UTC, Xabier Rodríguez Calvar
Threads Backtrace (24.67 KB, text/plain)
2016-05-02 15:00 UTC, Xabier Rodríguez Calvar
Desktop threads backtrace (52.07 KB, text/plain)
2016-05-03 15:19 UTC, Xabier Rodríguez Calvar
Bump WebKitGTK+ dependencies to 1.8.1 (3.91 KB, patch)
2016-05-03 15:21 UTC, Xabier Rodríguez Calvar
rejected
typefind: Only push a CAPS event downstream if the sinkpad is not in PULL mode (1.23 KB, patch)
2016-05-11 12:08 UTC, Sebastian Dröge (slomo)
committed

Description Xabier Rodríguez Calvar 2016-05-02 14:53:34 UTC
Created attachment 327154
Pipeline

The deadlock already happens at a very early step, which is why the pipeline dump is so simple.

+++ This bug was initially created as a clone of Bug #763491 +++
Comment 1 Xabier Rodríguez Calvar 2016-05-02 14:54:09 UTC
Created attachment 327155
GST_DEBUG="*:5"
Comment 2 Sebastian Dröge (slomo) 2016-05-02 14:55:24 UTC
Can you provide a testcase for reproducing the problem too?
Comment 3 Xabier Rodríguez Calvar 2016-05-02 14:56:05 UTC
The media file used can be found at https://github.com/youtube/js_mse_eme/blob/master/media/car_20130125_18.mp4.
Comment 4 Xabier Rodríguez Calvar 2016-05-02 15:00:47 UTC
Created attachment 327156
Threads Backtrace

(In reply to Sebastian Dröge (slomo) from comment #2)
> Can you provide a testcase for reproducing the problem too?

I'm trying, but as a first step I thought it could be useful to provide some more information.
Comment 5 Sebastian Dröge (slomo) 2016-05-03 07:41:00 UTC
So one thread (thread 1) is deactivating the typefind pad from PULL mode due to a state change (why is thread 1 doing that? Is it shutting down the pipeline, or is that still the state change of starting it?), while the other thread (typefind's task) is sending sticky events on that pad. For some reason it looks like there is a lock order problem: both threads seem to wait on a different stream lock. One is locking from srcpad to sinkpad (thread 1), the other from sinkpad to srcpad (the other thread, which pushes events).
Comment 6 Sebastian Dröge (slomo) 2016-05-03 07:41:23 UTC
A testcase for this would definitely be helpful, it's not clear to me what this application is actually doing.
Comment 7 Xabier Rodríguez Calvar 2016-05-03 15:19:37 UTC
Created attachment 327236
Desktop threads backtrace

I bumped the GStreamer versions to 1.8.1 in WebKitGTK+ and managed to reproduce what I think is the same problem. I'm attaching the backtraces.
Comment 8 Xabier Rodríguez Calvar 2016-05-03 15:21:28 UTC
Created attachment 327237
Bump WebKitGTK+ dependencies to 1.8.1

To ease the process of bumping the GStreamer versions in WebKitGTK+, though you'd probably like to have your own repos instead :)
Comment 9 Xabier Rodríguez Calvar 2016-05-03 15:22:31 UTC
I must say that on my desktop it was not easy to get it to block there.
Comment 10 Xabier Rodríguez Calvar 2016-05-03 15:24:32 UTC
(In reply to Sebastian Dröge (slomo) from comment #6)
> A testcase for this would definitely be helpful, it's not clear to me what
> this application is actually doing.

As you can see in my last comments, I tried to reproduce the bug in WKGTK+ to see if you can get some interesting info from it. I'll continue trying to produce a test case with an appsrc feeding into a playbin.
Comment 11 Sebastian Dröge (slomo) 2016-05-05 07:25:37 UTC
I like WTF::ParkingLot ;)

Looks indeed like the same problem. On type-found, typefind sends out the sticky events from the srcpad and blocks on the stream lock of the next sinkpad, all of this happening from the task function on the sinkpad (i.e. the stream lock of typefind's sinkpad is held, and the task tries to take the one of the next downstream element but deadlocks). At the same time the main thread shuts down the pipeline, which then deadlocks on the sinkpad stream lock of typefind, supposedly because the next downstream sinkpad stream lock is already taken.

In thread 1, gst_qtdemux_change_state() (frame 14) deactivates the pads of typefind (frame 4).


So basically we have upstream stream lock -> downstream stream lock vs. the other way around here. And this happens because of the PULL mode special case (sinkpad of qtdemux), which directly deactivates the peer srcpad.
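The lock-order inversion described above can be sketched outside GStreamer. This is only a rough model in plain Python, not GStreamer code: the two `threading.Lock` objects stand in for the two sinkpad stream locks, the names `run_inversion`, `upstream` and `downstream` are made up for illustration, and a timeout replaces the indefinite block so the demo terminates.

```python
import threading


def run_inversion(timeout=0.2):
    """Two threads take the same two locks in opposite order."""
    upstream = threading.Lock()    # stands in for typefind's sinkpad stream lock
    downstream = threading.Lock()  # stands in for the next element's sinkpad stream lock
    both_hold_first = threading.Barrier(2)
    both_tried_second = threading.Barrier(2)
    got_second = {}

    def worker(name, first, second):
        with first:
            # Wait until each thread holds its first lock.
            both_hold_first.wait()
            # A real stream lock would block forever here; the timeout
            # only exists so that this sketch can finish.
            ok = second.acquire(timeout=timeout)
            got_second[name] = ok
            if ok:
                second.release()
            # Keep the first lock until both attempts are over.
            both_tried_second.wait()

    threads = [
        threading.Thread(target=worker, args=("task", upstream, downstream)),
        threading.Thread(target=worker, args=("main", downstream, upstream)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return got_second


if __name__ == "__main__":
    print(run_inversion())
```

Neither thread gets its second lock while the other still holds it, which is the same cycle as the task thread and the main thread in the backtraces.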


I assume the problem here is that typefind was pulling itself (that's why the task is running at all), then it emitted type-found from there, then qtdemux was plugged, activated in pull mode itself and deactivated typefind from pull mode. typefind should have shut down the task at that point but doesn't; it only pauses the task, because otherwise we would deadlock (we would be shutting down the task from itself). Then the pipeline is shut down while the task is still running, so qtdemux deactivates from pull mode, which then again pauses the task.
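Why a task can only be paused, not stopped, from inside itself can be illustrated with an ordinary thread. This is a hypothetical stand-in, not the GstTask API: the `Task` class and `demo` function are invented for this sketch. It relies on Python refusing to join the current thread with a `RuntimeError`, where a C implementation would simply wait on itself forever.

```python
import threading


class Task:
    """Hypothetical stand-in for a pad task; not the GStreamer API."""

    def __init__(self, body):
        self._body = body
        self._paused = threading.Event()
        self._thread = threading.Thread(target=self._run)
        self.self_stop_error = None

    def start(self):
        self._thread.start()

    def _run(self):
        while not self._paused.is_set():
            self._body(self)

    def pause(self):
        # Safe from any thread, including the task itself: only flips a flag.
        self._paused.set()

    def stop(self):
        # Pause, then wait for the task thread to finish.
        self.pause()
        self._thread.join()  # raises RuntimeError when called from the task thread

    def is_alive(self):
        return self._thread.is_alive()


def demo():
    def body(task):
        try:
            task.stop()  # "shutting down the task from itself"
        except RuntimeError as err:
            task.self_stop_error = str(err)

    task = Task(body)
    task.start()
    task.stop()  # from another thread this works
    return task.self_stop_error, task.is_alive()


if __name__ == "__main__":
    print(demo())
```

The call from inside the task only manages to set the paused flag, so the thread exits at its next loop check. The actual join has to come from a different thread.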


Not sure what to do about this. Maybe we just have to detect this case in typefind where it is actually not running in pull mode anymore so won't have to shut down the task anymore here (although it is still running and would just disappear on the next opportunity).
Comment 12 Sebastian Dröge (slomo) 2016-05-05 07:26:35 UTC
Note that this is unrelated to the actual fix for bug #763491 but a general problem that should've existed before already.
Comment 13 Sebastian Dröge (slomo) 2016-05-05 07:29:37 UTC
A non-streamlock protected boolean that remembers the (target) task state might be an option btw
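A minimal sketch of that idea, assuming a flag that records the target task state and is never read or written under the stream lock. This is not what was eventually committed, and `PullTask`, `request_pause` and `stream_lock` are illustrative names rather than GStreamer API.

```python
import threading
import time


class PullTask:
    """Hypothetical task whose target state lives outside the stream lock."""

    def __init__(self):
        self.stream_lock = threading.Lock()
        # Target state of the task; deliberately NOT guarded by stream_lock.
        self._want_running = threading.Event()
        self.iterations = 0
        self._thread = threading.Thread(target=self._loop)

    def start(self):
        self._want_running.set()
        self._thread.start()

    def request_pause(self):
        # Never touches stream_lock, so it cannot join a lock-order cycle.
        self._want_running.clear()

    def _loop(self):
        while self._want_running.is_set():
            with self.stream_lock:
                # Re-check after taking the lock: the target state may
                # have changed while this thread was waiting for it.
                if not self._want_running.is_set():
                    break
                self.iterations += 1
            time.sleep(0.001)

    def join(self, timeout=None):
        self._thread.join(timeout)
        return not self._thread.is_alive()


if __name__ == "__main__":
    task = PullTask()
    task.start()
    with task.stream_lock:    # someone else holds the stream lock
        task.request_pause()  # still returns immediately
    print(task.join(2.0))
```

Because `request_pause` does not need the stream lock, a thread that already holds it can still tell the task to wind down, and the task notices at its next check.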
Comment 14 Sebastian Dröge (slomo) 2016-05-11 12:08:17 UTC
Created attachment 327639
typefind: Only push a CAPS event downstream if the sinkpad is not in PULL mode

The other signal handlers of the type-found signal might have reactivated
typefind in PULL mode already, pushing a CAPS event at that point would cause
deadlocks and is in general unexpected by elements that are in PULL mode.
Comment 15 Xabier Rodríguez Calvar 2016-05-11 13:28:33 UTC
It works. \o/
Comment 16 Sebastian Dröge (slomo) 2016-05-11 13:41:16 UTC
Attachment 327639 pushed as 5e43ee5 - typefind: Only push a CAPS event downstream if the sinkpad is not in PULL mode
Comment 17 Sebastian Dröge (slomo) 2016-05-11 13:42:15 UTC
Will backport to 1.8.2 in a bit