GNOME Bugzilla – Bug 785124
decklinksrc: Deadlock on EOS since 523de1a9dc7b7f79c78120bed15c364336f067cb
Last modified: 2017-07-19 17:11:44 UTC
I get the following backtrace on EOS:
+ Trace 237671
Thread 4 (Thread 0x7fee2a68f700 (LWP 14167))
523de1a9dc7b7f79c78120bed15c364336f067cb is the first bad commit

commit 523de1a9dc7b7f79c78120bed15c364336f067cb
Author: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Date:   Thu Jun 1 10:36:26 2017 -0400

    basesrc: Don't hold LIVE_LOCK in create/alloc/fill

    Holding this lock on live source prevents the source from changing
    the caps in ::create() without risking a deadlock. This has consequences
    as the LIVE_LOCK was replacing the STREAM_LOCK in many situation. As a
    side effect:

    - We no longer need to unlock when doing play/pause as the LIVE_LOCK
      isn't held. We then let the create() call finish, but will block if
      the state have changed meanwhile. This has the benefit that
      wait_preroll() calls in subclass is no longer needed.
    - We no longer need to change the state to unlock, simplifying the
      set_flushing() interface
    - We need different handling for EOS depending if we are in push or
      pull mode.

    This patch also document the locking of each private class member and
    the locking order.

    https://bugzilla.gnome.org/show_bug.cgi?id=783301

Reverting commits 523de1a9dc7b7f79c78120bed15c364336f067cb, 2be51ba60ce718b6febf5c1bd40ca761c17bfb80 and dd5905c31a3b385cd4ee51141fccb072e04e8239 fixes the issue.

I haven't yet found a way to reproduce it with upstream-only stuff - if something turns up I'll let you know!
I'll have a look at the backtrace, but meanwhile it would be useful if you could describe what you are doing prior to the deadlock.
(In reply to Vivia Nikolaidou from comment #0)
> I get the following backtrace on EOS:

- Can you provide the full backtrace please? Just one thread isn't useful for tracking down a deadlock.
- The backtrace seems truncated; what comes before _send_eos?
- What thread is thread 4?
- Is your pipeline in the PAUSED state?
I am using the LTC patch for timecodestamper as per https://bugzilla.gnome.org/show_bug.cgi?id=784295 .
+ Trace 237672
Thread 40 (Thread 0x7fb5e8ff9700 (LWP 7025))
Thread 39 (Thread 0x7fb5f0fea700 (LWP 7018))
Thread 38 (Thread 0x7fb5f17eb700 (LWP 7017))
Thread 37 (Thread 0x7fb5f1fec700 (LWP 7016))
Thread 36 (Thread 0x7fb5f27ed700 (LWP 7015))
Thread 35 (Thread 0x7fb5f2fee700 (LWP 7014))
Thread 34 (Thread 0x7fb5f37ef700 (LWP 7013))
Thread 31 (Thread 0x7fb613fff700 (LWP 7009))
Thread 30 (Thread 0x7fb630fe9700 (LWP 7008))
Thread 29 (Thread 0x7fb6317ea700 (LWP 7007))
Thread 28 (Thread 0x7fb631feb700 (LWP 7006))
Thread 27 (Thread 0x7fb6327ec700 (LWP 7005))
Thread 26 (Thread 0x7fb632fed700 (LWP 7004))
Thread 24 (Thread 0x7fb633fef700 (LWP 7002))
Thread 23 (Thread 0x7fb6347f0700 (LWP 7001))
Thread 22 (Thread 0x7fb634ff1700 (LWP 7000))
Thread 21 (Thread 0x7fb6357f2700 (LWP 6999))
Thread 20 (Thread 0x7fb635ff3700 (LWP 6998))
Thread 19 (Thread 0x7fb6367f4700 (LWP 6997))
Thread 18 (Thread 0x7fb636ff5700 (LWP 6996))
Thread 17 (Thread 0x7fb6377f6700 (LWP 6995))
Thread 16 (Thread 0x7fb637ff7700 (LWP 6994))
Thread 15 (Thread 0x7fb6387f8700 (LWP 6993))
Thread 14 (Thread 0x7fb638ff9700 (LWP 6992))
Thread 13 (Thread 0x7fb6397fa700 (LWP 6991))
Thread 12 (Thread 0x7fb639ffb700 (LWP 6990))
Thread 11 (Thread 0x7fb63a7fc700 (LWP 6989))
Thread 10 (Thread 0x7fb63affd700 (LWP 6988))
Thread 9 (Thread 0x7fb63b7fe700 (LWP 6987))
Thread 8 (Thread 0x7fb63bfff700 (LWP 6986))
Thread 7 (Thread 0x7fb654dc4700 (LWP 6985))
Thread 6 (Thread 0x7fb6555c5700 (LWP 6984))
Thread 4 (Thread 0x7fb65f583700 (LWP 6976))
Also the pipeline is in PLAYING state.
Just noticed that threads below 4 got truncated.
+ Trace 237673
+ Trace 237674
Okay, that's really a bug in Bugzilla itself? Here it is: https://paste.debian.net/977200/
Although the unlock/unlock_stop implementation looks decent, it appears that decklinkaudiosrc or decklinkvideosrc didn't return from ->create(), and that is what caused the observed deadlock. Marking as a decklinksrc bug for now; this needs further investigation.
It is pulsesrc that is deadlocking on the stream lock when receiving the EOS event. basesrc does flush on receiving EOS, but that only unblocks the source itself, not a streaming thread that is blocked downstream. In this case pulsesrc is blocked downstream by a full queue, and until that queue accepts another buffer, the EOS event will wait.
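The blocking pattern described above can be modelled outside GStreamer. The following is only a toy sketch in plain Python, not GStreamer code: `stream_lock`, `downstream` and `simulate()` are hypothetical names standing in for the pad's stream lock, the full downstream queue and the test harness. It assumes the streaming thread holds the stream lock while it pushes, and that the EOS sender needs the same lock.

```python
import queue
import threading


def simulate(drain_downstream: bool, timeout: float = 0.5) -> bool:
    """Return True if the EOS sender got the stream lock within `timeout`."""
    stream_lock = threading.Lock()        # stand-in for the stream lock
    downstream = queue.Queue(maxsize=1)   # stand-in for the downstream queue
    downstream.put("buffer-0")            # the queue starts out full
    holding = threading.Event()           # set once the lock is held
    stop = threading.Event()              # lets the model shut down cleanly

    def streaming_thread():
        # The push happens with the stream lock held, so a push that blocks
        # on a full queue keeps the lock held as well.
        with stream_lock:
            holding.set()
            while not stop.is_set():
                try:
                    downstream.put("buffer-1", timeout=0.05)
                    return
                except queue.Full:
                    continue

    def consumer():
        # Frees one slot, which lets the blocked push complete.
        downstream.get()

    pusher = threading.Thread(target=streaming_thread)
    pusher.start()
    holding.wait()

    drainer = None
    if drain_downstream:
        drainer = threading.Thread(target=consumer)
        drainer.start()

    # The EOS sender needs the same lock. A real deadlock would wait forever;
    # the model gives up after `timeout` so that it can report the result.
    got_lock = stream_lock.acquire(timeout=timeout)
    if got_lock:
        stream_lock.release()

    stop.set()
    pusher.join()
    if drainer is not None:
        drainer.join()
    return got_lock
```

Calling `simulate(False)` models the situation reported here: nothing drains the queue, so the EOS sender never gets the lock. Calling `simulate(True)` models the queue accepting another buffer, after which the EOS sender can proceed.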
See https://bugzilla.gnome.org/show_bug.cgi?id=784295#c5