GNOME Bugzilla – Bug 764020
adaptivedemux: Deadlock on HLS and DASH streams when scrub seeking
Last modified: 2016-04-20 13:54:57 UTC
A good example of this issue is:
https://ci.gstreamer.net/job/GStreamer-master-validate/2881/testReport/junit/validate.hls.playback/scrub_forward_seeking/hls_bibbop/

You can reproduce it with:
$ gst-validate-launcher -t validate.hls.playback.scrub_forward_seeking.hls_bibbop -l stdout

This starts a local HTTP server and seeks in the stream every 0.1 seconds for 10 seconds.

The stack trace is:
(gdb) t a a bt
Trace 236099
Thread 31 (Thread 0x7fce09b6b700 (LWP 11388))
Thread 30 (Thread 0x7fce08912700 (LWP 11390))
Thread 29 (Thread 0x7fcdf7fff700 (LWP 11392))
Thread 28 (Thread 0x7fcdf77fe700 (LWP 11393))
Thread 27 (Thread 0x7fcdf6ffd700 (LWP 11395))
Thread 26 (Thread 0x7fcdf614b700 (LWP 11396))
Thread 25 (Thread 0x7fcdf5721700 (LWP 11397))
Thread 14 (Thread 0x7fcdcbb86700 (LWP 11409))
Pasting some IRC discussion we had about this:

>bilboed oh, could this be because of "not-completely-linked" parts in decodebin ?
>thiblahute bilboed, You mean https://bugzilla.gnome.org/show_bug.cgi?id=606382 ? or?
>bilboed notice how the two decoder threads are stuck in a query probe
>thiblahute Yes, saw that, not sure what those probes are?
>bilboed decodebin2's source_pad_blocked_cb
>bilboed oh balls, it's indeed due to different groups
>thiblahute Yeah, not surprising I think, we know it is still racy
>thiblahute bilboed, How did you come to that conclusion?
>bilboed because two decoders are currently pushing in the playsink queues, and two decoders are blocked in that query probe
>bilboed i.e ... two groups
>thiblahute Indeed.
>thaytan thiblahute, bilboed: I've been looking at that one too
>bilboed great, at least it's not a false positive
>thaytan It's one of many deadlocks in hlsdemux/adaptivedemux when switching bitrates / groups
>thaytan and seeking, or pausing due to buffering
>thiblahute thaytan, We do not have a bug for following those do we?
>thaytan no, not that I know of
>thiblahute OK, I guess we can use the one I just opened https://bugzilla.gnome.org/show_bug.cgi?id=764020
>there's quite a few places it can deadlock on the manifest lock
>thaytan or where things can hang on state locks trying to simultaneously change the state of the uri-fetching source bins
Another deadlock, this one happening with the DASH demuxer:
(gdb) t a a bt
Trace 236102
Thread 22 (Thread 0x7fa23a4da700 (LWP 30167))
Thread 21 (Thread 0x7fa23905a700 (LWP 30169))
Thread 20 (Thread 0x7fa22bfff700 (LWP 30170))
Thread 19 (Thread 0x7fa22b7fe700 (LWP 30171))
Thread 18 (Thread 0x7fa22affd700 (LWP 30172))
Thread 17 (Thread 0x7fa22a7fc700 (LWP 30173))
Thread 16 (Thread 0x7fa229ffb700 (LWP 30174))
Thread 15 (Thread 0x7fa2297fa700 (LWP 30175))
Thread 14 (Thread 0x7fa228ff9700 (LWP 30176))
Thread 13 (Thread 0x7fa20bfff700 (LWP 30179))
Thread 12 (Thread 0x7fa20b7fe700 (LWP 30180))
Thread 3 (Thread 0x7fa200d94700 (LWP 30189))
Thread 2 (Thread 0x7fa1d7bdd700 (LWP 30190))
Created attachment 324570
debug log of multiqueue buffering

Debug log snippet that shows the problem: multiqueue1 posts buffering message 0x7f526401b610 (seqnum 5196, 98%). Later it posts 0x7f5230968740 (seqnum 5199, 100%) from the same thread (id 0x7f526813bed0). Somehow, after passing through several intermediate buses (and presumably decodebin's and other handle_message functions), the order of the two messages is reversed: the app receives the 98% buffering message last and never recovers.
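To make the "never recovers" part concrete, here is a minimal sketch of the receiving side. It is a toy model, not GStreamer code: `AppModel`, `app_receive`, `deliver` and `app_is_stuck_buffering` are hypothetical names, and it assumes the usual application behaviour of staying paused while the last buffering percentage seen is below 100.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an application that only remembers the last
 * buffering percentage it received (assumption: it stays paused while
 * that value is below 100). */
typedef struct {
  int last_percent; /* last buffering percent delivered to the app */
  int n_received;   /* number of buffering messages seen so far */
} AppModel;

/* The app reacts to each buffering message by recording its percent. */
static void app_receive(AppModel *app, int percent)
{
  app->last_percent = percent;
  app->n_received++;
}

/* Deliver a sequence of buffering percentages in the given order. */
static void deliver(AppModel *app, const int *percents, size_t n)
{
  assert(percents != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    app_receive(app, percents[i]);
}

/* Nonzero if the app would still be waiting for buffering to finish. */
static int app_is_stuck_buffering(const AppModel *app)
{
  return app->n_received > 0 && app->last_percent < 100;
}
```

Delivering 98 then 100 leaves the model unstuck, while the reversed order (100 then 98) leaves it waiting for a 100% message that never arrives, which matches what the log shows.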
Created attachment 324643 [review]
decodebin2: Hold new buffering_post lock while posting msgs

There's a small window between decodebin choosing a buffering level to post and another thread choosing a different buffering level where things can race. Close that window by holding a new lock that's only for posting buffering messages - like what was done in multiqueue.
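For readers who have not looked at the patch, the general pattern can be sketched roughly as follows. This is not the patch itself: it uses plain pthread mutexes as a stand-in for GLib's GMutex, and `Poster`, `poster_init`, `poster_update` and `poster_post_buffering` are made-up names, with an array standing in for the message bus. The point is only that choosing the level and posting it happen under one dedicated lock, so two threads cannot interleave between the two steps.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_POSTED 64

/* Hypothetical stand-in for an element that posts buffering messages. */
typedef struct {
  pthread_mutex_t state_lock;          /* protects current_percent */
  pthread_mutex_t buffering_post_lock; /* held across choose + post */
  int current_percent;
  int posted[MAX_POSTED];              /* stand-in for the message bus */
  int n_posted;
} Poster;

static void poster_init(Poster *p)
{
  pthread_mutex_init(&p->state_lock, NULL);
  pthread_mutex_init(&p->buffering_post_lock, NULL);
  p->current_percent = 0;
  p->n_posted = 0;
}

/* Called from streaming threads when the buffering level changes. */
static void poster_update(Poster *p, int percent)
{
  pthread_mutex_lock(&p->state_lock);
  p->current_percent = percent;
  pthread_mutex_unlock(&p->state_lock);
}

/* Choose the level and post it without letting another poster slip in
 * between the two steps. */
static void poster_post_buffering(Poster *p)
{
  pthread_mutex_lock(&p->buffering_post_lock);

  /* Step 1: choose the level under the state lock. */
  pthread_mutex_lock(&p->state_lock);
  int percent = p->current_percent;
  pthread_mutex_unlock(&p->state_lock);

  /* Step 2: post it. The state lock is released, but the post lock is
   * still held, so posts leave in the order their levels were chosen. */
  assert(p->n_posted < MAX_POSTED);
  p->posted[p->n_posted++] = percent;

  pthread_mutex_unlock(&p->buffering_post_lock);
}
```

Using a separate lock only for posting, rather than holding the state lock across the post, keeps the state lock free while the message is being handled elsewhere.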
This patch fixes the DASH seeking case, and improves HLS but doesn't fix it entirely. HLS does more recreating of pads and can hang / deadlock in different ways.
Review of attachment 324643 [review]:

Looks good to me, something for 1.8.1

::: gst/playback/gstdecodebin2.c
@@ +5474,3 @@
}
if (drop)

if (foo) bar else { baz } doesn't give a compiler warning anymore? Please add some curly braces :)
Created attachment 324672 [review]
decodebin2: Hold new buffering_post lock while posting msgs

There's a small window between decodebin choosing a buffering level to post and another thread choosing a different buffering level where things can race. Close that window by holding a new lock that's only for posting buffering messages - like what was done in multiqueue.
Attachment 324672 pushed as fd92bdf - decodebin2: Hold new buffering_post lock while posting msgs