GNOME Bugzilla – Bug 741355
playbin: deadlock
Last modified: 2015-02-24 16:09:13 UTC
Created attachment 292471 [details] stacktrace This is using trunk (except plugins-bad is at 1.4 because it doesn't build here atm, but I think it's not relevant here) See attached trace.
Is this a regression from the fix for https://bugzilla.gnome.org/show_bug.cgi?id=741198 ?
And what exactly is the scenario to reproduce it?
No, also happens without the patch. I use quodlibet, select 100 ogg vorbis songs in the playlist and hit the "next song" key on my keyboard and wait until it freezes.
Does the same happen with gst-play-1.0 --interactive ~/Music/*.ogg and pressing '>' repeatedly?
I can't with gst-play-1.0 and the same set of songs. quodlibet recreates the pipeline on each song change, so this might be stressing different paths, or different timings I guess. I'll try to bisect then.
Created attachment 292487 [details] segfault stacktrace I failed to bisect, at some point in the history (between 1.4 and trunk) the deadlock just seems to get rarer and just happen sometimes, confusing me and bisect. That's my guess at least, since three bisects got me to three different commits. I got a few crashes (see attachment) while at it. And occasionally one of the three: ######################################### GStreamer-WARNING **: loop detected in the graph of bin 'decodebin3376'!! ######################################### Attempt to unlock mutex that was not locked ######################################### (quodlibet:6374): GStreamer-CRITICAL **: Failed to deactivate pad oggdemux3084:sink, very bad (quodlibet:6374): GStreamer-CRITICAL **: Trying to dispose element typefind, but it is in READY instead of the NULL state. You need to explicitly set elements to the NULL state before dropping the final reference, to allow them to clean up. This problem may also be caused by a refcounting bug in the application or some element. (quodlibet:6374): GStreamer-CRITICAL **: Trying to dispose element decodebin3085, but it is in READY instead of the NULL state. You need to explicitly set elements to the NULL state before dropping the final reference, to allow them to clean up. This problem may also be caused by a refcounting bug in the application or some element. #########################################
Created attachment 292515 [details] flac deadlock trace Something similar happens with flac files as well
Possibly fixed by this patch. Can you try it and see if you can reproduce the bug ? commit 661588b15056984398598c1d8d772c62e040e729 Author: Vincent Penquerc'h <vincent.penquerch@collabora.co.uk> Date: Tue Jan 20 15:08:24 2015 +0000 dcodebin2: fix lock/unlock mismatch on multiqueue overrun
Created attachment 295003 [details] stacktrace I can still reproduce with gstreamer/gst-plugins-base trunk
I think that patch was just cosmetic, surely group->dbin does not change during this function, if ever?
Hah. I went back to look and had not realized these two were aliasing each other the whole time. More debugging needed then.
Created attachment 296046 [details] [review] possible fix for deadlock Does this fix the problem ?
Can you add some explanation why this could cause a deadlock? Lock order problem (chain mutex vs expose mutex?)?
Created attachment 296079 [details] [review] possible fix, with detailed explanation of the bug Explanations added to the commit message.
Comment on attachment 296079 [details] [review] possible fix, with detailed explanation of the bug Completely makes sense, good analysis :) I think this should go in, independent of fixing this specific bug because it's the right thing to do.
Created attachment 296091 [details] crash with patch Got quite a few crashes with and without the patch during testing, filed in bug 743980 With the patch I can't seem to reproduce the hang in attachment 292471 [details] but I get a new crash, see this attachment. Also I still can reproduce the deadlock with flac files, same stacktrace as attachment 292515 [details]
Maybe the problem is that the chain goes away while it is used in change_state or elsewhere in decodebin. There seems to be missing locking to prevent that.
Created attachment 296092 [details] [review] guard against the decode chain going while used This moves all the uses of dbin->decode_chain within the expose lock, which is taken when tearing it down.
With both patches ogg vorbis works now (also the crash I reported in bug 743980 is gone). With flac I still get the deadlock described above. Thanks Vincent.
Review of attachment 296092 [details] [review]: Looks good to me.
Created attachment 296137 [details] [review] guard against decode_chain going while used I forgot to add and commit --amend before creating the patch. This one does the guard against all uses.
The FLAC deadlock seems like it might be solved by this latest patch too, since one thread is waiting in analyze_pad and the other in a downward state change.
OK, thanks, I'll give it a another try.
Created attachment 296188 [details] flac deadlock trace new I can still trigger the flac deadlock with both patches applied. See attached stacktrace.
Created attachment 296190 [details] flac critical I also occasionally get this with flac: ** CRITICAL **: gst_pb_utils_add_codec_description_to_tag_list: assertion 'caps != NULL' failed maybe it's related; trace of the first critical attached.
Created attachment 296192 [details] [review] flac deadlock likely fix This third patch should fix the flac deadlock. And hopefully that critical if it was related. If not, I have something else to look at :)
Deadlock seems to be gone now, critical still there. I can create a new bug report for it if wanted.
Yes, it seems like a flac issue so a new bug is best. I'll wait a bit for comments and push soonish if none are forthcoming.
All three patches pushed. Thanks for the testing. commit a2ee84fa80cde8174a2e44258b3b57e94c28216e Author: Vincent Penquerc'h <vincent.penquerch@collabora.co.uk> Date: Thu Feb 5 12:07:50 2015 +0000 decodebin: fix deadlock between downward state change and pad addition If caps on a newly added pad are NULL, analyze_new_pad will try to acquire the chain lock to add a probe to the pad so the chain can be built later. This comes from the streaming thread, in response to headers or other buffers causing this pad to be added, so the stream lock is taken. Meanwhile, another thread might be destroying the chain from a downward state change. This will cause the chain to be freed with the chain lock taken, and some elements are set to NULL here, which can include the parser. This causes pad deactivation, which tries to take the element's pad's stream lock, deadlocking. Fix this by keeping track of which elements need setting to NULL, and only do this after the chain lock is released. Only the chain manipulation needs to be locked, not the elements' state changes. https://bugzilla.gnome.org/show_bug.cgi?id=741355 commit a848ac7abe0664d6aa0d7b8e6368534ac34be22b Author: Vincent Penquerc'h <vincent.penquerch@collabora.co.uk> Date: Wed Feb 4 11:46:09 2015 +0000 decodebin: guard against the decode chain going while a pad is added https://bugzilla.gnome.org/show_bug.cgi?id=741355 commit 9036dc85940c1decb5c12ee253a6657b9cf70a2b Author: Vincent Penquerc'h <vincent.penquerch@collabora.co.uk> Date: Tue Feb 3 17:06:43 2015 +0000 decodebin: possible fix for deadlock when spamming "next song" There was a deadlock between a thread changing decodebin/demuxer state from PAUSED to READY, and another thread pushing data when starting. From the stack trace at https://bug741355.bugzilla-attachments.gnome.org/attachment.cgi?id=292471, I deduce the following is happening, though I did not reproduce the problem so I'm not sure this patch fixes it. The streaming thread (thread 2 in that stack trace) takes the demuxer's sink pad's stream lock in gst_ogg_demux_perform_seek_pull and will activate a new chain. This ends up causing the expose lock being taken in _pad_added_cb in decodebin. Meanwhile, a state changed is triggered on thread 1, which takes the expose lock in decodebin in gst_decode_bin_change_state, then frees the previous chain, which ends up calling gst_pad_stop_task on the demuxer's task, which in turn takes the demuxer's sink pad's stream lock, deadlocking as both threads are now waiting for each other. https://bugzilla.gnome.org/show_bug.cgi?id=741355
Thanks!
*** Bug 743980 has been marked as a duplicate of this bug. ***
Hey, with a848ac7abe0664d6aa0d7b8e6368534ac34be22b I observe a deadlock when running : gst-launch-1.0 playbin uri=http://dev-iplatforms.kw.bbc.co.uk/dash/news24-avc3/news24.php
Fixed. The original test case did not trigger the buffering code path so the double lock was not found. commit 561ddabd97b52229d65ac05a5e281f031e42e621 Author: Vincent Penquerc'h <vincent.penquerch@collabora.co.uk> Date: Tue Feb 24 16:06:08 2015 +0000 decodebin: fix deadlock when resetting buffering This function is static, and only ever called with the expose lock taken. It thus has no reason to take this lock itself. This was introduced by one of my locking fixes from 741355. https://bugzilla.gnome.org/show_bug.cgi?id=741355