After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 741355 - playbin: deadlock
playbin: deadlock
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gst-plugins-base
git master
Other Linux
: High blocker
: 1.5.1
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2014-12-10 18:07 UTC by Christoph Reiter (lazka)
Modified: 2015-02-24 16:09 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
stacktrace (20.13 KB, text/plain)
2014-12-10 18:07 UTC, Christoph Reiter (lazka)
  Details
segfault stacktrace (23.04 KB, text/plain)
2014-12-10 22:01 UTC, Christoph Reiter (lazka)
  Details
flac deadlock trace (22.76 KB, text/plain)
2014-12-11 09:50 UTC, Christoph Reiter (lazka)
  Details
stacktrace (20.51 KB, text/plain)
2015-01-20 15:43 UTC, Christoph Reiter (lazka)
  Details
possible fix for deadlock (1.46 KB, patch)
2015-02-03 17:08 UTC, Vincent Penquerc'h
none Details | Review
possible fix, with detailed explanation of the bug (2.42 KB, patch)
2015-02-04 09:31 UTC, Vincent Penquerc'h
committed Details | Review
crash with patch (48.67 KB, text/plain)
2015-02-04 11:35 UTC, Christoph Reiter (lazka)
  Details
guard against the decode chain going while used (1.36 KB, patch)
2015-02-04 12:01 UTC, Vincent Penquerc'h
accepted-commit_now Details | Review
guard against decode_chain going while used (5.73 KB, patch)
2015-02-04 15:53 UTC, Vincent Penquerc'h
committed Details | Review
flac deadlock trace new (47.75 KB, text/plain)
2015-02-05 11:26 UTC, Christoph Reiter (lazka)
  Details
flac critical (48.75 KB, text/plain)
2015-02-05 11:46 UTC, Christoph Reiter (lazka)
  Details
flac deadlock likely fix (2.88 KB, patch)
2015-02-05 12:18 UTC, Vincent Penquerc'h
committed Details | Review

Description Christoph Reiter (lazka) 2014-12-10 18:07:20 UTC
Created attachment 292471 [details]
stacktrace

This is using trunk (except plugins-bad is at 1.4 because it doesn't build here atm, but I think it's not relevant here)

See attached trace.
Comment 1 Thiago Sousa Santos 2014-12-10 18:11:45 UTC
Is this a regression from the fix for https://bugzilla.gnome.org/show_bug.cgi?id=741198 ?
Comment 2 Thiago Sousa Santos 2014-12-10 18:14:56 UTC
And what exactly is the scenario to reproduce it?
Comment 3 Christoph Reiter (lazka) 2014-12-10 18:22:40 UTC
No, also happens without the patch.

I use quodlibet, select 100 ogg vorbis songs in the playlist and hit the "next song" key on my keyboard and wait until it freezes.
Comment 4 Tim-Philipp Müller 2014-12-10 18:30:26 UTC
Does the same happen with

 gst-play-1.0 --interactive ~/Music/*.ogg

and pressing '>' repeatedly?
Comment 5 Christoph Reiter (lazka) 2014-12-10 19:29:57 UTC
I can't with gst-play-1.0 and the same set of songs. quodlibet recreates the pipeline on each song change, so this might be stressing different paths, or different timings I guess.

I'll try to bisect then.
Comment 6 Christoph Reiter (lazka) 2014-12-10 22:01:35 UTC
Created attachment 292487 [details]
segfault stacktrace

I failed to bisect, at some point in the history (between 1.4 and trunk) the deadlock just seems to get rarer and just happen sometimes, confusing me and bisect. That's my guess at least, since three bisects got me to three different commits.

I got a few crashes (see attachment) while at it.

And occasionally one of the three:

#########################################

GStreamer-WARNING **: loop detected in the graph of bin 'decodebin3376'!!

#########################################

Attempt to unlock mutex that was not locked

#########################################

(quodlibet:6374): GStreamer-CRITICAL **: Failed to deactivate pad oggdemux3084:sink, very bad

(quodlibet:6374): GStreamer-CRITICAL **: 
Trying to dispose element typefind, but it is in READY instead of the NULL state.
You need to explicitly set elements to the NULL state before
dropping the final reference, to allow them to clean up.
This problem may also be caused by a refcounting bug in the
application or some element.

(quodlibet:6374): GStreamer-CRITICAL **: 
Trying to dispose element decodebin3085, but it is in READY instead of the NULL state.
You need to explicitly set elements to the NULL state before
dropping the final reference, to allow them to clean up.
This problem may also be caused by a refcounting bug in the
application or some element.

#########################################
Comment 7 Christoph Reiter (lazka) 2014-12-11 09:50:00 UTC
Created attachment 292515 [details]
flac deadlock trace

Something similar happens with flac files as well
Comment 8 Vincent Penquerc'h 2015-01-20 15:10:50 UTC
Possibly fixed by this patch. Can you try it and see if you can reproduce the bug ?

commit 661588b15056984398598c1d8d772c62e040e729
Author: Vincent Penquerc'h <vincent.penquerch@collabora.co.uk>
Date:   Tue Jan 20 15:08:24 2015 +0000

    dcodebin2: fix lock/unlock mismatch on multiqueue overrun
Comment 9 Christoph Reiter (lazka) 2015-01-20 15:43:21 UTC
Created attachment 295003 [details]
stacktrace

I can still reproduce with gstreamer/gst-plugins-base trunk
Comment 10 Tim-Philipp Müller 2015-01-20 16:10:07 UTC
I think that patch was just cosmetic, surely group->dbin does not change during this function, if ever?
Comment 11 Vincent Penquerc'h 2015-01-20 16:13:22 UTC
Hah. I went back to look and had not realized these two were aliasing each other the whole time.
More debugging needed then.
Comment 12 Vincent Penquerc'h 2015-02-03 17:08:45 UTC
Created attachment 296046 [details] [review]
possible fix for deadlock

Does this fix the problem ?
Comment 13 Sebastian Dröge (slomo) 2015-02-03 19:24:24 UTC
Can you add some explanation why this could cause a deadlock? Lock order problem (chain mutex vs expose mutex?)?
Comment 14 Vincent Penquerc'h 2015-02-04 09:31:05 UTC
Created attachment 296079 [details] [review]
possible fix, with detailed explanation of the bug

Explanations added to the commit message.
Comment 15 Sebastian Dröge (slomo) 2015-02-04 10:40:31 UTC
Comment on attachment 296079 [details] [review]
possible fix, with detailed explanation of the bug

Completely makes sense, good analysis :) I think this should go in, independent of fixing this specific bug because it's the right thing to do.
Comment 16 Christoph Reiter (lazka) 2015-02-04 11:35:55 UTC
Created attachment 296091 [details]
crash with patch

Got quite a few crashes with and without the patch during testing, filed in bug 743980

With the patch I can't seem to reproduce the hang in attachment 292471 [details] but I get a new crash, see this attachment.

Also I still can reproduce the deadlock with flac files, same stacktrace as attachment 292515 [details]
Comment 17 Sebastian Dröge (slomo) 2015-02-04 11:42:22 UTC
Maybe the problem is that the chain goes away while it is used in change_state or elsewhere in decodebin. There seems to be missing locking to prevent that.
Comment 18 Vincent Penquerc'h 2015-02-04 12:01:20 UTC
Created attachment 296092 [details] [review]
guard against the decode chain going while used

This moves all the uses of dbin->decode_chain within the expose lock, which is taken when tearing it down.
Comment 19 Christoph Reiter (lazka) 2015-02-04 14:18:08 UTC
With both patches ogg vorbis works now (also the crash I reported in bug
743980 is gone).

With flac I still get the deadlock described above.

Thanks Vincent.
Comment 20 Nicolas Dufresne (ndufresne) 2015-02-04 15:27:05 UTC
Review of attachment 296092 [details] [review]:

Looks good to me.
Comment 21 Vincent Penquerc'h 2015-02-04 15:53:10 UTC
Created attachment 296137 [details] [review]
guard against decode_chain going while used

I forgot to add and commit --amend before creating the patch. This one does the guard against all uses.
Comment 22 Vincent Penquerc'h 2015-02-05 09:49:12 UTC
The FLAC deadlock seems like it might be solved by this latest patch too, since one thread is waiting in analyze_pad and the other in a downward state change.
Comment 23 Christoph Reiter (lazka) 2015-02-05 10:14:51 UTC
OK, thanks, I'll give it a another try.
Comment 24 Christoph Reiter (lazka) 2015-02-05 11:26:25 UTC
Created attachment 296188 [details]
flac deadlock trace new

I can still trigger the flac deadlock with both patches applied. See attached stacktrace.
Comment 25 Christoph Reiter (lazka) 2015-02-05 11:46:59 UTC
Created attachment 296190 [details]
flac critical

I also occasionally get this with flac:

** CRITICAL **: gst_pb_utils_add_codec_description_to_tag_list: assertion 'caps != NULL' failed

maybe it's related; trace of the first critical attached.
Comment 26 Vincent Penquerc'h 2015-02-05 12:18:06 UTC
Created attachment 296192 [details] [review]
flac deadlock likely fix

This third patch should fix the flac deadlock.

And hopefully that critical if it was related. If not, I have something else to look at :)
Comment 27 Christoph Reiter (lazka) 2015-02-05 12:50:55 UTC
Deadlock seems to be gone now, critical still there. I can create a new bug report for it if wanted.
Comment 28 Vincent Penquerc'h 2015-02-05 12:58:23 UTC
Yes, it seems like a flac issue so a new bug is best.

I'll wait a bit for comments and push soonish if none are forthcoming.
Comment 29 Vincent Penquerc'h 2015-02-19 13:43:22 UTC
All three patches pushed. Thanks for the testing.

commit a2ee84fa80cde8174a2e44258b3b57e94c28216e
Author: Vincent Penquerc'h <vincent.penquerch@collabora.co.uk>
Date:   Thu Feb 5 12:07:50 2015 +0000

    decodebin: fix deadlock between downward state change and pad addition
    
    If caps on a newly added pad are NULL, analyze_new_pad will try to
    acquire the chain lock to add a probe to the pad so the chain can
    be built later. This comes from the streaming thread, in response
    to headers or other buffers causing this pad to be added, so the
    stream lock is taken.
    
    Meanwhile, another thread might be destroying the chain from a
    downward state change. This will cause the chain to be freed with
    the chain lock taken, and some elements are set to NULL here, which
    can include the parser. This causes pad deactivation, which tries
    to take the element's pad's stream lock, deadlocking.
    
    Fix this by keeping track of which elements need setting to NULL,
    and only do this after the chain lock is released. Only the chain
    manipulation needs to be locked, not the elements' state changes.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=741355

commit a848ac7abe0664d6aa0d7b8e6368534ac34be22b
Author: Vincent Penquerc'h <vincent.penquerch@collabora.co.uk>
Date:   Wed Feb 4 11:46:09 2015 +0000

    decodebin: guard against the decode chain going while a pad is added
    
    https://bugzilla.gnome.org/show_bug.cgi?id=741355

commit 9036dc85940c1decb5c12ee253a6657b9cf70a2b
Author: Vincent Penquerc'h <vincent.penquerch@collabora.co.uk>
Date:   Tue Feb 3 17:06:43 2015 +0000

    decodebin: possible fix for deadlock when spamming "next song"
    
    There was a deadlock between a thread changing decodebin/demuxer
    state from PAUSED to READY, and another thread pushing data
    when starting.
    
    From the stack trace at
    https://bug741355.bugzilla-attachments.gnome.org/attachment.cgi?id=292471,
    I deduce the following is happening, though I did not reproduce the
    problem so I'm not sure this patch fixes it.
    
    The streaming thread (thread 2 in that stack trace) takes the demuxer's
    sink pad's stream lock in gst_ogg_demux_perform_seek_pull and will
    activate a new chain. This ends up causing the expose lock being taken
    in _pad_added_cb in decodebin.
    
    Meanwhile, a state changed is triggered on thread 1, which takes the
    expose lock in decodebin in gst_decode_bin_change_state, then frees
    the previous chain, which ends up calling gst_pad_stop_task on the
    demuxer's task, which in turn takes the demuxer's sink pad's stream
    lock, deadlocking as both threads are now waiting for each other.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=741355
Comment 30 Christoph Reiter (lazka) 2015-02-19 13:47:12 UTC
Thanks!
Comment 31 Tim-Philipp Müller 2015-02-19 15:12:49 UTC
*** Bug 743980 has been marked as a duplicate of this bug. ***
Comment 32 Mathieu Duponchelle 2015-02-24 14:04:53 UTC
Hey, with a848ac7abe0664d6aa0d7b8e6368534ac34be22b I observe a deadlock when running :

gst-launch-1.0 playbin uri=http://dev-iplatforms.kw.bbc.co.uk/dash/news24-avc3/news24.php
Comment 33 Vincent Penquerc'h 2015-02-24 16:09:13 UTC
Fixed. The original test case did not trigger the buffering code path so the double lock was not found.

commit 561ddabd97b52229d65ac05a5e281f031e42e621
Author: Vincent Penquerc'h <vincent.penquerch@collabora.co.uk>
Date:   Tue Feb 24 16:06:08 2015 +0000

    decodebin: fix deadlock when resetting buffering
    
    This function is static, and only ever called with the expose lock
    taken. It thus has no reason to take this lock itself.
    
    This was introduced by one of my locking fixes from 741355.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=741355