GNOME Bugzilla – Bug 700342
decodebin: Crashes and deadlocks when setting to READY while still autoplugging
Last modified: 2016-02-23 17:32:14 UTC
+ Trace 231947
Thread 665 (Thread 0x7fffb971e700 (LWP 4433))
Inferior 1 [process 3757] will be killed. Quit anyway? (y or n)
Created attachment 244244 [details] gdb log. Bugzilla seems to truncate the traceback.
Created attachment 244245 [details] gdb-log-2 And here is another one..
How can this be reproduced, do you have a testcase? What seems to happen here is that decodebin is shut down, shutting down baseparse too, at the same time as baseparse is setting caps on a pad and making decodebin check whether it can expose pads now. One thread takes the locks in the order baseparse(stream-lock)->decodebin(chain-lock), while another thread takes them in the order decodebin(chain-lock)->baseparse(stream-lock), so the two threads deadlock.
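The lock-order inversion described above can be sketched with plain pthreads. This is only an illustration, not decodebin or baseparse code: `stream_lock` and `chain_lock` are stand-in mutexes, and `pthread_mutex_trylock()` is used in place of a blocking lock so the sketch reports the inversion instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L /* exposes pthread_barrier_t in strict -std= modes */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Stand-ins for the baseparse pad stream lock and the decodebin chain lock. */
static pthread_mutex_t stream_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t chain_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_barrier_t rendezvous;

typedef struct {
  pthread_mutex_t *first;   /* lock taken first by this thread */
  pthread_mutex_t *second;  /* lock this thread wants next */
  bool second_was_busy;     /* true if the second lock was already held */
} LockOrder;

static void *take_in_order(void *data)
{
  LockOrder *order = data;
  int rc;

  pthread_mutex_lock(order->first);
  /* Wait until both threads hold their first lock. */
  pthread_barrier_wait(&rendezvous);

  /* A blocking lock here would never return: each thread holds what the
   * other one wants. trylock lets us observe that without hanging. */
  rc = pthread_mutex_trylock(order->second);
  order->second_was_busy = (rc == EBUSY);
  if (rc == 0)
    pthread_mutex_unlock(order->second);

  /* Keep the first lock until the other thread has made its attempt. */
  pthread_barrier_wait(&rendezvous);
  pthread_mutex_unlock(order->first);
  return NULL;
}

/* Returns true if both threads found their second lock held by the other. */
bool demo_lock_order_inversion(void)
{
  LockOrder streaming = { &stream_lock, &chain_lock, false };
  LockOrder state_change = { &chain_lock, &stream_lock, false };
  pthread_t t1, t2;
  int rc;

  rc = pthread_barrier_init(&rendezvous, NULL, 2);
  assert(rc == 0);
  rc = pthread_create(&t1, NULL, take_in_order, &streaming);
  assert(rc == 0);
  rc = pthread_create(&t2, NULL, take_in_order, &state_change);
  assert(rc == 0);

  pthread_join(t1, NULL);
  pthread_join(t2, NULL);
  pthread_barrier_destroy(&rendezvous);

  return streaming.second_was_busy && state_change.second_was_busy;
}
```

With blocking locks instead of `pthread_mutex_trylock()`, both threads would wait forever. The usual ways out are to take the two locks in the same order everywhere, or to drop the first lock before taking the second.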
My testcase is creating a playlist of my whole library and holding down the "next song" key; after a few hundred songs it either deadlocks or crashes (https://bugzilla.gnome.org/show_bug.cgi?id=700340). No simple one, sorry.
This should fix at least the first deadlock. Can you test? And if there's still something going wrong, attach new backtraces?

commit 83f247697670a163357d81ef78746ab2ebc02d6c
Author: Sebastian Dröge <sebastian.droege@collabora.co.uk>
Date: Wed May 15 14:47:53 2013 +0200

    decodebin: Hold the expose lock when freeing a chain

    https://bugzilla.gnome.org/show_bug.cgi?id=700342

http://cgit.freedesktop.org/gstreamer/gst-plugins-base/commit/?id=83f247697670a163357d81ef78746ab2ebc02d6c
There are still some others left, and also crashes and g_warnings() and other things :)
Created attachment 244320 [details] test.c. TIMEOUT is the number of milliseconds until the element is set back to READY/NULL. If REUSE is defined, the same playbin instance is used all the time.
There's also bug #695328 btw, which might be one of those things observed.
Yes, that crash has the same cause as one of the deadlocks I found :) Basically, we can still be inside our callbacks in decodebin, trying to autoplug, after the state change function has chained up to the parent for PAUSED->READY. We then free decode chains that are still in use, which leads to deadlocks and/or crashes :) The basic assumption here apparently was that we can never be in the autoplugging callbacks after chaining up to the parent's state change function.
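One common way to guard against a callback touching a chain that the state change is about to free is a shutdown flag checked under a lock. The sketch below shows only that general pattern. `FakeBin`, its `shutdown` flag and all function names are hypothetical, and nothing here claims to be how decodebin is actually written.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a bin that owns a decode chain. */
typedef struct {
  pthread_mutex_t lock;  /* protects shutdown and chain */
  bool shutdown;         /* set once PAUSED->READY has started */
  int *chain;            /* stand-in for the decode chain data */
} FakeBin;

void fake_bin_init(FakeBin *bin)
{
  pthread_mutex_init(&bin->lock, NULL);
  bin->shutdown = false;
  bin->chain = calloc(1, sizeof(int));
}

/* Autoplugging callback: only touches the chain if we are not shutting
 * down. Returns true if the chain was used, false if the call was skipped. */
bool fake_bin_autoplug_cb(FakeBin *bin)
{
  bool used = false;

  pthread_mutex_lock(&bin->lock);
  if (!bin->shutdown && bin->chain != NULL) {
    (*bin->chain)++;  /* pretend to autoplug */
    used = true;
  }
  pthread_mutex_unlock(&bin->lock);
  return used;
}

/* State change to READY: mark shutdown and free the chain under the same
 * lock, so a callback either finishes before the free or sees the flag. */
void fake_bin_to_ready(FakeBin *bin)
{
  pthread_mutex_lock(&bin->lock);
  bin->shutdown = true;
  free(bin->chain);
  bin->chain = NULL;
  pthread_mutex_unlock(&bin->lock);
}
```

The point of the pattern is that the flag and the free happen under one lock, so a late callback is turned into a no-op rather than a use-after-free.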
*** Bug 695328 has been marked as a duplicate of this bug. ***
This has been broken since 0.10 times, btw.
Created attachment 244338 [details] gdb-log-3 Thanks, I get the attached with current trunk.
Created attachment 244340 [details] gdb-log-4 And another one, probably the same, on trunk again.
Ok, what I said above really can't happen. When the decode chains are freed, the state change function has already chained up to the GstBin and GstElement state change functions. This would have stopped all dataflow while holding the pads' stream locks. Now, our callbacks are called from inside the data flow, and thus with these stream locks held. So the callbacks should have returned already before the GstBin and GstElement state change functions returned and before we free the decode chain. That's for gdb-log-3. For gdb-log-4, exactly the above is happening. The GstBin and GstElement state change functions are shutting down all data flow and need the pad's stream lock for that... unfortunately some multiqueue is still holding the stream lock and waiting forever for an allocation query to be answered.
(In reply to comment #14)
> For gdb-log-4, the above is exactly happening. The GstBin and GstElement state
> change functions are shutting down all data flow and are holding the pad's
> stream lock for that... unfortunately some multiqueue is holding the stream
> lock still and waiting forever for an allocation query to be answered.
And this seems to be possible in a few flushing scenarios, when the queue is flushed but any pending queries are not woken up. The same has to be checked in queue and queue2.
And for the queue problems, see bug #688824 too.
So although this bug is handled in bug #688824 and bug #690420, I'd prefer to keep it open until we have fixed these two bugs and confirmed that nothing else is left.
Author: Sebastian Dröge <slomo@circular-chaos.org>
Date: Fri May 24 18:30:44 2013 +0200

    multiqueue: Make sure to always signal any possible pending serialized queries

    And don't unref them when flushing the queue, they're owned by the caller!

    https://bugzilla.gnome.org/show_bug.cgi?id=700342

This fixes the deadlock with the queue.
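The idea behind that commit message, waking up a thread that waits for a serialized query when the queue is flushed, can be sketched with a pthread condition variable. This is not the multiqueue code: `FakeQueue` and all function names are hypothetical, and the "query" is reduced to a pair of flags.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a queue that handles serialized queries. */
typedef struct {
  pthread_mutex_t lock;
  pthread_cond_t query_handled;  /* signalled on answer AND on flush */
  bool flushing;
  bool query_pending;
  bool last_query_result;
} FakeQueue;

typedef struct {
  FakeQueue queue;
  bool answered;  /* what the querying thread got back */
} DemoCtx;

/* Upstream thread: post a query and block until it is answered or the
 * queue starts flushing. The caller keeps ownership of its query. */
bool fake_queue_push_query(FakeQueue *q)
{
  bool result = false;

  pthread_mutex_lock(&q->lock);
  if (!q->flushing) {
    q->query_pending = true;
    while (q->query_pending && !q->flushing)
      pthread_cond_wait(&q->query_handled, &q->lock);
    /* If we left the loop because of a flush, the query was not answered. */
    result = !q->query_pending && q->last_query_result;
    q->query_pending = false;
  }
  pthread_mutex_unlock(&q->lock);
  return result;
}

/* Flush: drop queued data (none in this sketch) and always wake a waiter.
 * Without the signal the waiter would block forever, holding whatever
 * locks it took before calling fake_queue_push_query(). */
void fake_queue_flush_start(FakeQueue *q)
{
  pthread_mutex_lock(&q->lock);
  q->flushing = true;
  pthread_cond_signal(&q->query_handled);
  pthread_mutex_unlock(&q->lock);
}

static void *query_thread(void *data)
{
  DemoCtx *ctx = data;
  ctx->answered = fake_queue_push_query(&ctx->queue);
  return NULL;
}

/* Starts a querying thread, flushes, and returns what the thread got back. */
bool fake_queue_flush_demo(void)
{
  DemoCtx ctx;
  pthread_t t;

  pthread_mutex_init(&ctx.queue.lock, NULL);
  pthread_cond_init(&ctx.queue.query_handled, NULL);
  ctx.queue.flushing = false;
  ctx.queue.query_pending = false;
  ctx.queue.last_query_result = false;
  ctx.answered = true;

  pthread_create(&t, NULL, query_thread, &ctx);
  fake_queue_flush_start(&ctx.queue);
  pthread_join(t, NULL);  /* returns only because flush wakes the waiter */
  return ctx.answered;
}
```

Whether the flush happens before or after the querying thread starts waiting, the thread returns with an unanswered query instead of blocking, which is the property the fix is after.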
Created attachment 245265 [details] gdb-log-5. No more deadlocks with trunk as far as I can tell, but I still get the attached crash.
Yes, I had that crash too, and also another crash in gst-libav that happens far more often.
Still a problem but not as bad as before.
*** Bug 690605 has been marked as a duplicate of this bug. ***
I can't reproduce this any more, and there definitely have been fixes in that respect too, so let's close this. Please reopen if you still have problems.
All the commits I merged earlier today are related to this too, the ones from bug #759539 and the related ones.
1.6 has been stable here. Thanks everyone!