Bug 700342 – decodebin: Crashes and deadlocks when setting to READY while still autoplugging




Summary:	decodebin: Crashes and deadlocks when setting to READY while still autoplugging


Status:	RESOLVED OBSOLETE

Product:	GStreamer
Classification:	Platform
Component:	gst-plugins-base
Version:	1.0.7
Hardware:	Other Linux

Importance:	Normal major
Target Milestone:	NONE
Assigned To:	GStreamer Maintainers
QA Contact:	GStreamer Maintainers

URL:
Whiteboard:	playback

Duplicates:	690605 695328 (view as bug list)
Depends on:
Blocks:

Reported:	2013-05-14 20:45 UTC by Christoph Reiter (lazka)
Modified:	2016-02-23 17:32 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
gdb log (10.35 KB, text/plain) 2013-05-14 20:47 UTC, Christoph Reiter (lazka)	Details
gdb-log-2 (33.15 KB, text/plain) 2013-05-14 20:52 UTC, Christoph Reiter (lazka)	Details
test.c (3.10 KB, text/plain) 2013-05-15 13:48 UTC, Sebastian Dröge (slomo)	Details
gdb-log-3 (36.53 KB, text/plain) 2013-05-15 16:25 UTC, Christoph Reiter (lazka)	Details
gdb-log-4 (33.30 KB, text/plain) 2013-05-15 16:45 UTC, Christoph Reiter (lazka)	Details
gdb-log-5 (15.08 KB, text/plain) 2013-05-24 19:22 UTC, Christoph Reiter (lazka)	Details

Description Christoph Reiter (lazka) 2013-05-14 20:45:57 UTC


Thread 665 (Thread 0x7fffb971e700 (LWP 4433))

#0 __lll_lock_wait
at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S line 135
#1 _L_lock_1134
from /lib/x86_64-linux-gnu/libpthread.so.0
#2 __GI___pthread_mutex_lock
at pthread_mutex_lock.c line 104
#3 g_mutex_lock
at /tmp/buildd/glib2.0-2.36.1/./glib/gthread-posix.c line 210
#4 gst_decode_chain_is_complete
at gstdecodebin2.c line 3194
#5 analyze_new_pad
at gstdecodebin2.c line 1717
#6 pad_added_cb
at gstdecodebin2.c line 2453
#7 caps_notify_cb
at gstdecodebin2.c line 2567
#8 g_closure_invoke
at /tmp/buildd/glib2.0-2.36.1/./gobject/gclosure.c line 777
#9 signal_emit_unlocked_R
at /tmp/buildd/glib2.0-2.36.1/./gobject/gsignal.c line 3584
#10 g_signal_emit_valist
at /tmp/buildd/glib2.0-2.36.1/./gobject/gsignal.c line 3328
#11 g_signal_emit
at /tmp/buildd/glib2.0-2.36.1/./gobject/gsignal.c line 3384
#12 g_object_dispatch_properties_changed
at /tmp/buildd/glib2.0-2.36.1/./gobject/gobject.c line 1042
#13 gst_object_dispatch_properties_changed
at gstobject.c line 439
#14 g_object_notify_by_spec_internal
at /tmp/buildd/glib2.0-2.36.1/./gobject/gobject.c line 1136
#15 g_object_notify_by_pspec
at /tmp/buildd/glib2.0-2.36.1/./gobject/gobject.c line 1237
#16 gst_pad_store_sticky_event
at gstpad.c line 4430
#17 gst_pad_push_event
at gstpad.c line 4627
#18 gst_pad_set_caps
at /usr/include/gstreamer-1.0/gst/gstcompat.h line 71
#19 gst_mpeg_audio_parse_handle_frame
at gstmpegaudioparse.c line 721
#20 gst_base_parse_handle_buffer
at gstbaseparse.c line 1828
#21 gst_base_parse_scan_frame
at gstbaseparse.c line 2901
#22 gst_base_parse_loop
at gstbaseparse.c line 2970
#23 gst_task_func
at gsttask.c line 316
#24 g_thread_pool_thread_proxy
at /tmp/buildd/glib2.0-2.36.1/./glib/gthreadpool.c line 309
#25 g_thread_proxy
at /tmp/buildd/glib2.0-2.36.1/./glib/gthread.c line 798
#26 start_thread
at pthread_create.c line 311
#27 clone
at ../sysdeps/unix/sysv/linux/x86_64/clone.S line 113



Comment 1 Christoph Reiter (lazka) 2013-05-14 20:47:38 UTC

Created attachment 244244 [details]
gdb log

bugzilla seems to truncate the traceback..

Comment 2 Christoph Reiter (lazka) 2013-05-14 20:52:29 UTC

Created attachment 244245 [details]
gdb-log-2

And here is another one..

Comment 3 Sebastian Dröge (slomo) 2013-05-15 07:37:42 UTC

How can this be reproduced, do you have a testcase?

What seems to happen here is that decodebin is shut down, shutting down baseparse too, at the same time as baseparse is setting caps on a pad and making decodebin check if it can expose pads now.

One thread takes the locks in the order baseparse(stream-lock)->decodebin(chain-lock), while another thread takes them in the order decodebin(chain-lock)->baseparse(stream-lock), so the two threads deadlock.
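
The lock-order inversion described above can be illustrated with a minimal pthreads sketch. This is a generic illustration only, not the decodebin or baseparse code and not the fix that was eventually committed: `gate_lock`, `stream_lock`, `chain_lock`, `streaming_path`, `shutdown_path` and `run_lock_order_demo` are all hypothetical names. Two threads take the two inner locks in opposite orders, which on its own could deadlock; taking one outer lock first in both paths is one common way to serialize them.

```c
#include <assert.h>
#include <pthread.h>

#define ITERATIONS 10000

/* Hypothetical stand-ins; these are NOT the real GStreamer locks. */
static pthread_mutex_t gate_lock = PTHREAD_MUTEX_INITIALIZER;   /* always taken first */
static pthread_mutex_t stream_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t chain_lock = PTHREAD_MUTEX_INITIALIZER;
static int shared_counter = 0;

/* Streaming-side order: stream lock, then chain lock. */
static void *streaming_path(void *unused)
{
  (void) unused;
  for (int i = 0; i < ITERATIONS; i++) {
    pthread_mutex_lock(&gate_lock);
    pthread_mutex_lock(&stream_lock);
    pthread_mutex_lock(&chain_lock);
    shared_counter++;
    pthread_mutex_unlock(&chain_lock);
    pthread_mutex_unlock(&stream_lock);
    pthread_mutex_unlock(&gate_lock);
  }
  return NULL;
}

/* Shutdown-side order: chain lock, then stream lock (reversed).
 * Without gate_lock this pairing could deadlock against streaming_path(). */
static void *shutdown_path(void *unused)
{
  (void) unused;
  for (int i = 0; i < ITERATIONS; i++) {
    pthread_mutex_lock(&gate_lock);
    pthread_mutex_lock(&chain_lock);
    pthread_mutex_lock(&stream_lock);
    shared_counter++;
    pthread_mutex_unlock(&stream_lock);
    pthread_mutex_unlock(&chain_lock);
    pthread_mutex_unlock(&gate_lock);
  }
  return NULL;
}

/* Runs both paths concurrently; returns the final counter, or -1 on error. */
static int run_lock_order_demo(void)
{
  pthread_t a, b;

  shared_counter = 0;
  if (pthread_create(&a, NULL, streaming_path, NULL) != 0)
    return -1;
  if (pthread_create(&b, NULL, shutdown_path, NULL) != 0) {
    pthread_join(a, NULL);
    return -1;
  }
  pthread_join(a, NULL);
  pthread_join(b, NULL);
  return shared_counter;
}
```

Because only one thread at a time can hold `gate_lock`, the opposite ordering of the inner locks can no longer block, and `run_lock_order_demo()` completes with a counter of `2 * ITERATIONS`. Removing the `gate_lock` calls from both functions would reintroduce the possible deadlock.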

Comment 4 Christoph Reiter (lazka) 2013-05-15 08:28:10 UTC

My test case is creating a playlist of my whole library and holding down the "next song" key; after a few hundred songs it either deadlocks or crashes (https://bugzilla.gnome.org/show_bug.cgi?id=700340)

No simple one... sorry.

Comment 5 Sebastian Dröge (slomo) 2013-05-15 12:54:57 UTC

This should fix at least the first deadlock. Can you test? And if there's still something going wrong, attach new backtraces?

commit 83f247697670a163357d81ef78746ab2ebc02d6c
Author: Sebastian Dröge <sebastian.droege@collabora.co.uk>
Date:   Wed May 15 14:47:53 2013 +0200

    decodebin: Hold the expose lock when freeing a chain
    
    https://bugzilla.gnome.org/show_bug.cgi?id=700342

http://cgit.freedesktop.org/gstreamer/gst-plugins-base/commit/?id=83f247697670a163357d81ef78746ab2ebc02d6c

Comment 6 Sebastian Dröge (slomo) 2013-05-15 13:46:26 UTC

There are still some others left, and also crashes and g_warnings() and other things :)

Comment 7 Sebastian Dröge (slomo) 2013-05-15 13:48:01 UTC

Created attachment 244320 [details]
test.c

TIMEOUT is the number of milliseconds until the element is set back to READY/NULL
If REUSE is defined, the same playbin instance is used all the time

Comment 8 Tim-Philipp Müller 2013-05-15 13:59:31 UTC

There's also bug #695328 btw, which might be one of those things observed.

Comment 9 Sebastian Dröge (slomo) 2013-05-15 14:12:16 UTC

Yes, that crash has the same cause as one of the deadlocks I found :)

Basically, we can still be in our callbacks in decodebin, trying to autoplug, after the state change function has chained up to the parent for PAUSED->READY. We then free the decode chains that are still in use, leading to deadlocks and/or crashes :) The basic assumption here apparently was that we can never be in the autoplugging callbacks after chaining up to the parent's state change function.

Comment 10 Sebastian Dröge (slomo) 2013-05-15 14:12:34 UTC

*** Bug 695328 has been marked as a duplicate of this bug. ***

Comment 11 Sebastian Dröge (slomo) 2013-05-15 14:31:33 UTC

This has been broken since the 0.10 days, btw

Comment 12 Christoph Reiter (lazka) 2013-05-15 16:25:54 UTC

Created attachment 244338 [details]
gdb-log-3

Thanks,

I get the attached with current trunk.

Comment 13 Christoph Reiter (lazka) 2013-05-15 16:45:57 UTC

Created attachment 244340 [details]
gdb-log-4

And another one, probably the same, on trunk again.

Comment 14 Sebastian Dröge (slomo) 2013-05-22 17:20:51 UTC

Ok, what I said above really can't happen. When the decode chains are freed, the state change function has already chained up to the GstBin and GstElement state change functions. This would've stopped all dataflow while holding the pads' stream locks.

Now our callbacks are called from inside the data flow, and thus these stream locks are held. So the callbacks should've returned already before the GstBin and GstElement state change functions returned and before we free the decode chain.

That's for gdb-log-3.


For gdb-log-4, the above is exactly happening. The GstBin and GstElement state change functions are shutting down all data flow and are holding the pad's stream lock for that... unfortunately some multiqueue is holding the stream lock still and waiting forever for an allocation query to be answered.

Comment 15 Sebastian Dröge (slomo) 2013-05-22 17:24:58 UTC

(In reply to comment #14)

> For gdb-log-4, the above is exactly happening. The GstBin and GstElement state
> change functions are shutting down all data flow and are holding the pad's
> stream lock for that... unfortunately some multiqueue is holding the stream
> lock still and waiting forever for an allocation query to be answered.

And this seems to be possible in a few flushing scenarios, when the queue is flushed but pending queries are not woken up. The same has to be checked in queue and queue2

Comment 16 Sebastian Dröge (slomo) 2013-05-22 17:26:30 UTC

And for the queue problems see bug #688824 too

Comment 17 Sebastian Dröge (slomo) 2013-05-23 21:02:53 UTC

So although this bug is handled in bug #688824 and bug #690420, I'd prefer to keep it open until we have fixed these two bugs and confirmed that nothing else is left.

Comment 18 Sebastian Dröge (slomo) 2013-05-24 16:32:49 UTC

Author: Sebastian Dröge <slomo@circular-chaos.org>
Date:   Fri May 24 18:30:44 2013 +0200

    multiqueue: Make sure to always signal any possible pending serialized queries
    
    And don't unref them when flushing the queue, they're owned by the caller!
    
    https://bugzilla.gnome.org/show_bug.cgi?id=700342


This fixes the deadlock with the queue.
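
The idea behind this commit, that a flush must wake up anyone still waiting for a serialized query, can be sketched with a condition variable. This is a minimal sketch under stated assumptions, not the multiqueue implementation: `MiniQueue`, `wait_for_query`, `flush_queue` and `run_flush_demo` are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical miniature queue state; NOT the real multiqueue. */
typedef struct {
  pthread_mutex_t lock;
  pthread_cond_t query_handled;
  bool answered;   /* the query was answered downstream */
  bool flushing;   /* the queue is being flushed */
} MiniQueue;

typedef enum { QUERY_ANSWERED = 1, QUERY_FLUSHED = 2 } QueryResult;

/* Blocks until the query is answered or the queue starts flushing. */
static QueryResult wait_for_query(MiniQueue *q)
{
  QueryResult res;

  pthread_mutex_lock(&q->lock);
  while (!q->answered && !q->flushing)
    pthread_cond_wait(&q->query_handled, &q->lock);
  res = q->answered ? QUERY_ANSWERED : QUERY_FLUSHED;
  pthread_mutex_unlock(&q->lock);
  return res;
}

/* Marks the queue as flushing and wakes every pending waiter.
 * Without the broadcast, a thread in wait_for_query() would sleep forever. */
static void flush_queue(MiniQueue *q)
{
  pthread_mutex_lock(&q->lock);
  q->flushing = true;
  pthread_cond_broadcast(&q->query_handled);
  pthread_mutex_unlock(&q->lock);
}

static void *waiter_thread(void *data)
{
  return (void *) (intptr_t) wait_for_query((MiniQueue *) data);
}

/* Starts a waiter, flushes the queue, and returns what the waiter saw
 * (or -1 if the thread could not be created). */
static int run_flush_demo(void)
{
  MiniQueue q;
  pthread_t waiter;
  void *result = NULL;
  int ret = -1;

  pthread_mutex_init(&q.lock, NULL);
  pthread_cond_init(&q.query_handled, NULL);
  q.answered = false;
  q.flushing = false;

  if (pthread_create(&waiter, NULL, waiter_thread, &q) == 0) {
    flush_queue(&q);
    pthread_join(waiter, &result);
    ret = (int) (intptr_t) result;
  }

  pthread_cond_destroy(&q.query_handled);
  pthread_mutex_destroy(&q.lock);
  return ret;
}
```

The waiter re-checks both flags under the lock, so it returns `QUERY_FLUSHED` whether the flush happens before or after it starts waiting. The sketch does not model query ownership; the commit message's point that queries are owned by the caller and must not be unreffed on flush is separate from the wake-up shown here.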

Comment 19 Christoph Reiter (lazka) 2013-05-24 19:22:46 UTC

Created attachment 245265 [details]
gdb-log-5

No more deadlocks with trunk as far as I can tell, but I still get the attached crash.

Comment 20 Sebastian Dröge (slomo) 2013-05-24 21:04:19 UTC

Yes, I had that crash too, and also another crash in gst-libav that happens far more often.

Comment 21 Sebastian Dröge (slomo) 2013-07-10 07:58:32 UTC

Still a problem but not as bad as before.

Comment 22 Sebastian Dröge (slomo) 2013-08-16 09:54:00 UTC

*** Bug 690605 has been marked as a duplicate of this bug. ***

Comment 23 Tim-Philipp Müller 2016-02-23 17:21:49 UTC

I can't reproduce this any more, and there definitely have been fixes in that respect too, so let's close this. Please reopen if you still have problems.

Comment 24 Sebastian Dröge (slomo) 2016-02-23 17:29:39 UTC

All the commits I merged earlier today are related to this too, the ones from bug #759539 and the related ones.

Comment 25 Christoph Reiter (lazka) 2016-02-23 17:32:14 UTC

1.6 has been stable here. Thanks everyone!