Bug 700342 - decodebin: Crashes and deadlocks when setting to READY while still autoplugging
Status: RESOLVED OBSOLETE
Product: GStreamer
Classification: Platform
Component: gst-plugins-base
Version: 1.0.7
Hardware: Other Linux
Importance: Normal major
Target Milestone: NONE
Assigned To: GStreamer Maintainers
QA Contact: GStreamer Maintainers
playback
Duplicates: 690605 695328
Depends on:
Blocks:
 
 
Reported: 2013-05-14 20:45 UTC by Christoph Reiter (lazka)
Modified: 2016-02-23 17:32 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
gdb log (10.35 KB, text/plain)
2013-05-14 20:47 UTC, Christoph Reiter (lazka)
gdb-log-2 (33.15 KB, text/plain)
2013-05-14 20:52 UTC, Christoph Reiter (lazka)
test.c (3.10 KB, text/plain)
2013-05-15 13:48 UTC, Sebastian Dröge (slomo)
gdb-log-3 (36.53 KB, text/plain)
2013-05-15 16:25 UTC, Christoph Reiter (lazka)
gdb-log-4 (33.30 KB, text/plain)
2013-05-15 16:45 UTC, Christoph Reiter (lazka)
gdb-log-5 (15.08 KB, text/plain)
2013-05-24 19:22 UTC, Christoph Reiter (lazka)

Description Christoph Reiter (lazka) 2013-05-14 20:45:57 UTC


Thread 665 (Thread 0x7fffb971e700 (LWP 4433))

  • #0 __lll_lock_wait
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S line 135
  • #1 _L_lock_1134
    from /lib/x86_64-linux-gnu/libpthread.so.0
  • #2 __GI___pthread_mutex_lock
    at pthread_mutex_lock.c line 104
  • #3 g_mutex_lock
    at /tmp/buildd/glib2.0-2.36.1/./glib/gthread-posix.c line 210
  • #4 gst_decode_chain_is_complete
    at gstdecodebin2.c line 3194
  • #5 analyze_new_pad
    at gstdecodebin2.c line 1717
  • #6 pad_added_cb
    at gstdecodebin2.c line 2453
  • #7 caps_notify_cb
    at gstdecodebin2.c line 2567
  • #8 g_closure_invoke
    at /tmp/buildd/glib2.0-2.36.1/./gobject/gclosure.c line 777
  • #9 signal_emit_unlocked_R
    at /tmp/buildd/glib2.0-2.36.1/./gobject/gsignal.c line 3584
  • #10 g_signal_emit_valist
    at /tmp/buildd/glib2.0-2.36.1/./gobject/gsignal.c line 3328
  • #11 g_signal_emit
    at /tmp/buildd/glib2.0-2.36.1/./gobject/gsignal.c line 3384
  • #12 g_object_dispatch_properties_changed
    at /tmp/buildd/glib2.0-2.36.1/./gobject/gobject.c line 1042
  • #13 gst_object_dispatch_properties_changed
    at gstobject.c line 439
  • #14 g_object_notify_by_spec_internal
    at /tmp/buildd/glib2.0-2.36.1/./gobject/gobject.c line 1136
  • #15 g_object_notify_by_pspec
    at /tmp/buildd/glib2.0-2.36.1/./gobject/gobject.c line 1237
  • #16 gst_pad_store_sticky_event
    at gstpad.c line 4430
  • #17 gst_pad_push_event
    at gstpad.c line 4627
  • #18 gst_pad_set_caps
    at /usr/include/gstreamer-1.0/gst/gstcompat.h line 71
  • #19 gst_mpeg_audio_parse_handle_frame
    at gstmpegaudioparse.c line 721
  • #20 gst_base_parse_handle_buffer
    at gstbaseparse.c line 1828
  • #21 gst_base_parse_scan_frame
    at gstbaseparse.c line 2901
  • #22 gst_base_parse_loop
    at gstbaseparse.c line 2970
  • #23 gst_task_func
    at gsttask.c line 316
  • #24 g_thread_pool_thread_proxy
    at /tmp/buildd/glib2.0-2.36.1/./glib/gthreadpool.c line 309
  • #25 g_thread_proxy
    at /tmp/buildd/glib2.0-2.36.1/./glib/gthread.c line 798
  • #26 start_thread
    at pthread_create.c line 311
  • #27 clone
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S line 113

Comment 1 Christoph Reiter (lazka) 2013-05-14 20:47:38 UTC
Created attachment 244244 [details]
gdb log

Bugzilla seems to truncate the traceback.
Comment 2 Christoph Reiter (lazka) 2013-05-14 20:52:29 UTC
Created attachment 244245 [details]
gdb-log-2

And here is another one..
Comment 3 Sebastian Dröge (slomo) 2013-05-15 07:37:42 UTC
How can this be reproduced, do you have a testcase?

What seems to happen here is that decodebin is shut down, shutting down baseparse too, at the same time as baseparse is setting caps on a pad and making decodebin check if it can expose pads now.

As in one thread we go baseparse(stream-lock)->decodebin(chain-lock) and in another thread decodebin(chain-lock)->baseparse(stream-lock) this is deadlocking.
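
The lock-order inversion described above can be sketched with plain pthreads mutexes. This is only an illustration under assumptions: `stream_lock` and `chain_lock` are hypothetical stand-ins for the baseparse stream lock and the decodebin chain lock, and the try-lock-and-back-off helper is one common way to avoid this kind of deadlock, not necessarily what decodebin does. The contended case is simulated in a single thread by holding `chain_lock` before calling the helper.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two locks named in the comment above. */
static pthread_mutex_t stream_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t chain_lock  = PTHREAD_MUTEX_INITIALIZER;

/* Take `first`, then try to take `second` without blocking.
 * Returns true with both locks held, or false with neither held,
 * so a caller never sleeps on `second` while owning `first`. */
bool lock_both_or_back_off(pthread_mutex_t *first, pthread_mutex_t *second)
{
    assert(first != second);

    pthread_mutex_lock(first);
    if (pthread_mutex_trylock(second) != 0) {
        /* Someone else owns `second`: release `first` so that owner
         * can make progress instead of deadlocking against us. */
        pthread_mutex_unlock(first);
        return false;
    }
    return true;
}

/* Simulates the contended case: the "state change" side already holds
 * chain_lock when the "streaming" side wants stream_lock -> chain_lock.
 * Returns true if the helper backed off and left stream_lock free. */
bool back_off_when_chain_is_busy(void)
{
    pthread_mutex_lock(&chain_lock);

    bool got_both = lock_both_or_back_off(&stream_lock, &chain_lock);

    /* After backing off, stream_lock must be available again. */
    bool stream_free = (pthread_mutex_trylock(&stream_lock) == 0);
    if (stream_free)
        pthread_mutex_unlock(&stream_lock);

    pthread_mutex_unlock(&chain_lock);
    return !got_both && stream_free;
}
```

A caller that gets `false` back would typically retry later or give up; the other usual remedy is to make every code path take the two locks in the same order.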
Comment 4 Christoph Reiter (lazka) 2013-05-15 08:28:10 UTC
My testcase is creating a playlist of my whole library and holding down the "next song" key; after a few hundred songs it either deadlocks or crashes (https://bugzilla.gnome.org/show_bug.cgi?id=700340)

No simple one... sorry.
Comment 5 Sebastian Dröge (slomo) 2013-05-15 12:54:57 UTC
This should fix at least the first deadlock. Can you test? And if there's still something going wrong, attach new backtraces?

commit 83f247697670a163357d81ef78746ab2ebc02d6c
Author: Sebastian Dröge <sebastian.droege@collabora.co.uk>
Date:   Wed May 15 14:47:53 2013 +0200

    decodebin: Hold the expose lock when freeing a chain
    
    https://bugzilla.gnome.org/show_bug.cgi?id=700342

http://cgit.freedesktop.org/gstreamer/gst-plugins-base/commit/?id=83f247697670a163357d81ef78746ab2ebc02d6c
Comment 6 Sebastian Dröge (slomo) 2013-05-15 13:46:26 UTC
There are still some others left, and also crashes and g_warnings() and other things :)
Comment 7 Sebastian Dröge (slomo) 2013-05-15 13:48:01 UTC
Created attachment 244320 [details]
test.c

TIMEOUT is the number of milliseconds until the element is set back to READY/NULL
If REUSE is defined, the same playbin instance is used all the time
Comment 8 Tim-Philipp Müller 2013-05-15 13:59:31 UTC
There's also bug #695328 btw, which might be one of those things observed.
Comment 9 Sebastian Dröge (slomo) 2013-05-15 14:12:16 UTC
Yes, that crash has the same cause as one of the deadlocks I found :)

Basically, we can still be in our callbacks in decodebin, trying to autoplug, after the state change function has chained up to the parent for PAUSED->READY. We then free decode chains that are still in use, leading to deadlocks and/or crashes :) The basic assumption here apparently was that we can never be in the autoplugging callbacks after chaining up to the parent's state change function.
Comment 10 Sebastian Dröge (slomo) 2013-05-15 14:12:34 UTC
*** Bug 695328 has been marked as a duplicate of this bug. ***
Comment 11 Sebastian Dröge (slomo) 2013-05-15 14:31:33 UTC
This has been broken since the 0.10 days, btw.
Comment 12 Christoph Reiter (lazka) 2013-05-15 16:25:54 UTC
Created attachment 244338 [details]
gdb-log-3

Thanks,

I get the attached with current trunk.
Comment 13 Christoph Reiter (lazka) 2013-05-15 16:45:57 UTC
Created attachment 244340 [details]
gdb-log-4

And another one, probably the same, on trunk again.
Comment 14 Sebastian Dröge (slomo) 2013-05-22 17:20:51 UTC
Ok, what I said above really can't happen. When the decode chains are freed, the state change function has already chained up to the GstBin and GstElement state change functions. This would have stopped all data flow while holding the pads' stream locks.

Now, our callbacks are called from inside the data flow, and thus with these stream locks held. So the callbacks should have returned before the GstBin and GstElement state change functions returned, and before we free the decode chain.

That's for gdb-log-3.


For gdb-log-4, exactly the above is happening. The GstBin and GstElement state change functions are shutting down all data flow and are taking the pad's stream lock for that... unfortunately some multiqueue is still holding the stream lock and waiting forever for an allocation query to be answered.
Comment 15 Sebastian Dröge (slomo) 2013-05-22 17:24:58 UTC
(In reply to comment #14)

> For gdb-log-4, exactly the above is happening. The GstBin and GstElement state
> change functions are shutting down all data flow and are taking the pad's
> stream lock for that... unfortunately some multiqueue is still holding the
> stream lock and waiting forever for an allocation query to be answered.

And it seems this can happen in a few flushing scenarios, when the queue is flushed but pending queries are not woken up. The same has to be checked in queue and queue2.
Comment 16 Sebastian Dröge (slomo) 2013-05-22 17:26:30 UTC
And for the queue problems see bug #688824 too
Comment 17 Sebastian Dröge (slomo) 2013-05-23 21:02:53 UTC
So although this bug is handled in bug #688824 and bug #690420, I'd prefer to keep it open until we have fixed these two bugs and confirmed that nothing else is left.
Comment 18 Sebastian Dröge (slomo) 2013-05-24 16:32:49 UTC
Author: Sebastian Dröge <slomo@circular-chaos.org>
Date:   Fri May 24 18:30:44 2013 +0200

    multiqueue: Make sure to always signal any possible pending serialized queries
    
    And don't unref them when flushing the queue, they're owned by the caller!
    
    https://bugzilla.gnome.org/show_bug.cgi?id=700342


This fixes the deadlock with the queue.
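
The pattern behind this fix can be sketched with a plain pthreads condition variable. This is a minimal illustration, not multiqueue's actual code: `QueryQueue`, `query_queue_wait()`, `query_queue_flush()` and `flush_wakes_waiter()` are hypothetical names. The streaming thread waits for its query to be answered, and the flush path sets a flag and signals the condition so the waiter is never left sleeping.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a queue carrying one serialized query. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  query_handled; /* signalled when the waiter may go on */
    bool            answered;      /* downstream handled the query */
    bool            flushing;      /* queue is being flushed */
} QueryQueue;

void query_queue_init(QueryQueue *q)
{
    assert(q != NULL);
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->query_handled, NULL);
    q->answered = false;
    q->flushing = false;
}

/* Streaming thread: block until the query is answered or the queue is
 * flushed. Returns true if answered, false if woken up by a flush. */
bool query_queue_wait(QueryQueue *q)
{
    pthread_mutex_lock(&q->lock);
    while (!q->answered && !q->flushing)
        pthread_cond_wait(&q->query_handled, &q->lock);
    bool answered = q->answered;
    pthread_mutex_unlock(&q->lock);
    return answered;
}

/* Flush path: mark the queue as flushing and wake every waiter.
 * Without the broadcast, a waiter already asleep would hang forever.
 * Nothing owned by the waiter is released here. */
void query_queue_flush(QueryQueue *q)
{
    pthread_mutex_lock(&q->lock);
    q->flushing = true;
    pthread_cond_broadcast(&q->query_handled);
    pthread_mutex_unlock(&q->lock);
}

typedef struct {
    QueryQueue *queue;
    bool        answered;
} WaiterArgs;

static void *waiter_thread(void *data)
{
    WaiterArgs *args = data;
    args->answered = query_queue_wait(args->queue);
    return NULL;
}

/* Starts a waiter, flushes the queue, and joins the waiter.
 * Returns true if the waiter came back reporting "not answered". */
bool flush_wakes_waiter(void)
{
    QueryQueue q;
    WaiterArgs args = { &q, true };
    pthread_t thread;

    query_queue_init(&q);
    if (pthread_create(&thread, NULL, waiter_thread, &args) != 0)
        return false;

    query_queue_flush(&q);
    pthread_join(thread, NULL);

    pthread_cond_destroy(&q.query_handled);
    pthread_mutex_destroy(&q.lock);
    return !args.answered;
}
```

Because the waiter re-checks both flags under the lock, it does not matter whether the flush happens before or after the waiter goes to sleep.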
Comment 19 Christoph Reiter (lazka) 2013-05-24 19:22:46 UTC
Created attachment 245265 [details]
gdb-log-5

No more deadlocks with trunk as far as I can tell, but I still get the attached crash.
Comment 20 Sebastian Dröge (slomo) 2013-05-24 21:04:19 UTC
Yes, I had that crash too, and also some other crash in gst-libav that happens far more often.
Comment 21 Sebastian Dröge (slomo) 2013-07-10 07:58:32 UTC
Still a problem but not as bad as before.
Comment 22 Sebastian Dröge (slomo) 2013-08-16 09:54:00 UTC
*** Bug 690605 has been marked as a duplicate of this bug. ***
Comment 23 Tim-Philipp Müller 2016-02-23 17:21:49 UTC
I can't reproduce this any more, and there definitely have been fixes in that respect too, so let's close this. Please reopen if you still have problems.
Comment 24 Sebastian Dröge (slomo) 2016-02-23 17:29:39 UTC
All the commits I merged earlier today are related to this too, the ones from bug #759539 and the related ones.
Comment 25 Christoph Reiter (lazka) 2016-02-23 17:32:14 UTC
1.6 has been stable here. Thanks everyone!