GNOME Bugzilla – Bug 643087
pulsesink: deadlock in gst_pulseringbuffer_open_device
Last modified: 2011-03-08 15:33:09 UTC
I'm using decodebin2 and creating the audio sink when needed from the "new-decoded-pad" signal callback. Quite often I get a deadlock when two pipelines do this simultaneously. One thread is waiting in pulsesink.c:455: g_mutex_lock (pa_shared_resource_mutex); the other thread is waiting in pulsesink.c:511: pa_threaded_mainloop_wait (mainloop). In other words, the latter thread is holding pa_shared_resource_mutex, which it took further up in that function. Here's the full backtrace of the two deadlocked threads, using the git master of all repos:
+ Trace 226073
Thread 6 (Thread 0x7fffe6f5d700 (LWP 24602))
Thread 8 (Thread 0x7fffe5a8c700 (LWP 24604))
Don't be confused by the fact that my pa_shared_resource_mutex is spelled with one "s" less: I just cleaned up the "ressource" typo when I started debugging, and that's not what's causing this deadlock.
It looks very odd that even though thread 8 must have called pa_threaded_mainloop_lock (mainloop), thread 6 was also able to get that lock and instead got stuck waiting in g_mutex_lock (pa_shared_resource_mutex). Perhaps there's a bug in PulseAudio. A speculative fix might be to reverse the order in which the two locks are taken, which would hide such a bug. Will try that.
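For readers without pulsesink.c at hand, here is a stripped-down, hypothetical sketch of the locking sequence in gst_pulseringbuffer_open_device () as described above, based on this report and the diff quoted further down. The names mirror the real code; everything else is simplified, and the wait call is commented out because nothing would signal it here.

/* Build (assuming GLib and libpulse development files are installed):
 *   gcc open_device_order.c $(pkg-config --cflags --libs glib-2.0 libpulse)
 */
#include <glib.h>
#include <pulse/pulseaudio.h>

static GMutex resource_mutex_storage;
static GMutex *pa_shared_resource_mutex = &resource_mutex_storage; /* shared by all instances */
static pa_threaded_mainloop *mainloop;                             /* also shared */

static void
open_device_sketch (void)
{
  pa_threaded_mainloop_lock (mainloop);    /* taken first                                    */
  g_mutex_lock (pa_shared_resource_mutex); /* taken second (around pulsesink.c:455 above)    */

  /* ... context/stream setup elided ... */
  /* pa_threaded_mainloop_wait (mainloop); */ /* where the other thread sits (around :511)   */

  g_mutex_unlock (pa_shared_resource_mutex);
  pa_threaded_mainloop_unlock (mainloop);
}

int
main (void)
{
  mainloop = pa_threaded_mainloop_new ();
  pa_threaded_mainloop_start (mainloop);

  open_device_sketch ();

  pa_threaded_mainloop_stop (mainloop);
  pa_threaded_mainloop_free (mainloop);
  return 0;
}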
Just a guess, but maybe the lock order is different elsewhere, and that's what causes this deadlock?
In gst_pulsesink_change_state, pa_shared_resource_mutex is taken before pa_threaded_mainloop_new (obviously) and pa_threaded_mainloop_start, but I'm not seeing any other thread in the GStreamer pulsesink code at the time of the deadlock, so I doubt that's the problem. There is this thread, though:
+ Trace 226076
Thread 12 (Thread 0x7fffde486700 (LWP 24749))
I see that when _close_device calls _destroy_context, the locks are taken in the same order as in _open_device, so reversing the order in _open_device, as I had planned, without also changing _close_device would just create another deadlock instead.
Created attachment 181725 [details] [review] s/ressource/resource/
Created attachment 181726 [details] [review] experimental fix (introduces another deadlock, probably)
I installed libpulse0-dbg and have some further information. First, a full backtrace of all the threads (except Opera's main thread):

(gdb) thread apply all bt
+ Trace 226090
Thread 8 (Thread 0x7fffe5938700 (LWP 8054))
Thread 6 (Thread 0x7fffe6f5d700 (LWP 8052))
Now, threads 6 and 8 are still the interesting ones. Thread 6 is waiting in pa_threaded_mainloop_wait (m=0x7fffe81a6aa0) at pulse/thread-mainloop.c:212, that is, in pa_cond_wait(m->cond, m->mutex). That m->mutex is held by thread 8. How did that happen? It turns out it's not that mysterious: pa_threaded_mainloop_wait waits on a condition variable, which of course releases the lock! The documentation says so quite clearly: "While waiting the lock will be released, immediately before returning it will be acquired again." So this is a pretty basic deadlock, albeit a bit racy. The fix must be to untangle the use of pa_shared_resource_mutex and pa_threaded_mainloop_lock/unlock.
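To make the "wait releases the lock" point concrete, here is a tiny stand-alone GLib sketch (an analogy, not pulsesink code): the mutex and condition variable play the roles of m->mutex and m->cond, and while the waiter sits in g_cond_wait() the main thread can take the mutex, which is exactly the situation the backtraces show.

/* gcc wait_releases_lock.c $(pkg-config --cflags --libs glib-2.0) */
#include <glib.h>
#include <stdio.h>

static GMutex mutex;     /* stands in for m->mutex */
static GCond cond;       /* stands in for m->cond  */
static gboolean done;

static gpointer
waiter (gpointer data)
{
  g_mutex_lock (&mutex);
  while (!done)
    g_cond_wait (&cond, &mutex);   /* releases 'mutex' while blocked */
  g_mutex_unlock (&mutex);
  return NULL;
}

int
main (void)
{
  GThread *t = g_thread_new ("waiter", waiter, NULL);

  g_usleep (G_USEC_PER_SEC / 10);  /* give the waiter time to block in g_cond_wait() */

  g_mutex_lock (&mutex);           /* succeeds: the waiter does NOT hold it while waiting */
  printf ("main thread got the mutex while the waiter is in g_cond_wait()\n");
  done = TRUE;
  g_cond_signal (&cond);           /* the role a context state callback would play */
  g_mutex_unlock (&mutex);

  g_thread_join (t);
  return 0;
}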
Created attachment 181813 [details] [review] naive patch

Here's a patch that should fix the problem and is probably safe, but I have another one that is probably better but needs careful review.
Created attachment 181814 [details] [review] candidate fix

This patch is the one I'm now using and with it I can't reproduce the deadlock.
Finally, there are some things I find odd about the code, here in the form of comments in a diff (after "s/ressource/resource/" and "candidate fix" have been applied):

diff --git a/ext/pulse/pulsesink.c b/ext/pulse/pulsesink.c
index c45acb9..ddce402 100644
--- a/ext/pulse/pulsesink.c
+++ b/ext/pulse/pulsesink.c
@@ -337,6 +337,7 @@ gst_pulseringbuffer_finalize (GObject * object)
   ringbuffer = GST_PULSERING_BUFFER_CAST (object);
 
+  /* why is this not protected by the mainloop lock? */
   gst_pulsering_destroy_context (ringbuffer);
 
   G_OBJECT_CLASS (ring_parent_class)->finalize (object);
 }
@@ -451,6 +452,8 @@ gst_pulseringbuffer_open_device (GstRingBuffer * buf)
   else
     pbuf->context_name = g_strdup (psink->client_name);
 
+  /* is it really necessary to take this lock here rather than just
+     after the g_mutex_unlock (pa_shared_resource_mutex) below? */
   pa_threaded_mainloop_lock (mainloop);
 
   g_mutex_lock (pa_shared_resource_mutex);
(In reply to comment #12)
>   ringbuffer = GST_PULSERING_BUFFER_CAST (object);
> 
> +  /* why is this not protected by the mainloop lock? */
>   gst_pulsering_destroy_context (ringbuffer);
> 
>   G_OBJECT_CLASS (ring_parent_class)->finalize (object);

Because the mainloop should not be running anymore at this point; everything should already have been cleaned up when going back to NULL state.

> @@ -451,6 +452,8 @@ gst_pulseringbuffer_open_device (GstRingBuffer * buf)
>   else
>     pbuf->context_name = g_strdup (psink->client_name);
> 
> +  /* is it really necessary to take this lock here rather than just
> +     after the g_mutex_unlock (pa_shared_resource_mutex) below? */
>   pa_threaded_mainloop_lock (mainloop);
> 
>   g_mutex_lock (pa_shared_resource_mutex);

I think it is, because it might create a new pulse context with that mainloop if none exists already.
(In reply to comment #13)
> (In reply to comment #12)
> > @@ -451,6 +452,8 @@ gst_pulseringbuffer_open_device (GstRingBuffer * buf)
> >   else
> >     pbuf->context_name = g_strdup (psink->client_name);
> > 
> > +  /* is it really necessary to take this lock here rather than just
> > +     after the g_mutex_unlock (pa_shared_resource_mutex) below? */
> >   pa_threaded_mainloop_lock (mainloop);
> > 
> >   g_mutex_lock (pa_shared_resource_mutex);
> 
> I think it is, because it might create a new pulse context with that mainloop
> if none exists already.

The documentation for pa_context_new and pa_threaded_mainloop_get_api doesn't say that a lock is needed, but I guess there's no harm.
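For reference, the usual pattern in a threaded-mainloop PulseAudio client is to do the pa_threaded_mainloop_get_api()/pa_context_new() calls with the mainloop lock held and to synchronize through the state callback plus pa_threaded_mainloop_signal()/_wait(). Here is a self-contained sketch of that pattern, independent of pulsesink; the program structure and the "lock-order-test" name are made up for illustration, and it needs a running PulseAudio daemon to actually reach PA_CONTEXT_READY.

/* gcc context_under_lock.c $(pkg-config --cflags --libs libpulse) */
#include <pulse/pulseaudio.h>
#include <stdio.h>

static void
context_state_cb (pa_context *c, void *userdata)
{
  pa_threaded_mainloop *m = userdata;
  /* Runs in the mainloop thread; wake up whoever is in pa_threaded_mainloop_wait() */
  pa_threaded_mainloop_signal (m, 0);
}

int
main (void)
{
  pa_threaded_mainloop *m = pa_threaded_mainloop_new ();
  pa_context *ctx;

  pa_threaded_mainloop_start (m);

  pa_threaded_mainloop_lock (m);

  /* Context creation done under the mainloop lock, as pulsesink does */
  ctx = pa_context_new (pa_threaded_mainloop_get_api (m), "lock-order-test");
  pa_context_set_state_callback (ctx, context_state_cb, m);

  if (pa_context_connect (ctx, NULL, PA_CONTEXT_NOFLAGS, NULL) < 0) {
    fprintf (stderr, "pa_context_connect() failed\n");
  } else {
    for (;;) {
      pa_context_state_t state = pa_context_get_state (ctx);
      if (state == PA_CONTEXT_READY) {
        printf ("context is ready\n");
        break;
      }
      if (state == PA_CONTEXT_FAILED || state == PA_CONTEXT_TERMINATED) {
        fprintf (stderr, "context failed or terminated\n");
        break;
      }
      /* Releases the mainloop lock while waiting, reacquires it before returning */
      pa_threaded_mainloop_wait (m);
    }
  }

  pa_context_disconnect (ctx);
  pa_context_unref (ctx);
  pa_threaded_mainloop_unlock (m);

  pa_threaded_mainloop_stop (m);
  pa_threaded_mainloop_free (m);
  return 0;
}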
Created attachment 181825 [details] [review] pulsesink: Avoid deadlock in open_device()

Make sure that we have the resource lock before locking the mainloop to avoid a deadlock when there are two instances opening the device concurrently.
From what I can tell, the deadlock is caused by the following sequence:

1. Thread 1 takes the mainloop lock followed by the resource lock in open_device(), sets up a context, and calls wait()
2. In the meantime, thread 2 enters open_device(), takes the mainloop lock (which is a recursive lock, and holding it blocks any events from being dispatched), and then blocks on the resource lock
3. Since thread 2 is sleeping with the mainloop lock held, the context state callback is never dispatched, so it cannot signal thread 1 to wake up (see the sketch below)

Philip, could you try the patch I've attached to see if the deadlock goes away?
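To make these three steps concrete outside of GStreamer, here is a deliberately simplified, hypothetical GLib analogy: mainloop_lock stands for the shared pa_threaded_mainloop lock, resource_lock for pa_shared_resource_mutex, and a third thread plays the role of the PulseAudio mainloop thread that would dispatch the context state callback. The real pa_threaded_mainloop_wait() waits on the mainloop's own mutex; here the wait is modelled with a separate mutex/cond pair, but the deadlock cycle is the same. The program intentionally gets stuck and just reports that before exiting.

/* gcc three_way_deadlock.c $(pkg-config --cflags --libs glib-2.0) */
#include <glib.h>
#include <stdio.h>
#include <stdlib.h>

static GRecMutex mainloop_lock;     /* the (recursive) mainloop lock     */
static GMutex resource_lock;        /* pa_shared_resource_mutex          */
static GMutex wait_mutex;           /* backs the fake "mainloop wait"    */
static GCond wait_cond;
static gboolean context_ready;

/* Thread 1: open_device() on the first pulsesink */
static gpointer thread1 (gpointer data)
{
  g_rec_mutex_lock (&mainloop_lock);
  g_mutex_lock (&resource_lock);

  /* "pa_threaded_mainloop_wait": drop the mainloop lock, sleep until signalled */
  g_rec_mutex_unlock (&mainloop_lock);
  g_mutex_lock (&wait_mutex);
  while (!context_ready)
    g_cond_wait (&wait_cond, &wait_mutex);  /* never signalled: stuck here */
  g_mutex_unlock (&wait_mutex);

  g_mutex_unlock (&resource_lock);
  return NULL;
}

/* Thread 2: open_device() on the second pulsesink */
static gpointer thread2 (gpointer data)
{
  g_rec_mutex_lock (&mainloop_lock);  /* fine: thread 1 released it in "wait"  */
  g_mutex_lock (&resource_lock);      /* blocks: thread 1 still holds it       */
  g_mutex_unlock (&resource_lock);
  g_rec_mutex_unlock (&mainloop_lock);
  return NULL;
}

/* "Mainloop" thread: would dispatch the context state callback */
static gpointer mainloop_thread (gpointer data)
{
  g_rec_mutex_lock (&mainloop_lock);  /* blocks: thread 2 holds it */
  g_mutex_lock (&wait_mutex);
  context_ready = TRUE;               /* never reached */
  g_cond_signal (&wait_cond);
  g_mutex_unlock (&wait_mutex);
  g_rec_mutex_unlock (&mainloop_lock);
  return NULL;
}

int
main (void)
{
  /* The sleeps just make the described interleaving very likely */
  g_thread_new ("t1", thread1, NULL);
  g_usleep (100 * 1000);
  g_thread_new ("t2", thread2, NULL);
  g_usleep (100 * 1000);
  g_thread_new ("mainloop", mainloop_thread, NULL);

  g_usleep (G_USEC_PER_SEC);
  printf ("still stuck after 1s: t1 waits for the signal, t2 waits for the "
          "resource lock, the mainloop thread waits for the mainloop lock\n");
  exit (0);  /* give up; joining any of the threads would block forever */
}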
Arun, what you describe is correct. Your patch is almost identical to my "experimental fix" in comment #7 <https://bugzilla.gnome.org/show_bug.cgi?id=643087#c7>, on which I commented "introduces another deadlock, probably". The problem is that your code takes the locks in the order "resource lock, mainloop lock", while there's other code that takes them in the reverse order, at least gst_pulseringbuffer_close_device and gst_pulseringbuffer_release + gst_pulsering_destroy_stream. In other words, that fix will introduce another deadlock, which will trigger when pulsesinks are created and destroyed at the same time, and that will eventually happen.
In the interest of figuring out how widespread this deadlock is, I found that the problem was probably introduced by <http://cgit.freedesktop.org/gstreamer/gst-plugins-good/commit/?id=ae8d210fdbad6bd79824bb57c7356f4e10e7c5b5>, a refactoring done by Stefan Kost. I tried CC'ing Stefan but don't appear to have the privileges to do that. Could someone add him? He might have some insight into what exactly the two locks are supposed to protect...
Philip, I am reading bug email anyway. That commit was a fixup of previous patches made by your colleague Phil :) As you can see, the patch does not change the lock order. Are you saying that everything works if you revert the patch?
Stefan, I haven't tried reverting to before your commit; it's just that before it, each pulsesink had its own pa_threaded_mainloop, so the situation where one thread locks the shared pa_threaded_mainloop while another is in pa_threaded_mainloop_wait simply couldn't happen. Anyway, I just thought you might have a better understanding of what exactly each lock is supposed to protect, if you wanted to comment.
To clarify, the deadlock can only be reproduced when you have at least two pulsesink elements. The reason that I can easily reproduce it is because I have a HTML test case with two <video> elements using the same slow-loading data source. This creates two GStreamer pipelines which are fed with the same data at (almost) the same time, so the two pulsesink elements will be created at (almost) the same time.
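For anyone who wants to try reproducing this without a browser, something along the following lines should exercise the same code path: two pipelines, each ending in its own pulsesink, brought to PLAYING from two threads at (almost) the same time. This is a hypothetical stand-alone sketch against the GStreamer 0.10-era API, not the actual test case described above.

/* gcc two_pulsesinks.c $(pkg-config --cflags --libs gstreamer-0.10) */
#include <gst/gst.h>

static gpointer
start_pipeline (gpointer data)
{
  /* audiotestsrc keeps the sketch self-contained; the original test case uses
   * decodebin2 with real media, but anything ending in pulsesink goes through
   * gst_pulseringbuffer_open_device () */
  GstElement *pipeline = gst_parse_launch ("audiotestsrc ! pulsesink", NULL);

  gst_element_set_state (pipeline, GST_STATE_PLAYING);
  g_usleep (2 * G_USEC_PER_SEC);
  gst_element_set_state (pipeline, GST_STATE_NULL);
  gst_object_unref (pipeline);
  return NULL;
}

int
main (int argc, char **argv)
{
  GThread *t1, *t2;

  if (!g_thread_supported ())
    g_thread_init (NULL);
  gst_init (&argc, &argv);

  /* Bring both pulsesinks up as close to simultaneously as possible; run it in
   * a loop if the race does not hit on the first attempt */
  t1 = g_thread_create (start_pipeline, NULL, TRUE, NULL);
  t2 = g_thread_create (start_pipeline, NULL, TRUE, NULL);

  g_thread_join (t1);
  g_thread_join (t2);
  return 0;
}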
(In reply to comment #21) > To clarify, the deadlock can only be reproduced when you have at least two > pulsesink elements. The reason that I can easily reproduce it is because I have > a HTML test case with two <video> elements using the same slow-loading data > source. This creates two GStreamer pipelines which are fed with the same data > at (almost) the same time, so the two pulsesink elements will be created at > (almost) the same time. Yes, I understand that. But the pa_threaded_mainloop was in the *class* structure before. That means all instances where already sharing the same instance. The chain of chainged that lead to the current setup started with the patch http://cgit.freedesktop.org/gstreamer/gst-plugins-good/commit/?id=69a397c32f4baf07a7b2937c610f9e8f383e9ae9 in bug #624338 I was fixing that with http://cgit.freedesktop.org/gstreamer/gst-plugins-good/commit/?id=f62dc6976b611384c98efb37d407b5299daf8c17 some discussion in bug #628996
(In reply to comment #22)
> (In reply to comment #21)
> > To clarify, the deadlock can only be reproduced when you have at least two
> > pulsesink elements. The reason that I can easily reproduce it is because I
> > have a HTML test case with two <video> elements using the same slow-loading
> > data source. This creates two GStreamer pipelines which are fed with the
> > same data at (almost) the same time, so the two pulsesink elements will be
> > created at (almost) the same time.
> 
> Yes, I understand that. But the pa_threaded_mainloop was in the *class*
> structure before. That means all instances were already sharing the same
> instance.

Oh, I wasn't paying attention; I just saw psink->mainloop and assumed that there was one mainloop per instance before.

> The chain of changes that led to the current setup started with the patch
> http://cgit.freedesktop.org/gstreamer/gst-plugins-good/commit/?id=69a397c32f4baf07a7b2937c610f9e8f383e9ae9
> in bug #624338
> 
> I was fixing that with
> http://cgit.freedesktop.org/gstreamer/gst-plugins-good/commit/?id=f62dc6976b611384c98efb37d407b5299daf8c17
> (some discussion in bug #628996)

OK, so it was probably introduced there, which means it has been present since release 0.10.26. Anyway, the problem seems mostly clear; I'm hoping for a code review of my "candidate fix" patch.
Comment on attachment 181825 [details] [review] pulsesink: Avoid deadlock in open_device()

(patch is incorrect, as pointed out)
Comment on attachment 181813 [details] [review] naive patch

Obsoleting "naive patch"; I don't really think we should solve the problem like this.
After looking over the code again, the "candidate fix" looks good. Pushed both your patches - thanks.