GNOME Bugzilla – Bug 635784
ringbuffer: make sure to not start if the may_start flag is FALSE
Last modified: 2011-04-18 09:41:09 UTC
Created attachment 175244 [details] [review] patch This would potentially deadlock when changing the element-state up and down.
Looks good to me... Wim?
I think this papers over another problem and might not fix it fundamentally. Can you provide more info about what sequence of events lead to a deadlock or a test case?
Håvard: ping?
Pong! I have been quite busy lately, but will try to find time to provide a better overview of the problem :)
Removing blocker status.
Håvard: Ping
Ping noted. I am reverting the patch and reproducing locally as to get some callstacks :) BRB
Here are the deadlocking callstacks: Callstack 1: libgthread-2.0-0.dll!g_mutex_lock_errorcheck_impl(_GMutex * mutex=0x012933b0, const unsigned long magic=4175530711, char * const location=0x0054b8f8) Line 112 + 0xc bytes C > libgstbase-0.10-0.dll!gst_base_src_set_playing(_GstBaseSrc * basesrc=0x0125d178, int live_play=0) Line 2911 + 0x2a bytes C libgstbase-0.10-0.dll!gst_base_src_change_state(_GstElement * element=0x0125d178, GstStateChange transition=GST_STATE_CHANGE_PLAYING_TO_PAUSED) Line 3115 + 0xb bytes C libgstaudio-0.10-0.dll!gst_base_audio_src_change_state(_GstElement * element=0x0125d178, GstStateChange transition=GST_STATE_CHANGE_PLAYING_TO_PAUSED) Line 1149 + 0x27 bytes C libtaaaudio.dll!gst_duplex_audio_src_change_state(_GstElement * element=0x0125d178, GstStateChange transition=GST_STATE_CHANGE_PLAYING_TO_PAUSED) Line 448 + 0x27 bytes C libgstreamer-0.10-0.dll!gst_element_change_state(_GstElement * element=0x0125d178, GstStateChange transition=GST_STATE_CHANGE_PLAYING_TO_PAUSED) Line 2717 + 0x15 bytes C libgstreamer-0.10-0.dll!gst_element_set_state_func(_GstElement * element=0x0125d178, GstState state=GST_STATE_PAUSED) Line 2673 + 0xd bytes C libgstreamer-0.10-0.dll!gst_element_set_state(_GstElement * element=0x0125d178, GstState state=GST_STATE_PAUSED) Line 2574 + 0x15 bytes C gst-plugins-taa-test.exe!test_intense_paused_playing_on_src() Line 617 + 0xb bytes C Callstack 2: libgthread-2.0-0.dll!g_cond_wait_errorcheck_impl(_GCond * cond=0x012ae650, _GMutex * mutex=0x012ae600, const unsigned long magic=4175530711, char * const location=0x00a9eff0) Line 206 + 0x10 bytes C libgstaudio-0.10-0.dll!wait_segment(_GstRingBuffer * buf=0x01215bd8) Line 1478 + 0x2f bytes C libgstaudio-0.10-0.dll!gst_ring_buffer_read(_GstRingBuffer * buf=0x01215bd8, unsigned __int64 sample=0, unsigned char * data=0x0120e380, unsigned int len=480) Line 1893 + 0x9 bytes C libgstaudio-0.10-0.dll!gst_base_audio_src_create(_GstBaseSrc * bsrc=0x0125d178, unsigned __int64 offset=18446744073709551615, unsigned int length=1920, _GstBuffer * * outbuf=0x0167fa80) Line 811 + 0x19 bytes C libgstbase-0.10-0.dll!gst_base_src_get_range(_GstBaseSrc * src=0x0125d178, unsigned __int64 offset=18446744073709551615, unsigned int length=0, _GstBuffer * * buf=0x0167fa80) Line 2153 + 0x21 bytes C libgstbase-0.10-0.dll!gst_base_src_loop(_GstPad * pad=0x01293cf0) Line 2410 + 0x19 bytes C libgstreamer-0.10-0.dll!gst_task_func(_GstTask * task=0x012c6e58) Line 318 + 0x11 bytes C libgstreamer-0.10-0.dll!default_func(TaskData * tdata=0x011f6368, _GstTaskPool * pool=0x011ede78) Line 70 + 0x9 bytes C libglib-2.0-0.dll!g_thread_pool_thread_proxy(void * data=0x011eddd8) Line 325 + 0x14 bytes C libglib-2.0-0.dll!g_thread_create_proxy(void * data=0x0120dae0) Line 1955 + 0x10 bytes C libgthread-2.0-0.dll!g_thread_proxy(void * data=0x012c6f68) Line 509 + 0x10 bytes C As you can see, Cs1 is waiting for the LIVE_LOCK, that is held by Cs2 in gst_base_src_loop. Cs2 on the other hand is waiting for data to be written to the ringbuffer. The code-snippets of interest is the state-change PLAYING_TO_PAUSED in gstbaseaudiosrc.c: case GST_STATE_CHANGE_PLAYING_TO_PAUSED: GST_DEBUG_OBJECT (src, "PLAYING->PAUSED"); gst_ring_buffer_may_start (src->ringbuffer, FALSE); gst_ring_buffer_pause (src->ringbuffer); break; and the beginning of wait_segment inside gstringbuffer.c: if (G_UNLIKELY (g_atomic_int_get (&buf->abidata.ABI.may_start) == FALSE)) goto no_start; GST_DEBUG_OBJECT (buf, "start!"); segments = g_atomic_int_get (&buf->segdone); gst_ring_buffer_start (buf); Now, if both _may_start and _pause is called *in between* the ABI.may_start check and gst_ring_buffer_start, what will happen? The buffer will basically change state from PAUSED to PLAYING, and will go on to wait forever in the wait, since all mechanisms that should prevent that have failed: 1. Setting state-flags have been reverted by gst_ringbuffer_start () 2. The may_start check came to early (may_start = FALSE happened just after the check) 3. The signaling of the waiter has also happened to early (not waiting yet) The way to combat this is to add a check inside _start to see if the buffer really is allowed to start, and this is safe because both _pause and _start take the buf-lock, so it will be serialized nicely. ... and that is exactly what this patch does! :) One could also argue that this patch makes the check: if (G_UNLIKELY (g_atomic_int_get (&buf->abidata.ABI.may_start) == FALSE)) goto no_start; ...superfluous, since it will be checked for properly, serialized with _pause, inside _start, but I have not analyzed whether this would have any other side-effects.
Clear deadlock in BaseAudioSrc with easy fix. Sounds like blocker :)
But it's not necessarily the case. Thanks for your interested in making GStreamer better, but adjusting priority or severity fields doesn't help it. Your patch will be seen soon and this issue hopefully resolved.
I've to say that your explanation makes sense and the patch too. Let's get this into 0.10.33 :) Tim?
If you want a test-case, all I do is to instantiate a src that inherits from BaseAudioSrc, and then do something like this: for (i = 0; i < 10000000; i++) { gst_element_set_state (src, GST_STATE_PLAYING); g_usleep (G_USEC_PER_SEC / 10000); gst_element_set_state (src, GST_STATE_PAUSED); g_usleep (G_USEC_PER_SEC / 10000); g_print("\r%d", i); } If the counting stops, I´ve got the deadlock. Without the patch it stops within a minute. With, I have ran it for 5 hours and no problem.
Yes, I understand now. It also then looks like the other may_start check is not needed anymore.
commit d9f1b3736e0e6927e513e95ba934236101ca0a82 Author: Håvard Graff <havard.graff@.eu.tandberg.int> Date: Thu Nov 25 17:01:53 2010 +0100 ringbuffer: make sure to not start if the may_start flag is FALSE Fixes #635784