GNOME Bugzilla – Bug 759535
gst_clock_id_unschedule isnt thread safe
Last modified: 2016-04-12 06:58:49 UTC
Created attachment 317489 [details] Test case for gst_clock_id_unschedule When using gst_clock_new_periodic_id and gst_clock_id_unschedule the documentation claims they are thread safe but it seems to be impossible to disable a timer without the risk of a race which can cause the application to core. I trace this down to the fact that the timer can be active after gst_clock_id_unschedule has returned. Which would make the api non thread safe to use. I have attached a test case for this which will cause the valgrind invalid memory reads below ==11520== Invalid read of size 8 ==11520== at 0x57016DF: __GI_mempcpy (memcpy.S:107) ==11520== by 0x56EE53C: _IO_file_xsputn@@GLIBC_2.2.5 (fileops.c:1312) ==11520== by 0x56C0AB4: vfprintf (vfprintf.c:1642) ==11520== by 0x56C9588: printf (printf.c:33) ==11520== by 0x400B31: callback (in /home/james/tmp/a.out) ==11520== by 0x4EC5421: gst_system_clock_async_thread (in /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0.403.0) ==11520== by 0x51B8924: g_thread_proxy (in /lib/x86_64-linux-gnu/libglib-2.0.so.0.4200.1) ==11520== by 0x545F0A4: start_thread (pthread_create.c:309) ==11520== by 0x576ECFC: clone (clone.S:111) ==11520== Address 0x6e574c3 is 3 bytes inside a block of size 12 free'd ==11520== at 0x4C2BE10: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==11520== by 0x400BF8: main (in /home/james/tmp/a.out) ==11520== the test program can be compiled with gcc -Wall gstclock.c `pkg-config --cflags --libs gstreamer-1.0` on ubuntu
That's correct and probably not possible to fix. What can happen is e.g. that gst_system_clock_async_thread() is just calling your callback, then unschedule() is called and there is no way to stop your callback from being called anymore. What is threadsafe about this API however is that you can call all functions from any thread at any time. It's not guaranteed though that no thread is inside your callback somewhere after unschedule() is called.
Do you want to provide a documentation patch to make this clear? It's the same for pad probes and other similar callbacks btw.
What shall we do about this?
I would consider this WONTFIX. It's the same problem with all functions that allow scheduling a callback to be called at some point (e.g. g_idle_add() has the same). It's how these things are working everywhere, and if you need more guarantees you have to do that from inside the callback.
I don't mind either way. I ended up making my own equivalent function to do the same to ensure the function had finished executing after the remove was called. This eneded up with a glib thread, mutex, condition wait to replace the timer usage. So give you an example of the use case which causes the api to break for us. We have a gstreamer element which acts as a dataflow timeout in a pipeline. It does this by taking a timestamp of when the last time the chain function was run. During the state transition from paused -> playing we start the timer. The timer then checks when the last buffer was seen and sends an event on the bus if it exceeds a certain value. Then during the state transition from playing -> paused we stop the timer. However when the system is under load. It is possible to do the state transition playing -> pause -> null and free the element before the callback has had a chance to finish running which will in turn crash when it accesses the already free'ed memory. Based on how the api call is made. From what I can tell its practically impossible to stop the bug like the above happening when the api is used with a callback like above. So it probably points to other yet unknown bugs in gstreamer
That's why the GDestroyNotify parameter of gst_clock_id_wait_async() exists. You somehow need to ensure that the user_data is going to be valid until the callback was called or the clock id is destroyed.