Bug 759535 – gst_clock_id_unschedule isnt thread safe

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 759535 - gst_clock_id_unschedule isnt thread safe


Summary:	gst_clock_id_unschedule isnt thread safe


Status:	RESOLVED WONTFIX

Product:	GStreamer
Classification:	Platform
Component:	gstreamer (core)
Version:	1.4.5
Hardware:	Other Windows

Importance:	Normal major
Target Milestone:	git master
Assigned To:	GStreamer Maintainers
QA Contact:	GStreamer Maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2015-12-16 10:25 UTC by James Stevenson
Modified:	2016-04-12 06:58 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Test case for gst_clock_id_unschedule (763 bytes, text/plain) 2015-12-16 10:25 UTC, James Stevenson	Details

Description James Stevenson 2015-12-16 10:25:38 UTC

Created attachment 317489 [details]
Test case for gst_clock_id_unschedule

When using gst_clock_new_periodic_id and gst_clock_id_unschedule the documentation claims they are thread safe but it seems to be impossible to disable a timer without the risk of a race which can cause the application to core. I trace this down to the fact that the timer can be active after gst_clock_id_unschedule has returned. Which would make the api non thread safe to use.

I have attached a test case for this which will cause the valgrind invalid memory reads below

==11520== Invalid read of size 8
==11520==    at 0x57016DF: __GI_mempcpy (memcpy.S:107)
==11520==    by 0x56EE53C: _IO_file_xsputn@@GLIBC_2.2.5 (fileops.c:1312)
==11520==    by 0x56C0AB4: vfprintf (vfprintf.c:1642)
==11520==    by 0x56C9588: printf (printf.c:33)
==11520==    by 0x400B31: callback (in /home/james/tmp/a.out)
==11520==    by 0x4EC5421: gst_system_clock_async_thread (in /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0.403.0)
==11520==    by 0x51B8924: g_thread_proxy (in /lib/x86_64-linux-gnu/libglib-2.0.so.0.4200.1)
==11520==    by 0x545F0A4: start_thread (pthread_create.c:309)
==11520==    by 0x576ECFC: clone (clone.S:111)
==11520==  Address 0x6e574c3 is 3 bytes inside a block of size 12 free'd
==11520==    at 0x4C2BE10: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==11520==    by 0x400BF8: main (in /home/james/tmp/a.out)
==11520==

the test program can be compiled with  gcc -Wall gstclock.c `pkg-config --cflags --libs gstreamer-1.0` on ubuntu

Comment 1 Sebastian Dröge (slomo) 2015-12-16 10:33:55 UTC

That's correct and probably not possible to fix. What can happen is e.g. that gst_system_clock_async_thread() is just calling your callback, then unschedule() is called and there is no way to stop your callback from being called anymore.

What is threadsafe about this API however is that you can call all functions from any thread at any time. It's not guaranteed though that no thread is inside your callback somewhere after unschedule() is called.

Comment 2 Sebastian Dröge (slomo) 2015-12-16 10:36:15 UTC

Do you want to provide a documentation patch to make this clear? It's the same for pad probes and other similar callbacks btw.

Comment 3 Tim-Philipp Müller 2016-04-10 17:12:34 UTC

What shall we do about this?

Comment 4 Sebastian Dröge (slomo) 2016-04-11 06:20:21 UTC

I would consider this WONTFIX. It's the same problem with all functions that allow scheduling a callback to be called at some point (e.g. g_idle_add() has the same). It's how these things are working everywhere, and if you need more guarantees you have to do that from inside the callback.

Comment 5 James Stevenson 2016-04-11 10:11:12 UTC

I don't mind either way. I ended up making my own equivalent function to do the same to ensure the function had finished executing after the remove was called. This eneded up with a glib thread, mutex, condition wait to replace the timer usage.

So give you an example of the use case which causes the api to break for us. We have a gstreamer element which acts as a dataflow timeout in a pipeline. It does this by taking a timestamp of when the last time the chain function was run.

During the state transition from paused -> playing we start the timer. The timer then checks when the last buffer was seen and sends an event on the bus if it exceeds a certain value. Then during the state transition from playing -> paused we stop the timer.

However when the system is under load. It is possible to do the state transition playing -> pause -> null and free the element before the callback has had a chance to finish running which will in turn crash when it accesses the already free'ed memory.

Based on how the api call is made. From what I can tell its practically impossible to stop the bug like the above happening when the api is used with a callback like above. So it probably points to other yet unknown bugs in gstreamer

Comment 6 Sebastian Dröge (slomo) 2016-04-12 06:58:49 UTC

That's why the GDestroyNotify parameter of gst_clock_id_wait_async() exists. You somehow need to ensure that the user_data is going to be valid until the callback was called or the clock id is destroyed.