GNOME Bugzilla – Bug 758285
glib: Don't store CF run loop to avoid racy cleanup
Last modified: 2015-12-01 13:07:11 UTC
On OSX, the pipeline/simple_launch_lines unit test (i.e. test_2_elements) crashes. This seems to be caused by subsequent calls to gst_bus_poll (the test checks for message types), each causing a GMainContext to be acquired/released on the main thread, which exposes a race condition in the Cocoa-specific handling. The race is between the Cocoa-specific portion of g_main_context_release and the internal cocoa_select thread: (1) A GMainContext is created, the CF run loop is acquired and the cocoa select thread is polling some descriptors (2) The instance from (1) is released, the CF runloop variable is set to NULL. (3) The select thread is done with polling from (1), and tries to wake up the CF runloop via the static variable that is now set to NULL. (3) will crash with CFRunLoopWakeUp being called on the NULL pointer of the static runloop variable. After reading through the related code, I think there are three ways to fix this: A: shut down the cocoa_select thread when releasing the context B: Check CF run loop var for NULL in a thread-safe manner before calling CFRunLoopWakeUp C: Avoid storing the CoreFoundation main run loop in a static variable ad A: not sure about repercussion when using recursive main contexts. The teardown and restart of the select thread might be a bit more involved to do this correctly, and introduce further race conditions ad B: Guarding CFRunLookWakeUp in the cocoa_select thread with if (!g_atomic_pointer_get (&cocoa_main_context)) CFRunLoopWakeUp (cocoa_main_thread_run_loop); ... so basically, when we re-set the cocoa_main_context to NULL, we also wouldn't wake up the run loop anymore. ad C: Instead of storing the CF main thread run loop, we can just use CFRunLoopGetMain where needed. In the above race condition, the run loop would then be woken up to do nothing, which is fine and not racy anymore. There is always a single main run loop per process anyway, so there is no need to store this. Implementation of CFRunLoopGetMain is seen here: https://github.com/opensource-apple/CF/blob/master/CFRunLoop.c#L1482 A plus is that this call checks for the process being forked, in which case all of our other code would probably not work, anyway. For [C], I have attached a patch (for the glib patch in cerbero). I like this solution most because it removes a static global variable, which is always prone to race conditions, and because it requires only minimal changes. Besides pipeline/simple_launch_lines, other tests failed with the same crash as described above: - gst/gstbin test_stream_start - elements/multiqeue test_simple_shutdown_while_running After applying the submitted patch those tests and simple_launch_lines will pass again (on OSX 10.11).
Created attachment 315827 [details] [review] glib: Don't store CF run loop to avoid racy cleanup After polling for file descriptors, the Cocoa select thread did wake up the main run loop instance that has been stored in a static variable. This variable might have already been set to NULL by g_main_context_release, resulting in a segfault of CFRunLoopWakeUp. This race is fixed by this commit by simply not storing the main run loop in a variable, but instead calling CFRunLoopGetMain locally where needed. Within a single process, the main run loop is always the same, and always accessible. It is therefore not necessary anyway to remember the run loop instance when acquiring the context.
Attachment 315827 [details] pushed as e75ddb0 - glib: Don't store CF run loop to avoid racy cleanup