GNOME Bugzilla – Bug 547835
tee release_request_pad while buffer_alloc racyness
Last modified: 2008-08-15 18:26:47 UTC
Releasing a request pad on a tee while a buffer_alloc is in progress on that same pad is racy. This goes wrong when gst_tee_find_buffer_alloc is iterating over the pads, picks a pad that's about to be released in the application's thread, unlocks tee's object lock to call gst_pad_alloc_buffer() on the pad, and while this is in progress (gst_pad_buffer_alloc_unchecked()), the application releases the pad. This unlinks and unparents the pad, and when gst_pad_buffer_alloc_unchecked() returns back to gst_pad_alloc_buffer_full() it detects that there's been a caps change because the caps have been cleared as a consequence of what the release request pad function did, so it tries to do gst_pad_configure_src() which fails with a g_warning() in gst_pad_get_caps_unlocked() because the pad no longer has any parent (obviously).
Created attachment 116611 [details] [review] tee_release_request_pad_while_buffer_alloc_testcase The attached patch adds a test-case to tests/check/tee.c that reproduces the race reliably by staging a disaster by orchestrating the interaction between the pretend streaming thread and pretend application thread. It reproduces exactly the same crash that we've experienced in application code.
s/exactly the same crash/exactly the same race/ We're always running the application with G_DEBUG containing "fatal_criticals". :)
s/which fails with a g_warning()/which fails with a g_critical()/ Think I need to get some sleep :)
Created attachment 116616 [details] [review] tee_release_request_pad_while_buffer_alloc_testcase_r2 Simplified and improved version of the original test. - Avoid deadlocking if the locking semantics change in the future, so give the app thread up to 3 seconds to finish the release_request_pad() before returning from buffer_alloc. - Reduced the number of global variables and spawn the 'app' thread from buffer_alloc to simplify things (d'oh!). - Try to be a nicer gstcheck citizen.
Created attachment 116647 [details] [review] tee_release_request_pad_while_buffer_alloc_testcase_r3 Refactored version of the test that doesn't add any global variables, and also has a convenient setup/teardown of the miniature tee harness. Preparing to write more tests. A related bug has also been found, for which a test will be written. Fixes for tee coming.
Created attachment 116659 [details] [review] tee_release_request_pad_while_buffer_alloc_testcase_r4 Added another test-case and refactored things a bit.
Created attachment 116661 [details] [review] tee_release_request_pad_while_buffer_alloc_fixes First take on a fix for both race-conditions.
Verified that this fixes the problem in our application.
It would be nice if core could make sure that pad_allocs are finished, probably with another mutex. That does not seem very possible right now so I guess we'll have to implement this in the plugins.
I fixed it a little with an extra variable, there was still a race where the pad would be removed and shut down first before the pad_alloc was called. Patch by: Ole André Vadla Ravnås <ole.andre.ravnas at tandberg com> * plugins/elements/gsttee.c: (gst_tee_finalize), (gst_tee_init), (gst_tee_request_new_pad), (gst_tee_release_pad), (gst_tee_find_buffer_alloc), (gst_tee_buffer_alloc): * plugins/elements/gsttee.h: Protect pad_alloc with a new lock so that we can be sure that nothing is performing a pad_alloc when removing the pad. Fixes #547835. * tests/check/elements/tee.c: (buffer_alloc_harness_setup), (buffer_alloc_harness_teardown), (app_thread_func), (final_sinkpad_bufferalloc), (GST_START_TEST), (tee_suite): Added testcase for shutdown race.
Ahh, good catch! Thanks! I'll see if maybe some more unit tests can be written to nail down any remaining races. Haven't yet investigated if pushing a buffer downstream is safe or if there's any race there.