After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 547835 - tee release_request_pad while buffer_alloc racyness
tee release_request_pad while buffer_alloc racyness
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gstreamer (core)
git master
Other All
: Normal normal
: 0.10.21
Assigned To: Ole André Vadla Ravnås
GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2008-08-14 20:30 UTC by Ole André Vadla Ravnås
Modified: 2008-08-15 18:26 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
tee_release_request_pad_while_buffer_alloc_testcase (4.71 KB, patch)
2008-08-14 20:37 UTC, Ole André Vadla Ravnås
none Details | Review
tee_release_request_pad_while_buffer_alloc_testcase_r2 (4.38 KB, patch)
2008-08-14 22:52 UTC, Ole André Vadla Ravnås
none Details | Review
tee_release_request_pad_while_buffer_alloc_testcase_r3 (4.89 KB, patch)
2008-08-15 09:59 UTC, Ole André Vadla Ravnås
none Details | Review
tee_release_request_pad_while_buffer_alloc_testcase_r4 (6.10 KB, patch)
2008-08-15 12:03 UTC, Ole André Vadla Ravnås
committed Details | Review
tee_release_request_pad_while_buffer_alloc_fixes (2.08 KB, patch)
2008-08-15 12:05 UTC, Ole André Vadla Ravnås
committed Details | Review

Description Ole André Vadla Ravnås 2008-08-14 20:30:45 UTC
Releasing a request pad on a tee while a buffer_alloc is in progress on that same pad is racy.

This goes wrong when gst_tee_find_buffer_alloc is iterating over the pads, picks a pad that's about to be released in the application's thread, unlocks tee's object lock to call gst_pad_alloc_buffer() on the pad, and while this is in progress (gst_pad_buffer_alloc_unchecked()), the application releases the pad. This unlinks and unparents the pad, and when gst_pad_buffer_alloc_unchecked() returns back to gst_pad_alloc_buffer_full() it detects that there's been a caps change because the caps have been cleared as a consequence of what the release request pad function did, so it tries to do gst_pad_configure_src() which fails with a g_warning() in gst_pad_get_caps_unlocked() because the pad no longer has any parent (obviously).
Comment 1 Ole André Vadla Ravnås 2008-08-14 20:37:02 UTC
Created attachment 116611 [details] [review]
tee_release_request_pad_while_buffer_alloc_testcase

The attached patch adds a test-case to tests/check/tee.c that reproduces the race reliably by staging a disaster by orchestrating the interaction between the pretend streaming thread and pretend application thread. It reproduces exactly the same crash that we've experienced in application code.
Comment 2 Ole André Vadla Ravnås 2008-08-14 20:44:18 UTC
s/exactly the same crash/exactly the same race/  We're always running the application with G_DEBUG containing "fatal_criticals". :)
Comment 3 Ole André Vadla Ravnås 2008-08-14 20:46:43 UTC
s/which fails with a g_warning()/which fails with a g_critical()/

Think I need to get some sleep :)
Comment 4 Ole André Vadla Ravnås 2008-08-14 22:52:24 UTC
Created attachment 116616 [details] [review]
tee_release_request_pad_while_buffer_alloc_testcase_r2

Simplified and improved version of the original test.
- Avoid deadlocking if the locking semantics change in the future, so give the app thread up to 3 seconds to finish the release_request_pad() before returning from buffer_alloc.
- Reduced the number of global variables and spawn the 'app' thread from buffer_alloc to simplify things (d'oh!).
- Try to be a nicer gstcheck citizen.
Comment 5 Ole André Vadla Ravnås 2008-08-15 09:59:49 UTC
Created attachment 116647 [details] [review]
tee_release_request_pad_while_buffer_alloc_testcase_r3

Refactored version of the test that doesn't add any global variables, and also has a convenient setup/teardown of the miniature tee harness. Preparing to write more tests.

A related bug has also been found, for which a test will be written. Fixes for tee coming.
Comment 6 Ole André Vadla Ravnås 2008-08-15 12:03:05 UTC
Created attachment 116659 [details] [review]
tee_release_request_pad_while_buffer_alloc_testcase_r4

Added another test-case and refactored things a bit.
Comment 7 Ole André Vadla Ravnås 2008-08-15 12:05:57 UTC
Created attachment 116661 [details] [review]
tee_release_request_pad_while_buffer_alloc_fixes

First take on a fix for both race-conditions.
Comment 8 Ole André Vadla Ravnås 2008-08-15 12:20:52 UTC
Verified that this fixes the problem in our application.
Comment 9 Wim Taymans 2008-08-15 15:39:00 UTC
It would be nice if core could make sure that pad_allocs are finished, probably with another mutex. That does not seem very possible right now so I guess we'll have to implement this in the plugins.
Comment 10 Wim Taymans 2008-08-15 17:02:08 UTC
I fixed it a little with an extra variable, there was still a race where the pad would be removed and shut down first before the pad_alloc was called. 

        Patch by: Ole André Vadla Ravnås  <ole.andre.ravnas at tandberg com>

        * plugins/elements/gsttee.c: (gst_tee_finalize), (gst_tee_init),
        (gst_tee_request_new_pad), (gst_tee_release_pad),
        (gst_tee_find_buffer_alloc), (gst_tee_buffer_alloc):
        * plugins/elements/gsttee.h:
        Protect pad_alloc with a new lock so that we can be sure that nothing is
        performing a pad_alloc when removing the pad. Fixes #547835.

        * tests/check/elements/tee.c: (buffer_alloc_harness_setup),
        (buffer_alloc_harness_teardown), (app_thread_func),
        (final_sinkpad_bufferalloc), (GST_START_TEST), (tee_suite):
        Added testcase for shutdown race.
Comment 11 Ole André Vadla Ravnås 2008-08-15 18:26:47 UTC
Ahh, good catch! Thanks!  I'll see if maybe some more unit tests can be written to nail down any remaining races. Haven't yet investigated if pushing a buffer downstream is safe or if there's any race there.