GNOME Bugzilla – Bug 646566
Protect against Pad-Parent disappearing
Last modified: 2011-05-09 09:57:28 UTC
We have done a lot of work on stress-testing element for different scenarios where the pad-parent might disappear. Upstream Events / Querys and pad-allocs arriving while the element goes down. Instead of making lots of bugs for this, I thought I could post all the patches here instead. Pretty straight-forward self-explanatory stuff.
Created attachment 184974 [details] [review] patch
Created attachment 184975 [details] [review] queue and queue2
Created attachment 184976 [details] [review] upstream events
Created attachment 184977 [details] [review] querys
Created attachment 184978 [details] [review] vorbisdec
Created attachment 184979 [details] [review] more query
Created attachment 184980 [details] [review] rtp-callbacks
Created attachment 184981 [details] [review] more rtp pad-callbacks
Created attachment 184982 [details] [review] more...
Created attachment 184983 [details] [review] more upstream events
Do we really want to go this way ? I still think GstPad should hold a ref on it's parent any time it calls one of those functions.
Isn't there another bug about this already, with patches by you Olivier?
There is bug #631580, it has a test case, but no patches. I could never make a patch that would work in all case. I think one solution for 0.11 is to have the pad functions ref the parent and pass it to the callback (so getcaps() for example could be blah_getcaps(Pad *pad, GstObject *parent); Also, we currently don't require pads to have a parent, it may be much easier if we did.
so how about committing these patches now, since they all help make GStreamer more stable, and then revisit the whole thing for .11?
I don't particularly think the checking is weird, just something you need to be aware of and do. Another option is to have a new pad flag (PAD_REF_PARENT) and when this is set, the core automatically keeps the pad alive while doing a callback. The flag could automatically be set when you do gst_element_add_pad(). If you want to do something for 0.11 (pass the pad as an argument) you still need to be able to know when the parent is really gone (after a dispose) or when the pad simply has no parent. I would also like to avoid adding the requirement that pads always need a parent to do dataprocessing.
I guess we could have pads that require a parent and pads that dont... with a flag or something. The current system has clearly failed (almost none of the elements in the tree do it correctly). That said, I'm not sure there is anywhere outside of the unit tests where there are pads that really have no parent.
(In reply to comment #16) > That said, I'm not sure there is anywhere outside of the unit tests where there > are pads that really have no parent. Au contraire! This has been a common cause of crashes for us. We take out an element, and a pad-alloc, query or event arrives just at that time. You can never completely protect yourself against this unless you take down the whole pipeline at the same time. Crash-dumps from customers often have one variant of this, with the error-msg G_IS_OBJECT () as the most common one.
(In reply to comment #17) I meant, I don't know any case outside of the unit tests where the callback from the pads should be called without holding the parent (or without having a parent set). My proposal was to never call the functions from the pads unless the parent is set and "hold" the parent until the callback returns. I'm also getting convinced that we should have something at the GstObject level, so we can "lock" the gstobject parent as returned by gst_object_get_parent(), so that it always succeeds during the callbacks (so we don't have to do those checks). Or maybe just not allow changing the parent, only setting it once and then only allow to unset it once?
(In reply to comment #12) > Isn't there another bug about this already, with patches by you Olivier? Bug 638844, that Olivier marked as a duplicate of his, has a few patches with the same theme, yes.
Not really in favour of this approach at all, there must be a better way to do this that doesn't involve adding such checks to every single element. Also, they're not really correct either, e.g. queue = GST_QUEUE (gst_pad_get_parent (pad)); might still throw a warning and possibly abort if get_parent() returns NULL, which is arguably better than crashing, but still broken.
I've committed all these patches now until we find a better solution. This way the races are at least fixed in the next release and we can find a perfect solution without any pressure and possibly using a suboptimal solution. IMHO the best solution for this would be to a) add a GstObject *parent parameter to all pad callbacks and b) add a flag to GstPad that you don't want to have your callback called if there's no parent (and by default that flag is disabled) For this we have to wait until 0.11 though because it's an API change. b) might be possible to do in 0.10 by taking a reference of the parent before calling the callbacks
Since all these patches are committed, lets follow this at bug #631580
(In reply to comment #20) > Not really in favour of this approach at all, there must be a better way to do > this that doesn't involve adding such checks to every single element. > > Also, they're not really correct either, e.g. > > queue = GST_QUEUE (gst_pad_get_parent (pad)); > > might still throw a warning and possibly abort if get_parent() returns NULL, > which is arguably better than crashing, but still broken. I thought queue = GST_QUEUE (NULL); was legal? Anyway, I made these stress-tests you might want? Posting them inline here, since this bug is now closed... :) static gboolean keep_going; static gpointer query_func (gpointer data) { GstElement * element = GST_ELEMENT (data); GstPad * src_pad = gst_element_get_static_pad (element, "src"); GstQuery * query = gst_query_new_latency (); while (keep_going) gst_pad_query (src_pad, query); gst_query_unref (query); gst_object_unref (src_pad); return NULL; } static gpointer event_func (gpointer data) { GstElement * element = GST_ELEMENT (data); GstPad * src_pad = gst_element_get_static_pad (element, "src"); GstStructure * event_structure = gst_structure_new ("test-event", NULL); GstEvent * event = gst_event_new_custom (GST_EVENT_CUSTOM_UPSTREAM, event_structure); GstPadEventFunction eventfunc = GST_PAD_EVENTFUNC (src_pad); g_assert (eventfunc); while (keep_going) eventfunc (src_pad, gst_event_ref (event)); gst_event_unref (event); gst_object_unref (src_pad); return NULL; } static gpointer state_change_and_unparent_func (gpointer data) { GstElement * element = GST_ELEMENT (data); GstPad * src_pad = gst_element_get_static_pad (element, "src"); while (keep_going) { gst_element_set_state (element, GST_STATE_PLAYING); g_usleep (G_USEC_PER_SEC / 10000); gst_element_set_state (element, GST_STATE_NULL); g_usleep (G_USEC_PER_SEC / 10000); gst_object_unparent (GST_OBJECT (src_pad)); g_usleep (G_USEC_PER_SEC / 10000); gst_object_set_parent (GST_OBJECT (src_pad), GST_OBJECT (element)); g_usleep (G_USEC_PER_SEC / 10000); } gst_object_unref (src_pad); return NULL; } static void test_element_against_racy_upstream_querys (GstElement * element) { GThread * query_thread; GThread * state_change_and_unparent_thread; keep_going = TRUE; query_thread = g_thread_create_full (query_func, element, 0, TRUE, TRUE, G_THREAD_PRIORITY_NORMAL, NULL); state_change_and_unparent_thread = g_thread_create_full (state_change_and_unparent_func, element, 0, TRUE, TRUE, G_THREAD_PRIORITY_NORMAL, NULL); g_usleep (G_USEC_PER_SEC / 4); keep_going = FALSE; g_thread_join (query_thread); g_thread_join (state_change_and_unparent_thread); } static void test_element_against_racy_upstream_events (GstElement * element) { GThread * event_thread; GThread * state_change_and_unparent_thread; keep_going = TRUE; event_thread = g_thread_create_full (event_func, element, 0, TRUE, TRUE, G_THREAD_PRIORITY_NORMAL, NULL); state_change_and_unparent_thread = g_thread_create_full (state_change_and_unparent_func, element, 0, TRUE, TRUE, G_THREAD_PRIORITY_NORMAL, NULL); g_usleep (G_USEC_PER_SEC / 4); keep_going = FALSE; g_thread_join (event_thread); g_thread_join (state_change_and_unparent_thread); }
> I thought queue = GST_QUEUE (NULL); was legal? You are right, my apologies. I could've sworn this was not allowed, because I remember warnings about invalid casts of pointer 'NULL" to type xyz. But according to the code in gtype.[ch] this is just fine. Wonder if it changed at some point. In any case, sorry for the noise.
(In reply to comment #21) > I've committed all these patches now until we find a better solution. This way > the races are at least fixed in the next release and we can find a perfect > solution without any pressure and possibly using a suboptimal solution. > > IMHO the best solution for this would be to > a) add a GstObject *parent parameter to all pad callbacks and > b) add a flag to GstPad that you don't want to have your callback called if > there's no parent (and by default that flag is disabled) > > For this we have to wait until 0.11 though because it's an API change. > > b) might be possible to do in 0.10 by taking a reference of the parent before > calling the callbacks Wanna file a 0.11 bug so that we remember?
There's bug #631580 already