GNOME Bugzilla – Bug 747610
Inconsistent bin children state when a child fails to switch from NULL to READY
Last modified: 2015-04-15 15:03:31 UTC
I'm putting this in -bad for now as it's triggered by a -bad test case, but it may be an issue in core if it turns out it's a bug in GstBin. In -bad, the dtlssrtpdec element is a bin, which has, among others, a funnel and a dtlsdec. In the states unit test, the dtlsdec will fail to get from NULL to READY due to not being set up. The funnel has previously switched from NULL to READY. Due to the dtlsdec failure, the bin itself will fail its state switch, and keep in NULL. However, the funnel is in READY, and stays there. This will cause the funnel to complain it's being deleted while not in NULL. An attempt to switch the bin to NULL will not switch the funnel to NULL, as the bin is already in NULL. The bin code doesn't seem to try to switch back child elements that sucesfully switched when a subsequent child fails in the same batch. This seems to be the cause of the problem, but doing this might also bring issues (like what happens if a failure happens in such a switch back).
A downwards state change to NULL should never block or fail. I think the same applies if the target is READY. So I think it makes sense to undo state changes from NULL upwards.
Yes, that seems like something that has to be fixed in GstBin. (In reply to Olivier Crête from comment #1) > A downwards state change to NULL should never block or fail. I think the > same applies if the target is READY. "should", but it could. And things like closing a device can actually block for a while. But that seems unrelated to this issue, we should at least try to change to the previous state again
Created attachment 301611 [details] [review] undo upwards state change on children of a bin where a subsequent child fails
Review of attachment 301611 [details] [review]: ::: gst/gstbin.c @@ +2651,3 @@ + * up, we need to switch those elements back down to the previous state to be + * in sync with the bin */ + changed_elements = NULL; Why not just set the state to the previous one for *all* elements. No need to remember the ones that were successful IMHO, that's just making things more complicated everywhere for an error case.
Fair enough, though I think doing it for all elements would actually be complex since one would have to add an extra iterator loop/switch construct.
gst_iterator_foreach() :)
Created attachment 301618 [details] [review] undo upwards state change on children of a bin where a subsequent child fails Nice, done :)
Review of attachment 301618 [details] [review]: Looks generally good, but please also a unit test :) ::: gst/gstbin.c @@ +2545,3 @@ +{ + GstElement *e = g_value_get_object (data); + GstState state = (GstState) user_data; GPOINTER_TO_INT() Otherwise this will give compiler warnings on weird compilers/platforms :) @@ +2547,3 @@ + GstState state = (GstState) user_data; + + gst_element_set_state (e, state); Maybe check return value here and print something if things go wrong @@ +2830,3 @@ + gst_element_state_get_name (current)); + do { + ret = gst_iterator_foreach (it, &reset_state, (gpointer) current); GINT_TO_POINTER()
Created attachment 301622 [details] [review] undo upwards state change on children of a bin where a subsequent child fails
Review of attachment 301622 [details] [review]: Looks good, just a bit weird in the test :) ::: tests/check/generic/states.c @@ +232,3 @@ + + /* we want at least one before and one after */ + g_assert (sizeof (mid) / sizeof (mid[0]) >= 3); ??? (also G_N_ELEMENTS())
Created attachment 301624 [details] [review] undo upwards state change on children of a bin where a subsequent child fails If you meant "what is that statement for ?", then it is because it needs at least three children to test states for a child already changed, the one that fails, and one that was not already changed. I'm kinda assuming the bin doesn't mess up the ordering though, which I'm not sure about.
Review of attachment 301624 [details] [review]: ::: tests/check/generic/states.c @@ +232,3 @@ + + /* we want at least one before and one after */ + g_assert (G_N_ELEMENTS (mid) >= 3); But you just created it with 3 elements 5 lines above :) @@ +254,3 @@ + + /* make the middle element fail to switch up */ + GST_ELEMENT_GET_CLASS (mid[1])->change_state = fail_to_go_to_ready; Here you're changing change_state for all identity elements... not just your 3, but all that are ever created in the same process from this point onwards.
> But you just created it with 3 elements 5 lines above :) Yes. So it passes. Which is good, no ? Am I missing something ? > @@ +254,3 @@ > + > + /* make the middle element fail to switch up */ > + GST_ELEMENT_GET_CLASS (mid[1])->change_state = fail_to_go_to_ready; > > Here you're changing change_state for all identity elements... not just your > 3, but all that are ever created in the same process from this point onwards. Doh. I'd started with elements and did not catch when I remembered it needed to be on the class. I'll fix that after lunch.
(In reply to Vincent Penquerc'h from comment #13) > > But you just created it with 3 elements 5 lines above :) > > Yes. So it passes. Which is good, no ? Am I missing something ? Well, keep it then :) It seems confusing to me, I first thought that something else was supposed to check here instead of the obviously correct thing. > > @@ +254,3 @@ > > + > > + /* make the middle element fail to switch up */ > > + GST_ELEMENT_GET_CLASS (mid[1])->change_state = fail_to_go_to_ready; > > > > Here you're changing change_state for all identity elements... not just your > > 3, but all that are ever created in the same process from this point onwards. > > Doh. I'd started with elements and did not catch when I remembered it needed > to be on the class. I'll fix that after lunch. Enjoy lunch :) I think the best you can do here is to use a fakesink. With async=false it can do state changes directly, and it has a property (state-error) to have it fail specific state changes.
Created attachment 301640 [details] [review] undo upwards state change on children of a bin where a subsequent child fails
Review of attachment 301640 [details] [review]: ::: tests/check/generic/states.c @@ +259,3 @@ + /* make the middle element fail to switch up */ + identity_state_change = GST_ELEMENT_GET_CLASS (mid[1])->change_state; + GST_ELEMENT_GET_CLASS (mid[1])->change_state = fail_to_go_to_ready; No, use state-error=null-to-ready on the fakesink Otherwise if your test fails, the vfunc will be wrong for all future fakesinks.
Created attachment 301646 [details] [review] undo upwards state change on children of a bin where a subsequent child fails Not really since I was using only one and resetting its change state function afterwards, but here's a version with the fakesink property.
Oh - do you mean that a test failing causes a hidden return from the function, bypassing the cleanup, and then it continues the next test fnuction in the same program ? If it does this, then I now understand why you said that :)
Review of attachment 301646 [details] [review]: Yes, exactly because of that :) Did you check if your test fails without your fix? If so, please push
Yes, I had been testing both on each update :) Thanks. commit a3b42ec42a085f4767930f55d5bb0b4f4fc8321a Author: Vincent Penquerc'h <vincent.penquerch@collabora.co.uk> Date: Wed Apr 15 11:02:54 2015 +0100 bin: undo upward state changes on children when a child fails When a bin changes states upwards, and a child fails to change, any child that was already switched will not be reset to its original state, leaving its state inconsistent with the bin, which does not change state due to the failure. If the state change was from NULL to READY, it means that deleting this bin will cause those children to be deleted while not in NULL state, which is a Bad Thing. For other upward changes, it is less of a problem, as a subsequent switch back to NULL will cause an actual downwards change on those inconsistent elements, albeit from the "wrong" state. We now reset state to the original one when a child fails. Includes unit test. https://bugzilla.gnome.org/show_bug.cgi?id=747610