GNOME Bugzilla – Bug 440463
gstbin unit test sometimes hangs waiting for ASYNC_DONE message
Last modified: 2007-05-24 14:45:02 UTC
Running make gst/gstbin.forever, I regularly get hangs in the gst/gstbin unit test. It does not always seem to be the same test that hangs. Sometimes it hangs without using any CPU, sometimes it hangs using 100% CPU. Here's a stack trace: (gdb) thread apply all bt
+ Trace 135529
Created attachment 88614: bin:5,GST_STATES:5 debug log for one of the hangs (not exactly the same as the one above)
Created attachment 88626: possible patch
The problem occurred because the get_state function recalculated the state and found that it was no longer async. The state_continue function then checks whether it needs to perform additional actions after the state change has completed. It saw that nothing was pending and exited, when it should have posted an ASYNC_DONE message. This patch makes the state_continue function always post ASYNC_DONE. We can do this unconditionally because we know that ASYNC_DONE had not yet been posted when we entered the function.
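To make the failure mode described above concrete, here is a minimal standalone model in plain C. It is a sketch and not the GStreamer code: ModelBin, ModelState and the model_* functions are hypothetical names invented for illustration, and the real state_continue logic does considerably more. It only shows why an early return on "nothing pending" loses the ASYNC_DONE message, and how posting unconditionally avoids that.

```c
#include <assert.h>

/* Hypothetical, simplified stand-ins for illustration; not GStreamer types. */
typedef enum { MODEL_STATE_VOID_PENDING, MODEL_STATE_PLAYING } ModelState;

typedef struct {
    ModelState pending;     /* state still to be reached, or VOID_PENDING */
    int async_done_posted;  /* how many ASYNC_DONE messages were posted */
} ModelBin;

static void model_post_async_done(ModelBin *bin)
{
    bin->async_done_posted++;
}

/* Old behaviour in this model: if a recalculation already cleared the
 * pending state, the function exits early and ASYNC_DONE is never posted. */
static void model_state_continue_buggy(ModelBin *bin)
{
    if (bin->pending == MODEL_STATE_VOID_PENDING)
        return; /* nothing pending: exits without posting */
    bin->pending = MODEL_STATE_VOID_PENDING;
    model_post_async_done(bin);
}

/* Patched behaviour in this model: ASYNC_DONE is posted unconditionally.
 * This relies on the precondition that it was not posted before entry. */
static void model_state_continue_fixed(ModelBin *bin)
{
    assert(bin->async_done_posted == 0);
    bin->pending = MODEL_STATE_VOID_PENDING; /* finish any pending change */
    model_post_async_done(bin);
}
```

With a bin whose pending state has already been cleared, the buggy variant posts nothing, so anything waiting for ASYNC_DONE would wait forever; the fixed variant posts exactly one message.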
The patch didn't apply completely cleanly for me because of bilboed's changes this morning, but it applied well enough. It seems to fix the immediate issue of hanging. When running the gstbin check forever though, I get this after a few hundred iterations: gst/gstbin.c:699:F:bin tests:test_children_state_change_order_semi_sink:0: No state change message within 1 second (#209) That could be a bug in the test though, I've no idea. Wim, how confident are you that this doesn't introduce any other problems?
> I get this after a few hundred iterations:
>
> gst/gstbin.c:699:F:bin tests:test_children_state_change_order_semi_sink:0: No state change message within 1 second (#209)
>
> That could be a bug in the test though, I've no idea.
>
> Wim, how confident are you that this doesn't introduce any other problems?

Same here, but I thought it was because I was compiling stuff in another terminal :) I still got a hang in the unit test once with the patch applied, but haven't been able to reproduce it since.
There is one problem left: in certain situations no state-change message is posted, for the same reason that ASYNC_DONE was not posted before.
Created attachment 88679: improved patch
This improved patch restores the state_lock around recalc_state, which was wrongly removed during the rewrite. The problem with some messages not being posted is still present, and I can't see how to make a quick fix.
OK, the last patch does indeed post all messages, but the ASYNC_DONE one might be out of order. This should not be considered a regression, since it's a new message. The test just deals badly with it; ideally ASYNC_DONE should be posted in the right order, but that will require a slight rework.
Seems to work fine now, the only warning I now get occasionally is this: gst/gstbin.c:303:F:bin tests:test_message_state_changed_children:0: sink (0x8082148) refcount is 4 instead of 3
That message is because the state change recalculation thread hasn't finished yet, I believe. I'd change the test to allow either 3 or 4 as a valid refcount.
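A minimal sketch of the relaxed check suggested above, in plain C. The helper name model_refcount_in_range is hypothetical and is not part of the test suite; in the actual test the result would presumably be fed to the check framework's assertion macro, for example something like fail_unless (model_refcount_in_range (rc, 3, 4), ...), rather than used on its own.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: accept any refcount in [min, max] instead of one
 * exact value, so a still-running state recalculation thread that holds
 * an extra reference does not fail the test. */
static bool model_refcount_in_range(int actual, int min, int max)
{
    assert(min <= max); /* guard against swapped bounds */
    return actual >= min && actual <= max;
}
```

Allowing a range trades some strictness for stability: a leak of exactly one reference would no longer be caught by this particular check.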