GNOME Bugzilla – Bug 505770
gst_element_get_state() should unblock if element posts an error
Last modified: 2010-07-01 19:30:44 UTC
If a pipeline fails to preroll, calls to gst_element_get_state result in a deadlock. I've encountered this bug in a specific scenario, although I think it would happen in other cases as well. In my case there is a simple pipeline: filesrc location=some-file '!' decodebin '!' fakesink Now, if the mime-type of some-file is recognised, yet no codec for the format is found, the pipeline (correctly) fails to enter PAUSED state. The problem is that any attempt to check for this condition using gst_element_get_state result in a deadlock. Steps to reproduce: 1. Open gst-launch sources. Append the line gst_element_get_state (pipeline, NULL, NULL, GST_CLOCK_TIME_NONE); after (around line 700) fprintf (stderr, _("ERROR: pipeline doesn't want to preroll.\n")); 2. Build. 3. Launch the above pipeline with a file for which a codec is not available. 4. gst-launch prints out the following text, then deadlocks. Setting pipeline to PAUSED ... Pipeline is PREROLLING ... ERROR: from element /pipeline0/filesrc0: Internal data flow error. Additional debug info: gstbasesrc.c(1816): gst_base_src_loop (): /pipeline0/filesrc0: streaming task paused, reason not-linked (-1) ERROR: pipeline doesn't want to preroll.
This is as designed. You can check for this condition by: - regularly poll the state (with a timeout, so that you don't block forever). - act when you receive the ASYNC_DONE message on the bus. Another option would be to make the get_state function return a FAILURE when an error is posted from the pipeline. This is not done yet, do you think it's needed?
The documentation states clearly: "If the element completes the state change or goes into an error, this function returns immediately with a return value of GST_STATE_CHANGE_SUCCESS or GST_STATE_CHANGE_FAILURE respectively." I wish to synchronously change the state and leave the glib main loop alone, so waiting for error events is not really an option. Polling with a nonzero timeout - it will possibly work, but a timeout does not tell me if the state-change has failed, or it is just taking longer than usual.
I don't mean to be rude or annoy you in any way, but the behavior I've described is: - inconsistent with the documentation - difficult to avoid without really dirty hacks - contrary to the "principle of least astonishment" :) This said, I'm quite confident to call it a bug. <rant mode on> I was going to write how much this blocks my project, but who cares. Please fix this bug, or at least point me in the right direction so I can understand how this voodoo code works. <rant mode off>
The documentation needs to be made clearer then. Currently, you're supposed to do things like this: if (gst_element_set_state() == GST_STATE_CHANGE_FAILURE) goto handle_error; ... either handle state changes or errors asynchroneously, e.g. by running a main loop and setting up a bus watch, or by doing something like: msg = gst_bus_timed_pop_filtered (GST_ELEMENT_BUS (pipeline), -1, GST_MESSAGE_ERROR | GST_MESSAGE_ASYNC_DONE); if (GST_MESSAGE_TYPE (msg) == GST_MESSAGE_ERROR) goto handle error_message; ...
ping? do you think it's interesting to unblock a _get_state() with an error when an error message is posted on the element? I think it's possible to do this and I'm starting to think that it would be a nice thing. Opinions?
> Do you think it's interesting to unblock a _get_state() with an error > when an error message is posted on the element? I think it's possible to do > this and I'm starting to think that it would be a nice thing. Opinions? I think that would probably be a good idea. It's just what people expect, and there's not a lot of code out there that really gets this stuff right (because it is very tedious if you don't have an entirely async state machine). And I don't expect anything to break if we change this.
*** Bug 538176 has been marked as a duplicate of this bug. ***
So let's get this done :)
(In reply to comment #5) > ping? do you think it's interesting to unblock a _get_state() with an error > when an error message is posted on the element? I think it's possible to do > this and I'm starting to think that it would be a nice thing. Opinions? How would this be implemented? The state cond needs to be signalled when an error is posted but for this we would need to look at the bus. How do you want to look at all bus messages?
gstbin has a sync bus handler where it receives all bus messages from all children. When you see and error, see if there is a state change going on, mark the state change as error and signal the element state gcond to unblock the pending wait. The reason why it's not currently done is because it is not deterministic. Sometimes a state change would work without error (and you get the error later) or sometimes it would fail because the error message was posted faster. Maybe it's better to unblock the gcond without changing the state change return value. It needs some thinking.
Created attachment 155225 [details] [review] possible patch This is a possible patch. It marks the bin state change return value as FAILURE and wakes up any pending get_state() functions.
It probably does not cause problems in any apps I know but it still sits weird.. Now we would return a FAILURE state change return even if there was no state change return failure...
Maybe we should add a new return value for this case?
> It probably does not cause problems in any apps I know but it still sits > weird.. Now we would return a FAILURE state change return even if there was no > state change return failure... Not sure I understand the problem. There was a failure and we return FAILURE, that seems fine to me. Preroll/dataflow is part of the state change mechanism, so it seems ok to classify this as a 'state change failure' IMHO. Adding a new return value would cause more problems than solve things. I'd rather keep current behaviour than add yet another return value.
I ran into the exact same problem today while coding an application, and my unit tests where failing on a build slave that didn't have flac installed and thus my pipeline errored out and never left get_state() as I expected. I'm not sure why adding a return value would complicate things; as a bonus it also allows code to safely test for whether the behaviour has changed and it can safely use get_state()
> I'm not sure why adding a return value would complicate things; as a bonus it > also allows code to safely test for whether the behaviour has changed and it > can safely use get_state() It can safely use get_state() in any case, unless you can think of a case where a program might actually want to block forever waiting for something that won't be happening. Adding a new return value would complicate things because: - 99% of the code out there doesn't even handle the existing values correctly - the actual problem is that most people who use this function don't understand the details of the state change mechanism, the various failure scenarios, how things work and/or are supposed to work, and whether or not they should be using get_state() here or if it's safe. Adding yet another return value won't help - adding another return value risks breaking existing code that actually tries to handle the return values; of course you could argue that that code isn't correct from the start, but that wouldn't not particularly helpful IMHO (and then we could just argue that code using get_state() incorrectly is broken and we don't need this fix at all). Maybe we should just deprecate _get_state() and replace it with a bunch of new, better functions, such as gst_pipeline_wait_for_preroll (pipeline, &error), gst_pipeline_wait_for_preroll_with_timeout (pipeline, &error, -1); gst_element_get_states (&cur_state, &pending_state); or something like that.
I think we should then push the Patch in Comment 11 after the next freeze and see what happens.
Now might be a good time?
yes, let's try it
commit 88c6896fb9db7d61429d6305314d549c818545da Author: Wim Taymans <wim.taymans@collabora.co.uk> Date: Tue May 25 19:17:44 2010 +0200 gstbin: unlock _get_state() on error When an error message is received on the bus, mark the bin as being in the error state and unlock all current _get_state() calls with an error. Fixes #505770
Re-opening this, because according to git-bisect it's responsible for causing crashes in totem, when trying to play the next file after totem has gotten an error (e.g. file is encrypted or is a random text file), leading to this: ** Message: Error: This appears to be a text file gstdecodebin2.c(1795): type_found (): /GstPlayBin2:play/GstURIDecodeBin:uridecodebin0/GstDecodeBin2:decodebin20: decodebin2 cannot decode plain text files ... 0:00:03.312382906 7912 0x11ec100 DEBUG playbin2 gstplaybin2.c:2993:autoplug_select_cb:<play> checking factory qtdemux (lt-totem:7912): GLib-GObject-CRITICAL **: g_object_get: assertion `G_IS_OBJECT (object)' failed Segmentation fault Program received signal SIGSEGV, Segmentation fault. 0x00007ffff328b6c2 in IA__g_value_set_object (value=0x7fffffffa780, v_object=0x12d1de0) at /tmp/buildd/glib2.0-2.24.1/gobject/gobject.c:2840 2840 /tmp/buildd/glib2.0-2.24.1/gobject/gobject.c: No such file or directory. in /tmp/buildd/glib2.0-2.24.1/gobject/gobject.c (gdb) bt
+ Trace 222656
Haven't investigated exact cause yet though.
Closing again. See bug #623318 for the playbin2 issue exposed by this.