GNOME Bugzilla – Bug 652562
basesink: stuck in GST_PAD_PREROLL_WAIT
Last modified: 2011-06-26 09:52:12 UTC
This is a bug that I'm seeing with Opera's GStreamer integration on certain test cases, and it looks like it might be a GStreamer bug. The test case loads two <audio> elements with a very short (2 samples of silence) WAVE file, which I'll attach. The pipeline constructed is like this:

$ gst-launch-0.10 filesrc location=2samples.wav ! wavparse ! queue ! audioconvert ! audioresample ! volume ! autoaudiosink

At random, one such pipeline will fail to preroll, getting stuck in gst_base_sink_wait_preroll. The full backtrace at that point is:

(gdb) thread apply all bt
+ Trace 227475
As you can see there are two pipelines here, of which one is waiting for preroll. GstMediaPlayer::ThreadFunc (part of our integration) is not doing anything interesting at this point, only waiting for messages from GStreamer. In thread 880, the event being processed is a GST_EVENT_EOS.

I have a few main suspects:

1. The WAVE file is exceptionally short, so it seems likely that there is a race condition somewhere, triggered by the EOS event being fired much earlier than it usually is. I haven't looked in wavparse, but that might be a good place to start.

2. We normally post a seek to position 0 immediately after reaching GST_STATE_PAUSED. This is because we sometimes want to seek to a non-zero offset and just haven't special-cased the typical case to avoid the seek. Doing this seek triggers GST_PAD_PREROLL_SIGNAL in gst_base_sink_preroll_queue_flush. I presume this is rather atypical use of the API, and that there may be race conditions hidden here. (A sketch of this seek follows after this list.)

3. pulsesink itself is failing randomly due to something in PulseAudio. I have no particular reason to suspect this, but pulsesink has been less than reliable in the past.
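For reference, the seek from suspect 2 is essentially the following. This is a minimal sketch; pipeline stands in for our actual pipeline handle, and error handling is elided:

  /* Issued immediately after the pipeline reaches GST_STATE_PAUSED: a
   * flushing seek to offset 0, which flushes the preroll queue and makes
   * the sink preroll again. */
  if (!gst_element_seek_simple (pipeline, GST_FORMAT_TIME,
          GST_SEEK_FLAG_FLUSH, 0))
    g_warning ("seek to 0 failed");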
Created attachment 189903 [details] 2 samples of silence
This is the state of basesink while waiting in GST_PAD_PREROLL_WAIT:

(gdb) print *sink
$29 = {element = {object = {object = {g_type_instance = {g_class = 0x50959a0}, ref_count = 2, qdata = 0x0}, refcount = 0, lock = 0x3e0d7d0, name = 0x45ca1a0 "opaudiosink-actual-sink-pulse", name_prefix = 0x0, parent = 0x7fffe4cde4d0, flags = 32, _gst_reserved = 0x0}, state_lock = 0x3e0ef40, state_cond = 0x3b411c0, state_cookie = 2, current_state = GST_STATE_PAUSED, next_state = GST_STATE_VOID_PENDING, pending_state = GST_STATE_VOID_PENDING, last_return = GST_STATE_CHANGE_SUCCESS, bus = 0x3dc60f0, clock = 0x0, base_time = 0, numpads = 1, pads = 0x7fffcc8dcae0, numsrcpads = 0, srcpads = 0x0, numsinkpads = 1, sinkpads = 0x4531600, pads_cookie = 1, abidata = {ABI = {target_state = GST_STATE_PAUSED, start_time = 0}, _gst_reserved = {0x3, 0x0, 0x0, 0x0}}}, sinkpad = 0x3e03da0, pad_mode = GST_ACTIVATE_PUSH, offset = 0, can_activate_pull = 0, can_activate_push = 1, preroll_queue = 0x7fffcc990940, preroll_queue_max_len = 0, preroll_queued = 1, buffers_queued = 0, events_queued = 0, eos = 0, eos_queued = 0, need_preroll = 1, have_preroll = 1, playing_async = 0, have_newsegment = 0, segment = {rate = 1, abs_rate = 1, format = GST_FORMAT_UNDEFINED, flags = GST_SEEK_FLAG_NONE, start = 0, stop = -1, time = 0, accum = 0, last_stop = 0, duration = -1, applied_rate = 1, _gst_reserved = '\000' <repeats 23 times>}, clock_id = 0x0, end_time = 0, sync = 1, flushing = 0, abidata = {ABI = {clip_segment = 0x4b0b570, max_lateness = -1, running = 0}, _gst_reserved = {0x4b0b570, 0xffffffffffffffff, 0x0 <repeats 17 times>}}, priv = 0x44938b0}

I can't tell from this whether data has flowed through the sink before the EOS event, but perhaps someone else can.
It seems that even when our tests involving this setup pass, we still have a thread waiting for preroll. I've stepped through a typical such scenario and this is what happens:

1. GST_PAD_PREROLL_WAIT - from gst_pad_push (4 byte buffer, should be the full raw PCM data)
+ Trace 227476
4. Nothing! The thread in step 3 is still waiting. Will investigate why we're not getting a second preroll, or if basesink should be waiting for it at all.
That got a bit mangled by the stack trace being pulled out; don't miss steps 2 and 3, which are in the trace.
It looks like my understanding here may be a bit flawed. Even after removing our "seek to 0" code and using a normally sized WAVE file, I still have a thread waiting for preroll when the element is paused. Is preroll not supposed to happen while the element is paused?
In such a case, it's gst_base_sink_change_state that calls GST_PAD_PREROLL_SIGNAL for GST_STATE_CHANGE_PAUSED_TO_PLAYING, like so:

  if (basesink->eos) {
    GstMessage *message;

    /* need to post EOS message here */
    GST_DEBUG_OBJECT (basesink, "Now posting EOS");
    message = gst_message_new_eos (GST_OBJECT_CAST (basesink));
    gst_message_set_seqnum (message, basesink->priv->seqnum);
    gst_element_post_message (GST_ELEMENT_CAST (basesink), message);
  } else {
    GST_DEBUG_OBJECT (basesink, "signal preroll");
    GST_PAD_PREROLL_SIGNAL (basesink->sinkpad);
  }

Note especially that GST_PAD_PREROLL_SIGNAL is called only if basesink->eos is not set. The bug reproduces when the EOS event has been posted, so it seems quite likely that there's a relation: in that case GST_PAD_PREROLL_SIGNAL is never called here, and the other thread is left hanging in GST_PAD_PREROLL_WAIT.

Looking at the state of things in a normal (non-hanging) case, it appears as though there may be some missing locking/synchronization around sink->need_preroll. Threads 9 and 14 are the interesting ones; see the backtrace at the bottom. Thread 9 checked if (G_LIKELY (sink->need_preroll)) before going into gst_base_sink_wait_preroll, but thread 14 sets basesink->need_preroll = FALSE without any strong guarantee that threads waiting for preroll will be notified in the if (basesink->eos) case. Is the assumption that if basesink->eos is set, then no thread can possibly be waiting for preroll? That does not seem to be a correct assumption, and is possibly the root cause here.

(gdb) thread apply all bt
+ Trace 227479
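To make the suspected lost wakeup concrete, here is the general shape of the race reduced to plain GLib primitives. This is only an illustration (the names and structure are mine, not the actual basesink code): clearing the flag without signalling leaves a thread that is already inside the condition wait blocked forever, even with correct locking.

  #include <glib.h>

  static GMutex lock;
  static GCond preroll_cond;
  static gboolean need_preroll = TRUE;
  static gboolean eos = TRUE;   /* pretend the EOS path is taken */

  /* Streaming thread: waits until preroll is signalled. */
  static gpointer
  waiter (gpointer data)
  {
    g_mutex_lock (&lock);
    while (need_preroll)
      g_cond_wait (&preroll_cond, &lock);   /* blocks here... */
    g_mutex_unlock (&lock);
    return NULL;
  }

  /* State-change thread: on the EOS path it clears the flag but never
   * signals, so a thread already waiting never wakes up to re-check
   * the predicate. */
  static void
  state_change (void)
  {
    g_mutex_lock (&lock);
    need_preroll = FALSE;
    if (eos) {
      /* post EOS message; note: no g_cond_signal() -- the lost wakeup */
    } else {
      g_cond_signal (&preroll_cond);
    }
    g_mutex_unlock (&lock);
  }

  int
  main (void)
  {
    GThread *t = g_thread_new ("waiter", waiter, NULL);

    g_usleep (G_USEC_PER_SEC / 10);   /* let the waiter block first */
    state_change ();
    g_thread_join (t);                /* never returns: waiter is stuck */
    return 0;
  }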
Trying out the hypothesis that this was due to the logic in GST_STATE_CHANGE_PAUSED_TO_PLAYING, I added some printf-ing to see which path was taken, but with this added I couldn't reproduce. However, I never saw either branch being taken, so this particular piece of code is probably not the culprit.
After chatting with wtay on IRC it seems I've been chasing after the wrong thing. Apparently it's expected that there will be a thread waiting for preroll while the pipeline is PAUSED. I should try to instead figure out why the pipeline isn't reaching the PAUSED state at all when this bug reproduces...
First, sorry for all the spamming. I was indeed heading in the wrong direction. The pipeline is in fact reaching PAUSED; I deduced that it wasn't from the fact that a side effect of it reaching PAUSED never happened:

  case GST_MESSAGE_STATE_CHANGED:
    if (GST_MESSAGE_SRC (msg) == GST_OBJECT_CAST (pipeline)) {
      GstState oldstate, newstate, pending;

      gst_message_parse_state_changed (msg, &oldstate, &newstate, &pending);
      if (oldstate == GST_STATE_READY && newstate == GST_STATE_PAUSED) {
        GstFormat format = GST_FORMAT_TIME;
        gint64 duration;

        if (gst_element_query_duration (pipeline, &format, &duration) &&
            format == GST_FORMAT_TIME) {
          /* post a gst_message_new_duration to the application; we do this
           * because GST_MESSAGE_DURATION usually isn't posted by GStreamer,
           * it seems */
        }
      }
    }
    break;

The side effect that never happened was posting that duration message. However, it wasn't because the pipeline wasn't reaching PAUSED; it was because gst_element_query_duration failed. When this happens, seeking to offset 0 also fails. I'm not sure why either fails, but that is the problem to go after, not what I've been chasing in this bug. Closing as invalid.
On a side note, it's much easier to check for the ASYNC_DONE message; that saves you both the src == pipeline check and the oldstate/newstate check. For example:
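A sketch of that simplification, slotting into the same bus-message switch as the snippet in the previous comment (0.10 API, duration forwarding elided as before):

  case GST_MESSAGE_ASYNC_DONE:{
    GstFormat format = GST_FORMAT_TIME;
    gint64 duration;

    /* ASYNC_DONE is only posted by the pipeline itself once preroll has
     * completed, so no source or state matching is needed. */
    if (gst_element_query_duration (pipeline, &format, &duration) &&
        format == GST_FORMAT_TIME) {
      /* forward the duration to the application */
    }
    break;
  }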
What was actually wrong: In theory, it is OK for a pull source to return smaller or larger buffers than was requested from _create, and most code handles this. However, few pull sources other than Opera's seem to do this, so in practice it isn't very well tested, and this is the second bug that turned out to be caused by it. In this case, it is gst_riff_read_chunk that assumes that a short read implies EOS, so decoding simply failed. I'm not sure if I should file a bug on gst_riff_read_chunk or not, I have instead changed Opera so that _create never returns short buffers except at EOS.
> In theory, it is OK for a pull source to return smaller or larger buffers
> than was requested from _create

Citation needed? :)

> In this case, it is gst_riff_read_chunk that assumes that a short read
> implies EOS, so decoding simply failed.
>
> I'm not sure if I should file a bug on gst_riff_read_chunk or not, I have
> instead changed Opera so that _create never returns short buffers except
> at EOS.

I believe gst_riff_read_chunk() is correct in assuming this, and I don't think it's OK for a pull source to return a buffer smaller than what was requested in other circumstances.
(In reply to comment #12)
> > In theory, it is OK for a pull source to return smaller or larger buffers
> > than was requested from _create
>
> Citation needed? :)

Wow, digging into the history here proved embarrassing and amusing. The documentation was updated to "When the subclass returns GST_FLOW_OK, it MUST return a buffer of the requested size unless fewer bytes are available because an EOS condition is near." in https://bugzilla.gnome.org/show_bug.cgi?id=617733, which was filed by none other than me! In that bug I even claimed that I would fix Opera to respect this, but obviously I didn't, and I even managed to completely suppress it from memory. Now I've actually fixed it. Yay me.
FWIW, the only case where you are allowed to return *less* than requested in _create() is at the end of stream, and then an EOS event should be sent immediately afterwards.
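To close the loop for anyone landing here: a sketch of what that contract implies for a GstBaseSrc subclass's create vfunc (0.10 API; the read_bytes() helper is hypothetical, standing in for whatever actually produces the data):

  #include <gst/gst.h>
  #include <gst/base/gstbasesrc.h>

  /* Fill the buffer completely unless EOS is near: a short buffer may
   * only be returned when no more data will follow. */
  static GstFlowReturn
  my_src_create (GstBaseSrc * src, guint64 offset, guint size,
      GstBuffer ** buf)
  {
    GstBuffer *buffer = gst_buffer_new_and_alloc (size);
    guint filled = 0;

    while (filled < size) {
      /* read_bytes() is hypothetical: a blocking read that returns the
       * number of bytes read, or <= 0 at end of stream */
      gint ret = read_bytes (offset + filled,
          GST_BUFFER_DATA (buffer) + filled, size - filled);
      if (ret <= 0)
        break;
      filled += ret;
    }

    if (filled == 0) {
      /* nothing left at all: basesrc turns this into an EOS event */
      gst_buffer_unref (buffer);
      return GST_FLOW_UNEXPECTED;
    }

    /* short only when EOS is near, per the documented contract */
    GST_BUFFER_SIZE (buffer) = filled;
    GST_BUFFER_OFFSET (buffer) = offset;
    *buf = buffer;
    return GST_FLOW_OK;
  }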