After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 645961 - [pulsesink] hangs when going from paused to playing near EOS
[pulsesink] hangs when going from paused to playing near EOS
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gst-plugins-good
git master
Other Linux
: Normal blocker
: 0.10.29
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2011-03-28 15:32 UTC by René Stadler
Modified: 2011-03-31 13:58 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Test script to reproduce the issue (1.04 KB, text/x-python)
2011-03-28 15:32 UTC, René Stadler
  Details
Patch for -base (1.80 KB, patch)
2011-03-28 15:34 UTC, René Stadler
needs-work Details | Review
Patch for -good (2.60 KB, patch)
2011-03-28 15:34 UTC, René Stadler
needs-work Details | Review
baseaudiosink: arrange for running clock when rendering eos (2.78 KB, patch)
2011-03-28 20:01 UTC, Mark Nauwelaerts
committed Details | Review
pulsesink: also uncork during and after eos rendering (1.39 KB, patch)
2011-03-28 20:02 UTC, Mark Nauwelaerts
committed Details | Review

Description René Stadler 2011-03-28 15:32:46 UTC
Created attachment 184466 [details]
Test script to reproduce the issue

[This issue is a bit similar to bug #636886]

When pausing a non-live pipeline involving pulsesink during EOS rendering time (gst_base_sink_wait_eos) and going back to playing, the pipeline hangs forever waiting for EOS. This is easy to reproduce with a larger ringbuffer (of the kind we like to use in mobile devices). I'm attaching a test script that reproduces this with audiotestsrc ! pulsesink latency-time=100000 buffer-time=500000 for example.

The cause of the problem is that the pulsesink provided clock is frozen (time doesn't continue). This leads to base_sink_wait_eos waiting forever because it uses clock_id_wait. Time is frozen because the pa_stream remains corked here. This is a side-effect of Wim's change to defer uncorking until there is actual data to be written. Obviously that won't happen during or after EOS.

It seems this needs a fix to baseaudiosink as well as pulsesink itself. I'm attaching two patches for illustration purposes; they fix the issue as far as it can be triggered by the script but they are not perfect. First fix to baseaudiosink changes it so that the ringbuffer is also started during wait_eos/draining. This can be seen as an extension of Mark's fix for bug #636886, which added this behavior for the time *after* EOS is rendered (basesink->eos == TRUE).

Starting the ringbuffer however has no effect on pulsesink since this gets deferred. Second patch changes that to detect the pending EOS situation there as well, then uncorks right away to keep the clock running even without data. Also, it seems pulsesink didn't check basesink->eos before, which makes me wonder whether the fix for bug #636886 had any effect here(?)

Maybe someone has an idea how to implement those fixes in a bit more elegant way (and how to eliminate remaining races in baseaudiosink).
Comment 1 René Stadler 2011-03-28 15:34:03 UTC
Created attachment 184467 [details] [review]
Patch for -base

baseaudiosink: also start ringbuffer during wait for EOS (FIXME!)

Waiting for EOS needs a running clock, which might require the
ringbuffer to be running.

FIXME: This patch is incomplete in terms of proper locking; it needs
more work since there is probably remaining races between incoming EOS
event and state change.
Comment 2 René Stadler 2011-03-28 15:34:55 UTC
Created attachment 184468 [details] [review]
Patch for -good

pulsesink: also uncork during EOS waiting (and after EOS is rendered)

Pulsesink was recently changed to defer uncorking until there is data
to write. This condition will however never occur when EOS in being
rendered (since that marks the end of data). Changing to PAUSED state
while EOS is being waited on results in a hang: pausing corks the
stream, which will never be undone since there is no more data when
going back to PLAYING. If pulsesink is the clock provider, deadlock
ensues since time doesn't continue in corked state and the clock id
for EOS wait never fires.
Comment 3 Mark Nauwelaerts 2011-03-28 18:36:42 UTC
AFAICS, when EOS arrives at sink, sink->eos will turn TRUE (unless possibly sink turns flushing, which should not be when going from PLAYING to PAUSED), so there should be no need for the pending_eos (in either patch).  That should then only leave arranging the uncorking at ringbuffer start whenever sink->eos is TRUE.

Suffice it to say, will double-check and patch accordingly (actually using pulsesink unlike when testing bug #636886 ;) ).
Comment 4 René Stadler 2011-03-28 18:52:37 UTC
Maybe I'm mistaken, but I think that ->eos will turn TRUE only after the fact (after wait_eos/drain is complete). While it's waiting to reach the clock time of EOS, it is still FALSE. If I overlooked this I would feel really silly, but at least that would make the fix trivial :)
Comment 5 Mark Nauwelaerts 2011-03-28 19:22:27 UTC
Unfortunately, I overlooked something.  It seems things can hang around longer in wait_eos than expected, so it can take quite a while for eos to turn TRUE.

Patch for bug #636886 also removed an eos_rendering in baseaudiosink that could be used, or could have been, since that one was also a private one.

So, either resurrecting that one as a private, and have another private copy in pulsesink, or resurrect it as a non-private one that can be used by pulsesink ...
Comment 6 Mark Nauwelaerts 2011-03-28 20:01:08 UTC
Created attachment 184502 [details] [review]
baseaudiosink: arrange for running clock when rendering eos

Alternative patch for -base (reviving what existed before more exposed).
Comment 7 Mark Nauwelaerts 2011-03-28 20:02:18 UTC
Created attachment 184503 [details] [review]
pulsesink: also uncork during and after eos rendering

Minor variation on the previous -good patch, making use of what baseaudiosink exposes in the foregoing -base patch.
Comment 8 René Stadler 2011-03-28 20:36:01 UTC
Looks okay, but is there a (much smaller) remaining race left in baseaudiosink like I mentioned? I'm wondering if a fast state change PLAYING->PAUSED->PLAYING could end up with the streaming thread pushing EOS just *after* we checked that eos_rendering is FALSE in change_state. Haven't checked this too closely, but I can't put my finger on which lock would prevent that from happening :)
Comment 9 Mark Nauwelaerts 2011-03-28 20:48:41 UTC
I believe that is covered by the fact that baseaudiosink's state change issues a ringbuffer_may_start (in any case) before checking eos_rendering (a.o.).  So if EOS arrives after the check, the ringbuffer may start and baseaudiosink' drain will then (be allowed to) start the ringbuffer (when rendering EOS).
Comment 10 René Stadler 2011-03-28 21:03:22 UTC
That makes sense; I was looking at a different variant of my patch in which I moved the setting of pending_eos closer to the wait_eos call (and past the ring_buffer_start call). But yes, if the order is correct, there is no race. Nice!
Comment 11 Sebastian Dröge (slomo) 2011-03-29 10:01:17 UTC
Looks good to me
Comment 12 Mark Nauwelaerts 2011-03-31 11:30:05 UTC
In -good:

commit 176b8ffbfffb39d0d340e3e5f19f1081884523d7
Author: Mark Nauwelaerts <mark.nauwelaerts@collabora.co.uk>
Date:   Thu Mar 31 13:25:00 2011 +0200

    pulsesink: also uncork during EOS waiting (and after EOS is rendered)
    
    Pulsesink was recently changed to defer uncorking until there is data
    to write. This condition will however never occur when EOS in being
    rendered (since that marks the end of data). Changing to PAUSED state
    while EOS is being waited on results in a hang: pausing corks the
    stream, which will never be undone since there is no more data when
    going back to PLAYING. If pulsesink is the clock provider, deadlock
    ensues since time doesn't continue in corked state and the clock id
    for EOS wait never fires.
    
    Fixes #645961.

which depends on -base:

commit e73f293ee560379d3667c832be8165042a186877
Author: Mark Nauwelaerts <mark.nauwelaerts@collabora.co.uk>
Date:   Mon Mar 28 22:00:25 2011 +0200

    baseaudiosink: arrange for running clock when rendering eos
    
    Commit ba2e500bd992d8ad7db0da923801964964835967 ensured to provide
    a running clock when EOS had finished rendering.  However,
    other measures are needed (and were in place before) to ensure a
    running clock when EOS still needs rendering (i.e. waiting).
    
    So, specifically, re-introduce eos_rendering removed in aforementioned commit,
    this time as a public variable so subclasses can be aware of the situation.
    
    Fixes (part of) #645961.
    
    API: GstBaseAudioSink:eos_rendering
Comment 13 René Stadler 2011-03-31 13:53:01 UTC
In the final -good commit you replaced the author with yourself. I guess that was by mistake; please be more careful in the future. Even though it boiled down to another one of those one-line fixes, I put in some time to find the problem and describe it accordingly (e.g. the whole commit message is mine).
Comment 14 Mark Nauwelaerts 2011-03-31 13:58:29 UTC
Yes, sorry, noticed too late that happened by mistake.  Apparently some 'amend' (or alike) ate what I had catered for in the original patch ... (which I seemed to recall/know/believe it usually does not ?)