GNOME Bugzilla – Bug 760930
adaptivedemux: stream can pause for a long time when server errors encountered
Last modified: 2018-11-03 13:45:36 UTC
During testing of dashdemux/adaptivedemux with live streams we noticed that playback would cease and wait for a long time before attempting to carry on. This was caused by an unexpected 503 error from the DASH server, which caused the adaptivedemux element to immediately wait for the next MPD update. The MPD used had a 1 hour update period, meaning that the stream could remain frozen for up to 1 hour after encountering an HTTP server error. We believe this is a less than ideal situation for playback of a live stream.

We propose that the error recovery logic in gst_adaptive_demux_download_fragment and gst_adaptive_demux_download_loop be changed. For a live stream, we believe a better error recovery scenario would be achieved by:

1. Retry the current media fragment first, up to the retry limit.
2. Once the retry limit is reached for the current media fragment, abandon it and resynchronise the stream to the fragment that should be at the current running time (i.e. bring it back up to date with what would be playing out had the stream continued and not encountered an error).
3. If there is no new fragment to resynchronise to, only then wait for a manifest update.

Step 2 could also just skip to the next fragment should one be available. If this logic seems sound I'll try to produce a patch to implement it. Can any of the maintainers foresee any problems with these proposed changes to adaptivedemux?
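To make the ordering concrete, here is a minimal sketch of the proposed flow. MAX_DOWNLOAD_ERROR_COUNT stands for the existing retry limit in gstadaptivedemux.c; resync_stream_to_current_running_time() and wait_for_manifest_update() are hypothetical helpers standing in for the real logic, not actual API:

/* Sketch of the proposed live-stream recovery flow (helper names
 * hypothetical). */
static GstFlowReturn
download_fragment_with_recovery (GstAdaptiveDemux * demux,
    GstAdaptiveDemuxStream * stream)
{
  GstFlowReturn ret = GST_FLOW_ERROR;
  guint retries = 0;

  /* 1. Retry the failing fragment up to the retry limit. */
  while (retries < MAX_DOWNLOAD_ERROR_COUNT) {
    ret = gst_adaptive_demux_stream_download_fragment (stream);
    if (ret == GST_FLOW_OK)
      return ret;
    retries++;
  }

  if (!gst_adaptive_demux_is_live (demux))
    return ret;                 /* non-live keeps the old behaviour */

  /* 2. Give up on this fragment and resynchronise to the fragment
   * that should be playing out at the current running time. */
  if (resync_stream_to_current_running_time (demux, stream))
    return GST_FLOW_OK;

  /* 3. Nothing newer to resynchronise to: only now block waiting for
   * the next manifest update. */
  wait_for_manifest_update (demux);
  return GST_FLOW_OK;
}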
Sounds like a good solution, but 1) also seems like something that should be done for non-live streams.
Created attachment 320445 [details] [review] Patch to change the way fragments are retried in live streams

Here's my proposed patch. In the patch I needed to know the difference between a stream seek that moved onto a new fragment and one that didn't. Since none of the code checked the return value from stream_seek, I added an extra return code, GST_FLOW_CUSTOM_SUCCESS, to indicate that the seek succeeded but didn't actually change the stream position; GST_FLOW_OK means the seek worked and changed the stream position. Hope this is ok.
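To illustrate the convention at a call site (the exact stream_seek signature here is approximate, not copied from the patch):

/* Caller-side illustration of the proposed return-code convention. */
GstFlowReturn ret;
GstClockTime final_ts;

ret = demux_class->stream_seek (stream, TRUE, 0, target_ts, &final_ts);

if (ret == GST_FLOW_OK) {
  /* Seek succeeded and moved the stream onto a different fragment. */
} else if (ret == GST_FLOW_CUSTOM_SUCCESS) {
  /* Seek succeeded but the stream position did not change, i.e. we
   * are still on the same fragment as before the seek. */
} else {
  /* Seek failed. */
}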
Sebastian, to answer your earlier comment, the behaviour for non-live streams was already to do the retries, so I've left that behaviour as it is.
I like this idea. The new return value needs to be replicated in mssdemux and hlsdemux as well.
Created attachment 320505 [details] [review] Patch to change the way fragments are retried in live streams (dashdemux/hlsdemux/mssdemux)
I've updated to the latest git master HEAD and implemented on top of the new changes to stream_seek from there. I've also included mssdemux and hlsdemux. I had to write the seek function for hlsdemux, as it didn't have one already, so I based it on some of the code in the seek event handling. I can extensively test the dashdemux path, but I've only been able to do a cursory check on hlsdemux and mssdemux using example streams.
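For reference, the new hlsdemux seek function has roughly this shape. This is a sketch based on the playlist walk in the seek-event code, not the patch itself; internal field names (client->current->files, file->duration, etc.) are approximate, and the forward/flags (e.g. snap flag) handling is omitted for brevity:

/* Rough shape of a stream_seek for ext/hls/gsthlsdemux.c. */
static GstFlowReturn
gst_hls_demux_stream_seek (GstAdaptiveDemuxStream * stream, gboolean forward,
    GstSeekFlags flags, GstClockTime ts, GstClockTime * final_ts)
{
  GstHLSDemux *hlsdemux = GST_HLS_DEMUX (stream->demux);
  GList *walk;
  GstClockTime pos = 0;
  GstM3U8MediaFile *file = NULL;

  /* Walk the media playlist, accumulating fragment durations, until we
   * reach the fragment containing the target timestamp. */
  for (walk = hlsdemux->client->current->files; walk; walk = walk->next) {
    GstM3U8MediaFile *candidate = walk->data;

    if (pos <= ts && ts < pos + candidate->duration) {
      file = candidate;
      break;
    }
    pos += candidate->duration;
  }

  if (file == NULL)
    return GST_FLOW_EOS;

  /* Snap to the fragment boundary and report where we landed. */
  hlsdemux->client->sequence = file->sequence;
  hlsdemux->client->sequence_position = pos;
  if (final_ts)
    *final_ts = pos;

  return GST_FLOW_OK;
}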
Review of attachment 320505 [details] [review]:

::: ext/hls/gsthlsdemux.c
@@ +414,3 @@
+}
+
+static GstFlowReturn

Please make sure this doesn't break the unit tests that do snap-seeking.

::: gst-libs/gst/adaptivedemux/gstadaptivedemux.c
@@ +2604,3 @@
+
+  gst_adaptive_demux_get_live_seek_range (demux, &range_start,
+      &range_end);

smoothstreaming doesn't implement this. We need to skip this if it isn't implemented, or just have some sane default that we could try, perhaps the last fragment timestamp in the current manifest version?

@@ +2605,3 @@
+  gst_adaptive_demux_get_live_seek_range (demux, &range_start,
+      &range_end);
+  stream_time = range_end - stream->segment.base;

Do we need to subtract base here? AFAIK the ranges are already in the correct values.
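For the smoothstreaming point, a guard along these lines might work (a sketch; get_last_fragment_timestamp() is a hypothetical stand-in for the suggested fallback):

GstAdaptiveDemuxClass *klass = GST_ADAPTIVE_DEMUX_GET_CLASS (demux);
gint64 range_start, range_end;
GstClockTime stream_time;

if (klass->get_live_seek_range != NULL &&
    gst_adaptive_demux_get_live_seek_range (demux, &range_start, &range_end)) {
  stream_time = range_end;
} else {
  /* Subclass (e.g. mssdemux) has no live seek range: fall back to the
   * last fragment timestamp in the current manifest version. */
  stream_time = get_last_fragment_timestamp (demux);    /* hypothetical */
}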
I'm having a look at the unit tests, but I'm having trouble getting them to execute consistently. One time all tests will pass, another run (without recompiling) and the test will show bus errors and seg-faults. Is there a trick to getting consistent runs for the unit tests?
> I'm having a look at the unit tests, but I'm having trouble getting them to > execute consistently. One time all tests will pass, another run (without > recompiling) and the test will show bus errors and seg-faults. Is there a > trick to getting consistent runs for the unit tests? Not really, how do you run them exactly? On what platform?
The platform I'm trying to test on is a Dell T7400 (Intel Xeon E5420, 8GB ram, NVidia Quadro FX 1700) running up-to-date Fedora 23. I'm using "make check" to run the unit tests after compiling with "make -j4". I'm building in a separate directory to the source (i.e. running the configure script in the gst-plugins-bad directory from a different build directory), if that makes any difference.
Can you share the errors you get? Are they failing before your modifications? This looks like a separate bug.
I did another git pull before trying to get some logs together for you. The tests now seem a lot more stable, although I'm still seeing:

- elements/mplex occasionally (randomly) fails
- elements/audiointerleave and elements/audiomixer always fail
- elements/mssdemux fails when run with other tests, but succeeds when run on its own

I don't believe my code touches mplex or the audio* ones, so is it safe to ignore those failures? The mssdemux error (when run with other tests) is:

0:00:06.241177356 17899 0x7fca88002a80 ERROR adaptivedemux gstadaptivedemux.c:3010:gst_adaptive_demux_stream_download_loop:<mssdemux:audio_00> Download error: A network error occurred, or the server closed the connection unexpectedly.:
/project-mcuk/bcs/davidjw/InetDist/DASM/rd-gst-plugins-bad/tests/check/elements/mssdemux.c(525): test_fragment_download_error_src_create (): /GstPipeline:pipeline/GstMssDemux:mssdemux/GstBin:srcbin-audio_00/GstTestHTTPSrc:testhttpsrc1: A network error occurred, or the server closed the connection unexpectedly.
91%: Checks: 12, Failures: 0, Errors: 1
/project-mcuk/bcs/davidjw/InetDist/DASM/rd-gst-plugins-bad/tests/check/elements/test_http_src.c:477:E:basicTest:testDownloadError:0: (after this point) Test timeout expired
FAIL elements/mssdemux (exit status: 1)

elements/hls_demux is now failing consistently on snap seeks, so I can get on and look at that.
Created attachment 322356 [details] [review] Patch to change the way fragments are retried in live streams (dashdemux/mssdemux) This patch should address issues with the previous patch.
I don't think the logic should be to always jump back to the current live point following an error. One transient network error causing playback to jump forwards would be rather confusing (and probably annoying) to the viewer. Imagine if you were watching a sports event, 30 minutes behind live. Having the stream suddenly jump forward 30 minutes without any user input would be rather annoying. I think the behaviour should be to move forwards, one segment at a time, until it is either at live or has succeeded in getting back into a good state.
Yes, I'd thought of that, but quickly realised skipping forward one segment at a time wouldn't work. I then tried to figure out how far the viewer was from live and use that as an offset to recalculate, but I couldn't work out how far behind live the viewer actually is (due to pauses/skips). That figure doesn't seem to be easily available, and there doesn't seem to be any way to derive it from absolute time, running time, stream time and/or segment information (my first patch had incorrectly tried to use the segment base as this offset, due to me misreading the GstSegment documentation).

The problem with just trying to skip forward one segment at a time is that, due to the retry logic, each fragment takes a fragment duration plus a little extra to retry. So simply skipping to the next fragment will never catch the stream up; it needs to be a jump forward, sometimes 2 or 3 fragments, to get back to the streaming point. As pointed out above, calculating that skip doesn't seem to be an easy task.

I'm open to suggestions to improve this, but I was spending way too much time trying all sorts of combinations of clocks, bases and offsets. I had to make a decision to just drop that behaviour and assume that most viewers would be watching a live stream at the live point. For those not viewing live, there would be a disruption to playback anyway during the retry of the current fragment, followed by a jump back to live. I realise this is not ideal, but I also believe it is better than the current master branch behaviour, which can freeze playback for long periods of time waiting for a manifest update.

After posting this latest patch I did have a thought: try remembering the difference between the current playback stream time and the live stream time upon a seek event or transition to PLAYING, and use that as an offset. But I'm not sure how well this would work and I haven't had a chance to try it yet.
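Something along these lines is what I have in mind (an untested sketch; live_offset is a hypothetical new field, not code from any of the attached patches):

/* On a seek event or a transition to PLAYING, remember how far the
 * chosen position is behind the live edge. */
gint64 range_start, range_end;

if (gst_adaptive_demux_get_live_seek_range (demux, &range_start, &range_end))
  demux->priv->live_offset = range_end - seek_position;

/* Later, when resynchronising after the retry limit is hit, jump to
 * "live minus the remembered distance" instead of straight to the
 * live edge, so a viewer watching behind live stays behind live. */
if (gst_adaptive_demux_get_live_seek_range (demux, &range_start, &range_end))
  stream_time = range_end - demux->priv->live_offset;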
Created attachment 328078 [details] [review] Updated patch which includes better tracking of playback distance from live

Finally had a chance to try out the method from my previous post, and it works quite well. Here's a new version of the patch with changes to track how far playback is behind live and use that adjustment when resynchronising the stream.
Is this related to https://bugzilla.gnome.org/show_bug.cgi?id=776609 ?
Certainly looks like they came across the same situation we did, i.e. a fragment download error causing the download loop to wait for an MPD update which could take a long time.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/343.