Bug 733959 - hlsdemux: download bitrate algorithms don't reflect real download rate
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gst-plugins-bad
Version: git master
Hardware/OS: Other / All
Importance: Normal blocker
Target Milestone: 1.7.2
Assigned To: GStreamer Maintainers
QA Contact: GStreamer Maintainers
Duplicates: 735789 (view as bug list)
Depends on:
Blocks: 772330
 
 
Reported: 2014-07-30 07:24 UTC by m][sko
Modified: 2016-10-02 08:42 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
download algorithm really doesn't reflect real download rate (29.19 KB, text/x-log)
2014-07-31 23:43 UTC, m][sko
  Details
souphttpsrc hack to set socket read buffer size (2.95 KB, patch)
2015-02-12 01:17 UTC, Duncan Palmer
none Details | Review
Prototype for pre-buffering in hlsdemux (12.18 KB, patch)
2015-02-18 06:53 UTC, Duncan Palmer
none Details | Review
Introduce a queue into hlsdemux to facilitate bitrate calculation (9.94 KB, patch)
2015-02-23 04:41 UTC, Duncan Palmer
none Details | Review
Add a property named avg-in-rate to queue2 (2.04 KB, patch)
2015-02-23 04:42 UTC, Duncan Palmer
none Details | Review
Introduce a queue into hlsdemux to facilitate bitrate calculation - take 2 (10.02 KB, patch)
2015-02-24 05:15 UTC, Duncan Palmer
none Details | Review
latest hlsdemux bitrate fix against 1.4 (9.80 KB, patch)
2015-04-20 03:58 UTC, Duncan Palmer
none Details | Review
latest change to queue2 to add avg-in-rate property, against 1.4. (3.04 KB, patch)
2015-04-20 03:59 UTC, Duncan Palmer
committed Details | Review
adaptivedemux: improved bitrate estimations (11.48 KB, patch)
2015-09-07 11:05 UTC, Philippe Normand
none Details | Review
adaptivedemux: improved bitrate estimations (11.42 KB, patch)
2015-10-23 11:57 UTC, Philippe Normand
none Details | Review
queue2: add overrun signal (5.49 KB, patch)
2015-11-04 11:05 UTC, Philippe Normand
committed Details | Review
adaptivedemux: improved bitrate estimations (12.73 KB, patch)
2015-11-04 11:05 UTC, Philippe Normand
committed Details | Review
adaptivedemux: remove queue2 overrun notification (1.81 KB, patch)
2016-01-21 11:13 UTC, Philippe Normand
committed Details | Review
adaptivedemux: restore bitrate averaging support (4.63 KB, patch)
2016-01-21 11:13 UTC, Philippe Normand
committed Details | Review
adaptivedemux: Calculate values before queue2 (6.40 KB, patch)
2016-10-02 08:09 UTC, Edward Hervey
none Details | Review

Description m][sko 2014-07-30 07:24:51 UTC
Even with the fix for Bug 733837,
hlsdemux doesn't properly measure the download bitrate.
What we currently measure reflects how fast we get buffers from the multiqueue
and not the real download speed,
because souphttpsrc pushes buffers as fast as the multiqueue allows.
Comment 1 m][sko 2014-07-30 07:41:00 UTC
I use this as a traffic shaper:
http://www.hecticgeek.com/2012/02/simple-traffic-shaping-ubuntu-linux/

sudo wondershaper eth0 292 128

Here are some results:
http://pastebin.com/cYYPw5Es
Comment 2 m][sko 2014-07-30 18:38:48 UTC
you can close this bug
Comment 3 Thiago Sousa Santos 2014-07-30 20:19:29 UTC
So you confirm that the measures are correct or approximate well enough?
Comment 4 m][sko 2014-07-30 20:44:43 UTC
I am looking for the right methodology to test it.
Comment 5 m][sko 2014-07-31 23:43:48 UTC
Created attachment 282231 [details]
download algorithm really doesn't reflect real download rate

The download algorithm really doesn't reflect the real download rate.
I used a traffic shaper to limit the speed.
When I increased the limit to twice what the stream needs, the reported download rate was an unrepresentative value.
souphttpsrc feeds data faster than the network delivers it,
perhaps because of data prebuffered in the kernel
or prebuffered by libsoup.
Comment 6 m][sko 2014-08-01 00:46:17 UTC
When I moved the measurement code down into souphttpsrc, into
gst_soup_http_src_got_chunk_cb,
I got pretty much the stream bitrate as a result,
but still not the connection speed.
Comment 7 Sebastian Dröge (slomo) 2014-08-01 12:05:08 UTC
You mean the changes in the traffic shaper make no difference in the actual download rate? Or only the calculated value is wrong while data actually arrives with a completely different speed?

We currently measure how long it takes souphttpsrc to produce a buffer for us, while removing the time spent in downstream elements. This is not entirely correct as it ignores any buffering that can happen while downstream elements handle our buffer. So the number can be higher than reality. Is that what you're observing?
Comment 8 m][sko 2014-08-01 12:39:09 UTC
The values are much, much higher,
and we can't use these values to switch to a specific bitrate;
it always switches to the highest one.

Try it from your local intranet server with a traffic shaper on your side.
Comment 9 Duncan Palmer 2015-02-12 01:09:44 UTC
I've observed this as well - it happens because of the buffering which occurs when downstream elements are handling buffers from hlsdemux, and the problem is very severe in my test environment.

The simplest way to reproduce it is to play back content over a link which is shaped to a relatively low bitrate (e.g. 5 mbit/s) so that we're using a low bitrate variant. I use a uridecodebin with buffer-duration set to 3.5s, and buffer-size set to something large, so that after a short time, hlsdemux spends most of its time pushing data downstream (as the first multiqueue spends a lot of time in overrun). 

In the scenario I've described, the time taken for souphttpsrc to produce a buffer is generally very small, as the buffer comes from the socket read buffer most of the time. The result is that we calculate quite a high bitrate.

In testing, where I've shaped port 80 on my http server to 5mbit/s using tc:
- hlsdemux takes 2.93s to download a 3s hls segment. Of this, 2.44s is spent pushing buffers downstream, and 0.49s is spent retrieving buffers from souphttpsrc. Segment size is 1528440 bytes.
- hlsdemux reports a bitrate of 23 mbit/s for this particular hls segment. 
- Using wget to download the same segment on my device indicates the download rate is 4 mbit/s. 
- If I mess about with the socket read buffer sizes, setting min to 1024 and max to 2048 using /proc/sys/net/ipv4/tcp_rmem , /proc/sys/net/core/rmem_max and /proc/sys/net/core/rmem_default , hlsdemux reports the rate as 8 mbit/s

As msko indicated, this behaviour was triggered by the multiqueue fix. If I set max-size-buffers on the first multiqueue to 0 (unlimited), the calculated bitrate is not too bad (though it is a bit high). This is because we're continually reading from souphttpsrc, and so spend a lot of that time fetching data over the network.

I'm not sure what the solution is. One simple solution would be to buffer up some amount of data in hlsdemux, and measure how long it takes to fill the buffer. Empty the buffer before fetching more data. The buffer would need to be large enough to reduce the effects of the socket read buffers. Using a larger blocksize for souphttpsrc would help as well, but there are downsides to this I imagine.
Comment 10 Duncan Palmer 2015-02-12 01:17:56 UTC
Created attachment 296648 [details] [review]
souphttpsrc hack to set socket read buffer size

This hack to souphttpsrc forces its socket read buffer size to 2k. It can be used to demonstrate that the problem is caused by data being read from socket read buffers. I'm also using it as a rather bad temporary workaround...
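
For illustration, a hack like this essentially amounts to shrinking SO_RCVBUF on the underlying socket. A minimal standalone sketch of that idea (not the attached patch; the 2048-byte figure just mirrors the value mentioned above):

    #include <stdio.h>
    #include <sys/socket.h>

    /* Shrink the kernel receive buffer on an already-connected socket so that
     * only a small amount of data can pile up between reads. The kernel
     * doubles the value internally and enforces a minimum. */
    static int
    shrink_socket_rcvbuf (int fd, int bytes)
    {
      if (setsockopt (fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof (bytes)) != 0) {
        perror ("setsockopt(SO_RCVBUF)");
        return -1;
      }
      return 0;
    }

In souphttpsrc the socket is owned by libsoup/GIO, so the attached patch presumably reaches it through those layers rather than through a raw fd.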
Comment 11 Thiago Sousa Santos 2015-02-13 05:22:54 UTC
(In reply to Duncan Palmer from comment #9)
> 
> In testing, where I've shaped port 80 on my http server to 5mbit/s using tc:
> - hlsdemux takes 2.93s to download a 3s hls segment. Of this, 2.44s is spent
> pushing buffers downstream, and 0.49s is spent retrieving buffers from
> souphttpsrc. Segment size is 1528440 bytes.
> - hlsdemux reports a bitrate of 23 mbit/s for this particular hls segment. 
> - Using wget to download the same segment on my device indicates the
> download rate is 4 mbit/s. 
> - If I mess about with the socket read buffer sizes, setting min to 1024 and
> max to 2048 using /proc/sys/net/ipv4/tcp_rmem , /proc/sys/net/core/rmem_max
> and /proc/sys/net/core/rmem_default , hlsdemux reports the rate as 8 mbit/s
> 
> As msko indicated, this behaviour was triggered by the multiqueue fix. If I
> set max-size-buffers on the first multiqueue to 0 (unlimited), the
> calculated bitrate is not too bad (tho is it a bit high). This is because
> we're continually reading from souphttpsrc, and so spend a lot of that time
> fetching data over the network.
> 
> I'm not sure what the solution is. One simple solution would be to buffer up
> some amount of data in hlsdemux, and measure how long it takes to fill the
> buffer. Empty the buffer before fetching more data. The buffer would need to
> be large enough to reduce the effects of the socket read buffers. Using a
> larger blocksize for souphttpsrc would help as well, but there are downsides
> to this I imagine.

This is a tricky issue. At some point we had an infinite queue inside hlsdemux so we could safely measure the bitrate as we would never block on a full queue. The problem is that this can be used to explode our memory usage with a malicious stream.

Currently our algorithm counts only the time spent getting the data from souphttpsrc; we stop our chronometer before pushing and restart it after the push returns. This saves us from measuring the blocking time, but the network and soup keep running while our thread is blocked anyway, so we cheat a little by not measuring time that was actually used to deliver us a few buffers. Especially while we're blocked, the kernel can receive and hold some data for us, and we would never take this time into account.
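
In rough C, the pattern described above looks like this (a sketch only; the helper and variable names are made up and this is not the actual hlsdemux code):

    #include <gst/gst.h>

    /* Accumulate download time only while we are waiting on souphttpsrc; the
     * clock is stopped for the duration of gst_pad_push(), which may block on
     * a full downstream queue. */
    static GstFlowReturn
    measure_and_push (GstPad * srcpad, GstBuffer * buf,
        GstClockTime * download_time, guint64 * download_bytes,
        GstClockTime * wait_start)
    {
      GstFlowReturn ret;
      GstClockTime now = gst_util_get_timestamp ();

      *download_time += now - *wait_start;          /* stop the chronometer */
      *download_bytes += gst_buffer_get_size (buf);

      ret = gst_pad_push (srcpad, buf);             /* possibly a long block */

      *wait_start = gst_util_get_timestamp ();      /* restart it */
      return ret;
    }

The flaw described above is visible here: any bytes the kernel or libsoup buffered while we were blocked in gst_pad_push() arrive almost instantly afterwards, so download_time underestimates the real transfer time and the computed bitrate comes out too high.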

I don't see any other option for a good enough measurement that doesn't involve an infinite (or very, very large) queue. As I said, this can be exploited to consume the machine's RAM. So we would need to write to disk, where our limits are much larger. The downside is that disk is also much slower. We could, however, create a new hybrid type of queue that keeps the data at the head of the queue available in memory, while newly arriving data goes to disk storage.

It would actually work like 2 queues:

-> [ disk queue ] -> [ mem queue ] ->

The souphttpsrc would be pushing data into the first queue, another thread would be taking data from mem queue and pushing it outside, and a third thread sits in the middle taking data from the disk and putting into memory.
The trick here is that the input thread would skip the disk queue if it was empty and if there was space in the mem queue. It would only use the disk when memory was full and we had enough buffering to not care about the disk latency.
Comment 12 Duncan Palmer 2015-02-13 10:44:30 UTC
(In reply to Thiago Sousa Santos from comment #11)

> This is a tricky issue. At some point we had an infinite queue inside
> hlsdemux so we could safely measure the bitrate as we would never block on a
> full queue. The problem is that this can be used to explode our memory usage
> with a malicious stream.

Wouldn't memory usage explode with any stream? At steady state, we need to be able to download a bit faster than we decode, so unless there is a limit imposed, our queues will just keep growing.


> I don't see any other option to have a good enough measure that doesn't
> involve an infinite (or very very large) queue. As I said, this can be
> exploited to consume the machine's RAM. So, we need to write to disk where
> our limits are much larger. 

One problem with this is that writing to disk isn't really an option for many embedded platforms. The disk would also need to be fairly large, as the size of the on-disk buffer would grow indefinitely.
Comment 13 m][sko 2015-02-13 11:13:21 UTC
(In reply to Thiago Sousa Santos from comment #11)
Can anybody confirm that we can somehow properly measure the real download speed from souphttpsrc?
The TCP stack has some buffers, libsoup has some buffers, and so do the GStreamer queues.

I don't really think that measuring at souphttpsrc -> hlsdemux (queue) is a good way to measure download speed.
The current solution doesn't handle the drops in download rate that are common on wireless networks, so hlsdemux switches to a lower bitrate too late.


An unlimited queue size is a different security problem, and I don't know if it is really necessary to cache a big amount of data in GStreamer queues.
I think libsoup supports some filesystem caching.
Comment 14 Thiago Sousa Santos 2015-02-13 14:04:34 UTC
(In reply to Duncan Palmer from comment #12)
> (In reply to Thiago Sousa Santos from comment #11)
> 
> > This is a tricky issue. At some point we had an infinite queue inside
> > hlsdemux so we could safely measure the bitrate as we would never block on a
> > full queue. The problem is that this can be used to explode our memory usage
> > with a malicious stream.
> 
> Wouldn't memory usage explode with any stream? At steady state, we need to
> be able to download a bit faster than we decode, so unless there is a limit
> imposed, our queues will just keep growing.

Our queues are limited; the ability to store to disk allows us to have a somewhat larger queue that should be enough to accommodate at least 1 fragment safely.

There are also files where there is a single fragment (they use subfragments, a kind of marked place inside the stream where you can switch bitrates safely). The DASH ISOBMFF profile allows this.

> 
> 
> > I don't see any other option to have a good enough measure that doesn't
> > involve an infinite (or very very large) queue. As I said, this can be
> > exploited to consume the machine's RAM. So, we need to write to disk where
> > our limits are much larger. 
> 
> One problem with this is that writing to disk isn't really an option for
> many embedded platforms. The disk would also need to be fairly large, as the
> size of the on-disk buffer would grow indefinately.

We could limit the disk queue with a value large enough for a high-bitrate fragment.
Comment 15 Thiago Sousa Santos 2015-02-13 14:07:18 UTC
(In reply to m][sko from comment #13)
> (In reply to Thiago Sousa Santos from comment #11)
> Anybody can confirm that we can somehow properly measure real download speed
> from souphttpsrc?
> TCP stack has some buffers, libsoup has some buffers and gst queues.

I tried looking for ways to get this from libsoup but I couldn't. Ideally the best solution would be to keep this measurement in the element that
does the downloading, but I just don't know if it can do it. In any case it can also block when its queues get full, and then it would have the same problems we have when measuring the bitrate.

> 
> As I don't really think that souphttpsrc -> hlsdemux(queue) is good idea how
> to measure download speed.
> Current solution don't handle drop of download rate that is common on
> wireless networks so hlsdemux switch to lower bitrate too late.
> 

The decision of when to switch bitrates is a different problem, IMHO. First we need to make sure our bitrate makes sense.
Comment 16 Olivier Crête 2015-02-13 17:17:55 UTC
It may make sense to use the "overrun" signal from the queue and stop the timer only when that happens, and then only restart it when the kernel's buffer has been emptied; but that second step would require help from libsoup.
Comment 17 Olivier Crête 2015-02-13 17:19:29 UTC
Actually, that may be a terrible idea; it would probably result in never having a measurement in the normal case (where the buffers are always full, I assume?).
Comment 18 Duncan Palmer 2015-02-17 04:28:53 UTC
(In reply to Thiago Sousa Santos from comment #14)
> (In reply to Duncan Palmer from comment #12)
> > (In reply to Thiago Sousa Santos from comment #11)
> > 
> > > This is a tricky issue. At some point we had an infinite queue inside
> > > hlsdemux so we could safely measure the bitrate as we would never block on a
> > > full queue. The problem is that this can be used to explode our memory usage
> > > with a malicious stream.
> > 
> > Wouldn't memory usage explode with any stream? At steady state, we need to
> > be able to download a bit faster than we decode, so unless there is a limit
> > imposed, our queues will just keep growing.
> 
> Our queues are limited, the ability to store to disk allows us to have a bit
> larger queue that should be enough to accomodate at least 1 fragment safely.

What I'm trying to get at is that because the queue size is limited, we will always end up, after some time, with all queues full. This is because we download faster than we decode.

So, even with the scheme which has the on-disk queue, we are in no better place than we are now, as we can't accurately measure download rate unless we're pushing into a non-blocking queue.

So, we would need to wait for a queue to have enough free space to accommodate a whole fragment before attempting to download and measure the download rate.

Also, if we have small fragments, we can't accurately measure download rate, so we need to measure it across a few of them.

I'm wondering if it would be simpler to define a minimum amount of data which can be used to measure download rate, have a queue which can hold that amount of data, and simply measure how long it takes to fill that queue? Then empty the queue and repeat. If we don't fill it, e.g. because we switch variant, then we don't measure bitrate that time around. Again, we run into trouble with low bitrate variants, as the queue could end up being quite long in stream time.

> > One problem with this is that writing to disk isn't really an option for
> > many embedded platforms. The disk would also need to be fairly large, as the
> > size of the on-disk buffer would grow indefinately.
> 
> We could limit disk queue with a large enough value that would be enough for
> a high bitrate fragment.

As I mention above, once the queue becomes full because of the limit, then we can no longer measure bitrate accurately.
Comment 19 Duncan Palmer 2015-02-17 07:03:52 UTC
(In reply to Duncan Palmer from comment #18)

> I'm wondering if it would be simpler to define a minimum amount of data
> which can be used to measure download rate, have a queue which can hold that
> amount of data, and simply measure how long it takes to fill that queue? 
> Then emppty the queue and repeat. If we don't fill it e.g. because we switch
> variant, then we don't measure bitrate that time around. Again, we run into
> trouble with low bitrate variants, as the queue could end up being quite
> long in stream time.

Actually, there's no low bitrate problem provided we always read from this bitrate calculation queue using a separate thread. We don't calculate how long it takes to fill the queue, but rather how long it takes to read n bytes, where n is the size of the queue. We wait for it to empty after this calculation.

I'd guess the size of the queue wouldn't need to be more than a couple of hundred K.
Comment 20 Duncan Palmer 2015-02-18 06:39:56 UTC
I've prototyped something along the following lines, which is (I think) a lot simpler than the queueing ideas we've discussed. It implements the following:

    - The kernel and possibly libsoup have buffers which fill continually when we're retrieving an HLS segment. The combined size of these buffers is upstream_bufsize.
    - hlsdemux has a local buffer of size hlsdemux_bufsize, where hlsdemux_bufsize > upstream_bufsize.
    - We can measure bitrate by measuring how long it takes to fill the portion of the hlsdemux buffer between offsets upstream_bufsize and hlsdemux_bufsize. We're not attempting to measure how long it takes to download a segment, but simply measure bitrate.

The prototype uses a GstBuffer which is filled with data arriving in _src_chain() until it becomes full. The contents of the buffer are then pushed downstream.

    - The download_start_time variable is set when the buffer level increases above upstream_bufsize.
    - The download_total_bytes variable is incremented by hlsdemux_bufsize - upstream_bufsize (so we don't take account of all data received, only the portion which we should have had to wait to download).
    - In the EOS handler for the src, we push out anything remaining in the GstBuffer, but do not update download_total_bytes or download_total_time.

This prototype implementation breaks when we use HLS content with small segments, as the GstBuffer will never fill before EOS. This can be solved in a final implementation by measuring how long it takes to download the entire segment for segments which are < hlsdemux_bufsize. This works because we never get blocked pushing downstream with these small segments, as there'll be one push only, on src handler EOS.

There is no way to determine what upstream_bufsize is at runtime. It is the size of the socket receive buffer (which I believe encompasses the TCP receive window) plus whatever buffering libsoup may introduce. Without hacking on libsoup, we can't find out this information. So, I think the easiest option is to introduce a property to set it. I've experimented with values of 128k and 256k (the prototype assumes hlsdemux_bufsize == upstream_bufsize * 2) on my STB. Both work well; I do see some evidence of the original problem when using 128k, however this is small enough that I could live with it. The correct value will also vary depending on how much memory the machine has, as I believe this affects the actual socket read buffer sizes, unless they are explicitly set.
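
A sketch of the accounting rule described above, with variable names following the comment (illustrative only, not the attached prototype):

    #include <gst/gst.h>

    typedef struct {
      gsize upstream_bufsize;   /* assumed socket + libsoup buffering */
      gsize level;              /* fill level of the local hlsdemux buffer */
      GstClockTime start;       /* set when level first exceeds upstream_bufsize */
      GstClockTime total_time;
      guint64 total_bytes;
    } RateProbe;

    /* Start the clock only once the local buffer has absorbed as much data as
     * could have been sitting in the upstream buffers. */
    static void
    rate_probe_data_received (RateProbe * p, gsize bytes)
    {
      gsize old_level = p->level;

      p->level += bytes;
      if (old_level < p->upstream_bufsize && p->level >= p->upstream_bufsize)
        p->start = gst_util_get_timestamp ();
    }

    /* Called when the buffer of size hlsdemux_bufsize is full and about to be
     * pushed downstream: only the portion above upstream_bufsize is counted. */
    static void
    rate_probe_buffer_full (RateProbe * p, gsize hlsdemux_bufsize)
    {
      p->total_time += gst_util_get_timestamp () - p->start;
      p->total_bytes += hlsdemux_bufsize - p->upstream_bufsize;
      p->level = 0;
    }

The bitrate is then total_bytes * 8 divided by total_time. As the next comments show, the weak point is the assumed upstream_bufsize.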

Testing (quick and dirty so far)

    - Throttle the connection to 5120 kbit/s using dripls.
    - Play back. The initial variant chosen is the first (and highest in the case of my test content), requiring about 10000 kbit/s.
    - Using the original bitrate calculation algorithm, the bitrate is calculated at about 10000 kbit/s after downloading this fragment. About 3 seconds is spent pushing data downstream, and 2 seconds waiting for data to arrive from souphttpsrc.
    - Using the prototype, the bitrate is calculated at about 4800 kbit/s.
    - Playback for 90 seconds using the prototype, and observe that the bitrate is always about right.

An interesting observation from the testing is that the time taken to push buffers downstream was consistently smaller with the prototype. I presume this is because we were dealing in < 4K buffers previously, whereas we're now dealing with 256K buffers, and so are not pushing the first multiqueue into overrun so quickly. The logs show evidence to support this, but I can't say for sure.

I'd appreciate comments on this approach. I'll upload a patch later tonight, or tomorrow. Currently, what I have doesn't apply cleanly to 1.4, or at all to master (I'm working on a 1.4-based branch).
Comment 21 Duncan Palmer 2015-02-18 06:53:35 UTC
Created attachment 297075 [details] [review]
Prototype for pre-buffering in hlsdemux

This patch (against 1.4) contains the prototype work I described in my previous comment. It wasn't so hard to get it to apply to 1.4 after all. 

I've supplied this for review and comment on my approach only. It's rather hacky, and missing a few things. If people are happy with this approach, I'll re-implement.

There appear to be a few small changes in there which are unrelated to what I've done - not sure how they crept in. Also I haven't built or tested this on vanilla 1.4.
Comment 22 Duncan Palmer 2015-02-19 04:29:30 UTC
Well, it turns out I was lucky with my testing yesterday, and that my understanding of how the socket receive buffer is sized was incorrect. 

After reading tcp(7) I had thought that the read buffer size wouldn't increase beyond the size in /proc/sys/net/core/rmem_max, unless explicitly set with setsockopt(). After some testing, I see that it increases as far as the 3rd value in /proc/sys/net/ipv4/tcp_rmem, which is around 1MB on my STB, and around 6MB on my PC. 

So, 256K GstBuffer sizes are way too small. A mechanism which can be guaranteed to hold an entire hls segment without blocking is required. Just downloading entire hls segments before sending them downstream (hlsdemux used to do this I believe) would do the job, but this does have the potential to use a fair bit of memory.
Comment 23 Sebastian Dröge (slomo) 2015-02-19 08:16:24 UTC
It also means that you have to download a complete fragment before you can actually do something, which will increase startup times.

I think what is needed here is something in souphttpsrc that notifies us about the download rate. It can probably estimate that better than some other element downstream.
Comment 24 Olivier Crête 2015-02-19 17:54:17 UTC
I wonder if we could have a mode on the queue where it doesn't block on buffers, but only on specific events, and then GstBaseAdaptive could push those events between segments? So the queue would only block on inter-segment boundaries? But that may also require some logic to know that the queue is empty enough to start downloading the next segment.
Comment 25 Duncan Palmer 2015-02-20 04:24:47 UTC
I'm implementing another solution which places a queue2 between the souphttpsrc and hlsdemux. The queue is sized so that it should never block. The bitrate can then be retrieved from either souphttpsrc or the queue2 element (queue2 keeps track of a running bitrate average internally).

This achieves much the same thing as querying downstream queue elements, but without the added complexity (querying downstream queues for free space) and fragility (we don't need to assume whoever constructed the pipeline set max queue sizes and buffering thresholds correctly). It also puts the onus back onto souphttpsrc to tell us the bitrate.
Comment 26 Duncan Palmer 2015-02-23 04:41:40 UTC
Created attachment 297617 [details] [review]
Introduce a queue into hlsdemux to facilitate bitrate calculation
Comment 27 Duncan Palmer 2015-02-23 04:42:41 UTC
Created attachment 297618 [details] [review]
Add a property named avg-in-rate to queue2
Comment 28 Duncan Palmer 2015-02-23 04:43:13 UTC
These patches are against hlsdemux in 1.4, and implement the scheme I outlined
in my previous comment.

I've tossed up whether or not to modify souphttpsrc to have it tell us the
bitrate, but I prefer getting this information from the queue, as it means
we're independent of the element uridownloader chooses to place before the
queue.

So, I added a property named avg-in-rate to queue2 to retrieve the average
bitrate. This is not the overall bitrate, but a weighted running average, which
I think is fine. If we try to get the bitrate in the first 0.2 seconds, it
will not yet have been calculated, so I added something to handle this case.
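
Sampling the proposed property from the queue2 that sits between the source and the demuxer is a simple g_object_get() (sketch; the property comes from the attached patch, assumed here to be a gint64, and as merged later in this bug it reports bytes per second rather than bits per second, see comment 45):

    #include <gst/gst.h>

    /* Read queue2's weighted running average of its input rate. A value of 0
     * means it hasn't been measured yet (e.g. within the first ~0.2 s). */
    static gint64
    sample_avg_in_rate (GstElement * queue2)
    {
      gint64 avg_in_rate = 0;

      g_object_get (queue2, "avg-in-rate", &avg_in_rate, NULL);
      return avg_in_rate;
    }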

Testing:
- Modify hlsdemux so that _src_chain() sleeps for 7 seconds after a fixed
  number of fragments are received. Observe that the bitrate calculated by the
  queue is correct, and that calculated by the old hlsdemux method is way too
  high. This test simulates the issue which triggers this bug.
- Playback using dripls to switch bitrate a number of times. Observe that the
  calculated bitrate is correct for all segments.

I'd be interested in feedback on these changes.
Comment 29 Duncan Palmer 2015-02-23 06:16:40 UTC
This appears to break seeking. I would still appreciate any feedback while I fix that.
Comment 30 Duncan Palmer 2015-02-24 05:15:51 UTC
Created attachment 297729 [details] [review]
Introduce a queue into hlsdemux to facilitate bitrate calculation - take 2

The problem I found with seeking relates to content which uses byterange playlists. In this case, hlsdemux sends a SEEK event to the src bin when it's in the READY state. However, it turns out that most gstreamer elements can't deal with a SEEK event when in the READY state, so this doesn't work anymore. 

I've fixed this by sending the SEEK event directly to the uri_handler element (either souphttpsrc or filesrc in practice). I think this is an acceptable thing to do?

This patch obsoletes the previous changes to hlsdemux.
Comment 31 Sebastian Dröge (slomo) 2015-02-24 09:03:24 UTC
Seems overall like a sensible solution to me, especially as it reuses the rate calculation from elsewhere instead of adding yet another implementation.
Comment 32 Duncan Palmer 2015-03-17 22:42:05 UTC
Does anyone have further thoughts on this? I'd like to get a fix for this problem committed if possible. I realise it'll need reworking for master.
Comment 33 Duncan Palmer 2015-04-20 03:58:47 UTC
Created attachment 301964 [details] [review]
latest hlsdemux bitrate fix against 1.4
Comment 34 Duncan Palmer 2015-04-20 03:59:19 UTC
Created attachment 301965 [details] [review]
latest change to queue2 to add avg-in-rate property, against 1.4.
Comment 35 Duncan Palmer 2015-04-20 04:03:13 UTC
I've attached updated patches to hlsdemux and queue2, against 1.4. 

The patch to queue2 includes a fix for an unhandled corner case when calculating avg-in-rate.

The patch to hlsdemux casts the arguments to g_object_set(), when properties on queue2 are set.

I would like to get this work upstream in some form. There is a positive comment from Sebastian back in February, but I have seen nothing else. Can anyone offer suggestions as to how I can further this? Once I get some feedback that my approach is fine, and the changes against 1.4 look reasonable, I'll implement them against master.
Comment 36 Philippe Normand 2015-09-07 11:05:34 UTC
Created attachment 310797 [details] [review]
adaptivedemux: improved bitrate estimations

Bitrate estimation is now handled through a queue2 element added after
the source element used to download fragments.

Original hlsdemux patch by Duncan Palmer <dpalmer@digisoft.tv>
Comment 37 Rajat Verma 2015-10-15 10:42:06 UTC
This issue is affecting all three streaming protocols (HLS/DASH/MSS) and leads to a bad user experience. Can it please be looked at by the GStreamer maintainers?
Comment 38 Sebastian Dröge (slomo) 2015-10-15 10:48:11 UTC
I have it on my list, someone will look at it once it fits into their schedule.
Comment 39 Philippe Normand 2015-10-23 11:57:52 UTC
Created attachment 313935 [details] [review]
adaptivedemux: improved bitrate estimations

Bitrate estimation is now handled through a queue2 element added after
the source element used to download fragments.

Original hlsdemux patch by Duncan Palmer <dpalmer@digisoft.tv>


Rebased patch.
Comment 40 Rajat Verma 2015-10-27 10:24:39 UTC
Hello Philippe,

I tested your patch and can verify that now bitrate estimation is accurate.

my test results:

Stream used for testing : https://devimages.apple.com.edgekey.net/streaming/examples/bipbop_4x3/bipbop_4x3_variant.m3u8 hosted on local apache2 server (to avoid unintended bandwidth variation).

Representations available:
Gear 1 : ~28 kBps
Gear 2 : ~80 kBps
Gear 3 : ~121 kBps
Gear 4 : ~235 kBps

Without these patches, on setting the apache2 bandwidth to 150 kBps, adaptive demux keeps switching between gear 3 and gear 4 for the whole duration of the stream.

With these patches, adaptive demux is able to select the correct representation all the time.
Comment 41 Thiago Sousa Santos 2015-10-27 22:33:59 UTC
Review of attachment 313935 [details] [review]:

+1 on this approach. Some minor remarks below.

::: gst-libs/gst/adaptivedemux/gstadaptivedemux.c
@@ +89,3 @@
 #define DEFAULT_CONNECTION_SPEED 0
 #define DEFAULT_BITRATE_LIMIT 0.8
+#define SRC_QUEUE_MAX_BYTES 20 * 1024 * 1024    /* For safety. Large enough to hold a segment. */

This is going to bite us some day. It would be nice to at least connect to the overrun signal on queue2 to print or post a warning, so it is easy to identify in the future when the measurement is disturbed by this limit.

* queue2 doesn't have the overrun signal; it would need to be implemented. It can likely be copied from queue.

@@ +1875,3 @@
+    queue_sink = gst_element_get_static_pad (queue, "sink");
+
+    if (GST_PAD_LINK_FAILED (gst_pad_link (uri_handler_src, queue_sink))) {

This link should never fail the hierarchy or caps checks; perhaps just use gst_pad_link_full() and disable those checks.
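
The gst_pad_link_full() variant suggested here would look roughly like this (sketch; the pad variable names follow the quoted diff):

    #include <gst/gst.h>

    /* Both pads live inside our own bin and we control their caps, so the
     * hierarchy and caps checks can safely be skipped. */
    static gboolean
    link_uri_handler_to_queue (GstPad * uri_handler_src, GstPad * queue_sink)
    {
      return GST_PAD_LINK_SUCCESSFUL (gst_pad_link_full (uri_handler_src,
              queue_sink, GST_PAD_LINK_CHECK_NOTHING));
    }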
Comment 42 Philippe Normand 2015-10-28 08:55:07 UTC
Thanks for the review Thiago, I'll update the patch :)
Comment 43 Philippe Normand 2015-11-04 11:05:17 UTC
Created attachment 314801 [details] [review]
queue2: add overrun signal
Comment 44 Philippe Normand 2015-11-04 11:05:57 UTC
Created attachment 314802 [details] [review]
adaptivedemux: improved bitrate estimations

Bitrate estimation is now handled through a queue2 element added after
the source element used to download fragments.

Original hlsdemux patch by Duncan Palmer <dpalmer@digisoft.tv>
Comment 45 Thiago Sousa Santos 2015-11-05 17:38:58 UTC
All patches look good for me, except that I changed them locally to use bytes/s instead of bits/s in the avg-in-rate property and updated adaptivedemux accordingly.

If no one opposes to that I can merge it tomorrow.
Comment 46 Philippe Normand 2015-11-06 07:50:37 UTC
(In reply to Thiago Sousa Santos from comment #45)
> All patches look good for me, except that I changed them locally to use
> bytes/s instead of bits/s in the avg-in-rate property and updated
> adaptivedemux accordingly.
> 
> If no one opposes to that I can merge it tomorrow.

No objections from here :)
Comment 47 Thiago Sousa Santos 2015-11-06 15:35:40 UTC
commit 45fa81e564fa2fc8130705e77e69a1d90c132068
Author: Duncan Palmer <dpalmer@digisoft.tv>
Date:   Mon Feb 23 13:16:19 2015 +1000

    queue2: Add the avg-in-rate property.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=733959

commit 8ae8b2723d0cf179a4f09b2f6c5f797e2d97034d
Author: Philippe Normand <philn@igalia.com>
Date:   Wed Nov 4 12:02:51 2015 +0100

    queue2: add overrun signal
    
    Notifies that the queue2 is full, same as queue does
    
    https://bugzilla.gnome.org/show_bug.cgi?id=733959


commit ccff3be3ab2e5bffcefc12c80a5edb225801f1b9
Author: Philippe Normand <philn@igalia.com>
Date:   Fri Sep 4 09:59:06 2015 +0200

    adaptivedemux: improved bitrate estimations
    
    Bitrate estimation is now handled through a queue2 element added after
    the source element used to download fragments.
    
    Original hlsdemux patch by Duncan Palmer <dpalmer@digisoft.tv>
    https://bugzilla.gnome.org/show_bug.cgi?id=733959
Comment 48 Philippe Normand 2015-11-06 15:44:59 UTC
Thanks! I suppose these patches can't be merged in 1.6 because of the queue2 signal/property, right?
Comment 49 Tim-Philipp Müller 2015-11-09 19:52:39 UTC
This broke one of the dash demux unit tests btw, see bug #757776 .
Comment 50 Tim-Philipp Müller 2015-11-10 21:32:59 UTC
> Thanks! I suppose these patches can't be merged in 1.6
> because of the queue2 signal/property, right?

Probably not, though it's possible, both are not very intrusive code-wise. Not sure about the adaptivedemux changes. The extra "up to 20MB" memory requirement introduced by the new queue seems quite hefty though, so that's perhaps not something we should pick into the stable branch.

I'm also not entirely happy about the new "overrun" signal, because it now basically imposes a non-trivial performance penalty in the main code path for everyone using the element, even if the signal is not used, and all for the sole reason of printing a GST_WARNING debug log message in adaptivedemux! I wonder if you couldn't have just used the messages it posts by overriding the GstBin::handle_message() vfunc in adaptivedemux and dropping them there.
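
The handle_message() approach would look roughly like this in the adaptivedemux GstBin subclass (sketch with placeholder names; the usual G_DEFINE_TYPE boilerplate providing parent_class is assumed, and this is not the actual code):

    #include <gst/gst.h>

    /* Intercept buffering messages coming from the internal queue2 so they are
     * never posted to the application, optionally warning when the queue fills
     * up, and forward everything else to the default GstBin handler. */
    static void
    my_adaptive_demux_handle_message (GstBin * bin, GstMessage * msg)
    {
      if (GST_MESSAGE_TYPE (msg) == GST_MESSAGE_BUFFERING) {
        gint percent;

        gst_message_parse_buffering (msg, &percent);
        if (percent >= 100)
          GST_WARNING_OBJECT (bin, "download queue full, bitrate measurement "
              "may be inaccurate");
        gst_message_unref (msg);        /* drop, don't forward */
        return;
      }

      GST_BIN_CLASS (parent_class)->handle_message (bin, msg);
    }

As noted below, queue2 only posts buffering messages when buffering is enabled, which has its own overhead.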
Comment 51 Duncan Palmer 2015-11-10 22:21:08 UTC
The real memory requirement is the fragment size of the highest bitrate variant. This will be a lot lower than 20MB in any practical application.

I'd like to see this in the stable branch, as without this change, hls (and I presume the other adaptive demuxers) are simply un-usable on anything but a fast and stable network connection.
Comment 52 Thiago Sousa Santos 2015-11-10 23:42:02 UTC
(In reply to Tim-Philipp Müller from comment #50)
> > Thanks! I suppose these patches can't be merged in 1.6
> > because of the queue2 signal/property, right?
> 
> Probably not, though it's possible, both are not very intrusive code-wise.
> Not sure about the adaptivedemux changes. The extra "up to 20MB" memory
> requirement introduced by the new queue seems quite hefty though, so that's
> perhaps not something we should pick into the stable branch.
> 
> I'm also not entirely happy about the new "overrun" property because that
> now basically imposes a non-trivial performance penalty in the main code
> path for everyone using the element even if the signal is not used. For the
> sole reason of printing a GST_WARNING debug log message in adaptivedemux! I
> wonder if you couldn't have just used the messages it posts by overriding
> the GstBin::handle_message() vfunc in adaptivedemux and dropping it there.

I was also thinking about this. Using the buffering messages is doable, but we need to enable buffering, which will make it post messages. I wonder if that won't be more overhead, as we are only interested in buffering=100%. We also need to make sure the message is dropped in adaptivedemux to prevent it from messing with the real buffering.

How about adding a 'silent' property to queue2 just like queue has? Or 'emit-signals' that is false by default.
Comment 53 Olivier Crête 2015-11-10 23:55:49 UTC
Possibly we could add a property to just send the buffering messages when it gets from 100% to non-100% and when it gets back to 100%. In a lot of cases, those are the only ones we care about.
Comment 54 Tim-Philipp Müller 2015-11-10 23:59:30 UTC
We can also check if there are signal handlers hooked up when going NULL->READY and use an internal boolean as a short-cut. The thing is, this all feels rather silly because we've added this new API just for a GST_WARNING log message. We can easily drop messages from a GstBin::handle_message() vfunc.
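
Concretely, GLib can report whether anything is connected, so queue2 could cache that once per activation (sketch; where the result is stored is up to the element):

    #include <gst/gst.h>

    /* To be called once, e.g. in the NULL->READY transition: if nobody is
     * connected to "overrun", the streaming thread can skip the
     * g_signal_emit() on every buffer. */
    static gboolean
    overrun_signal_is_used (GstElement * queue2)
    {
      guint signal_id = g_signal_lookup ("overrun", G_OBJECT_TYPE (queue2));

      return signal_id != 0 &&
          g_signal_has_handler_pending (queue2, signal_id, 0, FALSE);
    }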
Comment 55 Thiago Sousa Santos 2015-11-11 00:12:02 UTC
(In reply to Tim-Philipp Müller from comment #54)
> We can also check if there are signal handlers hooked up when going
> NULL->READY and use the internal boolean as a short-cut. Thing is just this
> is feels all rather silly because we've added this new API just for a
> GST_WARNING log message. We can easily drop messages from a
> GstBin::handle_message() vfunc.

The problem is that if we assume it just works for that limit, once resolutions improve that limit might not be enough. It could be turned into a warning message to let the application know that the measurement is not accurate enough.

Anyway, I agree that we can lower the overhead of the signal by only using it when it is needed (or reverting and resorting to the buffering messages).
Comment 56 Sebastian Dröge (slomo) 2015-11-18 09:02:42 UTC
*** Bug 735789 has been marked as a duplicate of this bug. ***
Comment 57 Thiago Sousa Santos 2015-11-24 14:44:47 UTC
Ping? What should we do here?
Comment 58 Duncan Palmer 2015-12-23 15:45:02 UTC
I've just noticed that ccff3be3 removed the bitrate averaging added in c98348c1. I don't think this is correct; queue2 keeps a moving bitrate average, but only over a period of about 3 seconds. We sample this after downloading each fragment.

The current solution will be as reactive to an increase in bitrate as it is to a decrease, whereas c98348c1 was trying to make it reactive to a decreasing bitrate but act more conservatively to an increasing bitrate.

I've proposed some changes to improve this in https://bugzilla.gnome.org/show_bug.cgi?id=747558
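
The asymmetry described here can be expressed as a weighted average whose weight depends on the direction of the change; a sketch of the idea only (the 0.9/0.1 weights are illustrative, not the values used in c98348c1):

    #include <glib.h>

    /* Follow a falling measurement almost immediately (switch down fast), but
     * let a rising measurement pull the average up only slowly (switch up
     * conservatively). */
    static gdouble
    update_avg_bitrate (gdouble avg, gdouble measured)
    {
      gdouble alpha = (measured < avg) ? 0.9 : 0.1;

      return alpha * measured + (1.0 - alpha) * avg;
    }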
Comment 59 Tim-Philipp Müller 2016-01-19 19:42:15 UTC
I would like to remove the new signal again, since it's only used for a debugging message. We can then figure out how to re-instate the debugging message later, if people think it's essential.

Also re-opening for comment #58, which sounds like a regression was introduced (please correct me if I misunderstood).
Comment 60 Philippe Normand 2016-01-20 14:45:26 UTC
(In reply to Tim-Philipp Müller from comment #59)
> I would like to remove the new signal again, since it's only used for a
> debugging message. We can then figure out how to re-instate the debugging
> message later, if people think it's essential.
> 

OK then, should we also remove the now-unused overrun signal support from queue2?

> Also re-opening for comment #58, which sounds like a regression was
> introduced (please correct me if I misunderstood).

Alright, my mistake indeed. I'll prepare a patch restoring the previous behavior.
Comment 61 Philippe Normand 2016-01-20 16:27:05 UTC
(In reply to Duncan Palmer from comment #58)
> I've just noticed that ccff3be3 removed the bitrate averaging added in
> c98348c1. I don't think this is correct; queue2 keeps a moving bitrate
> average, but only over a period of about 3 seconds. We sample this after
> downloading each fragment.

Ok so it doesn't make much sense to use the average bitrate reported by queue2, if I understand you correctly?
Comment 62 Duncan Palmer 2016-01-20 22:28:58 UTC
Using the bitrate reported by queue2 is the correct way to retrieve the bitrate for the current fragment. However, c98348c1 averages bitrate over several fragments and uses this average to decide when to switch up variant, as opposed to the bitrate of the last fragment, which it uses to decide when to move down.

We carry some further changes internally which introduce hysteresis to prevent continuous up/down switches when the average bitrate sits around the switching point. See https://bugzilla.gnome.org/show_bug.cgi?id=747558. I can do the work to upstream this and restore the behaviour from c98348c1 if you'd like.
Comment 63 Philippe Normand 2016-01-21 11:13:36 UTC
Created attachment 319489 [details] [review]
adaptivedemux: remove queue2 overrun notification

Due to performance impact concerns this is removed. An alternative
approach would be to rely on buffering messages monitoring.
Comment 64 Philippe Normand 2016-01-21 11:13:59 UTC
Created attachment 319490 [details] [review]
adaptivedemux: restore bitrate averaging support

This was accidentally removed in commit ccff3be3.
Comment 65 Sebastian Dröge (slomo) 2016-02-16 15:13:39 UTC
Attachment 319489 [details] pushed as 4d1489b - adaptivedemux: remove queue2 overrun notification
Attachment 319490 [details] pushed as 0b7276d - adaptivedemux: restore bitrate averaging support
Comment 66 Edward Hervey 2016-10-02 08:09:13 UTC
Created attachment 336749 [details] [review]
adaptivedemux: Calculate values before queue2

In order to calculate the *actual* bitrate for downloading a fragment
we need to take into account the time since we requested the fragment.

Without this, the bitrate calculations (previously reported by queue2)
would be biased since they wouldn't take into account the request latency
(that is the time between the moment we request a specific URI and the
moment we receive the first byte of that request).

Examples where it would be biased are high-bandwidth but high-latency
networks. If you download 5MB in 500ms, but it takes 200ms to get the first
byte, queue2 would report 80Mbit/s (5MB in 500ms), but taking the request
into account it is only 57Mbit/s (5MB in 700ms).

While this would not cause too many issues if the above fragment represented
a much longer duration (5s of content), it would cause issues with short
ones (say 1s, or when doing keyframe-only requests which are even shorter)
where the code would expect to be able to download up to 80Mbit/s ... whereas
if we take the request time into account it's much lower (and we would
therefore end up doing late requests).

Also calculate the request latency for debugging purposes and further
usage (it could allow us to figure out the maximum request rate for
example).
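
The arithmetic from the example above, as a small helper (sketch; the names are illustrative):

    #include <gst/gst.h>

    /* Bitrate in bits/s measured from the moment the request was issued, not
     * just from the first received byte. With 5 MB, 200 ms request latency and
     * 500 ms transfer time this yields ~57 Mbit/s instead of 80 Mbit/s. */
    static guint64
    effective_bitrate (guint64 bytes, GstClockTime request_latency,
        GstClockTime transfer_time)
    {
      GstClockTime total = request_latency + transfer_time;

      if (total == 0)
        return 0;
      return gst_util_uint64_scale (bytes * 8, GST_SECOND, total);
    }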
Comment 67 Edward Hervey 2016-10-02 08:42:46 UTC
Created a clone bug