Bug 769768 – rtpjitterbuffer: lots of improvements around RTX

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 769768 - rtpjitterbuffer: lots of improvements around RTX


Summary:	rtpjitterbuffer: lots of improvements around RTX


Status:	RESOLVED FIXED

Product:	GStreamer
Classification:	Platform
Component:	gst-plugins-good
Version:	git master
Hardware:	Other All

Importance:	Normal normal
Target Milestone:	1.9.90
Assigned To:	GStreamer Maintainers
QA Contact:	GStreamer Maintainers

URL:
Whiteboard:

Duplicates:	770081 (view as bug list)
Depends on:	769771
Blocks:

Reported:	2016-08-11 21:52 UTC by Håvard Graff (hgr)
Modified:	2016-09-30 07:04 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
0001 (2.39 KB, patch) 2016-08-11 21:53 UTC, Håvard Graff (hgr)	committed	Details \| Review
expose more stats (12.86 KB, patch) 2016-08-11 21:53 UTC, Håvard Graff (hgr)	none	Details \| Review
option to disable rtx-delay-reorder (6.41 KB, patch) 2016-08-11 21:54 UTC, Håvard Graff (hgr)	committed	Details \| Review
Major improvements for RTX stats (50.35 KB, patch) 2016-08-11 21:55 UTC, Håvard Graff (hgr)	none	Details \| Review
Improve expected timer handling (19.76 KB, patch) 2016-08-11 21:55 UTC, Håvard Graff (hgr)	committed	Details \| Review
rtx-deadline as a property (4.95 KB, patch) 2016-08-11 21:56 UTC, Håvard Graff (hgr)	committed	Details \| Review
fix lost-duration when gap after lost-event (4.77 KB, patch) 2016-08-11 21:57 UTC, Håvard Graff (hgr)	committed	Details \| Review
don't request rtx if it is too late (5.73 KB, patch) 2016-08-11 21:57 UTC, Håvard Graff (hgr)	committed	Details \| Review
equidistant packet-spacing detector (12.33 KB, patch) 2016-08-11 21:58 UTC, Håvard Graff (hgr)	committed	Details \| Review
improved rtx-rtt averaging (1.62 KB, patch) 2016-08-11 22:06 UTC, Håvard Graff (hgr)	none	Details \| Review
rtxreceive: set buffer flag for rtx packets (824 bytes, patch) 2016-08-11 22:12 UTC, Håvard Graff (hgr)	committed	Details \| Review
updated patch with unit specified (50.36 KB, patch) 2016-08-16 12:48 UTC, Håvard Graff (hgr)	none	Details \| Review
improved rtx-rtt averaging updated (2.59 KB, patch) 2016-08-23 18:25 UTC, Håvard Graff (hgr)	none	Details \| Review
patch (12.79 KB, patch) 2016-09-07 08:39 UTC, Håvard Graff (hgr)	none	Details \| Review
updated: Major improvements for RTX stats (50.31 KB, patch) 2016-09-07 08:41 UTC, Håvard Graff (hgr)	none	Details \| Review
updated: Major improvements for RTX stats (50.31 KB, patch) 2016-09-07 08:44 UTC, Håvard Graff (hgr)	committed	Details \| Review
Updated: Add and expose more stats (12.79 KB, patch) 2016-09-07 08:45 UTC, Håvard Graff (hgr)	committed	Details \| Review
updated: improved rtx-rtt averaging (3.33 KB, patch) 2016-09-07 08:52 UTC, Håvard Graff (hgr)	committed	Details \| Review
rtpjitterbuffer: improved rtx-rtt averaging (3.39 KB, patch) 2016-09-14 23:42 UTC, Olivier Crête	committed	Details \| Review
rtpjitterbuffer: Detect whether to assume equidistant spacing when loss (12.36 KB, patch) 2016-09-14 23:42 UTC, Olivier Crête	committed	Details \| Review
rtpjitterbuffer: Don't request rtx if 'now' is past retry period (5.75 KB, patch) 2016-09-14 23:42 UTC, Olivier Crête	committed	Details \| Review
rtpjitterbuffer: Fix lost duration when gap after lost timer (4.79 KB, patch) 2016-09-14 23:42 UTC, Olivier Crête	committed	Details \| Review
rtpjitterbuffer: Expose rtx-deadline as a property (4.99 KB, patch) 2016-09-14 23:42 UTC, Olivier Crête	committed	Details \| Review
rtpjitterbuffer: Improved expected-timer handling when gap > 0 (19.78 KB, patch) 2016-09-14 23:42 UTC, Olivier Crête	committed	Details \| Review
rtpjitterbuffer: Major improvements for RTX stats (50.32 KB, patch) 2016-09-14 23:42 UTC, Olivier Crête	committed	Details \| Review
rtpjitterbuffer: Add and expose more stats and increase testing of it (12.85 KB, patch) 2016-09-14 23:42 UTC, Olivier Crête	committed	Details \| Review
rtxreceive: Set buffer flag for retransmitted packets (851 bytes, patch) 2016-09-14 23:42 UTC, Olivier Crête	committed	Details \| Review
rtpjitterbuffer: Option to disable rtx-delay-reorder (6.44 KB, patch) 2016-09-14 23:42 UTC, Olivier Crête	committed	Details \| Review

Description Håvard Graff (hgr) 2016-08-11 21:52:05 UTC

We have been focusing on RTX and making it work properly for our use-cases, and as a result we have a series of patches that we thought it best to present in one bug instead of splitting them up over several bugs.

The overall thing that these patches is trying to do, is to make the jitterbuffer act really well against WebRTC and their implementation of RTX, and also provide the application with the means to determine the right settings (usually the right latency for the jitterbuffer) based on various stats.

Perhaps the biggest and most important changes are the fact that we now will "wait" for rtx-packets that have been requested even after they have "expired", and to mark the rtx-packets with a new flag (patch for plugins-base and rtprtxreceive) that allows differentiation of "normal" and "rtx" packets. These changes allow the various rtx-stats to become a lot more reliable and again allows the application to act on them.

The workflow we have been following have been to observe every "actually" lost packet (that rtx were unable to recover) and asking why this packets was not "saved" by rtx, and this has led us to finding many bugs and improvements, all which have been written proper tests for.

We are pleased to report that up to 10% introduced packetloss from Chrome to GStreamer (using tools like netem) is now almost completely eliminated in our application using these changes, while still keeping the end-to-end latency at an acceptable level for real-time communication.

The patches are ordered, and needs to be applied in that order.

Comment 1 Håvard Graff (hgr) 2016-08-11 21:53:14 UTC

Created attachment 333136 [details] [review]
0001

Comment 2 Håvard Graff (hgr) 2016-08-11 21:53:55 UTC

Created attachment 333137 [details] [review]
expose more stats

Comment 3 Håvard Graff (hgr) 2016-08-11 21:54:39 UTC

Created attachment 333138 [details] [review]
option to disable rtx-delay-reorder

Comment 4 Håvard Graff (hgr) 2016-08-11 21:55:10 UTC

Created attachment 333139 [details] [review]
Major improvements for RTX stats

Comment 5 Håvard Graff (hgr) 2016-08-11 21:55:45 UTC

Created attachment 333140 [details] [review]
Improve expected timer handling

Comment 6 Håvard Graff (hgr) 2016-08-11 21:56:15 UTC

Created attachment 333141 [details] [review]
rtx-deadline as a property

Comment 7 Håvard Graff (hgr) 2016-08-11 21:57:05 UTC

Created attachment 333142 [details] [review]
fix lost-duration when gap after lost-event

Comment 8 Håvard Graff (hgr) 2016-08-11 21:57:40 UTC

Created attachment 333143 [details] [review]
don't request rtx if it is too late

Comment 9 Håvard Graff (hgr) 2016-08-11 21:58:08 UTC

Created attachment 333144 [details] [review]
equidistant packet-spacing detector

Comment 10 Håvard Graff (hgr) 2016-08-11 22:06:51 UTC

Created attachment 333146 [details] [review]
improved rtx-rtt averaging

As this patch was a little short on an explanation, I might add something here. The basic idea is this:
1. For *larger* rtx-rtt, weigh a new measurement as before
2. For *smaller* rtx-rtt, be a bit more conservative and weigh a bit less
3. For very large measurements, consider them "outliers" and count them a lot less

The idea being that reducing the rtx-rtt is much more harmful then increasing it, since we don't want to be underestimating the rtt of the network, and when using this number to estimate the latency you need for you jitterbuffer, you would rather want it to be a bit larger then a bit smaller, potentially losing rtx-packets. The "outlier-detector" is there to prevent a single skewed measurement to affect the outcome too much. On wireless networks, these are surprisingly common.

Comment 11 Håvard Graff (hgr) 2016-08-11 22:12:43 UTC

Created attachment 333148 [details] [review]
rtxreceive: set buffer flag for rtx packets

This is dependent on a gst-plugins-base patch, but not on the jitterbuffer-patches

Comment 12 Olivier Crête 2016-08-16 12:02:58 UTC

Review of attachment 333136 [details] [review]:

Don't we already compile everything with -Wno-unused-parameter .. GLib's API makes way too many of them.

Comment 13 Olivier Crête 2016-08-16 12:14:36 UTC

Review of attachment 333139 [details] [review]:

I'd also suggest the patch title to be something like "rtpjitterbuffer: Collect stats for unsuccessful packets"

::: gst/rtpmanager/gstrtpjitterbuffer.c
@@ +718,3 @@
+   * The time to wait for a retransmitted packet after it has been
+   * considered lost in order to collect RTX statistics.
+   *

Please specify the unit !

Comment 14 Olivier Crête 2016-08-16 12:20:17 UTC

Review of attachment 333140 [details] [review]:

Your changing the algo that computes the delay, do you mind explaining why in the commit message? I don't understand why?

Comment 15 Håvard Graff (hgr) 2016-08-16 12:21:22 UTC

(In reply to Olivier Crête from comment #13)
> Review of attachment 333139 [details] [review] [review]:
> 
> I'd also suggest the patch title to be something like "rtpjitterbuffer:
> Collect stats for unsuccessful packets"

The patch does a whole lot more then just waiting for the rtx that did not arrive in time, if you look through the tests we wrote for this. We did however not find a proper way to split it up further since the changes are quite substantial, and hence the name.

> 
> ::: gst/rtpmanager/gstrtpjitterbuffer.c
> @@ +718,3 @@
> +   * The time to wait for a retransmitted packet after it has been
> +   * considered lost in order to collect RTX statistics.
> +   *
> 
> Please specify the unit !

It is specified in the property, but I can add it there as well.

Comment 16 Håvard Graff (hgr) 2016-08-16 12:22:32 UTC

(In reply to Olivier Crête from comment #14)
> Review of attachment 333140 [details] [review] [review]:
> 
> Your changing the algo that computes the delay, do you mind explaining why
> in the commit message? I don't understand why?

I wrote some ASCII art in some of the tests there to help us understand what is actually going on. :) Does that help?

Comment 17 Olivier Crête 2016-08-16 12:25:00 UTC

Review of attachment 333141 [details] [review]:

Why would the deadline ever differ from the latency? Can you add a bit of doc explaining that? I guess it's to be able to receive packets that are too late to display, but still allow you to not break inter-frame based decoding for later frames?

Comment 18 Olivier Crête 2016-08-16 12:26:56 UTC

Review of attachment 333142 [details] [review]:

Looks good, lets commit this with the previous commits when they are ready.

Comment 19 Olivier Crête 2016-08-16 12:27:47 UTC

Review of attachment 333143 [details] [review]:

Looks good, lets commit with the others.

Comment 20 Håvard Graff (hgr) 2016-08-16 12:30:37 UTC

(In reply to Olivier Crête from comment #17)
> Review of attachment 333141 [details] [review] [review]:
> 
> Why would the deadline ever differ from the latency? Can you add a bit of
> doc explaining that? I guess it's to be able to receive packets that are too
> late to display, but still allow you to not break inter-frame based decoding
> for later frames?

Just another flexibility for the application to control how long a RTX-request should be valid for in RTCP land. No point in sending another request for RTX if there is already one queued up waiting to be sent.

Comment 21 Olivier Crête 2016-08-16 12:34:21 UTC

Review of attachment 333144 [details] [review]:

That's an interesting heuristic.. I wonder if it would make sense to use the marker flag also.. After a packet with a marker bit set, you can assume that the next packet will be the first of the next frame.

Also small bikeshedding, I dont like equidistant as it's not really what is tested, we've seen audio streams with variable duration packets which don't really fit the mold of equidistance.. Maybe something like "fragmented", I can't find find a great name either.

Comment 22 Håvard Graff (hgr) 2016-08-16 12:41:30 UTC

> > Please specify the unit !
> 
Looking at the code, none of rtx-retry-timeout, rtx-min-retry-timeout and rtx-retry-period specifies the unit, so I guess we just followed suit. Maybe someone instead should go over and clean up the docs?

Comment 23 Håvard Graff (hgr) 2016-08-16 12:48:45 UTC

Created attachment 333407 [details] [review]
updated patch with unit specified

Comment 24 Olivier Crête 2016-08-19 23:10:38 UTC

Review of attachment 333146 [details] [review]:

Can you put the comments from your commit message in a comment in the code? They make a lot of sense and will be helpful when the next person tries to understand why those "magic" numbers are like that.

Comment 25 Håvard Graff (hgr) 2016-08-23 18:25:59 UTC

Created attachment 334030 [details] [review]
improved rtx-rtt averaging updated

Comment 26 GstBlub 2016-08-30 16:16:15 UTC

I also posted an RTX related improvement recently in bug 769769

Comment 27 Sebastian Dröge (slomo) 2016-09-01 07:21:45 UTC

*** Bug 770081 has been marked as a duplicate of this bug. ***

Comment 28 Sebastian Dröge (slomo) 2016-09-01 07:23:24 UTC

This one is blocked on #769771 currently, but I'd like to get all this in for 1.10 (after 1.9.2).

(In reply to GstBlub from comment #26)
> I also posted an RTX related improvement recently in bug 769769

How do they relate to each other? Independent?

Comment 29 Håvard Graff (hgr) 2016-09-01 07:27:18 UTC

> (In reply to GstBlub from comment #26)
> > I also posted an RTX related improvement recently in bug 769769
> 
> How do they relate to each other? Independent?

Yes, but I would really like to see a test for bug 769769 before that gets committed, and also an example of a real use-case, as I have yet to encounter anyone actually doing this.

Comment 30 GstBlub 2016-09-02 22:43:16 UTC

(In reply to Håvard Graff (hgr) from comment #29)
> Yes, but I would really like to see a test for bug 769769 before that gets
> committed, and also an example of a real use-case, as I have yet to
> encounter anyone actually doing this.
I don't know if/when I get a chance to work on such a test.  If you look at the diff, it basically is changing from determining which sequence numbers are lost (using the blp) and then sending an event for each lost sequence number to sending one event that contains the blp and do the exact same logic when receiving that event.

In real world this is easy to hit with fairly high packet loss.  I encounter this case quite frequently in a gstreamer<->gstreamer application, especially when I introduce artificial packet loss with an identity element.

Comment 31 Håvard Graff (hgr) 2016-09-07 08:39:53 UTC

Created attachment 334966 [details] [review]
patch

Update patch with a fix for some rebasing-foo

Comment 32 Håvard Graff (hgr) 2016-09-07 08:41:45 UTC

Created attachment 334967 [details] [review]
updated: Major improvements for RTX stats

Updated patch with a fix for some rebasing-foo

Comment 33 Håvard Graff (hgr) 2016-09-07 08:44:13 UTC

Created attachment 334968 [details] [review]
updated: Major improvements for RTX stats

Updated patch with fix for rebasing-foo

Comment 34 Håvard Graff (hgr) 2016-09-07 08:45:15 UTC

Created attachment 334969 [details] [review]
Updated: Add and expose more stats

Update patch after rebase-foo

Comment 35 Håvard Graff (hgr) 2016-09-07 08:52:57 UTC

Created attachment 334970 [details] [review]
updated: improved rtx-rtt averaging

Update patch with fix for rebase-foo

Comment 36 Olivier Crête 2016-09-14 19:37:09 UTC

Comment on attachment 333136 [details] [review]
0001

Because of the way GObject signal works, we just compile everything with -Wno-unused-arguments otherwise it becomes madness!

Comment 37 Olivier Crête 2016-09-14 23:41:20 UTC

Thanks, I pushed everything!

The following fixes have been pushed:
f440b07 rtpjitterbuffer: improved rtx-rtt averaging
f8238f0 rtpjitterbuffer: Detect whether to assume equidistant spacing when loss
2eb7383 rtpjitterbuffer: Don't request rtx if 'now' is past retry period
ab49dfd rtpjitterbuffer: Fix lost duration when gap after lost timer
dd020f5 rtpjitterbuffer: Expose rtx-deadline as a property
8087a8a rtpjitterbuffer: Improved expected-timer handling when gap > 0
38a7545 rtpjitterbuffer: Major improvements for RTX stats
1b868cc rtpjitterbuffer: Add and expose more stats and increase testing of it
531199d rtxreceive: Set buffer flag for retransmitted packets
1436fc0 rtpjitterbuffer: Option to disable rtx-delay-reorder

Comment 38 Olivier Crête 2016-09-14 23:42:03 UTC

Created attachment 335589 [details] [review]
rtpjitterbuffer: improved rtx-rtt averaging

The basic idea is this:
1. For *larger* rtx-rtt, weigh a new measurement as before
2. For *smaller* rtx-rtt, be a bit more conservative and weigh a bit less
3. For very large measurements, consider them "outliers"
   and count them a lot less

The idea being that reducing the rtx-rtt is much more harmful then
increasing it, since we don't want to be underestimating the rtt of the
network, and when using this number to estimate the latency you need for
you jitterbuffer, you would rather want it to be a bit larger then a bit
smaller, potentially losing rtx-packets. The "outlier-detector" is there
to prevent a single skewed measurement to affect the outcome too much.
On wireless networks, these are surprisingly common.

Comment 39 Olivier Crête 2016-09-14 23:42:08 UTC

Created attachment 335590 [details] [review]
rtpjitterbuffer: Detect whether to assume equidistant spacing when loss

Assuming equidistant packet spacing when that's not true leads to more
loss than necessary in the case of reordering and jitter. Typically this
is true for video where one frame often consists of multiple packets
with the same rtp timestamp. In this case it's better to assume that the
missing packets have the same timestamp as the last received packet, so
that the scheduled lost timer does not time out too early causing the
packets to be considered lost even though they may arrive in time.

Comment 40 Olivier Crête 2016-09-14 23:42:13 UTC

Created attachment 335591 [details] [review]
rtpjitterbuffer: Don't request rtx if 'now' is past retry period

There is no need to schedule another EXPECTED timer if we're already
past the retry period. Under normal operation this won't happen, but if
there are more timers than the jitterbuffer is able to process in
real-time, scheduling more timers will just make the situation worse.
Instead, consider this packet as lost and move on. This scenario can
occur with high loss rate, low rtt and high configured latency.

Comment 41 Olivier Crête 2016-09-14 23:42:18 UTC

Created attachment 335592 [details] [review]
rtpjitterbuffer: Fix lost duration when gap after lost timer

This patch fixes an issue with the estimated gap duration when there is
a gap immediately after a lost timer has been processed. Previously
there was a discrepancy beteen the gap in seqnum and gap in dts which
would cause wrong calculated duration. The issue would only be seen with
retranmission enabled since when it's disabled lost timers are only
created when a packet is received and the actual gap length and last dts
is known.

Comment 42 Olivier Crête 2016-09-14 23:42:24 UTC

Created attachment 335593 [details] [review]
rtpjitterbuffer: Expose rtx-deadline as a property

The default -1 gives the old behavior.

Comment 43 Olivier Crête 2016-09-14 23:42:29 UTC

Created attachment 335594 [details] [review]
rtpjitterbuffer: Improved expected-timer handling when gap > 0

Comment 44 Olivier Crête 2016-09-14 23:42:35 UTC

Created attachment 335595 [details] [review]
rtpjitterbuffer: Major improvements for RTX stats

Stats should also be collected for unsuccessful packets.

rtx-rtt is very important for determining the necessary configured
latency on the jitterbuffer. It's especially important to be able to
increase the latency when retransmitted packets arrive too late and are
considered lost. This patch includes these late packets in the
calculation of the various rtx stats, making them more correct and
useful.

Also in the case where the original packet arrives after a NACK is sent,
the received RTX packet should update the stats since it provides useful
information about RTT.

The RTT is only updated if and only if all requested retranmissions are
received. That way the RTT is guaranteed to make sense. If not we don't
know which request the packet is a response to and the RTT may be bogus.
A consequence of this patch is that RTT is not updated for a request
when one of the RTX packets for that seqnum is lost, but that since
measured RTT will be more accurate.

The implementation store the RTX information from the timed out timers
and use this when the retransmitted packet arrives. For performance
these timers are stored separately from the "normal" timers in order to
not impact performance (see attached performance test).

Comment 45 Olivier Crête 2016-09-14 23:42:40 UTC

Created attachment 335596 [details] [review]
rtpjitterbuffer: Add and expose more stats and increase testing of it

Add num-pushed and num-lost.
Expose num-late, num-duplicates and avg-jitter.

Comment 46 Olivier Crête 2016-09-14 23:42:45 UTC

Created attachment 335597 [details] [review]
rtxreceive: Set buffer flag for retransmitted packets

Comment 47 Olivier Crête 2016-09-14 23:42:51 UTC

Created attachment 335598 [details] [review]
rtpjitterbuffer: Option to disable rtx-delay-reorder

When disabled we can save some iterations over timers.

There is probably an argument for rtx-delay-reorder to exist, but
for normal operations, handling jitter (reordering) is something a
jitterbuffer should do, and this variable feels like functionality that
is not "in-sync" with what the jitterbuffer is trying to achieve.

Example: You have 50ms jitter on your network, and are receiving
audio packets with 10ms durations. An audio packet should not be
considered late until its rtx-timeout has expired (and hence a rtx-event
is sent), but with rtx-delay-reorder, events will be sent pretty much
all the time due to the jitter on the network.

Point being: The jitterbuffer should adapt its size to the measured network
jitter, and then rtx-delay-reorder needs to adapt as well, or simply
get out of the way and let the other (better) rtx-mechanisms do their job.

Also change find_timer to only use seqnum as an argument, since there
will only ever be one timer per seqnum at any given time. In the
one case where the type matters, the caller simply checks the type.