GNOME Bugzilla – Bug 769768
rtpjitterbuffer: lots of improvements around RTX
Last modified: 2016-09-30 07:04:02 UTC
We have been focusing on RTX and making it work properly for our use-cases, and as a result we have a series of patches that we thought it best to present in one bug instead of splitting them up over several bugs. The overall thing that these patches is trying to do, is to make the jitterbuffer act really well against WebRTC and their implementation of RTX, and also provide the application with the means to determine the right settings (usually the right latency for the jitterbuffer) based on various stats. Perhaps the biggest and most important changes are the fact that we now will "wait" for rtx-packets that have been requested even after they have "expired", and to mark the rtx-packets with a new flag (patch for plugins-base and rtprtxreceive) that allows differentiation of "normal" and "rtx" packets. These changes allow the various rtx-stats to become a lot more reliable and again allows the application to act on them. The workflow we have been following have been to observe every "actually" lost packet (that rtx were unable to recover) and asking why this packets was not "saved" by rtx, and this has led us to finding many bugs and improvements, all which have been written proper tests for. We are pleased to report that up to 10% introduced packetloss from Chrome to GStreamer (using tools like netem) is now almost completely eliminated in our application using these changes, while still keeping the end-to-end latency at an acceptable level for real-time communication. The patches are ordered, and needs to be applied in that order.
Created attachment 333136 [details] [review] 0001
Created attachment 333137 [details] [review] expose more stats
Created attachment 333138 [details] [review] option to disable rtx-delay-reorder
Created attachment 333139 [details] [review] Major improvements for RTX stats
Created attachment 333140 [details] [review] Improve expected timer handling
Created attachment 333141 [details] [review] rtx-deadline as a property
Created attachment 333142 [details] [review] fix lost-duration when gap after lost-event
Created attachment 333143 [details] [review] don't request rtx if it is too late
Created attachment 333144 [details] [review] equidistant packet-spacing detector
Created attachment 333146 [details] [review] improved rtx-rtt averaging As this patch was a little short on an explanation, I might add something here. The basic idea is this: 1. For *larger* rtx-rtt, weigh a new measurement as before 2. For *smaller* rtx-rtt, be a bit more conservative and weigh a bit less 3. For very large measurements, consider them "outliers" and count them a lot less The idea being that reducing the rtx-rtt is much more harmful then increasing it, since we don't want to be underestimating the rtt of the network, and when using this number to estimate the latency you need for you jitterbuffer, you would rather want it to be a bit larger then a bit smaller, potentially losing rtx-packets. The "outlier-detector" is there to prevent a single skewed measurement to affect the outcome too much. On wireless networks, these are surprisingly common.
Created attachment 333148 [details] [review] rtxreceive: set buffer flag for rtx packets This is dependent on a gst-plugins-base patch, but not on the jitterbuffer-patches
Review of attachment 333136 [details] [review]: Don't we already compile everything with -Wno-unused-parameter .. GLib's API makes way too many of them.
Review of attachment 333139 [details] [review]: I'd also suggest the patch title to be something like "rtpjitterbuffer: Collect stats for unsuccessful packets" ::: gst/rtpmanager/gstrtpjitterbuffer.c @@ +718,3 @@ + * The time to wait for a retransmitted packet after it has been + * considered lost in order to collect RTX statistics. + * Please specify the unit !
Review of attachment 333140 [details] [review]: Your changing the algo that computes the delay, do you mind explaining why in the commit message? I don't understand why?
(In reply to Olivier Crête from comment #13) > Review of attachment 333139 [details] [review] [review]: > > I'd also suggest the patch title to be something like "rtpjitterbuffer: > Collect stats for unsuccessful packets" The patch does a whole lot more then just waiting for the rtx that did not arrive in time, if you look through the tests we wrote for this. We did however not find a proper way to split it up further since the changes are quite substantial, and hence the name. > > ::: gst/rtpmanager/gstrtpjitterbuffer.c > @@ +718,3 @@ > + * The time to wait for a retransmitted packet after it has been > + * considered lost in order to collect RTX statistics. > + * > > Please specify the unit ! It is specified in the property, but I can add it there as well.
(In reply to Olivier Crête from comment #14) > Review of attachment 333140 [details] [review] [review]: > > Your changing the algo that computes the delay, do you mind explaining why > in the commit message? I don't understand why? I wrote some ASCII art in some of the tests there to help us understand what is actually going on. :) Does that help?
Review of attachment 333141 [details] [review]: Why would the deadline ever differ from the latency? Can you add a bit of doc explaining that? I guess it's to be able to receive packets that are too late to display, but still allow you to not break inter-frame based decoding for later frames?
Review of attachment 333142 [details] [review]: Looks good, lets commit this with the previous commits when they are ready.
Review of attachment 333143 [details] [review]: Looks good, lets commit with the others.
(In reply to Olivier Crête from comment #17) > Review of attachment 333141 [details] [review] [review]: > > Why would the deadline ever differ from the latency? Can you add a bit of > doc explaining that? I guess it's to be able to receive packets that are too > late to display, but still allow you to not break inter-frame based decoding > for later frames? Just another flexibility for the application to control how long a RTX-request should be valid for in RTCP land. No point in sending another request for RTX if there is already one queued up waiting to be sent.
Review of attachment 333144 [details] [review]: That's an interesting heuristic.. I wonder if it would make sense to use the marker flag also.. After a packet with a marker bit set, you can assume that the next packet will be the first of the next frame. Also small bikeshedding, I dont like equidistant as it's not really what is tested, we've seen audio streams with variable duration packets which don't really fit the mold of equidistance.. Maybe something like "fragmented", I can't find find a great name either.
> > Please specify the unit ! > Looking at the code, none of rtx-retry-timeout, rtx-min-retry-timeout and rtx-retry-period specifies the unit, so I guess we just followed suit. Maybe someone instead should go over and clean up the docs?
Created attachment 333407 [details] [review] updated patch with unit specified
Review of attachment 333146 [details] [review]: Can you put the comments from your commit message in a comment in the code? They make a lot of sense and will be helpful when the next person tries to understand why those "magic" numbers are like that.
Created attachment 334030 [details] [review] improved rtx-rtt averaging updated
I also posted an RTX related improvement recently in bug 769769
*** Bug 770081 has been marked as a duplicate of this bug. ***
This one is blocked on #769771 currently, but I'd like to get all this in for 1.10 (after 1.9.2). (In reply to GstBlub from comment #26) > I also posted an RTX related improvement recently in bug 769769 How do they relate to each other? Independent?
> (In reply to GstBlub from comment #26) > > I also posted an RTX related improvement recently in bug 769769 > > How do they relate to each other? Independent? Yes, but I would really like to see a test for bug 769769 before that gets committed, and also an example of a real use-case, as I have yet to encounter anyone actually doing this.
(In reply to Håvard Graff (hgr) from comment #29) > Yes, but I would really like to see a test for bug 769769 before that gets > committed, and also an example of a real use-case, as I have yet to > encounter anyone actually doing this. I don't know if/when I get a chance to work on such a test. If you look at the diff, it basically is changing from determining which sequence numbers are lost (using the blp) and then sending an event for each lost sequence number to sending one event that contains the blp and do the exact same logic when receiving that event. In real world this is easy to hit with fairly high packet loss. I encounter this case quite frequently in a gstreamer<->gstreamer application, especially when I introduce artificial packet loss with an identity element.
Created attachment 334966 [details] [review] patch Update patch with a fix for some rebasing-foo
Created attachment 334967 [details] [review] updated: Major improvements for RTX stats Updated patch with a fix for some rebasing-foo
Created attachment 334968 [details] [review] updated: Major improvements for RTX stats Updated patch with fix for rebasing-foo
Created attachment 334969 [details] [review] Updated: Add and expose more stats Update patch after rebase-foo
Created attachment 334970 [details] [review] updated: improved rtx-rtt averaging Update patch with fix for rebase-foo
Comment on attachment 333136 [details] [review] 0001 Because of the way GObject signal works, we just compile everything with -Wno-unused-arguments otherwise it becomes madness!
Thanks, I pushed everything! The following fixes have been pushed: f440b07 rtpjitterbuffer: improved rtx-rtt averaging f8238f0 rtpjitterbuffer: Detect whether to assume equidistant spacing when loss 2eb7383 rtpjitterbuffer: Don't request rtx if 'now' is past retry period ab49dfd rtpjitterbuffer: Fix lost duration when gap after lost timer dd020f5 rtpjitterbuffer: Expose rtx-deadline as a property 8087a8a rtpjitterbuffer: Improved expected-timer handling when gap > 0 38a7545 rtpjitterbuffer: Major improvements for RTX stats 1b868cc rtpjitterbuffer: Add and expose more stats and increase testing of it 531199d rtxreceive: Set buffer flag for retransmitted packets 1436fc0 rtpjitterbuffer: Option to disable rtx-delay-reorder
Created attachment 335589 [details] [review] rtpjitterbuffer: improved rtx-rtt averaging The basic idea is this: 1. For *larger* rtx-rtt, weigh a new measurement as before 2. For *smaller* rtx-rtt, be a bit more conservative and weigh a bit less 3. For very large measurements, consider them "outliers" and count them a lot less The idea being that reducing the rtx-rtt is much more harmful then increasing it, since we don't want to be underestimating the rtt of the network, and when using this number to estimate the latency you need for you jitterbuffer, you would rather want it to be a bit larger then a bit smaller, potentially losing rtx-packets. The "outlier-detector" is there to prevent a single skewed measurement to affect the outcome too much. On wireless networks, these are surprisingly common.
Created attachment 335590 [details] [review] rtpjitterbuffer: Detect whether to assume equidistant spacing when loss Assuming equidistant packet spacing when that's not true leads to more loss than necessary in the case of reordering and jitter. Typically this is true for video where one frame often consists of multiple packets with the same rtp timestamp. In this case it's better to assume that the missing packets have the same timestamp as the last received packet, so that the scheduled lost timer does not time out too early causing the packets to be considered lost even though they may arrive in time.
Created attachment 335591 [details] [review] rtpjitterbuffer: Don't request rtx if 'now' is past retry period There is no need to schedule another EXPECTED timer if we're already past the retry period. Under normal operation this won't happen, but if there are more timers than the jitterbuffer is able to process in real-time, scheduling more timers will just make the situation worse. Instead, consider this packet as lost and move on. This scenario can occur with high loss rate, low rtt and high configured latency.
Created attachment 335592 [details] [review] rtpjitterbuffer: Fix lost duration when gap after lost timer This patch fixes an issue with the estimated gap duration when there is a gap immediately after a lost timer has been processed. Previously there was a discrepancy beteen the gap in seqnum and gap in dts which would cause wrong calculated duration. The issue would only be seen with retranmission enabled since when it's disabled lost timers are only created when a packet is received and the actual gap length and last dts is known.
Created attachment 335593 [details] [review] rtpjitterbuffer: Expose rtx-deadline as a property The default -1 gives the old behavior.
Created attachment 335594 [details] [review] rtpjitterbuffer: Improved expected-timer handling when gap > 0
Created attachment 335595 [details] [review] rtpjitterbuffer: Major improvements for RTX stats Stats should also be collected for unsuccessful packets. rtx-rtt is very important for determining the necessary configured latency on the jitterbuffer. It's especially important to be able to increase the latency when retransmitted packets arrive too late and are considered lost. This patch includes these late packets in the calculation of the various rtx stats, making them more correct and useful. Also in the case where the original packet arrives after a NACK is sent, the received RTX packet should update the stats since it provides useful information about RTT. The RTT is only updated if and only if all requested retranmissions are received. That way the RTT is guaranteed to make sense. If not we don't know which request the packet is a response to and the RTT may be bogus. A consequence of this patch is that RTT is not updated for a request when one of the RTX packets for that seqnum is lost, but that since measured RTT will be more accurate. The implementation store the RTX information from the timed out timers and use this when the retransmitted packet arrives. For performance these timers are stored separately from the "normal" timers in order to not impact performance (see attached performance test).
Created attachment 335596 [details] [review] rtpjitterbuffer: Add and expose more stats and increase testing of it Add num-pushed and num-lost. Expose num-late, num-duplicates and avg-jitter.
Created attachment 335597 [details] [review] rtxreceive: Set buffer flag for retransmitted packets
Created attachment 335598 [details] [review] rtpjitterbuffer: Option to disable rtx-delay-reorder When disabled we can save some iterations over timers. There is probably an argument for rtx-delay-reorder to exist, but for normal operations, handling jitter (reordering) is something a jitterbuffer should do, and this variable feels like functionality that is not "in-sync" with what the jitterbuffer is trying to achieve. Example: You have 50ms jitter on your network, and are receiving audio packets with 10ms durations. An audio packet should not be considered late until its rtx-timeout has expired (and hence a rtx-event is sent), but with rtx-delay-reorder, events will be sent pretty much all the time due to the jitter on the network. Point being: The jitterbuffer should adapt its size to the measured network jitter, and then rtx-delay-reorder needs to adapt as well, or simply get out of the way and let the other (better) rtx-mechanisms do their job. Also change find_timer to only use seqnum as an argument, since there will only ever be one timer per seqnum at any given time. In the one case where the type matters, the caller simply checks the type.