GNOME Bugzilla – Bug 772646
rtpjitterbuffer: fix lost-event using dts instead of pts
Last modified: 2016-12-05 09:18:01 UTC
Created attachment 337270 [details] [review]
Tests and fix

The lost-event was using a different time-domain (dts) than the outgoing buffers (pts). Given certain network conditions these two would become sufficiently different that the lost-event contained a timestamp/duration that was plainly wrong. As an example, GstAudioDecoder could produce a stream that jumps back and forth in time after receiving a lost-event.

The previous behavior calculated the pts (based on the rtptime) inside the rtp_jitter_buffer_insert function, but this functionality has now been refactored into a new function, rtp_jitter_buffer_calculate_pts, that is called much earlier in the _chain function, making pts available to the various calculations that previously (and wrongly) used dts, like the lost-event.

There are however two calculations where using dts is the right thing to do: the receive-jitter and the rtx round-trip-time, where the arrival time of the buffer from the network is the right metric (and that is what dts is today).

The patch also adds two tests regarding B-frames, or the "rtptime-going-backwards" scenario, since there were some concerns that this patch might break that behavior (which the tests show it does not).
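To illustrate the time-domain point, a minimal sketch (not the actual patch; the helper name is made up, and the custom-event structure fields are given as an assumption):

#include <gst/gst.h>

/* Sketch only: a lost-event must carry a timestamp in the same time-domain
 * as the outgoing buffers, i.e. a pts derived from the RTP timestamp, not
 * the arrival time (dts) of some packet from the network. The structure
 * name/fields follow the known "GstRTPPacketLost" custom event, but treat
 * them as an assumption here. */
static GstEvent *
make_lost_event (guint16 seqnum, GstClockTime pts, GstClockTime duration)
{
  return gst_event_new_custom (GST_EVENT_CUSTOM_DOWNSTREAM,
      gst_structure_new ("GstRTPPacketLost",
          "seqnum", G_TYPE_UINT, (guint) seqnum,
          "timestamp", G_TYPE_UINT64, pts,        /* pts, not dts */
          "duration", G_TYPE_UINT64, duration, NULL));
}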
Created attachment 337275 [details] [review]
updated test and fix

We found that the pts was calculated a bit too early in the last patch, so the calculation now waits until it is actually needed.
Review of attachment 337275 [details] [review]:

Just some comments for now

::: gst/rtpmanager/gstrtpjitterbuffer.c
@@ +2342,2 @@
      /* calculate expected arrival time of the next seqnum */
+     expected = pts + priv->packet_spacing;

Isn't the arrival time dependent on the dts (aka the arrival time at this point)? While the packet spacing would depend on the pts, I guess.

@@ +2408,3 @@
 static void
 calculate_expected (GstRtpJitterBuffer * jitterbuffer, guint32 expected,
+    guint16 seqnum, GstClockTime pts, gint gap)

Same here, shouldn't this also consider the actual arrival times (dts) of previous packets?

@@ +2832,3 @@
+  /* calculate a pts based on rtptime and arrival time (dts) */
+  pts = rtp_jitter_buffer_calculate_pts (priv->jbuf, dts, rtptime,

Calculating the PTS changes jbuf's state... is this going to cause problems if we later drop this specific packet? Previously we would only change the state once the packet is definitely inserted.

@@ +2963,3 @@
      * possible moment to push this buffer, maybe we get an earlier seqnum
      * while we wait */
+    set_timer (jitterbuffer, TIMER_TYPE_DEADLINE, seqnum, pts);

And timers also seem like something that is related to dts (arrival times).
(In reply to Sebastian Dröge (slomo) from comment #2)
> Review of attachment 337275 [details] [review] [review]:
>
> Just some comments for now
>
> ::: gst/rtpmanager/gstrtpjitterbuffer.c
> @@ +2342,2 @@
>      /* calculate expected arrival time of the next seqnum */
> +     expected = pts + priv->packet_spacing;
>
> Isn't the arrival time dependent on the dts (aka the arrival time at this
> point)? While the packet spacing would depend on the pts, I guess.

So this is a very good question. I was sure using dts was right for a very long time, but I have now changed my mind and believe pts is the right answer. Let me try to explain why:

Expected (rtx) timers are all about sending an rtx-request for a packet that has not arrived (yet). If you imagine a "network" (upstream) and a "decoder" (downstream) side of the jitterbuffer (JB), the idea is that the network is chaotic, and the job of the JB is to create order for the decoder. Now, complete chaos in when we receive packets should not really affect when the decoder needs those packets. Extreme jitter and reordering of packets on the network side should not change the timing of when those buffers need to be presented to the decoder. Hence, the expected time of those packets should be driven by when they need to be presented, not by when they happen to arrive from the wire.

> @@ +2832,3 @@
>
> +  /* calculate a pts based on rtptime and arrival time (dts) */
> +  pts = rtp_jitter_buffer_calculate_pts (priv->jbuf, dts, rtptime,
>
> Calculating the PTS changes jbuf's state... is this going to cause problems
> if we later drop this specific packet? Previously we would only change the
> state once the packet is definitely inserted.

Very good question. I was worried about this too, but I have not found any cases (practical or theoretical) where it is a problem. All existing tests are passing, but I am very happy to help write up a new test-case to prove me wrong, though.
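To spell out the distinction in the two answers above, a tiny sketch (hypothetical names, not the element's code):

#include <gst/gst.h>

/* The expected time of the next seqnum, and thus when an rtx timer should
 * fire, follows the presentation timeline of the stream (pts), not the
 * possibly chaotic arrival times. */
static GstClockTime
expected_time_of_next_packet (GstClockTime last_pts, GstClockTime packet_spacing)
{
  return last_pts + packet_spacing;
}

/* The arrival time (dts) is still the right metric for measuring how long
 * a retransmission actually took. */
static GstClockTime
rtx_round_trip_time (GstClockTime rtx_request_sent, GstClockTime rtx_packet_dts)
{
  return rtx_packet_dts - rtx_request_sent;
}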
(In reply to Håvard Graff (hgr) from comment #3)
> > @@ +2832,3 @@
> >
> > +  /* calculate a pts based on rtptime and arrival time (dts) */
> > +  pts = rtp_jitter_buffer_calculate_pts (priv->jbuf, dts, rtptime,
> >
> > Calculating the PTS changes jbuf's state... is this going to cause problems
> > if we later drop this specific packet? Previously we would only change the
> > state once the packet is definitely inserted.
>
> Very good question. I was worried about this too, but I have not found any
> cases (practical or theoretical) where it is a problem. All existing tests
> are passing, but I am very happy to help write up a new test-case to prove
> me wrong, though.

It will break in all cases where you get a packet with a completely different seqnum (and timestamp!) but ignore it because there are not enough consecutive seqnums following (see the gap code below your PTS calculation, the cases where it does not go into reset). So, for example, a spurious packet arriving here.
(In reply to Sebastian Dröge (slomo) from comment #4)
> It will break in all cases where you get a packet with a completely
> different seqnum (and timestamp!) but ignore it because there are not
> enough consecutive seqnums following (see the gap code below your PTS
> calculation, the cases where it does not go into reset). So, for example,
> a spurious packet arriving here.

Sorry, I am not quite following. Why don't we write some tests to check old vs. new behavior instead of theorizing about it? If you can describe the behavior you want to make sure is not broken with the new patch, I am happy to try and write it up.

A steady flow of buffers interrupted by a buffer with a completely unrelated seqnum and timestamp, and then going back to the normal stream? What is the expected behavior? Why would we encounter such a buffer? Given that this is a stream with the same SSRC, we would likely be facing either a faulty RTP implementation (those exist!) or a malicious attack (those exist too!).

I can think of three scenarios:

1. The single bad packet: 1, 2, 3, 4, 60000, 5, 6, 7, 8, etc.
2. Restarting the stream: 1, 2, 3, 4, 60000, 60001, 60002, 60003, etc.
3. Interleaved madness: 1, 2, 3, 4, 60000, 5, 60001, 6, 60002, 7, etc.

Maybe we should first decide what the desired behavior is in each of those cases, and then check how that matches reality for the old and the new implementation? Sounds good?
(In reply to Håvard Graff (hgr) from comment #5)
> 1. The single bad packet: 1, 2, 3, 4, 60000, 5, 6, 7, 8, etc.

That packet is queued up, but as the following ones are not a continuation it is dropped again (we wait for, IIRC, 5 consecutive packets).

> 2. Restarting the stream: 1, 2, 3, 4, 60000, 60001, 60002, 60003, etc.

We queue up the new packets, and after 5 consecutive ones they are all output after resetting the jitterbuffer.

> 3. Interleaved madness: 1, 2, 3, 4, 60000, 5, 60001, 6, 60002, 7, etc.

Chaos :) We would output the 1, 2, 3... stream and queue up the 6000x packets just to drop them again on the next packet. Whatever comes first wins, until there are 5 consecutive packets of something new.

Whether this is the desired behaviour, I don't know. It seems sensible, though, and it is what is currently implemented.

The problem now is that with your patch, cases 1) and 3) update jitterbuffer state for the 6000x packets even though they are thrown away. As the jitterbuffer stores the ext timestamp, recv times and send times, this seems undesirable.
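A hypothetical, simplified sketch of that behaviour (the threshold and all names are assumptions, not the actual gstrtpjitterbuffer code):

#include <glib.h>

#define NUM_CONSECUTIVE 5       /* assumed "5 consecutive packets" threshold */

typedef struct
{
  guint16 next_seqnum;          /* next expected seqnum of the current stream */
  guint16 last_gap_seqnum;      /* last queued seqnum of a possible new stream */
  guint num_gap;                /* how many consecutive "new" seqnums we hold */
} GapState;

/* Returns TRUE if the packet is accepted on the current stream. *do_reset is
 * set when enough consecutive packets of something new have arrived.
 * Simplified: no wraparound or reordering handling. */
static gboolean
handle_seqnum (GapState * s, guint16 seqnum, gboolean * do_reset)
{
  *do_reset = FALSE;

  if (seqnum == s->next_seqnum) {
    /* the current stream continues: any queued outliers are dropped again */
    s->num_gap = 0;
    s->next_seqnum = seqnum + 1;
    return TRUE;
  }

  if (s->num_gap > 0 && seqnum != (guint16) (s->last_gap_seqnum + 1))
    s->num_gap = 0;             /* not a continuation of the queued packets */

  s->last_gap_seqnum = seqnum;
  s->num_gap++;

  if (s->num_gap == NUM_CONSECUTIVE) {
    /* a plausible new stream: reset and follow it from here */
    *do_reset = TRUE;
    s->next_seqnum = seqnum + 1;
    s->num_gap = 0;
  }
  return FALSE;
}

With this sketch, case 1 queues the single outlier and drops it again when seqnum 5 arrives, case 2 resets on the fifth consecutive new seqnum, and case 3 keeps dropping the queued outliers because they never become consecutive.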
Ok, so how about this as a plan:

1. I will write up a test for case 1 and try to craft it so that it will fail with the suggested patch.
2. I will move the gap-handling code around a bit (maybe refactor and split it into functions where possible) to do the "reset" cases prior to calculating pts, which should give exactly the same behavior as before.

Sounds good?
Sounds good, yes. Thanks :)
So I have experimented a bit over the last few days, and I can't make it fail... Here is the test I ended up writing:

GST_START_TEST (test_single_bad_packet_in_stream)
{
  GstHarness *h = gst_harness_new ("rtpjitterbuffer");
  GstClockTime now = 0;
  const gint num_consecutive = 5;
  gint i;

  g_object_set (h->element, "latency", 80, NULL);
  gst_harness_set_src_caps (h, generate_caps ());

  for (i = 0; i < num_consecutive; i++) {
    gst_harness_set_time (h, now);
    fail_unless_equals_int (GST_FLOW_OK,
        gst_harness_push (h,
            generate_test_buffer_full (now, TRUE, i, i * PCMU_RTP_TS_DURATION)));
    now += PCMU_BUF_DURATION;
  }

  fail_unless (gst_harness_crank_single_clock_wait (h));

  for (i = 0; i < num_consecutive; i++) {
    GstBuffer *buf = gst_harness_pull (h);
    fail_unless_equals_int (i, get_rtp_seq_num (buf));
    fail_unless_equals_uint64 (i * PCMU_BUF_DURATION, GST_BUFFER_PTS (buf));
    gst_buffer_unref (buf);
  }

  /* push one packet with a completely wrong sequence number and rtp-timestamp */
  fail_unless_equals_int (GST_FLOW_OK,
      gst_harness_push (h,
          generate_test_buffer_full (now, TRUE, 1000, 22222 * PCMU_RTP_TS_DURATION)));

  /* and then continue the same flow of buffers */
  for (i = num_consecutive; i < num_consecutive * 2; i++) {
    gst_harness_set_time (h, now);
    fail_unless_equals_int (GST_FLOW_OK,
        gst_harness_push (h,
            generate_test_buffer_full (now, TRUE, i, i * PCMU_RTP_TS_DURATION)));
    now += PCMU_BUF_DURATION;
  }

  for (i = num_consecutive; i < num_consecutive * 2; i++) {
    GstBuffer *buf = gst_harness_pull (h);
    fail_unless_equals_int (i, get_rtp_seq_num (buf));
    fail_unless_equals_uint64 (i * PCMU_BUF_DURATION, GST_BUFFER_PTS (buf));
    gst_buffer_unref (buf);
  }

  gst_harness_teardown (h);
}

GST_END_TEST;

As you can see, I verify that the pts is correct on all outgoing buffers. For the one "bad" packet in the middle I am able to trigger a few different code-paths (some of which look like bugs) by changing the seqnum and rtp-timestamp to various things, but I am unable to make the stream fail. Maybe you could experiment a bit with that test and see if you can get a failing scenario I could work with?
I don't have time for that for at least the next two weeks, and lots of other bugs are already piling up on my list, sorry. But even if you can't actually cause problems here, you surely agree that changing the jitterbuffer state based on such broken packets, which we also drop afterwards, is less than ideal. It is going to bite us sooner or later, and then we will have a difficult-to-debug problem somewhere.

But maybe the fact that you can't make it fail now is an indication that the existing jitterbuffer code is broken. Does your wrong packet go into the skew calculation code, does it cause the times in the jitterbuffer to be updated, and are those times then used for the following packets? It might not have any effect here because we only calculate the skew from the packets with the lowest jitter (the window of packets there), and your packet is always at the very top?
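To make that last point concrete, a self-contained sketch of the idea (not the actual calculate_skew code; the window size and structure are assumptions):

#include <string.h>
#include <gst/gst.h>

#define WINDOW_SIZE 32          /* assumed window length */

typedef struct
{
  GstClockTimeDiff window[WINDOW_SIZE];
  guint len;
} SkewEstimator;

/* The skew estimate follows the minimum (send-time, recv-time) delay inside
 * a sliding window. A spurious packet shows up with a huge delay, sits at
 * the top of the window and never becomes the minimum, so it does not move
 * the estimate. */
static GstClockTimeDiff
skew_update (SkewEstimator * s, GstClockTime send_time, GstClockTime recv_time)
{
  GstClockTimeDiff delay = GST_CLOCK_DIFF (send_time, recv_time);
  GstClockTimeDiff min;
  guint i;

  if (s->len < WINDOW_SIZE) {
    s->window[s->len++] = delay;
  } else {
    /* drop the oldest sample and append the new one */
    memmove (s->window, s->window + 1,
        (WINDOW_SIZE - 1) * sizeof (s->window[0]));
    s->window[WINDOW_SIZE - 1] = delay;
  }

  min = s->window[0];
  for (i = 1; i < s->len; i++)
    min = MIN (min, s->window[i]);

  return min;                   /* the current skew estimate */
}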
Created attachment 338991 [details] [review]
refactor the gap-handling

I have tried to make the gap-handling code a bit clearer, and added the pts-calculation in the two places where it is now actually needed.
Created attachment 339053 [details] [review]
updated patch

Found a small difference in behavior: the lost-timer > max_dropout handling would also trigger without gap > 0, so I changed that to be identical to the old behavior.
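For clarity, a tiny hedged sketch of that detail (illustrative names and condition, not the patch itself):

#include <gst/gst.h>

/* The max_dropout handling for an expected/lost timer should only apply to
 * packets that actually create a gap (gap > 0), matching the old behavior. */
static gboolean
should_treat_as_dropout (gint gap, GstClockTime time_since_last,
    GstClockTime max_dropout_time)
{
  return gap > 0 && time_since_last > max_dropout_time;
}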
commit fb9c75db360ff1aa61710804b08788b6c35f44df
Author: Havard Graff <havard.graff@gmail.com>
Date:   Sun Oct 9 15:59:05 2016 +0200

    rtpjitterbuffer: fix lost-event using dts instead of pts

    The lost-event was using a different time-domain (dts) than the outgoing
    buffers (pts). Given certain network conditions these two would become
    sufficiently different that the lost-event contained a timestamp/duration
    that was plainly wrong. As an example, GstAudioDecoder could produce a
    stream that jumps back and forth in time after receiving a lost-event.

    The previous behavior calculated the pts (based on the rtptime) inside
    the rtp_jitter_buffer_insert function, but this functionality has now
    been refactored into a new function, rtp_jitter_buffer_calculate_pts,
    that is called much earlier in the _chain function, making pts available
    to the various calculations that previously (and wrongly) used dts, like
    the lost-event.

    There are however two calculations where using dts is the right thing to
    do: the receive-jitter and the rtx round-trip-time, where the arrival
    time of the buffer from the network is the right metric (and that is
    what dts is today).

    The patch also adds two tests regarding B-frames, or the
    "rtptime-going-backwards" scenario, since there were some concerns that
    this patch might break that behavior (which the tests show it does not).