GNOME Bugzilla – Bug 599292
Synchronizing two RTP streams from different devices using RTCP is inaccurate.
Last modified: 2010-02-15 21:20:19 UTC
This example tries to demonstrate why synchronizing two devices using RTCP cannot be made accurate. This is done by looking at the relation between the RTP timeline and the NTP timeline.

RTP packet (#1)   RTP: 146893517   NTP: 39:14.947678 (minute:second.microseconds)
RTP packet (#2)   RTP: 147050080   NTP: 39:16.687274
RTCP packet (#3)  RTP: 147069882   NTP: 39:17.0959

DIFF (RTP2-RTP1): 156563 / 90000 = 1.739588888888888889
DIFF (NTP2-NTP1): 1.739596

Here the time difference between two RTP packets is the same on both timelines.

DIFF (RTP3-RTP1): 176365 / 90000 = 1.959611111111111111
DIFF (NTP3-NTP1): 2.148222

When comparing the clocks between an RTP and an RTCP packet, we see that the difference in RTP time is not the same as the difference in NTP time. Therefore, when synchronizing using RTCP, we will get an incorrect RTP timestamp.
Created attachment 146038 [details] [review]
Solution suggestion using the pipeline clock to calculate the RTP timestamp and g_get_current_time to get the NTP time

Here is the same analysis as before, but with this patch applied.

RTP packet (#1)   RTP: 605386774   NTP: 51:33.393658
RTP packet (#2)   RTP: 605593369   NTP: 51:35.689204
RTCP packet (#3)  RTP: 605699781   NTP: 51:36.8735

DIFF (RTP2-RTP1): 206595 / 90000 = 2.2955
DIFF (NTP2-NTP1): 2.295546

Here the time difference between two RTP packets is the same on both timelines.

DIFF (RTP3-RTP1): 313007 / 90000 = 3.477855555555555556
DIFF (NTP3-NTP1): 3.479842

Here the difference between an RTP and an RTCP packet is much more accurate than before.
This patch adds another time argument to various functions. It seems that this time is an absolute value from a pipeline clock, which cannot be compared to any other absolute values (because the clock can change in between compares).
Why don't you distribute the base_time to the pipelines on both machines?
My understanding is that the problem is that there is skewing between the RTP and NTP times reported by the sender that we don't compensate for? I guess we now also produce the same thing with the perfect-timestamp patches if the pipeline clock is not the audio clock.
The NTP time that corresponds to an RTP time is taken from the buffer's RTP timestamp and the corresponding running_time on that buffer. We then also have a mapping between running_time and NTP time; this last mapping is made by sampling the current system time when the pipeline goes to PLAYING.

When we need to calculate the NTP time for a buffer's RTP time, we take the running_time of that buffer and bring it back to NTP time. This assumes that the running_time is aligned to the system time, which might not always be true.

Another problem is that when you have multiple pipelines, they don't always take the same system time as a base for the NTP time. The running_time can also drift from the system time if the pipeline clock is not based on the system time.

We would probably need to sample the system clock and the pipeline clock when we need to generate an NTP time, then use the running_time/RTP values of the last buffer to scale the running_time back to system time.
Distributing the base time is much more hassle than just running an NTP client on all hosts sending RTP. We can just configure the RTSP server to reply with different hosts (RTSP URLs) for different streams in the SDP, and if all hosts sending RTP run an NTP client it works fine without the senders having to communicate with each other.
I got confused by the patch again. What does it do now? How does it make things work?
Created attachment 151821 [details] [review] Improved solution suggestion using pipeline clock to calculate the RTP timestamp
There were a few mistakes in the previous patch, so I have made this one to try to be a bit more concise.

The original problem comes from trying to synchronize an RTP stream with another RTP stream from another device using RTCP. Let us assume that the other device is perfect, so that we can focus entirely on the issue in GStreamer. Both devices share a common wall clock, ensuring that the NTP clock is "exactly" the same on both devices. However, in GStreamer the rtpmanager does not read what the clock really is, but rather calculates the NTP time from a running-time and an NTP base-time. Unfortunately this is not very accurate, and on an embedded system this calculated time can differ from the real time by several tenths of a second.

As a matter of fact, the NTP time is calculated using two different formulas: one when sending RTP packets and one when sending RTCP. If these two formulas were evaluated at the exact same point in time, two different timestamps would be obtained.

RTCP formula:

  base_time = GST_ELEMENT_CAST (rtpsession)->base_time;
  ntpnsbase = rtpsession->priv->ntpnsbase;
  rt = gst_clock_get_time (clock) - base_time;
  ntpns = rt + ntpnsbase;

RTP formula:

  ntpnstime = gst_segment_to_running_time (&rtpsession->send_rtp_seg,
      GST_FORMAT_TIME, timestamp);
  ntpnstime += priv->ntpnsbase;

When sending the RTCP packet, the RTP timestamp is based on a subtraction between the NTP time at which the last RTP packet was sent and the current NTP time. Since these NTP times are skewing, the resulting RTP timestamp is wrong, causing a ~200 ms offset in the synchronization.

Solution attempt: when the RTP packets are sent, the buffer timestamp is saved together with the last RTP timestamp. When sending the RTCP packet, the NTP time is polled from the system clock to get an exact value. The pipeline clock is also polled to get the current time of the pipeline, which can be compared with the saved buffer timestamp.
The times then sent with the RTCP packet are the exact NTP time and an RTP time calculated as:

  RTP timestamp = last RTP timestamp + pipeline time - last buffer timestamp

The diff "pipeline time - last buffer timestamp" gives us the exact time that has passed since the last RTP packet, thus giving us two very exact times for the RTCP packet. When verifying with Wireshark, looking at 2 RTP packets and 1 RTCP packet, the times were accurate down to 100 microseconds.

Diff between two RTP packets:
  UTC Diff: 3.487038
  RTP Diff: 3.487044444444444444

Diff between RTP and RTCP packet:
  UTC Diff: 2.328304
  RTP Diff: 2.328355555555555556
Created attachment 151846 [details] [review]
proposed patch

There were some problems with your patch; mostly, it compared a buffer timestamp to an absolute clock value, while it should have compared a buffer running_time against a clock running_time. I also rearranged some other fields in an attempt to completely get rid of the ntpnsbase value.
Unfortunately the problem still persists with your latest patch, Wim. It appears to be caused by the running_time not being the same when calculated from the segment as when calculated from the base_time. I applied your patch and made modifications purely for debugging to illustrate the behavior I get.

Excerpt from gst_rtp_session_chain_send_rtp_common:

  if (GST_CLOCK_TIME_IS_VALID (timestamp)) {
    /* convert to running time using the segment start value. */
    running_time =
        gst_segment_to_running_time (&rtpsession->send_rtp_seg,
        GST_FORMAT_TIME, timestamp);
    get_current_times (rtpsession, &running_time2, NULL);
    g_print ("Running time (segment): %llu\n", running_time / 1000000ULL);
    g_print ("Running time (get_current): %llu\n", running_time2 / 1000000ULL);
    g_print ("Timestamp age: %llu\n",
        (gst_clock_get_time (priv->sysclock) - timestamp) / 1000000ULL);
    g_print ("Time difference: %lld\n\n",
        (gint64) (running_time - running_time2) / 1000000LL);
  } else {

As you can see, I now get the running time both ways when sending an RTP packet, and divide by 1000000 to get readable milliseconds. The timestamp age is compared to the absolute clock, I know, but it is purely for debug purposes, to show how much time passed since the GstBuffer was created.

Here is the runtime result:

  Running time (segment): 0
  Running time (get_current): 10
  Timestamp age: 290
  Time difference: -10

  Running time (segment): 395
  Running time (get_current): 118
  Timestamp age: 1
  Time difference: 276

  Running time (segment): 414
  Running time (get_current): 137
  Timestamp age: 1
  Time difference: 276

  Running time (segment): 423
  Running time (get_current): 146
  Timestamp age: 1
  Time difference: 276

It seems like the time difference settles close to the timestamp age of the first packet (the one whose segment running time is zero). Is this how it should be, or are we looking at a bug?
The patch is similar to yours except that it takes normalized values for the timestamps and the clock times (starting from 0).

The idea is to capture a pair of running_time (of a buffer, using the segment) and the RTP timestamp in that buffer. The timestamp represents the time when the buffer was captured and when it should be played back (after adding latency). To make the RTCP packet, we sample a pair of pipeline running_time (using the pipeline clock + base_time) and NTP time (using the system clock). We can calculate the diff between the last buffer running_time and the pipeline running_time and add this diff to the last RTP timestamp. The end result is that we then have the RTP time for the current NTP time, which is what was needed.

I think your patch attempted to do that, but without normalizing the buffer timestamps (which can be anything) and the clock absolute times (which can also be anything). The above patch normalizes those values to work in more situations.

I guess I'll have to actually try the patch now :)
Running the same debug prints with my patch yielded the same result:

  Running time (segment): 0
  Running time (get_current): 10
  Timestamp age: 276
  Time difference: -10

  Running time (segment): 382
  Running time (get_current): 118
  Timestamp age: 1
  Time difference: 263

  Running time (segment): 401
  Running time (get_current): 137
  Timestamp age: 1
  Time difference: 263

  Running time (segment): 410
  Running time (get_current): 146
  Timestamp age: 1
  Time difference: 263

The problem, as I see it, is that we must know exactly how much time has passed since the last RTP packet when sending the RTCP packet. If our two points in time are not normalized to exactly the same point in time, we will always have a time offset.

I will try to add an ugly hack on top of your patch to calculate the difference between the segment running_time and the pipeline running_time, just like you suggested. If this works we might actually be on to a solution :)
Created attachment 151933 [details] [review]
A test with an offset between the two running_times to verify it solves the problem

When testing with an offset, the problem was solved on my target system. The problem with doing it this way is that the buffer timestamp is equated with the current time, which is not necessarily the case. It works on my system since the buffer is never older than 1-2 ms.
Created attachment 152304 [details] [review]
Solution proposal for when base_time is forced to 0

This patch handles the special case where base_time is forced to 0. It assumes that when base_time is set to 0, the buffer timestamp is comparable with the running time. The only change compared to Wim's previous patch is in the function gst_rtp_session_chain_send_rtp_common.
Wim's proposed solution solved the problem when I sent a new_segment event with segment.start = 0 and set the base_time of the pipeline to 0.
This is good and desired. Now people can sync multiple pipelines by choosing a common clock and base_time, as has always been the intent of the design. I will commit this patch after the freeze.
commit 9d40d60960d6abc2675d41c463d4d81e4b9be319
Author: Wim Taymans <wim.taymans@collabora.co.uk>
Date:   Wed Jan 20 18:52:51 2010 +0100

    rtpbin: remove use of ntp_ns_base

commit 5a4ecc9da1f73dc8b912b0cc49a59475478b7646
Author: Wim Taymans <wim.taymans@collabora.co.uk>
Date:   Wed Jan 20 18:22:20 2010 +0100

    rtpbin: remove more ntpnstime and cleanups

    Remove some code where we pass ntpnstime around, we can do most things
    with the running_time just fine.
    Rename a variable in the ArrivalStats struct so that it's clear that
    this is the current system time.

commit 74241e549fb54db8a3847b3e90d9d90c27c4506c
Author: Wim Taymans <wim.taymans@collabora.co.uk>
Date:   Wed Jan 20 18:19:34 2010 +0100

    rtpsource: use running_time for jitter

    Use the running_time to calculate the jitter instead of the ntp time.
    Part of the plan to get rid of ntpnsbase.

commit 83cb1aecc8218dabc2a7a99a916c73a97037db74
Author: Wim Taymans <wim.taymans@collabora.co.uk>
Date:   Wed Jan 20 17:04:03 2010 +0100

    rtpbin: change how NTP time is calculated in RTCP

    Don't calculate the NTP time based on the running_time of the pipeline
    but from the systemclock. This allows us to generate more accurate NTP
    timestamps in case the systemclock is synchronized with NTP or similar.