GNOME Bugzilla – Bug 599292
Synchronizing two RTP streams from different devices using RTCP is inaccurate.
Last modified: 2010-02-15 21:20:19 UTC
This example tries to demonstrate why synchronizing two devices using RTCP cannot be made accurate. This is done by looking at the relation between the RTP timeline and the NTP timeline.

RTP packet (#1)   RTP: 146893517   NTP: 39:14.947678 (minute:second.microseconds)
RTP packet (#2)   RTP: 147050080   NTP: 39:16.687274
RTCP packet (#3)  RTP: 147069882   NTP: 39:17.0959

DIFF (RTP2-RTP1): 156563 / 90000 = 1.739588888888888889
DIFF (NTP2-NTP1): 1.739596

Here the time difference between two RTP packets is the same on both timelines.

DIFF (RTP3-RTP1): 176365 / 90000 = 1.959611111111111111
DIFF (NTP3-NTP1): 2.148222

When comparing the clocks between an RTP and an RTCP packet, we see that the difference in RTP time is not the same as the difference in NTP time. Therefore, when synchronizing using RTCP, we will get an incorrect RTP timestamp.
Created attachment 146038 [details] [review]
Solution suggestion using the pipeline clock to calculate the RTP timestamp and g_get_current_time to get the NTP time

Here is the same analysis as before, but with this patch applied.

RTP packet (#1)   RTP: 605386774   NTP: 51:33.393658
RTP packet (#2)   RTP: 605593369   NTP: 51:35.689204
RTCP packet (#3)  RTP: 605699781   NTP: 51:36.8735

DIFF (RTP2-RTP1): 206595 / 90000 = 2.2955
DIFF (NTP2-NTP1): 2.295546

Here the time difference between two RTP packets is the same on both timelines.

DIFF (RTP3-RTP1): 313007 / 90000 = 3.477855555555555556
DIFF (NTP3-NTP1): 3.479842

Here the difference between an RTP and an RTCP packet is much more accurate than before.
This patch adds another time argument to various functions. It seems that this time is an absolute value from a pipeline clock, which cannot be compared to any other absolute values (because the clock can change in between compares).
Why don't you distribute the base_time to the pipelines on both machines?
My understanding is that the problem is that there is skewing between the RTP and NTP times reported by the sender that we don't compensate for? I guess we now also produce the same thing with the perfect-timestamp patches if the pipeline clock is not the audio clock.
The NTP time that corresponds to an RTP time is taken from the buffer's RTP timestamp and the corresponding running_time on that buffer. We then also have a mapping between running_time and NTP time; this last mapping is made by sampling the current system time when the pipeline goes to PLAYING.

When we need to calculate the NTP time for a buffer's RTP time, we take the running_time of that buffer and bring it back to NTP time. This assumes that the running_time is aligned to the system time, which might not always be true.

Another problem is that when you have multiple pipelines, they don't always take the same system time as a base for the NTP time. The running_time can also drift from the system time if the pipeline clock is not based on the system time.

We would probably need to sample the system clock and the pipeline clock when we need to generate an NTP time, then use the running_time/RTP values of the last buffer to scale the running_time back to system time.
Distributing the base time is much more hassle than just running an NTP client on all hosts sending RTP. We can just configure the RTSP server to reply with different hosts (RTSP URLs) for different streams in the SDP, and if all hosts sending RTP run an NTP client it works fine without the senders having to communicate with each other.
I got confused by the patch again. What does it do now? How does it make things work?
Created attachment 151821 [details] [review] Improved solution suggestion using pipeline clock to calculate the RTP timestamp
There were a few mistakes in the previous patch, so I have made this one to try to be a bit more concise.

The original problem comes from trying to synchronize an RTP stream with another RTP stream from another device using RTCP. Let us assume that the other device is perfect, so that we can focus entirely on the issue in GStreamer. Both devices share a common wall clock, ensuring that the NTP clock is "exactly" the same on both devices. However, in GStreamer the rtpmanager does not read what the clock really is, but rather calculates the NTP time from a running-time and an NTP base-time. Unfortunately this is not very accurate, and on an embedded system this calculated time can differ from the real time by several tenths of a second.

As a matter of fact, the NTP time is calculated using two different formulas: one when sending RTP packets and one when sending RTCP. If these two formulas were evaluated at the exact same point in time, two different timestamps would be obtained.

RTCP formula:

  base_time = GST_ELEMENT_CAST (rtpsession)->base_time;
  ntpnsbase = rtpsession->priv->ntpnsbase;
  rt = gst_clock_get_time (clock) - base_time;
  ntpns = rt + ntpnsbase;

RTP formula:

  ntpnstime = gst_segment_to_running_time (&rtpsession->send_rtp_seg,
      GST_FORMAT_TIME, timestamp);
  ntpnstime += priv->ntpnsbase;

When sending the RTCP packet, the RTP timestamp is based on a subtraction between the NTP time at which the last RTP packet was sent and the current NTP time. Since these NTP times are skewing, the resulting RTP timestamp is wrong, causing a ~200 ms offset in the synchronization.

Solution attempt: when the RTP packets are sent, the buffer timestamp is saved together with the last RTP timestamp. When sending the RTCP packet, the NTP time is polled from the system clock to get an exact value. The pipeline clock is also polled to get the current time of the pipeline, which can be compared with the saved buffer timestamp.
The times then sent with the RTCP packet are the exact NTP time and an RTP time calculated as:

  RTP timestamp = last RTP timestamp + pipeline time - last buffer timestamp

The diff "pipeline time - last buffer timestamp" gives us the exact time that has passed since the last RTP packet, thus giving us two very exact times for the RTCP packet. When verifying with Wireshark, looking at 2 RTP packets and 1 RTCP packet, the times were accurate down to 100 microseconds.

Diff between two RTP packets:
  UTC Diff: 3.487038
  RTP Diff: 3.487044444444444444

Diff between RTP and RTCP packet:
  UTC Diff: 2.328304
  RTP Diff: 2.328355555555555556
Created attachment 151846 [details] [review]
proposed patch

There were some problems with your patch; mostly, it compared a buffer timestamp to an absolute clock value, while it should have compared a buffer running_time against a clock running_time. I also rearranged some other fields in an attempt to completely get rid of the ntpnsbase value.
Unfortunately the problem still persists with your latest patch, Wim. It appears to be caused by the running_time not being the same when calculated from the segment as when calculated from the base_time. I applied your patch and made modifications purely for debugging to illustrate the behavior I get.

Excerpt from gst_rtp_session_chain_send_rtp_common:

  if (GST_CLOCK_TIME_IS_VALID (timestamp)) {
    /* convert to running time using the segment start value. */
    running_time =
        gst_segment_to_running_time (&rtpsession->send_rtp_seg,
        GST_FORMAT_TIME, timestamp);
    get_current_times (rtpsession, &running_time2, NULL);
    g_print ("Running time (segment): %llu\n", running_time / 1000000ULL);
    g_print ("Running time (get_current): %llu\n", running_time2 / 1000000ULL);
    g_print ("Timestamp age: %llu\n",
        (gst_clock_get_time (priv->sysclock) - timestamp) / 1000000ULL);
    g_print ("Time difference: %lld\n\n",
        (gint64) (running_time - running_time2) / 1000000LL);
  } else {

As you can see, I now get the running time both ways when sending an RTP packet, and divide by 1000000 to get readable milliseconds. The timestamp age is compared to the absolute clock, I know, but it is purely for debug purposes, to show how much time passed since the GstBuffer was created.

Here is the runtime result:

  Running time (segment): 0
  Running time (get_current): 10
  Timestamp age: 290
  Time difference: -10

  Running time (segment): 395
  Running time (get_current): 118
  Timestamp age: 1
  Time difference: 276

  Running time (segment): 414
  Running time (get_current): 137
  Timestamp age: 1
  Time difference: 276

  Running time (segment): 423
  Running time (get_current): 146
  Timestamp age: 1
  Time difference: 276

It seems like the time difference settles close to the timestamp age of the first packet (the one whose segment running time is zero). Is this how it should be, or are we looking at a bug?
The patch is similar to yours except that it takes normalized values for the timestamps and the clock times (starting from 0).

The idea is to capture a pair of running_time (of a buffer, using the segment) and the RTP timestamp in that buffer. The timestamp represents the time when the buffer was captured and when it should be played back (after adding latency). To make the RTCP packet, we sample a pair of pipeline running_time (using the pipeline clock + base_time) and NTP time (using the system clock). We can calculate the diff between the last buffer running_time and the pipeline running_time and add this diff to the last RTP timestamp. The end result is that we then have the RTP time for the current NTP time, which is what was needed.

I think your patch attempted to do that, but without normalizing the buffer timestamps (which can be anything) and the clock absolute times (which can also be anything). The above patch normalizes those values to work in more situations.

I guess I'll have to actually try the patch now :)
Running the same debug prints with my patch yielded the same result:

  Running time (segment): 0
  Running time (get_current): 10
  Timestamp age: 276
  Time difference: -10

  Running time (segment): 382
  Running time (get_current): 118
  Timestamp age: 1
  Time difference: 263

  Running time (segment): 401
  Running time (get_current): 137
  Timestamp age: 1
  Time difference: 263

  Running time (segment): 410
  Running time (get_current): 146
  Timestamp age: 1
  Time difference: 263

The problem, as I see it, is that we must know exactly how much time has passed since the last RTP packet when sending the RTCP packet. If our two points in time are not normalized to exactly the same point in time, we will always have a time offset.

I will try to add an ugly hack on top of your patch to calculate the difference between the segment running_time and the pipeline running_time, just like you suggested. If this works we might actually be on to a solution :)
Created attachment 151933 [details] [review]
A test with an offset between the two running_times to verify it solves the problem

When testing with an offset, the problem was solved on my target system. The problem with doing it this way is that the buffer timestamp is equated with the current time, which is not necessarily the case. It works on my system since the buffer is never older than 1-2 ms.
Created attachment 152304 [details] [review]
Solution proposal for when base_time is forced to 0

This patch handles the special case where base_time is forced to 0. It assumes that when base_time is set to 0, the buffer timestamp is comparable with the running time. The only change compared to Wim's previous patch is in the function gst_rtp_session_chain_send_rtp_common.
Wim's proposed solution solved the problem when I sent a new_segment event with segment.start = 0 and set the base_time of the pipeline to 0.
This is good and desired. Now people can sync multiple pipelines by choosing a common clock and base_time, as has always been the intent of the design. I will commit this patch after the freeze.
commit 9d40d60960d6abc2675d41c463d4d81e4b9be319
Author: Wim Taymans <wim.taymans@collabora.co.uk>
Date:   Wed Jan 20 18:52:51 2010 +0100

    rtpbin: remove use of ntp_ns_base

commit 5a4ecc9da1f73dc8b912b0cc49a59475478b7646
Author: Wim Taymans <wim.taymans@collabora.co.uk>
Date:   Wed Jan 20 18:22:20 2010 +0100

    rtpbin: remove more ntpnstime and cleanups

    Remove some code where we pass ntpnstime around, we can do most things
    with the running_time just fine.
    Rename a variable in the ArrivalStats struct so that it's clear that
    this is the current system time.

commit 74241e549fb54db8a3847b3e90d9d90c27c4506c
Author: Wim Taymans <wim.taymans@collabora.co.uk>
Date:   Wed Jan 20 18:19:34 2010 +0100

    rtpsource: use running_time for jitter

    Use the running_time to calculate the jitter instead of the ntp time.
    Part of the plan to get rid of ntpnsbase.

commit 83cb1aecc8218dabc2a7a99a916c73a97037db74
Author: Wim Taymans <wim.taymans@collabora.co.uk>
Date:   Wed Jan 20 17:04:03 2010 +0100

    rtpbin: change how NTP time is calculated in RTCP

    Don't calculate the NTP time based on the running_time of the pipeline
    but from the systemclock. This allows us to generate more accurate NTP
    timestamps in case the systemclock is synchronized with NTP or similar.