GNOME Bugzilla – Bug 755125
rtp: RTCP mapping between NTP and RTP time could be capture or send time based
Last modified: 2015-09-25 22:28:46 UTC
RFC 3550 defines that the NTP time in the SR should be the NTP time when the SR is sent. In practice this means the current clock time, as we don't have any latency downstream of rtpbin to the RTCP sinks, and RTCP sinks are not synchronizing anyway.

For the RTP time, the RFC just says that it should correspond to the same instant as the NTP time, which could have two meanings:
1) is it the RTP time of what we are going to send now? (we are doing this currently)
2) is it the RTP time of what we are currently capturing?

The difference between the two is the latency on the sender: the latency L1 between rtpbin and the RTP sink (i.e., usually the pipeline latency minus the upstream latency for this stream) plus the latency L2 between rtpbin and the source that produces the RTP stream.

I think it would be good to add a property to allow 2), as this would allow the receiver to infer the time when the media was captured. This would be interesting for example for the attached example:
- gst-rtsp-server has a pipeline with a timeoverlay painting the current clock time into the frames
- the client paints running_time+base_time (= the clock time when this frame is synced, without the client latency) into the frames
- client and server use the same clock, and synchronize with each other via RTCP
- the server pipeline has a latency of 1s, the receiver pipeline has a latency of 1.5s (statically configured)

In theory both timestamps in the frame would be expected to be the same; with 2) they are. With 1) the difference between the two timestamps is L1+L2 (== 1s).

Now the main problem for implementing 2) is the code in gstrtpjitterbuffer.c:do_handle_sync(). It does not allow the rtptime to be more than 1 second in the future compared to what it currently receives. However, with 2) and a sender latency of more than 1 second, this would happen.
Created attachment 311499 [details] [review] example diff
Created attachment 311500 [details] [review] rtpbin/session: Allow RTCP sync to happen based on capture time or send time Send time is the previous behaviour and the default, but there are use cases where you want to synchronize based on the capture time.
1) is how I've always seen it done. If rtpjitterbuffer has this assumption, I wouldn't be surprised if other receivers had the same assumption. I'm not sure why we don't synchronize the sending of the RTCP packets to the NTP time in the SR? Apart from the need to have async=false on the RTCP sinks.
Created attachment 311501 [details] [review] rtpjitterbuffer: Don't drop RTCP packets with RTP timestamps more than 1s in the future compared to what we received last There's nothing necessarily wrong with this. The NTP / RTP timestamp mapping is what matters, not the difference between the RTP timestamp in the RTCP packet and what we currently receive. The RTP packets could be delayed by more than 1s in theory, while the RTCP packets are not, or the RTP timestamp in the RTCP packet could be based on the capture time of the media instead of the send time.
(In reply to Olivier Crête from comment #3)
> 1) is how I've always seen it done.
>
> If rtpjitterbuffer has this assumption, I wouldn't be surprised if other
> receivers had the same assumption.
>
> I'm not sure why we don't synchronize the sending of the RTCP packets to
> the NTP time in the SR? Apart from the need to have async=false on the
> RTCP sinks.

That would mean that we delay all RTCP packets by the pipeline latency... which kind of defeats the point of RTCP. You want to give feedback ASAP.

Instead of what my patches do, you could also move the NTP timestamp into the past by the latency to get the same behaviour (just that then the RTP timestamp in the RTCP packet would be close to what the receiver currently has). But I think this is not according to the RFC; the RFC explicitly states that the NTP time should be the time when the packet is sent.
Created attachment 311502 [details] [review] rtpbin/session: Allow RTCP sync to happen based on capture time or send time Send time is the previous behaviour and the default, but there are use cases where you want to synchronize based on the capture time.
Review of attachment 311501 [details] [review]:

Maybe just up the number to 5 seconds? Or 10 seconds?
Review of attachment 311502 [details] [review]:

::: gst/rtpmanager/gstrtpbin.c
@@ +2177,3 @@
+  g_object_class_install_property (gobject_class, PROP_RTCP_SYNC_SEND_TIME,
+      g_param_spec_boolean ("rtcp-sync-send-time", "RTCP Sync Send Time",
+          "Use send time or capture time for RTCP sync",

Document which is TRUE, which is FALSE?
(In reply to Olivier Crête from comment #7)
> Review of attachment 311501 [details] [review]:
>
> Maybe just up the number to 5 seconds? Or 10 seconds?

I don't know; I can imagine use cases where I would like to have multiple minutes of latency somewhere between capture and sending :/
(In reply to Sebastian Dröge (slomo) from comment #9)
> (In reply to Olivier Crête from comment #7)
> > Maybe just up the number to 5 seconds? Or 10 seconds?
>
> I don't know; I can imagine use cases where I would like to have multiple
> minutes of latency somewhere between capture and sending :/

And by that I mean valid use cases, obviously. You'd think this check was added to work around senders sending broken RTCP. But did we ever observe such senders? How are they behaving? And what are the chances that they don't break other things at the same time? Currently we ignore such RTCP SRs for the inter-session synchronization in rtpbin.
Which also means that as long as the NTP-rtptime mapping is consistently wrong between all sessions of that sender, it doesn't really matter.
Here's a long quote from the RFC, with a nice illustrative example, that says we probably want to allow both behaviours as a programmatic option:

   timestamp: 32 bits
      ....

      RTP timestamps from different media streams may advance at different
      rates and usually have independent, random offsets. Therefore, although
      these timestamps are sufficient to reconstruct the timing of a single
      stream, directly comparing RTP timestamps from different media is not
      effective for synchronization. Instead, for each medium the RTP
      timestamp is related to the sampling instant by pairing it with a
      timestamp from a reference clock (wallclock) that represents the time
      when the data corresponding to the RTP timestamp was sampled. The
      reference clock is shared by all media to be synchronized. The
      timestamp pairs are not transmitted in every data packet, but at a
      lower rate in RTCP SR packets as described in Section 6.4.

      The sampling instant is chosen as the point of reference for the RTP
      timestamp because it is known to the transmitting endpoint and has a
      common definition for all media, independent of encoding delays or
      other processing. The purpose is to allow synchronized presentation of
      all media sampled at the same time.

      Applications transmitting stored data rather than data sampled in real
      time typically use a virtual presentation timeline derived from
      wallclock time to determine when the next frame or other unit of each
      medium in the stored data should be presented. In this case, the RTP
      timestamp would reflect the presentation time for each unit. That is,
      the RTP timestamp for each unit would be related to the wallclock time
      at which the unit becomes current on the virtual presentation
      timeline. Actual presentation occurs some time later as determined by
      the receiver.

      An example describing live audio narration of prerecorded video
      illustrates the significance of choosing the sampling instant as the
      reference point.
In this scenario, the video would be presented locally for the narrator to view and would be simultaneously transmitted using RTP. The "sampling instant" of a video frame transmitted in RTP would be established by referencing its timestamp to the wallclock time when that video frame was presented to the narrator. The sampling instant for the audio RTP packets containing the narrator's speech would be established by referencing the same wallclock time when the audio was sampled. The audio and video may even be transmitted by different hosts if the reference clocks on the two hosts are synchronized by some means such as NTP. A receiver can then synchronize presentation of the audio and video packets by relating their RTP timestamps using the timestamp pairs in RTCP SR packets.
For the RTCP/RTP timestamp offset limit, let's make it a property, with GST_CLOCK_TIME_NONE to disable it.
Agreed, property on jitterbuffer plus proxy property on rtpbin. I'll do that tomorrow
Created attachment 311745 [details] [review] rtpbin/session: Allow RTCP sync to happen based on capture time or send time Send time is the previous behaviour and the default, but there are use cases where you want to synchronize based on the capture time.
Created attachment 311746 [details] [review] rtpbin/rtpjitterbuffer: Add property to set maximum ms between RTCP SR RTP time and last observed RTP time
Created attachment 311747 [details] [review] rtpbin/rtpjitterbuffer/rtspsrc: Add property to set maximum ms between RTCP SR RTP time and last observed RTP time
Created attachment 311946 [details] [review] rtpbin/rtpjitterbuffer/rtspsrc: Add property to set maximum ms between RTCP SR RTP time and last observed RTP time
commit 01c0f8723f9891da75357212293ccd0839d41523
Author: Sebastian Dröge <sebastian@centricular.com>
Date:   Mon Sep 21 13:47:21 2015 +0200

    rtpbin/rtpjitterbuffer/rtspsrc: Add property to set maximum ms between RTCP SR RTP time and last observed RTP time

    https://bugzilla.gnome.org/show_bug.cgi?id=755125

commit a0ae6b5b5a73d1afeeeced38e61bf1f106f9eb12
Author: Sebastian Dröge <sebastian@centricular.com>
Date:   Wed Sep 16 19:28:11 2015 +0200

    rtpbin/session: Allow RTCP sync to happen based on capture time or send time

    Send time is the previous behaviour and the default, but there are use
    cases where you want to synchronize based on the capture time.

    https://bugzilla.gnome.org/show_bug.cgi?id=755125