Bug 746479 – rtsp: Only two second of playback with rtpsrc and test-mp4 (rtsp-server)

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 746479 - rtsp: Only two second of playback with rtpsrc and test-mp4 (rtsp-server)


Summary:	rtsp: Only two second of playback with rtpsrc and test-mp4 (rtsp-server)


Status:	RESOLVED FIXED

Product:	GStreamer
Classification:	Platform
Component:	gst-plugins-good
Version:	git master
Hardware:	Other Linux

Importance:	Normal blocker
Target Milestone:	1.5.1
Assigned To:	GStreamer Maintainers
QA Contact:	GStreamer Maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2015-03-19 18:20 UTC by Nicolas Dufresne (ndufresne)
Modified:	2015-04-02 22:53 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Patch adding warning and protection around substractions (2.73 KB, patch) 2015-03-20 17:13 UTC, Nicolas Dufresne (ndufresne)	none	Details \| Review
rtpjitter: Protect substractions (2.78 KB, patch) 2015-03-20 17:21 UTC, Nicolas Dufresne (ndufresne)	none	Details \| Review
rtpjitter: Protect substractions (2.78 KB, patch) 2015-03-20 17:29 UTC, Nicolas Dufresne (ndufresne)	none	Details \| Review
rtpjitter: Protect substractions (2.79 KB, patch) 2015-03-20 17:54 UTC, Nicolas Dufresne (ndufresne)	none	Details \| Review
Capture showing missing reply to play request (1.04 MB, application/vnd.tcpdump.pcap) 2015-03-20 20:04 UTC, Nicolas Dufresne (ndufresne)		Details
rtsp-media: Properly return first rtptime (1.18 KB, patch) 2015-03-21 15:05 UTC, Nicolas Dufresne (ndufresne)	committed	Details \| Review

Description Nicolas Dufresne (ndufresne) 2015-03-19 18:20:01 UTC

I'm having issue at the moment with RTSP playback on latest git (kind of worked two weeks ago). Right now if I run test-mp4 and gst-play-1.0 rtps://... only two second is played then it EOS. 2s is the size of the jitter buffer.

So far I know that EOS comes from the jitter buffer. I notice that npt time are going backward. As these is no protection in the code for that, we get large value which I suspect are responsible for the eos timeout to trigger early. It's notice that Farstream unit test started failing too.

Comment 1 Tim-Philipp Müller 2015-03-19 19:26:22 UTC

Marking as blocker.

Comment 2 Nicolas Dufresne (ndufresne) 2015-03-20 17:13:48 UTC

Created attachment 299969 [details] [review]
Patch adding warning and protection around substractions

I don't think this patch is entirely correct, at least it uncovers some of the issues going on, allowing to trap overflow in places we did assume there would not be.

Comment 3 Nicolas Dufresne (ndufresne) 2015-03-20 17:21:12 UTC

Created attachment 299970 [details] [review]
rtpjitter: Protect substractions

This patch protects all the substraction that where not before. It
also warn as clearly these where not expected to overflow.

Comment 4 Nicolas Dufresne (ndufresne) 2015-03-20 17:23:29 UTC

With the fixed version of this patch, we see that base_time and last_rtptime becomes invalid only after a clock skew. Maybe something is not reset properly ?

Comment 5 Nicolas Dufresne (ndufresne) 2015-03-20 17:29:45 UTC

Created attachment 299972 [details] [review]
rtpjitter: Protect substractions

This patch protects all the substraction that where not before. It
also warn as clearly these where not expected to overflow.

Comment 6 Nicolas Dufresne (ndufresne) 2015-03-20 17:31:07 UTC

With the right check (was checking > instead of >=) it does not warn anymore. Which mean there is no overflow, my bad, the patch is still worth it probably.

Comment 7 Nicolas Dufresne (ndufresne) 2015-03-20 17:54:37 UTC

Created attachment 299975 [details] [review]
rtpjitter: Protect substractions

This patch protects all the substraction that where not before. It
also warn as clearly these where not expected to overflow.

Comment 8 Nicolas Dufresne (ndufresne) 2015-03-20 18:04:26 UTC

Ok, new investigation path. First time we clock_id_wait(id, &clock_jitter), the resulting jitter is:

5124095:34:33.689709342

This cause all timeout to be reached at once. This does not seem like a reasonable jitter.

Comment 9 Nicolas Dufresne (ndufresne) 2015-03-20 18:11:58 UTC

Will be my week of false positive, jitter is all right, it's negative (but only slightly). Well, sometimes it's 2s ahead of time, but in general it's fine.

Comment 10 Nicolas Dufresne (ndufresne) 2015-03-20 20:04:52 UTC

Created attachment 299998 [details]
Capture showing missing reply to play request

So it seems the the server does not reply to PLAY request.

Comment 11 Nicolas Dufresne (ndufresne) 2015-03-20 21:01:34 UTC

More information, the server do send the reply (we can see it in wireshark with vlc as a client), but it looks like the client stop reading from the TCP socket. 

One other thing I notice though is that RTPInfo is broken. The initial rtptime information is not correct.

When running with VLC:
RTP-Info: url=rtsp://localhost:8554/test/stream=0;seq=24747;rtptime=0, url=rtsp://localhost:8554/test/stream=1;seq=17776;rtptime=0\r\n

And first RTP packet for each streams have:
  PT 96 Seq:24747 TS:931696044
  PT 97 Seq:17776 TS:2784754487

Comment 12 Nicolas Dufresne (ndufresne) 2015-03-20 21:28:30 UTC

Ok, no idea why I didn't get a reply in previous record, I always got it now. I think one should focus on the wrong ntptime in the RTP Info.

Comment 13 Nicolas Dufresne (ndufresne) 2015-03-21 15:05:50 UTC

Created attachment 300023 [details] [review]
rtsp-media: Properly return first rtptime

Instead we where returning first GstBuffer timestamp. This would result
in clock skew and unwanted behaviour in RTSP playback.

Comment 14 Nicolas Dufresne (ndufresne) 2015-03-21 15:15:04 UTC

This fixes the skew at start, which would confuse the RTP jitter buffer. Though a skew and an overflow still occurs on seeks.

(gstrtpjitterbuffer.c:2104):calculate_jitter: runtime check failed: (priv->last_rtptime == -1 || rtptime >= priv->last_rtptime)

From the spec, after a seek, some buffers of the old segment might be travelling over the network and we are supposed to drop it base on the initial rtptime and seq we got from the RTPInfo. I'm not sure yet if we have code for that, e.g. we have trusty start seq + rtptime, please trust and drop mode. Any input about how this is supposed to work would be appreciated. I'm not sure how this could have worked in the past.

Comment 15 Nicolas Dufresne (ndufresne) 2015-03-21 15:17:21 UTC

Additional note, this last patch playback position reporting when using VLC as a client. Though, seeks are followed by long freeze before resume on this client (which may be related to the seek issue).

Comment 16 Nicolas Dufresne (ndufresne) 2015-03-21 17:00:05 UTC

Interesting, the method that sends start_segment is simply never called. Unless I'm mistaken again, gst_rtspsrc_handle_data() is never called. We endup with the default segment all the time, which could explain the clock skew after seek.

Comment 17 Nicolas Dufresne (ndufresne) 2015-03-21 20:41:07 UTC

Learning the hard way, but in interleaved mode (streaming over the TCP socket) this method is called, and segment are correct. In UDP mode, this method is not called, and there is no code to actually fix the segment. The jitterbuffer receives a generic segment and fails.

Comment 18 Sebastian Dröge (slomo) 2015-03-22 09:03:55 UTC

Comment on attachment 300023 [details] [review]
rtsp-media: Properly return first rtptime

Shouldn't both values be the same? Or are they not because of the segment?

Comment 19 Sebastian Dröge (slomo) 2015-03-22 09:09:46 UTC

(In reply to Nicolas Dufresne (stormer) from comment #14)

> From the spec, after a seek, some buffers of the old segment might be
> travelling over the network and we are supposed to drop it base on the
> initial rtptime and seq we got from the RTPInfo. I'm not sure yet if we have
> code for that

That's the seqnum-base (and clock-base) from the caps. I added something to jitterbuffer last week to drop every packet with a seqnum before that.

(In reply to Nicolas Dufresne (stormer) from comment #17)
> In UDP mode, this method is not called, and there is no code to actually
> fix the segment. The jitterbuffer receives a generic segment and fails.

How did that ever work at all? :) Maybe people only tested seeks in TCP mode?!
I guess there should be a pad probe for modifying the segment after the udpsrcs then. Does that fix the remaining problem?

Comment 20 Nicolas Dufresne (ndufresne) 2015-03-22 13:56:39 UTC

(In reply to Sebastian Dröge (slomo) from comment #19)
> How did that ever work at all? :) Maybe people only tested seeks in TCP
> mode?!
> I guess there should be a pad probe for modifying the segment after the
> udpsrcs then. Does that fix the remaining problem?

This is exactly what I started writing, we'll see

I think if it worked before it would be because the skew code may have succeed where it actually fails badly now. Or that we were doing like Wowza in the past (setting rtptime=0,seq=0 after every seek) and we only changed that to be compliant "recently" without testing the seeks. Second case is not that likelyly since seeks are also broken with Wowza server as I can observe:

totem rtsp://wowzaec2demo.streamlock.net/vod/mp4:BigBuckBunny_115k.mov

Comment 21 Nicolas Dufresne (ndufresne) 2015-03-22 22:51:24 UTC

(In reply to Sebastian Dröge (slomo) from comment #18)
> Comment on attachment 300023 [details] [review] [review]
> rtsp-media: Properly return first rtptime
> 
> Shouldn't both values be the same? Or are they not because of the segment?

No, the rtp timestamp has a different scale. The buffer timestamp matches segment start in this case (0), while the first rtp timestamp is a random number.

Comment 22 Sebastian Dröge (slomo) 2015-03-23 08:16:29 UTC

Good point, forgot about that :)

Comment 23 Sebastian Dröge (slomo) 2015-03-23 13:29:14 UTC

For the segment event, you can just put a pad probe on the udpsrc srcpad... and replace all segment events with a new segment event that contains the rtspsrc segment.

Comment 24 Nicolas Dufresne (ndufresne) 2015-03-23 13:51:01 UTC

(In reply to Sebastian Dröge (slomo) from comment #23)
> For the segment event, you can just put a pad probe on the udpsrc srcpad...
> and replace all segment events with a new segment event that contains the
> rtspsrc segment.

I tried, but it didn't work. It's probably not enough, or out of sync. I need to learn more again.

Comment 25 Nicolas Dufresne (ndufresne) 2015-03-23 16:23:33 UTC

commit dfb053add33e6636ef473b78826e4a720cc1ed24
Author: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Date:   Sat Mar 21 11:04:05 2015 -0400

    rtsp-media: Properly return first rtptime
    
    Instead we where returning first GstBuffer timestamp. This would result
    in clock skew and unwanted behaviour in RTSP playback.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=746479

Comment 26 Nicolas Dufresne (ndufresne) 2015-03-23 19:52:08 UTC

Ok, was wondering, and then I just decided to test seeks with TCP streaming, to find out it does not work.

calculate_jitter: cannot get current running_time
gst_rtp_jitter_buffer_chain:<rtpjitterbuffer1> error: Received packet without DTS after a gap

This is the error I get if I try to replace the segment in UDP mode using a probe.

Comment 27 Sebastian Dröge (slomo) 2015-03-23 19:55:31 UTC

For TCP we should always set a timestamp on the first buffer (after a flush or in the very beginning). Otherwise rtpjitterbuffer can't really compensate discontinuities or anything.

With UDP we should get a DTS/PTS on every single buffer. The capture time.

Comment 28 Sebastian Dröge (slomo) 2015-03-23 19:56:25 UTC

(In reply to Sebastian Dröge (slomo) from comment #27)
> Otherwise rtpjitterbuffer can't really compensate discontinuities or anything

jitter I mean, it can't compensate for discontinuities/gaps at all in TCP mode. But in TCP mode we should not get them as TCP is a reliable transport.

Comment 29 Nicolas Dufresne (ndufresne) 2015-03-24 22:31:35 UTC

Ok, actually it's depayloaders (handled in baseclass) that deals with segments. Position reporting is wrong after seeks though, so there must be some bug there. Meanwhile, this is a totally different bug (not clearly a regression, it might have never worked) and this one is fixed now.

Comment 30 Nicolas Dufresne (ndufresne) 2015-03-27 11:43:30 UTC

Just realise overnight that this is not fully right yet :-( I need to check if the first buffer matches first expected. Otherwise the missing part may get clipped in one stream but not the other causing de-synch. In the case I don't know which one should be first, seeking should not be possible, hence the position should simply be the running time as before. Will work on that today.

Comment 31 Nicolas Dufresne (ndufresne) 2015-04-02 22:53:05 UTC

This is fixed now.