Bug 655804 – Play position advancing but frames not rendered

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 655804 - Play position advancing but frames not rendered


Summary:	Play position advancing but frames not rendered


Status:	RESOLVED FIXED

Product:	GStreamer
Classification:	Platform
Component:	gst-plugins-bad
Version:	0.10.34
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	0.10.23
Assigned To:	GStreamer Maintainers
QA Contact:	GStreamer Maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2011-08-02 12:25 UTC by Tvrtko Ursulin
Modified:	2011-11-28 13:59 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
PyGst test program (4.78 KB, text/x-python) 2011-08-03 07:42 UTC, Tvrtko Ursulin		Details
GST_DEBUG=3 log of playback freezing for 14 seconds. (476.09 KB, text/x-log) 2011-08-08 14:13 UTC, Tvrtko Ursulin		Details
mpegdemux: catch smaller PCR dicontinuities (4.65 KB, patch) 2011-08-30 14:10 UTC, Vincent Penquerc'h	none	Details \| Review
mpegdemux: catch smaller PTS dicontinuities (3.53 KB, patch) 2011-08-31 13:27 UTC, Vincent Penquerc'h	none	Details \| Review
GST_DEBUG=basesink:4 log of stuck video but sound playing (782.12 KB, application/x-bzip2) 2011-09-01 10:05 UTC, Tvrtko Ursulin		Details
Another log file showing the problem (547.19 KB, application/x-bzip2) 2011-09-02 11:40 UTC, Tvrtko Ursulin		Details
mpegdemux: fix some more video freezing (1.67 KB, patch) 2011-09-03 13:47 UTC, Vincent Penquerc'h	committed	Details \| Review
mpegdemux: catch smaller PTS dicontinuities (5.72 KB, patch) 2011-09-21 15:57 UTC, Vincent Penquerc'h	committed	Details \| Review

Description Tvrtko Ursulin 2011-08-02 12:25:17 UTC

Hi all,

When playing an MPEG TS stream multicast over UDP I occasionally see GStreamer 
stop rendering frames while the play position is normally progressing.

On the last occasion this happened it actually managed to untangle itself 
after a while and started playing (rendering) again. Unfortunately I do not 
know whether it would always be case since previously I did not wait this 
long.

To start getting more specific...

I do not know when exactly it stopped rendering, but I know when I first saw it 
in that state which was at 14:14 today. It was streaming for roughly two hours 
up to that point.

Last interesting logged thing before that state was a QOS event at 14:06. Then 
at 14:21 it started to render again and reported another QOS event. 

In between the application was querying play position every ten seconds which 
showed progress.

If you look at this annotated log:

... <video state unknown>
14:06:15.006 PLAY POSITION 6615867197374L
14:06:18.408 QOS @ 6619179855556L Frames: 291654/704; Jitter: 88265288
14:14 <spotted video not playing (not rendering)>
14:20:59.484 PLAY POSITION 7500344289211L
14:21:08.142 QOS @ 6915779855556L Frames: 306475/705; Jitter: 593222818788L
<video started playing again>

Play position is what query_position() on playbin2 element returns 
(documentation says this is stream time). And times logged in QOS events come 
from stream-time message field, so it should be matched up in that respect.

This second event is reporting that a frame was almost ten minutes late, but 
that there was only one dropped frame in the period on roughly a correct 
number of rendered frames (for 25 fps video) during the period. Which is of 
course wrong because nothing was rendered.

Application is written in python and uses playbin2 with a custom bin set as 
video-sink which is queue ! deinterlace ! queue ! xvimagesink. Audio-sink is 
set to a bin containing alsasink only.

Comment 1 Tim-Philipp Müller 2011-08-02 15:55:38 UTC

Do you have a test program we can reproduce this with?

Comment 2 Tvrtko Ursulin 2011-08-03 07:42:55 UTC

Created attachment 193132 [details]
PyGst test program

Comment 3 Tvrtko Ursulin 2011-08-03 07:43:21 UTC

Hi,

Thanks for looking at this!

I will attach something I just hacked up which is basically the same GStreamer usage wise as the program I initially saw the problem. You will see that the code is pretty simple. It may have bugs since I am quite new to GStreamer and PyGst though.

Reproducing the problem may be difficult since I suspect it depends on the content and length of play. We found this when stress testing GStreamer (PyGst) playing DVB-T rebroadcast over Ethernet which is one of our main use cases. I mostly test with BBC 1 and the problem shows after a couple of hours of streaming. Not every time though, there are different failure modes this bug is only one of (655790 is the other one I entered so far).

I have also seen it when playing MPEG TS rebroadcast from localhost (UDP unicast to 127.0.0.1), unfortunately program which does rebroadcasting is proprietary. And with regards to Ethernet multicast those are proprietary hardware appliances.

I did not manage to reproduce using GStreamer only (dvbbasebin I think is the plugin name) since that crashes typically quite soon. Which reminds me I could raise a bug about that as well.

Versions wise, this is with Fedora's 15 GStreamer packages plus rpmfusion.org bad and ugly plugins. Former recompiled to include dvbsuboverlay for DVB-T subtitles support.

Regards,

Tvrtko

Comment 4 Tvrtko Ursulin 2011-08-03 07:46:05 UTC

Additional comment - it is possible that having a less than perfect DVB-T reception helps in triggering this. But I am not at all sure about this.

Comment 5 Tvrtko Ursulin 2011-08-08 14:13:56 UTC

Created attachment 193419 [details]
GST_DEBUG=3 log of playback freezing for 14 seconds.

Comment 6 Tvrtko Ursulin 2011-08-08 14:14:15 UTC

I have managed to capture a 15Mb transport stream which exhibits similar behaviour. I say similar because I am not 100% sure it the same - there is one possibly crucial difference in the behaviour.

With this stream playback freezes approx. two seconds into the play time and then resumes after approx. 14 seconds. All the time play position is reported as advancing and there are no QOS messages until ~17s into playing. Then playback resumes and a bunch of "old" QOS messages get sent.

Difference from the original report is that this time QOS messages are not reporting the number of rendered frames which would suggest frames were actually displayed in those ~14 seconds, but just 75 frames rendered when first message arrives.

This stream is also of a quite low quality (poor signal), much worse than with what I was seeing a similar problem previously.

However given how players like mplayer and vlc do play this stream much better I suspect there is scope for improvement here.

If nothing maybe send those QOS messages earlier?


I am attaching a GST_DEBUG=3 log of the above.

Comment 7 Vincent Penquerc'h 2011-08-26 17:05:43 UTC

Can that possible-repro sample be shared ?

Comment 8 Vincent Penquerc'h 2011-08-30 14:10:08 UTC

Created attachment 195204 [details] [review]
mpegdemux: catch smaller PCR dicontinuities

In a test stream, I get one buffer with a PCR of about 15 seconds
in the future compared to the previous one, and next buffers with
timestamps continuing where the original ones left off.

This caused the sink to wait 15 seconds to display the frame while
more frames queued up, and then dump all the subsequent frames as
they "arrived too late".

Maybe that threshold should be made configurable, but for now,
make it more smaller to catch more of these.

Comment 9 Vincent Penquerc'h 2011-08-30 14:13:51 UTC

That fixes the issue for the test stream in question.
Now, the threshold is 10 seconds, but if some stream jumps by 9 seconds and back, there will still be a 9 seconds pause.
A way to fix that could be to, when PCR is above something like a half second or so, keep the frame in a temp buffer and not send it, and wait for the next frame to see if it goes back in time or not. If it does, retimestamp the buffered frame and send both.
Seems hacky though. I'm nowhere near enough knowledgeable about MPEG to know if such timestamp jumps are somehow OK, or indicative of a broken stream.

Comment 10 Tvrtko Ursulin 2011-08-31 10:44:35 UTC

Where have you made in smaller in patch from comment #8?

I only see you pulled the constant out, unless I am missing something.

#define DISCONT_THRESHOLD (GST_SECOND * 10 * 60)

Is that ten minutes?

Comment 11 Simon Farnsworth 2011-08-31 10:45:11 UTC

In theory, at least, we can assume that PCR won't change by more than 100 milliseconds between PCR packets.

Reasoning from the spec (references to ISO/IEC 13818-1:2007):

Section 2.4.2.1 (System clock frequency) tells us that the PCR is derived from a stable 27MHz clock.

Section 2.4.2.2 says maximum encoding error of the PCR in the stream is 500 nanoseconds as compared to the "real" PCR value.

Section 2.4.3.5 defines a discontinuity_indicator in the adaptation field to tell us that the PCR is in the process of jumping. When it is set, packets before the packet with discontinuity_indicator set use the old PCR for their timestamps, packets after the one with discontinuity_indicator set use the new PCR.

Finally, section 2.7.2 tells us that the PCR must be coded in the stream at least once every 100 milliseconds, or the stream is not a valid transport stream.

Putting all these bits together, we know that for any PCR packet, if the discontinuity_indicator is not set (indicating that the timebase is changing, and we need to do future timestamping against the new PCR), any jump of more than 100 milliseconds is indicative of a broken stream, and we should ignore this PCR packet and continue to free-run until we pick up a valid PCR again - we should probably increase the permitted timestamp jump by 100 milliseconds for each faulty PCR packet we've seen, so we gracefully degrade and recover if several PCRs in a row are faulty followed by good PCRs again.

There's also the transport_error_indicator in transport packets to hint that this packet is corrupt; our DVB-T cards do appear to set this whenever there are uncorrected errors in the packet. This could also be used as a hint that the PCR is probably wrong if it's a large jump.

Comment 12 Vincent Penquerc'h 2011-08-31 11:22:32 UTC

Hah, I'd set it to 10 seconds, but mistakenly re-used the original 10 minutes value when cleaning up before posting, you're right.

Since the standard mandates 100 ms maximum delay between PCRs, I'll change the code with that in mind. I'm not sure about several PCRs in a row though. If we get two, it would seem that the more likely is an error in the discontinuity indicator instead, than an error in PCR, no ?

Comment 13 Vincent Penquerc'h 2011-08-31 11:48:31 UTC

Actually, it's not the PCR that's jumping, it's the PTS - which I understand to be "presentation timestamp". I'm not sure what is the relation between PTS and PCR.

Comment 14 Simon Farnsworth 2011-08-31 12:08:52 UTC

(In reply to comment #13)
> Actually, it's not the PCR that's jumping, it's the PTS - which I understand to
> be "presentation timestamp". I'm not sure what is the relation between PTS and
> PCR.

In a hardware decoder, PCR is used to synchronise your local 27MHz clock to the broadcaster's 27MHz clock. PTS is the presentation timestamp, measured against that 27MHz clock. Packets are removed from transport buffering at their decode time, which is DTS if present, PTS if no DTS present (see section 2.4.3.7's description of PTS and DTS). I would not personally expect decode times to go backwards, and certainly not by a large amount - this could be used to detect a probable discontinuity.

Comment 15 Vincent Penquerc'h 2011-08-31 13:27:29 UTC

Created attachment 195294 [details] [review]
mpegdemux: catch smaller PTS dicontinuities

In a test stream, I get one buffer with a PTS of about 15 seconds
in the future compared to the previous one, and next buffers with
timestamps continuing where the original ones left off.

This caused the sink to wait 15 seconds to display the frame while
more frames queued up, and then dump all the subsequent frames as
they "arrived too late".

Maybe that threshold should be made configurable, but for now,
make it more smaller to catch more of these.

Comment 16 Tvrtko Ursulin 2011-08-31 14:09:11 UTC

I can confirm the patch works around the issue in the test stream.

However since the originally reported render pause was longer than ten minutes, I will leave streaming on for a day or so on one box to see how it holds long term.

Comment 17 Tvrtko Ursulin 2011-09-01 07:55:26 UTC

Unfortunately this did not fix the original problem.

I left the test running overnight streaming BBC ONE from DVB-T and saw several instances of the bug - the one ongoing as we speak is longer than observed before.

First I had a couple of shorter instances, exactly like in the initial report.

2011-09-01 03:50:21.786 DEBUG QOS[xvimagesink0] @ 27934.877578s rendered/dropped=1394780/123; jitter=87ms
2011-09-01 03:50:28.874 DEBUG playPosition 27940535676389L
2011-09-01 03:50:38.874 DEBUG playPosition 27948599676389L

This goes on for some time, during which I am pretty sure nothing was rendered because it ends with:

2011-09-01 03:55:48.875 DEBUG playPosition 28196567676389L
2011-09-01 03:55:58.874 DEBUG playPosition 28204535676389L
2011-09-01 03:56:05.462 DEBUG QOS[xvimagesink0] @ 27936.537578s rendered/dropped=1394843/124; jitter=342103ms

A QOS message reporting jitter of roughly the period since last QOS message.

Second instance happened early morning, at 5:35 - which is 100% because I have a clock showing that time on a frozen frame on screen right now. And again GStreamer is reporting that play position is advancing and there are no QOS messages.

Log looks like this:

2011-09-01 05:35:01.821 DEBUG QOS[xvimagesink0] @ 5935.305789s rendered/dropped=295874/44; jitter=84ms
2011-09-01 05:35:08.874 DEBUG playPosition 5942442919718L
2011-09-01 05:35:18.874 DEBUG playPosition 5952443305672L
2011-09-01 05:35:28.874 DEBUG playPosition 5962443057111L
...
2011-09-01 08:53:28.874 DEBUG playPosition 16541955894445L
2011-09-01 08:53:38.874 DEBUG playPosition 16550019894445L
2011-09-01 08:53:48.874 DEBUG playPosition 16557987894445L

Comment 18 Vincent Penquerc'h 2011-09-01 08:50:11 UTC

Could you get check if the basesink (debug tag "basesink") emits the message "possibly waiting for clock to reach" with a timestamp that's suddenly higher than what it should be, and more or less the delay which video wedges for ?
Run with GST_DEBUG=basesink:4.
Better even if to run with GST_DEBUG=5 and save the log for later, I might ask you to grep in it and send me excerpts (the full log for a day will be *huge*, so redirect to a partition that's got space).

Comment 19 Tvrtko Ursulin 2011-09-01 08:59:19 UTC

Since it usually takes half day or more for this to happen do you think one or both debug levels would fit in a 1.5GB? (Which is how much I have available on the box).

Comment 20 Vincent Penquerc'h 2011-09-01 09:13:44 UTC

basesink:4 probably would. For the other, not a snowball's chance.
I guess you can run with basesink:4, and look at the log size after 15 minutes, then interpolate, so you'll know in advance if you risk running out.
You can also do something like this (untested):

export GST_DEBUG=basesink:4
run_player 2>&1 | bzip -c9 > log.bz2

GStreamer logs compress well.

Comment 21 Tvrtko Ursulin 2011-09-01 10:05:35 UTC

Created attachment 195372 [details]
GST_DEBUG=basesink:4 log of stuck video but sound playing

I got something new which may be related just as I started capturing debug. This time video got stuck but audio is playing fine. It happened shortly after starting playback and it has been in this state for more than 15 minutes now.

Does it make sense to keep capturing this log in case it unwedges itself or try again with GST_DEBUG=5 in hope something unusual happens again?

Comment 22 Vincent Penquerc'h 2011-09-01 10:38:32 UTC

This is odd, both sinks seem to have gone to PAUSED 20 seconds into playback, then back to PLAYING after 12 seconds.

But yes, the logs do show the image sink waits for a large timestamp (more than a minute into the the future) at some point, so that would pause video for that amount of time at least. Presumably lots of frames dropped after that as, when timestamps go back down, they'd all be considered to be too late.

That "high timestamp" goes on for a couple dozen frames, so it's not a single corrupt timestamp. What's even more worrying is that, after these couple dozen frames, timestamps go back to *before* the original timestamp series.

So the reason for the video pause seems to be the same (large timestamp causing the sink to wait for that timestamp to happen before showing the frame), but caused by a different issue. Maybe a bug in timestamp calculation, or some offset that we don't take into account.

(mostly typing that out for my own memory once I get back to investigating this)

Is the issue repeatable, if you saved that stream ? If so, could I have it ?

Comment 23 Tvrtko Ursulin 2011-09-01 11:11:32 UTC

(In reply to comment #22)
> This is odd, both sinks seem to have gone to PAUSED 20 seconds into playback,
> then back to PLAYING after 12 seconds.

Ah ignore this please, this is our application restarting playback because of too many dropped frames.

> But yes, the logs do show the image sink waits for a large timestamp (more than
> a minute into the the future) at some point, so that would pause video for that
> amount of time at least. Presumably lots of frames dropped after that as, when
> timestamps go back down, they'd all be considered to be too late.
> 
> That "high timestamp" goes on for a couple dozen frames, so it's not a single
> corrupt timestamp. What's even more worrying is that, after these couple dozen
> frames, timestamps go back to *before* the original timestamp series.
> 
> So the reason for the video pause seems to be the same (large timestamp causing
> the sink to wait for that timestamp to happen before showing the frame), but
> caused by a different issue. Maybe a bug in timestamp calculation, or some
> offset that we don't take into account.
> 
> (mostly typing that out for my own memory once I get back to investigating
> this)
> 
> Is the issue repeatable, if you saved that stream ? If so, could I have it ?

Unfortunately I don't have this stream because I was totally not expecting it to replicate so quickly. Going forward I will start capturing so there is hope in taking out a slice which causes this.

Test player has gotten unstuck in the meantime, ~50 minutes after it got stuck and indeed an appropriate jitter was reported in the QOS message.

Compressed log is 10Mb if you are interested I can upload it to our server.

Comment 24 Vincent Penquerc'h 2011-09-01 11:36:43 UTC

As it only contains the basesink information, it's not so useful, I just wanted to know if the sink was going to wait on a large timestamp, which it did, without generating huge amounts of logs.
If you can get a smallish captured video that reproduces it, I'll be able to generate my own fuller logs.
Thanks

Comment 25 Tvrtko Ursulin 2011-09-02 08:49:37 UTC

I left the test running capturing gst debug log and also the stream but unfortunately due to a power cut last night I lost my test logs - meaning I don't know if the problem triggered. Is there a way to see if it happened from the GST_DEBUG log (which is 6.9GB in size)?

Comment 26 Vincent Penquerc'h 2011-09-02 10:11:42 UTC

Yes:

grep "basesink.*xv.*ossibly.waiting" log | less -SR

(You might not have "xv", replace with a unique string of your video sink, it is here to discard references to the audio base sink).

Then look at the timings, and check if there's a sudden large jump. If you see one, look at the full log at that point, and check if the next "clock returned" for that sink happens a long time later.

If so, you can try to cut a portion of the video around that time and hope it's reproducible.

Comment 27 Tvrtko Ursulin 2011-09-02 11:40:05 UTC

Created attachment 195470 [details]
Another log file showing the problem

Please can you look at this look file - I think this could be it and I have a sample stream, but with a small issue that the hang didn't repro although time stamps from this log and from the stream are the same. There is a glitch in the stream just about on the mark where it hung over night. I will upload the stream so you can have a look.

Comment 28 Vincent Penquerc'h 2011-09-03 13:47:19 UTC

Created attachment 195569 [details] [review]
mpegdemux: fix some more video freezing

There was a second threshold, which apparently needs to be smaller
than the first, though I'm not certain of it as I don't understand
yet this nest of wtf that is the mpeg demuxer timing logic.

Fixes video freezing on one (corrupted) MPEG sample. It would
previously never think it was out of the discontinuity, and would
push buffers with no timestamp.

Now this took me more than a day's poking at the thing, for just
one constant change, and I'm scared to have to touch this again :S

Comment 29 Vincent Penquerc'h 2011-09-21 15:57:02 UTC

Created attachment 197168 [details] [review]
mpegdemux: catch smaller PTS dicontinuities

In a test stream, I get one buffer with a PTS of about 15 seconds
in the future compared to the previous one, and next buffers with
timestamps continuing where the original ones left off.

This caused the sink to wait 15 seconds to display the frame while
more frames queued up, and then dump all the subsequent frames as
they "arrived too late".

Maybe that threshold should be made configurable, but for now,
make it more smaller to catch more of these.

https://bugzilla.gnome.org/show_bug.cgi?id=655804

Non AV streams keep using the larger threshold (10 minutes), as
subtitles may arrive only every so often.

Comment 30 Vincent Penquerc'h 2011-09-21 15:58:23 UTC

This update ensures we don't spuriously detect a discontinuity on subtitles streams after a gap of 2 seconds.
Fixes DVB subtitles on one of the sample streams.

Comment 31 Tvrtko Ursulin 2011-09-22 11:25:35 UTC

Looks good in testing, thank you!

Comment 32 Vincent Penquerc'h 2011-11-28 13:59:40 UTC

commit 918d2e710fc12ba9f489d8344549deb511f85ac9
Author: Vincent Penquerc'h <vincent.penquerch@collabora.co.uk>
Date:   Sat Sep 3 14:37:40 2011 +0100

    mpegdemux: fix some more video freezing
    
    There was a second threshold, which apparently needs to be smaller
    than the first, though I'm not certain of it as I don't understand
    yet this nest of wtf that is the mpeg demuxer timing logic.
    
    Fixes video freezing on one (corrupted) MPEG sample. It would
    previously never think it was out of the discontinuity, and would
    push buffers with no timestamp.
    
    Now this took me more than a day's poking at the thing, for just
    one constant change, and I'm scared to have to touch this again :S
    
    https://bugzilla.gnome.org/show_bug.cgi?id=655804

commit 7960280afbb2fdc436d8e1fcf89a414b37f6e135
Author: Vincent Penquerc'h <vincent.penquerch@collabora.co.uk>
Date:   Tue Aug 30 14:50:52 2011 +0100

    mpegdemux: catch smaller PTS dicontinuities
    
    In a test stream, I get one buffer with a PTS of about 15 seconds
    in the future compared to the previous one, and next buffers with
    timestamps continuing where the original ones left off.
    
    This caused the sink to wait 15 seconds to display the frame while
    more frames queued up, and then dump all the subsequent frames as
    they "arrived too late".
    
    Maybe that threshold should be made configurable, but for now,
    make it more smaller to catch more of these.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=655804
    
    Non AV streams keep using the larger threshold (10 minutes), as
    subtitles may arrive only every so often.