GNOME Bugzilla – Bug 679443
[0.11] x264enc produces wrong timestamps
Last modified: 2013-05-19 18:59:44 UTC
Created attachment 218080 [details] [review] Patch x264 sometimes outputs keyframe DTS lower than PTS and DTS of next frame equal to keyframe PTS. Videoencoder changes the keyframe DTS to PTS, which results in non-monotonous DTS causing decoding problems. Also x264 outputs following timestamps Frame1: PTS 0, DTS -66666666 Frame2: PTS: 20000000, DTS: 0 Videoencoder turns the first frame to PTS:0, DTS:0 so there are two frames with same DTS resulting in muxer warning
The first paragraph is little bit convoluted, example of timestamps generated by x264 Keyframe: PTS: 10000, DTS: 9000 Next frame: PTS 11000, DTS 10000 encoder turns this into Keyframe: PTS: 10000, DTS: 10000 Next frame: PTS 11000, DTS 10000
Actually, this is not entirely correct. X264 outputs negative DTS when b-frames are enabled, gstreamer is currently not able to handle negative DTS.
IMO, there is no reason to hack a particular codec lib's DTS/PTS interpretation into the base classes. Rather, one can accept that (in gstreamer sense/specs) PTS and DTS as set on buffers should be equal for keyframes (as defined by the baseclass and also long since assumed/established in e.g. gst-ffmpeg). In that sense, this is a x264enc bug and following commit fixes things to have it come up with proper timestamps which will play along nicely elsewhere (e.g. not suprise muxers etc): -ugly: commit 839e75d5d145d486a5c114f8f9f7ea362d12866f Author: Mark Nauwelaerts <mark.nauwelaerts@collabora.co.uk> Date: Fri Sep 7 17:38:18 2012 +0200 x264enc: handle possibly negative DTS provided by codec ... by arranging for an offset such that DTS == PTS for keyframes, which is expected elsewhere to go along with semantics of PTS and DTS. Fixes https://bugzilla.gnome.org/show_bug.cgi?id=679443 Also, for good measure, it's not very 'polite' to overwrite DTS if already set by baseclass (and then only for keyframe), so let's not do that in baseclass, but warn at least if they are then not equal as expected: -base: commit d4c1b160ef2efa037eaed203e3d73ba5a86e3cd2 Author: Mark Nauwelaerts <mark.nauwelaerts@collabora.co.uk> Date: Fri Sep 7 17:41:27 2012 +0200 videoencoder: only set invalid DTS equal to PTS for keyframe Also add a bit more debug. See also https://bugzilla.gnome.org/show_bug.cgi?id=679443
I don't think that everyone agrees that PTS = DTS on keyframes. Your patch made it impossible to mux proper MPEG TS (where you need the exact DTS values from encoder otherwise the decoder buffer can underflow). About negative DTS: https://bugzilla.gnome.org/show_bug.cgi?id=679466
If you really insist on current DTS semantics then perhaps there should be a way to restore original values (i.e. sticky event).
for reference: ds: I am strongly against any method that does not put the correct DTS value in the DTS field. matej_k: ds, that's happening now matej_k: depends on how you define correct DTS value I guess ds: The MP4 method is not correct. ds: The correct dts is the time when the decoder should start decoding a frame. Decoding is assumed to be instantaneous, so you can have dts==pts for some frames ds: but never dts > pts. ds: it has to be exactly right to mux correct MPEG-TS streams, otherwise hardware decoders will fail
It may very well (unfortunately) have introduced other problems/bug/drawbacks, but "at least" it is consistent with what videoencoder was/is doing/expecting, which was the intention here. But feel free to revert (or have it reverted), though afaics that will also not fix the original problem reported here (including videoencoder behaviour/assumptions). There are indeed problems wrt PTS/DTS timestamps, as in the referenced report (and in a -devel post), such as negative timestamps but IMO also what to do in case of a missing/unknown PTS and/or DTS, and all sorts of interpolation that it entails (a major pain in e.g. baseparse and probably others). In that respect, we seem to have shifted some fantastic DTS/PTS guesswork coding in e.g. qtmux to other places (to deal with missing DTS or PTS, e.g. baseparse, videoencoder, -decoder). Hopefully, it is at least a bit more reusable in these latter places now. In fact, fwiw, the (I feel) still messy PTS/DTS situation in 1.0 makes me a bit worried about how reliable some ts is at any point (in pipeline). Also, btw, note that no longer assuming PTS == DTS is probably going to make things even harder for some PTS/DTS guesstimating heuristics around. Suffice to say, any particular spec/agreement probably works, if at least care is taken so it all "Just Works" in videoencoder-/decoder and baseparse (including guesstimating etc).
Yeah, guesstimating DTS is tricky. Even if you infer it, it still depends on codec to make it at least approximately correct (i.e. you need to know if you have b frames or b-pyramid so that you shift DTS x frames), so maybe the functionality should be moved to codec parsers? So there are 2 major problems right now if I understand it correctly. 1) It's impossible to represent negative DTS (needed when PTS start from 0). Also in turn negative stream/running time. 2) Some code assumes that PTS = DTS for keyframes. Which classes do this exactly? Is it just videoencoder? Videodecoder? Baseparser? I think in ideal world the encoder should be responsible for producing the proper DTS there shouldn't be need for trying to guess what it should be. Same goes for demuxer, except it gets tricky with matroska and avi. I still don't get why there are no decoding timestamps in matroska.