GNOME Bugzilla – Bug 783470
qtmux with x264 causes offset in video timestamps
Last modified: 2017-06-07 13:47:14 UTC
I'm generating a file like this:

gst-launch-1.0 -v videotestsrc num-buffers=10 ! identity silent=false ! x264enc ! qtmux ! filesink location="tmp.mp4" | grep -o "pts: *[^ ]*"
pts: 0:00:00.000000000,
pts: 0:00:00.033333333,
...

and decoding it with:

gst-launch-1.0 -v filesrc location="tmp.mp4" ! qtdemux ! avdec_h264 ! fakesink silent=false | grep -o "pts: *[^ ]*"
pts: 0:00:00.066666666,
pts: 0:00:00.100000000,
...

The PTS of the first frame changes for some reason, always by two frame durations. This causes a lip-sync offset if the file contains audio. I haven't git bisected yet, but the bug is present in 1.12.0 and 1.8.3; in 1.3.3 the output begins at 0 as expected. However, files generated in 1.3.3 show the offset when decoded in newer versions, and files generated in newer versions also show the offset in 1.3.3. Muxing raw video (without x264enc) gives correct timestamps in all versions.
Okay, bisected to commit 12181efddcd33: "qtmux: Handle DTS with negative running time". So the problem stems from the fact that the QT format doesn't support negative DTS, and qtmux resorts to incrementing both DTS and PTS to make the DTS non-negative. It seems to me that when the PTS of one pad/stream is adjusted, all the other pads should be adjusted by the same amount?
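The shift described above can be illustrated with a small arithmetic sketch (the frame duration and the two-frame DTS delay are illustrative values matching the 30 fps test pipeline, not taken from the qtmux source):

```python
# Illustrative sketch: an encoder using B-frames emits DTS lagging PTS,
# here by two frame durations. Values are in nanoseconds at ~30 fps.
FRAME = 33333333

pts = [0, FRAME, 2 * FRAME, 3 * FRAME]
dts = [t - 2 * FRAME for t in pts]  # first DTS values are negative

# To avoid negative DTS, shift both DTS and PTS forward by the same
# amount until the smallest DTS is zero:
shift = -min(dts)
shifted_pts = [t + shift for t in pts]
shifted_dts = [t + shift for t in dts]

print(shifted_pts[0])  # 66666666, i.e. 0:00:00.066666666
```

This reproduces the observed first PTS of 0:00:00.066666666 in the decoded output.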
I doubt you have identified any issue here. Here's how things work. First, the QT format does support negative DTS; it's GStreamer that has no support for that. So what we do is shift the timestamps forward to make them positive, and adjust the segment to bring those timestamps back to the appropriate running time and stream time. Your report is based entirely on the reported timestamps, which have no meaning without the associated segment information. Share the segment being pushed; with that information we will be able to determine whether the timestamps are correct. Shifting the timestamps forward by two frame durations matches x264's default behaviour of starting DTS at minus two frames.
Hmm, ok.. I thought the reported PTS values are what determine A/V sync, i.e. an audio buffer with a given PTS will be played at the same time as a video frame with the same PTS. Is my understanding correct this far? (I have a separate, longer pipeline that demonstrates that video buffers have their PTS changed but the corresponding audio does not.)
(In reply to Petteri Aimonen from comment #3)

> Hmm, ok.. I thought the reported PTS values are what determine A/V
> sync, i.e. an audio buffer with a given PTS will be played at the same
> time as a video frame with the same PTS. Is my understanding correct
> this far?

No, the PTS of each buffer is converted to a time format we call running time (gst_segment_to_running_time()). This time is on the same scale as the pipeline clock and can be correlated with the clock using the pipeline base time (gst_element_get_base_time()). All sinks share the same clock, and they synchronize their streams using it.

> (I have a separate, longer pipeline that demonstrates that video
> buffers have their PTS changed but the corresponding audio does not.)

Which is not an indication that A/V sync is incorrect, as I just explained.
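To make the running-time argument concrete, here is a minimal sketch of what gst_segment_to_running_time() computes for a simple forward segment (rate 1.0, no applied offset); the real GStreamer function handles many more cases:

```python
# Simplified model of gst_segment_to_running_time() for rate == 1.0:
#   running_time = (pts - segment.start) + segment.base
# A buffer whose PTS precedes the segment start is outside the segment.
def to_running_time(segment_start, segment_base, pts):
    if pts < segment_start:
        return None
    return (pts - segment_start) + segment_base

FRAME = 33333333

# If the demuxer shifts buffer timestamps forward by two frames and also
# shifts the segment start by the same amount, the running time that the
# sinks actually synchronize on is unchanged:
print(to_running_time(0, 0, 0))                  # unshifted first frame: 0
print(to_running_time(2 * FRAME, 0, 2 * FRAME))  # shifted first frame: still 0
```

So a larger buffer PTS does not by itself imply an A/V offset; only the running time, derived from both the PTS and the segment, is meaningful for sync.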
Ok, thank you for the explanation and sorry for my misunderstanding.