GNOME Bugzilla – Bug 659489
h264parse: Calculate PTS from DTS (and vice-versa)
Last modified: 2018-11-03 13:09:06 UTC
# GST_DEBUG=*:2 gst-launch filesrc location=bluecherry_test.raw ! h264parse ! mp4mux ! filesink location=test.mp4
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
0:00:00.020372638 31723 0x250b600 WARN codecparsers_h264 gsth264parser.c:1697:gst_h264_parser_parse_slice_hdr: couldn't find associated picture parameter set with id: 0
0:00:00.020507003 31723 0x250b600 WARN qtmux gstqtmux.c:3217:gst_qt_mux_video_sink_set_caps:<mp4mux0> pad video_00 refused caps video/x-h264, parsed=(boolean)true, stream-format=(string)avc, alignment=(string)au
0:00:00.020542600 31723 0x250b600 WARN baseparse gstbaseparse.c:2738:gst_base_parse_loop:<h264parse0> error: streaming stopped, reason not-negotiated
ERROR: from element /GstPipeline:pipeline0/GstH264Parse:h264parse0: GStreamer encountered a general stream error.
Additional debug info:
gstbaseparse.c(2738): gst_base_parse_loop (): /GstPipeline:pipeline0/GstH264Parse:h264parse0: streaming stopped, reason not-negotiated
ERROR: pipeline doesn't want to preroll.
Setting pipeline to NULL ...
Freeing pipeline ...

# gst-typefind bluecherry_test.raw
bluecherry_test.raw - video/x-h264, stream-format=(string)byte-stream

Full debug here: http://itstar.co.uk/gst-h264parse-mp4mux-debug.log
Test file here: http://itstar.co.uk/bluecherry_test.raw
mp4mux is likely rejecting the caps because width/height/framerate aren't present on them.
It seems h264parse should wait for the SPS to properly put width/height/framerate on the caps. Setting a dependency on the baseparse bug for this.
*** Bug 649583 has been marked as a duplicate of this bug. ***
Created attachment 215854 [details]
H.264 Stream Captured from Logitech C920 Webcam

This H.264 stream cannot be parsed successfully for width, height and framerate.
(In reply to comment #4)
> Created an attachment (id=215854) [details]
> H.264 Stream Captured from Logitech C920 Webcam
>
> This H.264 stream cannot be parsed successfully for width, height and
> framerate.

gst-launch filesrc location=/media.mov ! h264parse ! qtmux ! filesink location=/test.mov
Looks like h264parse now (git master) extracts all the details plus codec data properly (and waits until it has them all). Now qtmux fails like this: "DTS method failed to re-order timestamps". h264parse seems to only put DTS on the buffers it pushes towards qtmux.
foo.h264 ! h264parse ! .. only puts DTS and no PTS on buffers
foo.mkv ! demux ! h264parse ! ... only puts PTS and no DTS on buffers
Making this more generic. h264parse should be able to figure out the PTS from the DTS and vice-versa (based on frame number, reordering,....).
*** Bug 696300 has been marked as a duplicate of this bug. ***
> *** Bug 696300 has been marked as a duplicate of this bug. ***

That bug was marked as a blocker since it described a regression. It also has some more useful information, such as a pointer to https://gitorious.org/gstreamer-omap/gst-plugins-bad/commits/v0.10.23+ti which apparently contains code to do this in the parser.
*** Bug 709415 has been marked as a duplicate of this bug. ***
Can someone explain why the PTS should be set based on the DTS? I can't think of a case where DTS would be present for a frame but not PTS.
*** Bug 735628 has been marked as a duplicate of this bug. ***
(In reply to comment #12)
> Can someone explain why should the PTS be set based on DTS? I can't think of a
> case where in DTS will be present for a frame but not PTS

If you have a raw h264 stream, it is expected to be stored in decoding order, so the easiest way to generate timestamps is to set the first DTS to 0 and then increment it by the desired frame duration (from the framerate). This gives you a stream with only DTS. By figuring out the needed reordering, you should be able to derive the PTS (and vice versa). Luckily that direction will never hit issues with negative timestamps. Here's an interesting diagram that may help understanding. It shows the pts-dts shift that exists in order to allow B frames to have pts == dts. For the other direction, we might end up having to shift the PTS forward, in order to prevent the DTS from being negative.
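A minimal sketch of the DTS-only direction described above, assuming a constant framerate and nanosecond timestamps (both assumptions mine; this is not actual h264parse code):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: generate the DTS for frame n of a raw
 * (decode-ordered) H.264 stream, starting at 0 and incrementing by one
 * frame duration.  fps_n/fps_d is the framerate; result is in nanoseconds. */
static uint64_t
raw_stream_dts (unsigned n, unsigned fps_n, unsigned fps_d)
{
  /* frame duration = fps_d / fps_n seconds */
  return (uint64_t) n * 1000000000ull * fps_d / fps_n;
}
```

For a 25 fps stream this yields 0, 40 ms, 80 ms, ... in decode order; deriving the PTS then amounts to reordering these values into display order.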
(In reply to comment #14)
> Here's an interesting diagram that may help understanding.

Did you forget to include the diagram?
I would expect the behaviour to be this way:

Case 1: Container with PTS and DTS
  Incoming buffer has both PTS and DTS.
  Passthrough case.

Case 2: Container having only PTS
  Incoming buffer has only PTS.
  Derive DTS from PTS after parsing the H264 stream.

Case 3: Raw stream but with SEI
  Incoming buffer does not have any timestamp.
  DTS_START=0, and keep incrementing DTS based on the framerate obtained from the SEI. Based on picture type and reference frames, derive the PTS.

Case 4: Raw stream and no SEI
  Incoming buffer does not have any timestamp.
  Assume framerate=25, set DTS_START=0, and keep incrementing DTS. Based on picture type and reference frames, derive the PTS.
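The four cases above can be sketched as a simple dispatch. All names here (the struct, enum, and helper) are illustrative, not real h264parse API:

```c
#include <assert.h>

/* Minimal per-frame metadata for deciding which timestamp path to take. */
typedef struct {
  int has_pts, has_dts, has_sei_timing;
} FrameMeta;

typedef enum {
  TS_PASSTHROUGH,         /* Case 1: container supplied both PTS and DTS */
  TS_DERIVE_DTS,          /* Case 2: PTS only, derive DTS */
  TS_DERIVE_PTS_SEI,      /* Case 3: raw stream with SEI timing info */
  TS_DERIVE_PTS_DEFAULT   /* Case 4: raw stream, assume 25 fps */
} TsAction;

static TsAction
classify_timestamp_case (const FrameMeta *m)
{
  if (m->has_pts && m->has_dts)
    return TS_PASSTHROUGH;
  if (m->has_pts)
    return TS_DERIVE_DTS;
  if (m->has_sei_timing)
    return TS_DERIVE_PTS_SEI;
  return TS_DERIVE_PTS_DEFAULT;
}
```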
The number of cases is varied here since you wouldn't know which element is downstream of h264parse. It could be a decoder, or a muxer (if you are converting byte-stream to avc and vice versa). In general the next element might or might not need DTS and/or PTS. But h264parse needn't worry about where it sits, and should unconditionally calculate PTS from DTS, DTS from PTS, or both DTS and PTS from the SEI of the elementary stream.
I have also seen a bug where, if the PTS of two consecutive frames are the same, i.e. cur_pts = X and prev_pts = X, you end up getting cur_pts = -1. Note that cur_pts = prev_pts happens for lossy mpegts streams, since avdec_h264 sets cur_pts = prev_pts if cur_pts < prev_pts.
That reminds me of the last question I have about this: should we fix the timestamps if something is detected to be wrong? Obviously if both timestamps are present, we will keep them and avoid the extra latency required to do the derivation. Here's the missing graphic:

https://software.intel.com/sites/default/files/pts-dts_shift_explain.gif

The basic rule for H264 is that for B-frames, pts == dts. P frames (when in decode order) are moved after the following B-frames. This creates a gap at the start, hence the required initial shift. We should detect whether B-frames may be present or not; I think there is some buffer depth value that can tell us that. We also need to report our latency accordingly, as it won't be negligible in live pipelines. Anyone knows special details or tricks for this?
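A toy model of the shift described above, assuming a constant frame duration and DTS values laid out uniformly in decode order (dts of the n-th decoded frame = n * dur); the function name and model are mine:

```c
#include <assert.h>
#include <stdint.h>

/* With a reorder delay of `delay` frames, the PTS of a frame is its display
 * position plus the initial pts-dts shift, times the frame duration.  For an
 * IBBP stream (delay == 1, decode order I P B B, display order I B B P) this
 * makes pts == dts for the B frames, matching the diagram. */
static uint64_t
pts_from_display_index (unsigned display_idx, unsigned delay, uint64_t dur)
{
  return (uint64_t) (display_idx + delay) * dur;
}
```

For the IBBP example with dur = 40 ms: the I frame (display 0) gets pts 40, the B frames (display 1 and 2, decode positions 2 and 3) get pts 80 and 120, equal to their DTS, and the P frame (display 3) gets pts 160.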
(In reply to comment #19)
> We should detect if there is possible presence of B-Frames or not, I think
> there some buffer depth value that can tell use that.

It is possible to find out the POC value based on the h264 specification, which will give you the exact display order for each frame (this is a bit complicated if you follow the spec as it is). Then find out the pts based on this POC.
(In reply to comment #20)
> It is possible to find out the POC value based on h264 specification which will
> give you the exact display order for each frame.

Do you know if there is OSS code somewhere that implements this, or someone kind enough to document it? It's all a bit new to me. Would that mean we don't have to add latency? Also, how do we know the pts-dts shift from that?
(In reply to comment #21)
> Do you know if there is an OSS code somewhere that implement this, or someone
> kind enough to document it?

In my understanding, if there are buffering_period and picture_timing SEI messages in the stream, we can utilize the cpb_removal_delay and dpb_output_delay values to find out the dts/pts. But many streams don't include SEI, so we need to do some calculations: find out the POC based on h264 spec section 8.2.1 (this is a bit lengthy). Then some heuristics are possible, e.g. 1/fps * poc will give the pts within an idr period. We have the POC calculation in gstreamer-vaapi: https://gitorious.org/vaapi/gstreamer-vaapi/source/406aa37373e2b9917714eccd2834a45d18b61fd1:gst-libs/gst/vaapi/gstvaapidecoder_h264.c#L1914
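A rough sketch of the SEI-based direction, following the H.264 Annex C timing model: the clock tick is num_units_in_tick / time_scale seconds, cpb_removal_delay gives the (nominal) removal time that approximates DTS, and dpb_output_delay gives PTS relative to it. This ignores anchoring to the buffering_period SEI, and the function itself is mine, not real parser code:

```c
#include <assert.h>
#include <stdint.h>

/* Compute approximate DTS/PTS (in nanoseconds) from VUI timing fields and
 * picture timing SEI delays, per the Annex C relations
 *   dts ~= cpb_removal_delay * t_c
 *   pts ~= dts + dpb_output_delay * t_c
 * where t_c = num_units_in_tick / time_scale. */
static void
sei_timing_to_ts (uint32_t num_units_in_tick, uint32_t time_scale,
    uint32_t cpb_removal_delay, uint32_t dpb_output_delay,
    uint64_t *dts, uint64_t *pts)
{
  uint64_t tick_ns = (uint64_t) num_units_in_tick * 1000000000ull / time_scale;

  *dts = (uint64_t) cpb_removal_delay * tick_ns;
  *pts = *dts + (uint64_t) dpb_output_delay * tick_ns;
}
```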
It would be wrong to assume PTS == DTS for B frames. If a B frame is a reference frame for another B frame, this won't hold; it is only true for those B frames which are non-reference pictures. Essentially the encoder buffers N frames before it releases the H264 NAL for the first frame; this N would be num_b_frames + 1 in the worst case. Similarly, the decoder has to buffer at least that many frames (which is dictated by the LEVEL in the SPS) before it can release any frame. So h264parse should not worry about any buffering scheme here; it should be the decoder that worries about buffering. h264parse should take care of:
1. Inter-conversion of stream formats (byte-stream and avc)
2. Converting packetized data to frame data
3. Extracting certain key information from the headers so that caps and buffer params can be set properly
The H264 slice header has POC and frame_num fields. They are similar in nature but have some differences. As mentioned, the parser can either use what is present in the SEI or, in the worst case, use the POC. All this only if both DTS and PTS aren't present.
As I learn further, doing I/P/B analysis is just a bad idea, as it would not cope with missing frames. I'm not sure what is present in the SEI, but I can clearly see that the POC is a good way forward (even though that isn't completely trivial; I still need to look at the gst-vaapi code that sree mentions, which could save us a lot of time). My concern is whether we will actually get PTS/DTS with the correct offset or not. Current experience is that flv to youtube streaming with B-frames is very sensitive to this, so I would like us to get that right. Btw, raise your hand if you'd like to implement this; I'm clearly not the best person, and would do this only as a last resort.
The POC calculation has been on my codecanalyzer todo list for a while :) But I didn't get time to start with h264 support yet. Maybe we can add an API to the codecparser library to find out the poc, something like gst_h264_frame_get_display_num(). WDYT?
(In reply to comment #26)
> May be we can add an API to codecparser library to find out the poc, something
> like gst_h264_frame_get_display_num ()

I think it's a very good idea.
Okay, let me check whether it is possible to write a general API without providing a list of arguments! Otherwise we can stick with a subroutine in h264parser itself.
It seems to be pretty straightforward to copy the code from gstreamer-vaapi to h264parser :) Only some minor tweaks are needed. I will provide the patch unless Gwenole wants to do it himself, since the initial code was from him.
Gwenole, Ping?
Had a chat with Gwenole. I will provide the patch for POC calculation but not sure about a separate API.
Created attachment 285301 [details] [review]
videoparsers: h264: Add POC calculation

Decode the PictureOrderCount for each frame/field based on the h264 specification (sec 8.2.1). Original code from gstreamer-vaapi::/gst-libs/gst/vaapi/gstvaapidecoder_h264.c
I would like to get some review for this. There are slight changes here and there from gstreamer-vaapi, but it is almost the same code :). I am not updating TopFieldOrderCount/BottomFieldOrderCount in FIELD_POC[0]/FIELD_POC[1] for two reasons (I have a comment block for the same within the patch too):
1: This needs more changes, since we have to wait for the second field picture as gst-vaapi does.
2: The requirement is to derive the PTS, and for that the h264parse->POC field is sufficient, I guess.
Also the other way around, we need to calculate DTS if we only have PTS (e.g. with matroska).
Created attachment 285355 [details] [review]
videoparsers: h264: Add POC calculation

Added the bug report link to the commit log and some copyright message changes.
(In reply to comment #34)
> Also the other way around, we need to calculate DTS if we only have PTS (e.g.
> with matroska).

Isn't it "DTS(n) = DTS(n-1) + duration - shift"?

p.s. most likely the shift needs to be added to the PTS in gstreamer to avoid negative values
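One common way to realize this direction (a sketch of a general technique, not necessarily what h264parse will do): the DTS sequence is the PTS values sorted into ascending (display) order, shifted back by the reorder delay. This reproduces the "previous DTS plus duration minus shift" intuition, and also shows why the shift would have to be added to the PTS to keep DTS non-negative:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

static int
cmp_u64 (const void *a, const void *b)
{
  uint64_t x = *(const uint64_t *) a, y = *(const uint64_t *) b;
  return (x > y) - (x < y);
}

/* Derive DTS when only PTS is known (e.g. matroska input).  `pts` holds one
 * value per frame in decode order; `dts` receives the sorted PTS values
 * shifted back by `delay` frame durations.  With delay > 0 the first DTS
 * values go negative, hence the signed output type. */
static void
dts_from_pts (const uint64_t *pts, int64_t *dts, unsigned n,
    unsigned delay, uint64_t dur)
{
  uint64_t *sorted = malloc (n * sizeof (uint64_t));
  unsigned i;

  for (i = 0; i < n; i++)
    sorted[i] = pts[i];
  qsort (sorted, n, sizeof (uint64_t), cmp_u64);
  for (i = 0; i < n; i++)
    dts[i] = (int64_t) sorted[i] - (int64_t) (delay * dur);
  free (sorted);
}
```

For an IBBP group (decode-order PTS 0, 120, 40, 80; delay 1; dur 40) this gives DTS -40, 0, 40, 80: the B frames end up with pts == dts, consistent with the diagram earlier in this bug.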
I have noticed that gst_h264_parse_get_timestamp() is invoked with &GST_BUFFER_TIMESTAMP (buffer) as argument. I think it should be &GST_BUFFER_DTS (buffer), since we don't have the implementation to handle the PTS right now. ??
I have a similar issue with mpeg4 too, see Bug 736190 (I sent patches for both the demuxer and the muxer, ignoring that this should be done by the parser). The patch attached here seems to fix the h264 case only.
Are we considering the DTS of a frame to be "DTS_Of_Previous_Frame + Duration_Of_Previous_Frame" in GStreamer? If so, it won't be right for all cases in h264. Let's say there is an SEI Picture Timing message which indicates the frame duration as doubled (pic_struct == FrameDoubling); then prev_frame_dts + prev_duration = next_frame_dts won't work, because here the frame duration has nothing to do with decoding time, it only affects the PTS of the frame. This will complicate the PTS calculation too. I think it is impossible to find out the PTS of a frame within the parser element if pic_struct >= 3, which includes frame doubling, frame tripling etc., because we have no clue about the frame duration associated with other pictures having a poc less than the current frame's poc. But in all other cases we can still interpolate the PTS based on the POC.
I wrote some code to handle the pts/dts stuff: http://cgit.freedesktop.org/~sree/gst-plugins-bad/commit/?h=h264-pts

It still needs some careful review and testing; I only did some basic testing. PTS is generated based on the POC and frame_count, and as I mentioned before, it won't try to find the pts if the pic_struct value is >= 3. Does anyone have better interpolation methods?

Right now, if the stream is coming from a container like mp4 or matroska, the parser will be working in pass-through mode by default (in most cases, unless we try to change the stream-format or alignment) and it doesn't parse the slice headers at all, so the current code path for TS generation won't get invoked.
*** Bug 736190 has been marked as a duplicate of this bug. ***
I noticed that the timestamp calculation code depends on num_units_in_tick and time_scale, but what about files that don't have VUI, or don't have the framerate specified? It's not mandatory (or even guaranteed to be right). With encoders, the timestamp delta (PTS - DTS) seems to depend on the b-pyramid level, i.e. when b-pyramid is 0, DTS are one frame behind PTS; with b-pyramid 1, DTS are two frames behind PTS. I have also noticed certain muxers do that. Could that be done here?
Is there a workaround or fix for this yet?
gst_base_parse_set_infer_ts()/gst_base_parse_set_pts_interpolation() have been used with mixed success. Short term, one could just detect if there aren't any B-frames, as in that case you can copy the PTS to DTS and vice versa.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/47.