GNOME Bugzilla – Bug 659489
h264parse: Calculate PTS from DTS (and vice-versa)
Last modified: 2018-11-03 13:09:06 UTC
# GST_DEBUG=*:2 gst-launch filesrc location=bluecherry_test.raw ! h264parse ! mp4mux ! filesink location=test.mp4
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
0:00:00.020372638 31723 0x250b600 WARN codecparsers_h264 gsth264parser.c:1697:gst_h264_parser_parse_slice_hdr: couldn't find associated picture parameter set with id: 0
0:00:00.020507003 31723 0x250b600 WARN qtmux gstqtmux.c:3217:gst_qt_mux_video_sink_set_caps:<mp4mux0> pad video_00 refused caps video/x-h264, parsed=(boolean)true, stream-format=(string)avc, alignment=(string)au
0:00:00.020542600 31723 0x250b600 WARN baseparse gstbaseparse.c:2738:gst_base_parse_loop:<h264parse0> error: streaming stopped, reason not-negotiated
ERROR: from element /GstPipeline:pipeline0/GstH264Parse:h264parse0: GStreamer encountered a general stream error.
Additional debug info:
gstbaseparse.c(2738): gst_base_parse_loop (): /GstPipeline:pipeline0/GstH264Parse:h264parse0: streaming stopped, reason not-negotiated
ERROR: pipeline doesn't want to preroll.
Setting pipeline to NULL ...
Freeing pipeline ...

# gst-typefind bluecherry_test.raw
bluecherry_test.raw - video/x-h264, stream-format=(string)byte-stream

Full debug here: http://itstar.co.uk/gst-h264parse-mp4mux-debug.log
Test file here: http://itstar.co.uk/bluecherry_test.raw
mp4mux is likely rejecting the caps because width/height/framerate aren't present on them.
It seems h264parse should wait for the SPS to properly put width/height/framerate on the caps. Setting a dependency on the baseparse bug for this.
*** Bug 649583 has been marked as a duplicate of this bug. ***
Created attachment 215854 [details]
H.264 Stream Captured from Logitech C920 Webcam

This H.264 stream cannot be parsed successfully for width, height and framerate.
(In reply to comment #4)
> Created an attachment (id=215854) [details]
> H.264 Stream Captured from Logitech C920 Webcam
>
> This H.264 stream cannot be parsed successfully for width, height and
> framerate.

gst-launch filesrc location=/media.mov ! h264parse ! qtmux ! filesink location=/test.mov
Looks like h264parse now (git master) extracts all the details plus codec data properly (and waits until it has them all). Now qtmux fails like this: "DTS method failed to re-order timestamps". h264parse seems to only put DTS on the buffers it pushes towards qtmux.
foo.h264 ! h264parse ! .. only puts DTS and no PTS on buffers
foo.mkv ! demux ! h264parse ! ... only puts PTS and no DTS on buffers
Making this more generic. h264parse should be able to figure out the PTS from the DTS and vice-versa (based on frame number, reordering,....).
*** Bug 696300 has been marked as a duplicate of this bug. ***
> *** Bug 696300 has been marked as a duplicate of this bug. ***

That bug was marked as a blocker since it described a regression. It also has some more useful information, such as a pointer to https://gitorious.org/gstreamer-omap/gst-plugins-bad/commits/v0.10.23+ti which apparently contains code to do this in the parser.
*** Bug 709415 has been marked as a duplicate of this bug. ***
Can someone explain why the PTS should be set based on the DTS? I can't think of a case where DTS would be present for a frame but not PTS.
*** Bug 735628 has been marked as a duplicate of this bug. ***
(In reply to comment #12)
> Can someone explain why should the PTS be set based on DTS? I can't think of a
> case where in DTS will be present for a frame but not PTS

If you have a raw h264 stream, it is expected to be stored in decoding order, so the easiest way to generate timestamps is to set the first DTS to 0 and then increment it by the desired frame duration (from the framerate). This gives you a stream with only DTS. By figuring out the needed reordering, you should be able to derive the PTS (and vice versa). Luckily that direction will never hit issues with negative timestamps. Here's an interesting diagram that may help understanding. It shows the pts-dts shift that exists in order to allow B frames to have pts == dts. For the other direction, we might end up having to shift the PTS forward, in order to prevent the DTS from being negative.
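A minimal sketch of the DTS-only direction described above, assuming a constant framerate and nanosecond timestamps (both assumptions mine; this is not actual h264parse code):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: generate the DTS for frame n of a raw
 * (decode-ordered) H.264 stream, starting at 0 and incrementing by one
 * frame duration.  fps_n/fps_d is the framerate; result is in nanoseconds. */
static uint64_t
raw_stream_dts (unsigned n, unsigned fps_n, unsigned fps_d)
{
  /* frame duration = fps_d / fps_n seconds */
  return (uint64_t) n * 1000000000ull * fps_d / fps_n;
}
```

For a 25 fps stream this yields 0, 40 ms, 80 ms, ... in decode order; deriving the PTS then amounts to reordering these values into display order.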
(In reply to comment #14)
> Here's an interesting diagram that may help understanding.

Did you forget to include the diagram?
I would expect the behaviour to be this way:

Case 1: Container with PTS and DTS
  Incoming buffer has both PTS and DTS.
  Passthrough case.

Case 2: Container having only PTS
  Incoming buffer has only PTS.
  Derive DTS from PTS after parsing the H264 stream.

Case 3: Raw stream but with SEI
  Incoming buffer does not have any timestamp.
  DTS_START=0, and keep incrementing DTS based on the framerate obtained from the SEI. Based on picture type and reference frames, derive the PTS.

Case 4: Raw stream and no SEI
  Incoming buffer does not have any timestamp.
  Assume framerate=25, set DTS_START=0, and keep incrementing DTS. Based on picture type and reference frames, derive the PTS.
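The four cases above can be sketched as a simple dispatch. All names here (the struct, enum, and helper) are illustrative, not real h264parse API:

```c
#include <assert.h>

/* Minimal per-frame metadata for deciding which timestamp path to take. */
typedef struct {
  int has_pts, has_dts, has_sei_timing;
} FrameMeta;

typedef enum {
  TS_PASSTHROUGH,         /* Case 1: container supplied both PTS and DTS */
  TS_DERIVE_DTS,          /* Case 2: PTS only, derive DTS */
  TS_DERIVE_PTS_SEI,      /* Case 3: raw stream with SEI timing info */
  TS_DERIVE_PTS_DEFAULT   /* Case 4: raw stream, assume 25 fps */
} TsAction;

static TsAction
classify_timestamp_case (const FrameMeta *m)
{
  if (m->has_pts && m->has_dts)
    return TS_PASSTHROUGH;
  if (m->has_pts)
    return TS_DERIVE_DTS;
  if (m->has_sei_timing)
    return TS_DERIVE_PTS_SEI;
  return TS_DERIVE_PTS_DEFAULT;
}
```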
The number of cases is varied here since you wouldn't know which element is downstream of h264parse. It could be a decoder, or a muxer (if you are converting byte-stream to avc and vice versa). In general the next element might or might not need DTS and/or PTS. But h264parse needn't worry about where it sits, and should unconditionally calculate PTS from DTS, DTS from PTS, or both DTS and PTS from the SEI of the elementary stream.
I have also seen a bug where, if the PTS of two consecutive frames are the same, i.e. cur_pts = X and prev_pts = X, you end up getting cur_pts = -1. Note that cur_pts = prev_pts happens for lossy mpegts streams, since avdec_h264 sets cur_pts = prev_pts if cur_pts < prev_pts.
That reminds me of the last question I have about this: should we fix the timestamps if something is detected to be wrong? Obviously if both timestamps are present, we will keep them and avoid the extra latency required to do the derivation. Here's the missing graphic:

https://software.intel.com/sites/default/files/pts-dts_shift_explain.gif

The basic rule for H264 is that for B-frames, pts == dts. P frames (when in decode order) are moved after the following B-frames. This creates a gap at the start, hence the required initial shift. We should detect whether B-frames may be present or not; I think there is some buffer depth value that can tell us that. We also need to report our latency accordingly, as it won't be negligible in live pipelines. Anyone knows special details or tricks for this?
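A toy model of the shift described above, assuming a constant frame duration and DTS values laid out uniformly in decode order (dts of the n-th decoded frame = n * dur); the function name and model are mine:

```c
#include <assert.h>
#include <stdint.h>

/* With a reorder delay of `delay` frames, the PTS of a frame is its display
 * position plus the initial pts-dts shift, times the frame duration.  For an
 * IBBP stream (delay == 1, decode order I P B B, display order I B B P) this
 * makes pts == dts for the B frames, matching the diagram. */
static uint64_t
pts_from_display_index (unsigned display_idx, unsigned delay, uint64_t dur)
{
  return (uint64_t) (display_idx + delay) * dur;
}
```

For the IBBP example with dur = 40 ms: the I frame (display 0) gets pts 40, the B frames (display 1 and 2, decode positions 2 and 3) get pts 80 and 120, equal to their DTS, and the P frame (display 3) gets pts 160.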
(In reply to comment #19)
> We should detect if there is possible presence of B-Frames or not, I think
> there some buffer depth value that can tell use that.

It is possible to find out the POC value based on the h264 specification, which will give you the exact display order for each frame (this is a bit complicated if you follow the spec as it is). Then find out the pts based on this POC.
(In reply to comment #20)
> It is possible to find out the POC value based on h264 specification which will
> give you the exact display order for each frame.

Do you know if there is OSS code somewhere that implements this, or someone kind enough to document it? It's all a bit new to me. Would that mean we don't have to add latency? Also, how do we know the pts-dts shift from that?
(In reply to comment #21)
> Do you know if there is an OSS code somewhere that implement this, or someone
> kind enough to document it?

In my understanding, if there are buffering_period and picture_timing SEI messages in the stream, we can utilize the cpb_removal_delay and dpb_output_delay values to find out the dts/pts. But many streams don't include SEI, so we need to do some calculations: find out the POC based on h264 spec section 8.2.1 (this is a bit lengthy). Then some heuristics are possible, e.g. 1/fps * poc will give the pts within an idr period. We have the POC calculation in gstreamer-vaapi: https://gitorious.org/vaapi/gstreamer-vaapi/source/406aa37373e2b9917714eccd2834a45d18b61fd1:gst-libs/gst/vaapi/gstvaapidecoder_h264.c#L1914
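A rough sketch of the SEI-based direction, following the H.264 Annex C timing model: the clock tick is num_units_in_tick / time_scale seconds, cpb_removal_delay gives the (nominal) removal time that approximates DTS, and dpb_output_delay gives PTS relative to it. This ignores anchoring to the buffering_period SEI, and the function itself is mine, not real parser code:

```c
#include <assert.h>
#include <stdint.h>

/* Compute approximate DTS/PTS (in nanoseconds) from VUI timing fields and
 * picture timing SEI delays, per the Annex C relations
 *   dts ~= cpb_removal_delay * t_c
 *   pts ~= dts + dpb_output_delay * t_c
 * where t_c = num_units_in_tick / time_scale. */
static void
sei_timing_to_ts (uint32_t num_units_in_tick, uint32_t time_scale,
    uint32_t cpb_removal_delay, uint32_t dpb_output_delay,
    uint64_t *dts, uint64_t *pts)
{
  uint64_t tick_ns = (uint64_t) num_units_in_tick * 1000000000ull / time_scale;

  *dts = (uint64_t) cpb_removal_delay * tick_ns;
  *pts = *dts + (uint64_t) dpb_output_delay * tick_ns;
}
```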
It would be wrong to assume PTS == DTS for B frames. If a B frame is a reference frame for another B frame, this won't hold; it is only true for those B frames which are non-reference pictures. Essentially the encoder buffers N frames before it releases the H264 NAL for the first frame; this N would be num_b_frames + 1 in the worst case. Similarly, the decoder has to buffer at least that many frames (which is dictated by the LEVEL in the SPS) before it can release any frame. So h264parse should not worry about any buffering scheme here; it should be the decoder that worries about buffering. h264parse should take care of:
1. Inter-conversion of stream formats (byte-stream and avc)
2. Converting packetized data to frame data
3. Extracting certain key information from the headers so that caps and buffer params can be set properly
The H264 slice header has POC and frame_num fields. They are similar in nature but have some differences. As mentioned, the parser can either use what is present in the SEI or, in the worst case, use the POC. All this only if both DTS and PTS aren't present.
As I learn further, doing I/P/B analysis is just a bad idea, as it would not cope with missing frames. I'm not sure what is present in the SEI, but I can clearly see that the POC is a good way forward (even though that isn't completely trivial; I still need to look at the gst-vaapi code that sree mentions, which could save us a lot of time). My concern is whether we will actually get PTS/DTS with the correct offset or not. Current experience is that flv to youtube streaming with B-frames is very sensitive to this, so I would like us to get that right. Btw, raise your hand if you'd like to implement this; I'm clearly not the best person, and would do this only as a last resort.
The POC calculation has been on my codecanalyzer todo list for a while :) But I didn't get time to start with h264 support yet. Maybe we can add an API to the codecparser library to find out the poc, something like gst_h264_frame_get_display_num(). WDYT?
(In reply to comment #26)
> May be we can add an API to codecparser library to find out the poc, something
> like gst_h264_frame_get_display_num ()

I think it's a very good idea.
Okay, let me check whether it is possible to write a general API without providing a list of arguments! Otherwise we can stick with a subroutine in h264parser itself.
It seems to be pretty straightforward to copy the code from gstreamer-vaapi to h264parser :) Only some minor tweaks are needed. I will provide the patch unless Gwenole wants to do it himself, since the initial code was from him.
Gwenole, Ping?
Had a chat with Gwenole. I will provide the patch for POC calculation but not sure about a separate API.
Created attachment 285301 [details] [review]
videoparsers: h264: Add POC calculation

Decode the PictureOrderCount for each frame/field based on the h264 specification (sec 8.2.1). Original code from gstreamer-vaapi::/gst-libs/gst/vaapi/gstvaapidecoder_h264.c
I would like to get some review for this. There are slight changes here and there from gstreamer-vaapi, but it is almost the same code :). I am not updating TopFieldOrderCount/BottomFieldOrderCount in FIELD_POC[0]/FIELD_POC[1] for two reasons (I have a comment block for the same within the patch too):
1: This needs more changes, since we have to wait for the second field picture as gst-vaapi does.
2: The requirement is to derive the PTS, and for that the h264parse->POC field is sufficient, I guess.
Also the other way around, we need to calculate DTS if we only have PTS (e.g. with matroska).
Created attachment 285355 [details] [review]
videoparsers: h264: Add POC calculation

Added the bug report link to the commit log and some copyright message changes.
(In reply to comment #34)
> Also the other way around, we need to calculate DTS if we only have PTS (e.g.
> with matroska).

Isn't it "DTS(n) = DTS(n-1) + duration - shift"?

p.s. most likely the shift needs to be added to the PTS in gstreamer to avoid negative values
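One common way to realize this direction (a sketch of a general technique, not necessarily what h264parse will do): the DTS sequence is the PTS values sorted into ascending (display) order, shifted back by the reorder delay. This reproduces the "previous DTS plus duration minus shift" intuition, and also shows why the shift would have to be added to the PTS to keep DTS non-negative:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

static int
cmp_u64 (const void *a, const void *b)
{
  uint64_t x = *(const uint64_t *) a, y = *(const uint64_t *) b;
  return (x > y) - (x < y);
}

/* Derive DTS when only PTS is known (e.g. matroska input).  `pts` holds one
 * value per frame in decode order; `dts` receives the sorted PTS values
 * shifted back by `delay` frame durations.  With delay > 0 the first DTS
 * values go negative, hence the signed output type. */
static void
dts_from_pts (const uint64_t *pts, int64_t *dts, unsigned n,
    unsigned delay, uint64_t dur)
{
  uint64_t *sorted = malloc (n * sizeof (uint64_t));
  unsigned i;

  for (i = 0; i < n; i++)
    sorted[i] = pts[i];
  qsort (sorted, n, sizeof (uint64_t), cmp_u64);
  for (i = 0; i < n; i++)
    dts[i] = (int64_t) sorted[i] - (int64_t) (delay * dur);
  free (sorted);
}
```

For an IBBP group (decode-order PTS 0, 120, 40, 80; delay 1; dur 40) this gives DTS -40, 0, 40, 80: the B frames end up with pts == dts, consistent with the diagram earlier in this bug.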
I have noticed that gst_h264_parse_get_timestamp() is invoked with &GST_BUFFER_TIMESTAMP (buffer) as argument. I think it should be &GST_BUFFER_DTS (buffer), since we don't have the implementation to handle the PTS right now. ??
I have a similar issue with mpeg4 too, see Bug 736190 (I sent patches for both the demuxer and the muxer, ignoring that this should be done by the parser). The patch attached here seems to fix the h264 case only.
Are we considering the DTS of a frame to be "DTS_Of_Previous_Frame + Duration_Of_Previous_Frame" in GStreamer? If so, it won't be right for all cases in h264. Let's say there is an SEI Picture Timing message which indicates the frame duration as doubled (pic_struct == FrameDoubling); then prev_frame_dts + prev_duration = next_frame_dts won't work, because here the frame duration has nothing to do with decoding time, it only affects the PTS of the frame. This will complicate the PTS calculation too. I think it is impossible to find out the PTS of a frame within the parser element if pic_struct >= 3, which includes frame doubling, frame tripling etc., because we have no clue about the frame duration associated with other pictures having a poc less than the current frame's poc. But in all other cases we can still interpolate the PTS based on the POC.
I wrote some code to handle the pts/dts stuff: http://cgit.freedesktop.org/~sree/gst-plugins-bad/commit/?h=h264-pts

It still needs some careful review and testing; I only did some basic testing. PTS is generated based on the POC and frame_count, and as I mentioned before, it won't try to find the pts if the pic_struct value is >= 3. Does anyone have better interpolation methods?

Right now, if the stream is coming from a container like mp4 or matroska, the parser will be working in pass-through mode by default (in most cases, unless we try to change the stream-format or alignment) and it doesn't parse the slice headers at all, so the current code path for TS generation won't get invoked.
*** Bug 736190 has been marked as a duplicate of this bug. ***
I noticed that the timestamp calculation code depends on num_units_in_tick and time_scale, but what about files that don't have VUI, or don't have the framerate specified? It's not mandatory (or even guaranteed to be right). With encoders, the timestamp delta (PTS - DTS) seems to depend on the b-pyramid level, i.e. when b-pyramid is 0, DTS are one frame behind PTS; with b-pyramid 1, DTS are two frames behind PTS. I have also noticed certain muxers do that. Could that be done here?
Is there a workaround or fix for this yet?
gst_base_parse_set_infer_ts()/gst_base_parse_set_pts_interpolation() have been used with mixed success. Short term, one could just detect if there aren't any B-frames, as in that case you can copy the PTS to DTS and vice versa.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/47.