GNOME Bugzilla – Bug 796169
vaapih264dec: Sample video that renders fine with FFMPEG but not in GStreamer
Last modified: 2018-06-18 16:22:46 UTC
This sample file work fine with FFMPEG VAAPI code, but not in GStreamer. http://people.collabora.com/%7Enicolas/SliceHD.mpg
Now I've been tracking the DPB state (some of it) against the short/long_ref lists, which if I understood well is a bit of duplication, as the DPB is kind of supposed to be these two list. Anyway, tracing them shows that whenever the image starts being broken, the short/long ref lists diverged from the dpb, in fact it becomes larger then the DPB size. Something like this is a recurring pattern when it breaks: 0:00:28.952788127 31809 0x7f9580013590 INFO vaapi gstvaapidecoder_h264.c:759:dpb_dump: DPB State: 4/4 [ P|66752, P|66760, P|66748, B|66756, ] 0:00:28.952809735 31809 0x7f9580013590 INFO vaapi gstvaapidecoder_h264.c:780:dpb_dump: Refs State: 8/4 [ P|66752, P|66752, P|66760, P|66760, P|66748, P|66748, B|66756, B|66756, || ]
Ok, that's hard to read, maybe this is better: DPP: 4/4 [ P|66752, P|66760, P|66748, B|66756, ] Refs: 8/4 [ P|66752, P|66752, P|66760, P|66760, P|66748, P|66748, B|66756, B|66756, || ]
Also, notice the duplication. Later on, we still have the duplication, but then some frames seems lost: DPB State: 4/4 [ P|66752, P|66760, B|66756, P|66770, ] Refs State: 6/4 [ P|66752, P|66752, P|66760, P|66760, B|66756, B|66756, || ] P|66770 is gone missing.
Ok, I have switch to tracing PicNum, as it seems to make more sense. Then notice this: DPB State: 4/4 [ B|31, P|32, B|33, I|0, ] Refs State: 3/4 [ B|31, P|32, B|33, || ] -> fill_picture_first_field_gap_done() DPB State: 4/4 [ B|62, P|64, B|66, I|68, ] Refs State: 8/4 [ B|62, B|63, P|64, P|65, B|66, B|67, I|68, I|69, || ] So the Refs having less frames actually make sense, some frame came in, but we don't have all of it yet (so refs having one less frame reflect this). In this case, it seems we get a lonely I frame field, and then it goes wild.
Created attachment 372405 [details] [review] hack to avoid ghost field insertion It is a quick hack for testing.
Interesting, that indeed hide the issue I suppose. I'm been testing here with JM reference decoder on the side. I've implement the equivalent dump_dpb() function in gstreamer-vaapi. First thing I notice, is that JM thinks the DPB size should be 5, while gstreamer-vaapi thinks it's 4.
I think there is still a field inversion happening (unless its a bug in vpp deinterlacer), as we have 1 flash back on the first scene change. At this step, JM output an interlaced frames with the top field on the first scene, and bottom field on the second scene.
For the DPB, if I round up in get_max_dec_frame_buffering(), it fixes it, not sure who's right here.
(but it has no effect on the bug though)
Even with the hack, the video still has a bit of shaking. right? There are few field-encoded frames (poc 2/3, poc 66/67 etc) where top and bottom fields have different scenes and, even the field parity order is changing. I guess the core issue is related to the handling of these frames.
(In reply to Nicolas Dufresne (ndufresne) from comment #8) > For the DPB, if I round up in get_max_dec_frame_buffering(), it fixes it, > not sure who's right here. dpb size: https://bugzilla.gnome.org/show_bug.cgi?id=762509#c10
(In reply to Nicolas Dufresne (ndufresne) from comment #7) > I think there is still a field inversion happening (unless its a bug in vpp > deinterlacer), as we have 1 flash back on the first scene change. At this > step, JM output an interlaced frames with the top field on the first scene, > and bottom field on the second scene. Yup, you are on track :) https://bugzilla.gnome.org/show_bug.cgi?id=796169#c10
Ok, sorry, the DPB size aspect was just noise, JM default config increase by one the size, I commented that out, and decoding is still clean. (In reply to sreerenj from comment #10) > Even with the hack, the video still has a bit of shaking. right? Right, note that most of the shaking seems to come from the deinterlacer (same result from VPP and deinterlace really). I'm not too concern about that. Great, just need a root cause now ...
Created attachment 372409 [details] [review] h264dec: Properly set sentinel in ref frame list This ensure that we always have sentinels set in the reference pictures arrays. The code wasn't unsafe, this simply improve the tracing, so instead of printing 32 lines of zeros, va tracer prints proper empty lists.
Not strictly related, but while debugging, I notice that the LIBVA_TRACE output was messed up, this fixes it so that it traces empty lists properly.
(In reply to Nicolas Dufresne (ndufresne) from comment #13) > Ok, sorry, the DPB size aspect was just noise, JM default config increase by > one the size, I commented that out, and decoding is still clean. > > (In reply to sreerenj from comment #10) > > Even with the hack, the video still has a bit of shaking. right? > > Right, note that most of the shaking seems to come from the deinterlacer > (same result from VPP and deinterlace really). I'm not too concern about > that. > > Great, just need a root cause now ... I think I know the root cause. It is a bug in interpreting the "missing first field", due to the wrong assumption based on previous frame field order.
Created attachment 372412 [details] This is a dump of the GStreamer DPB
Created attachment 372413 [details] And this is the same dump, but with JM reference decoder
Created attachment 372414 [details] [review] HACK: Implement JM reference decoder DPB Dump
Current code tries to identify missing first field based on the assumption that we keep the same field sequence, that is, if previous frames were in top-field-first (TFF) order, then so are subsequent frames. Unfortunately H264 spec doesn't have a TFF syntax element and only field parity is communicating through slice_hdr syntax elements. So gstreamer-vaapi took an assumption that within the same SPS activated sequence always have a same first-field parity. See this bug report: https://bugzilla.gnome.org/show_bug.cgi?id=745048 In this stream, 49th frame has a FRAME which has TFF parity order. 50 th frame (two separate fields with two slice headers) has a parity change, Bottom field came first. But since gstreaamer-vaapi took the assumption of TFF in a sequence of frames activated by SPS, it guess the first field is missing and tries to add ghost field and eventually everything get messed-up. I'm not sure how to fix this scenario. One option could be to remove the assumption we took in #745048 may cause issues with other streams. The second option is to cache 2 fields (all its internal state structures) before making the assumption "missing-first-field", unfortunately implementing this requires significant effort!
I'm trying to understand what to do. What I see from FFMPEG code (assuming I'm reading it properly), whenever it starts a new frame, it will first go over the DPB to find all frame with unpaired fields. When one is found, there is two venue, pair it with the new frame, otherwise it's a lost field, and it must be concealed. In the software decoder, they copy the previous field of the same parity, I don't see any concealment on the VAAPI side there. Is the software FFMPEG decoder method what you had in mind when you say "cache 2 fields" ?
One thing to notice is that h264parse will update the caps 13 times, and there is 13 parity changes in that stream. Each time, the framerate goes from 30000/1001 -> 20000/1001, that means SPS (or selected SPS) have changed. So the SPS driven TFF stuff is probably badly broken as it does not seem to catch these SPS changes. Maybe it would only work if the SPS change is followed by an IDR ?
(In reply to Nicolas Dufresne (ndufresne) from comment #21) > I'm trying to understand what to do. What I see from FFMPEG code (assuming > I'm reading it properly), whenever it starts a new frame, it will first go > over the DPB to find all frame with unpaired fields. When one is found, > there is two venue, pair it with the new frame, otherwise it's a lost field, > and it must be concealed. In the software decoder, they copy the previous > field of the same parity, I don't see any concealment on the VAAPI side > there. > > Is the software FFMPEG decoder method what you had in mind when you say > "cache 2 fields" ? I was thinking to wait for the next field too before making an assumption. But I afraid that this could break the current design of gstreamer-vaapi. Again, it won't work well if there is two immediate field-parity order change in consecutive frames. What we need is some heuristics to find whether the field was missing or not. I mean, how to distinguish field-parity change from "missing the first field" scenario. As you said checking in dpb for the unpaired field could be an option to try.
(In reply to Nicolas Dufresne (ndufresne) from comment #22) > One thing to notice is that h264parse will update the caps 13 times, and > there is 13 parity changes in that stream. Each time, the framerate goes > from 30000/1001 -> 20000/1001, that means SPS (or selected SPS) have > changed. So the SPS driven TFF stuff is probably badly broken as it does not > seem to catch these SPS changes. Maybe it would only work if the SPS change > is followed by an IDR ? SEI Buffering period header can also activate the SPS. May it is missing there? I have seen a lot of SEI headers in the stream, didn't get time to analyze it.
(In reply to sreerenj from comment #24) > SEI Buffering period header can also activate the SPS. May it is missing > there? > I have seen a lot of SEI headers in the stream, didn't get time to analyze > it. Ok, to be looked at. (In reply to sreerenj from comment #23) > (In reply to Nicolas Dufresne (ndufresne) from comment #21) > > I'm trying to understand what to do. What I see from FFMPEG code (assuming > > I'm reading it properly), whenever it starts a new frame, it will first go > > over the DPB to find all frame with unpaired fields. When one is found, > > there is two venue, pair it with the new frame, otherwise it's a lost field, > > and it must be concealed. In the software decoder, they copy the previous > > field of the same parity, I don't see any concealment on the VAAPI side > > there. > > > > Is the software FFMPEG decoder method what you had in mind when you say > > "cache 2 fields" ? > > > I was thinking to wait for the next field too before making an assumption. > But I afraid that this could break the current design of gstreamer-vaapi. > Again, it won't work well if there is two immediate field-parity order > change in consecutive frames. > > What we need is some heuristics to find whether the field was missing or > not. I mean, how to distinguish field-parity change from "missing the first > field" scenario. > > As you said checking in dpb for the unpaired field could be an option to try. What's the semantic for gst_vaapi_decoder_h264_start_frame / gst_vaapi_decoder_h264_end_frame ? Is that really frames ? or it's in hidden pictures/fields ?
I think the logic in find_first_field() is completly flawed. Nothing really holds there. We simply match field with different parity, or fill if same. We should look at the frame_num. If the frame_num is different then current pic, previous pic is a frame and only has one field, it's pretty clear a field was lost. Though, I'd need an explanation of what is the design here. Why can't we just detect missing field like this, and when recurse to process the actual field that triggered that detection ?
(In reply to Nicolas Dufresne (ndufresne) from comment #25) > (In reply to sreerenj from comment #24) > > SEI Buffering period header can also activate the SPS. May it is missing > > there? > > I have seen a lot of SEI headers in the stream, didn't get time to analyze > > it. > > Ok, to be looked at. > > (In reply to sreerenj from comment #23) > > (In reply to Nicolas Dufresne (ndufresne) from comment #21) > > > I'm trying to understand what to do. What I see from FFMPEG code (assuming > > > I'm reading it properly), whenever it starts a new frame, it will first go > > > over the DPB to find all frame with unpaired fields. When one is found, > > > there is two venue, pair it with the new frame, otherwise it's a lost field, > > > and it must be concealed. In the software decoder, they copy the previous > > > field of the same parity, I don't see any concealment on the VAAPI side > > > there. > > > > > > Is the software FFMPEG decoder method what you had in mind when you say > > > "cache 2 fields" ? > > > > > > I was thinking to wait for the next field too before making an assumption. > > But I afraid that this could break the current design of gstreamer-vaapi. > > Again, it won't work well if there is two immediate field-parity order > > change in consecutive frames. > > > > What we need is some heuristics to find whether the field was missing or > > not. I mean, how to distinguish field-parity change from "missing the first > > field" scenario. > > > > As you said checking in dpb for the unpaired field could be an option to try. > > What's the semantic for gst_vaapi_decoder_h264_start_frame / > gst_vaapi_decoder_h264_end_frame ? Is that really frames ? or it's in hidden > pictures/fields ? The boundary checking is based on 7.4.1.2.4. The parsing code detects the frame boundary. Start_frame() get invoked with first data unit(slice) of a field or frame. end_frame() get invoked once we have a frame boundary. Here the frame boundary can be "only a field" in field encoded frames. For eg: in the above stream, all the frames till the 50 th frame are field-packed-frames and only one start_frame invocation for both fields. For the 50 th frame, there are two separate fields, and each one get one start_frame invocation.
(In reply to Nicolas Dufresne (ndufresne) from comment #26) > I think the logic in find_first_field() is completly flawed. No, I disagree :) It was the only way to deal with "first-field" missing case. > Nothing really > holds there. We simply match field with different parity, or fill if same. > We should look at the frame_num. > > If the frame_num is different then current pic, previous pic is a frame and > only has one field, it's pretty clear a field was lost. Though, I'd need an > explanation of what is the design here. Why can't we just detect missing > field like this, and when recurse to process the anctual field that triggered > that detection ? Here I don't understand what you are trying to say, sorry. If you interpreted the frame-num as a monotonically increasing value, then it is wrong :) Usually if (previous_frame_is_ref) next_frame_frame_num_increments I P B P P B B 0 1 2 2 3 4 4
It's flawed, because it does not adhere to the spec. No other decoder I've read seems to do that. It's also flawed because if you have two frames, placed like this t0 b0 t1 b1 and loose b0 and t1, it will gently create one mixed frame t0 b1. The frame num check would solve this (JM does check this). But then we need late concealment, as this decode call will produce two pictures.
I also need to clarify about the frame_num, I'm referring to what goes into the DPB, each frame_num is reference once in the DPB. When we have an unpaired field, it's frame_num should match the interlaced frame currently being decoded. If not, it's a new frame, and the incomplete frame need to be filled and output (it's already in the DPB at this stage). It's all confusing, because frame_start() is sometimes a field and sometimes a frame.
(In reply to Nicolas Dufresne (ndufresne) from comment #29) > It's flawed, because it does not adhere to the spec. No other decoder I've > read seems to do that. It's also flawed because if you have two frames, > placed like this t0 b0 t1 b1 and loose b0 and t1, it will gently create one > mixed frame t0 b1. It loses fields only if the order is "t0 b0 t1 b1" and no dpb reset after b0. copy&pasting the commit message: "Try to identify missing first fields too, thus disregarding any intermediate gaps in frames. We also assume that we keep the same field sequence, i.e. if previous frames were in top-field-first (TFF) order, then so are subsequent frames. Note that insertion of dummy first fields need to operate in two steps: (i) create the original first field that the current field will inherit from, and (ii) submit that field into the DPB prior to initializing the current (other) field POC values but after any reference flag was set. i.e. copy reference flags from the child (other field) to the parent (first field). " > > The frame num check would solve this (JM does check this). But then we need > late concealment, as this decode call will produce two pictures. Of course, Checking dpb for missing fields seems to be a viable option for me. A patch might be more tricky. Waiting for the fix :)
Ok, I wonder how many time I'll need to read this, this code is hard to follow. So reading through find_first_field() correct, before you hack: 0. Get the previous frame (for current view) 1. If the previous frame is missing a field, and current slice is progressive (!slice_hdr->field_pic_flag), fill the gap in previous frame and add it to the DPB. 2. If the previous frame is a frame, and the first field of the previous frame is different parity then this "new frame" slice parity, assume there was a missing picture. 3. If the slice if from the same frame_num as previous frame, then take the previous frame as current to complete it, unless the parity is the same (looks like this case is a bitstream error though), in which case, just finish previous frame and start a new one. 4. I don't fully understand this one, but basically we pick the first field of previous frame, fill_picture_other_field_gap() which adds it to the DPB. That is very odd. If the previous frame first field parity is different then current slice parity, assume there was a missing first field. So basically two 2 implements what I had in mind already. And that's why it mostly work with your hack. But case 4 seems to mess-up a little bit. I'm very surprised we just assume field was missing, if some bitstream is passed twice, it would endup overriding pictures ... Is that normal thing to do ?
Created attachment 372483 [details] [review] h264dec: Remove false assumption about parity order The decoder was trying to detect earlier that a field was lost base on guessing the parity order. This breaks in streams were the parity order changes.
That would be my change, it does not remove gap filling, it simply happens later. I've done couple of test with packet lost also, it seems to survive, and there is not surprising artefact. In 2015 bug report, Gwenole mention doing regression testing, anything public ?
(In reply to Nicolas Dufresne (ndufresne) from comment #32) > Ok, I wonder how many time I'll need to read this, this code is hard to > follow. So reading through find_first_field() correct, before you hack: > > 0. Get the previous frame (for current view) > > 1. If the previous frame is missing a field, and current slice is > progressive (!slice_hdr->field_pic_flag), fill the gap in previous frame and > add it to the DPB. > > 2. If the previous frame is a frame, and the first field of the previous > frame is different parity then this "new frame" slice parity, assume there > was a missing picture. > > 3. If the slice if from the same frame_num as previous frame, then take the > previous frame as current to complete it, unless the parity is the same > (looks like this case is a bitstream error though), in which case, just > finish previous frame and start a new one. > > 4. I don't fully understand this one, but basically we pick the first field > of previous frame, fill_picture_other_field_gap() which adds it to the DPB. > That is very odd. At this point, 1) both previous and current frames are interlaced 2) frame-num value changed. Then it indicates that the previous frame has a missing field Frame-num shouldn't be changing between two fields of a frame (that is my understanding of the spec). >If the previous frame first field parity is different then > current slice parity, assume there was a missing first field. > > So basically two 2 implements what I had in mind already. And that's why it > mostly work with your hack. But case 4 seems to mess-up a little bit. I'm > very surprised we just assume field was missing, if some bitstream is passed > twice, it would endup overriding pictures ... Is that normal thing to do ?
(In reply to Nicolas Dufresne (ndufresne) from comment #34) > That would be my change, it does not remove gap filling, it simply happens > later. Nice! First I thought you just removed first-field-missing case which of course can be done if needed since this is based on an assumption not per spec. But as you said, the filling can happen later. >I've done couple of test with packet lost also, it seems to survive, > and there is not surprising artefact. In 2015 bug report, Gwenole mention > doing regression testing, anything public ? Nothing public. I can ask Intel QA to run one round test with this patch. But it might not reveal the real issues. We may need to ask the Vecima guy (who filed 745048) to run some tests :)
(In reply to sreerenj from comment #36) > Nice! First I thought you just removed first-field-missing case which of > course can be done if needed since this is based on an assumption not per > spec. > But as you said, the filling can happen later. Yes and this time base on the assumption that two fields are within the same frame_num, which I believe is according to the spec this time. > Nothing public. I can ask Intel QA to run one round test with this patch. > But it might not reveal the real issues. We may need to ask the Vecima guy > (who filed 745048) to run some tests :) Yep, they are on it. They will also see if they can make few samples public.
(In reply to Nicolas Dufresne (ndufresne) from comment #37) > (In reply to sreerenj from comment #36) > > Nice! First I thought you just removed first-field-missing case which of > > course can be done if needed since this is based on an assumption not per > > spec. > > But as you said, the filling can happen later. > > Yes and this time base on the assumption that two fields are within the same > frame_num, which I believe is according to the spec this time. > > > Nothing public. I can ask Intel QA to run one round test with this patch. > > But it might not reveal the real issues. We have received Green signal from Intel QA. > >We may need to ask the Vecima guy > > (who filed 745048) to run some tests :) > > Yep, they are on it. They will also see if they can make few samples public.
That's great ! Let me ping Vecima again, it also passed first test stage on their HW. On top of that, I have went over all the samples from FATE (ffmpeg suite), there was no new failures.
Comment on attachment 372405 [details] [review] hack to avoid ghost field insertion marking patch for maintenance.
Comment on attachment 372409 [details] [review] h264dec: Properly set sentinel in ref frame list I would merge this, even when it adds processing.
Comment on attachment 372414 [details] [review] HACK: Implement JM reference decoder DPB Dump mark as reviewed just for patch maintenance.
Review of attachment 372483 [details] [review]: I would add in the commit log that this basically reverts commit 8dd93e9c8, and removes the fill_gaps parameter ::: gst-libs/gst/vaapi/gstvaapidecoder_h264.c @@ -4031,3 @@ static GstVaapiPictureH264 * -find_first_field (GstVaapiDecoderH264 * decoder, GstVaapiParserInfoH264 * pi, - gboolean fill_gaps) I wonder
Review of attachment 372405 [details] [review]: mark as reject thus git bz won't apply it
Review of attachment 372414 [details] [review]: mark as rejected thus git bz won't apply it
Review of attachment 372483 [details] [review]: I would add in the commit log that this basically reverts commit 8dd93e9c8 and removes the fill_gap parameter in find_first_field() since it is always true.
Note that it does not fully revert it, since this commit adds 1 filling based on prediction, and two filling based on observation. I only kept the one base on observation. Though, this is only couple of lines of code left, so most of the patch is reverted. I'll update before pushing.
Attachment 372409 [details] pushed as 076af84 - h264dec: Properly set sentinel in ref frame list Attachment 372483 [details] pushed as 72ea2a5 - h264dec: Remove false assumption about parity order
I don't have the priviledge to add new versions for this project, would it be possible to add 1.15.1 and tag this bug as fixed in that version please ?
Hrm? This is just another GStreamer component. What privileges are you missing? Is there anything needed other than setting the target milestone to 1.15.1 ?
Oops, distraction, I click on the version drop down instead of the target.