GNOME Bugzilla – Bug 796863
vaapih264dec: H264 parse stops if missing an SPS/PPS
Last modified: 2018-08-22 01:40:41 UTC
I'm facing an issue where we lost the SPS and/or PPS from the stream. In this case, the vaapih264dec (and also h264parse) will be stuck and will not progress in the stream. Also after receiving a valid SPS/PPS. Another note of this stream is that the picture id is incrementing each SPS/PPS. I was trying to discard the NALU until we receive a new SPS/PPS and the stream can resume decoding but I don't have a good idea about how to do that and if it should be the parser or vaapidecode to discard any data. Media available here: https://drive.google.com/file/d/1kU5UD_ksCa0IPIb2eagWfPPcNwY84MH2/view?usp=sharing I use this command line GST_DEBUG=codec*:5,vaapi:5 gst-launch-1.0 filesrc location=stream1.pcap ! pcapparse dst-port=6002 ! "application/x-rtp, payload=127, media=video, clock-rate=90000, encoding-name=H264" ! rtph264depay ! vaapidecodebin ! vaapipostproc ! waylandsink In the uploaded stream I removed the SPS/PPS for the picture set id = 5 but after a couple of seconds, the decoder will receive picture set id 6 and 7. Best Matteo See also: https://bugzilla.gnome.org/show_bug.cgi?id=796832
Just a little more context, there is no concealment in VAAPI, we were passing NULL frames to the codec until recently, which would randomly fails or harm the accelerator. But now, in the current scenario, the decoder will fail for over 10 repeat, and as this decoder uses GST_VIDEO_DECODER_ERROR() error, it will stop the pipeline on GST_VIDEO_DECODER_MAX_ERRORS (or whatever is configured). I consider this API in GstVideoDecode harmful for live pipeline, which could take several hundred frames to recover.
I' using GStreamer 1.14.0 and I don't see what you describe. I see that the slice_parse will continue to try to parse the same NALU each time and it does indefinitely. I suggest skipping the NALU that doesn't have the SPS/PPS until a new set SPS/PPS is available and next NALU are referring to this set. It will be useful to raise a custom event (like GstUDPPacketLoss) that inform the pipeline of this error. In this case, the application can request a new I-frame and try to recover from this situation.
(In reply to Matteo Valdina from comment #2) > I' using GStreamer 1.14.0 and I don't see what you describe. > > I see that the slice_parse will continue to try to parse the same NALU each > time and it does indefinitely. Sure, this was fixed in master only, see bug #796832. But after this fix, there will be over 10 decoder errors, and the decoder will give up, which is bad for live pipeline. That's what I want to address here, we should ditch this bad API and just try forever. > > I suggest skipping the NALU that doesn't have the SPS/PPS until a new set > SPS/PPS is available and next NALU are referring to this set. We can't, because in some application, movement is better then correct images. For this purpose, FFMPEG, which has a very high end error concealment, manages to produce frames in the first part, creating movement on a gray background. So if we add this filtering to h264parse, this behaviour won't be selectable (see output-corrupt knob on avdec_h264). > > It will be useful to raise a custom event (like GstUDPPacketLoss) that > inform the pipeline of this error. In this case, the application can request > a new I-frame and try to recover from this situation. That's all implemented already. But it's decoder driven, the VAAPI decoder is not there yet. When the decoder detects corrupted frames, it sends a custom event upstream, which get wrapped into the RTCP feedback (if enabled). So that the sender can send a new IDR (with headers if not out-of-band). This is common in RTP, since we often only produce keyframes when requested.
> We can't, because in some application, movement is better then correct > images. For this purpose, FFMPEG, which has a very high end error > concealment, manages to produce frames in the first part, creating movement > on a gray background. So if we add this filtering to h264parse, this > behaviour won't be selectable (see output-corrupt knob on avdec_h264). > From my understanding of H.264, in the case of missing SPS/PPS you cannot decode any frame that depends by these SPS/PPS. There will be missing crucial information for the decoding process. I compared the behavior with avdec_h264 and this element skip the frame. 0:00:21.894119003 11063 0x7fea2c193c40 ERROR libav :0:: non-existing PPS 5 referenced 0:00:21.894476865 11063 0x7fea2c193c40 ERROR libav :0:: decode_slice_header error 0:00:21.894522779 11063 0x7fea2c193c40 ERROR libav :0:: no frame! 0:00:21.894711233 11063 0x558c54836a30 WARN libav gstavviddec.c:1747:gst_ffmpegviddec_frame:<avdec_h264-0> avdec_h264: decoding error (len: -1094995529, have_data: 0) 0:00:21.895379627 11063 0x7fea0016c590 ERROR libav :0:: non-existing PPS 5 referenced 0:00:21.895600328 11063 0x7fea0016c590 ERROR libav :0:: decode_slice_header error 0:00:21.895649853 11063 0x7fea0016c590 ERROR libav :0:: no frame! 0:00:21.895722664 11063 0x558c54836a30 WARN libav gstavviddec.c:1747:gst_ffmpegviddec_frame:<avdec_h264-0> avdec_h264: decoding error (len: -1094995529, have_data: 0) 0:00:21.896326457 11063 0x7fea10042d20 ERROR libav :0:: non-existing PPS 5 referenced 0:00:21.896486225 11063 0x7fea10042d20 ERROR libav :0:: decode_slice_header error 0:00:21.896527583 11063 0x7fea10042d20 ERROR libav :0:: no frame! 0:00:21.896584050 11063 0x558c54836a30 WARN libav gstavviddec.c:1747:gst_ffmpegviddec_frame:<avdec_h264-0> avdec_h264: decoding error (len: -1094995529, have_data: 0) ...
Sorry, but the PPS is not missing in that stream, only the IDR. 0:00:00.230633151 27417 0x56223729a230 DEBUG codecparsers_h264 gsth264parser.c:242:gst_h264_parse_nalu_header: Nal type 7, ref_idc 3 0:00:00.230686956 27417 0x56223729a230 DEBUG codecparsers_h264 gsth264parser.c:1749:gst_h264_parse_sps: parsing SPS 0:00:00.230716363 27417 0x56223729a230 DEBUG codecparsers_h264 gsth264parser.c:416:gst_h264_parse_vui_parameters: parsing "VUI Parameters" 0:00:00.230730428 27417 0x56223729a230 LOG codecparsers_h264 gsth264parser.c:1596:gst_h264_parse_sps_data: initial width=640, height=368 0:00:00.230743207 27417 0x56223729a230 LOG codecparsers_h264 gsth264parser.c:1621:gst_h264_parse_sps_data: crop_rectangle x=0 y=0 width=640, height=360 0:00:00.230752786 27417 0x56223729a230 DEBUG codecparsers_h264 gsth264parser.c:1475:gst_h264_parser_parse_sps: adding sequence parameter set with id: 0 to array 0:00:00.230767807 27417 0x56223729a230 DEBUG codecparsers_h264 gsth264parser.c:242:gst_h264_parse_nalu_header: Nal type 8, ref_idc 3 0:00:00.230776550 27417 0x56223729a230 DEBUG codecparsers_h264 gsth264parser.c:1885:gst_h264_parse_pps: parsing PPS 0:00:00.230787916 27417 0x56223729a230 DEBUG codecparsers_h264 gsth264parser.c:2017:gst_h264_parser_parse_pps: adding picture parameter set with id: 0 to array 0:00:00.231037342 27417 0x56223729a230 DEBUG codecparsers_h264 gsth264parser.c:242:gst_h264_parse_nalu_header: Nal type 1, ref_idc 3 0:00:00.231060572 27417 0x56223729a230 DEBUG codecparsers_h264 gsth264parser.c:2079:gst_h264_parser_parse_slice_hdr: parsing "Slice header", slice type 0
At the beginning is missing the IDR but at 21 seconds of the stream (or RTP sequence number 37141 is missing), It's missing the SPS/PPS.
I think there is a little misunderstanding due to my test are on the 1.14.0 and your test I assume on the current master. Sorry for not clarifying which version I was using. I checkout the latest master and I the behavior is drastically different due to the https://bugzilla.gnome.org/show_bug.cgi?id=796832 . With master, I'm not able to start the decoding when with the release 1.14.0 I'm able to decode until the 21th seconds. My issue is after the 21 seconds where it is only missing the packet with SPS/PPS.
Sure, I have posted plausible fix for the regression in master, which brings back to the same situation. Now, an SPS is missing at 21s (no idea why image breaks at 13 for me, but it's later). What do you expect should happen next that is not happening ?
(just a side note, GstForceKeyUnit is implemented by RTP payloader/depayloader, so it's decoder agnostic)
Thanks for the note about GstForceKeyUnit, I'll give a look. What I saw that the decode image freeze and the report these logs line couldn't find associated picture parameter set with id: 5 parse error -1 ... ... couldn't find associated picture parameter set with id: 5 parse error -1 Adding some logs to the gsth264parse and gstvaapidecoder I found that it continues to decode the same NALU until the stream ended codecparsers_h264 gsth264parser.c:2073:gst_h264_parser_parse_slice_hdr: Parsing NALU 0x7f60c009f6f5 8576 This error condition continue beside that the stream contains new SPS/PPS and new IDR refering these SPS/PPS. Thanks
I observed that If I replace the vaapih264dec with h264parse ! fakesink. The h264 parse is not stuck in the nalu but discard the NALU where is missing the SPS/PPS and resume correctly with the newer SPS/PPS.
Yeah, this specific issue seems like a missing jump on the offset in order to skip the broken part.
Thanks, Could you provide any suggestion where/how to fix this.
I tried couple of things, but didn't figure out how to get that one working yet.
Created attachment 373162 [details] [review] WIP: vaapidecode: Skip unparsable units from adapter If the unit could not be parsed, just skip this nal and keep parsing what is left in the adapter. We need to flush the broken unit in the decoder specific parser because the generic code does not know about units boundary. This increases error resilliance. Before this, the broken unit would stay in the adapter and EOS would be returned. Which stopped the streaming. Just removing the EOS would have lead to the adapter size growing indefinitely.
Might be your lucky day, this is exactly what I had in mind. This patch is WIP because we need to visit every single decoder to do the same, because it's not possible to know the boundary of the unit (NAL) from the outside.
Note, it would be nice to try and see why we reach this condition twice per broken unit: codecparsers_h264 gsth264parser.c:2086:gst_h264_parser_parse_slice_hdr: couldn't find associated picture parameter set with id: 5 Normally that would mean broken parsing, but slice parsing is only happening in VAAPI code, not in h264parse, so this isn't normal. If we ignore this, you'll see that we hit this condition 9 times, exactly like FFMPEG.
I'll give a try today and I'll give a look to the double "couldn't find associated .." Thanks a lot for the patch :-) I appreciated that.
Sorry, we do call gst_h264_parser_parse_slice_hdr() from h264parse element, so the double trace is caused by the known double parsing. A different subject/issue.
Created attachment 373164 [details] [review] vaapidecode: Skip unparsable units from adapter If the unit could not be parsed, just skip this nal and keep parsing what is left in the adapter. We need to flush the broken unit in the decoder specific parser because the generic code does not know about units boundary. This increases error resilliance. Before this, the broken unit would stay in the adapter and EOS would be returned. Which stopped the streaming. Just removing the EOS would have lead to the adapter size growing indefinitely.
Created attachment 373165 [details] [review] WIP: decoder_h264: Maybe me leaking a picture This is just a place holder commit, until I figure-out what we are supposed to do. The picture is not set as last picture, and it's also no unref.
I went over the other decoder and only h265 had the same issue in the parser. I've added this second commit, but it's just to underline a problem, I have no idea what was the intention, should the skip picture be set as prev_pi or unreffed, in which case setting flags make no sense. But I could be missing something.
Thanks, I test in my real scenario. It is working but I need to put an h264parse before the decoder otherwise it continues to fail to find the SPS/PPS. This time fails to find the latest SPS/PPS before was always the same. I probably need to do more testing and better understanding what is happening. (I tested backporting the patch to 1.14.0) Thanks for the patch.
Works for me on master branch without h264parse.
Because it started to be confusing, I have patches from multiple bugs, you can access by work branch over here: https://gitlab.collabora.com/nicolas/gstreamer-vaapi/tree/unrecoverrable-error
Thanks for sharing the branch.
Comment on attachment 373165 [details] [review] WIP: decoder_h264: Maybe me leaking a picture The ref has been given to the unit, so there is no leak here.
Attachment 373164 [details] pushed as 06aa82f - vaapidecode: Skip unparsable units from adapter
Thanks, I extended your original patch with a request of GstForceKey (upstream) in case of parse and decode errors. I think this is reasonable and I saw other decoders (openh264) do the same. Do you agree?
Here the idea: diff --git a/gst/vaapi/gstvaapidecode.c b/gst/vaapi/gstvaapidecode.c index d11b8032..53d08a6b 100644 --- a/gst/vaapi/gstvaapidecode.c +++ b/gst/vaapi/gstvaapidecode.c @@ -764,6 +764,13 @@ error_decode: ("Decode error %d", status), ret); break; } + + if (ret != GST_FLOW_NOT_SUPPORTED) { + GST_DEBUG ("Requesting a key unit"); + gst_pad_push_event (GST_VIDEO_DECODER_SINK_PAD (decode), + gst_video_event_new_upstream_force_key_unit (GST_CLOCK_TIME_NONE, + FALSE, 0)); + } gst_video_decoder_drop_frame (vdec, frame); return ret; } @@ -1176,8 +1183,13 @@ gst_vaapidecode_parse_frame (GstVideoDecoder * vdec, break; default: GST_ERROR ("parse error %d", status); - ret = GST_FLOW_EOS; + /* just keep parsing, the decoder should have flushed the broken unit */ + ret = GST_VAAPI_DECODE_FLOW_PARSE_DATA; decode->current_frame_size = 0; + GST_DEBUG ("Requesting a key unit"); + gst_pad_push_event (GST_VIDEO_DECODER_SINK_PAD (decode), + gst_video_event_new_upstream_force_key_unit (GST_CLOCK_TIME_NONE, + FALSE, 0)); break; } return ret;
Looks reasonable, can you make a git commit and git format-patch, and then attach ?
yes, for sure.
Created attachment 373248 [details] [review] Requests GstForceKeyUnit to the upstream in case of parse/decode error. Requests GstForceKeyUnit to the upstream in case of parse/decode error. In the case of decode error, there is a change that affects other cases. The "GstFlowret value was never initialized in some condition and so it was undefined. I saw by experience that it was 0 (GST_FLOW_OK) but this is a random behavior and tied to the specific compiler/cpu/kernel and other factors. I added a default value that it makes sense for my understanding of this codebase. But I'm probably wrong. At least I think it is better because it is a defined behavior. I saw that there are other functions that have this issue like gst_vaapidecode_push_all_decoded_frames where "ret" is returned in the default switch with the un-initialized value.
@Matteo, could file another bug with your patch? So we could keep track of it. Also, this affects all the codecs, not only h264
Yes, no problem. You can find the new issue here https://bugzilla.gnome.org/show_bug.cgi?id=797006 Thanks Matteo