GNOME Bugzilla – Bug 796832
vaapih264dec: Does not always recover from data lost
Last modified: 2018-08-01 23:49:51 UTC
I have hit some case were we have data lost and corruption which would lead to a slice decode error (slices from two frames got mixed up I think). the problem is that the decoder just never recover from this bad state. In the situation, the DPB get's propulated with a bad frame and I believe this causes the following decodes to fail. And then we are stuck in that state forever. When this case was reached, we'd first get the error: "found no short-term reference picture with PicNum ..." I notice that in this case, we send to the accelerator a broken (or even empty) reference list. This seems like a bad idea, it also puts a lot of pressure on the accelerator to do the right thing. So the following patch simply chain up that error and fails the affected slice decoding. And that allows the decoder to recover from this error. I'll try and extract a sample to reproduce this soon. The stream is interlaced.
Created attachment 373088 [details] [review] h264decoder: Don't scan empty buffer gst_adapter_masked_scan_uint32_peek() asserts if size is 0. Don't try and scan in that case. This fixes assertion that would some times happen when the stream is corrupted.
Created attachment 373089 [details] [review] h264decoder: Fail decoding slice if modification process failed This patch chains up failure to executing the modification process. The end result is that we now fail decoding the slice if this process fails. This avoid sending a corrupted state to the accelerator. In some special cases, this could lead to unrecoverable errors.
Review of attachment 373088 [details] [review]: ::: gst-libs/gst/vaapi/gstvaapidecoder_h264.c @@ +4482,3 @@ if (priv->stream_alignment == GST_VAAPI_STREAM_ALIGN_H264_NALU) { buf_size = size; + if (size > 4) I guess the patch would be simpler if, above, in line 4479 we change - if (size < 4) + if (size =< 4) return GST_VAAPI_DECODER_STATUS_ERROR_NO_DATA which works for both stream alignments
Review of attachment 373089 [details] [review]: lgtm
Ok, I'll try and simplify the first patch. Thanks for the review.
The suggested change does not work.
Attachment 373088 [details] pushed as 927536b - h264decoder: Don't scan empty buffer Attachment 373089 [details] pushed as 7c365bd - h264decoder: Fail decoding slice if modification process failed
I've just added a size == 0 check in the scan wrapper method, it covers both alignments, both could send 0 to the adapter and hit the assert, though -1 in this case was the expected offset.
(In reply to Víctor Manuel Jáquez Leal from comment #3) > Review of attachment 373088 [details] [review] [review]: > > ::: gst-libs/gst/vaapi/gstvaapidecoder_h264.c > @@ +4482,3 @@ > if (priv->stream_alignment == GST_VAAPI_STREAM_ALIGN_H264_NALU) { > buf_size = size; > + if (size > 4) > > I guess the patch would be simpler if, above, in line 4479 we change > > - if (size < 4) > + if (size =< 4) > return GST_VAAPI_DECODER_STATUS_ERROR_NO_DATA > Correction: It is legal to have nal units of size 4, eg: end of stream and end of sequece nals. > which works for both stream alignments
Hi, I'm facing a similar issue where we lost the SPS and/or PPS from the stream. In this case, the vaapih264dec (and also h264parse) will be stuck in an endless loop for complaining of the missing SPS/PPS. Another note of this stream is that the picture id is incrementing each SPS/PPS. I was trying to discard the NALU until we receive a new SPS/PPS and the stream can resume decoding but I don't have a good idea about how to do that and if it should be the parser or vaapidecode to discard any data. Do you need a new bug report or I can report in this one? Media available here: https://drive.google.com/file/d/1kU5UD_ksCa0IPIb2eagWfPPcNwY84MH2/view?usp=sharing I use this command line GST_DEBUG=codec*:5,vaapi:5 gst-launch-1.0 filesrc location=stream1.pcap ! pcapparse dst-port=6002 ! "application/x-rtp, payload=127, media=video, clock-rate=90000, encoding-name=H264" ! rtph264depay ! vaapidecodebin ! vaapipostproc ! waylandsink In the uploaded stream I removed the SPS/PPS for the picture set id = 5 but after a couple of seconds, the decoder will receive picture set id 6 and 7. Best Matteo
Can you file this in it's own bug please ? Btw, you should add a jitterbuffer when you test with pcap. I can see the it runs but spit an error instead of broken frames.
Btw, if I replace all GST_VIDEO_DECODER_ERROR() by ret = GST_FLOW_OK, then it skips to the first header + IDR. The vaapi decoder does not implement error concealment at all, so it seems like the best it could do atm. I'm not fan of GST_VIDEO_DECODER_ERROR(), specially in live cases, would need to look it up.
Thanks, I'll give atry to your suggestion. I reported this bug https://bugzilla.gnome.org/show_bug.cgi?id=796863
So this change did cause a regression, so re-opening.
Created attachment 373148 [details] [review] vaapidecoder: Don't error out on decode errors This is problematic on live pipeline where loosing network can cause an important amount of errors. This is a plausible solution, comments are welcome !
Review of attachment 373148 [details] [review]: ::: gst/vaapi/gstvaapidecode.c @@ +1089,3 @@ + /* Disable errors on decode errors */ + gst_video_decoder_set_max_errors (vdec, -1); I don't know. It doesn't feel right. What options do we have? 1. replacing those wrong GST_VIDEO_DECODER_ERROR() with other way to handling the error 2. disable the errors only if upstream is live
Another option, could be bubble up this configuration in the element property? As a first step, before a more robust GST_VIDEO_DECODER_ERROR handling. Detecting if a stream is live or not could be confusing. For example, if I have a network stream that contains some error. I could have two different behavior if I pass the stream through an udpsrc (rtpbin) or filesrc (pcapparse). If you want to save the stream from the network and make further analysis/test offline you could be able to do that. Because in the second case the decoder will give up when for the first will continue the decoding.
Right now we get these error because we no longer send NULL pointers to the accelerator to avoid undefined behaviour. That's the lack of concealment, which would require a reliable way to clone a close-by image or to create grey initialized images in case there is none (like this case). That being said, this is a lot of work, and I won't adventure that way. I've suggested this way, because I do think it's important to trace these errors, but I see no use of sending a pipeline error (even when not live). All it will do is make a bad situation worst. So I would says if this isn't the way, then we should go for 1 and drop these macro, just like avdec.
Attachment 373148 [details] pushed as d07bffb - vaapidecoder: Don't error out on decode errors