After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 796832 - vaapih264dec: Does not always recover from data lost
vaapih264dec: Does not always recover from data lost
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gstreamer-vaapi
unspecified
Other All
: Normal normal
: 1.15.1
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2018-07-18 17:22 UTC by Nicolas Dufresne (ndufresne)
Modified: 2018-08-01 23:49 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
h264decoder: Don't scan empty buffer (1.18 KB, patch)
2018-07-18 17:22 UTC, Nicolas Dufresne (ndufresne)
committed Details | Review
h264decoder: Fail decoding slice if modification process failed (4.46 KB, patch)
2018-07-18 17:22 UTC, Nicolas Dufresne (ndufresne)
committed Details | Review
vaapidecoder: Don't error out on decode errors (875 bytes, patch)
2018-07-24 16:49 UTC, Nicolas Dufresne (ndufresne)
committed Details | Review

Description Nicolas Dufresne (ndufresne) 2018-07-18 17:22:30 UTC
I have hit some case were we have data lost and corruption which would
lead to a slice decode error (slices from two frames got mixed up I think).
the problem is that the decoder just never recover from this bad state. In
the situation, the DPB get's propulated with a bad frame and I believe this
causes the following decodes to fail. And then we are stuck in that state
forever.

When this case was reached, we'd first get the error:
  "found no short-term reference picture with PicNum ..."

I notice that in this case, we send to the accelerator a broken (or even
empty) reference list. This seems like a bad idea, it also puts a lot of
pressure on the accelerator to do the right thing. So the following patch
simply chain up that error and fails the affected slice decoding. And that
allows the decoder to recover from this error. I'll try and extract a sample
to reproduce this soon. The stream is interlaced.
Comment 1 Nicolas Dufresne (ndufresne) 2018-07-18 17:22:35 UTC
Created attachment 373088 [details] [review]
h264decoder: Don't scan empty buffer

gst_adapter_masked_scan_uint32_peek() asserts if size is 0. Don't
try and scan in that case. This fixes assertion that would some times
happen when the stream is corrupted.
Comment 2 Nicolas Dufresne (ndufresne) 2018-07-18 17:22:39 UTC
Created attachment 373089 [details] [review]
h264decoder: Fail decoding slice if modification process failed

This patch chains up failure to executing the modification process. The
end result is that we now fail decoding the slice if this process fails.
This avoid sending a corrupted state to the accelerator. In some special
cases, this could lead to unrecoverable errors.
Comment 3 Víctor Manuel Jáquez Leal 2018-07-19 09:24:10 UTC
Review of attachment 373088 [details] [review]:

::: gst-libs/gst/vaapi/gstvaapidecoder_h264.c
@@ +4482,3 @@
     if (priv->stream_alignment == GST_VAAPI_STREAM_ALIGN_H264_NALU) {
       buf_size = size;
+      if (size > 4)

I guess the patch would be simpler if, above, in line 4479 we change

-    if (size < 4)
+    if (size =< 4) 
       return GST_VAAPI_DECODER_STATUS_ERROR_NO_DATA

which works for both stream alignments
Comment 4 Víctor Manuel Jáquez Leal 2018-07-19 09:26:46 UTC
Review of attachment 373089 [details] [review]:

lgtm
Comment 5 Nicolas Dufresne (ndufresne) 2018-07-19 12:45:51 UTC
Ok, I'll try and simplify the first patch. Thanks for the review.
Comment 6 Nicolas Dufresne (ndufresne) 2018-07-19 19:31:42 UTC
The suggested change does not work.
Comment 7 Nicolas Dufresne (ndufresne) 2018-07-19 20:50:21 UTC
Attachment 373088 [details] pushed as 927536b - h264decoder: Don't scan empty buffer
Attachment 373089 [details] pushed as 7c365bd - h264decoder: Fail decoding slice if modification process failed
Comment 8 Nicolas Dufresne (ndufresne) 2018-07-19 20:53:38 UTC
I've just added a size == 0 check in the scan wrapper method, it covers both alignments, both could send 0 to the adapter and hit the assert, though -1 in this case was the expected offset.
Comment 9 sreerenj 2018-07-20 00:58:38 UTC
(In reply to Víctor Manuel Jáquez Leal from comment #3)
> Review of attachment 373088 [details] [review] [review]:
> 
> ::: gst-libs/gst/vaapi/gstvaapidecoder_h264.c
> @@ +4482,3 @@
>      if (priv->stream_alignment == GST_VAAPI_STREAM_ALIGN_H264_NALU) {
>        buf_size = size;
> +      if (size > 4)
> 
> I guess the patch would be simpler if, above, in line 4479 we change
> 
> -    if (size < 4)
> +    if (size =< 4) 
>        return GST_VAAPI_DECODER_STATUS_ERROR_NO_DATA
> 

Correction: It is legal to have nal units of size 4, eg: end of stream and end of sequece nals.

> which works for both stream alignments
Comment 10 Matteo Valdina 2018-07-24 03:58:09 UTC
Hi,
I'm facing a similar issue where we lost the SPS and/or PPS from the stream. 

In this case, the vaapih264dec (and also h264parse) will be stuck in an endless loop for complaining of the missing SPS/PPS.

Another note of this stream is that the picture id is incrementing each SPS/PPS.

I was trying to discard the NALU until we receive a new SPS/PPS and the stream can resume decoding but I don't have a good idea about how to do that and if it should be the parser or vaapidecode to discard any data.

Do you need a new bug report or I can report in this one?

Media available here: https://drive.google.com/file/d/1kU5UD_ksCa0IPIb2eagWfPPcNwY84MH2/view?usp=sharing

I use this command line
GST_DEBUG=codec*:5,vaapi:5 gst-launch-1.0 filesrc location=stream1.pcap ! pcapparse dst-port=6002 !  "application/x-rtp, payload=127, media=video, clock-rate=90000, encoding-name=H264" ! rtph264depay ! vaapidecodebin ! vaapipostproc ! waylandsink

In the uploaded stream I removed the SPS/PPS for the picture set id = 5 but after a couple of seconds, the decoder will receive picture set id 6 and 7.

Best
Matteo
Comment 11 Nicolas Dufresne (ndufresne) 2018-07-24 11:17:03 UTC
Can you file this in it's own bug please ? Btw, you should add a jitterbuffer when you test with pcap. I can see the it runs but spit an error instead of broken frames.
Comment 12 Nicolas Dufresne (ndufresne) 2018-07-24 11:44:58 UTC
Btw, if I replace all GST_VIDEO_DECODER_ERROR() by ret = GST_FLOW_OK, then it skips to the first header + IDR. The vaapi decoder does not implement error concealment at all, so it seems like the best it could do atm. I'm not fan of GST_VIDEO_DECODER_ERROR(), specially in live cases, would need to look it up.
Comment 13 Matteo Valdina 2018-07-24 13:09:26 UTC
Thanks, I'll give atry to your suggestion.

I reported this bug https://bugzilla.gnome.org/show_bug.cgi?id=796863
Comment 14 Nicolas Dufresne (ndufresne) 2018-07-24 16:35:24 UTC
So this change did cause a regression, so re-opening.
Comment 15 Nicolas Dufresne (ndufresne) 2018-07-24 16:49:43 UTC
Created attachment 373148 [details] [review]
vaapidecoder: Don't error out on decode errors

This is problematic on live pipeline where loosing network can
cause an important amount of errors.

This is a plausible solution, comments are welcome !
Comment 16 Víctor Manuel Jáquez Leal 2018-07-25 10:58:07 UTC
Review of attachment 373148 [details] [review]:

::: gst/vaapi/gstvaapidecode.c
@@ +1089,3 @@
 
+  /* Disable errors on decode errors */
+  gst_video_decoder_set_max_errors (vdec, -1);

I don't know. It doesn't feel right.

What options do we have?

1. replacing those wrong GST_VIDEO_DECODER_ERROR() with other way to handling the error
2. disable the errors only if upstream is live
Comment 17 Matteo Valdina 2018-07-25 12:21:25 UTC
Another option, could be bubble up this configuration in the element property? 

As a first step, before a more robust GST_VIDEO_DECODER_ERROR handling.

Detecting if a stream is live or not could be confusing.

For example, if I have a network stream that contains some error.
I could have two different behavior if I pass the stream through an udpsrc (rtpbin) or filesrc (pcapparse).

If you want to save the stream from the network and make further analysis/test offline you could be able to do that. Because in the second case the decoder will give up when for the first will continue the decoding.
Comment 18 Nicolas Dufresne (ndufresne) 2018-07-25 12:50:05 UTC
Right now we get these error because we no longer send NULL pointers to the accelerator to avoid undefined behaviour. That's the lack of concealment, which would require a reliable way to clone a close-by image or to create grey initialized images in case there is none (like this case). That being said, this is a lot of work, and I won't adventure that way.

I've suggested this way, because I do think it's important to trace these errors, but I see no use of sending a pipeline error (even when not live). All it will do is make a bad situation worst.

So I would says if this isn't the way, then we should go for 1 and drop these macro, just like avdec.
Comment 19 Nicolas Dufresne (ndufresne) 2018-08-01 23:49:47 UTC
Attachment 373148 [details] pushed as d07bffb - vaapidecoder: Don't error out on decode errors