After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 796863 - vaapih264dec: H264 parse stops if missing an SPS/PPS
vaapih264dec: H264 parse stops if missing an SPS/PPS
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gstreamer-vaapi
1.14.0
Other Linux
: Normal normal
: 1.15.1
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2018-07-24 13:07 UTC by Matteo Valdina
Modified: 2018-08-22 01:40 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
WIP: vaapidecode: Skip unparsable units from adapter (1.82 KB, patch)
2018-07-25 18:00 UTC, Nicolas Dufresne (ndufresne)
none Details | Review
vaapidecode: Skip unparsable units from adapter (2.33 KB, patch)
2018-07-25 19:55 UTC, Nicolas Dufresne (ndufresne)
committed Details | Review
WIP: decoder_h264: Maybe me leaking a picture (1.11 KB, patch)
2018-07-25 19:55 UTC, Nicolas Dufresne (ndufresne)
rejected Details | Review
Requests GstForceKeyUnit to the upstream in case of parse/decode error. (1.83 KB, patch)
2018-08-03 00:44 UTC, Matteo Valdina
none Details | Review

Description Matteo Valdina 2018-07-24 13:07:59 UTC
I'm facing an issue where we lost the SPS and/or PPS from the stream. 

In this case, the vaapih264dec (and also h264parse) will be stuck and will not progress in the stream. Also after receiving a valid SPS/PPS.
Another note of this stream is that the picture id is incrementing each SPS/PPS.

I was trying to discard the NALU until we receive a new SPS/PPS and the stream can resume decoding but I don't have a good idea about how to do that and if it should be the parser or vaapidecode to discard any data.

Media available here: https://drive.google.com/file/d/1kU5UD_ksCa0IPIb2eagWfPPcNwY84MH2/view?usp=sharing

I use this command line
GST_DEBUG=codec*:5,vaapi:5 gst-launch-1.0 filesrc location=stream1.pcap ! pcapparse dst-port=6002 !  "application/x-rtp, payload=127, media=video, clock-rate=90000, encoding-name=H264" ! rtph264depay ! vaapidecodebin ! vaapipostproc ! waylandsink

In the uploaded stream I removed the SPS/PPS for the picture set id = 5 but after a couple of seconds, the decoder will receive picture set id 6 and 7.

Best
Matteo

See also: https://bugzilla.gnome.org/show_bug.cgi?id=796832
Comment 1 Nicolas Dufresne (ndufresne) 2018-07-24 13:34:30 UTC
Just a little more context, there is no concealment in VAAPI, we were passing NULL frames to the codec until recently, which would randomly fails or harm the accelerator. But now, in the current scenario, the decoder will fail for over 10 repeat, and as this decoder uses GST_VIDEO_DECODER_ERROR() error, it will stop the pipeline on GST_VIDEO_DECODER_MAX_ERRORS (or whatever is configured). I consider this API in GstVideoDecode harmful for live pipeline, which could take several hundred frames to recover.
Comment 2 Matteo Valdina 2018-07-24 15:28:36 UTC
I' using GStreamer 1.14.0 and I don't see what you describe.

I see that the slice_parse will continue to try to parse the same NALU each time and it does indefinitely.

I suggest skipping the NALU that doesn't have the SPS/PPS until a new set SPS/PPS is available and next NALU are referring to this set.

It will be useful to raise a custom event (like GstUDPPacketLoss) that inform the pipeline of this error. In this case, the application can request a new I-frame and try to recover from this situation.
Comment 3 Nicolas Dufresne (ndufresne) 2018-07-24 15:54:21 UTC
(In reply to Matteo Valdina from comment #2)
> I' using GStreamer 1.14.0 and I don't see what you describe.
> 
> I see that the slice_parse will continue to try to parse the same NALU each
> time and it does indefinitely.

Sure, this was fixed in master only, see bug #796832. But after this fix, there will be over 10 decoder errors, and the decoder will give up, which is bad for live pipeline. That's what I want to address here, we should ditch this bad API and just try forever.

> 
> I suggest skipping the NALU that doesn't have the SPS/PPS until a new set
> SPS/PPS is available and next NALU are referring to this set.

We can't, because in some application, movement is better then correct images. For this purpose, FFMPEG, which has a very high end error concealment, manages to produce frames in the first part, creating movement on a gray background. So if we add this filtering to h264parse, this behaviour won't be selectable (see output-corrupt knob on avdec_h264).

> 
> It will be useful to raise a custom event (like GstUDPPacketLoss) that
> inform the pipeline of this error. In this case, the application can request
> a new I-frame and try to recover from this situation.

That's all implemented already. But it's decoder driven, the VAAPI decoder is not there yet. When the decoder detects corrupted frames, it sends a custom event upstream, which get wrapped into the RTCP feedback (if enabled). So that the sender can send a new IDR (with headers if not out-of-band). This is common in RTP, since we often only produce keyframes when requested.
Comment 4 Matteo Valdina 2018-07-24 16:13:07 UTC
> We can't, because in some application, movement is better then correct
> images. For this purpose, FFMPEG, which has a very high end error
> concealment, manages to produce frames in the first part, creating movement
> on a gray background. So if we add this filtering to h264parse, this
> behaviour won't be selectable (see output-corrupt knob on avdec_h264).
>

 
From my understanding of H.264, in the case of missing SPS/PPS you cannot decode any frame that depends by these SPS/PPS. There will be missing crucial information for the decoding process.

I compared the behavior with avdec_h264 and this element skip the frame.

0:00:21.894119003 11063 0x7fea2c193c40 ERROR                  libav :0:: non-existing PPS 5 referenced
0:00:21.894476865 11063 0x7fea2c193c40 ERROR                  libav :0:: decode_slice_header error
0:00:21.894522779 11063 0x7fea2c193c40 ERROR                  libav :0:: no frame!
0:00:21.894711233 11063 0x558c54836a30 WARN                   libav gstavviddec.c:1747:gst_ffmpegviddec_frame:<avdec_h264-0> avdec_h264: decoding error (len: -1094995529, have_data: 0)
0:00:21.895379627 11063 0x7fea0016c590 ERROR                  libav :0:: non-existing PPS 5 referenced
0:00:21.895600328 11063 0x7fea0016c590 ERROR                  libav :0:: decode_slice_header error
0:00:21.895649853 11063 0x7fea0016c590 ERROR                  libav :0:: no frame!
0:00:21.895722664 11063 0x558c54836a30 WARN                   libav gstavviddec.c:1747:gst_ffmpegviddec_frame:<avdec_h264-0> avdec_h264: decoding error (len: -1094995529, have_data: 0)
0:00:21.896326457 11063 0x7fea10042d20 ERROR                  libav :0:: non-existing PPS 5 referenced
0:00:21.896486225 11063 0x7fea10042d20 ERROR                  libav :0:: decode_slice_header error
0:00:21.896527583 11063 0x7fea10042d20 ERROR                  libav :0:: no frame!
0:00:21.896584050 11063 0x558c54836a30 WARN                   libav gstavviddec.c:1747:gst_ffmpegviddec_frame:<avdec_h264-0> avdec_h264: decoding error (len: -1094995529, have_data: 0)
...
Comment 5 Nicolas Dufresne (ndufresne) 2018-07-24 16:23:18 UTC
Sorry, but the PPS is not missing in that stream, only the IDR.

0:00:00.230633151 27417 0x56223729a230 DEBUG      codecparsers_h264 gsth264parser.c:242:gst_h264_parse_nalu_header: Nal type 7, ref_idc 3
0:00:00.230686956 27417 0x56223729a230 DEBUG      codecparsers_h264 gsth264parser.c:1749:gst_h264_parse_sps: parsing SPS
0:00:00.230716363 27417 0x56223729a230 DEBUG      codecparsers_h264 gsth264parser.c:416:gst_h264_parse_vui_parameters: parsing "VUI Parameters"
0:00:00.230730428 27417 0x56223729a230 LOG        codecparsers_h264 gsth264parser.c:1596:gst_h264_parse_sps_data: initial width=640, height=368
0:00:00.230743207 27417 0x56223729a230 LOG        codecparsers_h264 gsth264parser.c:1621:gst_h264_parse_sps_data: crop_rectangle x=0 y=0 width=640, height=360
0:00:00.230752786 27417 0x56223729a230 DEBUG      codecparsers_h264 gsth264parser.c:1475:gst_h264_parser_parse_sps: adding sequence parameter set with id: 0 to array
0:00:00.230767807 27417 0x56223729a230 DEBUG      codecparsers_h264 gsth264parser.c:242:gst_h264_parse_nalu_header: Nal type 8, ref_idc 3
0:00:00.230776550 27417 0x56223729a230 DEBUG      codecparsers_h264 gsth264parser.c:1885:gst_h264_parse_pps: parsing PPS
0:00:00.230787916 27417 0x56223729a230 DEBUG      codecparsers_h264 gsth264parser.c:2017:gst_h264_parser_parse_pps: adding picture parameter set with id: 0 to array
0:00:00.231037342 27417 0x56223729a230 DEBUG      codecparsers_h264 gsth264parser.c:242:gst_h264_parse_nalu_header: Nal type 1, ref_idc 3
0:00:00.231060572 27417 0x56223729a230 DEBUG      codecparsers_h264 gsth264parser.c:2079:gst_h264_parser_parse_slice_hdr: parsing "Slice header", slice type 0
Comment 6 Matteo Valdina 2018-07-24 16:34:56 UTC
At the beginning is missing the IDR but at 21 seconds of the stream (or RTP sequence number 37141 is missing), It's missing the SPS/PPS.
Comment 7 Matteo Valdina 2018-07-24 16:49:14 UTC
I think there is a little misunderstanding due to my test are on the 1.14.0 and your test I assume on the current master. Sorry for not clarifying which version I was using.

I checkout the latest master and I the behavior is drastically different due to the https://bugzilla.gnome.org/show_bug.cgi?id=796832 . With master, I'm not able to start the decoding when with the release 1.14.0 I'm able to decode until the 21th seconds.
My issue is after the 21 seconds where it is only missing the packet with SPS/PPS.
Comment 8 Nicolas Dufresne (ndufresne) 2018-07-24 16:56:31 UTC
Sure, I have posted plausible fix for the regression in master, which brings back to the same situation.

Now, an SPS is missing at 21s (no idea why image breaks at 13 for me, but it's later). What do you expect should happen next that is not happening ?
Comment 9 Nicolas Dufresne (ndufresne) 2018-07-24 17:02:39 UTC
(just a side note, GstForceKeyUnit is implemented by RTP payloader/depayloader, so it's decoder agnostic)
Comment 10 Matteo Valdina 2018-07-24 18:20:44 UTC
Thanks for the note about GstForceKeyUnit, I'll give a look.

What I saw that the decode image freeze and the report these logs line

couldn't find associated picture parameter set with id: 5
parse error -1
...
...
couldn't find associated picture parameter set with id: 5
parse error -1


Adding some logs to the gsth264parse and gstvaapidecoder I found that it continues to decode the same NALU until the stream ended

codecparsers_h264 gsth264parser.c:2073:gst_h264_parser_parse_slice_hdr: Parsing NALU 0x7f60c009f6f5 8576
This error condition continue beside that the stream contains new SPS/PPS and new IDR refering these SPS/PPS.

Thanks
Comment 11 Matteo Valdina 2018-07-24 20:44:34 UTC
I observed that If I replace the vaapih264dec with h264parse ! fakesink. The h264 parse is not stuck in the nalu but discard the NALU where is missing the SPS/PPS and resume correctly with the newer SPS/PPS.
Comment 12 Nicolas Dufresne (ndufresne) 2018-07-24 21:30:25 UTC
Yeah, this specific issue seems like a missing jump on the offset in order to skip the broken part.
Comment 13 Matteo Valdina 2018-07-25 14:58:41 UTC
Thanks,
Could you provide any suggestion where/how to fix this.
Comment 14 Nicolas Dufresne (ndufresne) 2018-07-25 17:07:24 UTC
I tried couple of things, but didn't figure out how to get that one working yet.
Comment 15 Nicolas Dufresne (ndufresne) 2018-07-25 18:00:39 UTC
Created attachment 373162 [details] [review]
WIP: vaapidecode: Skip unparsable units from adapter

If the unit could not be parsed, just skip this nal and keep parsing
what is left in the adapter. We need to flush the broken unit in the
decoder specific parser because the generic code does not know about
units boundary. This increases error resilliance.

Before this, the broken unit would stay in the adapter and EOS would be
returned. Which stopped the streaming. Just removing the EOS would have
lead to the adapter size growing indefinitely.
Comment 16 Nicolas Dufresne (ndufresne) 2018-07-25 18:02:04 UTC
Might be your lucky day, this is exactly what I had in mind. This patch is WIP because we need to visit every single decoder to do the same, because it's not possible to know the boundary of the unit (NAL) from the outside.
Comment 17 Nicolas Dufresne (ndufresne) 2018-07-25 18:04:04 UTC
Note, it would be nice to try and see why we reach this condition twice per broken unit:

codecparsers_h264 gsth264parser.c:2086:gst_h264_parser_parse_slice_hdr: couldn't find associated picture parameter set with id: 5

Normally that would mean broken parsing, but slice parsing is only happening in VAAPI code, not in h264parse, so this isn't normal. If we ignore this, you'll see that we hit this condition 9 times, exactly like FFMPEG.
Comment 18 Matteo Valdina 2018-07-25 18:11:06 UTC
I'll give a try today and I'll give a look to the double "couldn't find associated .."

Thanks a lot for the patch :-)
I appreciated that.
Comment 19 Nicolas Dufresne (ndufresne) 2018-07-25 19:03:21 UTC
Sorry, we do call gst_h264_parser_parse_slice_hdr() from h264parse element, so the double trace is caused by the known double parsing. A different subject/issue.
Comment 20 Nicolas Dufresne (ndufresne) 2018-07-25 19:55:05 UTC
Created attachment 373164 [details] [review]
vaapidecode: Skip unparsable units from adapter

If the unit could not be parsed, just skip this nal and keep parsing
what is left in the adapter. We need to flush the broken unit in the
decoder specific parser because the generic code does not know about
units boundary. This increases error resilliance.

Before this, the broken unit would stay in the adapter and EOS would be
returned. Which stopped the streaming. Just removing the EOS would have
lead to the adapter size growing indefinitely.
Comment 21 Nicolas Dufresne (ndufresne) 2018-07-25 19:55:10 UTC
Created attachment 373165 [details] [review]
WIP: decoder_h264: Maybe me leaking a picture

This is just a place holder commit, until I figure-out what we are
supposed to do. The picture is not set as last picture, and it's
also no unref.
Comment 22 Nicolas Dufresne (ndufresne) 2018-07-25 19:56:50 UTC
I went over the other decoder and only h265 had the same issue in the parser. I've added this second commit, but it's just to underline a problem, I have no idea what was the intention, should the skip picture be set as prev_pi or unreffed, in which case setting flags make no sense. But I could be missing something.
Comment 23 Matteo Valdina 2018-07-26 00:50:10 UTC
Thanks,
I test in my real scenario. 
It is working but I need to put an h264parse before the decoder otherwise it continues to fail to find the SPS/PPS. This time fails to find the latest SPS/PPS before was always the same. 

I probably need to do more testing and better understanding what is happening. (I tested backporting the patch to 1.14.0)

Thanks for the patch.
Comment 24 Nicolas Dufresne (ndufresne) 2018-07-26 01:09:29 UTC
Works for me on master branch without h264parse.
Comment 25 Nicolas Dufresne (ndufresne) 2018-07-26 01:10:50 UTC
Because it started to be confusing, I have patches from multiple bugs, you can access by work branch over here:

https://gitlab.collabora.com/nicolas/gstreamer-vaapi/tree/unrecoverrable-error
Comment 26 Matteo Valdina 2018-07-26 01:26:53 UTC
Thanks for sharing the branch.
Comment 27 Nicolas Dufresne (ndufresne) 2018-08-01 23:49:04 UTC
Comment on attachment 373165 [details] [review]
WIP: decoder_h264: Maybe me leaking a picture

The ref has been given to the unit, so there is no leak here.
Comment 28 Nicolas Dufresne (ndufresne) 2018-08-01 23:49:58 UTC
Attachment 373164 [details] pushed as 06aa82f - vaapidecode: Skip unparsable units from adapter
Comment 29 Matteo Valdina 2018-08-02 12:36:47 UTC
Thanks,

I extended your original patch with a request of GstForceKey (upstream) in case of parse and decode errors.
I think this is reasonable and I saw other decoders (openh264) do the same. Do you agree?
Comment 30 Matteo Valdina 2018-08-02 12:38:18 UTC
Here the idea:

diff --git a/gst/vaapi/gstvaapidecode.c b/gst/vaapi/gstvaapidecode.c
index d11b8032..53d08a6b 100644
--- a/gst/vaapi/gstvaapidecode.c
+++ b/gst/vaapi/gstvaapidecode.c
@@ -764,6 +764,13 @@ error_decode:
             ("Decode error %d", status), ret);
         break;
     }
+
+    if (ret != GST_FLOW_NOT_SUPPORTED) {
+      GST_DEBUG ("Requesting a key unit");
+      gst_pad_push_event (GST_VIDEO_DECODER_SINK_PAD (decode),
+          gst_video_event_new_upstream_force_key_unit (GST_CLOCK_TIME_NONE,
+              FALSE, 0));
+    }
     gst_video_decoder_drop_frame (vdec, frame);
     return ret;
   }
@@ -1176,8 +1183,13 @@ gst_vaapidecode_parse_frame (GstVideoDecoder * vdec,
       break;
     default:
       GST_ERROR ("parse error %d", status);
-      ret = GST_FLOW_EOS;
+      /* just keep parsing, the decoder should have flushed the broken unit */
+      ret = GST_VAAPI_DECODE_FLOW_PARSE_DATA;
       decode->current_frame_size = 0;
+      GST_DEBUG ("Requesting a key unit");
+      gst_pad_push_event (GST_VIDEO_DECODER_SINK_PAD (decode),
+          gst_video_event_new_upstream_force_key_unit (GST_CLOCK_TIME_NONE,
+              FALSE, 0));
       break;
   }
   return ret;
Comment 31 Nicolas Dufresne (ndufresne) 2018-08-02 15:26:28 UTC
Looks reasonable, can you make a git commit and git format-patch, and then attach ?
Comment 32 Matteo Valdina 2018-08-02 15:28:33 UTC
yes, for sure.
Comment 33 Matteo Valdina 2018-08-03 00:44:43 UTC
Created attachment 373248 [details] [review]
Requests GstForceKeyUnit to the upstream in case of parse/decode error.

Requests GstForceKeyUnit to the upstream in case of parse/decode error.

In the case of decode error, there is a change that affects other cases.
The "GstFlowret value was never initialized in some condition and so it was undefined.

I saw by experience that it was 0 (GST_FLOW_OK) but this is a random behavior and tied to the specific compiler/cpu/kernel and other factors.
I added a default value that it makes sense for my understanding of this codebase. But I'm probably wrong.

At least I think it is better because it is a defined behavior.

I saw that there are other functions that have this issue like gst_vaapidecode_push_all_decoded_frames where "ret" is returned in the default switch with the un-initialized value.
Comment 34 Víctor Manuel Jáquez Leal 2018-08-20 14:43:21 UTC
@Matteo, could file another bug with your patch? So we could keep track of it. Also, this affects all the codecs, not only h264
Comment 35 Matteo Valdina 2018-08-22 01:40:41 UTC
Yes, no problem.
You can find the new issue here https://bugzilla.gnome.org/show_bug.cgi?id=797006

Thanks
Matteo