GNOME Bugzilla – Bug 745048
vaapidecode: h264, fatal error "Status Unknown" decoding h264 content at gstvaapidecoder_h264.c:decode_current_picture
Last modified: 2015-05-28 09:03:07 UTC
Created attachment 297713 [details] [review]
vaapideocder-hack: avoids the infinite loop and pipeline shutdown, but sometimes leads to green screens

Environment:
--------------
kernel: 3.13.0-24-generic #47-Ubuntu x86_64
cpu: Intel(R) Atom(TM) CPU E3845 @ 1.91GHz
gstreamer-vaapi: (master)6b73746ab6453253ea4bb82170914e0a1892e135 > 0.5.10
libva: (master)f9741725839ea144e9a6a1827f74503ee39946c3 > 1.4.0
libva-intel-driver: (master)9a20d6c34cb65e5b85dd16d6c8b3a215c5972b18
gstreamer: (master)1f6d5d3ff5cb03527acb99b4059788f53d49567e
gst-plugins-base: (master)e2a693656b1a8513adab5e8f7698758c583c8eed > 1.4.0
gst-plugins-good: (master)6b69ef24a1e13f0ad7980fe25d954d37133ba746 > 1.4.0
gst-plugins-bad: (master)8e2964ee599853fff2a7aba9275960b4ae7844fc > 1.4.1
gst-plugins-ugly: (master)86562f56e208d0d7360a9b5f7bc5cf13836f06a3 > 1.4.0

Bug Info:
--------------
libva info: VA-API version 0.37.0
libva info: VA-API version 0.37.0
libva info: va_getDriverName() returns 0
libva info: Trying to open /home/fahamne/gst/master/prefix/lib/dri/i965_drv_video.so
libva info: va_getDriverName() returns 0
libva info: Trying to open /home/fahamne/gst/master/prefix/lib/dri/i965_drv_video.so
libva info: Found init function __vaDriverInit_0_37
libva info: Found init function __vaDriverInit_0_37
libva info: va_openDriver() returns 0
libva info: va_openDriver() returns 0
Setting pipeline to PAUSED ...
Pipeline is live and does not need PREROLL ...
Got context from element 'vaapidecode1': gst.vaapi.Display=context, display=(GstVaapiDisplay)NULL;
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Setting pipeline to PAUSED ...
Pipeline is live and does not need PREROLL ...
Got context from element 'vaapidecode1': gst.vaapi.Display=context, display=(GstVaapiDisplay)NULL;
Setting pipeline to PLAYING ...
New clock: GstSystemClock
fahamne@testboard-2164:~ >WARNING: from element /GstPipeline:pipeline0/GstVaapiSink:vaapisink0: A lot of buffers are being dropped.
Additional debug info:
gstbasesink.c(2794): gst_base_sink_is_too_late (): /GstPipeline:pipeline0/GstVaapiSink:vaapisink0:
There may be a timestamping problem, or this computer is too slow.
ERROR: from element /GstPipeline:pipeline0/GstUDPSrc:udpsrc2: Internal data flow error.
Additional debug info:
gstbasesrc.c(2943): gst_base_src_loop (): /GstPipeline:pipeline0/GstUDPSrc:udpsrc2:
streaming task paused, reason error (-5)
Execution ended after 0:05:32.950265692
ERROR: from element /GstPipeline:pipeline0/GstUDPSrc:udpsrc2: Internal data flow error.
Additional debug info:
gstbasesrc.c(2943): gst_base_src_loop (): /GstPipeline:pipeline0/GstUDPSrc:udpsrc2:
streaming task paused, reason error (-5)
Execution ended after 0:05:33.031227820
Setting pipeline to PAUSED ...
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Setting pipeline to NULL ...
Freeing pipeline ...
Freeing pipeline ...

Steps:
--------------
Run 2 instances of the following pipeline (the video content could be coming from a filesrc):

udpsrc address=239.4.4.23 port=10000 name=udpsrc1 ! queue max-size-bytes=0 max-size-buffers=0 max-size-time=10000000000 name=q_tsdemux_1 ! tsdemux name=dmx1 program-number=3 \
dmx1. ! queue max-size-bytes=0 max-size-buffers=0 max-size-time=10000000000 name=q_videoparse_1 ! h264parse ! vaapidecode name=vaapidecode1 ! queue max-size-bytes=0 max-size-buffers=0 max-size-time=10000000000 name=q_videosink_1 ! vaapisink display=x11 \
udpsrc address=239.4.4.23 port=10000 name=udpsrc2 ! queue max-size-bytes=0 max-size-buffers=0 max-size-time=10000000000 name=q_tsdemux_2 ! tsdemux name=dmx2 program-number=3 \
dmx2. ! queue max-size-bytes=0 max-size-buffers=0 max-size-time=10000000000 name=q_videoparse_2 ! h264parse ! vaapidecode name=vaapidecode2 ! queue max-size-bytes=0 max-size-buffers=0 max-size-time=10000000000 name=q_videosink_2 ! vaapisink display=x11 \
udpsrc address=239.4.4.23 port=10000 name=udpsrc3 ! queue max-size-bytes=0 max-size-buffers=0 max-size-time=10000000000 name=q_tsdemux_3 ! tsdemux name=dmx3 program-number=3 \
dmx3. ! queue max-size-bytes=0 max-size-buffers=0 max-size-time=10000000000 name=q_videoparse_3 ! h264parse ! vaapidecode name=vaapidecode3 ! queue max-size-bytes=0 max-size-buffers=0 max-size-time=10000000000 name=q_videosink_3 ! vaapisink display=x11

Additional Info:
--------------
Tracking down the problem, the apparent internal data flow error is actually happening after vaapidecode returns an error in gstvaapidecoder_h264.c:decode_current_picture, with the stack trace below. There is a case where, in dpb_find_lowest_poc, the output_needed flag of every fs/pic is set to 0; the function therefore returns -1, which leads to GST_VAAPI_DECODER_STATUS_ERROR_UNKNOWN in the decode_current_picture function in that file:

dpb_find_lowest_poc(decoder = 0x7fffc8017580,picture = 0x7fffc8006180,found_picture_ptr = 0x7fffdfffdd10,);
dpb_bump(decoder = 0x7fffc8017580,picture = 0x7fffc8006180,);
dpb_add(decoder = 0x7fffc8017580,picture = 0x7fffc8006180,);
decode_current_picture(decoder = 0x7fffc8017580,);
gst_vaapi_decoder_h264_end_frame(base_decoder = 0x7fffc8017580,);
do_decode_1(decoder = 0x7fffc8017580,frame = 0x7fffc8015c10,);
do_decode(decoder = 0x7fffc8017580,base_frame = 0x7fffd80042e0,);
gst_vaapi_decoder_decode(decoder = 0x7fffc8017580,frame = 0x7fffd80042e0,);
gst_vaapidecode_decode_frame(vdec = 0x81d6f0,frame = 0x7fffd80042e0,);
gst_vaapidecode_handle_frame(vdec = 0x81d6f0,frame = 0x7fffd80042e0,);
gst_video_decoder_decode_frame(decoder = 0x81d6f0,frame = 0x7fffd80042e0,);
gst_video_decoder_have_frame(decoder = 0x81d6f0,);
gst_vaapidecode_parse_frame(vdec = 0x81d6f0,frame = 0x7fffd80042e0,adapter = 0x6935b0,at_eos = 0,);
gst_vaapidecode_parse(vdec = 0x81d6f0,frame = 0x7fffd80042e0,adapter = 0x6935b0,at_eos = 0,);
gst_video_decoder_parse_available(dec = 0x81d6f0,at_eos = 0,new_buffer = 0,);
gst_video_decoder_chain_forward(decoder = 0x81d6f0,buf = 0x7fffbc01c3d0,at_eos = 0,);
gst_video_decoder_chain(pad = 0x7ed900,parent = 0x81d6f0,buf = 0x7fffbc01c3d0,);
gst_pad_chain_data_unchecked(pad = 0x7ed900,type = 4112,data = 0x7fffbc01c3d0,);
gst_pad_push_data(pad = 0x7ed6d0,type = 4112,data = 0x7fffbc01c3d0,);
gst_pad_push(pad = 0x7ed6d0,buffer = 0x7fffbc01c3d0,);
gst_base_parse_push_frame(parse = 0x80c240,frame = 0x7fffd4001c50,);
gst_base_parse_handle_and_push_frame(parse = 0x80c240,frame = 0x7fffd4001c50,);
gst_base_parse_finish_frame(parse = 0x80c240,frame = 0x7fffd4001c50,size = 6,);
gst_h264_parse_handle_frame(parse = 0x80c240,frame = 0x7fffd4001c50,skipsize = 0x7fffdfffea3c,);
gst_base_parse_handle_buffer(parse = 0x80c240,buffer = 0x7fffac03de80,skip = 0x7fffdfffea3c,flushed = 0x7fffdfffeab0,);
gst_base_parse_chain(pad = 0x7ed4a0,parent = 0x80c240,buffer = 0x7fff98003370,);
gst_pad_chain_data_unchecked(pad = 0x7ed4a0,type = 4112,data = 0x7fff98003370,);
gst_pad_push_data(pad = 0x7ed270,type = 4112,data = 0x7fff98003370,);
gst_pad_push(pad = 0x7ed270,buffer = 0x7fff98003370,);
gst_queue_push_one(queue = 0x7aa720,);
gst_queue_loop(pad = 0x7ed270,);
gst_task_func(task = 0x9153b0,);
default_func(tdata = 0x8212a0,pool = 0x622090,);
??();
??();
start_thread(arg = 0x7fffdffff700,);
clone();
-------------------
Note: the attached patch seems to alleviate the fatal error and pipeline shutdown, but sometimes leads to a green screen that it cannot recover from.
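To make the failure path above easier to follow, here is a minimal, self-contained model of the described logic. The struct and function names only loosely mirror gstvaapidecoder_h264.c and are hypothetical; the point is the control flow: when no frame store in the DPB still has output_needed set, the lowest-POC search returns -1, the bump fails, and decode_current_picture() turns that into GST_VAAPI_DECODER_STATUS_ERROR_UNKNOWN.

/* Simplified, hypothetical model of the failure path described above. */
#include <stdio.h>

#define DPB_SIZE 16

typedef struct {
    int poc;            /* picture order count of the stored picture */
    int output_needed;  /* still waiting to be output/displayed?     */
} FrameStore;

typedef struct {
    FrameStore dpb[DPB_SIZE];
    int dpb_count;
} Decoder;

/* Returns the index of the frame store with the lowest POC that still
 * needs output, or -1 if none does (the case triggering this bug). */
static int
dpb_find_lowest_poc (const Decoder *d)
{
    int found = -1;
    for (int i = 0; i < d->dpb_count; i++) {
        if (d->dpb[i].output_needed &&
            (found < 0 || d->dpb[i].poc < d->dpb[found].poc))
            found = i;
    }
    return found;
}

static int
dpb_bump (Decoder *d)
{
    int i = dpb_find_lowest_poc (d);
    if (i < 0)
        return 0;                /* nothing to output -> bump "fails" */
    d->dpb[i].output_needed = 0; /* pretend we output the picture     */
    return 1;
}

int
main (void)
{
    /* DPB full of pictures that were already output: output_needed == 0 */
    Decoder d = { .dpb_count = 2, .dpb = { { 10, 0 }, { 12, 0 } } };

    if (!dpb_bump (&d))
        /* decode_current_picture() reacts to this situation with
         * GST_VAAPI_DECODER_STATUS_ERROR_UNKNOWN, killing the pipeline. */
        printf ("dpb_bump failed: no frame store needs output\n");
    return 0;
}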
To more easily reproduce the problem, use the following pipeline to multicast a filesrc on port 5000:

FILE=~/test-streams/h264-HD-sample.ts
gst-launch \
  filesrc location=$FILE ! \
  tsparse set-timestamps=true smoothing-latency=10 ! udpsink host=224.1.1.1 port=5000 auto-multicast=true &

Then run as many instances of the following pipeline as needed to almost max out the bitstream; you will see some of the gst instances eventually shut down:

gst-launch \
  udpsrc address=224.1.1.1 port=5000 name=udpsrc1 ! queue max-size-bytes=0 max-size-buffers=0 max-size-time=10000000000 name=q_tsdemux_1 ! tsdemux name=dmx1 program-number=6 \
  dmx1. ! queue max-size-bytes=0 max-size-buffers=0 max-size-time=10000000000 name=q_videoparse_1 ! h264parse ! vaapidecode name=vaapidecode_1 ! vaapisink display=x11 \
  udpsrc address=224.1.1.1 port=5000 name=udpsrc2 ! queue max-size-bytes=0 max-size-buffers=0 max-size-time=10000000000 name=q_tsdemux_2 ! tsdemux name=dmx2 program-number=6 \
  dmx2. ! queue max-size-bytes=0 max-size-buffers=0 max-size-time=10000000000 name=q_videoparse_2 ! h264parse ! vaapidecode name=vaapidecode_2 ! vaapisink display=x11 \
  udpsrc address=224.1.1.1 port=5000 name=udpsrc3 ! queue max-size-bytes=0 max-size-buffers=0 max-size-time=10000000000 name=q_tsdemux_3 ! tsdemux name=dmx3 program-number=6 \
  dmx3. ! queue max-size-bytes=0 max-size-buffers=0 max-size-time=10000000000 name=q_videoparse_3 ! h264parse ! vaapidecode name=vaapidecode_3 ! vaapisink display=x11
Created attachment 299244 [details] [review]
decoder: h264: add initial support for loss of pictures

This patch fixes crashes or errors in the process of adding a picture to the DPB. However, since this issue is due to missing packets, the green screen is unavoidable: if some packets of the frame to decode are missing, the rest of the surface won't get filled up.
Gwenole's patch seems to not fix it. I'll try to make a reasonably sized clip I can share.
Also, can't we replace the green screen with the last correctly decoded frame? It would look a lot better.
(In reply to Olivier Crête from comment #3)
> Gwenole's patch seems to not fix it. I'll try to make a reasonably sized
> clip I can share.

Two possible issues here, wrt. the supplied clip:
- POC type == 0 and the POC remains at G_MAXINT32;
- If the lost frame is actually a lost field, we need to update the first field in an existing frame store to report that it is now unpaired/unique, so that the check for "complete frame store" can succeed and the frame/field can be output (see the sketch after this comment).

Then, an improvement to the patch as written could be to optimize the number of iterations needed to fill in the gaps. IIRC, this probably works out modulo the size of the DPB.
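A tiny illustrative model of the second point above (all names are hypothetical, not the actual gstreamer-vaapi structures): once a gap tells us the pairing field will never arrive, the frame store is downgraded to a single-field store, so the completeness check passes and the lone field can still be output from the DPB.

#include <stdbool.h>
#include <stdio.h>

typedef struct {
    int num_fields;       /* fields actually decoded into the store (1 or 2) */
    int fields_expected;  /* 2 for a normally paired interlaced frame        */
} FieldFrameStore;

/* Called once a gap tells us the second field will never arrive. */
static void
frame_store_mark_unpaired (FieldFrameStore *fs)
{
    fs->fields_expected = 1;
}

static bool
frame_store_is_complete (const FieldFrameStore *fs)
{
    return fs->num_fields >= fs->fields_expected;
}

int
main (void)
{
    FieldFrameStore fs = { .num_fields = 1, .fields_expected = 2 };
    frame_store_mark_unpaired (&fs);
    printf ("complete: %d\n", frame_store_is_complete (&fs)); /* prints 1 */
    return 0;
}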
(In reply to Olivier Crête from comment #4)
> Also, can't we replace the green screen with the last correctly decoded
> frame? It would look a lot better.

If the frame is in the DPB, this means it was correctly decoded. If a new picture is created without decoded contents, i.e. not cloned from an existing one, this means that we actually did not have any previous frame. :) i.e. we got lost pictures from the very beginning.

I think one way to address this specific (additional) problem is to operate like the MPEG-2 decoder and skip anything before the first I-frame gets decoded. Would this be an acceptable solution?
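As a rough illustration of that proposal (hypothetical names, not the actual decoder code), the gate could be as simple as a per-sequence flag that is reset whenever a new sequence is activated:

#include <stdbool.h>
#include <stdio.h>

typedef enum { SLICE_P, SLICE_B, SLICE_I } SliceType;

typedef struct {
    bool got_i_frame;  /* reset to false when a new sequence (SPS) is activated */
} SequenceGate;

/* Returns true if the picture may be decoded, false if it must be skipped
 * because no I-frame has been seen yet in the active sequence. */
static bool
should_decode_picture (SequenceGate *gate, SliceType slice_type)
{
    if (slice_type == SLICE_I)
        gate->got_i_frame = true;
    return gate->got_i_frame;
}

int
main (void)
{
    SequenceGate gate = { false };
    printf ("%d\n", should_decode_picture (&gate, SLICE_P)); /* 0: skipped */
    printf ("%d\n", should_decode_picture (&gate, SLICE_I)); /* 1: decoded */
    printf ("%d\n", should_decode_picture (&gate, SLICE_B)); /* 1: decoded */
    return 0;
}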
Skipping to the first I-frame sounds like a good idea. Slightly related: we should also set GST_BUFFER_FLAG_CORRUPTED on any buffer between a loss and the following I-frame, so downstream can decide whether it would rather have only perfect pictures or have some movement. That said, we saw green artifacts in the middle of the stream with Faham's hack.
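A hedged sketch of that GST_BUFFER_FLAG_CORRUPTED idea as it could look in a GstVideoDecoder subclass. GstVideoDecoder, GstVideoCodecFrame, gst_video_decoder_finish_frame() and the flag itself are real GStreamer API; the corrupted_until_iframe tracking variable and the helper function are hypothetical.

#include <gst/video/gstvideodecoder.h>

/* Push one decoded frame downstream, tagging it as corrupted while the
 * decoder is still recovering from a loss (i.e. until the next I-frame). */
static GstFlowReturn
push_decoded_frame (GstVideoDecoder *vdec, GstVideoCodecFrame *frame,
    gboolean corrupted_until_iframe)
{
    if (corrupted_until_iframe && frame->output_buffer)
        GST_BUFFER_FLAG_SET (frame->output_buffer, GST_BUFFER_FLAG_CORRUPTED);

    /* Downstream elements (or the application) can then drop such buffers
     * if they prefer only pristine pictures over continuous motion. */
    return gst_video_decoder_finish_frame (vdec, frame);
}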
(In reply to Gwenole Beauchesne from comment #5)
> (In reply to Olivier Crête from comment #3)
> > Gwenole's patch seems to not fix it. I'll try to make a reasonably sized
> > clip I can share.
>
> Two possible issues here, wrt. the supplied clip:
> - POC type == 0 and the POC remains at G_MAXINT32;
> - If the lost frame is actually a lost field, we need to update the first
>   field in an existing frame store to report that it is now unpaired/unique,
>   so that the check for "complete frame store" can succeed and the
>   frame/field can be output.
>
> Then, an improvement to the patch as written could be to optimize the number
> of iterations needed to fill in the gaps. IIRC, this probably works out
> modulo the size of the DPB.

Another issue was that PrevFrameNum was used to track gaps/losses of pictures, instead of relying on PrevRefFrameNum. Besides, another condition to check for gaps was missing.

The supplied clip does start with an I-frame. Anyway, I will keep that additional patch, as displaying frames that are bound to be broken is not very appealing. We could argue about subsequent losses/skipping to the next I-frame, but at least covering the initial I-frame case should be enough.

Notes:
- The Reference Software decoder (JM) also crashes with the supplied clip, for another reason though;
- The FFmpeg decoder, both in SW and HW accelerated mode, produces a non-green artefact.

So we will try to perform at least as well as FFmpeg/vaapi, i.e. no crash, no green frame for the supplied clip.
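For reference, a minimal self-contained sketch of gap detection based on PrevRefFrameNum, along the lines of the H.264 "gaps in frame_num" decoding process. This is an illustration only, not the actual gstreamer-vaapi code; in the real decoder the generated frames are inserted into the DPB as "non-existing" short-term references rather than printed.

#include <stdio.h>

static void
handle_frame_num_gap (int frame_num, int prev_ref_frame_num,
    int log2_max_frame_num_minus4)
{
    const int max_frame_num = 1 << (log2_max_frame_num_minus4 + 4);

    if (frame_num == prev_ref_frame_num ||
        frame_num == (prev_ref_frame_num + 1) % max_frame_num)
        return;                          /* no gap */

    /* Walk the missing frame_num values and create placeholder frames. */
    int unused = (prev_ref_frame_num + 1) % max_frame_num;
    while (unused != frame_num) {
        printf ("insert non-existing reference frame, frame_num=%d\n", unused);
        unused = (unused + 1) % max_frame_num;
    }
}

int
main (void)
{
    /* Example: last reference frame had frame_num 3, current slice says 7. */
    handle_frame_num_gap (7, 3, 0 /* MaxFrameNum = 16 */);
    return 0;
}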
Created attachment 303033 [details] [review] decoder: h264: skip all pictures prior the first I-frame
Created attachment 303034 [details] [review]
decoder: h264: add support for missing field

This adds support for recovering from missing fields within a single frame. This is the most important patch for this issue. The others are unrelated, since the clip does not support gaps_in_frame_num anyway, so they are best-effort for "unintentional loss of pictures".
Created attachment 303035 [details] [review]
decoder: h264: add initial support for loss of pictures

Changes:
- Fixed a regression with some (progressive) clips

TODO:
- Fix the reference frame to use for missing frames. It must have a valid field for the field of interest, i.e. the one to be processed. Otherwise, we would be reconstructing from invalid data (greenish frames). An alternative is to fill in RefPicListX[] with a valid entry at all times, possibly the nearest POC field/frame. Normally, the VA driver ought to be able to do that itself anyway.
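A rough sketch of the alternative mentioned in the TODO (hypothetical types; the real reference lists hold GstVaapiPicture pointers): a gap in RefPicListX[] could be patched with the valid reference whose POC is nearest to the current picture, so that the driver never reconstructs from an uninitialized surface.

#include <stdlib.h>
#include <stdio.h>

typedef struct { int poc; int valid; } RefPic;

/* Return the index of the valid reference with POC nearest to cur_poc, or -1. */
static int
find_nearest_valid_ref (const RefPic *refs, int num_refs, int cur_poc)
{
    int best = -1;
    for (int i = 0; i < num_refs; i++) {
        if (!refs[i].valid)
            continue;
        if (best < 0 ||
            abs (refs[i].poc - cur_poc) < abs (refs[best].poc - cur_poc))
            best = i;
    }
    return best;
}

int
main (void)
{
    RefPic list[] = { { 8, 1 }, { 10, 0 /* lost */ }, { 14, 1 } };
    int n = 3, cur_poc = 12;

    for (int i = 0; i < n; i++) {
        if (!list[i].valid) {
            int j = find_nearest_valid_ref (list, n, cur_poc);
            if (j >= 0)
                list[i] = list[j];  /* substitute to avoid greenish output */
        }
    }
    printf ("RefPicList POCs: %d %d %d\n", list[0].poc, list[1].poc, list[2].poc);
    return 0;
}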
The hack is not needed. The patchset approaches the fix from the correct direction(*). The greenish frames will be an enhancement to finalize.

(*) that is, insert the missing field in a frame first, then missing frames.
Review of attachment 297713 [details] [review]:

Improved fixes were posted; we need to avoid workarounds. Thanks.
Gwenole, I've tested the patches and they look good. If you agree, I'll push them ASAP.
Hi,

(In reply to Víctor Manuel Jáquez Leal from comment #14)
> Gwenole, I've tested the patches and they look good. If you agree, I'll push
> them ASAP.

Please don't. There are a couple of changes still needed and I have not done a full run of regression testing yet. Thanks.
In particular, the video stream of interest features three interesting cases (thanks Olivier!):

1. Missing "second-field" of a frame. This is fixed in the attached patches;
2. Missing "first-field" of a frame. This is *not* fixed in any of the patches;
3. Missing frames. This is fixed in the attached patch.

A workaround for the "greenish" frames you might see could be to look for a valid (past) frame to be used for reference when find_short_term_reference() returns nothing. However, this is a workaround. The correct fix to the remaining issue relates to (2). We need (i) proper detection of parity order in the sequence, and (ii) insertion of a dummy "first-field" if needed.

(i) could be a little hackish since the H.264 standard does not necessarily define a "top-field-first" syntax element. Rather, field parity is explicitly specified through per-picture syntax elements, or (optional) SEI messages. Still, with no explicit TFF flag, this needs to be guessed. One assumption that could be made: within the sequence, i.e. under the same activated SPS id, we need to have the same first-field parity. So, we could possibly cache the computed TFF result.

The situation we ran into here, with the patches applied, is that we assumed the first field received after missing frames was the first field of a frame store, and in practice a non-reference picture. The next field that arrives, in practice a reference picture, would be combined with the first. However, since the first was a non-reference picture, it was emitted out of the DPB. So we got a second-field that was not really the expected second-field, since the "first-field" was actually the second-field of a picture whose first-field was missing. So the reference field gets lost (released), since it is no longer valid in the DPB, and find_short_term_reference() is not able to find it.

I foresee at least two additional changes:
1. Proper detection of a missing first-field, the (2.i) and (2.ii) cases above;
2. Proper propagation of inserted missing frames, by making sure that if such a frame is interlaced, then it has two fields with valid contents. So, we might have to split the frame insertion code into field insertion code paths whenever necessary.

With those, greenish frames could be further reduced, thus possibly rendering the "look for the nearest valid field if the reference was not found" workaround useless. However, if you are pressed for time, you may try to implement that workaround/concept (use find_nearest_prev_poc() for instance). I'd focus on the above-mentioned plan though, as it would offer a more globally robust and future-proof solution.
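A hypothetical sketch of the parity caching from (2.i): the first time a leading field is seen under the active SPS, remember its parity and assume later frames in that sequence start with the same parity; a frame that then begins with the opposite parity is taken as having lost its first field. Names are illustrative only, not the actual decoder code.

#include <stdbool.h>
#include <stdio.h>

typedef enum { FIELD_TOP, FIELD_BOTTOM } FieldParity;

typedef struct {
    int  sps_id;          /* SPS for which the guess was made */
    bool tff_known;
    bool top_field_first;
} SequenceState;

/* Cache the parity of the first leading field seen for this SPS. */
static void
record_first_field (SequenceState *s, int sps_id, FieldParity parity)
{
    if (!s->tff_known || s->sps_id != sps_id) {
        s->sps_id = sps_id;
        s->tff_known = true;
        s->top_field_first = (parity == FIELD_TOP);
    }
}

/* A frame starting with a field whose parity contradicts the cached TFF
 * value probably lost its real first field, so a dummy one is needed. */
static bool
first_field_is_missing (const SequenceState *s, FieldParity parity)
{
    return s->tff_known && s->top_field_first != (parity == FIELD_TOP);
}

int
main (void)
{
    SequenceState s = { 0 };
    record_first_field (&s, 0, FIELD_TOP);
    printf ("missing first field? %d\n",
        first_field_is_missing (&s, FIELD_BOTTOM)); /* prints 1 */
    return 0;
}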
Please try the following branch:
https://github.com/gbeauchesne/gstreamer-vaapi/tree/h264.gaps

I will push the patches to git master next week.

Changes:
- Fixed multiple other regressions in H.264 conformance testing;
- Fixed tracking of the "top-field-first" (TFF) flag to ease gap tracking;
- Dropped obsolete code to fix other collateral issues;
- Implemented (2.i) and (2.ii) from the above discussion, which seems to be enough.
commit 5abd2b90b6b67403cd12e88496abf28e6fb39dc9
Author: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Date:   Thu Mar 12 22:57:22 2015 +0100

    decoder: h264: add initial support for loss of pictures.

    Implement decoding process for gaps in frame_num (8.5.2). This also
    somewhat supports unintentional loss of pictures.

    https://bugzilla.gnome.org/show_bug.cgi?id=745048
    https://bugzilla.gnome.org/show_bug.cgi?id=703921

    Original-patch-by: Wind Yuan <feng.yuan@intel.com>
    [fixed derivation of POC, ensured clone is valid for reference, actually
    fixed detection of gaps in FrameNum by PrevRefFrameNum]
    Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>

commit 8dd93e9c8ab301d17c84fb9a92e30990aff853ad
Author: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Date:   Fri May 22 11:42:52 2015 +0200

    decoder: h264: add support for missing first field.

    Try to identify missing first fields too, thus disregarding any
    intermediate gaps in frames. We also assume that we keep the same field
    sequence, i.e. if previous frames were in top-field-first (TFF) order,
    then so are subsequent frames.

    Note that insertion of dummy first fields needs to operate in two steps:
    (i) create the original first field that the current field will inherit
    from, and (ii) submit that field into the DPB prior to initializing the
    current (other) field POC values but after any reference flag was set,
    i.e. copy reference flags from the child (other field) to the parent
    (first field).

    https://bugzilla.gnome.org/show_bug.cgi?id=745048

commit efaadfc7c086e97442df28383430957c7f01c337
Author: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Date:   Thu May 7 14:00:58 2015 +0200

    decoder: h264: add support for missing second field.

    Interlaced H.264 video frames always have two fields to decode and
    display. However, in some cases, e.g. packet loss, one of the fields can
    be missing. This perturbs the reference picture marking process, whereby
    the number of references available in the DPB no longer matches the
    expected value.

    This patch adds initial support for a missing field within a decoded
    frame. The current strategy is to find the nearest field, by POC value
    and with the same parity.

    https://bugzilla.gnome.org/show_bug.cgi?id=745048

commit 4776138d4aa082baa58ff6c46c6c7456c856df60
Author: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Date:   Fri May 22 17:06:11 2015 +0200

    decoder: h264: improve tracking of "top-field-first" flag.

    Try to maintain a "top-field-first" (TFF) flag, even if the H.264
    standard does not mandate it. This will be useful for tracking missing
    fields, and also for a more correct _split_fields() implementation for
    frames in the DPB.

commit b4e920843bd9cd6f07eb6ace3017d72eb98e09a0
Author: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Date:   Tue May 5 11:56:11 2015 +0200

    decoder: h264: skip all pictures prior the first I-frame.

    Don't try to decode pictures until the first I-frame is received within
    the currently active sequence. There is no point in decoding and then
    displaying frames with artifacts.

commit 6229ad4f7fa143e89e94e45d38e16b2577ba4af6
Author: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Date:   Tue May 12 15:36:10 2015 +0200

    decoder: h264: fix processing of EOSEQ NAL.

    Fix decoding of the end_of_seq() NAL unit so that the current picture is
    not submitted for decoding again. This is pretty vintage code that dates
    back to before the existence of the whole decoder units machinery.

    One issue that could have arisen if that code was kept is that we could
    have submitted a picture, and subsequently a GstVideoCodecFrame, twice:
    once without the decode_only flag set, and once with that flag set. The
    end result is that the GstVideoDecoder would release the codec frame
    twice, thus releasing stale data.

    In short, the piece of code removed by this patch has been completely
    obsolete for a while, and is error-prone in corner cases.

commit a89a8cf1e3a0307c481510b4a43cb8f6ec96b740
Author: Wind Yuan <feng.yuan@intel.com>
Date:   Thu Feb 28 15:26:36 2013 +0800

    decoder: add utility function to clone picture objects.

    https://bugzilla.gnome.org/show_bug.cgi?id=703921

    Signed-off-by: Wind Yuan <feng.yuan@intel.com>
    [added cosmetic changes, fixed propagation of "one-field" flag to
    children, fixed per-codec clone modes (h264)]
    Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>