GNOME Bugzilla – Bug 697806
avdec_h264 Fails during error recovery after packet loss leading to stream corruption
Last modified: 2014-01-02 13:46:12 UTC
When using avdec_h264 with RTP it's fairly common to see a small amount of packet loss. Unfortunately this causes an error in gstreamer caused by the codec attempting to recover from the error, the video passing through slowly degrades. When this error occurs the first time you usually see a completely grey frame or just an outline of objects in the image on a grey background, the video stream can then recover. Usually if this error happens more than a few time the video stream becomes intermittent with sections of grey frames and sections of paused video. The error you see is: 0:00:41.022060000 [331m23892[00m 0x7fd9040a1b70 [33;01mWARN [00m [00m libav gstavviddec.c:655:gst_ffmpegviddec_get_buffer:<h264decode>[00m already alloc'ed output buffer for frame 0:00:41.022079000 [331m23892[00m 0x7fd9040a1b70 [31;01mERROR [00m [00m libav :0::[00m get_buffer() failed (-1 2 0x0) 0:00:41.022087000 [331m23892[00m 0x7fd9040a1b70 [31;01mERROR [00m [00m libav :0::[00m decode_slice_header error 0:00:41.022096000 [331m23892[00m 0x7fd9040a1b70 [33;01mWARN [00m [00m libav gstavviddec.c:655:gst_ffmpegviddec_get_buffer:<h264decode>[00m already alloc'ed output buffer for frame 0:00:41.022119000 [331m23892[00m 0x7fd9040a1b70 [31;01mERROR [00m [00m libav :0::[00m get_buffer() failed (-1 2 0x0) 0:00:41.022129000 [331m23892[00m 0x7fd9040a1b70 [31;01mERROR [00m [00m libav :0::[00m decode_slice_header error 0:00:41.022134000 [331m23892[00m 0x7fd9040a1b70 [31;01mERROR [00m [00m libav :0::[00m mmco: unref short failure 0:00:41.022149000 [331m23892[00m 0x7fd9040a1b70 [36mINFO [00m [00m libav :0::[00m concealing 396 DC, 396 AC, 396 MV errors I have worked out this is because of some code in the libav h264 decoder that allocates extra frames in order to do processing to help cover off the missing frames in the stream - these frames are not used for any output and I am pretty sure are destroyed before libav returns. gstavviddec.c presumes that the allocation function registered with libav will only be called once per output frame. It is not clear from the API documentation on libav how many times the allocator should be called, and am going to create a patch which allows the allocator to be called more than once per frame. The extra allocation only occurs when there is an error in the stream, so for non-RTP cases it's likely that these issues would be very rarely seen. I am not sure but I think the current system could lead to the original packet being unreffed and then continued to be accessed - which could lead to worse problems. I'll be checking this now as I look into patching.
I can reproduce the same error using: gst-launch-1.0 videotestsrc ! 'video/x-raw, width=320, height=240, framerate=15/1'! x264enc speed-preset=ultrafast ! identity drop-probability=0.01 ! avdec_h264 ! videoconvert ! osxvideosink I think I'm getting comfortable with the best way to patch this, but the patch may remove quite a bit of code - 1 structure and most of the code in gst_ffmpegviddec_get_buffer, gst_ffmpegviddec_reget_buffer and gst_ffmpegviddec_release_buffer as it aligns the API intentions closer. I think that the root cause of the issue was vagueness in the FFMPEG docs leading to an incorrect assumption that one buffer allocation is one frame. Whereas FFMPEG assumes 1 AVFrame is 1 frame and that it's safe if needed to allocate multiple buffers per frame. The FFMPEG code has FIXME in the area of code that causes the actual issue as the assumptions that gst-libav do generally hold (if they didn't FFMPEG code would most likely have horrible performance). As far as I can tell fixing FFMPEG probably wouldn't resolve the need for extra allocations in all cases (but I am no expert) and so actually changing this in gstreamer seems a good idea and may resolve some other issues caused by the mismatch.
I have a patch now that I need to tidy up a little, but it improves loss compensation quite a bit. Running the above pipeline often starts up with poor video, but packet loss later on in the stream leaves most of the stream intact. When running before the patch unchanging parts of the stream are often lost and replaced with grey blocks.
Further testing showed improvement even when drop-probability=0.25 with unmoving parts of the video stabilising quite quickly.
Created attachment 241419 [details] [review] Patch reworking allocation The patch is slightly different to what I was expecting before and doesn't simplify quite so much as I didn't take into account mapping the buffer originally. I expect that allocating the AVFrame for each frame processed is less performant, but seems to be what is expected and ensures that we don't get memory leaks for buffers allocated for each AVFrame. Overall I think this aligns better with the expectations within libav.
I have tested my fix on Ubuntu as well now and seems to improve things.
Review of attachment 241419 [details] [review]: Looks good to me in general, just some comments ::: ext/libav/gstavviddec.c @@ +694,3 @@ + * otherwise it should allocate a new buffer + * and copy across to it (which is what + * avcodec_default_reget_buffer does). Yes, want to fix that in another patch on top of this? :) @@ +1130,3 @@ + picture->opaque = NULL; + GST_DEBUG_OBJECT (ffmpegdec, "Now decoding into picture %p opaque %p", + picture, picture->opaque); Why do you set picture->opaque to NULL here? (And then print picture->opaque) @@ +1213,3 @@ + if (picture != NULL) { + av_free (picture); This doesn't seem to be called in the error cases above (no_output, negotiation_error, no_frame, etc) and is causing a leak
Sebastian - thanks so much for looking at the patch. I hope that I'm helping by submitting these patches - I feel that perhaps I'm taking up quite a bit of people's time on bugs, patches and IRC (especially where I don't understand things yet etc). On the first point (patching reget) - I'll be happy to try and fix the first point at some time, but there are only so many hours in the day and it seemed a minor performance issue. I thinking that rather than fixing this a better patch would be to move to using the new get_buffer2 APIs on libav. I think I just about understand this enough to be able to do that. Would it be helpful if I did that? Are there any plans to move to the new APIs? Second point (printing opaque) - I should have removed picture->opaque from that debug, I was printing it because it should have been NULL but wasn't when libav called into get_buffer really this was just part of me working out how it seemed to be getting opaque back from no where. I'll rework the patch for this. Third point (missing av_free) - I'll think about this I think you are right about the free being mismatched. Good job I've got someone looking at what I'm doing! It may take me a could of days to work through this and get the patch right as I have to work!
(In reply to comment #7) > Sebastian - thanks so much for looking at the patch. I hope that I'm helping > by submitting these patches - I feel that perhaps I'm taking up quite a bit of > people's time on bugs, patches and IRC (especially where I don't understand > things yet etc). That's fine, don't worry about that :) > On the first point (patching reget) - I'll be happy to try and fix the first > point at some time, but there are only so many hours in the day and it seemed a > minor performance issue. I thinking that rather than fixing this a better > patch would be to move to using the new get_buffer2 APIs on libav. I think I > just about understand this enough to be able to do that. Would it be helpful > if I did that? Are there any plans to move to the new APIs? Yes, moving to the new API would be preferred if this already fixes this issue even better :) But please do it in a separate patch. Everything else (plus these 2 other changes mentioned before) can be pushed now already and should be a big improvement already.
Created attachment 243793 [details] [review] Patch reworking allocation - fixed for code review I have now done the reworking. av_free now moved into beach as that is called in all cases and the stupid printing of opaque removed. Sorry I didn't get this done sooner - didn't take long was just finding the time.
Created attachment 243831 [details] [review] 0001-avviddec-Don-t-assume-that-exactly-one-buffer-is-req.patch Patch against latest git master. This fails with h264 here, repeating this all the time: 0:00:03.467107400 20173 0x157cb20 WARN libav gstavviddec.c:622:gst_ffmpegviddec_get_buffer:<avdec_h264-0> Get frame when opaque already set to 0x7f155c12a9d0 on picture 0x7f155c2c8ae0 - cannot allocate buffer 0:00:03.467149917 20173 0x7f154807fb50 ERROR libav :0:: get_buffer() failed (-1 2 (nil)) 0:00:03.467171485 20173 0x7f154807fb50 ERROR libav :0:: decode_slice_header error 0:00:03.467183221 20173 0x7f154807fb50 ERROR libav :0:: no frame! 0:00:03.467206550 20173 0x157cb20 WARN libav gstavviddec.c:1269:gst_ffmpegviddec_frame:<avdec_h264-0> avdec_h264: decoding error (len: -1, have_data: 0)
Patch was tested against 1.0.5 I'll try master, but I did have a number of build issues on master before.
master of gst-libav should build fine against releases of everything else (you have to change the gstreamer requirements in configure.ac from 1.1.0 to 1.0.0 though). Biggest difference is probably the newer libav version, which might be the cause for this.
Latest master looks good apart from bad isn't building. I have run this and it looks like this is caused by the change in libav version. It seems (still investigating) that the new libav copies the opaque value between copies of the same frame meaning I loose the association between the buffer object allocated by gstreamer and the buffer used by libav. I think I could fix it by ignoring the opaque already being set, but I am a little nervous that it could mean some error conditions not being detected and even possibly leaking buffers. I think that I will fix this by moving to get_buffer2, as this allows an opaque to be stored against each buffer which is what I was really trying to do with this patch and I am confident that longer term it's the right move. Unfortunately moving to use get_buffer2 will take a little bit of time (especially with testing).
Any progress on this?
Sorry about the lack of progress - I've been snowed under recently, but it's still on my plan. Sorry for the delays, but I can only work on this when I've got spare time. Thanks for the patience.
Created attachment 257422 [details] [review] Patch that applies to 1.2.0 I have made a new patch appropriate for 1.2.0. Unfortunately my original plan for moving to get_buffer2 cannot be done as that didn't make it into libav 9, but the original patch created for 1.0 stream wouldn't quite work either. For a while I couldn't work out how to make this work as when packet loss is encountered libav now requests multiple buffers for each frame - meaning we have nowhere to store the GstFFMpegVidDecVideoFrame. In the end I realised that the buffers were allocated and deallocated in order so when multiple buffers are requested the current buffer can be stored as a pointer within the newly allocated one. During the release call it's restored. I have tested this and checked that release/get calls are matched and as far as I can tell this is always releasing the memory even during packet loss tests. It's important to note that the libav buffer allocation semantics are slightly different for some of the audio decoding methods, please don't apply the same change for audio.
Created attachment 257439 [details] GST_DEBUG=libav:5 log + gdb (In reply to comment #16) 1231 GST_DEBUG_OBJECT (ffmpegdec, "Assigning allocated buffer to output frame"); 1232 out_frame->output_buffer = gst_buffer_ref (out_dframe->buffer); This crashes for me - out_dframe is set earlier from picture->opaque which is NULL. Please see the log for details. It works from gst-launch though.
(In reply to comment #17) > It works from gst-launch though. Sorry, turns out it depends. This will crash on the same line (1232): gst-launch-1.0 -e videotestsrc ! x264enc ! avdec_h264 ! fakesink
I can see what's wrong stupid little bug that didn't impact any of my test cases as this only impacts when GStreamer cannot do the allocation. I'll fix this.
Created attachment 257481 [details] [review] Patch that applies to 1.2.0 - now works with fakesink Patch with fix for the fakesink issue.
(In reply to comment #18) > (In reply to comment #17) > > It works from gst-launch though. > > Sorry, turns out it depends. This will crash on the same line (1232): > gst-launch-1.0 -e videotestsrc ! x264enc ! avdec_h264 ! fakesink Do you think you could try this new patch? I think it should be good for you. If you do have any further issues please let me know.
Works for me. I think this might cover #689936 too. Finally avdec_h264 is usable again! (like it is in 0.10)
Created attachment 257508 [details] GST_DEBUG=libav:5 failed assertion This can happen on state changes.
It looks like it's most likely that this is a condition that existed before except before so perhaps it should be a separate bug. decoder->priv->output_state should be setup by gst_ffmpegviddec_negotiate which calls gst_video_decoder_set_output_state as required. If the decoder doesn't think the output_state has changed or the output_state changes between negotiate and gst_video_decoder_allocate_output_buffer then this could happen. Does this only happen when it has been decoding and changes state? How can I reproduce it? If there is another bug raised I can try to find some more time to fix. BTW I think there is actually an issue with gst_video_decoder_allocate_output_buffer as you cannot tell the difference between failure to allocate and not negotiated. Since the locking is managed in the base class I don't see how the subclass can be 100% thread safe - but I'd like to defer that to someone else. I'd like to add a new method that has a return value similar to allocate_output_frame and passes back the buffer through an output parameter GstBuffer** or NULL on error. If it did return not negotiated from buffer allocation I'm not sure exactly how to handle it - try to negotiate again in a loop with some sort of limit? However this type of change should be separated unless others think this is caused by this patch IMHO. Does that make sense? Please someone tell me what is the best way to proceed. Thanks!!
(In reply to comment #24) > It looks like it's most likely that this is a condition that existed before > except before so perhaps it should be a separate bug. Not sure, assertion failure started to happen only with the patch applied. > Does this only happen when it has been decoding and changes state? Yes, pipeline changes state like this: NULL -> PLAYING -> NULL -> PLAYING -> assertion failure. > How can I reproduce it? The sequence above should do the trick. I will try to make a test program to trigger this. Dunno about the rest, I hope someone more competent will take a closer look. :)
Created attachment 263622 [details] [review] avviddec: improve buffer handling and semantics There is indeed some misalignment in how _get_buffer etc are handled presently, and something also feels not quite right in _reget_buffer. It also seems it is possible to tweak (only) slightly to obtain a behaviour with more aligned semantics and which also follows the (working fine) 0.10 approach closer. This patch tweaks as such without being too intrusive in existing code, and seems to work fine for limited testing, both for the particular (pipeline) case here, as for regular cases. As such, will commit later on pending possible comments and further testing.
Thanks for looking at my bug - I hope my patches were at least a little help. I haven't been able to test it unfortunately as master currently isn't compiling for me (I corrected a couple of problems and gave up - is this expected? I might be able to get it to compile without -Werr) and the patch fails on 1.2. Is there a version that works on 1.2? I'd like to test the video quality and examine how PLC works with H.264 now - I have been surprised how often we use PLC even with NACKs enabled. From a GStreamer point of view it seems to me that this patch means that you are using allocations on a single GstVideoCodecFrame for multiple AVFrame instances as you get it back with the context->reordered_opaque. I'd like to understand this better - it seems a clever solution. Is this an allowed/accepted pattern of usage? I certainly didn't expect that this sort of thing would be expected API usage as it seemed to make a certain presumption about how the GstVideoCodecFrame worked. I've generally tried to follow the API docs and was concerned that it said "The buffer is owned by the frame and references to the frame instead of the buffer should be kept." so I didn't really want to mess with the buffer unless I had originally allocated myself. In my patch I tried to correct what I saw as another potential issue which was an a buffer allocated could continue to be used by libav even after you've called the avcodec_decode_video2 (until the context has been free'd). I don't think this is broken right now (at least for H264), but I think it could be the case in the future as far as I can tell. I based this on what I saw of the API docs and their code - especially the assumptions around the packet loss concealment code. What would happen if the buffer continued to be accessed? Is the buffer ref'd to stop it being released? I may have gone too far in allocating a unique AVFrame instance per frame as that is not the case with the libav video decode example. Have you thought how this will play with the new get_buffer2 API? I think that gstreamer still needs to upgrade libav to do that, but it seemed to me that when that is done we wouldn't need to have references to pictures etc when we get a release and could delete a bit of the code - just ref the buffer when allocated into the libav buffer and release a ref on the associated buffer when the free callback is called.
Thanks for reporting and looking into this. git master compiles ok for me (except for -bad ATM; needs some Wno-error tweaking unfortunately). Cherry-picking some more recent patches might also have it apply to 1.2. As for the API, even after some time, IMO there is still some tweaking to be done it and is as such not final. Even then, "the buffer is owned by the frame", I would say it is up to the subclass to decide when to hand one to the frame. Also, I would prefer to have (give or take) -- GstFlowReturn gst_video_decoder_alloc_buffer (GstVideoDecoder * decoder, GstBuffer * buffer); (in stead of the 2 current variations) -- which roughly does what _alloc_frame does now, i.e. a slightly convenient/smart wrapper around _acquire_buffer (and alike the old _pad_alloc). Subclass should use that and hand a buffer to a frame when it is _finish_frame time (as is done now here). (*) Regarding concerns about a buffer still accessed; a ref on the buffer is held until _release_buffer is issued (by libav), held either by the GstVideoCodecFrame or by GstFFMpegVidDecVideoFrame directly now with this patch. Have not (yet) looked at get_buffer2 API (not yet in libav version in use it seems), and have kept to rather some small steps for now (so we see where/how we go ;) ). We can indeed get around to other changes later on. (*) Also, I am not even sure about output_frame in the codec struct. The base class should do metadata admin (the timestamp stuff in frame struct), hand data to subclass, and then get data/buffer from subclass along with the subclass pointing to the 'suggested' metadata, i.e. frame. That is the essence, and the API should do that without getting in the way and imposing all sorts of restrictions such as dumping buffers in struct field </rant>
Created attachment 263708 [details] [review] avviddec: improve buffer handling and semantics As before, but even slightly simpler (no modification to _reget_buffer needed).
This should at least fix this, dealing with rants at another time ;) commit 47d5c2b6d699e4cb6608693a1ad9d4af4469e0a7 Author: Mark Nauwelaerts <mnauw@users.sourceforge.net> Date: Sat Dec 7 11:35:09 2013 +0100 avviddec: improve buffer handling and semantics ... so as to focus on providing *a* buffer rather than one (too) tied to a frame, in particular allowing multiple allocations related to a frame. Fixes https://bugzilla.gnome.org/show_bug.cgi?id=697806
I have tested it and all seems good - though Mac build of master still has issues (mainly it seems with building tests). In order to build I had to turn off tests, examples and fatal_warnings. Thanks for the patch it's very much appreciated.
(In reply to comment #31) > I have tested it and all seems good - though Mac build of master still has > issues (mainly it seems with building tests). In order to build I had to turn > off tests, examples and fatal_warnings. Can you create a new bug with details about these? :) Thanks for testing and confirming that these fixes work!