GNOME Bugzilla – Bug 750865
avviddec: Crashing if resolution changes without corresponding input caps changes
Last modified: 2015-08-16 13:37:27 UTC
Created attachment 305160 [details] [review] tentative patch I have a file that I cannot share that cause a libav segfault, with the attached patch now print this logs instead of crash: WARN libav gstavviddec.c:1167:get_output_buffer:<avdec_h264-0> avoid to call av_picture_copy with wrong parameters: bytewidth 704 src linesize 352
I forgot to attach the trace
+ Trace 235164
Comment on attachment 305160 [details] [review] tentative patch This should arguably error out completely instead of just doing nothing. Also linesize is an array and contains the stride for each plane, not just a plain integer. So you probably want to compare GST_VIDEO_FRAME_PLACE_STRIDE(frame, X) and outpic->linesize[X] for all planes Any idea why it gives us wrong values here? Do you have a way to reproduce it, does it only happen with that single file, which codec and other properties does it have?
(In reply to Sebastian Dröge (slomo) from comment #2) > Comment on attachment 305160 [details] [review] [review] > tentative patch > > This should arguably error out completely instead of just doing nothing. > vlc,mplayer,avplay does not error out and play the whole file > > Also linesize is an array and contains the stride for each plane, not just a > plain integer. So you probably want to compare > GST_VIDEO_FRAME_PLACE_STRIDE(frame, X) and outpic->linesize[X] for all planes > > > Any idea why it gives us wrong values here? Do you have a way to reproduce > it, does it only happen with that single file, which codec and other > properties does it have? probably because there is a resolution change, for few frames GST_VIDEO_INFO_WIDTH (info) and GST_VIDEO_INFO_HEIGHT (info) report the old resolution while ffmpegdec->picture->width and ffmpegdec->picture->height have the right values, can we use these values at least if we are sending wrong parameters to av_picture_copy?
I don't know, you have the testcase not me :) So you're saying that picture->width/height are correct but picture->linesize is wrong for those? Or is the problem that the resolution has changed and the element does not know about that yet and still wants to allocate something for the old resolution?
(In reply to Sebastian Dröge (slomo) from comment #4) > I don't know, you have the testcase not me :) sorry the file is not sharable :-( > > So you're saying that picture->width/height are correct but > picture->linesize is wrong for those? Or is the problem that the resolution > has changed and the element does not know about that yet and still wants to > allocate something for the old resolution? outpic = (AVPicture *) ffmpegdec->picture; has correct values for all properties, while AVPicture pic created by gstreamer based on the values from gst_video_frame_map allocate based on the old resolution for few frames
Ok, so we allocate frames for the wrong resolution because we don't know yet that it actually changed? That should be fixed then somehow...
yes, seems so: outpic here: http://cgit.freedesktop.org/gstreamer/gst-libav/tree/ext/libav/gstavviddec.c?id=0fc0584210d4bc50233865e67f4debdb87c8666f#n1159 is good pic, populated few lines above, use the old values and so av_picture_copy crash
IIRC outpic is the one we allocated and want to output currently. av_picture_copy() should probably detect that we ask impossible things here. Do you know already why we didn't reconfigure yet with the new resolution? Maybe we should trigger reconfiguration at that point there when we notice changes, and reallocate the output buffer. Maybe http://cgit.freedesktop.org/gstreamer/gst-libav/tree/ext/libav/gstavviddec.c?id=0fc0584210d4bc50233865e67f4debdb87c8666f#n1319 should be called also when other things than interlacing change. Can you test that?
Created attachment 305286 [details] [review] resolution change fix this should be the correct patch, the patch reset ffmpegdec->context->width and ffmpegdec->context->height when needed, after that call gst_ffmpegviddec_negotiate that call update_video_context with force setted to TRUE that set ffmpegdec->ctx_width and ffmpegdec->ctx_height from context
So only the picture is changing here, nothing in the context? Because if the context also has the correct values, you can just call gst_ffmpegviddec_negotiate() all the time. It won't do anything unless something changed (or force==TRUE). Also we should probably check the picture in http://cgit.freedesktop.org/gstreamer/gst-libav/tree/ext/libav/gstavviddec.c?id=0fc0584210d4bc50233865e67f4debdb87c8666f#n596 too then, not just the context.
Also I think changing the values inside the context is not ok, those should only be modified by libav.
If I change the log line to: GST_WARNING ("Change in resolution ! picture: %dx%d, recorded: %dx%d, context: %dx%d", ffmpegdec->picture->width, ffmpegdec->picture->height, ffmpegdec->context->width, ffmpegdec->context->height,ffmpegdec->ctx_width, ffmpegdec->ctx_height); I have these logs: 0:00:01.995888065 8697 0x7fee140734f0 INFO libav :0:: Reinit context to 704x576, pix_fmt: 0 0:00:02.003345875 8697 0x204e990 WARN libav gstavviddec.c:1337:gst_ffmpegviddec_video_frame: Change in resolution ! picture: 352x288, recorded: 704x576, context: 704x576 0:00:02.005423979 8697 0x204e990 WARN libav gstavviddec.c:1337:gst_ffmpegviddec_video_frame: Change in resolution ! picture: 352x288, recorded: 704x576, context: 704x576 0:00:02.006906592 8697 0x204e990 WARN libav gstavviddec.c:1337:gst_ffmpegviddec_video_frame: Change in resolution ! picture: 352x288, recorded: 704x576, context: 704x576 0:00:02.120064807 8697 0x7fee08072670 INFO libav :0:: Reinit context to 352x288, pix_fmt: 0 so the picture has a resolution different from the one on the context, seems that after libav reinit the context with the new resolution, gst send some frames with old resolution. max-threads=1 make no difference. Am I reading these logs correctly?
Created attachment 305306 [details] [review] drop wrong frames this should be the right way to drop these wrong frames. I see no better solution for now. In vlc I see lost frames too playing the same file
How can there be a resolution change without h264parse telling the decoder ?
Review of attachment 305306 [details] [review]: Dropping frames if the stream is valid does not make sense. If the stream is not valid, then maybe, but without providing us a test case, we can't validate your observation, the bug could still be elsewhere.
Created attachment 305329 [details] [review] renegotiate when needed here is a file that reproduce part of the problem: http://195.250.34.59/temp/750865.mkv if you use this pipeline: filesrc ... ! matroskademux ! h264parse (you can omit) ! avdec_h264 max-threads=0 ! xvimagesink sync=false you'll get a crash, this time av context has correct values while gstreamer not, this will work with max-threads=1, attached patch fix the problem. I'll try to produce a file that reproduce the initial issue without sensitive content too (with my other test file av context resolution does not match the picture one)
Is only the last patch needed, or both? They are almost the same, except that with the first patch you also set the context fields. The last one makes sense to me, the first one: setting context fields seems like working around libav bugs.
(In reply to Sebastian Dröge (slomo) from comment #17) > Is only the last patch needed, or both? They are almost the same, except > that with the first patch you also set the context fields. > > The last one makes sense to me, the first one: setting context fields seems > like working around libav bugs. for the sample file only the last one is nedeed for other files I have locally is needed the first one too, I'll try to produce a test file without sensible content that need the first patch too
for now I'm not able to find another file that need the first patch, about the second one: I have a 1920x1080 file, so height is not divisible bt 16, on windows only I have context height different from picture height for every frame (1080 vs 1088), so comparing the heights must consider this or a renegotiation is triggered for every frame. GStreamer has something builtin for properly compare heights or have I to add this check?
All this is very suspicious. Note that the patch (with the context hack) seems to make sense to me if not racy. Though, we are using parsers to effectively avoid this situation. To me the bug is that the resolution change is unnoticed by matroskademux and h264parse. If that file is invalid, it could be explained by matroskademux fooling h264parse. If the stream is valid, then I suspect a bug in the demuxer. If the demuxer don't know, there should be a way to ensure h264parse will catch it up. In GStreamer we rarely rely on decoder to tell us when the resolution have changed (because it's often racy with threading).
Nicolas, I don't know if this add some more infos, however if you do something like this with the provided file: filesrc location= /tmp/750865.mkv ! matroskademux ! gdppay ! filesink location=/tmp/test.gdp and then (with or without the parser) filesrc location= /tmp/test.gdp ! gdpdepay ! h264parse ! avdec_h264 ! xvimagesink sync=false you'll reproduce the crash without matroskademux too. I suspect there is a race condition too and maybe is related to this commit https://bugzilla.gnome.org/show_bug.cgi?id=726020 however I have no time to check now. I'll try to find more time this weekend, for now in my branch I use the latest two patches without the check for heights that cause problem with some files on windows. This way we are able to handle all the types of files on both linux and windows
Created attachment 305693 [details] [review] proposed fix the problem can be reproduced with the given file enabling multithread decoding. The roundup to 16 is to avoid to renegotiate for different height such as 1080 vs 1088
Created attachment 305694 [details] [review] workaround this is a workaround for a bug that seems to be in libav. Probably we should try to remove the deprecated libav api get_buffer/reget_buffer and replace them with get_buffer2 that is not deprecated and try if the bug go away
using the file here: http://195.250.34.59/temp/750865_1.mkv with this pipeline: filesrc ! matroskademux ! avdec_h264 max-threads=1 ! xvimagesink sync=false you can reproduce the bug solved by the workaround patch. It does not happen always and is really difficult to reproduce if you replace xvimagesink with fakesink
Did you report this to libav people?
Review of attachment 305693 [details] [review]: ::: ext/libav/gstavviddec.c @@ +1323,3 @@ + if (G_UNLIKELY (ffmpegdec->picture->width != ffmpegdec->ctx_width + || GST_ROUND_UP_16 (ffmpegdec->picture->height) != + GST_ROUND_UP_16 (ffmpegdec->ctx_height))) { How does it end up being 1080 vs 1088? And in which order, is the picture or the context information bigger? It doesn't make much sense in any case, libav should tell us somewhere how many lines have to be cropped in the end... and we should use that information instead of arbitrarily rounding here.
(In reply to Sebastian Dröge (slomo) from comment #25) > Did you report this to libav people? the problem is that avplay does not crash and handle the size change without problems
(In reply to Sebastian Dröge (slomo) from comment #26) > Review of attachment 305693 [details] [review] [review]: > > ::: ext/libav/gstavviddec.c > @@ +1323,3 @@ > + if (G_UNLIKELY (ffmpegdec->picture->width != ffmpegdec->ctx_width > + || GST_ROUND_UP_16 (ffmpegdec->picture->height) != > + GST_ROUND_UP_16 (ffmpegdec->ctx_height))) { > > How does it end up being 1080 vs 1088? And in which order, is the picture or > the context information bigger? I don't remember exactly (probably the picture), I'll retest tommorrow to be sure > > It doesn't make much sense in any case, libav should tell us somewhere how > many lines have to be cropped in the end... and we should use that > information instead of arbitrarily rounding here. It happens only on windows, on linux the reported sizes are consistent
(In reply to Sebastian Dröge (slomo) from comment #26) > Review of attachment 305693 [details] [review] [review]: > > ::: ext/libav/gstavviddec.c > @@ +1323,3 @@ > + if (G_UNLIKELY (ffmpegdec->picture->width != ffmpegdec->ctx_width > + || GST_ROUND_UP_16 (ffmpegdec->picture->height) != > + GST_ROUND_UP_16 (ffmpegdec->ctx_height))) { > > How does it end up being 1080 vs 1088? And in which order, is the picture or > the context information bigger? picture: 1920x1088, context size: 1920x1080, recorded size: 1920x1080 so is the picture that reports 1088. As noted this happen on windows but non on linux > > It doesn't make much sense in any case, libav should tell us somewhere how > many lines have to be cropped in the end... and we should use that > information instead of arbitrarily rounding here.
This problem also happens with this stream (on Linux!): http://dash.edgesuite.net/dash264/TestCasesHEVC/2b/2/BBB_Live_HEVC_MultiRes.mpd It switches between 1920x1080 and 1280x720 all the time.
So what happens here is that for some reason parser or whatever is not updating the caps when the size changes, plus libav detects the size change and requests renegotiation in get_buffer() while some old sized buffers are still pending for a while. The renegotiation in get_buffer() is a bit troubling, it will always be wrong (chances are slim that we only get buffers with the new size as decoded output) but we have to do it to allocate proper sized buffers.
(In reply to Sebastian Dröge (slomo) from comment #31) > So what happens here is that for some reason parser or whatever is not > updating the caps when the size changes, plus libav detects the size change > and requests renegotiation in get_buffer() while some old sized buffers are > still pending for a while. > > The renegotiation in get_buffer() is a bit troubling, it will always be > wrong (chances are slim that we only get buffers with the new size as > decoded output) but we have to do it to allocate proper sized buffers. yes the reviewed patch should handle this, however we need to decide how to handle the size mismatch between picture and context that can trigger a renegotiation for each buffer (for example the case 1080 vs 1088 that is not a real size change)
Yeah that size mismatch is nothing we can handle properly, I have an idea for another fix. Testing now
FWIW, the coded_width/height in the context can be used for comparing with the picture.
(In reply to Sebastian Dröge (slomo) from comment #34) > FWIW, the coded_width/height in the context can be used for comparing with > the picture. yes, my workaround patch (has a typo in the heigth check) check for this, but I'll wait for your fix, will be better than mine for sure :-)
When using coded_width/height, the ROUND_UP_16 should be unnecessary. Is it still necessary for you?
if you check ffmpegdec->ctx_width with ffmpegdec->context->width and renegotiate if different the roundup is not needed, it should fix the multithread decoding for the first linked file and should continue to crash with the second one
And picture->width is always the same as context->coded_width? No rounding needed there?
I mean it's the same unless it is really different by more than rounding :)
(In reply to Sebastian Dröge (slomo) from comment #39) > I mean it's the same unless it is really different by more than rounding :) yes, as said in comment 21, we are using a patch that check only for width in gst-libav distributed with our application
OK so it seems that we can't trust the context values at all that we get in get_buffer() and should always look at the frame only. That was probably added together with the addition of get_buffer2(). Looking into porting that...
commit 8cb8461f7e3c5ad0162e7caaca120f6b6577489a Author: Sebastian Dröge <sebastian@centricular.com> Date: Fri Jun 26 16:50:16 2015 +0200 avviddec: Wrap the original AVBufferRef in our own buffer for the destroy function Just adding a dummy buffer at the very end might not be enough as there already might be too many buffers. commit 9b2c0c2dbcda3cd08aee98bc4bfdaabb1db11c63 Author: Sebastian Dröge <sebastian@centricular.com> Date: Fri Jun 26 15:38:38 2015 +0200 avviddec: libav will already copy the reordered_opaque pointer for us If we do it ourselves, it might get the wrong value if our assumptions are broken by libav at a later time. commit 5b6c8ee8c96d773850cc213be00b5f43f60dbd55 Author: Sebastian Dröge <sebastian@centricular.com> Date: Fri Jun 26 15:34:30 2015 +0200 avviddec: Negotiate based on the AVFrame information, not the context information The context contains the information from the latest input frame, we're however interested in the information from the latest output frame. As we have to negotiate for the buffer that is about to come next. This should fix some crashes that happened when both information got out of sync. If that happens now, we will do fallback allocation until the output is renegotiated too. https://bugzilla.gnome.org/show_bug.cgi?id=750865
works fine, thanks!
Also ported everything to get_buffer2() now, got rid of some deprecated API usage and did the same for the audio decoder.
great work Sebastian!