After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 750865 - avviddec: Crashing if resolution changes without corresponding input caps changes
avviddec: Crashing if resolution changes without corresponding input caps cha...
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gst-libav
git master
Other Linux
: Normal critical
: 1.5.90
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2015-06-12 15:51 UTC by Nicola
Modified: 2015-08-16 13:37 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
tentative patch (1.17 KB, patch)
2015-06-12 15:51 UTC, Nicola
none Details | Review
resolution change fix (1.24 KB, patch)
2015-06-15 12:23 UTC, Nicola
none Details | Review
drop wrong frames (1.51 KB, patch)
2015-06-15 14:13 UTC, Nicola
none Details | Review
renegotiate when needed (1.35 KB, patch)
2015-06-15 17:23 UTC, Nicola
none Details | Review
proposed fix (1.86 KB, patch)
2015-06-19 12:25 UTC, Nicola
rejected Details | Review
workaround (1.52 KB, patch)
2015-06-19 12:28 UTC, Nicola
rejected Details | Review

Description Nicola 2015-06-12 15:51:11 UTC
Created attachment 305160 [details] [review]
tentative patch

I have a file that I cannot share that cause a libav segfault, with the attached patch now print this logs instead of crash:

WARN                   libav gstavviddec.c:1167:get_output_buffer:<avdec_h264-0> avoid to call av_picture_copy with wrong parameters: bytewidth 704 src linesize 352
Comment 1 Nicola 2015-06-13 16:57:46 UTC
I forgot to attach the trace

  • #0 __memcpy_avx_unaligned
    from /usr/lib/libc.so.6
  • #1 av_image_copy_plane
    at libavutil/imgutils.c line 261
  • #2 av_image_copy
    at libavutil/imgutils.c line 295
  • #3 av_picture_copy
    at libavcodec/avpicture.c line 122
  • #4 get_output_buffer
    at gstavviddec.c line 1161
  • #5 gst_ffmpegviddec_video_frame
    at gstavviddec.c line 1317
  • #6 gst_ffmpegviddec_frame
    at gstavviddec.c line 1430
  • #7 gst_ffmpegviddec_handle_frame
    at gstavviddec.c line 1540
  • #8 gst_video_decoder_decode_frame
    at gstvideodecoder.c line 3233
  • #9 gst_video_decoder_chain_forward
    at gstvideodecoder.c line 2094
  • #10 gst_video_decoder_chain
    at gstvideodecoder.c line 2396
  • #11 gst_pad_chain_data_unchecked
    at gstpad.c line 4038
  • #12 gst_pad_push_data
    at gstpad.c line 4271
  • #13 gst_pad_push
    at gstpad.c line 4383

Comment 2 Sebastian Dröge (slomo) 2015-06-15 08:07:02 UTC
Comment on attachment 305160 [details] [review]
tentative patch

This should arguably error out completely instead of just doing nothing.


Also linesize is an array and contains the stride for each plane, not just a plain integer. So you probably want to compare GST_VIDEO_FRAME_PLACE_STRIDE(frame, X) and outpic->linesize[X] for all planes


Any idea why it gives us wrong values here? Do you have a way to reproduce it, does it only happen with that single file, which codec and other properties does it have?
Comment 3 Nicola 2015-06-15 08:18:20 UTC
(In reply to Sebastian Dröge (slomo) from comment #2)
> Comment on attachment 305160 [details] [review] [review]
> tentative patch
> 
> This should arguably error out completely instead of just doing nothing.
> 

vlc,mplayer,avplay does not error out and play the whole file

> 
> Also linesize is an array and contains the stride for each plane, not just a
> plain integer. So you probably want to compare
> GST_VIDEO_FRAME_PLACE_STRIDE(frame, X) and outpic->linesize[X] for all planes
> 
> 
> Any idea why it gives us wrong values here? Do you have a way to reproduce
> it, does it only happen with that single file, which codec and other
> properties does it have?

probably because there is a resolution change, for few frames GST_VIDEO_INFO_WIDTH (info) and GST_VIDEO_INFO_HEIGHT (info) report the old resolution while ffmpegdec->picture->width and ffmpegdec->picture->height have the right values, can we use these values at least if we are sending wrong parameters to av_picture_copy?
Comment 4 Sebastian Dröge (slomo) 2015-06-15 08:20:29 UTC
I don't know, you have the testcase not me :)

So you're saying that picture->width/height are correct but picture->linesize is wrong for those? Or is the problem that the resolution has changed and the element does not know about that yet and still wants to allocate something for the old resolution?
Comment 5 Nicola 2015-06-15 08:30:38 UTC
(In reply to Sebastian Dröge (slomo) from comment #4)
> I don't know, you have the testcase not me :)

sorry the file is not sharable :-(

> 
> So you're saying that picture->width/height are correct but
> picture->linesize is wrong for those? Or is the problem that the resolution
> has changed and the element does not know about that yet and still wants to
> allocate something for the old resolution?


outpic = (AVPicture *) ffmpegdec->picture;

has correct values for all properties, 

while 

AVPicture pic 

created by gstreamer based on the values from gst_video_frame_map allocate based on the old resolution for few frames
Comment 6 Sebastian Dröge (slomo) 2015-06-15 08:35:02 UTC
Ok, so we allocate frames for the wrong resolution because we don't know yet that it actually changed? That should be fixed then somehow...
Comment 7 Nicola 2015-06-15 10:09:43 UTC
yes, seems so:

outpic here:

http://cgit.freedesktop.org/gstreamer/gst-libav/tree/ext/libav/gstavviddec.c?id=0fc0584210d4bc50233865e67f4debdb87c8666f#n1159

is good

pic, populated few lines above, use the old values and so av_picture_copy crash
Comment 8 Sebastian Dröge (slomo) 2015-06-15 10:27:39 UTC
IIRC outpic is the one we allocated and want to output currently. av_picture_copy() should probably detect that we ask impossible things here.


Do you know already why we didn't reconfigure yet with the new resolution? Maybe we should trigger reconfiguration at that point there when we notice changes, and reallocate the output buffer.


Maybe http://cgit.freedesktop.org/gstreamer/gst-libav/tree/ext/libav/gstavviddec.c?id=0fc0584210d4bc50233865e67f4debdb87c8666f#n1319 should be called also when other things than interlacing change. Can you test that?
Comment 9 Nicola 2015-06-15 12:23:56 UTC
Created attachment 305286 [details] [review]
resolution change fix

this should be the correct patch, the patch reset

ffmpegdec->context->width and ffmpegdec->context->height when needed, 

after that call 

gst_ffmpegviddec_negotiate 

that call 

update_video_context with force setted to TRUE

that set ffmpegdec->ctx_width and ffmpegdec->ctx_height from context
Comment 10 Sebastian Dröge (slomo) 2015-06-15 12:41:41 UTC
So only the picture is changing here, nothing in the context? Because if the context also has the correct values, you can just call gst_ffmpegviddec_negotiate() all the time. It won't do anything unless something changed (or force==TRUE).


Also we should probably check the picture in http://cgit.freedesktop.org/gstreamer/gst-libav/tree/ext/libav/gstavviddec.c?id=0fc0584210d4bc50233865e67f4debdb87c8666f#n596 too then, not just the context.
Comment 11 Sebastian Dröge (slomo) 2015-06-15 12:42:09 UTC
Also I think changing the values inside the context is not ok, those should only be modified by libav.
Comment 12 Nicola 2015-06-15 13:06:08 UTC
If I change the log line to:

GST_WARNING ("Change in resolution ! picture: %dx%d, recorded: %dx%d, context: %dx%d",
        ffmpegdec->picture->width, ffmpegdec->picture->height,
        ffmpegdec->context->width, ffmpegdec->context->height,ffmpegdec->ctx_width,
        ffmpegdec->ctx_height);

I have these logs:

0:00:01.995888065  8697 0x7fee140734f0 INFO                   libav :0:: Reinit context to 704x576, pix_fmt: 0
0:00:02.003345875  8697      0x204e990 WARN                   libav gstavviddec.c:1337:gst_ffmpegviddec_video_frame: Change in resolution ! picture: 352x288, recorded: 704x576, context: 704x576
0:00:02.005423979  8697      0x204e990 WARN                   libav gstavviddec.c:1337:gst_ffmpegviddec_video_frame: Change in resolution ! picture: 352x288, recorded: 704x576, context: 704x576
0:00:02.006906592  8697      0x204e990 WARN                   libav gstavviddec.c:1337:gst_ffmpegviddec_video_frame: Change in resolution ! picture: 352x288, recorded: 704x576, context: 704x576
0:00:02.120064807  8697 0x7fee08072670 INFO                   libav :0:: Reinit context to 352x288, pix_fmt: 0

so the picture has a resolution different from the one on the context, seems that after libav reinit the context with the new resolution, gst send some frames with old resolution. max-threads=1 make no difference. Am I reading these logs correctly?
Comment 13 Nicola 2015-06-15 14:13:46 UTC
Created attachment 305306 [details] [review]
drop wrong frames

this should be the right way to drop these wrong frames. I see no better solution for now. In vlc I see lost frames too playing the same file
Comment 14 Nicolas Dufresne (ndufresne) 2015-06-15 14:32:40 UTC
How can there be a resolution change without h264parse telling the decoder ?
Comment 15 Nicolas Dufresne (ndufresne) 2015-06-15 14:36:25 UTC
Review of attachment 305306 [details] [review]:

Dropping frames if the stream is valid does not make sense. If the stream is not valid, then maybe, but without providing us a test case, we can't validate your observation, the bug could still be elsewhere.
Comment 16 Nicola 2015-06-15 17:23:08 UTC
Created attachment 305329 [details] [review]
renegotiate when needed

here is a file that reproduce part of the problem:

http://195.250.34.59/temp/750865.mkv

if you use this pipeline:

filesrc ... ! matroskademux ! h264parse (you can omit) ! avdec_h264 max-threads=0 ! xvimagesink sync=false

you'll get a crash, this time av context has correct values while gstreamer not, this will work with max-threads=1, attached patch fix the problem.

I'll try to produce a file that reproduce the initial issue without sensitive content too (with my other test file av context resolution does not match the picture one)
Comment 17 Sebastian Dröge (slomo) 2015-06-15 17:56:22 UTC
Is only the last patch needed, or both? They are almost the same, except that with the first patch you also set the context fields.

The last one makes sense to me, the first one: setting context fields seems like working around libav bugs.
Comment 18 Nicola 2015-06-15 20:12:07 UTC
(In reply to Sebastian Dröge (slomo) from comment #17)
> Is only the last patch needed, or both? They are almost the same, except
> that with the first patch you also set the context fields.
> 
> The last one makes sense to me, the first one: setting context fields seems
> like working around libav bugs.

for the sample file only the last one is nedeed for other files I have locally is needed the first one too, I'll try to produce a test file without sensible content that need the first patch too
Comment 19 Nicola 2015-06-16 14:17:27 UTC
for now I'm not able to find another file that need the first patch, about the second one:

I have a 1920x1080 file, so height is not divisible bt 16, on windows only I have context height different from picture height for every frame (1080 vs 1088), so comparing the heights must consider this or a renegotiation is triggered for every frame. GStreamer has something builtin for properly compare heights or have I to add this check?
Comment 20 Nicolas Dufresne (ndufresne) 2015-06-16 14:46:50 UTC
All this is very suspicious. Note that the patch (with the context hack) seems to make sense to me if not racy. Though, we are using parsers to effectively avoid this situation. To me the bug is that the resolution change is unnoticed by matroskademux and h264parse. If that file is invalid, it could be explained by matroskademux fooling h264parse. If the stream is valid, then I suspect a bug in the demuxer. If the demuxer don't know, there should be a way to ensure h264parse will catch it up. In GStreamer we rarely rely on decoder to tell us when the resolution have changed (because it's often racy with threading).
Comment 21 Nicola 2015-06-16 15:01:36 UTC
Nicolas, I don't know if this add some more infos, however if you do something like this with the provided file:

filesrc location= /tmp/750865.mkv ! matroskademux ! gdppay ! filesink location=/tmp/test.gdp

and then (with or without the parser)

filesrc location= /tmp/test.gdp ! gdpdepay ! h264parse ! avdec_h264 ! xvimagesink sync=false

you'll reproduce the crash without matroskademux too.

I suspect there is a race condition too and maybe is related to this commit

https://bugzilla.gnome.org/show_bug.cgi?id=726020

however I have no time to check now. I'll try to find more time this weekend, for now in my branch I use the latest two patches without the check for heights that cause problem with some files on windows. This way we are able to handle all the types of files on both linux and windows
Comment 22 Nicola 2015-06-19 12:25:55 UTC
Created attachment 305693 [details] [review]
proposed fix

the problem can be reproduced with the given file enabling multithread decoding.

The roundup to 16 is to avoid to renegotiate for different height such as 1080 vs 1088
Comment 23 Nicola 2015-06-19 12:28:32 UTC
Created attachment 305694 [details] [review]
workaround

this is a workaround for a bug that seems to be in libav.

Probably we should try to remove the deprecated libav api get_buffer/reget_buffer and replace them with get_buffer2 that is not deprecated and try if the bug go away
Comment 24 Nicola 2015-06-19 13:16:35 UTC
using the file here:

http://195.250.34.59/temp/750865_1.mkv

with this pipeline:

filesrc ! matroskademux ! avdec_h264 max-threads=1 ! xvimagesink sync=false

you can reproduce the bug solved by the workaround patch. It does not happen always and is really difficult to reproduce if you replace xvimagesink with fakesink
Comment 25 Sebastian Dröge (slomo) 2015-06-22 16:39:39 UTC
Did you report this to libav people?
Comment 26 Sebastian Dröge (slomo) 2015-06-22 16:45:55 UTC
Review of attachment 305693 [details] [review]:

::: ext/libav/gstavviddec.c
@@ +1323,3 @@
+  if (G_UNLIKELY (ffmpegdec->picture->width != ffmpegdec->ctx_width
+          || GST_ROUND_UP_16 (ffmpegdec->picture->height) !=
+          GST_ROUND_UP_16 (ffmpegdec->ctx_height))) {

How does it end up being 1080 vs 1088? And in which order, is the picture or the context information bigger?

It doesn't make much sense in any case, libav should tell us somewhere how many lines have to be cropped in the end... and we should use that information instead of arbitrarily rounding here.
Comment 27 Nicola 2015-06-22 19:48:33 UTC
(In reply to Sebastian Dröge (slomo) from comment #25)
> Did you report this to libav people?

the problem is that avplay does not crash and handle the size change without problems
Comment 28 Nicola 2015-06-22 19:50:21 UTC
(In reply to Sebastian Dröge (slomo) from comment #26)
> Review of attachment 305693 [details] [review] [review]:
> 
> ::: ext/libav/gstavviddec.c
> @@ +1323,3 @@
> +  if (G_UNLIKELY (ffmpegdec->picture->width != ffmpegdec->ctx_width
> +          || GST_ROUND_UP_16 (ffmpegdec->picture->height) !=
> +          GST_ROUND_UP_16 (ffmpegdec->ctx_height))) {
> 
> How does it end up being 1080 vs 1088? And in which order, is the picture or
> the context information bigger?

I don't remember exactly (probably the picture), I'll retest tommorrow to be sure

> 
> It doesn't make much sense in any case, libav should tell us somewhere how
> many lines have to be cropped in the end... and we should use that
> information instead of arbitrarily rounding here.

It happens only on windows, on linux the reported sizes are consistent
Comment 29 Nicola 2015-06-23 09:11:10 UTC
(In reply to Sebastian Dröge (slomo) from comment #26)
> Review of attachment 305693 [details] [review] [review]:
> 
> ::: ext/libav/gstavviddec.c
> @@ +1323,3 @@
> +  if (G_UNLIKELY (ffmpegdec->picture->width != ffmpegdec->ctx_width
> +          || GST_ROUND_UP_16 (ffmpegdec->picture->height) !=
> +          GST_ROUND_UP_16 (ffmpegdec->ctx_height))) {
> 
> How does it end up being 1080 vs 1088? And in which order, is the picture or
> the context information bigger?

picture: 1920x1088, context size: 1920x1080, recorded size: 1920x1080

so is the picture that reports 1088. As noted this happen on windows but non on linux

> 
> It doesn't make much sense in any case, libav should tell us somewhere how
> many lines have to be cropped in the end... and we should use that
> information instead of arbitrarily rounding here.
Comment 30 Sebastian Dröge (slomo) 2015-06-23 09:50:11 UTC
This problem also happens with this stream (on Linux!): http://dash.edgesuite.net/dash264/TestCasesHEVC/2b/2/BBB_Live_HEVC_MultiRes.mpd

It switches between 1920x1080 and 1280x720 all the time.
Comment 31 Sebastian Dröge (slomo) 2015-06-24 10:38:25 UTC
So what happens here is that for some reason parser or whatever is not updating the caps when the size changes, plus libav detects the size change and requests renegotiation in get_buffer() while some old sized buffers are still pending for a while.

The renegotiation in get_buffer() is a bit troubling, it will always be wrong (chances are slim that we only get buffers with the new size as decoded output) but we have to do it to allocate proper sized buffers.
Comment 32 Nicola 2015-06-24 10:47:46 UTC
(In reply to Sebastian Dröge (slomo) from comment #31)
> So what happens here is that for some reason parser or whatever is not
> updating the caps when the size changes, plus libav detects the size change
> and requests renegotiation in get_buffer() while some old sized buffers are
> still pending for a while.
> 
> The renegotiation in get_buffer() is a bit troubling, it will always be
> wrong (chances are slim that we only get buffers with the new size as
> decoded output) but we have to do it to allocate proper sized buffers.

yes the reviewed patch should handle this, however we need to decide how to handle the size mismatch between picture and context that can trigger a renegotiation for each buffer (for example the case 1080 vs 1088 that is not a real size change)
Comment 33 Sebastian Dröge (slomo) 2015-06-24 10:49:50 UTC
Yeah that size mismatch is nothing we can handle properly, I have an idea for another fix. Testing now
Comment 34 Sebastian Dröge (slomo) 2015-06-24 10:57:04 UTC
FWIW, the coded_width/height in the context can be used for comparing with the picture.
Comment 35 Nicola 2015-06-24 11:00:20 UTC
(In reply to Sebastian Dröge (slomo) from comment #34)
> FWIW, the coded_width/height in the context can be used for comparing with
> the picture.

yes, my workaround patch (has a typo in the heigth check) check for this, but I'll wait for your fix, will be better than mine for sure :-)
Comment 36 Sebastian Dröge (slomo) 2015-06-24 11:01:25 UTC
When using coded_width/height, the ROUND_UP_16 should be unnecessary. Is it still necessary for you?
Comment 37 Nicola 2015-06-24 11:20:33 UTC
if you check ffmpegdec->ctx_width with ffmpegdec->context->width and renegotiate if different the roundup is not needed, it should fix the multithread decoding for the first linked file and should continue to crash with the second one
Comment 38 Sebastian Dröge (slomo) 2015-06-24 11:22:32 UTC
And picture->width is always the same as context->coded_width? No rounding needed there?
Comment 39 Sebastian Dröge (slomo) 2015-06-24 11:23:04 UTC
I mean it's the same unless it is really different by more than rounding :)
Comment 40 Nicola 2015-06-24 11:26:29 UTC
(In reply to Sebastian Dröge (slomo) from comment #39)
> I mean it's the same unless it is really different by more than rounding :)

yes, as said in comment 21, we are using a patch that check only for width in gst-libav distributed with our application
Comment 41 Sebastian Dröge (slomo) 2015-06-24 15:29:13 UTC
OK so it seems that we can't trust the context values at all that we get in get_buffer() and should always look at the frame only. That was probably added together with the addition of get_buffer2(). Looking into porting that...
Comment 42 Sebastian Dröge (slomo) 2015-06-26 14:59:42 UTC
commit 8cb8461f7e3c5ad0162e7caaca120f6b6577489a
Author: Sebastian Dröge <sebastian@centricular.com>
Date:   Fri Jun 26 16:50:16 2015 +0200

    avviddec: Wrap the original AVBufferRef in our own buffer for the destroy function
    
    Just adding a dummy buffer at the very end might not be enough as there
    already might be too many buffers.

commit 9b2c0c2dbcda3cd08aee98bc4bfdaabb1db11c63
Author: Sebastian Dröge <sebastian@centricular.com>
Date:   Fri Jun 26 15:38:38 2015 +0200

    avviddec: libav will already copy the reordered_opaque pointer for us
    
    If we do it ourselves, it might get the wrong value if our assumptions are
    broken by libav at a later time.

commit 5b6c8ee8c96d773850cc213be00b5f43f60dbd55
Author: Sebastian Dröge <sebastian@centricular.com>
Date:   Fri Jun 26 15:34:30 2015 +0200

    avviddec: Negotiate based on the AVFrame information, not the context information
    
    The context contains the information from the latest input frame, we're
    however interested in the information from the latest output frame. As we have
    to negotiate for the buffer that is about to come next.
    
    This should fix some crashes that happened when both information got out of
    sync. If that happens now, we will do fallback allocation until the output
    is renegotiated too.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=750865
Comment 43 Nicola 2015-06-26 15:20:36 UTC
works fine, thanks!
Comment 44 Sebastian Dröge (slomo) 2015-06-28 10:47:29 UTC
Also ported everything to get_buffer2() now, got rid of some deprecated API usage and did the same for the audio decoder.
Comment 45 Nicola 2015-06-29 04:16:05 UTC
great work Sebastian!