GNOME Bugzilla – Bug 754120
avdec_hevc: Segfault in hevc decode with glimagesink and AVX2 CPU feature
Last modified: 2015-09-15 15:46:47 UTC
Created attachment 310033 [details] sample h265 stream gst-launch-1.0 filesrc location= SLIST_D_Sony_9.bin ! h265parse ! avdec_h265 ! glimagesink There are other samples too which are failing.
While comparing setup, Tim found that it crash in a routine using AVX2 extension. It makes a lot of sense that I can't reproduce as this routine is not called (I have avx, but not avx2 here). Looks like it's i7 specific crash.
The big question now: does it also crash when using ffplay (from the same ffmpeg version!) on this file?
Crashes here, but only when using glimagesink, not when using xvimagesink. Also not when using videoconvert to convert to e.g. RGBA and then glimagesink. So I would assume it's something special to glimagesink here, maybe alignment isn't properly handled. ffplay does not crash.
AVX2 enabled CPU owner will have to debug this, or provide me with more information as I don't own such a PC (sorry).
I forgot to mention, but with glimagesink, we have one buffer per plane, which is less forgiving then 1 allocation for all planes (like ffmpeg itself and most of our sink would do).
Created attachment 310157 [details] [review] avvidec: fix crash in AVX2-based H.265 decoder code path with ximagesink/glimagesink This fixes it for me.
Waiting for sree to confirm. I don't fully understand why this is required though, it might just be coincidence that this fixes it. I didn't find anything that says that 128-bit alignment is required for vinserti128. It crashes here: ff_hevc_transform_add32_8_avx2 () at libavcodec/x86/hevc_res_add.asm:186 186 TR_ADD_SSE_16_32_8 0, r0, r0+r2 which is defined as: %macro TR_ADD_SSE_16_32_8 3 mova xm2, [r1+%1 ] mova xm6, [r1+%1+16] %if cpuflag(avx2) vinserti128 m2, m2, [r1+%1+32], 1 vinserti128 m6, m6, [r1+%1+48], 1 %endif %if cpuflag(avx) psubw m1, m0, m2 psubw m5, m0, m6 %else mova m1, m0 mova m5, m0 psubw m1, m2 psubw m5, m6 %endif packuswb m2, m6 packuswb m1, m5 mova xm4, [r1+%1+mmsize*2 ] mova xm6, [r1+%1+mmsize*2+16] %if cpuflag(avx2) vinserti128 m4, m4, [r1+%1+96 ], 1 vinserti128 m6, m6, [r1+%1+112], 1 %endif %if cpuflag(avx) psubw m3, m0, m4 psubw m5, m0, m6 %else mova m3, m0 mova m5, m0 psubw m3, m4 psubw m5, m6 %endif packuswb m4, m6 packuswb m3, m5 paddusb m2, [%2] paddusb m4, [%3] psubusb m2, m1 psubusb m4, m3 mova [%2], m2 mova [%3], m4 %endmacro
I would expect an instruction that works on 128 bit data requires the data to be aligned to 128 bits :)
Even then, that's 16 bytes (what we currently use for alignment), not 32 bytes. Need to dig a bit more.
Tested, and it is working fine with vaapisink and glimagesink... I am not sure about the alignment issue , but what I can see is that, all Intel platforms >= IVB are using 128 bit alignment in vaapi-intel-driver..
Review of attachment 310157 [details] [review]: Looks fine to me. Arguably the problem should be reported to ffmpeg, as the align2 function should deal with that.
I'm not really convinced yet that this is a proper fix and not just something that makes it work by coincidence.
Did some more poking around. Does not seem to be threading related (setting max-threads=1 and adding locking to the function doesn't help. If I comment out the two 'mova' at the end of TR_ADD_SSE_16_32_8 it works: http://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavcodec/x86/hevc_res_add.asm;hb=ed450d4acfc9fea0dae2df2b0a543ec0d602d31a#l132 Can't see any obvious alignment issues with the pointers passed to it, judging from printfs (and it doesn't even seem to require 16-byte alignment).
Any news here, new insights?
Not really. I am not sure if the alignment thing is not a red herring. This also seems to "fix" it for me: @@ -700,7 +700,7 @@ gst_ffmpegviddec_ensure_internal_pool (GstFFMpegVidDec * ffmpegdec, config = gst_buffer_pool_get_config (ffmpegdec->internal_pool); caps = gst_video_info_to_caps (&info); - gst_buffer_pool_config_set_params (config, caps, info.size, 2, 0); + gst_buffer_pool_config_set_params (config, caps, info.size, 10, 0); gst_buffer_pool_config_set_allocator (config, NULL, ¶ms); gst_buffer_pool_config_add_option (config, GST_BUFFER_POOL_OPTION_VIDEO_META); @@ -805,10 +805,8 @@ gst_ffmpegviddec_get_buffer2 (AVCodecContext * context, AVFrame * picture, if (c < GST_VIDEO_INFO_N_PLANES (&ffmpegdec->pool_info)) { picture->data[c] = GST_VIDEO_FRAME_PLANE_DATA (&dframe->vframe, c); picture->linesize[c] = GST_VIDEO_FRAME_PLANE_STRIDE (&dframe->vframe, c); - if (ffmpegdec->stride[c] == -1) ffmpegdec->stride[c] = picture->linesize[c]; - /* libav does not allow stride changes, decide allocation should check * before replacing the internal pool with a downstream pool. * https://bugzilla.gnome.org/show_bug.cgi?id=704769 @@ -1802,7 +1800,7 @@ gst_ffmpegviddec_decide_allocation (GstVideoDecoder * decoder, GstQuery * query) } config = gst_buffer_pool_get_config (pool); - gst_buffer_pool_config_set_params (config, state->caps, size, min, max); + gst_buffer_pool_config_set_params (config, state->caps, size, MAX (10, min), max); gst_buffer_pool_config_set_allocator (config, allocator, ¶ms); have_videometa = @@ -1873,7 +1871,7 @@ gst_ffmpegviddec_decide_allocation (GstVideoDecoder * decoder, GstQuery * query) gst_object_unref (pool); pool = gst_video_buffer_pool_new (); config = gst_buffer_pool_get_config (pool); - gst_buffer_pool_config_set_params (config, state->caps, size, min, max); + gst_buffer_pool_config_set_params (config, state->caps, size, MAX (10, min), max); gst_buffer_pool_config_set_allocator (config, NULL, ¶ms); gst_buffer_pool_set_config (pool, config); update_pool = TRUE; Not sure what to conclude from that :)
I'm really surprise of that one, he answer might be in the way the libc heap is implemented. Allocating them all together (instead of on-demand) might result in having consecutive memory where it simply don't crash if you read passed the buffer boundary. Other then that, that pool can create buffer as needed, so it should have no impact.
Or the GstVideoAlignment stuff in glimagesink is not 100% correct yet. IIRC Tim said that it only fails with glimagesink but not with xvimagesink?
(In reply to Sebastian Dröge (slomo) from comment #17) > Or the GstVideoAlignment stuff in glimagesink is not 100% correct yet. IIRC > Tim said that it only fails with glimagesink but not with xvimagesink? I was trying to say this can be explained by xvimagesink having 1 big buffer allocated, while glimagesink have N independent allocations. If you overshoot xvimagesink allocation, you will generally not crash, but with glimagesink, if you fall out of the last page, you are likely to crash. Some SIMD instructions can confuse valgrind, so I would not trust it here. Another fact, is that xvimagesink (and generic videopool) will add some extra padding lines between planes, glimagesink does not do that, there is no obvious reason to (upstream should be requesting that if needed).
Interesting observation here. It also crashes with the GstVideoAlignment used by videoconvert: gst-launch-1.0 filesrc location= SLIST_D_Sony_9.bin ! h265parse ! avdec_h265 ! videoconvert ! "video/x-raw,format=AYUV" ! glimagesink But not with the same and "videoconvert ! xvimagesink" as the sink. Not sure what to make out of that :)
gstavviddec.c:663:gst_ffmpegvideodec_prepare_dr_pool:<avdec_h265-0> aligned dimension 832x480 -> 864x512 padding t:16 l:16 r:16 b:17, stride_align 31:31:31:31 gstvideopool.c:181:video_buffer_pool_set_config: valign 31 31 31 31 gstvideopool.c:236:video_buffer_pool_alloc:<videobufferpool0> alloc 740160 align 15 ...
Created attachment 311188 [details] [review] avvideodec: ensure required mem alignment fixing avdec_h265 crashes with ximagesink/glimagesink Make sure the alignment requirement in GstAllocationParams matches the GstVideoAlignment requirements. This fixes issues with avdec_h265 crashing in the avx2 code path when used with playbin and ximagesink/glimagesink as videosink. The internal video pool would allocate buffers with an alignment of 15 even though GstVideoAlignment specified a stride_align requirement of 31 (which comes from ffmpeg).
Created attachment 311189 [details] [review] videopool: ensure allocation alignment is consistent with video alignment requirements Make sure GstAllocationParams alignment is not less than any alignment requirement specified via GstVideoAlignment.
All of these patches fix the issue for me (on their own).
Olivier suggested on IRC to move the loop that checks the alignments into a utility function. This means a caller would still have to do get_allocator() to get the AllocationParams and then call set_allocator() again if it was updated. It's the most transparent way of doing things though. Alternatively, one could also simply do all of this in gst_buffer_pool_config_set_video_alignment(). This means that the GstAllocationParams may change underneath the caller, which may or may not be a problem in practice. Needs to be documented I guess.
(In reply to Tim-Philipp Müller from comment #24) > Olivier suggested on IRC to move the loop that checks the alignments into a > utility function. This means a caller would still have to do get_allocator() > to get the AllocationParams and then call set_allocator() again if it was > updated. It's the most transparent way of doing things though. Is it going to be needed in many places (more than 2) that it makes sense to have an utility function for it? > Alternatively, one could also simply do all of this in > gst_buffer_pool_config_set_video_alignment(). This means that the > GstAllocationParams may change underneath the caller, which may or may not > be a problem in practice. Needs to be documented I guess. I think this should be handled explicitly by the caller of that API. The fallback in videopool is IMHO just to prevent stupid mistakes and get a GST_ERROR() out there while still doing something sensible that most likely does not crash :)
(In reply to Sebastian Dröge (slomo) from comment #25) > (In reply to Tim-Philipp Müller from comment #24) > > Olivier suggested on IRC to move the loop that checks the alignments into a > > utility function. This means a caller would still have to do get_allocator() > > to get the AllocationParams and then call set_allocator() again if it was > > updated. It's the most transparent way of doing things though. > > Is it going to be needed in many places (more than 2) that it makes sense to > have an utility function for it? And with that I mean, a utility function doesn't seem very useful for this. Code that uses the videoalignment API has to set the separate stride aligns anyway, it could as well just set the allocation parameters accordingly then. I don't think a utility function will make much of a difference here.
Interesting, I had this reflection that the stride alignment was incompletly applied. I think doing so in the pools is best. Note that set_video_alignment() can't really change the alloc params silently, as for most base class the parameters need to be updated in the query, otherwise they get reset. We are missing patches for the glpool, which is different. No need for a loop in this case, since we have N allocation, just pick the max between the parameters et de shocked stride align. It's also likely xnimagesink need to be checked.
Created attachment 311338 [details] [review] avvideodec: ensure required mem alignment fixing avdec_h265 crashes with ximagesink/glimagesink (v2) Slightly different approach: figure out max_alignment first, then make it consistent. This means that if the alignment requirement in the GstAllocationParams is higher than the ones in GstVideoAlignment, the stride alignments get updated accordingly too. Before it was just one way.
Created attachment 311340 [details] [review] videopool: ensure allocation alignment is consistent with video alignment requirements (v2) Different approach: figure out max alignment requirement first, then make sure both the GstAllocationParams alignment and the GstVideoAlignment stride requirements are consistent. That means the max will be configured everywhere, also if the params alignment is higher than the strides one. Before the fix-up was only one way, now it's both ways.
Created attachment 311347 [details] [review] gl: memory: take into account video stride alignment requirements This fixes it with glimagesink for me (without any the other patches). Not that we fix up mem->valign.stride_align[n] here, which seems the right thing to do but I'm not sure it's actually looked at for anything but the padding.
Review of attachment 311347 [details] [review]: Looks mostly ok to me. ::: gst-libs/gst/gl/gstglmemory.c @@ +646,3 @@ + /* update to match max alignment requirement */ + align_params.align = max_align; + mem->valign.stride_align[plane] = max_align; Why don't we update the entire structure?
Created attachment 311358 [details] [review] gl: bufferpool take into account video stride alignment requirements (v2) Updated patch: moved the checking/adjusting into glbufferpool, so that it's all done before gst_video_info_align() is called. Apparently calling it multiple times is troublesome. In glmemory we just do sanity checking now but don't adjust anything (maybe we should just drop this part?).
Review of attachment 311358 [details] [review]: Looks fine by me.
Review of attachment 311358 [details] [review]: Why don't gst_video_info_align() maximize the stride alignment then ? We could add a new method, to which we pass a allocation alignment, and get this helper to update everything at once no ?
Ok, pushed after further discussion on IRC. The patches may not be optimal, but they err in favour of being safe. We can improve things some more and add some utility functions after the release. There's bug #755068 to track this. commit 6fadf448de1cfe1b379b0fff8fa3c0f2eb6a4a79 Author: Tim-Philipp Müller <tim@centricular.com> Date: Fri Aug 28 09:38:53 2015 +0100 avvidec: increase default alignment to 32 bytes Change default alignment from 16 to 32 bytes, which fixes crashes when decoding H.265 using AVX2-based decoder code paths and when using ximagesink/glimagesink. https://bugzilla.gnome.org/show_bug.cgi?id=754120 commit a0ebef96372a61f8dc658995fb4f1480512b365b Author: Tim-Philipp Müller <tim@centricular.com> Date: Fri Sep 11 23:19:21 2015 +0100 avvideodec: ensure required mem alignment fixing avdec_h265 crashes with ximagesink/glimagesink Make sure the alignment requirement in GstAllocationParams matches the GstVideoAlignment requirements. This fixes issues with avdec_h265 crashing in the avx2 code path when used with playbin and ximagesink/glimagesink as videosink. The internal video pool would allocate buffers with an alignment of 15 even though GstVideoAlignment specified a stride_align requirement of 31 (which comes from ffmpeg). https://bugzilla.gnome.org/show_bug.cgi?id=754120 commit 8b96b52a62567d70ce827db00fd0f50f79133ee5 Author: Tim-Philipp Müller <tim@centricular.com> Date: Fri Sep 11 23:36:47 2015 +0100 videopool: ensure allocation alignment is consistent with video alignment requirements Make sure GstAllocationParams alignment is not less than any alignment requirement specified via GstVideoAlignment. https://bugzilla.gnome.org/show_bug.cgi?id=754120 commit fe8de1857a704af9ed6160dfdf8f93131e78ba19 Author: Tim-Philipp Müller <tim@centricular.com> Date: Tue Sep 15 11:34:12 2015 +0100 gl: bufferpool take into account video stride alignment requirements when allocating memory. Fixes crashes with avdec_h265 in the AVX2 code path which requires 32-byte stride alignment, but the GstAllocationParams only specified a 16-byte alignment. https://bugzilla.gnome.org/show_bug.cgi?id=754120