GNOME Bugzilla – Bug 790752
msdk: supports bufferpool
Last modified: 2019-07-21 02:32:47 UTC
Adds GstMsdkBufferPool, GstMsdkAllocator and GstMsdkMemory.
Created attachment 364259 [details] [review] msdk: supports bufferpool Adds initial support for GstMsdkMemory backed buffer pool. The memory object currently holds a pointer to mfxFrameSurface.
Created attachment 364260 [details] [review] msdkenc: use msdkbufferpool

Adjust to use GstMsdkBufferPool.
1. Propose a GstMsdkBufferPool to upstream.
2. If upstream has accepted the proposed msdk bufferpool (eg. videotestsrc): this is the usual case, where upstream can map via GstMsdkMemory.
3. If upstream hasn't accepted the proposed msdk bufferpool (eg. filesrc, appsrc):
   3.1. If the upstream buffer is aligned to 32: map directly from the user-space pointer to a surface; doesn't create a msdk bufferpool.
   3.2. If the upstream buffer is not aligned to 32: creates a new msdk bufferpool and acquires a buffer from the pool; has to copy from the upstream buffer to the new buffer.
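The 32-alignment check behind cases 3.1/3.2 can be sketched as below. This is an illustrative helper, not code from the patch; the function name is hypothetical, and it assumes a buffer qualifies for direct mapping when both its data pointer and its stride are 32-aligned:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: decide whether an upstream buffer can be mapped
 * directly to a surface (case 3.1) or must be copied into a buffer
 * acquired from a new msdk pool (case 3.2). */
static bool
can_direct_map (uintptr_t data_ptr, size_t stride)
{
  /* Assumed requirement: 32-aligned base address and stride. */
  return (data_ptr & 31) == 0 && (stride & 31) == 0;
}
```

A caller would branch on this to pick the zero-copy path or the pool-copy path.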
Created attachment 364261 [details] [review] msdk: remove unnecessary code anymore

Since GstMsdkMemory landed, the frame-copy logic is no longer necessary.
Review of attachment 364259 [details] [review]:

Ok, I've seen enough; I'd say just no, this needs to be rewritten, it's really messy. The lazy allocation approach is just completely bad, moving the allocation hit to run-time rather than to preroll time. This is just a variant of a system memory allocator. You need to manage it so the GstVideoInfo matches what you want. As a side effect, combined with appropriate GstAllocationParams for the alignment, you'll be able to derive from the sysmem allocator, because it's the same thing. The hidden 32-pixel padding needs to be reflected by your GstVideoInfo, and hence your GstVideoMeta (this is not the case).

::: sys/msdk/gstmsdkbufferpool.c
@@ +49,3 @@
+{
+ GstAllocator *allocator;
+ guint options;

Unused?

@@ +69,3 @@
+ static const gchar *g_options[] = {
+ GST_BUFFER_POOL_OPTION_VIDEO_META,
+ GST_BUFFER_POOL_OPTION_VIDEO_ALIGNMENT,

You should make this a GST_TYPE_VIDEO_BUFFER_POOL, and then drop this code. Though, you didn't really implement VIDEO_ALIGNMENT.

@@ +109,3 @@
+ GST_BUFFER_POOL_OPTION_VIDEO_META))
+ msdk_pool->add_videometa = TRUE;
+

You are missing a gst_video_info_align() call somewhere. I wonder if there is a way to get it from the VideoPool.

@@ +171,3 @@
+
+ if (!(buf = gst_buffer_new ()))
+ goto no_buffer;

GLib will abort if that fails; remove that check please.

@@ +187,3 @@
+ GST_VIDEO_INFO_WIDTH (info), GST_VIDEO_INFO_HEIGHT (info));
+ vmeta->map = gst_video_meta_map_msdk_memory;
+ vmeta->unmap = gst_video_meta_unmap_msdk_memory;

I strongly discourage using these two functions. Any reason why the map/unmap from the memory does not work? Are your offsets wrong, maybe?

@@ +211,3 @@
+ GST_MSDK_BUFFER_POOL_GET_PRIVATE (pool);
+
+ pool->priv = priv;

Unused?

::: sys/msdk/gstmsdkbufferpool.h
@@ +66,3 @@
+ MsdkContext *context;
+ mfxFrameInfo frame_info;
+ gboolean add_videometa;

Why are those 3 public if you have a private structure?
::: sys/msdk/gstmsdkmemory.c
@@ +38,3 @@
+#define GST_CAT_DEFAULT gst_debug_msdkvideomemory
+
+#ifndef GST_VIDEO_INFO_FORMAT_STRING

Why would it already be defined? Namespace pollution caused by copy-pasting?

@@ +48,3 @@
+/* ------------------------------------------------------------------------ */
+/* --- GstMsdkMemory --- */
+/* ------------------------------------------------------------------------ */

This is a bit useless as a comment.

@@ +161,3 @@
+ if (!mem->surface->Data.Locked)
+ return TRUE;
+ g_usleep (10);

What!?! That's horribly wrong; who would design a backend API like this?

@@ +195,3 @@
+unmap_msdk_memory (GstMsdkMemory * mem)
+{
+ g_return_if_fail (mem->refcount > 0);

That's not atomic.

@@ +198,3 @@
+
+ if (g_atomic_int_dec_and_test (&mem->refcount)) {
+ /* Release sth */

Why do you bother testing, if you don't do anything here?

@@ +199,3 @@
+ if (g_atomic_int_dec_and_test (&mem->refcount)) {
+ /* Release sth */
+ }

GstMemory already implements this correctly, respecting the read/write rules; why do you add this layer on top?

@@ +284,3 @@
+ gst_video_info_init (&mem->surface_info);
+ gst_video_info_set_format (&mem->surface_info, vip->finfo->format,
+ GST_ROUND_UP_32 (vip->width), GST_ROUND_UP_32 (vip->height));

This padding is completely arbitrary; you need the GstVideoInfo to reflect this.
Created attachment 364660 [details] [review] msdk: supports bufferpool Adds initial support for GstMsdkMemory backed buffer pool. The memory object currently holds a pointer to mfxFrameSurface.
Created attachment 364661 [details] [review] msdkenc: use msdkbufferpool

Adjust to use GstMsdkBufferPool.
1. Propose a GstMsdkBufferPool to upstream.
2. If upstream has accepted the proposed msdk bufferpool (eg. videotestsrc): this is the usual case, where upstream can map via GstMsdkMemory.
3. If upstream hasn't accepted the proposed msdk bufferpool (eg. filesrc, appsrc):
   3.1. If the upstream buffer is aligned to 32: map directly from the user-space pointer to a surface; doesn't create a msdk bufferpool.
   3.2. If the upstream buffer is not aligned to 32: creates a new msdk bufferpool and acquires a buffer from the pool; has to copy from the upstream buffer to the new buffer.
(In reply to Nicolas Dufresne (stormer) from comment #4)

Thanks for the detailed review, Nicolas :) I reworked it following your comments, but:

> @@ +187,3 @@
> + GST_VIDEO_INFO_WIDTH (info), GST_VIDEO_INFO_HEIGHT (info));
> + vmeta->map = gst_video_meta_map_msdk_memory;
> + vmeta->unmap = gst_video_meta_unmap_msdk_memory;
>
> I strongly discourage using these two functions. Any reason why the
> map/unmap from the memory does not work ? Are your offsets wrong maybe ?

The pipeline (videotestsrc ! msdkenc) is not working properly if I don't override the video meta map/unmap methods. IIUC, the video info in videotestsrc seems not to care about the alignment.
Review of attachment 364660 [details] [review]:

::: sys/msdk/gstmsdkbufferpool.c
@@ +49,3 @@
+struct _GstMsdkBufferPoolPrivate
+{
+ MsdkContext *context;

If the allocator doesn't need a context, the pool doesn't either.

@@ +50,3 @@
+{
+ MsdkContext *context;
+ mfxFrameInfo frame_info;

If the mfxFrameInfo can be generated internally by the allocator using a GstVideoInfo, there's no need to keep one here.

@@ +120,3 @@
+ GST_BUFFER_POOL_OPTION_VIDEO_ALIGNMENT)) {
+ GstVideoAlignment alignment;
+ priv->add_videometa = TRUE;

I don't think this is the right approach. Look at the gstvideopool.c code: the alignment is added if and only if the pool has both the GST_BUFFER_POOL_VIDEO_ALIGNMENT option AND the GST_BUFFER_VIDEO_META option set; the video meta is not added if only the GST_BUFFER_POOL_VIDEO_ALIGNMENT option is set.

It would be great if we could assess whether we can reuse GstVideoBufferPool as is, just with our allocator in the configuration.

::: sys/msdk/gstmsdkmemory.c
@@ +44,3 @@
+_init_msdk_memory_debug (void)
+{
+#ifndef GST_DISABLE_GST_DEBUG

Since you are not logging anything, why do you create this debug category?

@@ +56,3 @@
+
+static inline void *
+_aligned_alloc (size_t alignment, size_t size)

Rather than create a third function, just define posix_memalign() on Windows and use posix_memalign() in the code.

@@ +324,3 @@
+
+GstAllocator *
+gst_msdk_allocator_new (MsdkContext * context, mfxFrameInfo * frame_info)

@frame_info can be generated from the data in a GstVideoInfo; there is no need to pass it around. In this way we'll centralize the mfx frame configuration in a single point.

@@ +335,3 @@
+ return NULL;
+
+ allocator->context = context;

The context is not a reference-counted structure, thus I'm afraid we could end up with crashes when the context is deleted in a race condition. We need to rethink the MsdkContext structure, perhaps using a GDestroyNotify for all the subscribers of the structure. Anyway, what's the use of the context here? I don't see any.
(In reply to Hyunjun Ko from comment #7)
> The pipeline (videotestsrc ! msdkenc ) is not working properly if I don't
> override video meta map/unmap methods.
> IIUC, videoinfo in videotestsrc seems not to care about the alignment.

It means you have a bug in your offsets / strides array, which you are hacking away with this. I'll be very stiff on this: I want to deprecate these two function pointers.
Review of attachment 364660 [details] [review]: ::: sys/msdk/gstmsdkmemory.c @@ +324,3 @@ + +GstAllocator * +gst_msdk_allocator_new (MsdkContext * context, mfxFrameInfo * frame_info) Good idea! @@ +335,3 @@ + return NULL; + + allocator->context = context; um... my bad. mfxContext is not needed.
(In reply to Nicolas Dufresne (stormer) from comment #9)
> It means you have a bug in your offsets / strides array which you are
> hacking away with this. I'll be very stiff on this, I want to deprecate this
> two functions pointers.

I mean that videotestsrc can't do gst_video_frame_map() correctly when it's filling the frame, since it uses a plain GstVideoInfo derived from the caps (without the alignment).
(In reply to Hyunjun Ko from comment #11)
> I mean videotestsrc can't do gst_video_frame_map correctly when it's filling
> since it uses plain GstVideoInfo just following the caps (not with alignment)

Please let me know if there's something I'm not understanding correctly.
videotestsrc will enable the videometa option. If it does not, you should reject the config in your pool (then the source will use another pool, leaving to your element the task of copying). Then yes, videotestsrc will pass a video info from caps to frame_map, but frame_map will update its internal video_info from the VideoMeta. As your pool is producing memory that has absolutely nothing special other than custom stride, offset and alignment, the default map/unmap should do the right thing.
Just so it's even more obvious: you should save your aligned video info, and then use it to call gst_buffer_add_video_meta_full(). Read this function's doc attentively; you might start understanding my review.
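For NV12 with 32-pixel alignment, the per-plane offsets and strides that would then be handed to gst_buffer_add_video_meta_full() can be derived as below. This is a minimal self-contained sketch with a hypothetical helper name, not code from the patch:

```c
#include <stddef.h>

#define ROUND_UP_32(n) (((n) + 31) & ~31)

/* Hypothetical sketch: compute per-plane offset/stride for NV12 when the
 * surface is padded to 32-pixel boundaries.  These are the values one
 * would pass to gst_buffer_add_video_meta_full() so that the default
 * GstMemory map/unmap does the right thing without custom map/unmap
 * function pointers. */
static void
nv12_aligned_layout (int width, int height, size_t offset[2], int stride[2])
{
  stride[0] = ROUND_UP_32 (width);                        /* padded Y stride */
  offset[0] = 0;
  offset[1] = (size_t) stride[0] * ROUND_UP_32 (height);  /* UV after padded Y */
  stride[1] = stride[0];                                  /* interleaved UV */
}
```

With the aligned video info saved in the pool, these values stay consistent between producer and consumer of the buffer.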
(In reply to Nicolas Dufresne (stormer) from comment #14)
> Just so it even more obvious, you should save your aligned video info, and
> then use it to call gst_buffer_add_video_meta_full(). Read this function doc
> attentively, you might start understanding my review.

Ah, this comment makes it clear to me. I should have found this. Thanks!
Created attachment 364888 [details] [review] msdk: move the util function msdk_video_alignment

Moves the msdk_video_alignment function from the decoder to msdk.c so that others can call it without a duplicated definition.
Created attachment 364889 [details] [review] msdk: adds new utility functions for conversion from gstreamer to libmfx
Created attachment 364890 [details] [review] msdk: supports bufferpool Adds initial support for GstMsdkMemory backed buffer pool and GstMsdkAllocator. The memory object currently holds a pointer to mfxFrameSurface.
Created attachment 364891 [details] [review] msdkenc: use msdkbufferpool

Adjust to use GstMsdkBufferPool.
1. Propose a GstMsdkBufferPool to upstream.
2. If upstream has accepted the proposed msdk bufferpool (eg. videotestsrc): this is the usual case, where upstream can map via GstMsdkMemory.
3. If upstream hasn't accepted the proposed msdk bufferpool (eg. filesrc, appsrc):
   3.1. If the upstream buffer is aligned to 32: map directly from the user-space pointer to a surface; doesn't create a msdk bufferpool.
   3.2. If the upstream buffer is not aligned to 32: creates a new msdk bufferpool and acquires a buffer from the pool; has to copy from the upstream buffer to the new buffer.
Created attachment 364892 [details] [review] msdk: remove unnecessary code anymore
Review of attachment 364888 [details] [review]:

::: sys/msdk/msdk.h
@@ +67,3 @@
 const gchar *msdk_status_to_string (mfxStatus status);
+void msdk_video_alignment (GstVideoAlignment * alignment, GstVideoInfo * info);

I don't like the function name; it is not self-documenting. Think of something like gst_msdk_set_video_alignment. Also, if you are only going to need the number of planes, there is no need to pass the whole GstVideoInfo structure pointer; just pass a guint planes_num.
Review of attachment 364889 [details] [review]:

::: sys/msdk/msdk.c
@@ +381,3 @@
 alignment->padding_bottom = 32 - (height & 31);
 }
+

I am not fond of switch statements for this kind of operation. I prefer to define static structures which map one datum to the other; the functions are then a loop that looks up the search item and returns the associated one in the map table.

@@ +383,3 @@
+
+gint
+msdk_get_mfx_color_format (GstVideoFormat format)

As it is an exported function, I would call it gst_msdk_get_mfx_chroma_from_format(); chroma is different from color format.

@@ +405,3 @@
+
+gint
+msdk_get_mfx_fourcc (GstVideoFormat format)

This function is useless since we already have gst_video_format_to_fourcc().

@@ +428,3 @@
+void
+msdk_convert_video_info_to_mfx_info (GstVideoInfo * info,
+ mfxFrameInfo * mfx_info)

I'd rename this function to gst_msdk_set_mfx_frame_info_from_video_info (mfxFrameInfo *, GstVideoInfo *).

@@ +440,3 @@
+ mfx_info->AspectRatioW = info->par_n;
+ mfx_info->AspectRatioH = info->par_d;
+ mfx_info->PicStruct = MFX_PICSTRUCT_PROGRESSIVE;

You can get this also from the GstVideoInfo: info->interlace_mode.
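The table-driven lookup suggested above might look like the following. The enum and the numeric values are made-up stand-ins so the sketch is self-contained; real code would use GstVideoFormat and the MFX_CHROMAFORMAT_* / MFX_FOURCC_* constants from libmfx:

```c
#include <stddef.h>

/* Stand-in enum; the real code would use GstVideoFormat. */
typedef enum { FMT_UNKNOWN, FMT_NV12, FMT_YV12, FMT_BGRA } Format;

struct map
{
  Format format;
  int mfx_chroma;               /* stand-in for MFX_CHROMAFORMAT_* */
  int mfx_fourcc;               /* stand-in for MFX_FOURCC_* */
};

/* Static map table, terminated by a sentinel entry. */
static const struct map format_map[] = {
  {FMT_NV12, 420, 1},
  {FMT_YV12, 420, 2},
  {FMT_BGRA, 444, 3},
  {FMT_UNKNOWN, 0, 0}
};

/* Loop over the table and return the entry matching @format, or NULL. */
static const struct map *
map_lookup_format (Format format)
{
  const struct map *m;

  for (m = format_map; m->format != FMT_UNKNOWN; m++) {
    if (m->format == format)
      return m;
  }
  return NULL;
}
```

Each conversion function then becomes a one-line lookup in this single table instead of a separate switch.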
Review of attachment 364890 [details] [review]:

::: sys/msdk/gstmsdkbufferpool.c
@@ +145,3 @@
+ GstMsdkBufferPool *pool;
+
+ pool = g_object_new (GST_TYPE_MSDK_BUFFER_POOL, NULL);

A simpler version of this is: return g_object_new (GST_TYPE_MSDK_BUFFER_POOL, NULL);

::: sys/msdk/gstmsdkmemory.c
@@ +141,3 @@
+{
+ if (!ensure_surface (mem))
+ goto error_no_surface;

Since it is not expected to have a complex implementation, there is no need to handle this single error with the goto pattern.

@@ +155,3 @@
+gst_msdk_allocator_create_surface (GstAllocator * allocator)
+{
+ mfxFrameInfo frame_info;

As this is a complete structure on the stack, you can initialize it here with = { 0, } or something similar, avoiding the memset done afterwards. Thus it is done at compilation time and not at runtime.

@@ +175,3 @@
+gst_msdk_memory_new (GstAllocator * base_allocator)
+{
+ GstMsdkAllocator *const allocator = GST_MSDK_ALLOCATOR_CAST (base_allocator);

I would do this cast after the g_return_val_if_fail(), just for clarity.

@@ +232,3 @@
+ g_slice_free (mfxFrameSurface1, mem->surface);
+ mem->cache = NULL;
+ mem->surface = NULL;

Have you looked at g_clear_pointer()?

@@ +274,3 @@
+ return NULL;
+
+ allocator->image_info = *image_info;

What about, instead of storing the GstVideoInfo, you store the mfxFrameInfo directly? I mean, receive a GstVideoInfo as parameter, but convert it and store it in the allocator structure as an mfxFrameInfo; then you don't need to execute the conversion every time a surface is allocated.
(In reply to Víctor Manuel Jáquez Leal from comment #22)
> This function is useless since we already have gst_video_format_to_fourcc ()

Using gst_video_format_to_fourcc() causes a failure in the vpp (in the I420 and BGRA cases). It seems the fourcc required by the driver may be different.

> you can get this also from the gstvideoinfo:
>
> info->interlace_mode

I want to set MFX_PICSTRUCT_PROGRESSIVE by default, because the values of GstVideoInterlaceMode and the values of MFX_PICSTRUCT are very different. The PicStruct can be set when needed after this call.
(In reply to Víctor Manuel Jáquez Leal from comment #23)
> have you have looked g_clear_pointer () ?

How do you use g_clear_pointer() with g_slice_free()? I would remove the NULL assignments, since they're not necessary, if you don't want to see them.

> what about, instead of storing the gstvideoinfo, you store the mfxFrameInfo
> directy?

The image_info is needed by the msdk memory to map. If you don't like calling the conversion at every allocation, we should keep both. But I think this is fine, since allocation only happens a few times because it's managed by the bufferpool.
Created attachment 364988 [details] [review] msdk: move and rename the function msdk_video_alignment

Moves the msdk_video_alignment function from the decoder to msdk.c and renames it so that others can call it without a duplicated declaration.
Created attachment 364989 [details] [review] msdk: adds new utility functions for conversion from gstreamer to libmfx
Created attachment 364990 [details] [review] msdk: supports bufferpool Adds initial support for GstMsdkMemory backed buffer pool and GstMsdkAllocator. The memory object currently holds a pointer to mfxFrameSurface.
Created attachment 364991 [details] [review] msdkenc: use msdkbufferpool

Adjust to use GstMsdkBufferPool.
1. Propose a GstMsdkBufferPool to upstream.
2. If upstream has accepted the proposed msdk bufferpool (eg. videotestsrc): this is the usual case, where upstream can map via GstMsdkMemory.
3. If upstream hasn't accepted the proposed msdk bufferpool (eg. filesrc, appsrc):
   3.1. If the upstream buffer is aligned to 32: map directly from the user-space pointer to a surface; doesn't create a msdk bufferpool.
   3.2. If the upstream buffer is not aligned to 32: creates a new msdk bufferpool and acquires a buffer from the pool; has to copy from the upstream buffer to the new buffer.
Created attachment 364992 [details] [review] msdk: supports bufferpool
Review of attachment 364988 [details] [review]:

::: sys/msdk/msdk.c
@@ +378,3 @@
+
+ if (width & 31)
+ alignment->padding_right = 32 - (width & 31);

What do you think about a macro?

#define GST_MSDK_ALIGNMENT_PADDING(num) (32 - ((num) & 31))
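The suggested macro would slot into the alignment helper like this. Note the guard the original code already applies is still needed, because the macro alone yields 32 (not 0) for a value that is already 32-aligned; the wrapper function name here is hypothetical:

```c
#define GST_MSDK_ALIGNMENT_PADDING(num) (32 - ((num) & 31))

/* Illustrative use of the macro: compute right padding for a width. */
static unsigned
padding_right_for_width (unsigned width)
{
  if (width & 31)               /* only pad when not already 32-aligned */
    return GST_MSDK_ALIGNMENT_PADDING (width);
  return 0;
}
```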
Review of attachment 364989 [details] [review]:

::: sys/msdk/msdk.c
@@ +53,3 @@
+ { GST_VIDEO_FORMAT_BGRA, MFX_CHROMAFORMAT_YUV444, MFX_FOURCC_RGB4 },
+ { 0, 0, 0 }
+/* *INDENT-ON* */

You can fancy up this code with macros ;)

@@ +404,3 @@
+
+static const struct map *
+_map_lookup_format (const struct map *m, GstVideoFormat format)

As it is a static structure and the function is not recursive, there is no need to pass it as an argument.

@@ +439,3 @@
+ g_return_if_fail (info && mfx_info);
+
+ mfx_info->Width = GST_ROUND_UP_32 (info->width);

What about using the respective macros to get the width, height, fps, par and format?
Review of attachment 364992 [details] [review]:

::: sys/msdk/gstmsdkmemory.c
@@ +157,3 @@
+ surface = (mfxFrameSurface1 *) g_slice_new0 (mfxFrameSurface1);
+
+ g_return_val_if_fail (surface, NULL);

Don't use g_return_val_if_fail() for system errors; it is only for misuse of the API by users. Use the normal

if (!surface) {
  GST_ERROR ();
  return NULL;
}

@@ +177,3 @@
+
+ allocator = GST_MSDK_ALLOCATOR_CAST (base_allocator);
+ g_return_val_if_fail (GST_IS_MSDK_ALLOCATOR (allocator), NULL);

Again, this doesn't make sense, since the allocator is already cast to a msdk allocator; this should be called above.

::: sys/msdk/gstmsdkmemory.h
@@ +59,3 @@
+ GST_MINI_OBJECT_FLAG_SET (mem, flag)
+#define GST_MSDK_MEMORY_FLAG_UNSET(mem, flag) \
+ GST_MEMORY_FLAG_UNSET (mem, flag)

You don't need to define these macros, AFAIU.
Review of attachment 364992 [details] [review]:

::: sys/msdk/gstmsdkbufferpool.c
@@ +93,3 @@
+ GST_INFO_OBJECT (pool, "created new allocator %" GST_PTR_FORMAT, allocator);
+
+ gst_buffer_pool_config_set_allocator (config, allocator, NULL);

Shouldn't you set the proper GstAllocationParams here?
Review of attachment 364991 [details] [review]: shouldn't this bufferpool be used by decoders too??
Review of attachment 364991 [details] [review]:

::: sys/msdk/gstmsdkenc.c
@@ +426,3 @@
+ thiz->can_direct_map = FALSE;
+ } else {
+ thiz->can_direct_map = TRUE;

This distinction doesn't make much sense to me. That's something that should be handled inside the bufferpool... am I wrong?

@@ +946,3 @@
+ if (GST_IS_MSDK_MEMORY (mem)) {
+ /* This is the case 1.1 */
+ surface = GST_MSDK_MEMORY_CAST (mem)->surface;

Why not use an early return here (goto done) instead of nesting the code further?

@@ +959,3 @@
+ msdk_get_free_surface (thiz->vpp_surfaces, thiz->num_vpp_surfaces);
+ else
+ surface = msdk_get_free_surface (thiz->surfaces, thiz->num_surfaces);

All these free surfaces should be handled, in my opinion, by the bufferpool, because this is a buffer pool. The problem here is the use of the VPP... perhaps we could move that logic into the bufferpool... dunno...

The interesting part here is that we could reuse the input buffer, avoiding a memcpy. For that we could use GstBufferPoolAcquireFlags in GstBufferPoolAcquireParams saying "don't allocate memory, just grab the surface". The number of surfaces to create is defined in gst_buffer_pool_config_set_params().

@@ +1204,3 @@
+
+ if (need_pool) {
+ if (thiz->msdk_pool)

Don't reuse the pool in propose_allocation(); that breaks renegotiation. Create one every time it is requested. **That's a bug in gstreamer-vaapi.

@@ +1206,3 @@
+ if (thiz->msdk_pool)
+ gst_object_unref (thiz->msdk_pool);
+ thiz->msdk_pool = pool;

Why is this assignment needed?

@@ +1220,3 @@
+ if (pool) {
+ GstStructure *config;
+ GstAllocationParams params = { 0, 31, 0, 0, };

In the buffer pool you don't use these params... why do you define them?
Review of attachment 364992 [details] [review]: not rejected, it needs work
(In reply to Hyunjun Ko from comment #24)
> Using gst_video_format_to_fourcc causes failure of vpp.(in case of I420 and
> BGRA)
> Seems that the required fourcc for the driver might be different.

Yeah, it seems that for I420 and YV12 it uses YV12, and for BGRA it uses RGB4. Perhaps we should note these as exceptions.

> I want to set MFX_PICSTRUCT_PROGRESSIVE by default.
> Because the values of GstVideoInterlaceMode and values of MFX_PICSTRUCT are
> very different. Probably the PicStruct might be set when needed after this
> call.

Yes, they are different; you would need a function that converts from GstVideoInterlaceMode and GstVideoFieldOrder to the corresponding MFX_PICSTRUCT_ value.
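Such a converter could be sketched as follows. The enums below are simplified stand-ins for GstVideoInterlaceMode / GstVideoFieldOrder so the sketch is self-contained, and while the MFX_PICSTRUCT values match what the libmfx headers define (progressive = 0x01, TFF = 0x02, BFF = 0x04), treat the helper name and the exact mapping as illustrative:

```c
/* Stand-in enums mirroring GstVideoInterlaceMode / GstVideoFieldOrder;
 * the real code would use the GStreamer types directly. */
typedef enum { INTERLACE_PROGRESSIVE, INTERLACE_INTERLEAVED } InterlaceMode;
typedef enum { FIELD_ORDER_UNKNOWN, FIELD_ORDER_TFF, FIELD_ORDER_BFF } FieldOrder;

/* Values as found in mfxstructures.h. */
#define MFX_PICSTRUCT_PROGRESSIVE 0x01
#define MFX_PICSTRUCT_FIELD_TFF   0x02
#define MFX_PICSTRUCT_FIELD_BFF   0x04

/* Hypothetical converter: default to progressive, as suggested in the
 * discussion, and only emit a field order for interlaced content. */
static int
picstruct_from_interlace (InterlaceMode mode, FieldOrder order)
{
  if (mode == INTERLACE_PROGRESSIVE)
    return MFX_PICSTRUCT_PROGRESSIVE;
  if (order == FIELD_ORDER_BFF)
    return MFX_PICSTRUCT_FIELD_BFF;
  if (order == FIELD_ORDER_TFF)
    return MFX_PICSTRUCT_FIELD_TFF;
  return MFX_PICSTRUCT_PROGRESSIVE; /* unknown field order: fall back */
}
```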
(In reply to Hyunjun Ko from comment #25)
> How do you use g_clear_pointer with g_slice_free?

Yeah, in the case of GSlice it is not possible to use g_clear_pointer().

> The image_info is needed from msdk memory to map.
> If you don't like to call conversion at every allocation, we should have
> both.
> But I think this is fine since allocation happens only several times because
> it's managed by bufferpool.

Agreed.
(In reply to Víctor Manuel Jáquez Leal from comment #31)
> What do you think about a macro
>
> #define GST_MSDK_ALIGNMENT_PADDING(num) (32 - ((num) & 31))
>
> ??

Good. I'll do it later, or you can do it.
(In reply to Víctor Manuel Jáquez Leal from comment #37) > Review of attachment 364991 [details] [review] [review]: > > shouldn't this bufferpool be used by decoders too?? Sure. I'll propose it later.
(In reply to Víctor Manuel Jáquez Leal from comment #38)
> This distinction doesn't make much sense for me. That's something that shall
> be handled inside the bufferpool... am I wrong?

Um... the can_direct_map == TRUE flag is used when upstream doesn't use the proposed msdk bufferpool. Probably I should change something about this to avoid ambiguity.

> all these free surfaces shall be handled, in my opinion, by the bufferpool,
> because this is a buffer pool.

I agree that we should be using another bufferpool for this. I'll rethink it.

> don't reuse pool in propose_allocation(), that breaks the renegotiation.
> create one every time it is requested. **that's a bug in gstreamer-vaapi

Thanks for the useful advice.

> why this assignation is needed?

That is a stupid mistake :(
(In reply to Hyunjun Ko from comment #44)
> > The interesting part here it that we could reuse the input buffer avoiding a
> > memcopy. For that we could use GstBufferPoolAcquireFlags in
> > GstBufferPoolAcquireParams saying "don't allocate memory, just grab the
> > surface"

Thanks for the good advice. I think I should implement acquire_buffer in the msdk bufferpool to do that.
Victor, I think replacing the array of surfaces with a bufferpool could be proposed as separate patches, such as "Add option XXX_NO_ALLOC" and "Replace the array of surfaces with msdkbufferpool". So I'd start working on this once the proposed patches are accepted. What do you think?
(In reply to Hyunjun Ko from comment #46)
> So I'd start working on this once the proposed patches are accepted.
> What do you think?

What worries me is that if we find out our bufferpool design is not correct once we have added the array of buffers, it would be a waste of commits.
What about leaving the VPP conversion out of the bufferpool and only handling the encoder surfaces?
(In reply to Víctor Manuel Jáquez Leal from comment #48)
> what about leave the VPP conversion out of the bufferpool, only handle the
> encoder surfaces??

That wouldn't be so different, IMO.
@Hyunjun: I just want to clarify a few things.

IIUC, you are trying to get all the buffer pool stuff implemented with system memory first, right? I am a bit afraid that the way we treat memory create/map/unmap could need a rewrite once there is VIDEO_MEMORY support. But honestly, I am not a pro in msdk, so I could be mistaken too. :)

We need to consider a number of things for designing the buffer pool. Mainly:
1. It requires vasurface & vaimage handling (gstreamer-vaapi code in a simplified way, if possible) for VIDEO_MEMORY support.
2. Tasks can be submitted together if they share contexts (joined sessions).
3. There is a third memory type called "opaque memory". We might not need this for our use cases, though.

Just wondering what could be the best way to proceed here.

Option 1: Just focus on system memory in this bug and implement all the pool sharing between decoder+vpp+encoder. Later we can do/discuss (a rewrite is most probably needed) all the video memory stuff in a separate bug.

Option 2: Discuss all the features (video memory, task session sharing, and dmabuf support too) as part of this bug itself and do all development in a separate branch before pushing to master.
(In reply to sreerenj from comment #50)
> IIUC, you are trying to get all buffer pools stuffs implemented with system
> memory first, right?

Right.

> I am a bit afraid that there could be rewrite needed in the way we treating
> memory create/map/unmap once there is VIDEO_MEMORY support.

This is also what I'm worried about, since I'm still not an expert in msdk.

> 1: It requires vasurface & vaimage handling (gstreamer-vaapi code in a
> simplified way if possible) for VIDEO_MEMORY support.

Right.

> 2: Tasks can be submitted together if they share contexts (joined sessions).

Not sure about this. If you mean mfxThreadTask and XXXFrameSubmit, these are for mfx plugin writers, IIUC. Do they need to be considered in GStreamer?

> 3. There is a third memory type called "opaque memory". We might not need
> this for our use cases though.

Since you mentioned this in the last meeting, I looked into mfx opaque memory. Basically it doesn't fit the GStreamer design, IMHO. But I guess we could use this memory in the current encoder, since the VPP feature landed in the encoder (using opaque surfaces for the VPP output and encoder input). Anyway, why don't we focus on handling SYSTEM/VIDEO memory first?

> Just wondering what could be the best way to proceed here.

I'd choose option 2, since handling video memory via GstBufferPool was actually the original goal of this issue.
I can work on my github branch. What do you (and others) think?
I think it's extremely important to note that the management of VASurfaces / D3D11 surfaces is taken care of by the MFX frame allocation mechanism, which means you will have to design and implement your bufferpool around the MSDK framework. The MFX frame allocator can only be initialized after the MFX task is instantiated, which makes it awkward to fit within the GStreamer bufferpool design, since the allocation, freeing, locking and unlocking of video memory surfaces by the MFX allocator are not under GStreamer's control.

You should also note that you can only decide the final number of surfaces to be shared between tasks once you have determined the properties of each task: for example, whether one of the joined tasks can only utilize system memory based on platform support or when using an MFX SW plugin in the downstream element, or, in the case of the VPP, whether the input / output memory combination is supported (video memory in / system memory out is not well supported). There are also a few other quirks to worry about, but just worry about video memory support in a separate branch first.

Indeed, IMO, the way this is going, quite a big re-write would need to be done with the introduction of video memory support. In GST-MFX we couldn't deal with this cleanly either, due to the above-mentioned constraint, but nevertheless we handled all the quirks well enough. It will be interesting to see what fundamental design changes could be made to the GStreamer framework to deal with this constraint cleanly.

I suggest you guys deal with the video memory aspect first before anything else. A good start would be to refer to this: https://github.com/ishmael1985/gstreamer-media-SDK/blob/master/gst-libs/mfx/gstmfxallocator_vaapi.c

Please feel free to re-write it in a much cleaner way in your version of the gst-msdk plugin :)
Created attachment 368107 [details] [review] msdk: move and rename the function msdk_video_alignment Move the msdk_video_alignment function from the decoder to msdk.c and rename it so that other files can call it without a duplicated declaration.
Created attachment 368108 [details] [review] msdk: adds new utility functions for conversion from gstreamer to libmfx
Created attachment 368109 [details] [review] msdk: libva: adds utility functions for conversion between mfx and libva
Created attachment 368110 [details] [review] msdk: implements GstMsdkContext Makes GstMsdkContext a descendant of GstObject so that we can track the life-cycle of the driver session. Also replaces MsdkContext with this one. Keeps msdk_d3d.c alive for the future.
Created attachment 368111 [details] [review] msdkdec: fix typo
Created attachment 368112 [details] [review] msdk: adds frame allocator using libva Implements the msdk frame allocator, which is required by the driver. Also makes these functions global so that GstMsdkAllocator can use the allocated video memory later and couple it with GstMsdkMemory. GstMsdkContext keeps allocation information such as mfxFrameAllocRequest and mfxFrameAllocResponse after allocation.
Created attachment 368113 [details] [review] msdk: supports bufferpool Implements 2 memory allocators: 1\ GstMsdkSystemAllocator: allocates system memory. 2\ GstMsdkVideoAllocator: allocates device memory, depending on the platform (eg. VASurface). GstMsdkBufferPool currently uses the video allocator by default, and only on Linux. On Windows, we should use system memory until a d3d allocator is implemented.
Created attachment 368114 [details] [review] msdkenc: use bufferpool 1\ Proposes an msdk bufferpool to upstream. - If upstream has accepted the proposed msdk bufferpool, the encoder can get the msdk surface from the buffer directly. - If not, the encoder gets an msdk surface from its own msdk bufferpool and copies the upstream frame to the surface. 2\ Replaces the arrays of surfaces with an msdk bufferpool. 3\ In case VPP is used, there should be another msdk bufferpool with NV12 info so that it can convert first and then encode. Calls gst_msdk_set_frame_allocator and uses video memory only on Linux; on Windows it uses system memory until a d3d allocator is implemented.
Created attachment 368115 [details] [review] msdkdec: use bufferpool 1\ In decide_allocation, it makes its own msdk bufferpool. - If downstream supports video meta, it just replaces downstream's pool with the msdk bufferpool. - If not, it uses the msdk bufferpool as a side pool, which frames are decoded into and then copied from into downstream's bufferpool. 2\ Decides whether to use video memory or system memory. - This is not completed in this patch. - It might be decided in update_src_caps. - But both the system memory and video memory cases have been tested.
Created attachment 368116 [details] [review] msdk: adds util functions to handle GstContext To share GstMsdkContext with each msdk element, it will be using GstContext. Most common code is from gstreamer-vaapi.
Created attachment 368117 [details] [review] msdk: context: add job type to figure out if joining the session is necessary According to the driver's instructions, if there are two or more encoders or decoders in a process, the sessions should be joined by MFXJoinSession. To achieve this via GstContext, this patch adds a job type specifying whether the element is an encoder, decoder or vpp. If an msdk element learns from the shared context that joining the session is needed, it should create another instance of GstContext with a joined session, which is not shared.
Created attachment 368118 [details] [review] msdkdec/enc: query GstContext to share GstMsdkContext GstMsdkContext is shared/created as follows: - Search for a GstMsdkContext in the pipeline. - If found, check whether it belongs to a decoder, encoder or vpp by job type. - If it's the same job type, create another instance of GstMsdkContext. - Otherwise, just use the shared GstMsdkContext. - If not found, just create a new instance of GstMsdkContext.
Created attachment 368119 [details] [review] msdk: adds async depth from each msdk element to GstMsdkContext to be shared If the pipeline is like "... ! decoder ! encoder ! ..." and uses video memory, the decoder needs to know the async depth of the following msdk element so that it can allocate the correct number of video memory surfaces. Otherwise, the decoder's memory is exhausted while processing.
Created attachment 368120 [details] [review] msdkdec: use video memory if there's another MSDK context in a pipeline 1\ If downstream's pool is an MSDK bufferpool, 2\ and if there's a shared GstMsdkContext in the pipeline, the decoder decides to use video memory. This policy should be improved to handle more cases.
Created attachment 368121 [details] [review] msdk: Avoid build failures on Windows until d3d allocator is implemented
Review of attachment 368107 [details] [review]: No issue with this one, just a question. ::: sys/msdk/msdk.c @@ +379,3 @@ + + if (width & 31) + alignment->padding_right = GST_MSDK_ALIGNMENT_PADDING (width); This is redundant with the stride alignment; can you explain why you need this?
Review of attachment 368110 [details] [review]: ::: sys/msdk/gstmsdkcontext.c @@ +178,3 @@ + GstMsdkContext *obj = g_object_new (GST_TYPE_MSDK_CONTEXT, NULL); + + return gst_msdk_context_open (obj, hardware) ? obj : NULL; Here's a memory leak: if open() fails, NULL is returned leaking the context object.
Review of attachment 368111 [details] [review]: lgtm
Review of attachment 368112 [details] [review]: lgtm
Review of attachment 368113 [details] [review]: I would like a function to set the video info on the allocator, so that we could change the allocator's resolution without reinstantiating it. ::: sys/msdk/gstmsdkbufferpool.c @@ +95,3 @@ + && (g_strcmp0 (allocator->mem_type, GST_MSDK_SYSTEM_MEMORY_NAME) != 0 + && g_strcmp0 (allocator->mem_type, + GST_MSDK_VIDEO_MEMORY_NAME) != 0)) { I've just learned that, as mem_type points to the defined string macro, it is more efficient to use g_str_equal. But it is still safe to use g_strcmp0. (I will change it for gstreamer-vaapi) @@ +124,3 @@ + allocator = + gst_msdk_video_allocator_new (priv->context, &video_info, + priv->alloc_response); What if the API user passes NULL for these parameters in buffer_pool_new()? Isn't it better to fail in set_config(), requesting a real allocator? ::: sys/msdk/gstmsdksystemmemory.c @@ +137,3 @@ + &msdk_system_allocator->image_info); + + memcpy (&surface->Info, &frame_info, sizeof (mfxFrameInfo)); Is there a need for this memcpy? Just fill the structure directly, no? @@ +215,3 @@ + GstAllocationParams * params) +{ + return gst_msdk_system_memory_new (allocator); You need to pass @size to gst_msdk_system_memory_new() in order to use it in gst_memory_init(), not the video info size; if the requested size is not enough for the frame, it is better to inform the user. ::: sys/msdk/gstmsdkvideomemory.c @@ +74,3 @@ + &msdk_video_allocator->image_info); + + memcpy (&surface->Info, &frame_info, sizeof (mfxFrameInfo)); Ditto (no need for the memcpy, IMO)
Review of attachment 368108 [details] [review]: ::: sys/msdk/msdk.c @@ +42,3 @@ + GstVideoFormat format; + gint mfx_chroma_format; + gint mfx_fourcc; Would it be better to use the MFX defined types here? mfxU16 ChromaFormat; mfxU32 FourCC; @@ +446,3 @@ + mfx_info->AspectRatioW = GST_VIDEO_INFO_PAR_N (info); + mfx_info->AspectRatioH = GST_VIDEO_INFO_PAR_D (info); +gint We should fix this. But I am fine adding it later in a separate patch.
Review of attachment 368110 [details] [review]: lgtm ::: sys/msdk/gstmsdkcontext.c @@ +178,3 @@ + GstMsdkContext *obj = g_object_new (GST_TYPE_MSDK_CONTEXT, NULL); + + return gst_msdk_context_open (obj, hardware) ? obj : NULL; Actually gst_msdk_context_open() is un-reffing the context in all failure cases.
Review of attachment 368112 [details] [review]: ::: sys/msdk/gstmsdkallocator_libva.c @@ +65,3 @@ + MFX_MEMTYPE_VIDEO_MEMORY_PROCESSOR_TARGET))) + return MFX_ERR_UNSUPPORTED; + I am a bit confused here. Could you please add some comments? 1: Consider the pipeline "videotestsrc ! msdkh264enc": is it supposed to fail (it won't in practice)? I am wondering why msdk has added MFX_MEMTYPE_VIDEO_MEMORY_DECODER_TARGET to the req->type even if there is no decoder in the pipeline! 2: According to my testing, there is too much memory (number of surfaces) allocated for a simple transcode pipeline (mfxh264dec async-depth=1 ! mfxh264enc async-depth=1). You can test with the video.mp4 which I sent recently :). It is allocating 20+ surfaces (without counting the internal allocations). @@ +106,3 @@ + * + * See https://github.com/Intel-Media-SDK/samples/issues/13 for more details. + */ Ya, this is so nasty, but we have no option other than doing the same! The open-source msdk may fix this soon, but we can't remove it (unless we put a tight dependency on the next msdk open-source release) because of backward compatibility, and almost all customers are currently using the closed-source version of msdk, I believe. BTW, the MSDK developers say it is required for all encoders, but I am not sure :) @@ +195,3 @@ + + if (mem_id->fourcc != MFX_FOURCC_P8) { + va_status = vaDeriveImage (dpy, *va_surface, &mem_id->image); I am still not 100% sure whether it is safe to do it like this. What happens if msdk calls frame_lock on a surface which we have already mapped in a GStreamer element? But we haven't seen any case like that, and hopefully msdk never calls this except for the internal frames. For now, I am fine with keeping this code as it is, since our main target is to get the code in before the upstream feature freeze :)
@Hyunjun: I would recommend you file separate bugs for any missing pieces, so that reviewers know we are aware of them (eg: avoid explicit syncing). Or you can put a "Fixme" comment in the code.
Review of attachment 368112 [details] [review]: ::: sys/msdk/gstmsdkallocator_libva.c @@ +159,3 @@ + + if (mem_id->fourcc != MFX_FOURCC_P8) { + va_status = I think you need to make sure the surface is not mapped. Or at least a vaDestroyImage call is required to make sure the VAImage structure is not leaking, right?
(In reply to Nicolas Dufresne (stormer) from comment #69)
> Review of attachment 368107 [details] [review] [review]:
>
> Not issue with this one, just a question.
>
> ::: sys/msdk/msdk.c
> @@ +379,3 @@
> +
> +  if (width & 31)
> +    alignment->padding_right = GST_MSDK_ALIGNMENT_PADDING (width);
>
> This is redundant with the stride_alignment, can you explain why you need
> this ?

Actually, I just didn't know it's redundant. After looking into gst_video_info_align, it seems it's not necessary.
(In reply to Víctor Manuel Jáquez Leal from comment #70) > Review of attachment 368110 [details] [review] [review]: > > ::: sys/msdk/gstmsdkcontext.c > @@ +178,3 @@ > + GstMsdkContext *obj = g_object_new (GST_TYPE_MSDK_CONTEXT, NULL); > + > + return gst_msdk_context_open (obj, hardware) ? obj : NULL; > > Here's a memory leak: > > if open() fails, NULL is returned leaking the context object. Thanks for pointing this out. Fixed.
(In reply to Víctor Manuel Jáquez Leal from comment #74) > Review of attachment 368113 [details] [review] [review]: > > I would like a function to set the video info to the allocator, so it we > could change the resolution of the allocator without reinstantiating it. Okay, I'll add this to my to-do list. > > ::: sys/msdk/gstmsdkbufferpool.c > @@ +95,3 @@ > + && (g_strcmp0 (allocator->mem_type, GST_MSDK_SYSTEM_MEMORY_NAME) != 0 > + && g_strcmp0 (allocator->mem_type, > + GST_MSDK_VIDEO_MEMORY_NAME) != 0)) { > > I've just learned that, as mem_type points to the defined string macro, it > more efficient to use g_str_equal. But it is still safe use g_strcmp0. (I > will change it for gstreamer-vaapi) > > @@ +124,3 @@ > + allocator = > + gst_msdk_video_allocator_new (priv->context, &video_info, > + priv->alloc_response); > > what if the API use pases NULL to this parameters in buffer_pool_new()?? > isn't better to fail on set_config() requesting for a real allocator? Good point. I'm working on this. > > ::: sys/msdk/gstmsdksystemmemory.c > @@ +137,3 @@ > + &msdk_system_allocator->image_info); > + > + memcpy (&surface->Info, &frame_info, sizeof (mfxFrameInfo)); > > is there need of this memcpy??? just fill the structure directly, isn't? right! > > @@ +215,3 @@ > + GstAllocationParams * params) > +{ > + return gst_msdk_system_memory_new (allocator); > > you need to pass @size to gst_msdk_system_memory_new() in order to use it in > gst_memory_init(), not the video info size, if the request size is not > enough for the frame it is better to inform the user. Okay. > > ::: sys/msdk/gstmsdkvideomemory.c > @@ +74,3 @@ > + &msdk_video_allocator->image_info); > + > + memcpy (&surface->Info, &frame_info, sizeof (mfxFrameInfo)); > > ditto (no need for memcopy imo)
(In reply to sreerenj from comment #75) > Review of attachment 368108 [details] [review] [review]: > > ::: sys/msdk/msdk.c > @@ +42,3 @@ > + GstVideoFormat format; > + gint mfx_chroma_format; > + gint mfx_fourcc; > > Would it be better to use the MFX defined types here? > mfxU16 ChromaFormat; > mfxU32 FourCC; > That would be better for me too. > @@ +446,3 @@ > + mfx_info->AspectRatioW = GST_VIDEO_INFO_PAR_N (info); > + mfx_info->AspectRatioH = GST_VIDEO_INFO_PAR_D (info); > +gint > > We should fix this. But I am fine adding it later in a separate patch. Agree. Currently the interlace mode that msdk elements allow is only progressive. We can file about this.
(In reply to sreerenj from comment #77)
> Review of attachment 368112 [details] [review] [review]:
>
> ::: sys/msdk/gstmsdkallocator_libva.c
> @@ +65,3 @@
> +          MFX_MEMTYPE_VIDEO_MEMORY_PROCESSOR_TARGET)))
> +    return MFX_ERR_UNSUPPORTED;
> +
>
> I am bit confused here. Could you please add some comments?
>
> 1: Consider the pipeline: "videotestsrc ! msdkh264enc ", is it supposed to
> fail (in won't in practice)? I am wondering why msdk has added
> MFX_MEMTYPE_VIDEO_MEMORY_DECODER_TARGET into the req->type even if there is
> no decoder in the pipeline!

I'm not sure either, to be honest. :) But I guess MFX_MEMTYPE_VIDEO_MEMORY_DECODER_TARGET is for both encoder and decoder, MFX_MEMTYPE_VIDEO_MEMORY_PROCESSOR_TARGET is for VPP, and otherwise it's MFX_MEMTYPE_SYSTEM_MEMORY.

Regarding this, the document says: "The VA API does not define any surface types and the application can use either MFX_MEMTYPE_VIDEO_MEMORY_DECODER_TARGET or MFX_MEMTYPE_VIDEO_MEMORY_PROCESSOR_TARGET to indicate data in video memory."

I'll add this as a comment in the code.

> 2: According to my testing there is too much memory (number of surfaces)
> allocation for a simple transcode pipeline (mfxh264dec async-depth=1 !
> mfxh264enc async-depth=1).
> You can test with the video.mp4 which I have sent recently :), It is
> allocating 20+ surfaces (without counting the internal allocation)

That's normal when using video memory, especially for h264/5 decoding (not encoding). I don't know why the driver requires that many surfaces.

> @@ +106,3 @@
> + *
> + * See https://github.com/Intel-Media-SDK/samples/issues/13 for more
> details.
> + */
>
> Ya, this is so nasty, but we have no option other than doing the same!
> The OpenSource msdk may fix this soon, but we can't remove this(unless we
> put a tight dependency on the next msdk opensource release)
> because of backward compatibility and all most all the customers are
> currently using Closed source version of msdk I believe.
> BTW, MSDK developers saying that it requires for all encoders, but I am not
> sure :)
>
> @@ +195,3 @@
> +
> +  if (mem_id->fourcc != MFX_FOURCC_P8) {
> +    va_status = vaDeriveImage (dpy, *va_surface, &mem_id->image);
>
> I am still not 100% sure whether is safe to do it like this. What happens if
> msdk call frame_lock on a surface which we already mapped by a GStreamer
> element? But we haven't seen any case like that and hopefully, msdk never
> call this except for the internal frames.

That's also what I'm worried about, though I've never seen it called from the driver for external frames. Let's see what happens here.

> For now, I am fine to keep this code as it is since our main target is to
> get the code in before upstream feature freeze :)
(In reply to sreerenj from comment #78)
> @Hynjun:
>
> I would recommend you to file separate bugs for any missing pieces of stuff
> so that people reviewing can know that we are aware of those (eg: avoid
> explicit syncing)
>
> Or you can put a "Fixme" comment in code.

I'll be opening new bugs for the following:
1. Avoid explicit syncing.
2. Some leaks when seeking.
3. Session close failure if there are multiple sessions.
(In reply to sreerenj from comment #79) > Review of attachment 368112 [details] [review] [review]: > > ::: sys/msdk/gstmsdkallocator_libva.c > @@ +159,3 @@ > + > + if (mem_id->fourcc != MFX_FOURCC_P8) { > + va_status = > > I think you need to make sure the Surface is not mapped. > Or at least the vaDestoryImage is required to make sure the VAImage > Structure is not leaking. right? Good point. I would simply call vaDestroyImage for each surface.
I'm still collecting other suggestions/opinions/recommendations, and I've updated all the patches locally but am holding off on re-uploading them. This is to avoid noise, since there are lots of patches proposed here.
Review of attachment 368114 [details] [review]: lgtm
(In reply to Hyunjun Ko from comment #84)
> (In reply to sreerenj from comment #77)
> > Review of attachment 368112 [details] [review] [review] [review]:
> >
> > ::: sys/msdk/gstmsdkallocator_libva.c
> > @@ +65,3 @@
> > +          MFX_MEMTYPE_VIDEO_MEMORY_PROCESSOR_TARGET)))
> > +    return MFX_ERR_UNSUPPORTED;
> > +
> >
> > I am bit confused here. Could you please add some comments?
> >
> > 1: Consider the pipeline: "videotestsrc ! msdkh264enc ", is it supposed to
> > fail (in won't in practice)? I am wondering why msdk has added
> > MFX_MEMTYPE_VIDEO_MEMORY_DECODER_TARGET into the req->type even if there is
> > no decoder in the pipeline!
>
> I'm not sure either, to be honest. :)
> But I guess MFX_MEMTYPE_VIDEO_MEMORY_DECODER_TARGET is for both encoder and
> decoder, MFX_MEMTYPE_VIDEO_MEMORY_PROCESSOR_TARGET is for VPP, otherwise
> MFX_MEMTYPE_SYSTEM_MEMORY.
>
> Regarding this, the document says that "The VA API does not define any
> surface types and the application can use either
> MFX_MEMTYPE_VIDEO_MEMORY_DECODER_TARGET or
> MFX_MEMTYPE_VIDEO_MEMORY_PROCESSOR_TARGET to indicate data in video memory."
>
> I'll comment this out.
>
> > 2: According to my testing there is too much memory (number of surfaces)
> > allocation for a simple transcode pipeline (mfxh264dec async-depth=1 !
> > mfxh264enc async-depth=1).
> > You can test with the video.mp4 which I have sent recently :), It is
> > allocating 20+ surfaces (without counting the internal allocation)
>
> That's normal if using video memory especially for h264/5 decoding (not
> encoding). I don't know why the driver requires that many surfaces.

Strange! We should compare with the msdk sample applications.

I have a guess: we don't do "DecodeHeader" before querying for the required IO surfaces; instead, we just fill the mfxVideoParam manually, so there is a possibility that we are not providing all the information needed.
But it is fine for now; we can fix this later (both decoder and encoder require a bunch of fixes, I believe). It would be nice to file a bug to keep track of this.
(In reply to Hyunjun Ko from comment #87)
> I'm still collecting other suggestions/opinions/recomendations.
> And I re-updated all patches. This is to avoid noise since here are lots of
> patches proposed.

Yup, that's better. Keep all these patches as they are until we finish the review. You can provide the updated patches later.
> Strange!, We should compare with msdk sample applications. > > I have a guess, we don't do the "DecodeHeader" before querying for required > IO surface, instead, we just fill the mfxVideoParam manually and there could > a possibility that we are not providing all the information needed. But it > is fine for now, We can fix this later (both decoder and encoder require a > bunch of fixes I believe). Would be nice to file a bug to keep track of this. Jeez, everything can be easily guessed when you have such a good reference :) Your guess is right.
Review of attachment 368115 [details] [review]: ::: sys/msdk/gstmsdkdec.c @@ +110,3 @@ + frame->output_buffer = NULL; + goto retry; + * We should be keep these buffers and check if it's unlocked. I remember you were talking about the possibility of keeping a separate surface pool and writing our own acquire_buffer() for the bufferpool. How about just overriding acquire_buffer() as:

gst_msdk_buffer_pool_acquire_buffer {
  buffer = GST_BUFFER_POOL_CLASS (gst_msdk_buffer_pool_parent_class)->acquire_buffer ()
  if (buffer.surface.Data.locked)
    release_buffer () and try again.
  else
    return buffer;
}

@@ +731,3 @@ + return NULL; + } + I think we should move/refactor this function into a common utility method for both the encoder and decoder. It is almost the same in both; you may just need to pass a few more parameters. @@ +818,3 @@ + + memcpy (&thiz->output_info, &info_from_caps, sizeof (GstVideoInfo)); + memcpy (&info_aligned, &info_from_caps, sizeof (info_aligned)); Just assign the struct, no need for memcpy.
Review of attachment 368118 [details] [review]: Assume we have vpp. How about the pipeline "msdkh264dec ! msdkvpp ! textoverlay ! msdkvpp"? Here the decoder is supposed to create a child session, but I believe it won't. WDYT?
Review of attachment 368120 [details] [review]: Would it be better to always use video memory (by default)? According to the msdk spec, system memory is only preferred if a software decoder is in use.
(In reply to sreerenj from comment #92)
> Review of attachment 368115 [details] [review] [review]:
>
> ::: sys/msdk/gstmsdkdec.c
> @@ +110,3 @@
> +        frame->output_buffer = NULL;
> +        goto retry;
> + * We should be keep these buffers and check if it's unlocked.
>
> I remember you were talking about the possibility to keep a separate surface
> pool and writing our own acquire_buffer() for the bufferpool. How about just
> overriding the acquire_buffer() as:
>
> gst_msdk_buffer_pool_acqurre_buffer{
>   buffer = GST_BUFFER_POOL_CLASS
> (gst_msdk_buffer_pool_parent_class)->acquire_buffer()
>   if (buffer.surface.Data.locked)
>     release_buffer() and try again.
>   else
>     ret buffer;
> }

That's exactly what I was thinking about, and that's why I was confused when you said it would be complicated. I'm planning to implement it after these patches are merged.

> @@ +731,3 @@
> +    return NULL;
> +  }
> +
>
> I think we should move/refactor this function as a common utility method for
> both encode and decoder.
> It is almost the same in both, you may just need to pass few more parameters.

Right. There's something more to refactor as well (eg. GstMsdkSurface in the encoder/decoder).

> @@ +818,3 @@
> +
> +  memcpy (&thiz->output_info, &info_from_caps, sizeof (GstVideoInfo));
> +  memcpy (&info_aligned, &info_from_caps, sizeof (info_aligned));
>
> Just assign the struct, no need of memcpy.

OK, though this is not from my patch.
(In reply to sreerenj from comment #93)
> Review of attachment 368118 [details] [review] [review]:
>
> Assume we have vpp.
>
> How about the pipeline: msdkh264dec ! msdkvpp ! textoverlay ! msdkvpp".
>
> Here is decoder is supposed to create a child session, but it won't I
> believe. WDT?

As per my understanding, the decoder in this pipeline doesn't need to join the session from vpp but just share it, does it?

But the vpps in this pipeline should each create their own session separately, because there is an element in between that should be using system memory, while the patch doesn't do that.

Can we think about these kinds of complex pipelines later, or should we handle this in this thread too?
(In reply to sreerenj from comment #94)
> Review of attachment 368120 [details] [review] [review]:
>
> Would it be better to use videomemory always(default)?
> According to msdk spec,system-mem is only preferred if software decoder is
> in use.

Um, I was thinking so, but copying from a vaDeriveImage mapping to other user-space memory is sometimes very slow.

For example, in the pipeline "msdkh264dec ! videoconvert ! xvimagesink", if the decoder uses video memory, playback is very slow, at least on Skylake.

I already sent details about this, with some links for reference.

How about making video memory the default in the decoder after we finish the task of handling dmabuf?
(In reply to Hyunjun Ko from comment #95) > (In reply to sreerenj from comment #92) > > Review of attachment 368115 [details] [review] [review] [review]: > > > > ::: sys/msdk/gstmsdkdec.c > > @@ +110,3 @@ > > + frame->output_buffer = NULL; > > + goto retry; > > + * We should be keep these buffers and check if it's unlocked. > > > > I remember you were talking about the possibility to keep a separate surface > > pool and writing our own acquire_buffer() for the bufferpool. How about just > > overriding the acquire_buffer() as: > > > > gst_msdk_buffer_pool_acqurre_buffer{ > > buffer = GST_BUFFER_POOL_CLASS > > (gst_msdk_buffer_pool_parent_class)->acquire_buffer() > > if (buffer.surface.Data.locked) > > release_buffer() and try again. > > else > > ret buffer; > > } > > > > That's exactly what i was thinking about. and that's why I was confused > since you said it would be complicated. > I'm planning to implement it after these patches are merged. > > Aha, okay. Sorry about the confusion :) Hope we don't need one more pool and a referenced-object in msdk video meta..
(In reply to Hyunjun Ko from comment #96)
> (In reply to sreerenj from comment #93)
> > Review of attachment 368118 [details] [review] [review] [review]:
> >
> > Assume we have vpp.
> >
> > How about the pipeline: msdkh264dec ! msdkvpp ! textoverlay ! msdkvpp".
> >
> > Here is decoder is supposed to create a child session, but it won't I
> > believe. WDT?
>
> As per my undestanding, the decoder in this pipeline doesn't need to join
> the session from vpp but just share it, does it?

The decoder tries to find a context from downstream first, right? So it could be receiving a context from the downstream (2nd vpp) element.

Please correct me if I am wrong: only one dec, enc & vpp are allowed in a single session. So the decoder and the first vpp should join the session by creating their own children.

> But for vpps in this pipeline, seems it should create its own session
> seperately because there is an element that should be using system memory
> while the patch doesn't do that.

textoverlay will try to map, and we do the vaDeriveImage internally. Is there anything to do with session sharing here?

> Can we think about this kind of complexed pipelines later or we should
> handle this in this thread too?

This is something related to the fundamental design. I don't want to re-write the whole thing when we add VPP, so I'm just trying to make sure that we understand the possible use cases :) If you think this approach can handle the complex pipelines, then it is more than enough for now. The implementation can of course be done later (after the release of 1.14).
(In reply to Hyunjun Ko from comment #97) > (In reply to sreerenj from comment #94) > > Review of attachment 368120 [details] [review] [review] [review]: > > > > Would it be better to use videomemory always(default)? > > According to msdk spec,system-mem is only preferred if software decoder is > > in use. > > Um.. I was thinking so but copying from vaDeriveImage to other user space is > sometimes very slow. > > For example, in this pipeline msdkh264dec ! videoconvert ! xvimagesink, > if the decoder use video memory, playing is very slow probably on skylake. > > I already sent this thing with some links to be referenced. > > How about doing that (video memory by default in decoder) after we finish > the task for handling dmabuf? Yup, I am fine with that. We can do the benchmarking and take a decision based on that.
Review of attachment 368119 [details] [review]: For a note: when you add a mechanism to avoid "explicit syncing", accumulating the async depth from different msdk components should be handled in a better way, because, for example, vpp+enc only needs one increment of "async-depth" in those cases. Otherwise the patch lgtm.
Review of attachment 368116 [details] [review]: lgtm.
Review of attachment 368121 [details] [review]: lgtm
Review of attachment 368117 [details] [review]: Same comments as for 368118. Otherwise lgtm.
(In reply to sreerenj from comment #99)
> (In reply to Hyunjun Ko from comment #96)
> > (In reply to sreerenj from comment #93)
> > > Review of attachment 368118 [details] [review] [review] [review] [review]:
> > >
> > > Assume we have vpp.
> > >
> > > How about the pipeline: msdkh264dec ! msdkvpp ! textoverlay ! msdkvpp".
> > >
> > > Here is decoder is supposed to create a child session, but it won't I
> > > believe. WDT?
> >
> > As per my undestanding, the decoder in this pipeline doesn't need to join
> > the session from vpp but just share it, does it?
>
> The decoder tries to find a context from downstream first. right?
> So it could be receiving a context from downstream (2nd vpp) element.

Right.

> Please correct me if I am wrong, Only one dec, enc & vpp are allowed in a
> single session. So, the decoder and first vpp should join the session by
> creating its own child.

Yes, that's correct, and with this patch only vpp1 creates a child context with a joined session. So you're saying that the decoder should do the same as vpp1? I'm not sure about that.

> > But for vpps in this pipeline, seems it should create its own session
> > seperately because there is an element that should be using system memory
> > while the patch doesn't do that.
>
> textoverlay will try to map, and we do the vaDeriveImgae internally.
> Is there anything to do with session sharing here?

Ah, sorry. Please forget about this comment. I was confused with something else.

> > Can we think about this kind of complexed pipelines later or we should
> > handle this in this thread too?
>
> This is something related to the fundamental design. I don't want to
> re-write the whole thing when we add VPP. So just trying to make sure that
> we understand the possible use cases :)
> If you think it can handle the complex pipelines with this approach then it
> is more than enough for now. Implementation can be done later of course
> (after the release of 1.14).
Yeah, the current approach is quite simple and I don't believe it can handle all cases. Let's see what happens.
(In reply to sreerenj from comment #101)
> Review of attachment 368119 [details] [review]:
>
> For a note, when you add a mechanism to avoid "explicit syncing", accumulating the async depth from different msdk components should be handled in a better way, because e.g. vpp+enc only needs one increment of "async-depth" in those cases.
> Otherwise patch lgtm.

Thanks for the note. This will be really helpful to me. Could you tell me where I can find a document describing this kind of thing, if one exists?
(In reply to Ishmael Visayana Sameen from comment #91)
> > Strange! We should compare with the msdk sample applications.
> >
> > I have a guess: we don't do the "DecodeHeader" before querying for the required IO surfaces; instead, we just fill the mfxVideoParam manually, and there could be a possibility that we are not providing all the information needed. But it is fine for now, we can fix this later (both decoder and encoder require a bunch of fixes, I believe). Would be nice to file a bug to keep track of this.
>
> Jeez, everything can be easily guessed when you have such a good reference :) Your guess is right.

Hmm, but the current sample code in MediaSDK also requires over 20 surfaces even though it calls DecodeHeader. That's what is still confusing me.
(In reply to Hyunjun Ko from comment #107)
> Hmm, but the current sample code in MediaSDK also requires over 20 surfaces even though it calls DecodeHeader. That's what is still confusing me.

Maybe the sample is not correct; I'll check it later :)
Created attachment 368287 [details] [review] msdk: move and rename the function msdk_video_alignment Move the msdk_video_alignment function from the decoder to msdk.c and rename it so that others can call it without a duplicated declaration.
Created attachment 368288 [details] [review] msdk: adds new utility functions for conversion from gstreamer to libmfx
Created attachment 368289 [details] [review] msdk: libva: adds utility functions to convert between mfx and libva
Created attachment 368290 [details] [review] msdk: implements GstMsdkContext. Makes GstMsdkContext a descendant of GstObject so that we can track the life-cycle of the driver's session. Also replaces MsdkContext with it. Keeps msdk_d3d.c alive for the future.
Created attachment 368291 [details] [review] msdkdec: fix typo
Created attachment 368292 [details] [review] msdk: adds frame allocator using libva Implements the msdk frame allocator, which is required by the driver. Also makes these functions global so that GstMsdkAllocator can use the allocated video memory later and couple it with GstMsdkMemory. GstMsdkContext keeps allocation information such as mfxFrameAllocRequest and mfxFrameAllocResponse after allocation.
Created attachment 368293 [details] [review] msdk: supports bufferpool Implements 2 memory allocators: 1\ GstMsdkSystemAllocator: allocates system memory. 2\ GstMsdkVideoAllocator: allocates device memory depending on the platform (eg. VASurface). Currently GstMsdkBufferPool uses the video allocator by default only on Linux. On Windows, we should use system memory until a d3d allocator is implemented.
Created attachment 368294 [details] [review] msdkenc: use bufferpool 1\ Proposes an msdk bufferpool to upstream. - If upstream has accepted the proposed msdk bufferpool, the encoder can get an msdk surface from the buffer directly. - If not, the encoder gets an msdk surface from its own msdk bufferpool and copies the upstream frame to the surface. 2\ Replaces the arrays of surfaces with an msdk bufferpool. 3\ In case VPP is used, there is another msdk bufferpool with NV12 info so that it can convert first and then encode. Calls gst_msdk_set_frame_allocator and uses video memory only on Linux; uses system memory on Windows until a d3d allocator is implemented.
Created attachment 368295 [details] [review] msdkdec: use bufferpool 1\ In decide_allocation, the decoder makes its own msdk bufferpool. - If downstream supports video meta, it simply replaces downstream's pool with the msdk bufferpool. - If not, it uses the msdk bufferpool as a side pool, which frames are decoded into before being copied to downstream's bufferpool. 2\ Decides whether to use video memory or system memory. - This is not completed in this patch. - It might be decided in update_src_caps. - But both the system memory and video memory cases have been tested.
Created attachment 368296 [details] [review] msdk: adds util functions to handle GstContext To share a GstMsdkContext between msdk elements, GstContext is used. Most of the code is from gstreamer-vaapi.
Created attachment 368297 [details] [review] msdk: context: add job type to figure out if joining the session is necessary According to the driver's documentation, if there are two or more encoders or decoders in a process, the sessions should be joined by MFXJoinSession. To achieve this via GstContext, this patch adds a job type specifying whether the element is an encoder, decoder or vpp. If an msdk element learns from the shared context that joining the session is needed, it creates another instance of GstContext with a joined session, which is not shared.
Created attachment 368298 [details] [review] msdkdec/enc: query GstContext to share GstMsdkContext How a GstMsdkContext is shared/created is as follows: - Search for a GstMsdkContext in the pipeline. - If found, check whether it belongs to a decoder, encoder or vpp by job type. - If it is the same job type, create another instance of GstMsdkContext with a joined session. - Otherwise just use the shared GstMsdkContext. - If not found, just create a new instance of GstMsdkContext.
Created attachment 368299 [details] [review] msdk: adds async depth from each msdk element to the shared GstMsdkContext In a pipeline like ".. ! decoder ! encoder ! .." using video memory, the decoder needs to know the async depth of the following msdk element so that it can allocate the correct number of video memory surfaces. Otherwise, the decoder's memory is exhausted while processing.
Created attachment 368300 [details] [review] msdkdec: use video memory if there's another MSDK context in the pipeline A decoder decides to use video memory if: 1\ downstream's pool is an MSDK bufferpool, or 2\ there is a shared GstMsdkContext in the pipeline. This policy should be improved to handle more cases.
Created attachment 368301 [details] [review] msdk: Avoid build failures on Windows until d3d allocator is implemented
(In reply to Hyunjun Ko from comment #105)
> Yes, that's correct, and with this patch only vpp1 creates a child context with a joined session. So you're saying the decoder should do the same as vpp1? I'm not sure about that.

OK, I think your logic is correct (and nice too) and it should work, I believe. Thanks for the clarification.
(In reply to Hyunjun Ko from comment #106)
> Thanks for the note. This will be really helpful to me. Could you tell me where I can find a document describing this kind of thing, if one exists?

I look into the msdk spec more than the sample applications. There is an msdk specification included in the community edition of Intel Media Server Studio or with the MediaSamples.
Review of attachment 368287 [details] [review]: Pushed: commit 8f0450dad4d861fcae9e264dd672d276f04f5181
Review of attachment 368288 [details] [review]: commit b45147a6d4876bb076a35489b60987570ef8fc5e
Review of attachment 368289 [details] [review]: commit 3f9d0fcaa97b47a2afda863a4dcd7f008f61a43a
Review of attachment 368290 [details] [review]: commit 6ce9a66b80c6b5d72feff1d74bf08f3e71e18330
Review of attachment 368291 [details] [review]: commit 3ed9f7c5f136f4eadeacd3118171f63559527942
Review of attachment 368292 [details] [review]: commit 4d860ec82b7513e7b858b27f259098b4b6b31e3d
Review of attachment 368293 [details] [review]: commit 2542b2d34da53f44b259780acba19907aab3e04d
Review of attachment 368294 [details] [review]: commit 580a52ec4971c70f90df088f5c0b3fff9be3c639
Review of attachment 368295 [details] [review]: commit a66d5620f31f5559da7ebfbef6eaba7cb1fe24d3
Review of attachment 368296 [details] [review]: commit dddb84897f75560d01792fe9a51d432b3f99a020
Review of attachment 368297 [details] [review]: commit dc82ccb9a295862346d785d6496939e6ab2e8139
Review of attachment 368298 [details] [review]: commit f2b35abcabeafe5de0bc78bfcde4e5615deb166d
Review of attachment 368299 [details] [review]: commit 375a50a876f028d967532a84e745fbd38a983299
Review of attachment 368300 [details] [review]: commit 72c6cd55453fd913ca0fb60192ef649b290bc187
Review of attachment 368301 [details] [review]: commit 76a82feae7b541281b7d39822a62a7bfb6993bec
I've found a serious memory leak in the msdk plugin when decoding h264 on Windows. GstBuffer, GstMemory and the MSDK buffer pool all leak, making the plugin unusable; if anybody has any ideas on how best to track down the leak, please let me know. I've already fixed two caps leaks, but the buffer leak causes system memory to be exhausted in a few minutes. To detect the leaks, I use: GST_LEAKS_TRACER_STACK_TRACE=1 GST_DEBUG="GST_TRACER:7" GST_TRACERS="leaks" gst-launch-1.0 ....