GNOME Bugzilla – Bug 745372
vpxdec: Should try to avoid copies at output
Last modified: 2015-12-04 22:33:05 UTC
Currently our implementation of vp8dec always copies the output images. Even though I haven't had a very deep look yet, I notice that there is APIs to control the allocation and life time of a vp8 image. Of course VP8 requires a certain alignment to be respected (just like gst-libav) but overall nothing very special that would explain why we would need to copy the entire frames.
If there's API now that would be great :) Back when the element was written, there didn't seem to be any useable API for that.
There was http://www.webmproject.org/docs/vp8-sdk/group__cap__xma.html but it was removed in libvpx git. I don't see any replacement. Apparently it was removed because the capability was not implemented in the decoder itself, it was just theoretical code that couldn't be used.
In newer libvpx versions there is vpx_codec_set_frame_buffer_functions()
I wonder what the vpx image api is for then ....
I went to take a look at this now that libvpx 1.4 is out. Either I'm missing something or it isn't as useful as it seems. It allows you to allocate the memory and release it yourself but it doesn't allow to use arbitrary strides. So you need to be lucky enough to match what libvpx needs with what your video output needs. So, for example, when decoding a video where the Y plane had a width of 512, libvpx wants a stride of 576 while xvimagesink wants 1024. I also tried with a video width 1024 and libvpx wantes 1088 and xvimagesink 2048. For the Y plane at least, it seems libvpx uses width+64 as the stride. Hopefully I'm missing something here.
Don't you just provide a VPXImage for allocation, where you then provide strides and per-plane pointers?
http://cgit.freedesktop.org/~thiagoss/gst-plugins-good/log/?h=vpx-1.4 My unfinished attempt can be found here, just in case
You provide this struct: typedef struct vpx_codec_frame_buffer { uint8_t *data; /**< Pointer to the data buffer */ size_t size; /**< Size of data in bytes */ void *priv; /**< Frame's private data */ } vpx_codec_frame_buffer_t; It is just a memory area and its size, along with an 'user data pointer' This is in vpx_frame_buffer.h
How super useless...
Bug in webm: https://code.google.com/p/webm/issues/detail?id=988
Created attachment 316563 [details] [review] WIP: vp9dec: Use GstMemory to avoid copies With the VPX decoders it's not simple to use downstream buffer pool, because we don't know the image size and alignment when buffers get allocated. We can though use GstAllocator (for downstream, or the system allocator) to avoid a copy before pushing if downstream supports GstVideoMeta. This would still cause a copy for sink that requires specialized memory, though greatly improve performance for sink that down like glimagesink and cluttersink.
This is unfinished. I use a difference approach, where I simply use the GstMemory to carry the memory further downstream (allowing to avoid one copy for few sinks like the GL one). This give 15-20% improvement when decoding a 720p VP9 video, though it has the same side effect as FFMPEG where READWIRTE mapped memory get pushed.
If we go that way, we should fix bug 758877 and remove the ugly CUSTOM_ALLOC check. We should not have to check this in my opinion.
Created attachment 316629 [details] [review] vpxdec: Use GstMemory to avoid copies With the VPX decoders it's not simple to use downstream buffer pool, because we don't know the image size and alignment when buffers get allocated. We can though use GstAllocator (for downstream, or the system allocator) to avoid a copy before pushing if downstream supports GstVideoMeta. This would still cause a copy for sink that requires specialized memory and does not have a GstAllocator for that, though it will greatly improve performance for sink like glimagesink and cluttersink.
Remains to remove the CUSTOM alloc check.
Created attachment 316631 [details] [review] vpxdec: Use GstMemory to avoid copies With the VPX decoders it's not simple to use downstream buffer pool, because we don't know the image size and alignment when buffers get allocated. We can though use GstAllocator (for downstream, or the system allocator) to avoid a copy before pushing if downstream supports GstVideoMeta. This would still cause a copy for sink that requires specialized memory and does not have a GstAllocator for that, though it will greatly improve performance for sink like glimagesink and cluttersink.
Comment on attachment 316631 [details] [review] vpxdec: Use GstMemory to avoid copies Looks good to me, but I don't like that we push write-mapped memory downstream :) That's the same as with gst-libav though, so we support it well enough already... just should fix that part sooner or later.
Agreed,
Comment on attachment 316631 [details] [review] vpxdec: Use GstMemory to avoid copies Ok, found a slight issue. When I append the memory, the buffer will try to get an exclusive reference (_memory_get_exclusive_reference()). That means it will GST_LOCK_FLAG_EXCLUSIVE and gst_memory_ref(). Though, the buffer is already write locked, which prevent the exclusive lock, so the gst_buffer_append() call will copy the memory. So we effectively endup with the same performance. That means for this use case, I strictly need to be able to downgrade to a read lock.
Created attachment 316796 [details] [review] vpxdec: Use GstMemory to avoid copies With the VPX decoders it's not simple to use downstream buffer pool, because we don't know the image size and alignment when buffers get allocated. We can though use GstAllocator (for downstream, or the system allocator) to avoid a copy before pushing if downstream supports GstVideoMeta. This would still cause a copy for sink that requires specialized memory and does not have a GstAllocator for that, though it will greatly improve performance for sink like glimagesink and cluttersink. To avoid allocating for every buffer, we also use a internal buffer pool.
Created attachment 316797 [details] [review] vpxdec: Use GstMemory to avoid copies With the VPX decoders it's not simple to use downstream buffer pool, because we don't know the image size and alignment when buffers get allocated. We can though use GstAllocator (for downstream, or the system allocator) to avoid a copy before pushing if downstream supports GstVideoMeta. This would still cause a copy for sink that requires specialized memory and does not have a GstAllocator for that, though it will greatly improve performance for sink like glimagesink and cluttersink. To avoid allocating for every buffer, we also use a internal buffer pool.
Sorry for porting twice, in last version I fix a small difference between vp8/vp9. I added FIXME that referes to memory_remap() needed feature.
Oh, and that version uses a buffer pool, creates buffer with video meta already in place.
Attachment 316797 [details] pushed as 189c291 - vpxdec: Use GstMemory to avoid copies