Bug 745372 – vpxdec: Should try to avoid copies at output

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 745372 - vpxdec: Should try to avoid copies at output


Summary:	vpxdec: Should try to avoid copies at output


Status:	RESOLVED FIXED

Product:	GStreamer
Classification:	Platform
Component:	gst-plugins-good
Version:	git master
Hardware:	Other Linux

Importance:	Normal enhancement
Target Milestone:	1.7.1
Assigned To:	GStreamer Maintainers
QA Contact:	GStreamer Maintainers

URL:
Whiteboard:

Depends on:	754826 758877
Blocks:

Reported:	2015-03-01 16:28 UTC by Nicolas Dufresne (ndufresne)
Modified:	2015-12-04 22:33 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
WIP: vp9dec: Use GstMemory to avoid copies (5.68 KB, patch) 2015-12-01 00:24 UTC, Nicolas Dufresne (ndufresne)	needs-work	Details \| Review
vpxdec: Use GstMemory to avoid copies (11.41 KB, patch) 2015-12-01 22:09 UTC, Nicolas Dufresne (ndufresne)	none	Details \| Review
vpxdec: Use GstMemory to avoid copies (11.09 KB, patch) 2015-12-01 23:22 UTC, Nicolas Dufresne (ndufresne)	none	Details \| Review
vpxdec: Use GstMemory to avoid copies (14.39 KB, patch) 2015-12-04 22:25 UTC, Nicolas Dufresne (ndufresne)	none	Details \| Review
vpxdec: Use GstMemory to avoid copies (14.52 KB, patch) 2015-12-04 22:30 UTC, Nicolas Dufresne (ndufresne)	committed	Details \| Review

Description Nicolas Dufresne (ndufresne) 2015-03-01 16:28:57 UTC

Currently our implementation of vp8dec always copies the output images. Even though I haven't had a very deep look yet, I notice that there is APIs to control the allocation and life time of a vp8 image. Of course VP8 requires a certain alignment to be respected (just like gst-libav) but overall nothing very special that would explain why we would need to copy the entire frames.

Comment 1 Sebastian Dröge (slomo) 2015-03-01 17:38:29 UTC

If there's API now that would be great :) Back when the element was written, there didn't seem to be any useable API for that.

Comment 2 Thiago Sousa Santos 2015-03-09 15:22:39 UTC

There was http://www.webmproject.org/docs/vp8-sdk/group__cap__xma.html but it was removed in libvpx git. I don't see any replacement.

Apparently it was removed because the capability was not implemented in the decoder itself, it was just theoretical code that couldn't be used.

Comment 3 Sebastian Dröge (slomo) 2015-03-09 15:38:39 UTC

In newer libvpx versions there is vpx_codec_set_frame_buffer_functions()

Comment 4 Nicolas Dufresne (ndufresne) 2015-03-09 15:39:19 UTC

I wonder what the vpx image api is for then ....

Comment 5 Thiago Sousa Santos 2015-04-17 13:38:17 UTC

I went to take a look at this now that libvpx 1.4 is out. Either I'm missing something or it isn't as useful as it seems. It allows you to allocate the memory and release it yourself but it doesn't allow to use arbitrary strides. So you need to be lucky enough to match what libvpx needs with what your video output needs.

So, for example, when decoding a video where the Y plane had a width of 512, libvpx wants a stride of 576 while xvimagesink wants 1024.
I also tried with a video width 1024 and libvpx wantes 1088 and xvimagesink 2048.

For the Y plane at least, it seems libvpx uses width+64 as the stride.

Hopefully I'm missing something here.

Comment 6 Sebastian Dröge (slomo) 2015-04-17 13:43:25 UTC

Don't you just provide a VPXImage for allocation, where you then provide strides and per-plane pointers?

Comment 7 Thiago Sousa Santos 2015-04-17 14:06:42 UTC

http://cgit.freedesktop.org/~thiagoss/gst-plugins-good/log/?h=vpx-1.4

My unfinished attempt can be found here, just in case

Comment 8 Thiago Sousa Santos 2015-04-17 14:07:48 UTC

You provide this struct:
typedef struct vpx_codec_frame_buffer {
  uint8_t *data;  /**< Pointer to the data buffer */
  size_t size;  /**< Size of data in bytes */
  void *priv;  /**< Frame's private data */
} vpx_codec_frame_buffer_t;


It is just a memory area and its size, along with an 'user data pointer'

This is in vpx_frame_buffer.h

Comment 9 Sebastian Dröge (slomo) 2015-04-17 14:35:42 UTC

How super useless...

Comment 10 Thiago Sousa Santos 2015-04-17 18:30:39 UTC

Bug in webm: https://code.google.com/p/webm/issues/detail?id=988

Comment 11 Nicolas Dufresne (ndufresne) 2015-12-01 00:24:18 UTC

Created attachment 316563 [details] [review]
WIP: vp9dec: Use GstMemory to avoid copies

With the VPX decoders it's not simple to use downstream buffer pool,
because we don't know the image size and alignment when buffers get
allocated. We can though use GstAllocator (for downstream, or the system
allocator) to avoid a copy before pushing if downstream supports
GstVideoMeta. This would still cause a copy for sink that requires
specialized memory, though greatly improve performance for sink that
down like glimagesink and cluttersink.

Comment 12 Nicolas Dufresne (ndufresne) 2015-12-01 00:29:17 UTC

This is unfinished. I use a difference approach, where I simply use the GstMemory to carry the memory further downstream (allowing to avoid one copy for few sinks like the GL one). This give 15-20% improvement when decoding a 720p VP9 video, though it has the same side effect as FFMPEG where READWIRTE mapped memory get pushed.

Comment 13 Nicolas Dufresne (ndufresne) 2015-12-01 00:31:12 UTC

If we go that way, we should fix bug 758877 and remove the ugly CUSTOM_ALLOC check. We should not have to check this in my opinion.

Comment 14 Nicolas Dufresne (ndufresne) 2015-12-01 22:09:17 UTC

Created attachment 316629 [details] [review]
vpxdec: Use GstMemory to avoid copies

With the VPX decoders it's not simple to use downstream buffer pool,
because we don't know the image size and alignment when buffers get
allocated. We can though use GstAllocator (for downstream, or the system
allocator) to avoid a copy before pushing if downstream supports
GstVideoMeta. This would still cause a copy for sink that requires
specialized memory and does not have a GstAllocator for that, though
it will greatly improve performance for sink like glimagesink and
cluttersink.

Comment 15 Nicolas Dufresne (ndufresne) 2015-12-01 22:10:44 UTC

Remains to remove the CUSTOM alloc check.

Comment 16 Nicolas Dufresne (ndufresne) 2015-12-01 23:22:59 UTC

Created attachment 316631 [details] [review]
vpxdec: Use GstMemory to avoid copies

With the VPX decoders it's not simple to use downstream buffer pool,
because we don't know the image size and alignment when buffers get
allocated. We can though use GstAllocator (for downstream, or the system
allocator) to avoid a copy before pushing if downstream supports
GstVideoMeta. This would still cause a copy for sink that requires
specialized memory and does not have a GstAllocator for that, though
it will greatly improve performance for sink like glimagesink and
cluttersink.

Comment 17 Sebastian Dröge (slomo) 2015-12-02 08:21:27 UTC

Comment on attachment 316631 [details] [review]
vpxdec: Use GstMemory to avoid copies

Looks good to me, but I don't like that we push write-mapped memory downstream :) That's the same as with gst-libav though, so we support it well enough already... just should fix that part sooner or later.

Comment 18 Nicolas Dufresne (ndufresne) 2015-12-02 14:01:08 UTC

Agreed,

Comment 19 Nicolas Dufresne (ndufresne) 2015-12-03 20:57:09 UTC

Comment on attachment 316631 [details] [review]
vpxdec: Use GstMemory to avoid copies

Ok, found a slight issue. When I append the memory, the buffer will try to get an exclusive reference (_memory_get_exclusive_reference()). That means it will GST_LOCK_FLAG_EXCLUSIVE and gst_memory_ref(). Though, the buffer is already write locked, which prevent the exclusive lock, so the gst_buffer_append() call will copy the memory. So we effectively endup with the same performance.

That means for this use case, I strictly need to be able to downgrade to a read lock.

Comment 20 Nicolas Dufresne (ndufresne) 2015-12-04 22:25:20 UTC

Created attachment 316796 [details] [review]
vpxdec: Use GstMemory to avoid copies

With the VPX decoders it's not simple to use downstream buffer pool,
because we don't know the image size and alignment when buffers get
allocated. We can though use GstAllocator (for downstream, or the system
allocator) to avoid a copy before pushing if downstream supports
GstVideoMeta. This would still cause a copy for sink that requires
specialized memory and does not have a GstAllocator for that, though
it will greatly improve performance for sink like glimagesink and
cluttersink. To avoid allocating for every buffer, we also use a
internal buffer pool.

Comment 21 Nicolas Dufresne (ndufresne) 2015-12-04 22:30:01 UTC

Created attachment 316797 [details] [review]
vpxdec: Use GstMemory to avoid copies

With the VPX decoders it's not simple to use downstream buffer pool,
because we don't know the image size and alignment when buffers get
allocated. We can though use GstAllocator (for downstream, or the system
allocator) to avoid a copy before pushing if downstream supports
GstVideoMeta. This would still cause a copy for sink that requires
specialized memory and does not have a GstAllocator for that, though
it will greatly improve performance for sink like glimagesink and
cluttersink. To avoid allocating for every buffer, we also use a
internal buffer pool.

Comment 22 Nicolas Dufresne (ndufresne) 2015-12-04 22:31:37 UTC

Sorry for porting twice, in last version I fix a small difference between vp8/vp9. I added FIXME that referes to memory_remap() needed feature.

Comment 23 Nicolas Dufresne (ndufresne) 2015-12-04 22:32:31 UTC

Oh, and that version uses a buffer pool, creates buffer with video meta already in place.

Comment 24 Nicolas Dufresne (ndufresne) 2015-12-04 22:32:46 UTC

Attachment 316797 [details] pushed as 189c291 - vpxdec: Use GstMemory to avoid copies