GNOME Bugzilla – Bug 759209
cluttersink: DMABuf support
Last modified: 2021-05-25 17:32:39 UTC
For performance reasons, using dmabuf/eglimage is much faster than standard Texture2D uploads. I have implemented a working draft that I use on MALI and a platform with a v4l2 video decoder. I'm sharing this work here, but I understand it is not entirely correct. Here are the reasons:

1. EGLImage support should be provided by Cogl

Right now, my branch simply gets the GL function pointers using GModule and eglGetProcAddress. It then creates an EGLImage, which is not a Cogl object. Note that while EGLImage operations are thread-safe on the GL side, care should be taken so they are also thread-safe as Cogl objects (needed so we can free the EGLImage in the streaming thread).

2. Fences should be used to keep track of the rendering operations

Right now I implement a cur_buffer/last_buffer dance. That holds on to enough buffers so that we don't see visual artifacts caused by a dmabuf being reused by the decoder before the GL thread is done displaying it.

Additionally, more color formats could be implemented. It is quite rare that GL stacks support color conversion, and when they do, it's through the EXTERNAL_OES texture target. I don't believe it is worth adding this. Instead, we can implement all color formats using shaders; the import can then take place with R8 and RG88 textures as needed.
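For reference, this is roughly what the manual import path in my branch boils down to. A minimal sketch, assuming the EGL_EXT_image_dma_buf_import and GL_OES_EGL_image extensions; the import_dmabuf() helper and the single-plane layout are illustrative, not the actual branch code:

#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h>
#include <drm_fourcc.h>

static PFNEGLCREATEIMAGEKHRPROC create_image;
static PFNGLEGLIMAGETARGETTEXTURE2DOESPROC image_target_texture;

static GLuint
import_dmabuf (EGLDisplay display, int fd, int width, int height, int stride)
{
  GLuint tex = 0;
  EGLImageKHR image;

  /* Resolved once at runtime, which is what the branch does via
   * GModule/eglGetProcAddress instead of going through Cogl */
  if (!create_image)
    create_image = (PFNEGLCREATEIMAGEKHRPROC)
        eglGetProcAddress ("eglCreateImageKHR");
  if (!image_target_texture)
    image_target_texture = (PFNGLEGLIMAGETARGETTEXTURE2DOESPROC)
        eglGetProcAddress ("glEGLImageTargetTexture2DOES");

  EGLint attribs[] = {
    EGL_WIDTH, width,
    EGL_HEIGHT, height,
    EGL_LINUX_DRM_FOURCC_EXT, DRM_FORMAT_ARGB8888,
    EGL_DMA_BUF_PLANE0_FD_EXT, fd,
    EGL_DMA_BUF_PLANE0_OFFSET_EXT, 0,
    EGL_DMA_BUF_PLANE0_PITCH_EXT, stride,
    EGL_NONE
  };

  image = create_image (display, EGL_NO_CONTEXT,
      EGL_LINUX_DMA_BUF_EXT, NULL, attribs);
  if (image == EGL_NO_IMAGE_KHR)
    return 0;   /* e.g. EGL_BAD_MATCH: fall back to Texture2D uploads */

  glGenTextures (1, &tex);
  glBindTexture (GL_TEXTURE_2D, tex);
  image_target_texture (GL_TEXTURE_2D, (GLeglImageOES) image);

  /* The EGLImage must eventually be freed with eglDestroyImageKHR();
   * doing that from the streaming thread is the thread-safety concern
   * raised above. */
  return tex;
}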
Created attachment 316976 [details] [review]
video-sink: Propose a minimum amount of buffers

This is needed in order for v4l2 elements not to start copying when their queue is near empty, which would break zero-copy. Note that queues don't currently interact with this number, so we need to make sure it's big enough to accommodate some jitter. A sketch of the idea is below.
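A minimal sketch of what the patch does in propose_allocation(), assuming a sink whose GstVideoInfo lives in priv->info; the value 6 is the number the patch picks (questioned further down), not something derived:

static gboolean
clutter_gst_video_sink_propose_allocation (GstBaseSink * bsink,
    GstQuery * query)
{
  ClutterGstVideoSink *sink = CLUTTER_GST_VIDEO_SINK (bsink);
  ClutterGstVideoSinkPrivate *priv = sink->priv;

  /* No pool is offered (NULL), but announcing (size, min, max) lets an
   * upstream v4l2 decoder allocate enough buffers to stay zero-copy. */
  gst_query_add_allocation_pool (query, NULL, priv->info.size, 6, 0);
  gst_query_add_allocation_meta (query, GST_VIDEO_META_API_TYPE, NULL);

  return TRUE;
}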
Created attachment 316977 [details] [review]
video-sink: Add DMA-Buf support

This is a BGRx-only implementation of DMABuf import.
Created attachment 316986 [details] [review]
video-sink: Add DMA-Buf support

This is a BGRx-only implementation of DMABuf import.
Review of attachment 316976 [details] [review]:

Looks good.
I wonder how gst-vaapi is going to react though :)
Review of attachment 316986 [details] [review]:

Thanks, that's really cool. Though I was discussing this with one of the (former) Cogl maintainers (Robert Bragg), and he thought it would be nice to have dmabuf support in Cogl directly, much like Cogl's X11 pixmap support: https://git.gnome.org/browse/cogl/tree/cogl/winsys/cogl-texture-pixmap-x11.h?h=cogl-1.22 If you don't have time to look into this, I'll try to find some time to do it. It might be useful to have dmabuf support there for other people too.
The really nice thing about adding it to Cogl is that we would no longer have to destroy the texture every time to ensure the CoglPipeline gets dirtied and redrawn entirely. I'm very uncertain whether I will have time, but I wanted to share this code to show how.

Note that if we include it as-is, we'll have to fix the DRM format used. I'm using DRM_FORMAT_BGRA8888, but that's a workaround for a libMALI bug. 32-bit DRM formats are all expressed in the order you get when they are loaded into a 32-bit register, so on little-endian this format should be DRM_FORMAT_ARGB8888. I can fix the patch and add big-endian support while at it.

Additionally, we could add NV12 support. This is usually done with two EGLImages, one using DRM_FORMAT_R8 and another using DRM_FORMAT_GR88, after which our NV12 shader can be used (see the sketch below). The same applies for I420 and other YUV formats. It's known to work nicely on Intel (tested with V4L2 cameras and the vivid simulator). And when a specific GL stack does not support a format, it simply returns EGL_BAD_MATCH and we can fall back to slower Texture2D uploads. That's the rationale for adding an upload_dmabuf() method to the renderer: multiple formats can be uploaded using dmabuf, and the shaders get to be reused.

(In reply to Lionel Landwerlin from comment #4)
> I wonder how gst-vaapi is going to react though :)

I'm not sure what the state of dmabuf support in the vaapi elements is. I know some support exists, but I doubt it is automatic yet. I believe VAAPI will keep providing buffers with the GL upload meta. The only side effect is that it will try to do a Texture2D upload as fallback if the dmabuf path didn't work, and that will fail in video_frame_map() (VASurfaces are not mappable, IIRC). Also, VAAPI works with both the GLX and EGL backends, while DMABuf import is limited to the EGL backend.
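To illustrate the NV12 case mentioned above: the two planes of the same dmabuf each become their own EGLImage. Same extension assumptions as the sketch in comment 1; the offsets assume a tightly packed buffer, which is illustrative:

/* Y plane: full resolution, one byte per pixel, imported as R8 */
EGLint y_attribs[] = {
  EGL_WIDTH, width,
  EGL_HEIGHT, height,
  EGL_LINUX_DRM_FOURCC_EXT, DRM_FORMAT_R8,
  EGL_DMA_BUF_PLANE0_FD_EXT, fd,
  EGL_DMA_BUF_PLANE0_OFFSET_EXT, 0,
  EGL_DMA_BUF_PLANE0_PITCH_EXT, stride,
  EGL_NONE
};

/* UV plane: half resolution, two bytes per pixel, imported as GR88,
 * starting right after the Y plane */
EGLint uv_attribs[] = {
  EGL_WIDTH, width / 2,
  EGL_HEIGHT, height / 2,
  EGL_LINUX_DRM_FOURCC_EXT, DRM_FORMAT_GR88,
  EGL_DMA_BUF_PLANE0_FD_EXT, fd,
  EGL_DMA_BUF_PLANE0_OFFSET_EXT, stride * height,
  EGL_DMA_BUF_PLANE0_PITCH_EXT, stride,
  EGL_NONE
};

/* Each image is bound to a plain GL_TEXTURE_2D and the pair is fed to
 * the existing NV12 shader; no EXTERNAL_OES involved. */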
(In reply to Lionel Landwerlin from comment #4)
> Review of attachment 316976 [details] [review]:
>
> Looks good.
> I wonder how gst-vaapi is going to react though :)

My answer was out of context. I'm starting to doubt whether that patch should have been submitted at all; 6 is quite a random value here. Let's hold on to this one until I get more information. I have a decoder that does the colorspace conversion to BGR32. Normally, decoders don't do that, and they expose the number of buffers they need to stay active. So maybe that patch is just a workaround for that specific decoder.
Created attachment 318581 [details] [review]
video-sink: Add DMA-Buf support

This is a BGRx-only implementation of DMABuf import.

Based on a previous patch by:
Nicolas Dufresne <nicolas.dufresne@collabora.com>
Here is an update with most of the GL stuff moved into Cogl. Tested using this patch: https://bugzilla.gnome.org/show_bug.cgi?id=732281 with the following pipeline:

COGL_DRIVER=gles2 gst-launch-1.0 videotestsrc ! inteldmabufupload ! clutterautovideosink

Let me know if that works for you too. I had to make additional changes to Cogl so the shader would use a samplerExternalOES type rather than sampler2D. This might be Mesa being a bit more picky about the GL spec.
Created attachment 318582 [details] [review]
video-sink: Add DMA-Buf support

This is a BGRx-only implementation of DMABuf import.

Based on a previous patch by:
Nicolas Dufresne <nicolas.dufresne@collabora.com>
Review of attachment 318581 [details] [review]:

I haven't tested it yet, but except for the caps feature, which is still under heavy design and in fact not really needed, this patch looks good.

Note that in general we try to avoid the EXTERNAL_OES issue by only using a subset of formats and then using shaders for color conversion. The reason I didn't need a special sampler is that on my target I was using BGRA, which was natively supported (unlike all the Mesa drivers, which seem to only support RGBA natively). The trick is to use a small fixed set of formats to do everything: RGB565, RGBA8888, RG88 and R8. On Intel, those don't require color conversion and can be imported as normal 2D textures with a normal 2D sampler. With those formats you can implement all the supported conversions by reusing the shaders you already have (see the sketch below). In my spare time, I'm also working on Gallium with the goal of gaining the same features.

As a reference, you can find some code in the glupload element. In general, it is very complicated to implement multi-plane formats and conversion in the GL stack, so I believe handling this here makes a lot of sense. It also ensures that the colors are the same regardless of the driver being used, and reduces the GL subset that we need.

::: clutter-gst/clutter-gst-video-sink.c
@@ +129,3 @@
+#ifdef COGL_HAS_EGL_DMABUF_SUPPORT
+  ";"
+  MAKE_CAPS ("memory:" GST_ALLOCATOR_DMABUF, "{ RGBA }")

This caps feature has not yet been accepted upstream. It is specific to the unmerged Intel uploader.
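To make the shader-reuse point concrete, this is the sort of conversion that applies unchanged once the planes are imported as plain R8/RG88 2D textures. A sketch with BT.601 coefficients and made-up uniform names:

static const char *nv12_to_rgb_shader =
  "uniform sampler2D y_tex;\n"    /* R8: Y plane */
  "uniform sampler2D uv_tex;\n"   /* RG88: interleaved UV plane */
  "varying vec2 coord;\n"
  "void main () {\n"
  "  float y = 1.164 * (texture2D (y_tex, coord).r - 0.0625);\n"
  "  vec2 uv = texture2D (uv_tex, coord).rg - vec2 (0.5);\n"
  "  gl_FragColor = vec4 (y + 1.596 * uv.y,\n"
  "                       y - 0.391 * uv.x - 0.813 * uv.y,\n"
  "                       y + 2.018 * uv.x,\n"
  "                       1.0);\n"
  "}\n";

The same fragment shader works whether the textures were filled by glTexImage2D uploads or by dmabuf import, which is what keeps the Texture2D fallback path cheap.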
Ok, thanks for the review. Do you think the dmabuf negotiation stuff will be finalized for GStreamer 1.8?

I had to use EXTERNAL_OES because of a limitation of the Intel driver: http://cgit.freedesktop.org/mesa/mesa/tree/src/mesa/drivers/dri/i965/intel_tex_image.c#n318 Apparently that's purely a software issue.

This implementation also mirrors what ChromeOS is doing for its video decoding integration: https://code.google.com/p/chromium/codesearch#chromium/src/content/common/gpu/media/rendering_helper.cc&sq=package:chromium&l=424 This test reflects how the gpu-process works and has been running on ARM and Intel devices. But you might be dealing with yet another GL driver that doesn't support that :/
(In reply to Lionel Landwerlin from comment #12)
> Ok, thanks for the review.
> Do you think the dmabuf negotiation stuff will be finalized for GStreamer
> 1.8?
>
> I had to use EXTERNAL_OES because of a limitation of the Intel driver:
> http://cgit.freedesktop.org/mesa/mesa/tree/src/mesa/drivers/dri/i965/
> intel_tex_image.c#n318
> Apparently that's purely a software issue.

Well, of course, if you can create an external OES texture, you can also create a normal texture with that driver. I think the cogl_texture_dmabuf_egl_new() API is limiting here; it should let you choose the texture target. We implemented and tested 21 formats for importing DMABuf on Intel drivers in libgstgl with the technique I described, and no EXTERNAL_OES.

> This implementation also mirrors what ChromeOS is doing for its video
> decoding integration:
> https://code.google.com/p/chromium/codesearch#chromium/src/content/common/
> gpu/media/rendering_helper.cc&sq=package:chromium&l=424
> This test reflects how the gpu-process works and has been running on ARM and
> Intel devices. But you might be dealing with yet another GL driver that
> doesn't support that :/

It does not reflect anything. You need to understand why and when EXTERNAL_OES is needed. In Chromium, we use EXTERNAL_OES because we assume the GL stack is doing the color conversion for us. So we create the EGLImage with DRM_FORMAT_NV12 or similar, and expect the resulting texture to be RGBA. Whereas if you use TEXTURE_2D, you will in fact create two textures, one with DRM_FORMAT_R8 (the Y plane) and one with DRM_FORMAT_GR88 (the UV plane), and use your usual shader to do the color conversion. I first learned this method from XBMC; it was designed specifically for the Intel driver, which does not do any color conversion except swizzling. The fact is that using a shader for the color conversion is, in most GL stacks, the same thing as using an EXTERNAL_OES sampler.

To make it clear: if you want a wide variety of color formats on a wide variety of GL stacks, don't depend on EXTERNAL_OES and the stack doing the color conversion. The code in Chromium only works with the MALI driver, and in fact only with the MALI 6XX series, as previous series used different color formats. Hope this helps. I know the code I produced wasn't aligned with that, but it was dedicated to MALI.
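For contrast with the two-texture path sketched earlier, here is roughly what the EXTERNAL_OES route amounts to: a single EGLImage for the whole NV12 buffer, with the driver expected to do the conversion. Same extension assumptions as the earlier sketches; whether it works at all is driver-dependent:

EGLint attribs[] = {
  EGL_WIDTH, width,
  EGL_HEIGHT, height,
  EGL_LINUX_DRM_FOURCC_EXT, DRM_FORMAT_NV12,
  EGL_DMA_BUF_PLANE0_FD_EXT, fd,
  EGL_DMA_BUF_PLANE0_OFFSET_EXT, 0,
  EGL_DMA_BUF_PLANE0_PITCH_EXT, y_stride,
  EGL_DMA_BUF_PLANE1_FD_EXT, fd,
  EGL_DMA_BUF_PLANE1_OFFSET_EXT, uv_offset,
  EGL_DMA_BUF_PLANE1_PITCH_EXT, uv_stride,
  EGL_NONE
};

EGLImageKHR image = create_image (display, EGL_NO_CONTEXT,
    EGL_LINUX_DMA_BUF_EXT, NULL, attribs);

/* The texture target and the sampler type both change: the shader must
 * declare samplerExternalOES and enable GL_OES_EGL_image_external. */
glBindTexture (GL_TEXTURE_EXTERNAL_OES, tex);
image_target_texture (GL_TEXTURE_EXTERNAL_OES, (GLeglImageOES) image);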
Created attachment 318924 [details] [review]
video-sink: Add DMA-Buf support

This is a BGRx-only implementation of DMABuf import.

Based on a previous patch by:
Nicolas Dufresne <nicolas.dufresne@collabora.com>
Looks like the way forward. Having it in Cogl will also allow implementing the wayland/dmabuf interface in gnome-shell.
Review of attachment 318924 [details] [review]:

One performance note: we'll be creating one EGLImage/texture pair per upload. Ideally those should be cached, so they can be reused when the dmabuf comes back. In gstgl, we attach the EGLImage to the GstDmabufMemory and bind that EGLImage to the texture to be rendered. This improves performance and reduces resource usage. A sketch of the idea follows.
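A sketch of that caching idea, reusing a hypothetical import helper like the one in the first sketch; the quark name and destroy helper are made up. gst_mini_object_set_qdata() ties the EGLImage's lifetime to the GstMemory, so a recycled dmabuf reuses its image instead of creating a new one:

#include <gst/gst.h>
#include <gst/allocators/gstdmabuf.h>
#include <EGL/egl.h>
#include <EGL/eglext.h>

/* Hypothetical helpers, as sketched in earlier comments */
EGLImageKHR import_dmabuf_image (EGLDisplay display, int fd,
    int width, int height, int stride);
void destroy_dmabuf_image (EGLImageKHR image);

static GQuark
egl_image_quark (void)
{
  static GQuark quark = 0;
  if (!quark)
    quark = g_quark_from_static_string ("ClutterGstEGLImage");
  return quark;
}

static EGLImageKHR
ensure_egl_image (GstMemory * mem, EGLDisplay display,
    int width, int height, int stride)
{
  EGLImageKHR image;

  image = gst_mini_object_get_qdata (GST_MINI_OBJECT_CAST (mem),
      egl_image_quark ());
  if (image != NULL)
    return image;   /* the dmabuf came back: reuse its EGLImage */

  image = import_dmabuf_image (display, gst_dmabuf_memory_get_fd (mem),
      width, height, stride);

  /* The image now lives exactly as long as the GstMemory; it is freed
   * when the memory is, possibly from the streaming thread. */
  gst_mini_object_set_qdata (GST_MINI_OBJECT_CAST (mem),
      egl_image_quark (), image, (GDestroyNotify) destroy_dmabuf_image);
  return image;
}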
So, what's the status? Was this merged into Cogl?
I see this has potentially moved beyond the original approach now (great!), but to leave some feedback anyway: the "Propose a minimum amount of buffers" patch breaks the Cheese webcam live view on a laptop webcam I have here:

Failed to configure the buffer pool: gstbasetransform.c(968): gst_base_transform_default_decide_allocation (): /GstCameraBin:camerabin/GstViewfinderBin:vf-bin/GstVideoConvert:vfbin-csp:
Configuration is most likely invalid, please report this issue.

When clutter_gst_video_sink_propose_allocation() is called, priv->info.size is 0, which we then pass to gst_query_add_allocation_pool(), and I imagine the 0 size ends up causing some breakage.

On an external camera connected to the same laptop, it works fine because clutter_gst_video_sink_propose_allocation() is never called. I'm not sure what the difference is, but I did go one step down in the long backtrace and found that on the working webcam, gst_base_sink_default_query() is never called for GST_QUERY_ALLOCATION.
Oh, now I tested the original case (accelerated video playback on ARM) and size is also 0 there.
Created attachment 355028 [details] [review]
video-sink: Propose a minimum amount of buffers

In case it is useful, here is a modification to that patch which seems to work in both use cases. We were passing size 0 on the accelerated video playback path and this was apparently not causing any issues, but the same code path with size 0 executing in the context of Cheese was breaking webcam usage.

I see that clutter_gst_video_sink_parse_caps() should populate priv->info with the size info, but this function is only called after propose_allocation. So I adapted propose_allocation to do what a lot of other GStreamer components do (e.g. gstvideorate): parse the caps right there to get the size. Not sure if it's really correct, but both accelerated video playback and Cheese are now working.
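Roughly what the updated patch does (a sketch, not the attachment verbatim): derive the size from the caps carried by the allocation query instead of priv->info, which has not been populated yet at that point:

#include <gst/video/video.h>

static gboolean
clutter_gst_video_sink_propose_allocation (GstBaseSink * bsink,
    GstQuery * query)
{
  GstCaps *caps;
  GstVideoInfo info;
  gboolean need_pool;

  gst_query_parse_allocation (query, &caps, &need_pool);
  if (caps == NULL || !gst_video_info_from_caps (&info, caps))
    return FALSE;

  /* info.size is now valid even though parse_caps() hasn't run yet */
  gst_query_add_allocation_pool (query, NULL, info.size, 6, 0);
  gst_query_add_allocation_meta (query, GST_VIDEO_META_API_TYPE, NULL);

  return TRUE;
}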
Hi Daniel, it's all right to parse the caps from the allocation query. Notably, with VAAPI, these caps are padded caps, meaning the width/height have been increased so that the resulting size matches the requirements. This has not been generalized yet though. A size of 0 is invalid, so if it worked, it was mostly by accident. Cameras produce different video formats, which leads to different code paths being used.
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org. As part of that, we are mass-closing older open tickets in bugzilla.gnome.org which have not seen updates for a longer time (resources are unfortunately quite limited so not every ticket can get handled). If you can still reproduce the situation described in this ticket in a recent and supported software version, then please follow https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines and create a new enhancement request ticket at https://gitlab.gnome.org/GNOME/clutter-gst/-/issues/ Thank you for your understanding and your help.