GNOME Bugzilla – Bug 759209
cluttersink: DMABuf support
Last modified: 2021-05-25 17:32:39 UTC
For performance reasons, using dmabuf/eglimage is much faster than standard Texture2D uploads. I have implemented a working draft that I use on MALI and a platform with a v4l2 video decoder. I'm sharing this work here, but I understand it is not entirely correct. Here are the reasons:

1. EGLImage support should be provided by Cogl

Right now, my branch simply gets the GL function pointers using GModule and eglGetProcAddress. It then creates an EGLImage, which is not a Cogl object. Note that while EGLImage operations are thread-safe on the GL side, care should be taken so they are also thread-safe as Cogl objects (needed so we can free the EGLImage in the streaming thread).

2. Fences should be used to keep track of the rendering operations

Right now I implement a cur_buffer/last_buffer dance. That holds on to enough buffers so that we don't see visual artifacts caused by a dmabuf being reused by the decoder before the GL thread is done displaying it.

Additionally, more color formats could be implemented. It is quite rare that GL stacks support color conversion, and when they do, it's through the EXTERNAL_OES texture target. I don't believe it is worth adding this. Instead, we can implement all color formats using shaders; the import can then take place with R8 and RG88 textures as needed.
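For reference, this is roughly what the manual import path in my branch boils down to. A minimal sketch, assuming the EGL_EXT_image_dma_buf_import and GL_OES_EGL_image extensions; the import_dmabuf() helper and the single-plane layout are illustrative, not the actual branch code:

#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h>
#include <drm_fourcc.h>

static PFNEGLCREATEIMAGEKHRPROC create_image;
static PFNGLEGLIMAGETARGETTEXTURE2DOESPROC image_target_texture;

static GLuint
import_dmabuf (EGLDisplay display, int fd, int width, int height, int stride)
{
  GLuint tex = 0;
  EGLImageKHR image;

  /* Resolved once at runtime, which is what the branch does via
   * GModule/eglGetProcAddress instead of going through Cogl */
  if (!create_image)
    create_image = (PFNEGLCREATEIMAGEKHRPROC)
        eglGetProcAddress ("eglCreateImageKHR");
  if (!image_target_texture)
    image_target_texture = (PFNGLEGLIMAGETARGETTEXTURE2DOESPROC)
        eglGetProcAddress ("glEGLImageTargetTexture2DOES");

  EGLint attribs[] = {
    EGL_WIDTH, width,
    EGL_HEIGHT, height,
    EGL_LINUX_DRM_FOURCC_EXT, DRM_FORMAT_ARGB8888,
    EGL_DMA_BUF_PLANE0_FD_EXT, fd,
    EGL_DMA_BUF_PLANE0_OFFSET_EXT, 0,
    EGL_DMA_BUF_PLANE0_PITCH_EXT, stride,
    EGL_NONE
  };

  image = create_image (display, EGL_NO_CONTEXT,
      EGL_LINUX_DMA_BUF_EXT, NULL, attribs);
  if (image == EGL_NO_IMAGE_KHR)
    return 0;   /* e.g. EGL_BAD_MATCH: fall back to Texture2D uploads */

  glGenTextures (1, &tex);
  glBindTexture (GL_TEXTURE_2D, tex);
  image_target_texture (GL_TEXTURE_2D, (GLeglImageOES) image);

  /* The EGLImage must eventually be freed with eglDestroyImageKHR();
   * doing that from the streaming thread is the thread-safety concern
   * raised above. */
  return tex;
}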
Created attachment 316976 [details] [review]
video-sink: Propose a minimum amount of buffers

This is needed in order for v4l2 elements not to start copying when their queue is near empty, which would break zero-copy. Note that queues don't currently interact with this number, so we need to make sure it's big enough to accommodate some jitter. A sketch of the idea is below.
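A minimal sketch of what the patch does in propose_allocation(), assuming a sink whose GstVideoInfo lives in priv->info; the value 6 is the number the patch picks (questioned further down), not something derived:

static gboolean
clutter_gst_video_sink_propose_allocation (GstBaseSink * bsink,
    GstQuery * query)
{
  ClutterGstVideoSink *sink = CLUTTER_GST_VIDEO_SINK (bsink);
  ClutterGstVideoSinkPrivate *priv = sink->priv;

  /* No pool is offered (NULL), but announcing (size, min, max) lets an
   * upstream v4l2 decoder allocate enough buffers to stay zero-copy. */
  gst_query_add_allocation_pool (query, NULL, priv->info.size, 6, 0);
  gst_query_add_allocation_meta (query, GST_VIDEO_META_API_TYPE, NULL);

  return TRUE;
}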
Created attachment 316977 [details] [review]
video-sink: Add DMA-Buf support

This is a BGRx-only implementation of DMABuf import.
Created attachment 316986 [details] [review]
video-sink: Add DMA-Buf support

This is a BGRx-only implementation of DMABuf import.
Review of attachment 316976 [details] [review]:

Looks good.
I wonder how gst-vaapi is going to react though :)
Review of attachment 316986 [details] [review]:

Thanks, that's really cool. Though I was discussing this with one of the (former) Cogl maintainers (Robert Bragg), and he thought it would be nice to have dmabuf support in Cogl directly, much like Cogl's X11 pixmap support: https://git.gnome.org/browse/cogl/tree/cogl/winsys/cogl-texture-pixmap-x11.h?h=cogl-1.22 If you don't have time to look into this, I'll try to find some time to do it. It might be useful to have dmabuf support there for other people too.
The really nice thing about adding it to Cogl is that we would no longer have to destroy the texture every time to ensure the CoglPipeline gets dirtied and redrawn entirely. I'm very uncertain whether I will have time, but I wanted to share this code to show how.

Note that if we include it as-is, we'll have to fix the DRM format used. I'm using DRM_FORMAT_BGRA8888, but that's a workaround for a libMALI bug. 32-bit DRM formats are all expressed in the order you get when they are loaded into a 32-bit register, so on little-endian this format should be DRM_FORMAT_ARGB8888. I can fix the patch and add big-endian support while at it.

Additionally, we could add NV12 support. This is usually done with two EGLImages, one using DRM_FORMAT_R8 and another using DRM_FORMAT_GR88, after which our NV12 shader can be used (see the sketch below). The same applies for I420 and other YUV formats. It's known to work nicely on Intel (tested with V4L2 cameras and the vivid simulator). And when a specific GL stack does not support a format, it simply returns EGL_BAD_MATCH and we can fall back to slower Texture2D uploads. That's the rationale for adding an upload_dmabuf() method to the renderer: multiple formats can be uploaded using dmabuf, and the shaders get to be reused.

(In reply to Lionel Landwerlin from comment #4)
> I wonder how gst-vaapi is going to react though :)

I'm not sure what the state of dmabuf support in the vaapi elements is. I know some support exists, but I doubt it is automatic yet. I believe VAAPI will keep providing buffers with the GL upload meta. The only side effect is that it will try to do a Texture2D upload as fallback if the dmabuf path didn't work, and that will fail in video_frame_map() (VASurfaces are not mappable, IIRC). Also, VAAPI works with both the GLX and EGL backends, while DMABuf import is limited to the EGL backend.
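To illustrate the NV12 case mentioned above: the two planes of the same dmabuf each become their own EGLImage. Same extension assumptions as the sketch in comment 1; the offsets assume a tightly packed buffer, which is illustrative:

/* Y plane: full resolution, one byte per pixel, imported as R8 */
EGLint y_attribs[] = {
  EGL_WIDTH, width,
  EGL_HEIGHT, height,
  EGL_LINUX_DRM_FOURCC_EXT, DRM_FORMAT_R8,
  EGL_DMA_BUF_PLANE0_FD_EXT, fd,
  EGL_DMA_BUF_PLANE0_OFFSET_EXT, 0,
  EGL_DMA_BUF_PLANE0_PITCH_EXT, stride,
  EGL_NONE
};

/* UV plane: half resolution, two bytes per pixel, imported as GR88,
 * starting right after the Y plane */
EGLint uv_attribs[] = {
  EGL_WIDTH, width / 2,
  EGL_HEIGHT, height / 2,
  EGL_LINUX_DRM_FOURCC_EXT, DRM_FORMAT_GR88,
  EGL_DMA_BUF_PLANE0_FD_EXT, fd,
  EGL_DMA_BUF_PLANE0_OFFSET_EXT, stride * height,
  EGL_DMA_BUF_PLANE0_PITCH_EXT, stride,
  EGL_NONE
};

/* Each image is bound to a plain GL_TEXTURE_2D and the pair is fed to
 * the existing NV12 shader; no EXTERNAL_OES involved. */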
(In reply to Lionel Landwerlin from comment #4)
> Review of attachment 316976 [details] [review]:
>
> Looks good.
> I wonder how gst-vaapi is going to react though :)

My answer was out of context. I'm starting to doubt whether that patch should have been submitted at all; 6 is quite a random value here. Let's hold on to this one until I get more information. I have a decoder that does the colorspace conversion to BGR32. Normally, decoders don't do that, and they expose the number of buffers they need to stay active. So maybe that patch is just a workaround for that specific decoder.
Created attachment 318581 [details] [review]
video-sink: Add DMA-Buf support

This is a BGRx-only implementation of DMABuf import.

Based on a previous patch by:
Nicolas Dufresne <nicolas.dufresne@collabora.com>
Here is an update with most of the GL stuff moved into Cogl. Tested using this patch: https://bugzilla.gnome.org/show_bug.cgi?id=732281 with the following pipeline:

COGL_DRIVER=gles2 gst-launch-1.0 videotestsrc ! inteldmabufupload ! clutterautovideosink

Let me know if that works for you too. I had to make additional changes to Cogl so the shader would use a samplerExternalOES type rather than sampler2D. This might be Mesa being a bit more picky about the GL spec.
Created attachment 318582 [details] [review]
video-sink: Add DMA-Buf support

This is a BGRx-only implementation of DMABuf import.

Based on a previous patch by:
Nicolas Dufresne <nicolas.dufresne@collabora.com>
Review of attachment 318581 [details] [review]:

I haven't tested it yet, but except for the caps feature, which is still under heavy design and in fact not really needed, this patch looks good.

Note that in general we try to avoid the EXTERNAL_OES issue by only using a subset of formats and then using shaders for color conversion. The reason I didn't need a special sampler is that on my target I was using BGRA, which was natively supported (unlike all the Mesa drivers, which seem to only support RGBA natively). The trick is to use a small fixed set of formats to do everything: RGB565, RGBA8888, RG88 and R8. On Intel, those don't require color conversion and can be imported as normal 2D textures with a normal 2D sampler. With those formats you can implement all the supported conversions by reusing the shaders you already have (see the sketch below). In my spare time, I'm also working on Gallium with the goal of gaining the same features.

As a reference, you can find some code in the glupload element. In general, it is very complicated to implement multi-plane formats and conversion in the GL stack, so I believe handling this here makes a lot of sense. It also ensures that the colors are the same regardless of the driver being used, and reduces the GL subset that we need.

::: clutter-gst/clutter-gst-video-sink.c
@@ +129,3 @@
+#ifdef COGL_HAS_EGL_DMABUF_SUPPORT
+  ";"
+  MAKE_CAPS ("memory:" GST_ALLOCATOR_DMABUF, "{ RGBA }")

This caps feature has not yet been accepted upstream. It is specific to the unmerged Intel uploader.
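To make the shader-reuse point concrete, this is the sort of conversion that applies unchanged once the planes are imported as plain R8/RG88 2D textures. A sketch with BT.601 coefficients and made-up uniform names:

static const char *nv12_to_rgb_shader =
  "uniform sampler2D y_tex;\n"    /* R8: Y plane */
  "uniform sampler2D uv_tex;\n"   /* RG88: interleaved UV plane */
  "varying vec2 coord;\n"
  "void main () {\n"
  "  float y = 1.164 * (texture2D (y_tex, coord).r - 0.0625);\n"
  "  vec2 uv = texture2D (uv_tex, coord).rg - vec2 (0.5);\n"
  "  gl_FragColor = vec4 (y + 1.596 * uv.y,\n"
  "                       y - 0.391 * uv.x - 0.813 * uv.y,\n"
  "                       y + 2.018 * uv.x,\n"
  "                       1.0);\n"
  "}\n";

The same fragment shader works whether the textures were filled by glTexImage2D uploads or by dmabuf import, which is what keeps the Texture2D fallback path cheap.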
Ok, thanks for the review. Do you think the dmabuf negotiation stuff will be finalized for GStreamer 1.8?

I had to use EXTERNAL_OES because of a limitation of the Intel driver: http://cgit.freedesktop.org/mesa/mesa/tree/src/mesa/drivers/dri/i965/intel_tex_image.c#n318 Apparently that's purely a software issue.

This implementation also mirrors what ChromeOS is doing for its video decoding integration: https://code.google.com/p/chromium/codesearch#chromium/src/content/common/gpu/media/rendering_helper.cc&sq=package:chromium&l=424 This test reflects how the gpu-process works and has been running on ARM and Intel devices. But you might be dealing with yet another GL driver that doesn't support that :/
(In reply to Lionel Landwerlin from comment #12)
> Ok, thanks for the review.
> Do you think the dmabuf negotiation stuff will be finalized for GStreamer
> 1.8?
>
> I had to use EXTERNAL_OES because of a limitation of the Intel driver:
> http://cgit.freedesktop.org/mesa/mesa/tree/src/mesa/drivers/dri/i965/
> intel_tex_image.c#n318
> Apparently that's purely a software issue.

Well, of course, if you can create an external OES texture, you can also create a normal texture with that driver. I think the cogl_texture_dmabuf_egl_new() API is limiting here; it should let you choose the texture target. We implemented and tested 21 formats for importing DMABuf on Intel drivers in libgstgl with the technique I described, and no EXTERNAL_OES.

> This implementation also mirrors what ChromeOS is doing for its video
> decoding integration:
> https://code.google.com/p/chromium/codesearch#chromium/src/content/common/
> gpu/media/rendering_helper.cc&sq=package:chromium&l=424
> This test reflects how the gpu-process works and has been running on ARM and
> Intel devices. But you might be dealing with yet another GL driver that
> doesn't support that :/

It does not reflect anything. You need to understand why and when EXTERNAL_OES is needed. In Chromium, we use EXTERNAL_OES because we assume the GL stack is doing the color conversion for us. So we create the EGLImage with DRM_FORMAT_NV12 or similar, and expect the resulting texture to be RGBA. Whereas if you use TEXTURE_2D, you will in fact create two textures, one with DRM_FORMAT_R8 (the Y plane) and one with DRM_FORMAT_GR88 (the UV plane), and use your usual shader to do the color conversion. I first learned this method from XBMC; it was designed specifically for the Intel driver, which does not do any color conversion except swizzling. The fact is that using a shader for the color conversion is, in most GL stacks, the same thing as using an EXTERNAL_OES sampler.

To make it clear: if you want a wide variety of color formats on a wide variety of GL stacks, don't depend on EXTERNAL_OES and the stack doing the color conversion. The code in Chromium only works with the MALI driver, and in fact only with the MALI 6XX series, as previous series used different color formats. Hope this helps. I know the code I produced wasn't aligned with that, but it was dedicated to MALI.
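For contrast with the two-texture path sketched earlier, here is roughly what the EXTERNAL_OES route amounts to: a single EGLImage for the whole NV12 buffer, with the driver expected to do the conversion. Same extension assumptions as the earlier sketches; whether it works at all is driver-dependent:

EGLint attribs[] = {
  EGL_WIDTH, width,
  EGL_HEIGHT, height,
  EGL_LINUX_DRM_FOURCC_EXT, DRM_FORMAT_NV12,
  EGL_DMA_BUF_PLANE0_FD_EXT, fd,
  EGL_DMA_BUF_PLANE0_OFFSET_EXT, 0,
  EGL_DMA_BUF_PLANE0_PITCH_EXT, y_stride,
  EGL_DMA_BUF_PLANE1_FD_EXT, fd,
  EGL_DMA_BUF_PLANE1_OFFSET_EXT, uv_offset,
  EGL_DMA_BUF_PLANE1_PITCH_EXT, uv_stride,
  EGL_NONE
};

EGLImageKHR image = create_image (display, EGL_NO_CONTEXT,
    EGL_LINUX_DMA_BUF_EXT, NULL, attribs);

/* The texture target and the sampler type both change: the shader must
 * declare samplerExternalOES and enable GL_OES_EGL_image_external. */
glBindTexture (GL_TEXTURE_EXTERNAL_OES, tex);
image_target_texture (GL_TEXTURE_EXTERNAL_OES, (GLeglImageOES) image);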
Created attachment 318924 [details] [review]
video-sink: Add DMA-Buf support

This is a BGRx-only implementation of DMABuf import.

Based on a previous patch by:
Nicolas Dufresne <nicolas.dufresne@collabora.com>
Looks like the way forward. Having it in Cogl will also allow implementing the wayland/dmabuf interface in gnome-shell.
Review of attachment 318924 [details] [review]:

One performance note: we'll be creating one EGLImage/texture pair per upload. Ideally those should be cached, so they can be reused when the dmabuf comes back. In gstgl, we attach the EGLImage to the GstDmabufMemory and bind that EGLImage to the texture to be rendered. This improves performance and reduces resource usage. A sketch of the idea follows.
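A sketch of that caching idea, reusing a hypothetical import helper like the one in the first sketch; the quark name and destroy helper are made up. gst_mini_object_set_qdata() ties the EGLImage's lifetime to the GstMemory, so a recycled dmabuf reuses its image instead of creating a new one:

#include <gst/gst.h>
#include <gst/allocators/gstdmabuf.h>
#include <EGL/egl.h>
#include <EGL/eglext.h>

/* Hypothetical helpers, as sketched in earlier comments */
EGLImageKHR import_dmabuf_image (EGLDisplay display, int fd,
    int width, int height, int stride);
void destroy_dmabuf_image (EGLImageKHR image);

static GQuark
egl_image_quark (void)
{
  static GQuark quark = 0;
  if (!quark)
    quark = g_quark_from_static_string ("ClutterGstEGLImage");
  return quark;
}

static EGLImageKHR
ensure_egl_image (GstMemory * mem, EGLDisplay display,
    int width, int height, int stride)
{
  EGLImageKHR image;

  image = gst_mini_object_get_qdata (GST_MINI_OBJECT_CAST (mem),
      egl_image_quark ());
  if (image != NULL)
    return image;   /* the dmabuf came back: reuse its EGLImage */

  image = import_dmabuf_image (display, gst_dmabuf_memory_get_fd (mem),
      width, height, stride);

  /* The image now lives exactly as long as the GstMemory; it is freed
   * when the memory is, possibly from the streaming thread. */
  gst_mini_object_set_qdata (GST_MINI_OBJECT_CAST (mem),
      egl_image_quark (), image, (GDestroyNotify) destroy_dmabuf_image);
  return image;
}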
So, what's the status? Was this merged into Cogl?
I see this has potentially moved beyond the original approach now (great!), but to leave some feedback anyway: the "Propose a minimum amount of buffers" patch breaks the Cheese webcam live view on a laptop webcam I have here:

Failed to configure the buffer pool: gstbasetransform.c(968): gst_base_transform_default_decide_allocation (): /GstCameraBin:camerabin/GstViewfinderBin:vf-bin/GstVideoConvert:vfbin-csp:
Configuration is most likely invalid, please report this issue.

When clutter_gst_video_sink_propose_allocation() is called, priv->info.size is 0, which we then pass to gst_query_add_allocation_pool(), and I imagine the 0 size ends up causing some breakage.

On an external camera connected to the same laptop, it works fine because clutter_gst_video_sink_propose_allocation() is never called. I'm not sure what the difference is, but I did go one step down in the long backtrace and found that on the working webcam, gst_base_sink_default_query() is never called for GST_QUERY_ALLOCATION.
Oh, now I tested the original case (accelerated video playback on ARM) and size is also 0 there.
Created attachment 355028 [details] [review]
video-sink: Propose a minimum amount of buffers

In case it is useful, here is a modification to that patch which seems to work in both use cases. We were passing size 0 on the accelerated video playback path and this was apparently not causing any issues, but the same code path with size 0 executing in the context of Cheese was breaking webcam usage.

I see that clutter_gst_video_sink_parse_caps() should populate priv->info with the size info, but this function is only called after propose_allocation. So I adapted propose_allocation to do what a lot of other GStreamer components do (e.g. gstvideorate): parse the caps right there to get the size. Not sure if it's really correct, but both accelerated video playback and Cheese are now working.
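Roughly what the updated patch does (a sketch, not the attachment verbatim): derive the size from the caps carried by the allocation query instead of priv->info, which has not been populated yet at that point:

#include <gst/video/video.h>

static gboolean
clutter_gst_video_sink_propose_allocation (GstBaseSink * bsink,
    GstQuery * query)
{
  GstCaps *caps;
  GstVideoInfo info;
  gboolean need_pool;

  gst_query_parse_allocation (query, &caps, &need_pool);
  if (caps == NULL || !gst_video_info_from_caps (&info, caps))
    return FALSE;

  /* info.size is now valid even though parse_caps() hasn't run yet */
  gst_query_add_allocation_pool (query, NULL, info.size, 6, 0);
  gst_query_add_allocation_meta (query, GST_VIDEO_META_API_TYPE, NULL);

  return TRUE;
}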
Hi Daniel, it's all right to parse the caps from the allocation query. Notably, with VAAPI, these caps are padded caps, meaning the width/height have been increased so that the resulting size matches the requirements. This has not been generalized yet though. A size of 0 is invalid, so if it worked, it was mostly by accident. Cameras produce different video formats, which leads to different code paths being used.
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org. As part of that, we are mass-closing older open tickets in bugzilla.gnome.org which have not seen updates for a longer time (resources are unfortunately quite limited so not every ticket can get handled). If you can still reproduce the situation described in this ticket in a recent and supported software version, then please follow https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines and create a new enhancement request ticket at https://gitlab.gnome.org/GNOME/clutter-gst/-/issues/ Thank you for your understanding and your help.