Bug 711155 – wayland: add DMABuf support to wayland sink

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 711155 - wayland: add DMABuf support to wayland sink


Summary:	wayland: add DMABuf support to wayland sink


Status:	RESOLVED FIXED

Product:	GStreamer
Classification:	Platform
Component:	gst-plugins-bad
Version:	1.x
Hardware:	Other Linux

Importance:	Normal enhancement
Target Milestone:	1.11.1
Assigned To:	GStreamer Maintainers
QA Contact:	GStreamer Maintainers

URL:	https://git.collabora.com/cgit/user/g...
Whiteboard:

Depends on:	759358 767671
Blocks:

Reported:	2013-10-30 15:09 UTC by Benjamin Gaignard
Modified:	2016-11-03 19:39 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
waylandsin: add wl_drm support (23.70 KB, patch) 2013-10-30 15:09 UTC, Benjamin Gaignard	needs-work	Details \| Review
waylandsink: add wl_drm support (20.86 KB, patch) 2013-11-20 12:42 UTC, Benjamin Gaignard	none	Details \| Review
make waylandsink support wl_dmabuf protocol (19.33 KB, patch) 2013-12-03 18:05 UTC, Benjamin Gaignard	none	Details \| Review
make waylandsink support wl_dmabuf protocol (20.44 KB, patch) 2014-01-07 17:45 UTC, Benjamin Gaignard	none	Details \| Review
make waylandsink support wl_dmabuf protocol (19.10 KB, patch) 2015-09-02 08:42 UTC, Benjamin Gaignard	none	Details \| Review
make waylandsink use zlinux_dmabuf upstreamed protocol (39.81 KB, patch) 2015-09-15 10:34 UTC, Benjamin Gaignard	needs-work	Details \| Review
linux_dmabuf protocol support (36.90 KB, patch) 2015-11-24 09:01 UTC, Fabien Dessenne	none	Details \| Review
linux_dmabuf protocol support v2 (28.03 KB, patch) 2016-08-10 13:52 UTC, Fabien Dessenne	none	Details \| Review
[v2] waylandsink: support linux dmabuf protocol (26.01 KB, patch) 2016-09-21 14:02 UTC, Nicolas Dufresne (ndufresne)	none	Details \| Review
[v3 ]waylandsink: support linux dmabuf protocol (26.14 KB, patch) 2016-09-21 15:12 UTC, Nicolas Dufresne (ndufresne)	none	Details \| Review
[v4] waylandsink: support linux dmabuf protocol (26.56 KB, patch) 2016-09-22 09:15 UTC, Fabien Dessenne	none	Details \| Review
[v5] waylandsink: support linux dmabuf protocol (10.28 KB, patch) 2016-09-26 15:20 UTC, Fabien Dessenne	none	Details \| Review
waylandsink: Allow any kind of FD for shm memory (2.44 KB, patch) 2016-11-02 15:47 UTC, Nicolas Dufresne (ndufresne)	committed	Details \| Review
waylandsink: support linux dmabuf protocol (26.42 KB, patch) 2016-11-02 15:47 UTC, Nicolas Dufresne (ndufresne)	committed	Details \| Review
waylandsink: Rework dmabuf support (11.16 KB, patch) 2016-11-02 15:47 UTC, Nicolas Dufresne (ndufresne)	committed	Details \| Review

Description Benjamin Gaignard 2013-10-30 15:09:41 UTC

Created attachment 258580 [details] [review]
waylandsin: add wl_drm support

Waylandsink is only supporting share memory buffer, this patch could add the support of wayland_drm buffers. With this type of buffer we can dmabuf allocator and so have zero copy between hardware accelerated elements.

This patch has been develop on top of 1.0 and not full match with the latest gst allocator dmabuf update mainly because allocator change from _obtain to _new. It is an "under work" patch to trigger the discussion of how do this feature.

Comment 1 Sebastian Dröge (slomo) 2013-11-11 17:26:31 UTC

Review of attachment 258580 [details] [review]:

::: ext/wayland/gstwaylandsink.c
@@ +78,3 @@
+    {GST_WAYLAND_POOL_DRM,  "use DRM to allocated pool buffers", "DRM"},
+    {GST_WAYLAND_POOL_SHM,  "use shared memory to allocated pool buffers", "SHM"},
+    {GST_WAYLAND_POOL_AUTO, "use DRM or share memory", "AUTO"},

The allocation method should be negotiated with the ALLOCATION query. waylandsink would put a dmabuf allocator in there and a shmem allocator. Depending on what the upstream element configures on the pool then, you would use dmabuf or shm. As both can behave like sysmem (they can be mapped), no additional precautions have to be done.

So you also would only have a single buffer pool class.

@@ +385,3 @@
+  sink->device_name = strdup (name);
+  if (!sink->device_name) {
+    GST_WARNING_OBJECT (sink, "no device name");

Maybe put this above the strdup? :)

::: ext/wayland/waylanddrmpool.c
@@ +32,3 @@
+#include "gst/allocators/gstdmabuf.h"
+#include "waylanddrmpool.h"
+#include <libkms/libkms.h>

Needs configure checks for libkms and also need to put that into the LIBADD in Makefile.am

@@ +47,3 @@
+wayland_drm_pool_mem_unmap (GstMemory * gmem)
+{
+	/* do nothing, except avoid doing munmap */

Why?

@@ +73,3 @@
+  allocator = gst_dmabuf_allocator_obtain ();
+  /* hook to avoid unmap issue on drm buffer */
+  allocator->mem_unmap = wayland_drm_pool_mem_unmap;

This way you break all other dmabuf allocators btw. One of the reasons why it's not a singleton anymore now ;)

Comment 2 Benjamin Gaignard 2013-11-20 12:42:56 UTC

Created attachment 260302 [details] [review]
waylandsink: add wl_drm support

I hope this patch could fix most of the previous remarks.

Please note that wayland drm protocol headers files have to be "extracted" from mesa to make all this work.

Comment 3 Benjamin Gaignard 2013-12-03 18:05:26 UTC

Created attachment 263418 [details] [review]
make waylandsink support wl_dmabuf protocol

This patch allow to use wl_dmabuf protocol in waylandsink.

wl_dmabuf protocol is under review here:
http://lists.freedesktop.org/archives/wayland-devel/2013-December/012390.html

it doesn't make sense to merge this patch until wayland patch hasn't been accepted.

Comment 4 Sebastian Dröge (slomo) 2013-12-26 16:41:48 UTC

Any news here?

Comment 5 Benjamin Gaignard 2013-12-26 17:53:26 UTC

I have send a third version of my patches for wayland:http://lists.freedesktop.org/archives/wayland-devel/2013-December/012566.html

I got some comments on IRC, so I will rework them asap.
I still working on gstwaylandsink to have it inline with the latest version of those patches.

Comment 6 Sebastian Dröge (slomo) 2014-01-02 14:01:34 UTC

This is wl_dmabuf support, not wl_drm, right? wl_drm is only an internal interface between Wayland and EGL AFAIU? How is the dmabuf stuff related to DRM though, are all these DRM function calls required by wl_dmabuf or is this only specific to your implementation?

Comment 7 Benjamin Gaignard 2014-01-02 16:32:34 UTC

You are right it is wl_dmabuf support, wl_drm is provided by Mesa for EGL needs.
Today only DRM is able to export dmabuf file descriptor that is why my patch call DRM functions.
In a near futur I hope we could also use ION has dmabuf file descriptor provider.
ION is already in staging directory here:https://git.kernel.org/cgit/linux/kernel/git/gregkh/staging.git/log/drivers/staging/android/ion?h=staging-next

Comment 8 Sebastian Dröge (slomo) 2014-01-02 16:49:52 UTC

How does one know if it's DRM or ION or $WHATEVER memory? Wouldn't it be better to define a generic API for all these that then is implemented inside wl_dmabuf and maps to whatever API is required?

Comment 9 Benjamin Gaignard 2014-01-07 17:45:52 UTC

Created attachment 265552 [details] [review]
make waylandsink support wl_dmabuf protocol

This patch is aligned with this proposal on wayland/weston side:
http://lists.freedesktop.org/archives/wayland-devel/2014-January/012727.html

Comment 10 Sebastian Dröge (slomo) 2014-01-08 09:25:20 UTC

This still has the problems I mention in comment 8. What are the plans here?

Comment 11 Sebastian Dröge (slomo) 2014-01-14 19:34:45 UTC

Comment on attachment 265552 [details] [review]
make waylandsink support wl_dmabuf protocol

As discussed on IRC the protocol is still not finalised and there are some open issues. Especially around a generic dmabuf allocator, or an allocator API in wl_dmabuf... which would solve all my concerns.

Comment 12 Daniel Stone 2014-01-16 15:41:00 UTC

I think you've misunderstood the role of dmabuf.

The only thing dmabuf does, is allow potentially cross-subsystem mapping of already allocated buffers, and allow userspace to pass a reference to these buffers in the form of file descriptors.  That's it.  The standard example is dequeuing a frame with V4L, asking V4L for a dmabuf fd, then passing that fd to your display system (or GL driver) to display, without any copies along the way.

It doesn't do allocation (we couldn't even work out how to make a generic GPU buffer allocation interface, so GEM doesn't even standardise this*), it doesn't do a mapping, and where it came from is irrelevant.

The allocation and usage is still in the subsystems, as is a subsystem-specific call to extract a dmabuf fd from an existing buffer, or import a dmabuf fd in for use.  So whether it's DRM, ION, V4L, or whatever is irrelevant: all that matters is that you're able to import it.

This will not change: you won't get a generic dmabuf allocator.  Even mmap() to userspace has been pretty controversial.

As for wl_dmabuf as a generic API, yes, there is a proposal, but so far it's not merged into Weston core.  And, thanks to dmabuf not having an allocator, that moves us even _further_ away from having the sink be able to allocate buffers, since that would have to be done with something like, say, wl_drm.

http://lists.freedesktop.org/archives/wayland-devel/2013-December/012455.html

*: Except that scanout (i.e. display) buffers which are only ever rendered into with software.  These 'dumb' buffers are a special case to support boot screens or unaccelerated display systems with generic code.

Comment 13 Sebastian Dröge (slomo) 2014-01-16 16:00:43 UTC

Thanks for describing the issue in a larger context, that's basically my concerns about all this and why I think the unconditional usage of DRM in waylandsink (or even conditional usage of DRM) is not the way to go.

If anything there should be something in wl_dmabuf that allows us to allocate dmabufs, and which in turn asks the driver to give us something sensible.


FWIW, without the allocation bits and just accepting dmabufs that come from somewhere this looks acceptable from the GStreamer point of view. It does not seem very useful on its own though.

Comment 14 Benjamin Gaignard 2014-01-16 16:45:25 UTC

wl_drm will not fit with the needs of those who don't have GPU or don't have a Mesa implementation for it. 

I think that something like wl_dmabuf could be useful:
http://lists.freedesktop.org/archives/wayland-devel/2014-January/012816.html

Anyway I (and Linaro Graphic Working Group) take care of your comments and agree that wl_dmabuf could be improve.

Dmabuf will change too and maybe a generic allocator will show up so I hope I could come back later with better version of wl_dmabuf.

Comment 15 Nicolas Dufresne (ndufresne) 2015-08-27 21:27:47 UTC

The dmabuf protocol is now merged in wayland, which re-open this subject. Note that dmabuf currently has few limitation we need to take into account. The main one in my opinion:

- No listing of supported color formats (because it's hidden inside the EGL stack)
- Required video alignment and allocation type (CMA, SG, VM) is unknown

That simply means there is no guaranty that decoder driver produced buffer will work unless you have an "harmonized" allocation for your board integration. This is just a point in time fact, this could eventually change.

In the end, wayland currently offer 3 mainline protocol for creating wl_buffer. In the shm case, we have an dedicated allocation which works mostly like any normal allocator. We know the list of formats and we have the option to copy non-shm memory into shm. Most compositor only offset RGB based formats. Any other format would require video conversion before this sink. This is similar to ximagesink in some ways.

The second is the EGL protocol. This one is defined by Mesa, and requires the wayland client to do the upload (so the compositor don't get overloaded). This can be achieve using libgstgl, but the recommended way is to wrap such sink into glsinkbin. This is vaguely the same thing as glimagesink over wayland. I don't know myself what could be optimized by the compositor itself.

The third is dmabuf protocol. It's a black box, but it's know that an unsupported dmabuf will fail in the creation of the wl_buffer if anything is not compatible. This element require no transformation but present a uncertainty on let's say generic systems (not to say desktop system). While it will just work on a fixed system where all driver have been tuned to work together or where ION has been tuned and is used for allocation.

Assuming we target all these possibilities, it becomes clear that the negotiation for each case is largely different. Building a single element for all those use cases in one will create complexity and might require pulling libvideoconvert and libgstgl. It also removes some flexibility as I believe there is use cases where you want DMABUF or nothing and want to tune the system to make it work this way.

For this reason, I'd like to propose splitting up this in a set of seperate elements (with a base class of course). We'd have eventually 3 sinks:

waylandshmsink
waylandeglsink
waylanddmabufsink

In addition to that, to cover the generic case, where we want to try everything to make it work regardless the price (like on desktop), we would have waylandsinkbin that will know the specificity of each internal sink. And will allow doing fallback when dmabuf and egl failed. The exposed list of color format would be the list of format supported by videoconvert. Which I believe is a fair limitation on generic/desktop platforms.

This design could decrease the complexity of each sink, making it very efficient to use in embedded/controlled use cases, which to be fair will likely be the biggest users of these element. On desktop, I do believe most use case will be satisfied by gl base sink, specially the ones implementing easy to use widgets like gtkglsink and qmlglsink.

Comment 16 Sebastian Dröge (slomo) 2015-08-28 06:49:08 UTC

Shouldn't waylandeglsink be the same as glimagesink (with libgstgl compiled with Wayland support and using Wayland/EGL at runtime)? Why yet another sink?


Otherwise sounds like a good approach

Comment 17 Benjamin Gaignard 2015-08-28 08:24:06 UTC

shm, egl or dmabuf are allocation methods so I believe we have to add them as allocator in wayland pool. In this way it will be up to the upstream element to select the wanted allocator and not decide this while creating the pipeline by putting different elements for each allocation method.

Until now wayland dmabuf protocol doesn't propose to allocate the memory so what we have done is to use libdrm to allocate dumb buffers and get a file descriptor with prime interface. I think it is fair enough since most (even all) systems using wayland are drm based.

Comment 18 George Kiagiadakis 2015-08-28 11:46:35 UTC

I have the same question as Sebastian on the gl sink. I don't think we need yet another sink for this, since glimagesink already works on wayland and the concept is the same (client upload).

Regarding the sink split, I like the design approach from a use case point of view, but I'm not sure if it's less complex to implement 2 sinks and a bin (plus a sink base class to share code) instead of just one sink. Currently, dmabuf support in waylandsink is a relatively small patch. It remains to be seen how complex the format negotiation is going to be...

Comment 19 Nicolas Dufresne (ndufresne) 2015-08-28 12:31:01 UTC

(In reply to Sebastian Dröge (slomo) from comment #16)
> Shouldn't waylandeglsink be the same as glimagesink (with libgstgl compiled
> with Wayland support and using Wayland/EGL at runtime)? Why yet another sink?

The glimagesink method is slightly less efficient, because we will do a convertion to another texture, and then render to our GL context, which will internally submit to the compositor.

With the waylandeglsink, you bind the unconverted (at least weston support few YUV formats) to an Wayland EGLImage and pass that do the compositor, saving an extra step.

Comment 20 Nicolas Dufresne (ndufresne) 2015-08-28 12:32:47 UTC

(In reply to Benjamin Gaignard from comment #17)
> shm, egl or dmabuf are allocation methods so I believe we have to add them
> as allocator in wayland pool. In this way it will be up to the upstream
> element to select the wanted allocator and not decide this while creating
> the pipeline by putting different elements for each allocation method.

No, because only shm can be allocated, the rest comes from upstream.

> 
> Until now wayland dmabuf protocol doesn't propose to allocate the memory so
> what we have done is to use libdrm to allocate dumb buffers and get a file
> descriptor with prime interface. I think it is fair enough since most (even
> all) systems using wayland are drm based.

Which is Intel specific, and most likely not mappable on any other driver. I'm sorry, but adding DRM code in there was and is still a bad idea.

Comment 21 Nicolas Dufresne (ndufresne) 2015-08-28 12:34:39 UTC

(In reply to George Kiagiadakis from comment #18)
> Regarding the sink split, I like the design approach from a use case point
> of view, but I'm not sure if it's less complex to implement 2 sinks and a
> bin (plus a sink base class to share code) instead of just one sink.
> Currently, dmabuf support in waylandsink is a relatively small patch. It
> remains to be seen how complex the format negotiation is going to be...

What's nearly impossible to get right is the resulting combined caps negotiation and allocation decision. The base class should also do pretty much all the job. The instances only need to ceate wl_buffer.

Comment 22 Sebastian Dröge (slomo) 2015-08-28 12:39:11 UTC

(In reply to Nicolas Dufresne (stormer) from comment #19)
> (In reply to Sebastian Dröge (slomo) from comment #16)
> > Shouldn't waylandeglsink be the same as glimagesink (with libgstgl compiled
> > with Wayland support and using Wayland/EGL at runtime)? Why yet another sink?
> 
> The glimagesink method is slightly less efficient, because we will do a
> convertion to another texture, and then render to our GL context, which will
> internally submit to the compositor.

Why don't we optimize this for this specific case on Wayland then?

Comment 23 Daniel Stone 2015-08-28 13:17:05 UTC

(In reply to Sebastian Dröge (slomo) from comment #22)
> (In reply to Nicolas Dufresne (stormer) from comment #19)
> > (In reply to Sebastian Dröge (slomo) from comment #16)
> > > Shouldn't waylandeglsink be the same as glimagesink (with libgstgl compiled
> > > with Wayland support and using Wayland/EGL at runtime)? Why yet another sink?
> > 
> > The glimagesink method is slightly less efficient, because we will do a
> > convertion to another texture, and then render to our GL context, which will
> > internally submit to the compositor.
> 
> Why don't we optimize this for this specific case on Wayland then?

Depends. Mesa does support the EGL_WL_create_wayland_buffer_from_image extension, but this is not widely supported on other implementations. Any implementation supporting this could skip the blit through GL (i.e. calling glDrawArrays sourcing from the provided/imported EGLImage, and rendering to a Wayland-backed EGLSurface) in the case where it could do a native import to EGLImage, i.e. dmabuf.

I don't know how useful it is to optimise for this case though: if you have a dmabuf, you can skip a lot of effort by just passing it directly through Wayland's dmabuf protocol. Are there many cases where we get an EGLImage but it doesn't come from dmabuf? I guess OpenMAX, but I've yet to see systems in the wild which support both OMX and Wayland.

Also, as a quick aside - the reason most compositors don't advertise support for YUV formats on SHM is two-fold. Firstly (but very easily fixed) is that we lack a multi-planar SHM buffer creation protocol. Secondly, OpenGL ES 2.x lacks native support for one- and two-component texture formats, which we'd need to efficiently convert in the compositor. For that, you need ES3, or GL_EXT_texture_rg.

Another quick aside - if you have data in shared memory, we'd massively prefer for that to go through EGL, as it places the TexImage2D overhead in the client (easy to account for when profiling/etc, lets others run in the meantime) to the compositor (blocks the entire session).

Comment 24 Daniel Stone 2015-08-28 13:18:55 UTC

(In reply to Nicolas Dufresne (stormer) from comment #20)
> (In reply to Benjamin Gaignard from comment #17)
> > Until now wayland dmabuf protocol doesn't propose to allocate the memory so
> > what we have done is to use libdrm to allocate dumb buffers and get a file
> > descriptor with prime interface. I think it is fair enough since most (even
> > all) systems using wayland are drm based.
> 
> Which is Intel specific, and most likely not mappable on any other driver.
> I'm sorry, but adding DRM code in there was and is still a bad idea.

Dumb buffers are a generic/independent API that work on all drivers, and they _guarantee_ that you can mmap() them.

However, you're not supposed to use them for hardware-accelerated rendering, which you could argue media decode engines is. So passing them to V4L2 will almost certainly work, but is quite iffy from an interface purists' point of view. Dave Airlie will probably get pretty upset about that at some stage.

[So, back to the generic allocator then ...]

Comment 25 Daniel Stone 2015-08-28 13:21:07 UTC

(In reply to Nicolas Dufresne (stormer) from comment #15)
> The second is the EGL protocol. This one is defined by Mesa, and requires
> the wayland client to do the upload (so the compositor don't get
> overloaded). This can be achieve using libgstgl, but the recommended way is
> to wrap such sink into glsinkbin. This is vaguely the same thing as
> glimagesink over wayland. I don't know myself what could be optimized by the
> compositor itself.

Another quick 'well actually': Mesa doesn't define it, but the EGL/Wayland interface is defined by the combination of EGL_KHR_platform_wayland and wayland-egl.h. All these really define is a way to get an EGLDisplay and an EGLSurface from a wl_display and a wl_surface, respectively.

Any additional trickery is not actually standardised. The way Mesa communicates with the Wayland compositor is completely private and not supposed to be used by anyone external, ever. That VA-API does so is really nasty, and that code should all be ripped out and replaced with dmabuf.

Comment 26 Nicolas Dufresne (ndufresne) 2015-08-28 14:55:40 UTC

I think in the end, it's fair to say waylandeglsink is a project for possible future micro optimization, I didn't mean to mention that in the sense that we MUST add support for that. Instead I wanted this proposed design to be complete complete. Currently glimagesink and friends will most often do the job.

I think what I care the most is having the ability to say "I want DMABUF, or nothing". And because you cannot fully negotiate dmabuf, having both SHM and DMABUF together may fail in the opposite use case because the color format of your dmabuf is not one of the supported shm for at.

To make it clear, if you have a DMABUF with a color format that is known for the list of DRM valid formats, you have no choice but to assume it will work. But if it's fails, the way to fallback with a single sink would require duplicating videoconvert inside the single element. A wrapper bin can do that with less code, resuing videoconvert and enabling the "DMABUF of fail" use case.

As it's clear that uploading on client side should happen on client side, again with a single sink we'd have to duplicate what glupload and glvideoconvert do.

Comment 27 George Kiagiadakis 2015-08-28 15:42:14 UTC

As I understand it, after Daniel's comments here and on irc, this wayland-EGL method is something private between mesa and the compositor *only*. So, we don't have to consider this case, but this doesn't change anything in the design actually.

The point that I see and needs discussion, though, is dmabuf allocation. There are 2 potential ways to do this:

1) Use a hardware decoder or camera that gives us dmabuf (in gstreamer it would mean bufers are allocated and pushed from v4l2dec/src)
2) Use "generic dumb buffers", which are allocated from DRM and according to the graphics people, it's the client (gstreamer) that should allocate them. Supposing we do this in waylanddmabufsink, then this sink would propose a pool that allocates buffers using libdrm basically.

The problem is, you cannot mix dumb buffers with hw decoder/camera buffers, and they are not the same in terms of acceleration. Dumb buffers are still not an accelerated path, they just avoid the GPU upload that needs to happen in glTexImage2D.

So basically there are 4 paths to consider when the platform is wayland:

1) if there is a hw decoder/camera (v4l2), upstream pushes dmabuf to waylandsink and they are sent as is to the compositor
2) if this is not the case (or fails), allocate dumb dmabufs from the sink and let upstream render into them (which is faster than using glupload / glTexImage2D)
3) if dmabuf doesn't work, use glupload / glimagesink
4) if all else fails, use wl_shm (what the existing waylandsink does), which is the worst method (equivalent to ximagesink)

Comment 28 Nicolas Dufresne (ndufresne) 2015-08-28 15:58:22 UTC

(In reply to George Kiagiadakis from comment #27)
> The problem is, you cannot mix dumb buffers with hw decoder/camera buffers,
> and they are not the same in terms of acceleration.

There is exception to that. With UVC driver, you can pass dumb buffer (using DMABUF or USERPTR) and it remains the fastest path. It will always work since this driver uses vmalloc and has no alignment requirement. In latest (or future?) this driver will also export dmabuf, which are backed by virtual memory, hence might not be usable if you graphic driver requires CMA or SG.

Comment 29 Benjamin Gaignard 2015-08-28 17:09:18 UTC

@Daniel: I have propose a generic allocator framework (which include buffer securing) but I don't get lot of enthusiasm for it...
https://lkml.org/lkml/2015/7/10/357
If we can upstream that I think that will help us lot.

I don't think that having a specific sink for dmabuf protocol could work because dmabuf usage is negotiated in the pool not in the caps so you won't be able to select the good sink.
Else I don't understand the need to introduced a bin with videoconvert, wl_dmabuf will not expose unsupported formats and I don't believe that drivers will support a format in shm and not in dmabuf (or reverse).

Comment 30 Daniel Stone 2015-08-28 17:12:14 UTC

(In reply to Benjamin Gaignard from comment #29)
> @Daniel: I have propose a generic allocator framework (which include buffer
> securing) but I don't get lot of enthusiasm for it...
> https://lkml.org/lkml/2015/7/10/357
> If we can upstream that I think that will help us lot.

Indeed. It might be interesting to sort through the notes from the ION discussion at Plumbers, mind.

> I don't think that having a specific sink for dmabuf protocol could work
> because dmabuf usage is negotiated in the pool not in the caps so you won't
> be able to select the good sink.

From what I understand, Nicolas's intention was to allow people to demand specific sinks where fallback was not an option: e.g. if you are an STB and you don't have working dmabuf, it's better to fail hard than to limp along at 0.2fps.

Comment 31 Nicolas Dufresne (ndufresne) 2015-08-28 19:37:38 UTC

(In reply to Benjamin Gaignard from comment #29)
> I don't think that having a specific sink for dmabuf protocol could work
> because dmabuf usage is negotiated in the pool not in the caps so you won't
> be able to select the good sink.
> Else I don't understand the need to introduced a bin with videoconvert,
> wl_dmabuf will not expose unsupported formats and I don't believe that
> drivers will support a format in shm and not in dmabuf (or reverse).

I have been asked to explicitly negotiated dmabuf memory type through caps features, see:

https://bugzilla.gnome.org/show_bug.cgi?id=745459

When that is fixed, V4L2 element that can export or import DMABUF will be able to consider downstream capabilities and configure the queue accordingly.

Caps features are often used to accept different sets of color formats depending of the memory type. At the moment, there is cases I've seen where the list of formats do vary. Though, right now it will only vary because there is no method to retrieve this list (yet).

My biggest need for the split is what Daniel said. I do start to agree that considering the dumb/drm allocator would make that specialised sink more useful, potentially allowing a SW decoder to be used and upstream importation assuming we can fulfill any kind of alignment. Though, on platform where SW decoder can be considered, it is likely that glimagesink (and widget specific friends) will perform better. We need not to forget the waylanddmabufsink is not the only upcoming way to render DMABUF with acceleration. It's a way that allow the compositor to make smarter decisions (e.g. choosing between GL and HW layer is an example).

https://bugzilla.gnome.org/show_bug.cgi?id=743345

Comment 32 Benjamin Gaignard 2015-08-30 16:36:32 UTC

(In reply to Nicolas Dufresne (stormer) from comment #31)
> (In reply to Benjamin Gaignard from comment #29)
> > I don't think that having a specific sink for dmabuf protocol could work
> > because dmabuf usage is negotiated in the pool not in the caps so you won't
> > be able to select the good sink.
> > Else I don't understand the need to introduced a bin with videoconvert,
> > wl_dmabuf will not expose unsupported formats and I don't believe that
> > drivers will support a format in shm and not in dmabuf (or reverse).
> 
> I have been asked to explicitly negotiated dmabuf memory type through caps
> features, see:
> 
> https://bugzilla.gnome.org/show_bug.cgi?id=745459
> 
> When that is fixed, V4L2 element that can export or import DMABUF will be
> able to consider downstream capabilities and configure the queue accordingly.

I have answer in this thread why I believe that putting dmabuf in cap is a bad idea.
The way of how use dmabuf allocator has been discuss while introduce it mainline: https://bugzilla.gnome.org/show_bug.cgi?id=703659

> 
> Caps features are often used to accept different sets of color formats
> depending of the memory type. At the moment, there is cases I've seen where
> the list of formats do vary. Though, right now it will only vary because
> there is no method to retrieve this list (yet).

If you look at v4l2 or drm frameworks the list of formats should not vary depending of memory format. In the both frameworks it is only a flag (VB2_DMABUF or DRIVER_PRIME) that is set to indicate dmabuf usage but without any impact on supported formats.

> 
> My biggest need for the split is what Daniel said. I do start to agree that
> considering the dumb/drm allocator would make that specialised sink more
> useful, potentially allowing a SW decoder to be used and upstream
> importation assuming we can fulfill any kind of alignment. Though, on
> platform where SW decoder can be considered, it is likely that glimagesink
> (and widget specific friends) will perform better. We need not to forget the
> waylanddmabufsink is not the only upcoming way to render DMABUF with
> acceleration. It's a way that allow the compositor to make smarter decisions
> (e.g. choosing between GL and HW layer is an example).
> 
> https://bugzilla.gnome.org/show_bug.cgi?id=743345
If the shm allocator remain the first in pool's allocator list it will be selected by default so SW decoder will continue to work as usual. If you want that your SW decoder use dmabuf you will modify it to select dmabuf allocation at pool negotiation time and take care of alignment.
It works very well since a while for us like this.

Selecting the better sink (wayland or gl) for your platform is not link dmabuf but to the compositor you want to use.

Comment 33 Nicolas Dufresne (ndufresne) 2015-08-31 00:37:43 UTC

Exception that the dmabuf allocator does not allocate, hence cannot be put in the allocator array. If it get used accidentally by an element that does not know how it works, it will completely fail the pipeline. Passing a dmabuf allocator in there seemed like a good idea even to me at first, but is in fact a bad idea as it makes everything more complex when come time to deal with the allocator array in generic code.

That being said, the format list *DO* vary. It's very common with MALI, where the EGL path uses shaders internally, hence can support more formats then your HW mixer do (which arguably makes everything more complex). The same DMABUF on these target could be imported in GL space, but not directly renderer to the display.

Comment 34 Sebastian Dröge (slomo) 2015-08-31 07:39:28 UTC

(In reply to Nicolas Dufresne (stormer) from comment #33)
> Exception that the dmabuf allocator does not allocate, hence cannot be put
> in the allocator array. If it get used accidentally by an element that does
> not know how it works, it will completely fail the pipeline. Passing a
> dmabuf allocator in there seemed like a good idea even to me at first, but
> is in fact a bad idea as it makes everything more complex when come time to
> deal with the allocator array in generic code.

It's not different to some other allocators, like the GL one. There's nothing wrong with putting an allocator into the ALLOCATION query that requires special API to be used.

Comment 35 George Kiagiadakis 2015-08-31 08:50:30 UTC

I will agree with Nicolas on this one. We should not advertise the dmabuf allocator, since it will not work with elements that don't know how to use it.

However, what we *can* advertise, is a subclass of the GstDmabufAllocator that implements the alloc() vfunc in a certain way. For example, assuming we want to use those "dumb buffers" to let the software decoder render into them, we could implement a GstDmabufAllocator subclass that allocates those. This can and should be advertised *without* the memory:dmabuf caps feature. Afterwards, if the sink detects that the buffers come from this allocator, it can import them as dmabuf, otherwise it will fall back to some other method (perhaps copy the contents into such buffers)

But in the v4l2dec ! somedmabufcapablesink case, the caps should have the memory:dmabuf feature if dmabuf is in use, because:

1) there is special allocation and handling involved that the elements must know about (it's a requirement, not optional!)
2) the buffers should not be mmap()-ed and processed, as this is going to slow down the hardware pipeline (and it is not guaranteed to work anyway). So by using the special caps features we ensure that no generic software transform element is going to sit in the middle and ruin everything.



Btw, I have started working towards this split as an experiment:
https://git.collabora.com/cgit/user/gkiagia/gst-plugins-bad.git/log/?h=waylandsink-1.7

My aim is to include dmabuf support in 1.7 if we are all happy with all the details.

Comment 36 Benjamin Gaignard 2015-08-31 09:09:09 UTC

(In reply to George Kiagiadakis from comment #35)
> I will agree with Nicolas on this one. We should not advertise the dmabuf
> allocator, since it will not work with elements that don't know how to use
> it.
>

This is false, elements doesn't need to know the allocator type to get access to the buffer, a call to gst_buffer_map() is working well.
For example a software element like videoconvert is able to use a dmabuf buffer without knowing that it is a dmabuf memory type. 
If you introduce dmabuf in caps you will split gstreamer elements in two categories: those who can use dmabuf and the others.

> However, what we *can* advertise, is a subclass of the GstDmabufAllocator
> that implements the alloc() vfunc in a certain way. For example, assuming we
> want to use those "dumb buffers" to let the software decoder render into
> them, we could implement a GstDmabufAllocator subclass that allocates those.
> This can and should be advertised *without* the memory:dmabuf caps feature.
> Afterwards, if the sink detects that the buffers come from this allocator,
> it can import them as dmabuf, otherwise it will fall back to some other
> method (perhaps copy the contents into such buffers)
> 
> But in the v4l2dec ! somedmabufcapablesink case, the caps should have the
> memory:dmabuf feature if dmabuf is in use, because:
> 
> 1) there is special allocation and handling involved that the elements must
> know about (it's a requirement, not optional!)
> 2) the buffers should not be mmap()-ed and processed, as this is going to
> slow down the hardware pipeline (and it is not guaranteed to work anyway).
> So by using the special caps features we ensure that no generic software
> transform element is going to sit in the middle and ruin everything.

That is exactly the slip between elements that I want to avoid, we have to be able to mix SW and HW elements either we loose all gstreamer flexibility.

> 
> 
> 
> Btw, I have started working towards this split as an experiment:
> https://git.collabora.com/cgit/user/gkiagia/gst-plugins-bad.git/log/
> ?h=waylandsink-1.7
> 
> My aim is to include dmabuf support in 1.7 if we are all happy with all the
> details.

Comment 37 Nicolas Dufresne (ndufresne) 2015-08-31 12:56:59 UTC

(In reply to Sebastian Dröge (slomo) from comment #34)
> It's not different to some other allocators, like the GL one. There's
> nothing wrong with putting an allocator into the ALLOCATION query that
> requires special API to be used.

To be fair, this part of GStreamer lacks a design. Basically, there is no documented rule to define which allocator can be use with which pool. So if you endup mixing videopool with the dmabuf allocator, _set_active() will fails and most element will post an error on the bus.

Comment 38 Nicolas Dufresne (ndufresne) 2015-08-31 12:59:18 UTC

@george indeed, a subclass that implements _alloc() make sense.

Comment 39 Nicolas Dufresne (ndufresne) 2015-08-31 13:20:03 UTC

(In reply to Benjamin Gaignard from comment #36)
> This is false, elements doesn't need to know the allocator type to get
> access to the buffer, a call to gst_buffer_map() is working well.
> For example a software element like videoconvert is able to use a dmabuf
> buffer without knowing that it is a dmabuf memory type. 
> If you introduce dmabuf in caps you will split gstreamer elements in two
> categories: those who can use dmabuf and the others.

For existing V4L2 driver I agree. On the graphic side, I have been told that there is low interest in adding mmap support to their DMABUF implementation (dumb buffer are the exception) and the memory itself may be on slow-path for read operations. Also, I believe if you add this idea of protected memory to V4L2, you will no longer be able to map the DMABUF no ?

Obviously, if dumb buffers are opt-in, it should work just fine in with memory:SystemMemory. The idea of memory:dmabuf is to ensure that we can negotiate a pipeline where the memory will always be DMABUF, preventing element that would cause a transition to memory:SystemMemory from being used. We basically want a way to have no surprises, hence no compromise.

> That is exactly the slip between elements that I want to avoid, we have to
> be able to mix SW and HW elements either we loose all gstreamer flexibility.

For that, I believe it's good habit to opt-in DMABUF by looking at incoming memory type rather then caps (and implement fallbacks) if possible. That's what the suggested wayland sink wrapper bin would implement (wrapper which is *suggested*, not mandatory for merging dmabuf work). When the fallback back is trivial, it would make sense to avoid this split, but at least right now, the fallback is quite complicated in Wayland protocol. Note that for most decoder (and that's a bug in GST) the output buffers of the decoder need to be read-only as otherwise you endup messing the decoder observation. That means, if you do something in SW you need to copy, and you may not have DMABUF in the end.

Comment 40 Benjamin Gaignard 2015-09-02 08:42:39 UTC

Created attachment 310472 [details] [review]
make waylandsink support wl_dmabuf protocol

I attach an updated version of the patch.

Even it is not using the upstreamed wayland dmabuf protocol it show how we can use dmabuf in pool allocator to perform zero-copy.
With this patch we have been able to use SW and HW (v4l2) decoders, transform, and encoder in STB context.
We have done playbacks of local and live streams, transcoding and video mixing (glmixer + Mali) by using this way of dealing with dmabuf in pool.
We also have use SW elements (for video mixing and color enhancement) between HW elements using dmabuf without issues.

The next step is to do the same with the upstreamed wayland dmabuf protocol.

@slomo, @tim, @daniel: I would like to have your opinion of how we should implement and use dmabuf in gstreamer. Can we continue to negotiate it as a memory type in pool's allocator ?
From what I experimented since more a year this solution works well and allow to mix SW and HW elements.
For me Nicolas's solution will reduce the compatibility between SW and HW elements because add dmabuf in caps will not allow to graph elements freely even if SW could fairly use dmabuf.

Comment 41 Matthew Waters (ystreet00) 2015-09-02 09:26:33 UTC

(In reply to Benjamin Gaignard from comment #40)
> For me Nicolas's solution will reduce the compatibility between SW and HW
> elements because add dmabuf in caps will not allow to graph elements freely
> even if SW could fairly use dmabuf.

It doesn't reduce compatibility and provides a bit more flexibility/choice.

1. An element can still have both sysmem and dmabbuf caps features and negotiation would choose dmabuf over sysmem but sysmem is still there for the software elements.
2. Nothing takes away the possibility of using a dmabuf as if it was system memory.
2a. dmabuf memory can still be provided for sysmem caps features as long is the dmabuf is mappable to sysmem.
3. Another goal of the dmabuf caps feature is to be able to limit and/or choose what an element will produce/consume much like the GLMemory/EGLImage/GLTextureUploadMeta splits in the GStreamer GL stack.
3a. Related: the caps features can be used to say "dmabuf or fail".
4. It was mentioned before about different formats possibly being supported with dmabuf vs sysmem.

Comment 42 Nicolas Dufresne (ndufresne) 2015-09-02 12:36:11 UTC

Review of attachment 310472 [details] [review]:

I don't think the approach is right. This seem to track GstBuffer instead of GstMemory. About the ifdef, I might have miss-read (and be wrong), but I believe it's not because you have a release that support dmabuf that this protocol is implement in the compositor. It also seems to assume that all DMABUF will be backed by DRM Dumb.

::: ext/wayland/gstwaylandsink.c
@@ +586,3 @@
     config = gst_buffer_pool_get_config (pool);
     gst_buffer_pool_config_set_params (config, caps, size, 2, 0);
+    gst_buffer_pool_config_set_allocator (config, NULL, &params);

Why bother passing allocation params if you don't have any alignment requirement ?

::: ext/wayland/waylandpool.c
@@ +38,3 @@
 #include <sys/types.h>
 
+#ifdef HAVE_WAYLAND_DMABUF

I think this should mostly be run-time. It would be sad if we cannot use the element at all on a compositor that does not implement this protocol.

@@ +78,3 @@
+    memset (&destroy_arg, 0, sizeof destroy_arg);
+    drmPrimeFDToHandle (meta->drm_fd, prime_fd, &destroy_arg.handle);
+    drmIoctl (meta->drm_fd, DRM_IOCTL_MODE_DESTROY_DUMB, &destroy_arg);

What if the dmabuf was not DRM dumb ?

::: ext/wayland/wlvideoformat.c
@@ +63,3 @@
+#ifdef HAVE_WAYLAND_DMABUF
+  {WL_DMABUF_FORMAT_XRGB8888, GST_VIDEO_FORMAT_BGRx},
+  {WL_DMABUF_FORMAT_ARGB8888, GST_VIDEO_FORMAT_BGRA},

And no YUV format ? Without any YUV format, it's a bit pointless to have this in GStreamer.

Comment 43 Benjamin Gaignard 2015-09-03 16:25:47 UTC

(In reply to Nicolas Dufresne (stormer) from comment #42)
> Review of attachment 310472 [details] [review] [review]:
> 
> I don't think the approach is right. This seem to track GstBuffer instead of
> GstMemory. About the ifdef, I might have miss-read (and be wrong), but I
> believe it's not because you have a release that support dmabuf that this
> protocol is implement in the compositor. It also seems to assume that all
> DMABUF will be backed by DRM Dumb.

No we track which allocator has been selected in pool set_config() 
self->use_wl_dmabuf = (self->allocator
      && g_strcmp0 (self->allocator->mem_type, GST_ALLOCATOR_DMABUF) == 0);
when we use this to allocate the buffer either with shm or dumb buffer.
We only have use dumb buffer because it what we need on our platform.
We have put dmabuf protocol code in wayland and implement it on weston, to be sure that functions prototype exist we check if the headers are present or not in package configuration.
Maybe we have put to much #ifdef HAVE_WAYLAND_DMABUF but at least it indicate us where we have adding code.

> ::: ext/wayland/gstwaylandsink.c
> @@ +586,3 @@
>      config = gst_buffer_pool_get_config (pool);
>      gst_buffer_pool_config_set_params (config, caps, size, 2, 0);
> +    gst_buffer_pool_config_set_allocator (config, NULL, &params);
> 
> Why bother passing allocation params if you don't have any alignment
> requirement ?

because gst_buffer_pool_config_set_allocator() do not accept to have the both parameters NULL and was needed for pool initialization.

> ::: ext/wayland/waylandpool.c
> @@ +38,3 @@
>  #include <sys/types.h>
>  
> +#ifdef HAVE_WAYLAND_DMABUF
> 
> I think this should mostly be run-time. It would be sad if we cannot use the
> element at all on a compositor that does not implement this protocol.

I don't understand ... this #ifdef only cover protocol included files.

> @@ +78,3 @@
> +    memset (&destroy_arg, 0, sizeof destroy_arg);
> +    drmPrimeFDToHandle (meta->drm_fd, prime_fd, &destroy_arg.handle);
> +    drmIoctl (meta->drm_fd, DRM_IOCTL_MODE_DESTROY_DUMB, &destroy_arg);
> 
> What if the dmabuf was not DRM dumb ?

In your case dmabuf always means dumb buffer.
It is a short cut we have done for your implementation but we can imagine have something like v4l2 plugins io-mode to select allocation method.
 
> 
> ::: ext/wayland/wlvideoformat.c
> @@ +63,3 @@
> +#ifdef HAVE_WAYLAND_DMABUF
> +  {WL_DMABUF_FORMAT_XRGB8888, GST_VIDEO_FORMAT_BGRx},
> +  {WL_DMABUF_FORMAT_ARGB8888, GST_VIDEO_FORMAT_BGRA},
> 
> And no YUV format ? Without any YUV format, it's a bit pointless to have
> this in GStreamer.
Of course we have YUV formats but it is an other patch and link to dmabuf usage.

Comment 44 Nicolas Dufresne (ndufresne) 2015-09-03 18:19:51 UTC

(In reply to Benjamin Gaignard from comment #43)
> No we track which allocator has been selected in pool set_config() 

I mean tracking the life time of a memory instead of a buffer. This is very basic error in this GstBufferPool/GstAllocator world.

> Maybe we have put to much #ifdef HAVE_WAYLAND_DMABUF but at least it
> indicate us where we have adding code.

I might be in disbelief, but to me it's impossible that a modern IPC like Wayland suggest can't let you check what is supported at run-time. It's really unfortunate if you have to rebuild GStreamer for every compositor. I'd like someone that know to strongly asset this before adding any kind of ifdef in the code.

> 
> > ::: ext/wayland/gstwaylandsink.c
> > @@ +586,3 @@
> >      config = gst_buffer_pool_get_config (pool);
> >      gst_buffer_pool_config_set_params (config, caps, size, 2, 0);
> > +    gst_buffer_pool_config_set_allocator (config, NULL, &params);
> > 
> > Why bother passing allocation params if you don't have any alignment
> > requirement ?
> 
> because gst_buffer_pool_config_set_allocator() do not accept to have the
> both parameters NULL and was needed for pool initialization.

Ok, so this is a hack instead of fixing the assertion, This is a need-work then.

> 
> > @@ +78,3 @@
> > +    memset (&destroy_arg, 0, sizeof destroy_arg);
> > +    drmPrimeFDToHandle (meta->drm_fd, prime_fd, &destroy_arg.handle);
> > +    drmIoctl (meta->drm_fd, DRM_IOCTL_MODE_DESTROY_DUMB, &destroy_arg);
> > 
> > What if the dmabuf was not DRM dumb ?
> 
> In your case dmabuf always means dumb buffer.
> It is a short cut we have done for your implementation but we can imagine
> have something like v4l2 plugins io-mode to select allocation method.

This is exactly the kind of implicit limitation and hacks I'm trying to move away from.

>  
> > 
> > ::: ext/wayland/wlvideoformat.c
> > @@ +63,3 @@
> > +#ifdef HAVE_WAYLAND_DMABUF
> > +  {WL_DMABUF_FORMAT_XRGB8888, GST_VIDEO_FORMAT_BGRx},
> > +  {WL_DMABUF_FORMAT_ARGB8888, GST_VIDEO_FORMAT_BGRA},
> > 
> > And no YUV format ? Without any YUV format, it's a bit pointless to have
> > this in GStreamer.
> Of course we have YUV formats but it is an other patch and link to dmabuf
> usage.

So this patchset is partial ?

Comment 45 Julien Isorce 2015-09-14 17:44:57 UTC

+1 to add memory:dmabuf caps feature in gst-plugins-base.

Benjamin I am trying to find a consensus so in your use case you have a flow with both dmabuf and system mem right ?

What we probably need in your case it dmabuf type for gst_query_add_allocation_meta or a dmabuf option for gst_buffer_pool_config_add_option.
So that your sink does not need to expose a dmabuf caps feature in its caps, but just add this meta or option when proposing a pool.
At runtime you will always have gst_is_dmabuf_memory to check what type of buffer you got.
In the end the caps of the flow will be just video/x-raw without restricting the memory type.

Comment 46 Benjamin Gaignard 2015-09-15 10:26:23 UTC

(In reply to Julien Isorce from comment #45)
> +1 to add memory:dmabuf caps feature in gst-plugins-base.
> 
> Benjamin I am trying to find a consensus so in your use case you have a flow
> with both dmabuf and system mem right ?
> 
> What we probably need in your case it dmabuf type for
> gst_query_add_allocation_meta or a dmabuf option for
> gst_buffer_pool_config_add_option.
> So that your sink does not need to expose a dmabuf caps feature in its caps,
> but just add this meta or option when proposing a pool.

That is exactly what we do when adding the allocator in pool with gst_query_add_allocation_param() function.

> At runtime you will always have gst_is_dmabuf_memory to check what type of
> buffer you got.

Yes we do that too to check the buffer memory type

> In the end the caps of the flow will be just video/x-raw without restricting
> the memory type.

That definitively what I expected to have: caps without memory type and allocator selection while negotiate buffer pool

Comment 47 Benjamin Gaignard 2015-09-15 10:34:43 UTC

Created attachment 311346 [details] [review]
make waylandsink use zlinux_dmabuf upstreamed protocol

With this patch we have been able to do 0-copy with dmabuf between the sink and weston.

We had to copy the protocol because weston doesn't exported it. It is only experimental for intel driver, we need to send a patch to weston to fix that then we can probably test weston version in configure to know if the protocol file is available or not.

We have only consider allocation through DRM dumb buffer API so investigating how add EGL allocation is needed.

As in the previous patch dmabuf allocator is added in pool configuration to let elements negotiate the allocator.

Comment 48 Julien Isorce 2015-09-15 13:18:43 UTC

(In reply to Benjamin Gaignard from comment #46)
> That is exactly what we do when adding the allocator in pool with
> gst_query_add_allocation_param() function.

Ah right, I also missed Sebastian's first review.

Could you attach the patches for other elements. I guess in their decide_allocation you check for the existence of the proposed allocator attached to the query. So that they know if they can produce/handle dmabuf flow.
Because in the end I think these other patches are the main concern, especially if it is for gstv4l2 elements.

Just to understand, are in the case dec/import -> export/sink or dec/export -> import/sink ? (It seems to be the first case)

> 
> That definitively what I expected to have: caps without memory type and
> allocator selection while negotiate buffer pool

Right, I think there are 2 cases about dmabuf buffers: SW+HW buffers in the same flow and HW buffers. I think the caps feature memory:dmabuf only applies to the second case. In other words this caps feature does not apply to your case. So we should not expect you to use memory:dmabuf caps feature in your patch.

Here is what I suggest to handle the 2 cases:

If we expect the decoder to produce both system and dmabuf memories then the downstream element only notify dmabuf during propose/decide query sequence using gst_query_add_allocation_param.
But if you expect you decoder to produce dmabuf only then you additionally set the caps feature memory:dmabuf in the caps of that query. (And add it to template caps as well)
If in any case the decoder can only produce exclusively SW or HW buffers then only the caps feature is needed to deduce what to produce.

All of these to say that we should agree to have memory:dmabuf as a caps feature defined in gst-plugins-base. And patch gstv4l2 as an example for this logic described above.

Comment 49 Nicolas Dufresne (ndufresne) 2015-09-15 13:23:57 UTC

(In reply to Benjamin Gaignard from comment #47)
> As in the previous patch dmabuf allocator is added in pool configuration to
> let elements negotiate the allocator.

That's the part I'm less happy with. The allocation stuff is already cluttered of exception case, adding allocators with the GST_ALLOCATOR_FLAG_CUSTOM_ALLOC flags to the query is yet another exception to handle. Maybe the pool inforce the allocator instead. This way you can achieve the same, but by reading back the updated configuration from the pool, rather then looking for a dmabuf in the query array ?

Comment 50 Benjamin Gaignard 2015-09-15 14:58:47 UTC

(In reply to Julien Isorce from comment #48)
> (In reply to Benjamin Gaignard from comment #46)
> > That is exactly what we do when adding the allocator in pool with
> > gst_query_add_allocation_param() function.
> 
> Ah right, I also missed Sebastian's first review.
> 
> Could you attach the patches for other elements. I guess in their
> decide_allocation you check for the existence of the proposed allocator
> attached to the query. So that they know if they can produce/handle dmabuf
> flow.
> Because in the end I think these other patches are the main concern,
> especially if it is for gstv4l2 elements.
> 
v4l2 decoder/encoder/transform code is here:
https://git.linaro.org/people/benjamin.gaignard/gst-plugins-v4l2.git

and you are right decide_allocation function check and select the allocator attached to the query like this for example: https://git.linaro.org/people/benjamin.gaignard/gst-plugins-v4l2.git/blob/HEAD:/gst/v4l2dec/gstv4l2dec.c#l469

> Just to understand, are in the case dec/import -> export/sink or dec/export
> -> import/sink ? (It seems to be the first case)
> 
yes it is the first case but the second could works also.

> > 
> > That definitively what I expected to have: caps without memory type and
> > allocator selection while negotiate buffer pool
> 
> Right, I think there are 2 cases about dmabuf buffers: SW+HW buffers in the
> same flow and HW buffers. I think the caps feature memory:dmabuf only
> applies to the second case. In other words this caps feature does not apply
> to your case. So we should not expect you to use memory:dmabuf caps feature
> in your patch.
> 
> Here is what I suggest to handle the 2 cases:
> 
> If we expect the decoder to produce both system and dmabuf memories then the
> downstream element only notify dmabuf during propose/decide query sequence
> using gst_query_add_allocation_param.

Yes selecting the allocator need to be done in propose/decide query sequence.

> But if you expect you decoder to produce dmabuf only then you additionally
> set the caps feature memory:dmabuf in the caps of that query. (And add it to
> template caps as well)

If one element can only deal with dmabuf memory type and if this allocator isn't available in decide_allocation function it should just failed.
If you add it into the caps you duplicate them for each allocator type.

> If in any case the decoder can only produce exclusively SW or HW buffers
> then only the caps feature is needed to deduce what to produce.

As you have said before you can deduce what type of memory to use in propose/decide allocation no need of caps here. 

> 
> All of these to say that we should agree to have memory:dmabuf as a caps
> feature defined in gst-plugins-base. And patch gstv4l2 as an example for
> this logic described above.

Comment 51 Benjamin Gaignard 2015-09-15 17:11:17 UTC

Julien: from this example you give on another bug
gst-launch-1.0 filesrc location=video.mp4 ! qtdemux ! h264parse ! vaapidecode ! "video/x-raw(memory:VASurface), format=NV12" ! vaapipostproc ! "video/x-raw(memory:dmabuf), format=RGBA" ! glimagesink

May I understand that you want to extend filter element to be able to select the allocator in buffer pool ? If it is that and while memory:* do not appear in element src/sink caps it fine for me.

Comment 52 Julien Isorce 2015-09-15 20:00:59 UTC

(In reply to Benjamin Gaignard from comment #51)
> Julien: from this example you give on another bug
> gst-launch-1.0 filesrc location=video.mp4 ! qtdemux ! h264parse !
> vaapidecode ! "video/x-raw(memory:VASurface), format=NV12" ! vaapipostproc !
> "video/x-raw(memory:dmabuf), format=RGBA" ! glimagesink
> 
> May I understand that you want to extend filter element to be able to select
> the allocator in buffer pool ? If it is that and while memory:* do not
> appear in element src/sink caps it fine for me.

Which filter ?
In this example I just use the caps feature because once it is negotiated it can only produce dmabuf. It won't mix different memory types in the same flow.
Note that in this example it is export -> import.
So upstream element does not need to setup downstream pool in advance.

Sorry it is not clear to me :), in the end are you in favor of memory:DMABuf caps feature ? I mean as long as it is still allowed to do it without the cas feature. I.e. with add_allocation_param or Nicolas's solution in comment #49, for more granularity.
The caps feature is just there to strictly fix the memory type and to advertise in the caps what it can support.

Comment 53 Benjamin Gaignard 2015-09-15 22:25:13 UTC

(In reply to Julien Isorce from comment #52)
> (In reply to Benjamin Gaignard from comment #51)
> > Julien: from this example you give on another bug
> > gst-launch-1.0 filesrc location=video.mp4 ! qtdemux ! h264parse !
> > vaapidecode ! "video/x-raw(memory:VASurface), format=NV12" ! vaapipostproc !
> > "video/x-raw(memory:dmabuf), format=RGBA" ! glimagesink
> > 
> > May I understand that you want to extend filter element to be able to select
> > the allocator in buffer pool ? If it is that and while memory:* do not
> > appear in element src/sink caps it fine for me.
> 
> Which filter ?

This one for example: "video/x-raw(memory:VASurface), format=NV12"
If memory:VASurface is used to check that the pool provide an allocator named VASurface it is fine for me. If memory:VASurface is used to select a specific caps 
vaapidecode I think it is wrong.

In gstreamer doc it is clear that allocators are one of the properties negotiated between elements:
http://cgit.freedesktop.org/gstreamer/gstreamer/tree/docs/design/part-bufferpool.txt#n20


> In this example I just use the caps feature because once it is negotiated it
> can only produce dmabuf. It won't mix different memory types in the same
> flow.
> Note that in this example it is export -> import.
> So upstream element does not need to setup downstream pool in advance.
> 
> Sorry it is not clear to me :), in the end are you in favor of memory:DMABuf
> caps feature ?

No adding memory:dmabuf caps feature will make us duplicate template caps every where because you link stream format and memory type

> I mean as long as it is still allowed to do it without the
> cas feature. I.e. with add_allocation_param or Nicolas's solution in comment
> #49, for more granularity.
> The caps feature is just there to strictly fix the memory type and to
> advertise in the caps what it can support.

In propose_allocation you can check the caps and put the allocator(s) that support them. If you have a smart decide_allocation function you can select the allocator else the first one will used.

Comment 54 Nicolas Dufresne (ndufresne) 2015-09-15 23:15:27 UTC

In fact, memory:VASurface is used to state that stream will only hold buffer witch contains memory of type VASurface. There could be no VASurface. So if you had memory:DMABuf, that would mean memory travelling over this stream must be GstBuffer backed by memory of type memory:DMABuf. Most use case for this is for guaranties, or because it is known in advance that these DMABuf are not mappable, so passing those DMABuf as if they where SystemMemory would fail the requirement that a memory of type SystemMemory should be mappable to user space. memory:SystemMemory (the default) can be a DMABuf of course, this is and will always be a legitimate use case.

Comment 55 Nicolas Dufresne (ndufresne) 2015-09-16 01:38:02 UTC

(correction: There could be no VASurface *allocator*)

Comment 56 Nicolas Dufresne (ndufresne) 2015-09-16 02:30:15 UTC

(In reply to Benjamin Gaignard from comment #53)
> In gstreamer doc it is clear that allocators are one of the properties
> negotiated between elements:
> http://cgit.freedesktop.org/gstreamer/gstreamer/tree/docs/design/part-
> bufferpool.txt#n20

If you read attentively, allocator are specially useful to allocate buffer of variable size (that the intention). The allocator type could be use, but an allocator should work with the generic pools. Thus, allocators stored in the query are expected to implement gst_allocator_alloc() and to allow selecting a size and an alignment. This is not the case of the generic DMABUF allocator, which does not implement the generic alloc method. So unless you can subclass this allocator (I don't know if that was the intention here) and implement the generic alloc method, I believe it's wrong to add this allocator there.

About the V4L2 interface, it works with fixed size allocation, it is thus impossible for such an element to provide a DMABUF based allocator. For the dumb buffer it makes sense. So, GstDMABUFAllocator as-is, is a no-go in the allocation query. If that is possible, a subclass of it for the dumb buffer would be perfectly suitable.

That leaves us with the need for another mechanism to advertise that waylandsink (or waylanddmabufsink) implements the importer role. If we have an HW or a compositor that don't support the dumb buffer, then we have a sink that can only take the importer role, and in that case, the caps feature makes a lot of sense, as without upstream providing memory of type DMABuf, nothing can work. As soon as you're sink can take the exporting role, there exist fallback path to make it work.

Ideally, for the v4l2 case, we'll also need a way to advertise that downstream may have dmabuf exporter role only, so an upstream element with importer role can take the right decision, but it's outside of the scope for wayland, hence for this bug.

Comment 57 Julien Isorce 2015-09-22 09:36:06 UTC

(In reply to Nicolas Dufresne (stormer) from comment #56)
> (In reply to Benjamin Gaignard from comment #53)
> Thus, allocators stored in
> the query are expected to implement gst_allocator_alloc() and to allow
> selecting a size and an alignment. This is not the case of the generic
> DMABUF allocator, which does not implement the generic alloc method. So I believe it's wrong to add this allocator there.

I think "gst_query_add_allocation_param" should prevent to add an allocator which has its GstAllocatorClass->alloc set to NULL. (which is the case for gst dmabuf alocator, it seems that it has derived to also define it with a g_warning inside, see allocator_class->alloc = _gl_mem_alloc)

It think it should be a way to just add the supported memory types to the query, by adding only the name, not the allocators directly. I found caps features would be the proper way.

I asked slomo on irc and told me that it is valid to add these allocators to gst_query_add_allocation_param. But maybe we can revisit this for next versions.
For sure gstgl add it in the uploader propose_allocation:

allocator = gst_allocator_find (GST_GL_MEMORY_ALLOCATOR);
gst_query_add_allocation_param (query, allocator, &params);

But I cannot find where it is useful. Even if it is I think this could be done with caps features.

Comment 58 Benjamin Gaignard 2015-09-22 09:59:37 UTC

(In reply to Julien Isorce from comment #57)
> I think "gst_query_add_allocation_param" should prevent to add an allocator
> which has its GstAllocatorClass->alloc set to NULL. (which is the case for
> gst dmabuf alocator, it seems that it has derived to also define it with a
> g_warning inside, see allocator_class->alloc = _gl_mem_alloc)
> 
> It think it should be a way to just add the supported memory types to the
> query, by adding only the name, not the allocators directly. I found caps
> features would be the proper way.
> 
> I asked slomo on irc and told me that it is valid to add these allocators to
> gst_query_add_allocation_param. But maybe we can revisit this for next
> versions.
> For sure gstgl add it in the uploader propose_allocation:
> 
> allocator = gst_allocator_find (GST_GL_MEMORY_ALLOCATOR);
> gst_query_add_allocation_param (query, allocator, &params);

Yes before do it like that I have asked the same question (and even when testing buffer pool and allocator on 0.11 branch) and it is working well so why change it now ?

> 
> But I cannot find where it is useful. Even if it is I think this could be
> done with caps features.

If you change the caps feature you will have to change lot of plugins.
For example videotestsrc ! waylandsink is functional with dmabuf set as allocator and this save the copy from shm to hw memory.
It is the same with subtitles, with the implementation I propose we can use subtitle element (or any other) without modification.

Comment 59 Nicolas Dufresne (ndufresne) 2015-09-22 15:29:00 UTC

It's not because we thought it was fine before that it's fine now (there has been more then a year since we started this discussion). If an allocator has no ->alloc it can only be part of the proposed allocation if there is a negotiated caps feature to back it. GL allocator is only offered in the case where it can do memory of type SystemMemory (the GL Upload slow path). For EGLImage, it is backed with caps feature.

Again, cause I'm really repeating myself, if you have an allocator (or a pool using an internal allocator) producing mappable DMABUF like V4L2 and the proposed DRM Dumb Buffers allocator, it can be proposed with a memory:SystemMemory caps feature. Hence, no, videotestsrc ! waylandsink would not need anything special if waylandsink can allocate memory of type SystemMemory. That would depend on having Dumb Buffer support first, which we don't have in upstream GStreamer, maybe your focus should be on offering mergeable patches (against git master) for this allocator ?

Why we need an ->alloc is simple. If we proposed an allocator to videotestsrc, it will do:

  * Ok I got an allocator but no pool
  * Ok, let's create a videopool and pass it the downstream allocator
  * Much later ... allocate fails, pipeline fails.

This is not acceptable.

Comment 60 George Kiagiadakis 2015-11-22 12:26:28 UTC

I've worked a bit on this recently and I've updated my branch with the split sink approach:

https://git.collabora.com/cgit/user/gkiagia/gst-plugins-bad.git/log/?h=waylandsink-dmabuf

Now wldmabufsink is fully implemented and works, but it is still unclear how to do proper negotiation of format and allocator.

Currently, the linux_dmabuf v1 protocol that I am using has no way of informing the client of what requirements it has on the dmabuf that it can import. It can inform it only about the colour format (although this is not implemented at the moment), but it cannot inform it about the alignment constraints as well as the potential hardware constraints (i.e. a certain device may not be able to share dmabuf with another certain device in some platform maybe?). However, the protocol does have a way to find out early-ish if a certain configuration is not going to work. This works like this: you allocate a dmabuf from your potential allocator, you try to import it to the compositor with the format and alignment that you want and if the compositor cannot use this, then it sends back an event saying so. This can be done without creating a surface, so it can be used as an early negotiation mechanism.

When I started writing wldmabufsink, I was thinking that I could do this in set_caps() in order to check if the negotiated format is OK (since listing formats from the compositor is not implemented). However, the problem is that at that point the allocator is unknown. One solution would be to write a drm dumb-buffers-based allocator in waylandsink, like the one that Benjamin has written in his patch, and then allocate a dummy buffer from there with the negotiated format and see if it works. But this is wrong in many ways. Apart from adding a dependency to drm and limiting waylandsink for use on drm-based systems, it would also not really prove anything about the negotiation, because the compositor may accept dumb buffers and reject v4l2 buffers for example, or vice versa (in a v4l2{src,dec} ! wldmabufsink scenario).

A better solution would be instead to *negotiate* the allocator. In GStreamer this is currently not supported. Downstream gets to make a proposal of its preferred allocator and upstream gets to make a decision, which may be totally incompatible with what downstream can use (in this specific case of wldmabufsink, downstream has nothing to propose and upstream gets to select whatever allocator it wants). Downstream instead needs a way to validate that a chosen allocator works and upstream needs a way to propose an allocator and get feedback from downstream on its decision. This could be achieved by extending the semantics of the existing ALLOCATION query and adding a couple of vfuncs for them in GstBase* classes, plus some more GstQuery manipulation functions. Upstream could set one or more allocators on the query before sending it downstream and downstream could then check that these allocators work. If a certain allocator doesn't work, downstream should remove it from the list and maybe also sort the working ones in order of preference. Eventually upstream would then get a list of allocators that are guaranteed to work, or an empty list in case nothing works and negotiation has failed.

Now this doesn't fully solve the problem again, because there is still no way to negotiate the format. But this is something that eventually will be implemented, I hope, in the wayland linux_dmabuf protocol, so we will be able to use normal caps to negotiate the format first and then negotiate the allocator using the mechanism described above. Fingers crossed, though, as I'm a bit sceptical wrt the reason that this feature is not implemented yet.

Regarding the drm dumb buffers allocator, it would be really nice to have it around so that something like videotestsrc ! wldmabufsink can work. However, as I said above, I do not agree with adding a dependency on drm in waylandsink. Thinking about it, this allocator is a lot like the allocator that I have done in the "inteldmabufupload" element [1]. The difference is that the inteldmabuf upload element allocates buffers from GEM, which are GPU buffers meant to be used by GL internally, and uses the intel-specific API. Dumb buffers are in a different memory and are cross-GPU compatible. What I would propose there is to keep wldmabufsink without any allocator and instead require the use of "upload" elements, like inteldmabufupload. We can have dumbdmabufupload, intel*, radeon*, nouveau*, ion*, whatever, if there are use cases for them. Wayland does not really require any specific allocator, it is up to the negotiation mechanism to decide what to use.

Eventually, with this design, we can have 2 kinds of pipelines:

1) Software source, using mmap()'able dmabuf:

videotestsrc ! dumbdmabufupload ! wldmabufsink
filesrc ! qtdemux ! avdec_h264 ! dumbdmabufupload ! wldmabufsink

2) Hardware source, eliminating the need for an upload element:

v4l2src ! wldmabufsink
filesrc ! qtdemux ! v4l2dec ! wldmabufsink

And of course things like "videotestsrc ! wldmabufsink" would fail in the negotiation of the allocator, as the list of acceptable allocators should be empty in this case.

[1]. https://bugzilla.gnome.org/show_bug.cgi?id=732281

Comment 61 Daniel Stone 2015-11-23 16:43:15 UTC

(In reply to George Kiagiadakis from comment #60)
> I've worked a bit on this recently and I've updated my branch with the split
> sink approach:
> 
> https://git.collabora.com/cgit/user/gkiagia/gst-plugins-bad.git/log/
> ?h=waylandsink-dmabuf

I've not actually looked into the code yet, nor do I have anything to clever to say about the GSt-specific design, but a few things jumped out at me ...

> Currently, the linux_dmabuf v1 protocol that I am using has no way of
> informing the client of what requirements it has on the dmabuf that it can
> import. It can inform it only about the colour format (although this is not
> implemented at the moment), but it cannot inform it about the alignment
> constraints as well as the potential hardware constraints (i.e. a certain
> device may not be able to share dmabuf with another certain device in some
> platform maybe?).

Sorry, but you'll never get that. Not only do the restrictions vary from format to format, but to an extent they also often depend on state rather than static limits.

There was an effort a few years ago to find a common way to express these limitations, but it never stuck, and hardware has only got more complex since. Not only that, but you'd then have to convey the restrictions back to the allocator ... in short, trial and error is the only method that will work for the forseeable future.

> When I started writing wldmabufsink, I was thinking that I could do this in
> set_caps() in order to check if the negotiated format is OK (since listing
> formats from the compositor is not implemented). However, the problem is
> that at that point the allocator is unknown. One solution would be to write
> a drm dumb-buffers-based allocator in waylandsink, like the one that
> Benjamin has written in his patch, and then allocate a dummy buffer from
> there with the negotiated format and see if it works. But this is wrong in
> many ways.

Listing formats for the compositor is WIP, and the protocol included in the last revision on Phabricator.

> Apart from adding a dependency to drm and limiting waylandsink
> for use on drm-based systems, it would also not really prove anything about
> the negotiation, because the compositor may accept dumb buffers and reject
> v4l2 buffers for example, or vice versa (in a v4l2{src,dec} ! wldmabufsink
> scenario).

And this may be format-dependent, e.g. for one format/modifier combination (thinking of tiling and compression in particular), it may only be possible to use non-dumb buffers.

> Now this doesn't fully solve the problem again, because there is still no
> way to negotiate the format. But this is something that eventually will be
> implemented, I hope, in the wayland linux_dmabuf protocol, so we will be
> able to use normal caps to negotiate the format first and then negotiate the
> allocator using the mechanism described above. Fingers crossed, though, as
> I'm a bit sceptical wrt the reason that this feature is not implemented yet.

The reason it isn't implemented yet is twofold.

Firstly, DRM format modifiers were added as an additional combination to the format. In this case, the format describes the actual pixel values one-by-one (e.g. each pixel is x8r8g8b8), but modifiers describe an additional transformation between the raw buffer and the pixels: tiling, compression, et al. This is required for quite a few usecases, and replaces the separate NV12M-style format codes, as well as driver-specific tiling which can only be ascertained through custom queries (i915, Nouveau). There is currently no support in the EGL dmabuf-import extension for this, which is WIP, but we need to think reasonably carefully about how to implement (format, modifier) query tuples.

Secondly, the EGL extension again does not include support for format queries. This is a bit of a showstopper: if DRM can support importing NV12, but the EGL implementation supports neither NV12 via TEXTURE_EXTERNAL nor R8/RG8 via TEXTURE_2D, then that means we can only ever display that buffer on planes, and not through the EGL fallback. This is a showstopper for advertising formats, and is also WIP.

> Regarding the drm dumb buffers allocator, it would be really nice to have it
> around so that something like videotestsrc ! wldmabufsink can work. However,
> as I said above, I do not agree with adding a dependency on drm in
> waylandsink. Thinking about it, this allocator is a lot like the allocator
> that I have done in the "inteldmabufupload" element [1]. The difference is
> that the inteldmabuf upload element allocates buffers from GEM, which are
> GPU buffers meant to be used by GL internally, and uses the intel-specific
> API.

Technically, dumb buffers are also very much GEM, but this is just a nitpick.

> Dumb buffers are in a different memory and are cross-GPU compatible.

No, that's not it. Often they're in the same memory! It's every bit as possible to use buffers allocated through device-specific API as it is dumb buffers. In fact, sometimes even more possible.

The difference between dumb and non-dumb buffers is that dumb buffers are designed for exactly one usecase: software rendering into them whilst mmap()ed. You are guaranteed to be allowed to be able to mmap() dumb buffers, but this is explicitly _not_ true of non-dumb buffers, and there is no generic API provided for doing so. Similarly, non-dumb buffers can be used with hardware acceleration engines (mainly GPU), whereas dumb buffers are _explicitly_ not supported for hardware rendering. There are myriad reasons for this, frequently to do with coherency (not just pipeline/cache coherency, but also things like compression buffers); some of these are less valid today than they were when this was created, but not all.

tl;dr: the main difference is mmap

> What I would propose there is to keep wldmabufsink without any allocator and
> instead require the use of "upload" elements, like inteldmabufupload. We can
> have dumbdmabufupload, intel*, radeon*, nouveau*, ion*, whatever, if there
> are use cases for them. Wayland does not really require any specific
> allocator, it is up to the negotiation mechanism to decide what to use.

So the 'upload' element is essentially an allocator, right? In that it will control allocation of the storage backing the dmabuf in the first place, mediate access to it (e.g. performing any necessary pipeline/cache coherency calls), and provide (emulation of) CPU mapping if necessary.

The rest sounds good to me. Thanks for pushing on this!

Comment 62 George Kiagiadakis 2015-11-23 17:12:22 UTC

(In reply to Daniel Stone from comment #61)
> (In reply to George Kiagiadakis from comment #60)

[...]

> > Dumb buffers are in a different memory and are cross-GPU compatible.
> 
> No, that's not it. Often they're in the same memory! It's every bit as
> possible to use buffers allocated through device-specific API as it is dumb
> buffers. In fact, sometimes even more possible.
> 
> The difference between dumb and non-dumb buffers is that dumb buffers are
> designed for exactly one usecase: software rendering into them whilst
> mmap()ed. You are guaranteed to be allowed to be able to mmap() dumb
> buffers, but this is explicitly _not_ true of non-dumb buffers, and there is
> no generic API provided for doing so. Similarly, non-dumb buffers can be
> used with hardware acceleration engines (mainly GPU), whereas dumb buffers
> are _explicitly_ not supported for hardware rendering. There are myriad
> reasons for this, frequently to do with coherency (not just pipeline/cache
> coherency, but also things like compression buffers); some of these are less
> valid today than they were when this was created, but not all.
> 
> tl;dr: the main difference is mmap

Ok, sorry, documentation about all this is totally confusing. Thanks for clarifying.
 
> > What I would propose there is to keep wldmabufsink without any allocator and
> > instead require the use of "upload" elements, like inteldmabufupload. We can
> > have dumbdmabufupload, intel*, radeon*, nouveau*, ion*, whatever, if there
> > are use cases for them. Wayland does not really require any specific
> > allocator, it is up to the negotiation mechanism to decide what to use.
> 
> So the 'upload' element is essentially an allocator, right? In that it will
> control allocation of the storage backing the dmabuf in the first place,
> mediate access to it (e.g. performing any necessary pipeline/cache coherency
> calls), and provide (emulation of) CPU mapping if necessary.

Yes, the upload element will essentially provide an allocator to the pipeline and also in the very very worst case that we have an upstream element that doesn't know how to properly use allocators and buffer pools, it will copy over buffer contents from other memory to the mmap()'ed dmabuf memory.

The rest (access mediation, mmap()ing, etc...) is controlled by the superclasses of gstreamer already (GstMemory / GstAllocator / GstBufferPool). The element's allocator will just implement the alloc() and free() functions.

Comment 63 Daniel Stone 2015-11-23 17:19:44 UTC

(In reply to George Kiagiadakis from comment #62)
> The rest (access mediation, mmap()ing, etc...) is controlled by the
> superclasses of gstreamer already (GstMemory / GstAllocator /
> GstBufferPool). The element's allocator will just implement the alloc() and
> free() functions.

Right, I just meant in terms of potentially performing any flush at map / invalidation at unmap, for either caches or compressed buffers, or whatever. You'd have to have map() and unmap() plumbed through to the allocator, rather than a single fixed allocation. But I'm pretty sure this happens today, right?

Comment 64 Olivier Crête 2015-11-23 20:16:21 UTC

(In reply to Daniel Stone from comment #63)
> Right, I just meant in terms of potentially performing any flush at map /
> invalidation at unmap, for either caches or compressed buffers, or whatever.
> You'd have to have map() and unmap() plumbed through to the allocator,
> rather than a single fixed allocation. But I'm pretty sure this happens
> today, right?

The "download" should happen on the first call to gst_memory_map() and the upload/flush/etc should be done by the sink.. Otherwise you may end up doing multiple download/flushes as different elements may map/unmap the memory.

Comment 65 Fabien Dessenne 2015-11-24 09:01:33 UTC

Created attachment 316144 [details] [review]
linux_dmabuf protocol support

Replaces previous attachment "make waylandsink use zlinux_dmabuf upstreamed protocol" uploaded by Benjamin Gaignard

Comment 66 Fabien Dessenne 2015-11-24 09:01:55 UTC

I have reworked the previous "linux_dmabuf protocol support" patch that was attached by Benjamin Gaignard.
The major change from the previous version is the removal of the dmabuf buffer pool (that was DRM dumb buffer based).
The sink handles both SHM and dmabuf buffers

There are still some issues to fix, in the base classes of decoder and transform to have everyhting working fine in both shm/dmabuf modes, but I think that this is a good starting point

Fabien

Comment 67 George Kiagiadakis 2015-11-24 12:24:28 UTC

(In reply to Fabien Dessenne from comment #66)
> I have reworked the previous "linux_dmabuf protocol support" patch that was
> attached by Benjamin Gaignard.
> The major change from the previous version is the removal of the dmabuf
> buffer pool (that was DRM dumb buffer based).
> The sink handles both SHM and dmabuf buffers
> 
> There are still some issues to fix, in the base classes of decoder and
> transform to have everyhting working fine in both shm/dmabuf modes, but I
> think that this is a good starting point
> 
> Fabien

I see a collision here as we are both working on the same thing apparently.

A few notes on your patch:
* It still depends on libdrm (I know we have to use drm_fourcc.h, but it doesn't justify the dependency for me, as it's just a couple of constants. I prefer copying them over).
* It's not based on master (and I can tell it does not apply cleanly on master)
* GST_VIDEO_FORMATS is confusing as a name, since there is another constant like that in the gstvideo library
* Using gst_wl_display_rountrip() is bad. At that time that you call it, the event queue is polled by another thread in wldisplay.c, so doing a sync roundtrip in another thread while the event loop thread is running in parallel is prone to cause races.

Finally, I see that you are now using the "memory:dmabuf" caps feature. I thought that you were trying to avoid that. Do you have reconsidered your proposal, or do you agree/disagree with my proposal in the comments above? It would be useful to discuss a bit more about it.

Comment 68 Fabien Dessenne 2015-11-24 13:41:36 UTC

(In reply to George Kiagiadakis from comment #67)
> I see a collision here as we are both working on the same thing apparently.
Agree :)

> A few notes on your patch:
> * It still depends on libdrm (I know we have to use drm_fourcc.h, but it
> doesn't justify the dependency for me, as it's just a couple of constants. I
> prefer copying them over).
I am neither satisfied with having such a libdrm dependance, nor with hard copying the drm_fourcc.h file.
Maybe there is something to do in configure.ac so we can have only a libdrm header file dependance, not a 'full' dependance. I do not know if this is possible (AC_CHECK_HEADER?) but it would be probably the best option.


> * It's not based on master (and I can tell it does not apply cleanly on
> master)
Well, I apologize here. To be honest I did not plan to share this patch right now (as I did not rebase it yet on master) but your patch proposal forced me to do it now, so we can discuss on the two different options we have now.


> * GST_VIDEO_FORMATS is confusing as a name, since there is another constant
> like that in the gstvideo library
Agree, I need to fix it.


> * Using gst_wl_display_rountrip() is bad. At that time that you call it, the
> event queue is polled by another thread in wldisplay.c, so doing a sync
> roundtrip in another thread while the event loop thread is running in
> parallel is prone to cause races.
Good point. The way you manage it in your proposal seems a better option.


> Finally, I see that you are now using the "memory:dmabuf" caps feature. I
> thought that you were trying to avoid that.
I have tried, tried and tried again. I finally came to this least bad option.


> Do you have reconsidered your
> proposal, or do you agree/disagree with my proposal in the comments above?
> It would be useful to discuss a bit more about it.
Benjamin (who is an office coworker of mine) has a better view on this discussion, he's going to take over my answer.

Fabien

Comment 69 Benjamin Gaignard 2015-11-24 14:03:20 UTC

(In reply to George Kiagiadakis from comment #67)
> 
> Finally, I see that you are now using the "memory:dmabuf" caps feature. I
> thought that you were trying to avoid that. Do you have reconsidered your
> proposal, or do you agree/disagree with my proposal in the comments above?
> It would be useful to discuss a bit more about it.

After talking with Nicolas we have decided to test the caps feature solution even if until now we are facing issues to be able to negotiate those caps between elements (mainly because base classes don't accept caps feature "ANY" as fixate caps)
We also have decided to no more propose to use dumb buffer API to allocate buffer inside waylandsink inside it will be up to the upstream element (v4l2 decoder or transform) to export a dmabuf buffer.

Comment 70 Nicolas Dufresne (ndufresne) 2015-11-24 14:21:59 UTC

Sorry, but hijacking someones work like this is a pitty. Please don't do that again.

Comment 71 Nicolas Dufresne (ndufresne) 2015-11-24 14:29:05 UTC

From now on, I have added George branch to the URL field. If someone wants to help, use this branch, propose patches to George for inclusion in that branch please.

Comment 72 Fabien Dessenne 2015-11-24 14:53:06 UTC

(In reply to Nicolas Dufresne (stormer) from comment #70)
> Sorry, but hijacking someones work like this is a pitty. Please don't do
> that again.

Nicolas,
I feel realy really REALLY not comfortable with your comment.

I do not intend to hijack anyone's work. Like I said, I propose a solution that can be compared with another one.
Maybe you do not like when we have a choice. I personnaly do.

Then, if you have a quick look at this thread history you shall notice that I provided a first proposal that needed to be reworked, which I have done now (a bit late).
So I do not feel like I hijacked anything.
But, anyway we do not care at all of who was the first one, this is absolutelty meaningless.

Now, I propose you move back my patch proposal from "Obsolete" to "new". And then we can start discussing of the pros & cons of George and I solutions. Just like George, an open-minded guy, started to do.

Fabien

Comment 73 Nicolas Dufresne (ndufresne) 2015-11-24 18:03:55 UTC

I was, and I agree, over aggressive, my apology. Posting more patches with such a bad timing make reviews much harder. No one had time to actually look in depth into George patch set. At best, we can mark you patches "need work" as, for the reason George already mention, they are not ready for upstream.

Instead of posting patches, you could have made a review, commented and prepared the work to find out the pros and cons (or the missing feature you need). Otherwise you force everyone on the GStreamer maintenance side into reading two patch sets.

Comment 74 Nicolas Dufresne (ndufresne) 2015-11-24 18:44:01 UTC

In Fabien's patch, the caps feature implementation is not acceptable. The first patch I would expect is to add a define with the caps feature name. So far, everyone have use memory:dmabuf, which does not follow the usual convention found in other caps feature where CamelCase is preferred. This define should be part of the dmabuf allocator API imho (at least it's a good location for it as we need that anyway). It's also an opportunity to write some documentation to help ensuring that it's use will be consistent. I am still working myself on a definition and method to use caps features to help solving the DMABuf negotiation. Though, this is not simple and the fact that proper format list are not available does not make it that interesting.

The main difference between the patch is unique sink vs specialized sink. The main problem with the unique sink, is that the entire sink caps become a lie. There is no way to use the sink and be guaranties it will work. A malloc buffer could be received in an unsupported format, or an dmabuf that cannot be imported, cannot be copied to SHM for the same reason. There is no fallback, just frustration.

With the split between shm and dmabuf, you at least have one sink that you are guarantied will work. It also gives a lot more control, as you can now solve the fact DMABuf will not always work at a higher level. You can at the same time have a very strict (DMABUF or fail) kind of pipeline, or implement a higher level sink plugger that will try pushing puffers to it's prefered sink, and if that fails choose the best fallback (most of the time this is not SHM with software conversion but GL, specially that dmabuf support there is nearly ready and does not require caps feature).

Comment 75 Fabien Dessenne 2016-07-29 10:43:45 UTC

There have been some discussions in several threads / chats, and I feel like there is a common agreement to resume work with the following assumptions:
- usage of "memory:DMABuf" caps feature  (as defined in https://bugzilla.gnome.org/show_bug.cgi?id=759358)
- unique sink (no dmabuf / shm sinks split) since we use "memory:DMABbuf" caps feature

-> example of proposed caps with weston supporting linux-dmabuf:
video/x-raw, format=(string){ BGRA, BGRx, RGB16, RGB, RGBx, RGBA }; video/x-raw(memory:DMABuf), format=(string){ BGRx, RGBx, BGRA, RGBA, RGB16, BGR, NV12 }

With everything based on latest wayland-protocols (zwp_linux_dmabuf_v1)

Do not hesitate to comment since I am resuming work and will soon propose an updated patch.

Comment 76 Fabien Dessenne 2016-08-10 13:49:13 UTC

Here is the updated patch.
(please apply first this pending patch:
 https://bugzilla.gnome.org/show_bug.cgi?id=767671 "wayland: Update from scaler to viewporter protocol")

Waiting for your comments.

Comment 77 Fabien Dessenne 2016-08-10 13:52:04 UTC

Created attachment 333064 [details] [review]
linux_dmabuf protocol support v2

Comment 78 Benjamin Gaignard 2016-09-15 09:50:31 UTC

Can someone review the last patch and help us to progress on this topic ?
Thanks

Comment 79 Nicolas Dufresne (ndufresne) 2016-09-15 14:04:19 UTC

I'd say we should split the patch. First one adding dmabuf importation and a second adding dmabuf negotiated with caps features. Reason is that there is no use in master of the caps feature, so it pretty much means we'd be merging untested code (or mark as depends on 759358). It's pretty much what we did in kmssink.

I still have one concern regarding the draft API and the use of it:

In the event format:
+  XXX: Can a compositor ever enumerate them?

What does that mean, what does it looks like in existing / usable compositors ? This patch is highly dependant on this feature (dmabuf format enumaration). I know that only shader based GL renderer can provide that, even though it's not a guaranty.

(In reply to Fabien Dessenne from comment #76)
> Here is the updated patch.
> (please apply first this pending patch:
>  https://bugzilla.gnome.org/show_bug.cgi?id=767671 "wayland: Update from
> scaler to viewporter protocol")

Please fill the "Depens On:" field next time.

Comment 80 Nicolas Dufresne (ndufresne) 2016-09-15 14:25:50 UTC

Review of attachment 333064 [details] [review]:

I think I covered all I could find.

::: ext/wayland/gstwaylandsink.c
@@ +660,3 @@
       wbuf = gst_wl_shm_memory_construct_wl_buffer (mem, sink->display,
           &sink->video_info);
+    else if (gst_is_dmabuf_memory (mem))

As you added NV* support, you may have two DMA-Buf FDs. This check need to be a look (or followed by a loop), and gst_wl_linux_dmabuf_construct_wl_buffer() need work.

@@ +662,3 @@
+    else if (gst_is_dmabuf_memory (mem))
+      wbuf = gst_wl_linux_dmabuf_construct_wl_buffer (buffer, sink->display,
+          &sink->video_info);

Need to check the the dmabuf interface is present, will cause an assertion inside that function if missing in the compositor.

@@ +670,3 @@
       GstMapInfo src;
       /* we don't know how to create a wl_buffer directly from the provided
+       * memory, so we have to copy the data to shm memory that we know how

This is sub-optimal if the incoming buffer is a DMA-Buf that failed to import but is in one of the supported shm formats. A method to wrap the DMA-Buf fs as being shm should be added (quite trivial).

@@ +738,2 @@
   {
     GST_ERROR_OBJECT (sink, "could not create wl_buffer out of wl_shm memory");

An error should be posted on the bus, see GST_ELEMENT_ERROR(). Also, we should add some logic to clarify. We may have a dmabuf that failed to import and was of a format unsupported by SHM. There no clear error for that atm,

@@ +743,3 @@
+no_wl_buffer:
+  {
+    GST_ERROR_OBJECT (sink, "buffer %p cannot have a wl_buffer", buffer);

GST_ELEMENT_ERROR().

Comment 81 Nicolas Dufresne (ndufresne) 2016-09-16 15:33:50 UTC

Review of attachment 333064 [details] [review]:

::: ext/wayland/wldisplay.c
@@ +324,3 @@
   VERIFY_INTERFACE_EXISTS (shm, "wl_shm");
   VERIFY_INTERFACE_EXISTS (viewporter, "wp_viewporter");
+  VERIFY_INTERFACE_EXISTS (dmabuf, "zwp_linux_dmabuf_v1");

Missed it on first pass, but the interface should be optional.

Comment 82 Nicolas Dufresne (ndufresne) 2016-09-16 15:36:46 UTC

Review of attachment 333064 [details] [review]:

And some build errors:

In file included from wlshmallocator.h:29:0,
                 from wlshmallocator.c:23:
wldisplay.h:87:38: erreur : unknown type name ‘uint’
 gboolean is_dmabuf_format_supported (uint format_dmabuf,
                                      ^~~~
In file included from wlshmallocator.c:24:0:
wlvideoformat.h:34:1: erreur : unknown type name ‘uint’
 uint gst_video_format_to_wl_dmabuf_format (GstVideoFormat format);
 ^~~~
wlvideoformat.h:36:54: erreur : unknown type name ‘uint’
 GstVideoFormat gst_wl_dmabuf_format_to_video_format (uint wl_format);
                                                      ^~~~
wlvideoformat.h:38:46: erreur : unknown type name ‘uint’
 const gchar *gst_wl_dmabuf_format_to_string (uint wl_format);

Comment 83 Nicolas Dufresne (ndufresne) 2016-09-16 15:58:02 UTC

Review of attachment 333064 [details] [review]:

::: ext/wayland/wldisplay.c
@@ +218,3 @@
+    self->dmabuf =
+        wl_registry_bind (registry, id, &zwp_linux_dmabuf_v1_interface, 1);
+    zwp_linux_buffer_params_v1_add_listener (self->dmabuf, &dmabuf_listener,

This is calling the wrong function, should be zwp_linux_dmabuf_v1_add_listener(). Obviously does not compile due to wrong type.

::: ext/wayland/wllinuxdmabuf.c
@@ +80,3 @@
+
+  mem = gst_buffer_peek_memory (buf, 0);
+  fd = gst_dmabuf_memory_get_fd (mem);

Since you don't include gst/allocators/gstdmabuf.h, it does not compile.

Comment 84 Nicolas Dufresne (ndufresne) 2016-09-21 14:02:33 UTC

Created attachment 335999 [details] [review]
[v2] waylandsink: support linux dmabuf protocol

This is your patch that I fixed to compile, along with other fixes. It's
rebase on some waylandsink cleanup fixes I'm about to merge for 1.10. Some
of the things left:

- Implement support for multiple dmabuf FD per plane
- Complete the DRM to GST format map (see GL/DMABUF code)
- Implement passing dmabuf fd as shm when color format is supported
- Fix robustness issue in the format manipulation code (it currently
  asserts if the compositor enumerate an unknown format).
- There is some left over GST_ERROR that should be ELEMENT_ERROR

There is probably more I'm not thinking about yet.

Comment 85 Nicolas Dufresne (ndufresne) 2016-09-21 15:12:30 UTC

Created attachment 336006 [details] [review]
[v3 ]waylandsink: support linux dmabuf protocol

Rebased against upstream changes.

Comment 86 Fabien Dessenne 2016-09-22 09:15:09 UTC

Created attachment 336061 [details] [review]
[v4] waylandsink: support linux dmabuf protocol

Update since v3:
- support dmabuf with multiple fd
- make dmabuf interface optional
- gst_video_format_to_wl_dmabuf_format returns int, not uint

Comment 87 Fabien Dessenne 2016-09-22 09:25:36 UTC

Nicolas, I have added/fix few things in v3. In top of that, I have two remarks on v3:
- in show_frame, since you removed the "if (sink->pool)" check in show_frame, the patch is unconsistent. Indeed set_caps does not ALWAYS propose a pool (as described in the commit message: no buffer pool for dmabuf). If you really want to remove "if (sink->pool)", then you need some rework. I suggest you keep this check, and add a "fixme" or something like this
- in gst_wayland_sink_set_caps, you start with updating sink->video_info / sink->video_info_changed. Then you check for parameters support, which may result in "return false": in that case, you have updated video_info while you probably should not have. I suggest you keep the initial implementation.

Another remark, we talked about, regarding additional check for dmabuf_format support: I think there is no need to update anything, and you probably got confused with your hack patch "HACK: Just hackup a format list to be able to test". Indeed, before adding a new format to dmabuf_formats you shall first check if that format can actually be converted to a known GST video format. Just like it is done in dmabuf_format() (wldisplay.c), otherwise you will get some assert error (which you talked me about)
Example (this format will be filtered out);
  format = DRM_FORMAT_YUV420;
  if (gst_wl_dmabuf_format_to_video_format (format) != GST_VIDEO_FORMAT_UNKNOWN)
    g_array_append_val (self->dmabuf_formats, format);

Comment 88 Nicolas Dufresne (ndufresne) 2016-09-22 13:05:05 UTC

(In reply to Fabien Dessenne from comment #87)
> Nicolas, I have added/fix few things in v3. In top of that, I have two
> remarks on v3:
> - in show_frame, since you removed the "if (sink->pool)" check in
> show_frame, the patch is unconsistent. Indeed set_caps does not ALWAYS
> propose a pool (as described in the commit message: no buffer pool for
> dmabuf). If you really want to remove "if (sink->pool)", then you need some
> rework. I suggest you keep this check, and add a "fixme" or something like
> this

Sorry, I'm not sure I follow you, I believed I have checked the conditions for the pool checks properly. Can you clarify ?

> - in gst_wayland_sink_set_caps, you start with updating sink->video_info /
> sink->video_info_changed. Then you check for parameters support, which may
> result in "return false": in that case, you have updated video_info while
> you probably should not have. I suggest you keep the initial implementation.

Failing SET_CAPS is fatal, the state of your video-info is no longer important.

> 
> Another remark, we talked about, regarding additional check for
> dmabuf_format support: I think there is no need to update anything, and you
> probably got confused with your hack patch "HACK: Just hackup a format list
> to be able to test". Indeed, before adding a new format to dmabuf_formats
> you shall first check if that format can actually be converted to a known
> GST video format. Just like it is done in dmabuf_format() (wldisplay.c),
> otherwise you will get some assert error (which you talked me about)
> Example (this format will be filtered out);
>   format = DRM_FORMAT_YUV420;
>   if (gst_wl_dmabuf_format_to_video_format (format) !=
> GST_VIDEO_FORMAT_UNKNOWN)
>     g_array_append_val (self->dmabuf_formats, format);

You miss-understood. GStreamer do have a match for all main DRM_FORMAT_*, I'd like the static map to be complete. The assertion was cause indeed by the hack, but would occure if the remote compositor sent a FOURCC that is unsupported. That because gst_video_format_to_string() does not accept UNKNOWN.

Comment 89 Nicolas Dufresne (ndufresne) 2016-09-22 13:43:21 UTC

Review of attachment 336061 [details] [review]:

::: ext/wayland/wlvideoformat.c
@@ +110,3 @@
+  {DRM_FORMAT_RGB565, GST_VIDEO_FORMAT_RGB16},
+  {DRM_FORMAT_YUYV, GST_VIDEO_FORMAT_YUY2},
+  {DRM_FORMAT_NV12, GST_VIDEO_FORMAT_NV12},

Why not adding them all ? GStreamer supports more packed format, along with AYUV, it supports NV21, NV16, NV61. It also has support for 24bits RGB formats.

Comment 90 Nicolas Dufresne (ndufresne) 2016-09-23 14:39:15 UTC

Review of attachment 336061 [details] [review]:

Nothing major come out this time, this is nearly ready for 1.11.

::: ext/wayland/wldisplay.c
@@ +166,3 @@
+
+gboolean
+is_shm_format_supported (enum wl_shm_format format_shm, GstWlDisplay * display)

Namespace missing.

@@ +181,3 @@
+
+gboolean
+is_dmabuf_format_supported (guint format_dmabuf, GstWlDisplay * display)

Namespace missing.

@@ +334,3 @@
 
+  if (!self->dmabuf) {
+    g_warning ("Could not bind to zwp_linux_dmabuf_v1");

Should be a GST_INFO() instead, logged in the PERFORMANCE category.

Comment 91 Fabien Dessenne 2016-09-26 15:20:42 UTC

Created attachment 336270 [details] [review]
[v5] waylandsink: support linux dmabuf protocol

v5:
Complete, fix and factorize the video format conversion table:
- use a common table for shm and dmabuf
- add some pixel formats (NV61, XRGB1555 / XBGR1555)
- fix YUV444 gst video format conversion (is Y444, not v308)
- consider endianness for RGB24bits formats

Comment 92 Nicolas Dufresne (ndufresne) 2016-09-30 18:02:36 UTC

Review of attachment 336270 [details] [review]:

I'll rebase it, but I made a important rework of the gstwaylandsink part of this patch.

::: ext/wayland/wlvideoformat.c
@@ +67,3 @@
+  {WL_SHM_FORMAT_XBGR1555, DRM_FORMAT_XBGR1555, GST_VIDEO_FORMAT_BGR15},
+
+  {WL_SHM_FORMAT_YUYV, DRM_FORMAT_YUYV, GST_VIDEO_FORMAT_YUY2},

Need to double check, but I think packed video format are also affected by endianness in translation.

Comment 93 Nicolas Dufresne (ndufresne) 2016-11-02 15:47:28 UTC

Created attachment 338959 [details] [review]
waylandsink: Allow any kind of FD for shm memory

Comment 94 Nicolas Dufresne (ndufresne) 2016-11-02 15:47:34 UTC

Created attachment 338960 [details] [review]
waylandsink: support linux dmabuf protocol

Support the wayland zwp_linux_dmabuf_unstable_v1 protocol.
SHM formats and DMABuf formats are exposed differently in caps: the
DMABuf formats are flagged with GST_CAPS_FEATURE_MEMORY_DMABUF.
No buffer pool is proposed for DMABuf buffers, it is the upstream
element responsibility to provide with such buffers.

Comment 95 Nicolas Dufresne (ndufresne) 2016-11-02 15:47:45 UTC

Created attachment 338961 [details] [review]
waylandsink: Rework dmabuf support

Simplify and fix some of the show_frame logic.

Comment 96 Nicolas Dufresne (ndufresne) 2016-11-03 19:38:12 UTC

Attachment 338959 [details] pushed as 3272f20 - waylandsink: Allow any kind of FD for shm memory
Attachment 338960 [details] pushed as 2ad337e - waylandsink: support linux dmabuf protocol
Attachment 338961 [details] pushed as 5d01d3b - waylandsink: Rework dmabuf support

Comment 97 Nicolas Dufresne (ndufresne) 2016-11-03 19:39:57 UTC

Please, create new bugs if you find bugs.