Bug 779146 – dmabuf: be able to negotiate tiled surfaces

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 779146 - dmabuf: be able to negotiate tiled surfaces


Summary:	dmabuf: be able to negotiate tiled surfaces


Status:	RESOLVED OBSOLETE

Product:	GStreamer
Classification:	Platform
Component:	gstreamer-vaapi
Version:	git master
Hardware:	Other Linux

Importance:	Normal enhancement
Target Milestone:	git master
Assigned To:	GStreamer Maintainers
QA Contact:	GStreamer Maintainers

URL:
Whiteboard:

Depends on:
Blocks:	797071

Reported:	2017-02-23 17:53 UTC by Scott D Phillips
Modified:	2018-11-03 15:49 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
[PATCH gst-plugins-bad 1/2] allocators: Add GstDmabufDrmModifierMeta (9.12 KB, patch) 2017-03-03 21:18 UTC, Scott D Phillips	none	Details \| Review
[PATCH gst-plugins-bad 2/2] allocators: add meson.build support (1.70 KB, patch) 2017-03-03 21:19 UTC, Scott D Phillips	none	Details \| Review
[PATCH gst-plugins-bad v2 1/2] allocators: Add GstDmabufMeta (8.54 KB, patch) 2017-03-14 19:59 UTC, Scott D Phillips	needs-work	Details \| Review
[PATCH gst-plugins-bad v2 2/2] allocators: add meson.build support (1.73 KB, patch) 2017-03-14 19:59 UTC, Scott D Phillips	reviewed	Details \| Review

Description Scott D Phillips 2017-02-23 17:53:31 UTC

The memory format natively supported by intel-vaapi-driver is Y-tiled NV12 for YUV 420 surfaces. Currently the gstreamer-vaapi dmabuf allocator only allocates non-tiled surfaces which the intel-vaapi-driver needs to then tile before encoding/untile after decoding.

Having some way of negotiating support for tiling would save this unnecessary tiling or untiling step. I would appreciate input on the best way to implement this. Should it be a new video format, like NV12_128Y32? A video format is a little funny to me because this is for dmabuf negotiation only and the tiled plane will never be seen in virtual memory. Should there be some gem-specific dmabuf allocation query parameters?

Comment 1 Nicolas Dufresne (ndufresne) 2017-02-23 18:12:44 UTC

If the dmabuf are not mapable to CPU, I don't think it's worth adding a new format. When I added NV12_64Z32, the dmabufs were all mappable (CMA). Exposing the format was a great way to help with testing, as I could then use videoconvert and videotestsrc to produce data that I could just copy into those DMA buffers. On Exynos 4412, this format came back in many IP blocks, so it was very useful to test each blocks seperatly.

So question is, is this format mapable ? Would having videotestsrc or videoconvert support for this format useful ?

I was thinking that we should expose something like:

  video/x-raw-opaque(memory:DMABuf),format=XYZ

In some cases.

Comment 2 Scott D Phillips 2017-02-23 19:06:56 UTC

(In reply to Nicolas Dufresne (stormer) from comment #1)
> If the dmabuf are not mapable to CPU, I don't think it's worth adding a new
> format. When I added NV12_64Z32, the dmabufs were all mappable (CMA).
> Exposing the format was a great way to help with testing, as I could then
> use videoconvert and videotestsrc to produce data that I could just copy
> into those DMA buffers. On Exynos 4412, this format came back in many IP
> blocks, so it was very useful to test each blocks seperatly.
> 
> So question is, is this format mapable ? Would having videotestsrc or
> videoconvert support for this format useful ?

The format is mappable, but the memory controller makes the memory look linear in virtual memory when you do the VA-API map call. To actually view the memory as tiled you would need to do some side-run around vaapi's map api.

Comment 3 Julien Isorce 2017-02-23 22:18:57 UTC

Hi Scott, do you mean that here https://cgit.freedesktop.org/gstreamer/gstreamer-vaapi/tree/gst/vaapi/gstvaapipluginbase.c#n512 you would like to remove the LINEAR flag  in some cases ? Like the idea in line 552 (note that it is not really implemented yet).
Except that in your case it will be even more tricky, i.e. you would like to be tiled even when not using the same devices for import / export ? (ex: v4l2src ! vaapienc ?)

The general idea line 552 is to say that we want linear when using same devices for import / export or export / import. And tiled if same devices. We could change "same devices" to "compatible devices". And we could also do that with list/map of vendor ids / device ids. Do you think that would work ? And/or do you have another idea how to detect that ?

Comment 4 Scott D Phillips 2017-02-23 23:11:53 UTC

(In reply to Julien Isorce from comment #3)
> Hi Scott, do you mean that here
> https://cgit.freedesktop.org/gstreamer/gstreamer-vaapi/tree/gst/vaapi/
> gstvaapipluginbase.c#n512 you would like to remove the LINEAR flag  in some
> cases ?

Yes, exactly that.

> Like the idea in line 552 (note that it is not really implemented
> yet).
> Except that in your case it will be even more tricky, i.e. you would like to
> be tiled even when not using the same devices for import / export ? (ex:
> v4l2src ! vaapienc ?)

Right, the exact case I have in mind is (icamerasrc ! vaapih264enc) https://github.com/01org/icamerasrc . The icamerasrc uses a separate image processing unit that is capable of supporting the same Y-tiled format. Right now that source element has tiling hardcoded off, but ideally it would enable tiling when sharing dmabuf memory with vaapih264enc and maybe even mesa(?).

> The general idea line 552 is to say that we want linear when using same
> devices for import / export or export / import. And tiled if same devices.
> We could change "same devices" to "compatible devices". And we could also do
> that with list/map of vendor ids / device ids. Do you think that would work
> ? And/or do you have another idea how to detect that ?

Right, I'm not really sure what the best way to do this is. I was thinking something along the lines of custom flags in an allocation query but I don't know what the best way would be end-to-end.

Comment 5 Nicolas Dufresne (ndufresne) 2017-02-24 01:24:39 UTC

OK, so you basically want to negotiate a second format, which is the hardware format. I suppose this format can be described as a DRM 64bit format (with modifiers). If you want to use the allocation query, you'll probably need to set the supported backend formats, when known (linear is assumed if unknown). This way, the information can be read by exporters downstream. Then downstream will intersect with it's on set of backend formats (this will mostly be 1 format), and return back the information.

This way, regardless who's the exporter, both side can make the right decision. This is all doable with allocation_meta(). Which is just a GType + a structure. We could stage this work in plugin-bad allocator library to start with, and when happy, we could eventually move that to the -base allocator library. What we need there is to register the GType and add documentation about the GstStructure layout. Ideally, the formats, which will by definition not be GstVideoFormat, shall have a unique definition. To recap, let's run through two use cases:

v4l2src ! vaapienc

v4l2src will run an allocation query, the GsDmabufBackedFormatMeta will be missing, because V4L2 does not expose this information. VAAPI will receive the empty allocation query in propose_allocation(), and decide to go linear.

icamerasrc ! vaapienc

icamerasrc will set GsDmabufBackedFormatMeta,foramt=INTEL_Y_TILED,LIENEAR. VAAPI will read find this meta, as it supports this tiling, it will leave the meta as-is, and return the filled allocation query. At this point, both elements knows they can operate with tiles.

randomcamsrc ! vaapienc

randomcamsrc will set GsDmabufBackedFormatMeta,foramt=REANDOM_FORMAT,LINEAR. VAAPI will read the meta, and fine that the intersection is LINEAR. VAAPI will update the allocation query, and return.

randomcamsrc2 ! vaapienc

randomcamsrc2 will set GsDmabufBackedFormatMeta,RANDOM_FORMAT,RANDOM_FORMAT_2. VAAPI won't recognize any format, and will remove all of them. Both side will know that it's not worth bothering doing DMABuf.

This might be over-simplistic, and extremely Intel specific, but could serve as a base. It could also be GstDmabufConstraintMeta. And formats could just be one of the parameter (e.g. allocation type like CMA, SG/page-aligned, vmalloc, memory alignement, etc). We are a bit ahead of what the kernel currently expose to usespace though.

Comment 6 Nicolas Dufresne (ndufresne) 2017-02-24 01:25:37 UTC

(We should put extra care, so gst_structure_intersect() can be used, otherwise it's just a pain, just thinking about the rest of the alocation query items)

Comment 7 Julien Isorce 2017-02-24 10:08:29 UTC

I think that Nicolas's suggestion is a very good idea. Let's try to apply it on the following use cases:

vaapidec ! glimagesink doing HW decoding on integrated intel gpu and GL rendering on the integrated gpu too.

vaapidec will set GsDmabufBackedFormatMeta,format=INTEL_Y_TILED,LINEAR. GstGL will read the meta, and fine that the intersection is INTEL_Y_TILED. GstGL will update the allocation query, and return. So that vaapidec will clear the linear flag for the vaapi driver which will work in tiled mode.

vaapidec ! glimagesink doing HW decoding on integrated intel gpu and GL rendering on discrete AMD or NVIDIA gpu.

vaapidec will set GsDmabufBackedFormatMeta,format=INTEL_Y_TILED,LINEAR. GstGL will read the meta, and fine that the intersection is LINEAR. GstGL will update the allocation query, and return. So that vaapidec will set the linear flag for the vaapi driver.

vaapidec ! glimagesink doing HW decoding on discrete NVIDIA/nouveau gpu and GL rendering on the integrated intel gpu.

vaapidec will set GsDmabufBackedFormatMeta,format=NVIDIA_Y_TILED,LINEAR. GstGL will read the meta, and fine that the intersection is LINEAR. GstGL will update the allocation query, and return. So that vaapidec will set the linear flag for the vaapi driver.

And so on. But then I think the format modifiers should also appear in -base:: GstDmaBufAllocator. As there is already gst_dmabuf_memory_get_fd, we could add gst_dmabuf_memory_get_modifier. So that then in GstGL we could pass the modifiers LO/HI to eglCreateImage in gst_egl_image_from_dmabuf. Also see https://www.khronos.org/registry/EGL/extensions/EXT/EGL_EXT_image_dma_buf_import_modifiers.txt .

Extract from drmfourcc.h:

/* Vendor Ids: */
#define DRM_FORMAT_MOD_NONE 0
#define DRM_FORMAT_MOD_VENDOR_INTEL 0x01
#define DRM_FORMAT_MOD_VENDOR_AMD 0x02
#define DRM_FORMAT_MOD_VENDOR_NV 0x03
#define DRM_FORMAT_MOD_VENDOR_SAMSUNG 0x04
#define DRM_FORMAT_MOD_VENDOR_QCOM 0x05
/* add more to the end as needed */

#define fourcc_mod_code(vendor, val) \
((((__u64)DRM_FORMAT_MOD_VENDOR_## vendor) << 56) | (val & 0x00ffffffffffffffULL))

#define I915_FORMAT_MOD_X_TILED fourcc_mod_code(INTEL, 1)
#define I915_FORMAT_MOD_Y_TILED fourcc_mod_code(INTEL, 2)
#define I915_FORMAT_MOD_Yf_TILED fourcc_mod_code(INTEL, 3)
#define DRM_FORMAT_MOD_SAMSUNG_64_32_TILE fourcc_mod_code(SAMSUNG, 1)

I can see there are 3 different way to do tiling for intel: X, Y, Yf. But the vaapi spec (va.h from libva) only defines one flag: VA_SURFACE_EXTBUF_DESC_ENABLE_TILING. But I am guessing this is not really a problem because at importation time it will internally call "libdrm::dri_bo_get_tiling" to retrieve exact internal tiling and swizzle.

Also maybe the name should be GsDmabufModifierMeta instead of GsDmabufBackedFormatMeta ?
Or could we just add a new field "guint64 tiling/format_modifier" to the existing -base::GstVideoMeta ? + eventually a new GstVideoFrameFlags GST_VIDEO_FRAME_FLAG_LINEAR which would indicates that when mapping the dmabuf it will be accessible as linear even if the modifier indicates tiled.

Comment 8 Nicolas Dufresne (ndufresne) 2017-02-24 19:40:27 UTC

I think you have narrowed something here. Basically, there is DRM modifiers that may not be reflected by GstVideoFormat (because there is memory hardware to hide it, or simply that the GstVideoFormat does not matter, the memory isn't mappable). I like GsDmabufModifierMeta, but I wonder if we shouldn't call those GsDmabufDrmModifierMeta for extra clarity. I would not merge this into GstVideo, because it's really an extension, specific to DMABuf and DRM BOs handles. Obviously, the DMABuf handles are the one to be shared between components.

Small step back, I'd probably define an allocation meta as GstDmabufMeta, and make the drm-modifiers one of the field. This way we can extend later, e.g. with the kernel-allocator, or other things like that (if that ever get exposed obviously).

Comment 9 Daniel Stone 2017-02-24 20:06:18 UTC

(In reply to Julien Isorce from comment #7)
> I can see there are 3 different way to do tiling for intel: X, Y, Yf. But
> the vaapi spec (va.h from libva) only defines one flag:
> VA_SURFACE_EXTBUF_DESC_ENABLE_TILING. But I am guessing this is not really a
> problem because at importation time it will internally call
> "libdrm::dri_bo_get_tiling" to retrieve exact internal tiling and swizzle.

The get_tiling() call is very much unsupported, and doesn't even work for some of the newer tiling formats. The replacement is to pass modifiers all along the way - which I suppose would make the libva API insufficient.

> Also maybe the name should be GsDmabufModifierMeta instead of
> GsDmabufBackedFormatMeta ?
> Or could we just add a new field "guint64 tiling/format_modifier" to the
> existing -base::GstVideoMeta ? + eventually a new GstVideoFrameFlags
> GST_VIDEO_FRAME_FLAG_LINEAR which would indicates that when mapping the
> dmabuf it will be accessible as linear even if the modifier indicates tiled.

Please just format_modifier, and it's not just for tiling. Support is coming up for render-compression modes from various hardware vendors, which is either a single buffer in a proprietary compression format, or a normal flat buffer with external compression metadata (thus occupying more space in memory, but needing to read less of it) in another plane.

Comment 10 Scott D Phillips 2017-03-03 21:18:38 UTC

Created attachment 347168 [details] [review]
[PATCH gst-plugins-bad 1/2] allocators: Add GstDmabufDrmModifierMeta

This meta carries the drm modifier value for the plane contained
in attached dmabuf memory. The modifier expresses platform
specific swizzling/tiling/compression/etc and should only be
attached to buffer if the src and sink have both indicated support
for it during the allocation query.

Comment 11 Scott D Phillips 2017-03-03 21:19:00 UTC

Created attachment 347169 [details] [review]
[PATCH gst-plugins-bad 2/2] allocators: add meson.build support

Comment 12 Scott D Phillips 2017-03-03 21:22:05 UTC

Here is a patch to add GstDmabufDrmModifierMeta as discussed. I'll work on adding usage of it into vaapi. Is there a more formal way to document the structure in the allocation query or should there be an intersect function or something so that the structure representation is internal?

Comment 13 Julien Isorce 2017-03-08 00:46:53 UTC

Review of attachment 347168 [details] [review]:

Overall looks good thx! Just a few minor remarks. Also the name should be reviewed by Nicolas too since I am confused with comment #8, see my question in the review.

::: gst-libs/gst/allocators/gstdmabufdrmmodifiermeta.c
@@ +88,3 @@
+
+static void
+gst_dmabuf_drm_modifier_meta_free (GstMeta * sync_meta, GstBuffer * buffer)

What is the "sync_" ?

@@ +89,3 @@
+static void
+gst_dmabuf_drm_modifier_meta_free (GstMeta * sync_meta, GstBuffer * buffer)
+{

Should it set the modifier to 0 /* DRM_FORMAT_MOD_NONE */ for cosmetic reasons ?

@@ +147,3 @@
+        g_strdup_printf
+        ("GstDmabufDrmModifierMeta, dmabuf.drm_modifier=(guint64){ %" PRIu64
+        " };", modifier);

G_GUINT64_FORMAT instead of PRIu64 ?

::: gst-libs/gst/allocators/gstdmabufdrmmodifiermeta.h
@@ +41,3 @@
+  GstMeta parent;
+
+  guint64 modifier;

I think comment #8 says GstDmabufMeta with field drm_modifier but I may have missed something.

Comment 14 Scott D Phillips 2017-03-13 23:20:18 UTC

Ah ok, I think I misunderstood. So this new meta will be GstDmabufMeta and will be used to negotiate the use of dmabuf at all. Also, as a feature on the side, will be the drm modifier list which may be present and may be used to negotiate something other than DRM_FORMAT_MOD_NONE. Does that sound right?

Comment 15 Nicolas Dufresne (ndufresne) 2017-03-14 00:36:20 UTC

More of less, GstDmabufMeta, will first only implement the modifiers. Modifiers are still negotiated through the allocation query, and then set in this meta. Not all modifiers will go through this though, e.g. the Samsung Tile won't, as it already has GST_VIDEO_FORMAT_NV12_64Z32 that matches, and the memory control won't "fix it" if you map it. So the format from userspace point of view is the same as the hardware one.

Why using a generic meta name, well, there is other type of information that would later be attached. One potential subject being discussed, is the Y-FLIP flags (specific to some DMABuf create from GL using specific stack, not sure anything else then Renesas behaves this way ...)

Comment 16 Daniel Stone 2017-03-14 13:36:55 UTC

(In reply to Nicolas Dufresne (stormer) from comment #15)
> More of less, GstDmabufMeta, will first only implement the modifiers.
> Modifiers are still negotiated through the allocation query, and then set in
> this meta. Not all modifiers will go through this though, e.g. the Samsung
> Tile won't, as it already has GST_VIDEO_FORMAT_NV12_64Z32 that matches, and
> the memory control won't "fix it" if you map it. So the format from
> userspace point of view is the same as the hardware one.

Again, this isn't actually true.

For Intel, X-tiled and Y-tiled surfaces _can_ be linearly mapped through the GTT, but do not have to be; there is a resource/performance tradeoff to doing so. From my understanding, Yf- and Ys-tiled _cannot_ be linearly mapped. CCS (render compression) definitely cannot be linearly mapped, at least for write access. AFBC (ARM framebuffer compression) cannot be mapped to userspace in a linear uncompressed view either, at least on hardware I know of. And that's just scraping the surface of available modifiers.

The very specific case of Intel's current VA-API driver with two particular tiling formats does expose a linear view, but this is not and will not be true of even the majority of cases, let alone all.

If you depend on the hardware exposing a view which matches only the format (rather than format+modifier), this API is broken and needs to be fixed before it can be used for anything real.

Comment 17 Nicolas Dufresne (ndufresne) 2017-03-14 15:25:54 UTC

Not sure why you state the Exynos case for stating that this isn't true. The integration choices remains the same, regardless of what your hardware is.

I'm giving the option to:
1. Add a new GST format, like NV12_64Z32
2. Expose the format seen on map() and add the modifiers needed for the rest
3. Disable map() using Dmabuf caps feature, and expose the format + modifiers

Case 1 will enable full GStreamer support, you'll be able to software color convert, possibly even use overlays and stuff. With case 2, you also get full support, with some extra cost when doing map(), and you need to ensure downstream can support that modifier, as the caps won't cover it.

The case 3, well, it's a bit unfortunate, all you'll be able to do is pass it up to another driver. But that driver could be GL with the new extension, so it remains quite flexible.

So what matters here, if you think this is broken design, is to describe what isn't covered. It does not matter to me if the implementation in VA is wrong or not, we'll catch this at review or testing. This of course is just the theoretical basis, a lot of this is currently impossible / not-exposed by the kernel interface (e.g. DRM). In VAAPI, there is hardware specific user-space driver were you can hardcode this knowledge.

Comment 18 Scott D Phillips 2017-03-14 17:01:30 UTC

(In reply to Daniel Stone from comment #16)
> If you depend on the hardware exposing a view which matches only the format
> (rather than format+modifier), this API is broken and needs to be fixed
> before it can be used for anything real.

The fact that a surface can be mapped linear isn't really important here, and the proposed GstDmabufMeta isn't depending on that. It is just a mechanism to negotiate the mutual support of some vendor-specific modified format between elements.

(In reply to Nicolas Dufresne (stormer) from comment #17)
> I'm giving the option to:
> 1. Add a new GST format, like NV12_64Z32

In the case of NV12 + I915_TILING_Y, I don't think adding it as a GstVideoFormat would be practical (ignoring the fact that it can be mapped linearly). The specifics of that tiling format vary by processor generation and even hardware configuration (some address bits get swizzled when multiple memory channels are present to balance accesses).

Comment 19 Nicolas Dufresne (ndufresne) 2017-03-14 17:47:26 UTC

(In reply to Scott D Phillips from comment #18)
> The fact that a surface can be mapped linear isn't really important here,
> and the proposed GstDmabufMeta isn't depending on that. It is just a
> mechanism to negotiate the mutual support of some vendor-specific modified
> format between elements.

Well, you need to negotiate the caps before the allocation (hence the modifiers). If you don't have DMABuf caps feature in your caps, you cannot do 3. Any software in-place filters will prevent doing DMABuf caps feature here. Now, if producing linear dmabuf cost the same as using a special map, there is really no point looking at that, we don't need two equivalent fallback here.

> 
> (In reply to Nicolas Dufresne (stormer) from comment #17)
> > I'm giving the option to:
> > 1. Add a new GST format, like NV12_64Z32
> 
> In the case of NV12 + I915_TILING_Y, I don't think adding it as a
> GstVideoFormat would be practical (ignoring the fact that it can be mapped
> linearly). The specifics of that tiling format vary by processor generation
> and even hardware configuration (some address bits get swizzled when
> multiple memory channels are present to balance accesses).

Good to know, though the options are not mutually exclusive. I picked 1. for the Exynos, but it does not apply to Intel titling. The current approach used in VAAPI is 1, but with existing color formats (linear formats).

I think the simplest for now it to ignore 2. and implement 3. So basically you keep it linear like it is now by default, and try to negotiate the tiling if downstream has the DMABuf caps feature in it's caps. The negotiation of the modifiers remains unaffected, but you need to negotiate the caps first to get there anyway.

Comment 20 Nicolas Dufresne (ndufresne) 2017-03-14 17:55:55 UTC

In real life, a pipeline like:

... ! vaapidecode ! textoverlay ! vaapiencode ! ..

Will use linear memory backend. So still not copying, but will pay the overhead. While if you do:

... ! vaapidecode ! vaapiencode ! ...

You'll negotiate DMABuf caps feature + the modifiers supported. Modififers negotiation could eventually be added to glupload/download, waylandsink and kmssink, but that seems far in the future though.

Comment 21 Scott D Phillips 2017-03-14 18:38:30 UTC

(In reply to Nicolas Dufresne (stormer) from comment #20)
> In real life, a pipeline like:
> 
> ... ! vaapidecode ! textoverlay ! vaapiencode ! ..
> 
> Will use linear memory backend. So still not copying, but will pay the
> overhead. While if you do:

Well, with gstreamer-vaapi + intel-vaapi-driver as it exists today, the only situation where linear surfaces are actually created is with dmabuf. In all the non-dmabuf cases gstreamer-vaapi doesn't specify tiled or linear in allocations and the intel-vaapi-driver chooses tiled. The reason is that the intel encode and decode hardware units are able to operate on tiled surfaces only. When a linear surface is given somehow then it will either by tiled by the cpu (vaDeriveImage + memcpy) or the gpu (vaPutImage), but someone needs to pay the tiling cost before encode can happen.

Comment 22 Scott D Phillips 2017-03-14 19:59:03 UTC

Created attachment 347945 [details] [review]
[PATCH gst-plugins-bad v2 1/2] allocators: Add GstDmabufMeta

Comment 23 Scott D Phillips 2017-03-14 19:59:36 UTC

Created attachment 347946 [details] [review]
[PATCH gst-plugins-bad v2 2/2] allocators: add meson.build support

Comment 24 Julien Isorce 2017-03-14 22:18:10 UTC

Thx everyone's inputs and thx Scott for the patches and its update. I think it is safe to push it since it is quite generic and will go in -bad for now. 
If it is more convenient maybe keep this same bug for the gstreamer-vaapi usage of this meta.

Comment 25 Nicolas Dufresne (ndufresne) 2017-03-15 00:29:32 UTC

Comment on attachment 347945 [details] [review]
[PATCH gst-plugins-bad v2 1/2] allocators: Add GstDmabufMeta

Not yet, we need something using it first. Never commit to a new API without something using it first please.

Comment 26 Nicolas Dufresne (ndufresne) 2017-03-15 00:40:05 UTC

Review of attachment 347945 [details] [review]:

::: gst-libs/gst/allocators/gstdmabufmeta.c
@@ +25,3 @@
+ *
+ * #GstDmabufMeta carries metadata that goes along with
+ * dmabuf memory in the buffer, like drm modifier.

Document the negotiation process.

@@ +133,3 @@
+
+void
+gst_query_add_allocation_dmabuf_meta (GstQuery * query, guint64 drm_modifier)

This is confusing name, does it add the allocation meta, or it add a modifier. To be honnest, I have never see a meta working like this. Why not simply pass an array of guint64 here ? Also, undocumented.

::: gst-libs/gst/allocators/gstdmabufmeta.h
@@ +41,3 @@
+  GstMeta parent;
+
+  guint64 drm_modifier;

Modifiers are applied per plane in other API. This must be incorrect.

@@ +44,3 @@
+};
+
+GST_EXPORT

We don't currently use these, please remove until further notice.

Comment 27 Nicolas Dufresne (ndufresne) 2017-03-15 00:45:38 UTC

For the reference on how modifiers actually works, looks at /usr/include/drm/drm_mode.h struct drm_mode_fb_cmd2.

Comment 28 Scott D Phillips 2017-03-15 17:11:52 UTC

(In reply to Nicolas Dufresne (stormer) from comment #26)
> Review of attachment 347945 [details] [review] [review]:
> 
> ::: gst-libs/gst/allocators/gstdmabufmeta.c
> @@ +25,3 @@
> + *
> + * #GstDmabufMeta carries metadata that goes along with
> + * dmabuf memory in the buffer, like drm modifier.
> 
> Document the negotiation process.
> 
> @@ +133,3 @@
> +
> +void
> +gst_query_add_allocation_dmabuf_meta (GstQuery * query, guint64
> drm_modifier)
> 
> This is confusing name, does it add the allocation meta, or it add a
> modifier. To be honnest, I have never see a meta working like this. Why not
> simply pass an array of guint64 here ? Also, undocumented.

Sure, I can change it to an array. I suppose it will be a length and then an array pointer, because there is no possible sentinel value for the end of an array of modifiers (as zero is valid and not necessarily the last choice in everyone's preference list). But yes, I can see that being more ergonomic than what is there currently.

> ::: gst-libs/gst/allocators/gstdmabufmeta.h
> @@ +41,3 @@
> +  GstMeta parent;
> +
> +  guint64 drm_modifier;
> 
> Modifiers are applied per plane in other API. This must be incorrect.

Which API? There is just a single modifier with ADDFB2. From drm_mode.h:

> Note that even though it looks like we have a modifier
> per-plane, we in fact do not. The modifier for each plane must
> be identical. Thus all combinations of different data layouts
> for multi plane formats must be enumerated as separate
> modifiers.

> @@ +44,3 @@
> +};
> +
> +GST_EXPORT
> 
> We don't currently use these, please remove until further notice.

Comment 29 Nicolas Dufresne (ndufresne) 2017-03-15 17:46:37 UTC

(In reply to Scott D Phillips from comment #28)
> (In reply to Nicolas Dufresne (stormer) from comment #26)
> > Review of attachment 347945 [details] [review] [review] [review]:
> > 
> > ::: gst-libs/gst/allocators/gstdmabufmeta.c
> > @@ +25,3 @@
> > + *
> > + * #GstDmabufMeta carries metadata that goes along with
> > + * dmabuf memory in the buffer, like drm modifier.
> > 
> > Document the negotiation process.
> > 
> > @@ +133,3 @@
> > +
> > +void
> > +gst_query_add_allocation_dmabuf_meta (GstQuery * query, guint64
> > drm_modifier)
> > 
> > This is confusing name, does it add the allocation meta, or it add a
> > modifier. To be honnest, I have never see a meta working like this. Why not
> > simply pass an array of guint64 here ? Also, undocumented.
> 
> Sure, I can change it to an array. I suppose it will be a length and then an
> array pointer, because there is no possible sentinel value for the end of an
> array of modifiers (as zero is valid and not necessarily the last choice in
> everyone's preference list). But yes, I can see that being more ergonomic
> than what is there currently.

As I already mention, use GstStructure types, so you can then gst_structure_intersect(). We have everything needed.

> 
> > ::: gst-libs/gst/allocators/gstdmabufmeta.h
> > @@ +41,3 @@
> > +  GstMeta parent;
> > +
> > +  guint64 drm_modifier;
> > 
> > Modifiers are applied per plane in other API. This must be incorrect.
> 
> Which API? There is just a single modifier with ADDFB2. From drm_mode.h:

Modifiers are per-plane, as per the Kernel API. This also need to be reflected in the negotiations stuff (you'll notice the value 4, use GST_VIDEO_MAX_PLANES in gst though).

http://lxr.free-electrons.com/source/include/uapi/drm/drm_mode.h#L426

Comment 30 Scott D Phillips 2017-03-15 17:49:54 UTC

(In reply to Nicolas Dufresne (stormer) from comment #29)
> > > ::: gst-libs/gst/allocators/gstdmabufmeta.h
> > > @@ +41,3 @@
> > > +  GstMeta parent;
> > > +
> > > +  guint64 drm_modifier;
> > > 
> > > Modifiers are applied per plane in other API. This must be incorrect.
> > 
> > Which API? There is just a single modifier with ADDFB2. From drm_mode.h:
> 
> Modifiers are per-plane, as per the Kernel API. This also need to be
> reflected in the negotiations stuff (you'll notice the value 4, use
> GST_VIDEO_MAX_PLANES in gst though).
> 
> http://lxr.free-electrons.com/source/include/uapi/drm/drm_mode.h#L426

There is a single modifier for the ADDFB2 command, even though there are 4 modifier fields:

http://lxr.free-electrons.com/source/include/uapi/drm/drm_mode.h#L416

Comment 31 Nicolas Dufresne (ndufresne) 2017-03-15 17:53:25 UTC

(In reply to Scott D Phillips from comment #28)
> (In reply to Nicolas Dufresne (stormer) from comment #26)
> > Review of attachment 347945 [details] [review] [review] [review]:
> > 
> > ::: gst-libs/gst/allocators/gstdmabufmeta.c
> > @@ +25,3 @@
> > + *
> > + * #GstDmabufMeta carries metadata that goes along with
> > + * dmabuf memory in the buffer, like drm modifier.
> > 
> > Document the negotiation process.
> > 
> > @@ +133,3 @@
> > +
> > +void
> > +gst_query_add_allocation_dmabuf_meta (GstQuery * query, guint64
> > drm_modifier)
> > 
> > This is confusing name, does it add the allocation meta, or it add a
> > modifier. To be honnest, I have never see a meta working like this. Why not
> > simply pass an array of guint64 here ? Also, undocumented.
> 
> Sure, I can change it to an array. I suppose it will be a length and then an
> array pointer, because there is no possible sentinel value for the end of an
> array of modifiers (as zero is valid and not necessarily the last choice in
> everyone's preference list). But yes, I can see that being more ergonomic
> than what is there currently.

As I already mention, use GstStructure types, so you can then gst_structure_intersect().

> 
> > ::: gst-libs/gst/allocators/gstdmabufmeta.h
> > @@ +41,3 @@
> > +  GstMeta parent;
> > +
> > +  guint64 drm_modifier;
> > 
> > Modifiers are applied per plane in other API. This must be incorrect.
> 
> Which API? There is just a single modifier with ADDFB2. From drm_mode.h:

Modifiers are per-plane, as per the Kernel API. This also need to be reflected in the negotiations stuff (you'll notice the value 4, but in gst use GST_VIDEO_MAX_PLANES).

http://lxr.free-electrons.com/source/include/uapi/drm/drm_mode.h#L426

Comment 32 Daniel Stone 2017-03-15 17:55:32 UTC

Check the bottom of the comment linked there: though there are per-plane modifier fields, the kernel _requires_ them to be identical for all planes (cf. the addfb2 ioctl handler).

Comment 33 Nicolas Dufresne (ndufresne) 2017-03-15 17:57:57 UTC

Ok, mid-air collision. I just saw the note, ok then, single modifier. I'll let you fix the rest. You should add a unit test to verify intersection works, we don't don't need to manually intersect the results.

Comment 34 Julien Isorce 2017-05-21 16:44:49 UTC

Hi Scott, any update on this ? Also do you have a patch to actually use this meta in gstreamer-vaapi ? and GstGL ?

Comment 35 Scott D Phillips 2017-05-22 16:48:23 UTC

(In reply to Julien Isorce from comment #34)
> Hi Scott, any update on this ? Also do you have a patch to actually use this
> meta in gstreamer-vaapi ? and GstGL ?

Per daniels's comment about libva as-is being insufficient, I've made some patches to libva and intel-vaapi-driver:

https://github.com/scott-d-phillips/libva/commit/52e006fe0126d8535d88d045872307d8e332386f#diff-ce40de7930eb5bb0c9f5801e5101ce29

https://github.com/scott-d-phillips/intel-vaapi-driver/commit/5aebbb3043cda3626f62698f03d112a987eccc64

I haven't yet proposed these on the libva mailing list, I'm waiting for the patches which add support for the modifiers to land in i915 and mesa. My plan is to pick this back up when those patches are merged so we can have an end-to-end working usecase and we know the modifier negotiation is good.

Comment 36 Daniel Stone 2017-05-22 18:12:57 UTC

Wow cool, thanks for doing that! The only comment I'd have is that it'd be nice to have a creation API similar to gbm_surface_create_with_modifiers() or VkExportImageDmaBufInfoMESAX (passed at creation time, rather than afterwards). Both of those specify a list of acceptable modifiers the implementation can choose from, which is nice for interop. But aside from that, looks great!

Comment 37 Scott D Phillips 2017-05-22 20:40:49 UTC

Thanks for taking a look Daniel,

(In reply to Daniel Stone from comment #36)
> Wow cool, thanks for doing that! The only comment I'd have is that it'd be
> nice to have a creation API similar to gbm_surface_create_with_modifiers()
> or VkExportImageDmaBufInfoMESAX (passed at creation time, rather than
> afterwards). Both of those specify a list of acceptable modifiers the
> implementation can choose from, which is nice for interop. But aside from
> that, looks great!

that sounds good, I'll add a way to do that. And I guess libva will also need something akin to eglQueryDmaBufModifiersEXT so we can do the negotiation flow Nicolas suggests.

Comment 38 Víctor Manuel Jáquez Leal 2017-05-26 13:25:59 UTC

(In reply to Scott D Phillips from comment #35)

> I haven't yet proposed these on the libva mailing list, I'm waiting for the
> patches which add support for the modifiers to land in i915 and mesa. My
> plan is to pick this back up when those patches are merged so we can have an
> end-to-end working usecase and we know the modifier negotiation is good.

Cool! I'm waiting for them!! :)

Comment 39 kevin 2017-07-03 06:35:34 UTC

Do you have any update? We also need this for i.MX8 tiled and compressed video frame buffer negotiation.

Comment 40 haihao.xiang 2017-09-19 03:11:02 UTC

Mark provides a new interface in libva to offer surface metadata e.g. tiling to the 3rd party library, could you have a look?

https://github.com/01org/libva/pull/85

Comment 41 Olivier Crête 2018-01-18 18:29:05 UTC

For all formats that are not linear and not the Samsung 64Z32 one, content of the memory is pretty much opaque. And using capsfeatures & a custom meta makes negotiation debugging a pain. I think it would be easier to just add a new caps like "video/x-drm, format=(string)fourcc-from-drm_fourcc.h, modifier=(uint64), width=, height=, framerate=".. Using video/x-raw means that we would want to implement support for them in GstVideoInfo, but that's pretty much pointless if the content is otherwise opaque.

Comment 42 Nicolas Dufresne (ndufresne) 2018-01-18 20:00:49 UTC

That's also an option, but it's also very under specified. One need to cary the plane layout information, this is still a bit fuzzy, since it many cases the driver speaks in strides and offsets, but strides is under specified for tiled or compressed formats. This information is needed in order to get two drivers to actually agree on what the buffer contains.

Comment 43 Daniel Stone 2018-01-18 21:44:28 UTC

Stride isn't underspecified, it's an integral part of the format modifier definition. You can have tiled surfaces with wider allocations than the display width would suggest, just as with linear.

It's true that the meaning of stride is a lot more complex than linear, and this often causes discussion whilst the modifier token is being defined. But once it's locked down, it's very well defined, and essential.

Comment 44 Nicolas Dufresne (ndufresne) 2018-01-18 22:01:27 UTC

Would be nice to point to documentation that explains what the strides means, the state of stride is terrible right now in DRM, here's what we implement in Gst:

  - If dumb buffer, strides means bit-per-pixel * width / 8
  - Otherwise it's the stride of the first plane
  - If it's tiled, it will follow alignment and rules, and represent the stride if the the image was made linear

I don't want to carry this horror in Gst. I'm sure random driver create random rules they prefer to stick with in a creative way.

The problem is that it's not clear that for N plane we have N strides, the content of the stride is not our problem of course, if drivers don't agree (and that will happen all the time of course) it's driver bug, as long as we can pass this information, but for this we need to store it some not so nasty structure.

Comment 45 Daniel Stone 2018-01-19 08:19:22 UTC

What about 'the state of stride' is terrible? For linear buffers, stride is the number of bytes between (x,y) and (x,y+1).

As devices often have hard requirements on stride alignment, when you use an allocator you often do not get to specify stride yourself. For dumb buffers, there is a field in the ioctl which is filled out (along with the buffer handle) with the stride value you must use for a given width.

As you've noted, this applies separately to each plane of a multi-plane image. Stride being a buffer-storage property, this makes sense. Consider the definition of the I915_FORMAT_MOD_Y_CCS modifier, where one plane is 32bpp with supertiling, meaning you need quite a bit of padding. The other plane represents 128 pixels horizontally in a _single byte_. You could in theory probably have a single stride value for the overall combined image, but this would require overallocation of the auxiliary plane by a factor of 512.

Random drivers don't create random rules, they don't disagree on what stride means, and they're not desperate to create some random avant-garde interpretation of it. They _do_ have their own hard requirements on how stride must be aligned, which comes very directly from their hardware. If you ever do find inconsistencies between drivers or random values, or anything else which makes you think GPU drivers treat stride as a PRNG, then please let me know.

Tiling _is_ more difficult to think about, especially supertiling. But the definition above holds; it might be easier to invert what you said about converting to linear, and instead just imagine linear as a 1x1 tiling format. In that case, the rule becomes 'from (x,y) to (x,y+TH), where TH is the greatest/outer tile height in the format'.

This isn't inconsistent or even an issue that should bother GSt. If you know the tiling format, then you know how to lay the data out including stride (as with NV12MT). If you don't, then you don't ever see the data, and stride is another piece of data to transfer without needing to inspect. If you have a pipeline involving conversions from one layout to another (e.g. linear for software processing, converted to tiled on GPU upload), then you will have differing strides at different points in your pipeline, which is fine: the buffers you're addressing have fundamentally different properties, and the two strides are no more comparable than the buffers themselves.

Comment 46 Nicolas Dufresne (ndufresne) 2018-01-19 12:13:14 UTC

The mains problem remains the stride and other memory related. The thing that comes out is that it's given back at allocation time. With this proposal, there is no possible to fallback because GST does not have the knowledge. So everything has to be in the caps (stride, size, alignment, number of memory segment, etc) if you want any guarantee that the buffer exchange to work. Allocating for every format to get the info make no sense and asking for API will just never happen. So how do we make it work, not just sometimes and by chance ?

Comment 47 Daniel Stone 2018-01-23 17:56:44 UTC

(In reply to Nicolas Dufresne (stormer) from comment #46)
> The mains problem remains the stride and other memory related. The thing
> that comes out is that it's given back at allocation time. With this
> proposal, there is no possible to fallback because GST does not have the
> knowledge. So everything has to be in the caps (stride, size, alignment,
> number of memory segment, etc) if you want any guarantee that the buffer
> exchange to work. Allocating for every format to get the info make no sense
> and asking for API will just never happen. So how do we make it work, not
> just sometimes and by chance ?

Sorry, I don't really understand the question: what is not working today, that you would like to 'make work'?

If you're looking for a generic formula which will let you calculate the stride for some given configuration in a totally generic way, no, we just don't have that. (Though the work on the 'allocator' project is intended to provide this to client applications, provided you can reason about producer/consumer devices when allocating.)

Display _hardware_ (again, this is not an arbitrary rule invented for fun in drivers because hate our users - this is literally what the hardware demands) requires varying stride alignments, depending on all of format, usage, vendor, generation, etc. Often 32px is fine if it's linear. But sometimes it's 512px, or 512 bytes. If you want accelerated rotation on OMAP, it's a fixed 4096 bytes. But try not to be too generous with over-allocation, because everyone has stride _limits_, which can be surprisingly low.

Obviously these requirements and restrictions change with the modifier, as you discovered with NV12MT where you need to factor tile size into account as well as alignment requirements.

The model that KMS, Wayland, EGL, Vulkan, X11 DRI3, etc are built around, is that stride is determined (for each individual plane) in a device-specific way by whoever performs the allocation, attached to the plane, and passed to any consumers of the plane. Unfortunately GStreamer seems to have created stride exactly backwards - specific stride demanded by consumer which allocator must respect; attached to the logical image/framebuffer rather than each individual plane memory allocation - and that is unfortunate, but we can't change our model because it just won't work on hardware if we do.

I'm sorry I don't have better news for you, but hardware does not allow us to tell you up front what stride will be, and we do not (yet) have a library with the algorithms for every known piece of hardware which you can interrogate.

Comment 48 Nicolas Dufresne (ndufresne) 2018-01-23 19:27:32 UTC

(In reply to Daniel Stone from comment #47)
> The model that KMS, Wayland, EGL, Vulkan, X11 DRI3, etc are built around, is
> that stride is determined (for each individual plane) in a device-specific
> way by whoever performs the allocation, attached to the plane, and passed to
> any consumers of the plane. 

In practice, during kmssink dev we found that most display driver does not gives this liberty to the allocating party, resulting in hard failure (or miss-rendering) if the expected stride and alignment is not met (also type/location of physical memory, but that is just invisible to userpace, so I keep it quite and hope that we have sys/iommus).

> Unfortunately GStreamer seems to have created
> stride exactly backwards - specific stride demanded by consumer which
> allocator must respect; attached to the logical image/framebuffer rather
> than each individual plane memory allocation - and that is unfortunate, but
> we can't change our model because it just won't work on hardware if we do.

This is a miss-interpretation of what GStreamer do with strides, and a miss-understanding of the problems we are facing in user space today. The problem with stride passing is that some (call them legacy) driver interface provides 1 stride, regardless of the number of planes, and other passes (or receive) multiple strides. This duality affects both DRM and V4L2 interface, and probably other driver interface I'm not familiar with. So far, we handle this in userpace because we know the format well enough that we can extrapolate the other strides values, and that makes things works a lot more. GStreamer is also stride/offset/alignment aware for a reason, it implements a large set of software filters. These software filters are not always too slow, an example, STM used the software renderers for sub-titles instead of dedicated plane. That saved previous planes that could then be used by their compositor for other tasks. But for that part, Olivier's protocol make sense, as these format are not well known, they should not be considered raw formats in the GStreamer sense.

When this interpolation is not possible or uncertain (requires too much guessing), we end up with a support that can fail or miss-render on one system and works on another. This is just a sample of the possible issues we keep facing. If we just enable all this, it will fail in application it didn't fail before.

Comment 49 Daniel Stone 2018-01-24 10:05:22 UTC

(In reply to Nicolas Dufresne (stormer) from comment #48)
> (In reply to Daniel Stone from comment #47)
> > The model that KMS, Wayland, EGL, Vulkan, X11 DRI3, etc are built around, is
> > that stride is determined (for each individual plane) in a device-specific
> > way by whoever performs the allocation, attached to the plane, and passed to
> > any consumers of the plane. 
> 
> In practice, during kmssink dev we found that most display driver does not
> gives this liberty to the allocating party, resulting in hard failure (or
> miss-rendering) if the expected stride and alignment is not met (also
> type/location of physical memory, but that is just invisible to userpace, so
> I keep it quite and hope that we have sys/iommus).

If you're using dumb buffers, then you do not get to even try to specify the stride: stride is an out-value only, along with the handle. For other buffers, the drivers I've worked with fail when required stride alignment is not met. If you find drivers which accept buffers with particular stride values but fail to display them correctly, please do file bugs.

> > Unfortunately GStreamer seems to have created
> > stride exactly backwards - specific stride demanded by consumer which
> > allocator must respect; attached to the logical image/framebuffer rather
> > than each individual plane memory allocation - and that is unfortunate, but
> > we can't change our model because it just won't work on hardware if we do.
> 
> This is a miss-interpretation of what GStreamer do with strides, and a
> miss-understanding of the problems we are facing in user space today. The
> problem with stride passing is that some (call them legacy) driver interface
> provides 1 stride, regardless of the number of planes, and other passes (or
> receive) multiple strides. This duality affects both DRM and V4L2 interface,
> and probably other driver interface I'm not familiar with.

Where does it affect it with DRM? When creating dumb buffers in DRM, there is a single stride and a single handle: dumb buffers only support single-planar formats, and only linear (no modifiers). This is by design of dumb buffers. drmModeAddFB takes a single stride token, but again, it only supports single-planar formats. drmModeAddFB2 and drmModeAddFB2WithModifiers take per-plane stride/offset/handle.

Wayland's dmabuf interface has per-plane stride/offset/fd, as do the EGL and Vulkan dmabuf import interfaces. So I don't know of anywhere in the graphics world where stride is magically inferred from one plane to the other, except for X11 Xv.

> So far, we handle
> this in userpace because we know the format well enough that we can
> extrapolate the other strides values, and that makes things works a lot
> more. GStreamer is also stride/offset/alignment aware for a reason, it
> implements a large set of software filters. These software filters are not
> always too slow, an example, STM used the software renderers for sub-titles
> instead of dedicated plane. That saved previous planes that could then be
> used by their compositor for other tasks. But for that part, Olivier's
> protocol make sense, as these format are not well known, they should not be
> considered raw formats in the GStreamer sense.

That makes sense, again as long as you a) ensure the buffers are linear, or b) implement detiling/decompression for whatever modifiers you receive.

> When this interpolation is not possible or uncertain (requires too much
> guessing), we end up with a support that can fail or miss-render on one
> system and works on another. This is just a sample of the possible issues we
> keep facing. If we just enable all this, it will fail in application it
> didn't fail before.

I still don't really understand the problem. Is it just that some unnamed KMS drivers display the wrong content when stride alignment does not meet their expectations?

Comment 50 Daniel Stone 2018-01-24 10:08:58 UTC

> This is just a sample of the possible issues we keep facing.

Also, I would really like to know, quite specifically, what these issues are. I've tried to guess a few times at what you mean (e.g. by 'the state of stride') and am still clueless.

If you let me know then either we can fix them, or fix the usage where the latter is incorrect - for instance, where you say 'If dumb buffer, strides means bit-per-pixel * width / 8', then as above, this is incorrect because the driver _returns_ an aligned stride value from the ioctl, along with the handle. This is somewhat documented in the ioctl: https://cgit.freedesktop.org/drm/drm-tip/tree/include/uapi/drm/drm_mode.h?h=drm-tip#n667

Can you please enumerate somewhere (it doesn't have to be here), a list of the problems you see, rather than just saying that the entire graphics stack is completely and hopelessly broken, as if it were a known issue? :\

Comment 51 Nicolas Dufresne (ndufresne) 2018-01-24 13:01:07 UTC

The single stride is a recurring issue with V4L2, but also UVC and other standard to which the planar formats were added after the fact. One thing to note, GStreamer, outside vaapi, is rarely involved in DRM to DRM memory sharing. Instead we deal with cross kernel domain sharing, and the is little talk among devs there. Hopefully padovan nomination will help long term. Another issue is the cache coherency, e.g. shall we use the dma sync API, when, why. Which case is driver bug ? I'm thinking about the visual artifact when you render UVC or vivid buffer to DRM display. I've reported this a long time ago, and I still have no clue who's faut, what need fixing.

Comment 52 Nicolas Dufresne (ndufresne) 2018-01-24 13:08:23 UTC

(In reply to Daniel Stone from comment #50)
> Can you please enumerate somewhere (it doesn't have to be here), a list of
> the problems you see, rather than just saying that the entire graphics stack
> is completely and hopelessly broken, as if it were a known issue? :\

I would like to remind that this is not about the graphic stack. Again, DRM to DRM is a tiny little aspect here. This bug is about a proposal on how to exchange unknown pixel formats across drivers, what information need to be carried and were (caps, meta, etc). Its also about how to avoid the traps that will make the pipeline fail in an unrecoverable way, specially in cases it worked before using a slower path.

Comment 53 Nicolas Dufresne (ndufresne) 2018-02-23 02:32:48 UTC

I've just re-read that thread, again, and am still not convinced witch way will do.

The meta way, which will likely end up as GstDrmMeta at this pace, has a lot of drawback. It inherit the GstVideoInfo/Format and GstVideoMeta for cases where the format won't be specified inside GStreamer. Of course some of the compatible fieds could be reused, like strides, but a lot is not. Also, the formats will be negotiated based on the assumption driver always support a DRM fourcc with and without the modifiers. Strides are also not negotiated in a context where this cannot fallback to video_frame_copy() slow path.

On the other side, the video/x-opaque-drm caps negotiation would better negotiate the format, but exposing the stride to further guaranty that negotiation is a success does not seem something available API wise. In that scenario though, GstDrmMeta becomes a clear subset of information passed blindly to driver (which is what we want) that is sufficient for the exchange.

In the end, I think the modifiers need to be in the caps, which give a plus to Olivier's proposal. But we also need to signal the usual stuff like colorimetry, chroma-siting, 3d layout, framerate, par, etc, which need to be parsed from the caps, for which reusing the VideoInfo code would be nice. So we need an in-between. But I'm quite clear now that the modifiers must go in the caps.

Then for fallback, In think the answer is renegotiation. By dropping the non-working buffers, and redoing the nego without the format/modifiers that just failed. This is a lot of non-trivial work though, specially for decoders. But for vaapi, it's the vpp that would handle that, the decoders can stay inflexible.Thanks for taking the time to report this.
However, you are using a version that is too old and not supported anymore by GNOME developers. GNOME developers are no longer working on that version, so unfortunately there will not be any bug fixes by GNOME developers for the version that you use.

By upgrading to a newer version of GNOME you could receive bug fixes and new functionality. You may need to upgrade your Linux distribution to obtain a newer version of GNOME.

Please feel free to reopen this bug report if the problem still occurs with a recent version of GNOME, or feel free to report this bug in the bug tracking system of your Linux distribution if your distribution still supports the version that you are using.

Comment 54 Nicolas Dufresne (ndufresne) 2018-02-23 02:35:12 UTC

Sorry, I have just hit the obsolete quick link by accident.

Comment 55 Víctor Manuel Jáquez Leal 2018-09-06 12:12:24 UTC

Let me update the current status on this matter, as far as I know:

After Scott's efforts, Mark jumped in and finally merged in libva a new API for exporting DRM objects: vaExportSurfaceHandle() https://github.com/intel/libva/pull/125

Here's a simple example of using that API: http://ixia.jkqxz.net/~mrt/kms_va_nv12.c

There's an implementation for the intel-vaapi-driver: https://github.com/intel/intel-vaapi-driver/pull/278

for AMD Mesa's state tracker
https://lists.freedesktop.org/archives/mesa-dev/2017-September/169953.html

and for media-driver (MSDK)
https://github.com/intel/media-driver/pull/177

We have to keep in mind that not all the drivers might implement it (libva-v4l2-request? for example)

When I was trying to understand the issue, my first thought was to write a vaapidownload element which would transform the memory:VASurface buffers into dmabuf (and later perhaps others), but we would face the problem of auto-connecting it, except to adding it into vaapodecodebin.

Comment 56 Nicolas Dufresne (ndufresne) 2018-09-06 12:24:17 UTC

(In reply to Víctor Manuel Jáquez Leal from comment #55)
> We have to keep in mind that not all the drivers might implement it
> (libva-v4l2-request? for example)

Well, this one is brand new and does not work for anything else then Kodi at the moment, so I would not worry too much. Note that implementing PRIME_2 for V4L2 driver is a bit weird.

The other aspect is that anything else then vaExportSurfaceHandle() is just unsafe to export, because crucial information is missing. So long term, we should not support anything else in regard to DMABuf exportation.

The main issue is that we don't get to chose the titling and the compression, we just read it back after allocating a surface. With the caps feature, we would need to allocate at caps negotiation, rather then allocation query.

We also implement a dmabuf map check, thinking that a dmabuf may or may not be mappable. I'm still to find a non-mappable dmabuf in any upstream driver.

Comment 57 Daniel Stone 2018-09-17 19:06:09 UTC

> We also implement a dmabuf map check, thinking that a dmabuf may or may not be mappable. I'm still to find a non-mappable dmabuf in any upstream driver.

dmabufs you come across in the wild will generally be mappable. There are explicit ioctls to manage CPU <-> device cache coherency: https://cgit.freedesktop.org/drm/drm-tip/tree/include/uapi/linux/dma-buf.h#n25

Comment 58 Nicolas Dufresne (ndufresne) 2018-09-17 19:26:20 UTC

DMABuf syncrhonization has beed added already, see:

https://cgit.freedesktop.org/gstreamer/gst-plugins-base/commit/?id=8ee306eb3fb69c34c579874f6aedcecb6d94a855

Comment 59 GStreamer system administrator 2018-11-03 15:49:03 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gstreamer-vaapi/issues/49.