GNOME Bugzilla – Bug 751605
audio: Add proper support for non-interleaved / planar audio
Last modified: 2018-07-03 13:24:03 UTC
This is related to bug #705986. Currently we have a field in the caps for that, however that's not enough. Similar to planar video formats, there can be alignment constraints for each "plane" / channel. So that for each channel you e.g. allocate GST_ROUND_UP_16(samples*bpf). This sounds like we need to add a GstAudioNonInterleavedMeta. This should contain an array of channel offsets at least and the number of samples in this buffer. Which should be enough to support the two relevant cases: 1) one big memory with channels at some offset in there, 2) one memory per channel, according to the buffer each channel directly coming after the previous but actually in different memory areas. In theory we could go without the meta by always doing things like in case 2), and if necessary wrapping multiple GstMemory around a single memory area. However that would be less efficient. If the meta is not available, we should assume that one channel comes directly after the other without any padding/alignment. Any comments, ideas, ...?
The padding a requirement of libav and not the codec in use, right?
Well, libav is a codec library and their codecs do that, so not sure what to answer ;) I think it could also be a requirement of other codecs, and it makes sense. If the start of each channel is apropriately aligned, you can directly run SIMD operations on each channel separately.
Adding Wim since he's been working in that area recently.
Hi, I've recently started working on this. Following some discussions I had with Olivier and Nicolas, I am working on the approach of having a GstAudioMeta (equivalent to GstVideoMeta for video) that lists the offsets of the various planes. Additionally, we are proposing that this meta will be strictly required for non-interleaved audio, to avoid the confusion that exists with GstVideoMeta being optional. My working branch can be found at https://gitlab.collabora.com/gkiagia/gst-plugins-base/commits/plannar_audio I'd appreciate any comments you might have on the design of GstAudioMeta.
Making the meta mandatory is problematic as various elements can already handle non-interleaved audio. Without meta, it has to be tightly packed. Also why (multiple) offsets for the meta instead of a single "stride"?
(In reply to Sebastian Dröge (slomo) from comment #5) > Making the meta mandatory is problematic as various elements can already > handle non-interleaved audio. Without meta, it has to be tightly packed. Is that really true? Do they really handle non-interleaved audio? Is it tested? It seemed to me that they advertise it, but nobody really supports it. For example, audioresample advertises it, but it actually crashes if you negotiate it with non-interleaved. A reason for making it mandatory is that there is API that won't be able to handle non-interleaved audio otherwise, like gst_audio_buffer_clip() (the arguments of the function are not sufficient). It is also incredibly convenient for mapping the audio buffer without having to take cases. > Also why (multiple) offsets for the meta instead of a single "stride"? This allows to do zero-copy clipping as well as channel reordering. As long as every element uses the new gst_audio_buffer_map/unmap() API, clipping or reordering is just a matter of modifying the offsets and/or the 'samples' variable.
(In reply to George Kiagiadakis from comment #6) > (In reply to Sebastian Dröge (slomo) from comment #5) > > Making the meta mandatory is problematic as various elements can already > > handle non-interleaved audio. Without meta, it has to be tightly packed. > > Is that really true? Do they really handle non-interleaved audio? Is it > tested? It seemed to me that they advertise it, but nobody really supports > it. For example, audioresample advertises it, but it actually crashes if you > negotiate it with non-interleaved. Well, that's a bug :) If nothing ever worked with non-interleaved then it's probably OK to change behaviour. > A reason for making it mandatory is that there is API that won't be able to > handle non-interleaved audio otherwise, like gst_audio_buffer_clip() (the > arguments of the function are not sufficient). It is also incredibly > convenient for mapping the audio buffer without having to take cases. I agree with that, yes. Keep in mind though that metas are not free. > > Also why (multiple) offsets for the meta instead of a single "stride"? > > This allows to do zero-copy clipping as well as channel reordering. As long > as every element uses the new gst_audio_buffer_map/unmap() API, clipping or > reordering is just a matter of modifying the offsets and/or the 'samples' > variable. Ah, good idea indeed. Just requires yet another memory allocation for the meta as there would be n_channels offsets.
So, I have pretty much completed what I had in mind. I also went ahead and changed the libav decoders to output the decoder's native layout instead of interleaving internally. https://gitlab.collabora.com/gkiagia/gst-plugins-base/commits/planar_audio https://gitlab.collabora.com/gkiagia/gst-libav/commits/planar_audio These address all 3 related tickets: this one, plus: https://bugzilla.gnome.org/show_bug.cgi?id=705986 https://bugzilla.gnome.org/show_bug.cgi?id=705977 I also tried to do this for the libav encoders, only to realize later that the base class (GstAudioEncoder) cannot support non-interleaved layouts easily, because it uses a GstAdapter to cut the input into pieces sized to match the implementation's preferences. I am not planning to continue working on this atm, but I left the code in these branches for anyone who wishes to pick it up... https://gitlab.collabora.com/gkiagia/gst-plugins-base/commits/planar_audio_next https://gitlab.collabora.com/gkiagia/gst-libav/commits/planar_audio_next The only thing that remains here is to iterate through every other element in the tree and remove the "non-interleaved" string from its static caps (like I did with adder), or otherwise fix the element to support this layout properly and through GstAudioMeta, as I'm pretty sure things will not just work otherwise.
My only comment on this patch series is that I think you should be using GstAudioInfo inside GstAudioMeta instead of having a copy of the content of it. And you need to s/UNRELEASED/1.16/ Once that's fixed, I think we should merge it as the 1.16 branch is young and we have time to catch regressions.
Sounds like a good idea. I'd like to take a look at this in detail, but probably won't have time for that before the hackfest.
Another thing missing is the relevant documentation update that says: 1. With audio/x-raw non-interleaved, the GstAudioMedia is always required
(In reply to Olivier Crête from comment #11) > Another thing missing is the relevant documentation update that says: > > 1. With audio/x-raw non-interleaved, the GstAudioMedia is always required This is documented in https://gitlab.collabora.com/gkiagia/gst-plugins-base/commit/e18bcef7171e4bc3ecc5f07d9bae60e0e7cf061c#cc0a3565b7960819f3c84df6f8c4a59775422a59_128_177 and I just added it also in gst-docs: https://gitlab.collabora.com/gkiagia/gst-docs/commits/planar_audio Let's go through this in detail at the hackfest this weekend.
Created attachment 371717 [details] [review] libs: audio: Implement GstAudioMeta, a meta to describe mapping properties of audio buffers
Created attachment 371718 [details] [review] tests: audio: add unit tests for GstAudioMeta
Review of attachment 371717 [details] [review]: Only shortly reviewed the header so far :) ::: gst-libs/gst/audio/gstaudiometa.h @@ +144,3 @@ + * the @planes array. For interleaved buffers, the @planes array only contains + * one item, which is the pointer to the beginning of the buffer, and @n_planes + * equals 1. This should mention that the planes are always in GStreamer channel order @@ +151,3 @@ + gint n_planes; + gsize plane_size; + gpointer *planes; Planes? Or should it rather be channels? @@ +158,3 @@ + + gpointer priv_planes_arr[8]; + GstMapInfo priv_map_info_arr[8]; We might want to map only once if it's only a single memory... but that's an optimization for later times @@ +161,3 @@ + + gpointer _gst_reserved[GST_PADDING]; +} GstAudioMapInfo; This should be called GstAudioBuffer, for consistency with gst_audio_buffer_map(). And GstRTPBuffer, GstVideoFrame, etc. @@ +162,3 @@ + gpointer _gst_reserved[GST_PADDING]; +} GstAudioMapInfo; + Should maybe also get into a separate header, and then we can later add all kinds of macros like there are for GstVideoFrame and GstRTPBuffer @@ +199,3 @@ +}; + +GST_EXPORT This should be GST_AUDIO_EXPORT or not? @@ +218,3 @@ + +GST_EXPORT +void gst_audio_buffer_unmap (GstBuffer *buffer, GstAudioMapInfo *info); For videoframe and rtp buffer we also store the buffer, so unmapping only requires the struct and not in addition the buffer. Let's keep this consistent
(In reply to Sebastian Dröge (slomo) from comment #15) > Review of attachment 371717 [details] [review] [review]: > > @@ +151,3 @@ > + gint n_planes; > + gsize plane_size; > + gpointer *planes; > > Planes? Or should it rather be channels? Hmm, it's not channels because you can also use this struct to map an interleaved buffer (which has 1 plane). And this is useful if you want to have a uniform way to access data in both layouts. > @@ +158,3 @@ > + > + gpointer priv_planes_arr[8]; > + GstMapInfo priv_map_info_arr[8]; > > We might want to map only once if it's only a single memory... but that's an > optimization for later times If you look at the code, I just map once. The rest of the map infos are unused. I don't think it's allowed to map multiple times the same memory. Iirc there are locks.
Created attachment 371726 [details] [review] libs: audio: Implement GstAudioMeta, a meta to describe mapping properties of audio buffers
Created attachment 371727 [details] [review] tests: audio: add unit tests for GstAudioMeta
Review of attachment 371726 [details] [review]: ::: gst-libs/gst/audio/audio-buffer.c @@ +60,3 @@ + buffer->map_info = buffer->priv_map_info_arr; + + if (!gst_buffer_map (gstbuffer, &buffer->map_info[0], flags)) For the mapping, maybe take a look at the video frame mapping function. It would be nice to only have to copy together memories when mapping and a single channel is in multiple memories. And to only do a single mapping at most per memory. ::: gst-libs/gst/audio/audio-buffer.h @@ +52,3 @@ + gpointer *planes; + + GstBuffer *gstbuffer; Why the gst prefix? @@ +62,3 @@ + + gpointer _gst_reserved[GST_PADDING]; +} GstAudioBuffer; Should some macros be added here already? For getting the audio info, getting the data of the n-th channel, accessors for the audio info fields too maybe? ::: gst-libs/gst/audio/gstaudiometa.c @@ +314,3 @@ + gst_audio_info_init (&ameta->info); + ameta->samples = 0; + ameta->offsets = NULL; You need to initialize all the fields, otherwise they'll contain garbage @@ +402,3 @@ + /* default offsets assume channels are laid out sequentially in memory */ + for (i = 0; i < info->channels; i++) + meta->offsets[i] = i * samples * info->finfo->width / 8; GST_AUDIO_INFO_BPS :) ::: gst-libs/gst/audio/gstaudiometa.h @@ +156,3 @@ + * Since: 1.16 + */ +struct _GstAudioMeta { Should it maybe be get the word "planar" in here, or do you expect this to be also useful for interleaved audio somehow? @@ +161,3 @@ + GstAudioInfo info; + gsize samples; + gsize *offsets; Is it allowed to have multiple channels with the same offset, to cheaply duplicate a channel? @@ +180,3 @@ +GstAudioMeta * gst_buffer_add_audio_meta (GstBuffer *buffer, + const GstAudioInfo *info, + gsize samples, gsize offsets[]); This should maybe sanity-check the buffer size against the number of samples?
(In reply to Sebastian Dröge (slomo) from comment #19) > ::: gst-libs/gst/audio/audio-buffer.h > @@ +52,3 @@ > + gpointer *planes; > + > + GstBuffer *gstbuffer; > > Why the gst prefix? Well, because the parent structure is also a buffer, so then you write stuff like buffer->buffer that is not so readable, but it's not a big deal. I can remove it. > ::: gst-libs/gst/audio/gstaudiometa.c > @@ +314,3 @@ > + gst_audio_info_init (&ameta->info); > + ameta->samples = 0; > + ameta->offsets = NULL; > > You need to initialize all the fields, otherwise they'll contain garbage Which fields? I do initialze them :) > ::: gst-libs/gst/audio/gstaudiometa.h > @@ +156,3 @@ > + * Since: 1.16 > + */ > +struct _GstAudioMeta { > > Should it maybe be get the word "planar" in here, or do you expect this to > be also useful for interleaved audio somehow? The intention was to be able to attach it in both interleaved and non-interleaved buffers. Although, for interleaved I haven't thought of a use case. > @@ +161,3 @@ > + GstAudioInfo info; > + gsize samples; > + gsize *offsets; > > Is it allowed to have multiple channels with the same offset, to cheaply > duplicate a channel? Sure, why not. > @@ +180,3 @@ > +GstAudioMeta * gst_buffer_add_audio_meta (GstBuffer *buffer, > + const GstAudioInfo *info, > + gsize samples, gsize offsets[]); > > This should maybe sanity-check the buffer size against the number of samples? There is a check in the map function. But now I am thinking, what if you are duplicating channels in the way you just said above? The check will be inaccurate. Maybe we can check that the highest offset + samples * bps is contained in the buffer
(In reply to George Kiagiadakis from comment #20) > (In reply to Sebastian Dröge (slomo) from comment #19) > > ::: gst-libs/gst/audio/audio-buffer.h > > @@ +52,3 @@ > > + gpointer *planes; > > + > > + GstBuffer *gstbuffer; > > > > Why the gst prefix? > > Well, because the parent structure is also a buffer, so then you write stuff > like buffer->buffer that is not so readable, but it's not a big deal. I can > remove it. Without the gst prefix seems nicer but if someone disagrees, I don't care enough :) > > ::: gst-libs/gst/audio/gstaudiometa.h > > @@ +156,3 @@ > > + * Since: 1.16 > > + */ > > +struct _GstAudioMeta { > > > > Should it maybe be get the word "planar" in here, or do you expect this to > > be also useful for interleaved audio somehow? > > The intention was to be able to attach it in both interleaved and > non-interleaved buffers. Although, for interleaved I haven't thought of a > use case. Ok, we can think about that when it happens I guess. There's padding so it can be extended by something if needed. > > @@ +161,3 @@ > > + GstAudioInfo info; > > + gsize samples; > > + gsize *offsets; > > > > Is it allowed to have multiple channels with the same offset, to cheaply > > duplicate a channel? > > Sure, why not. Maybe document that then. It might have some implications for code working on the channels (as the pointers are aliasing then) > > @@ +180,3 @@ > > +GstAudioMeta * gst_buffer_add_audio_meta (GstBuffer *buffer, > > + const GstAudioInfo *info, > > + gsize samples, gsize offsets[]); > > > > This should maybe sanity-check the buffer size against the number of samples? > > There is a check in the map function. But now I am thinking, what if you are > duplicating channels in the way you just said above? The check will be > inaccurate. > > Maybe we can check that the highest offset + samples * bps is contained in > the buffer Or do that check for each offset, yes
(In reply to Sebastian Dröge (slomo) from comment #19) > @@ +62,3 @@ > + > + gpointer _gst_reserved[GST_PADDING]; > +} GstAudioBuffer; > > Should some macros be added here already? For getting the audio info, > getting the data of the n-th channel, accessors for the audio info fields > too maybe? Hmm, GstAudioBuffer doesn't contain the GstAudioInfo.... I was just thinking, maybe it is useful to have it there (like in GstVideoFrame). However, this information will only be available to the map function if the GstBuffer contains a GstAudioMeta, so it's not a very reliable source of information. The other case is to pass it to the map function, but that only complicates the work that the caller needs to do, for no good reason (there is no use case).
(In reply to Sebastian Dröge (slomo) from comment #19) > @@ +402,3 @@ > + /* default offsets assume channels are laid out sequentially in > memory */ > + for (i = 0; i < info->channels; i++) > + meta->offsets[i] = i * samples * info->finfo->width / 8; > > GST_AUDIO_INFO_BPS :) Nope, sorry. I did that and the unit test fails :) BPS checks the finfo->depth, which is the amount of valid bytes. I really need finfo->width, which is the "stride" (valid + invalid bytes)
Created attachment 371747 [details] [review] libs: audio: Implement GstAudioMeta, a meta to describe mapping properties of audio buffers
Created attachment 371748 [details] [review] tests: audio: add unit tests for GstAudioMeta
(In reply to George Kiagiadakis from comment #23) > > GST_AUDIO_INFO_BPS :) > > Nope, sorry. I did that and the unit test fails :) > BPS checks the finfo->depth, which is the amount of valid bytes. > I really need finfo->width, which is the "stride" (valid + invalid bytes) Ugh, I wonder how much code is out there using it wrong. I assumed it to be bpf/channels
Review of attachment 371747 [details] [review]: ::: gst-libs/gst/audio/audio-buffer.c @@ +120,3 @@ + + if (!gst_buffer_map_range (gstbuffer, idx, length, &buffer->map_infos[i], + flags)) This requires that each channel has all samples in the same memory, right? Because if it was in two memories, they would have to be merged but we already have one of the memories mapped then so they can't be merged Should be documented ::: gst-libs/gst/audio/audio-buffer.h @@ +52,3 @@ + gpointer *planes; + + GstBuffer *buffer; I would prefer a GstAudioInfo to be here, and to get it from a parameter in gst_audio_buffer_map(). Without the GstAudioInfo, you always need to carry two things around to have a useful GstAudioBuffer, it would be nicer to have it all in a single data structure (And mismatch between the meta audio info and the parameter would be a g_critical() and we use the one from the meta) ::: gst-libs/gst/audio/gstaudiometa.c @@ +367,3 @@ + * offsets[channel] = channel * @samples * bytes_per_sample + * + * Returns: the #GstAudioMeta that was attached on the @buffer Somewhere it should be documented that it's either allowed for the offsets to overlap, or not. And if not, we should check for that. I'd prefer not :) @@ -307,0 +307,115 @@ + + +static gboolean ... 112 more ... This looks more like a g_critical() ::: gst-libs/gst/audio/gstaudiometa.h @@ +148,3 @@ + * attached and to be mapped with gst_audio_buffer_map() in order to ensure + * correct handling of cliping and channel reordering (which is internally + * implemented by altering the @offsets and the @samples variables). Maybe add a paragraph how this would be used for interleaved audio, and that it's useless at this point but might get additional functionality added at a later time
Created attachment 372125 [details] [review] libs: audio: Implement GstAudioBuffer & GstAudioMeta Added the GstAudioInfo in GstAudioBuffer, plus some access macros, including a "_SAMPLE_STRIDE" one (let me know if you think that's a bad idea) Improved the documentation as well and added a check for overlapping channels in GstAudioMeta. Also added a "samples" variable in the GstAudioBuffer, after looking at the use cases and realizing that every element currently calculates this.
Created attachment 372126 [details] [review] tests: audio: add unit test for GstAudioBuffer & GstAudioMeta Refactored to be a single test, with data read from an array.
Review of attachment 372125 [details] [review]: Looks good to go IMHO, except for some minor bikeshedding below ::: gst-libs/gst/audio/audio-buffer.h @@ +52,3 @@ + GstAudioInfo info; + + gsize samples; Seems inconsistent with below. Maybe should be called n_samples? @@ +55,3 @@ + + gint n_planes; + gsize plane_size; Or plane_stride? ::: gst-libs/gst/audio/gstaudiometa.c @@ +416,3 @@ + "areas to overlap! offsets: %" G_GSIZE_FORMAT " (%d), %" + G_GSIZE_FORMAT " (%d) with plane size %" G_GSIZE_FORMAT, + offsets[i], i, offsets[j], j, plane_size); This limitation should be documented (if not happened already)
(In reply to Sebastian Dröge (slomo) from comment #30) > Review of attachment 372125 [details] [review] [review]: > > Looks good to go IMHO, except for some minor bikeshedding below > > ::: gst-libs/gst/audio/audio-buffer.h > @@ +52,3 @@ > + GstAudioInfo info; > + > + gsize samples; > > Seems inconsistent with below. Maybe should be called n_samples? Agreed. > @@ +55,3 @@ > + > + gint n_planes; > + gsize plane_size; > > Or plane_stride? Hmm, it's called a plane_stride in GstVideoFrame, but it somehow feels wrong here. By convention we usually call "size" the valid amount of information and "stride" the step that you have to make to find the next valid object in memory. In this case plane_size is the amount of valid data. Doing planes[i] + plane_size will NOT necessarily give you a pointer to the next plane. > ::: gst-libs/gst/audio/gstaudiometa.c > @@ +416,3 @@ > + "areas to overlap! offsets: %" G_GSIZE_FORMAT " (%d), %" > + G_GSIZE_FORMAT " (%d) with plane size %" G_GSIZE_FORMAT, > + offsets[i], i, offsets[j], j, plane_size); > > This limitation should be documented (if not happened already) Ugh, I thought I documented it, but apparently I forgot. Will do.
(In reply to George Kiagiadakis from comment #31) > > > @@ +55,3 @@ > > + > > + gint n_planes; > > + gsize plane_size; > > > > Or plane_stride? > > Hmm, it's called a plane_stride in GstVideoFrame, but it somehow feels wrong > here. By convention we usually call "size" the valid amount of information > and "stride" the step that you have to make to find the next valid object in > memory. > > In this case plane_size is the amount of valid data. Doing planes[i] + > plane_size will NOT necessarily give you a pointer to the next plane. Ah then I misunderstood. But then this is redudant or not? That's simply n_samples*width/8, and could just move that calculation into a macro instead of storing it.
(In reply to Sebastian Dröge (slomo) from comment #32) > Ah then I misunderstood. But then this is redudant or not? That's simply > n_samples*width/8, and could just move that calculation into a macro instead > of storing it. It's different for non-interleaved & interleaved - non-interleaved: n_planes = N plane_size = n_samples * width/8 - interleaved: n_planes = 1 plane_size = n_samples * width/8 * channels
Oh hmm, looking at these equations now, I guess we could use a macro that uses the following formula: n_samples * width/8 * channels / n_planes
Ok so what's the status here? :)
Um, sorry, I've been busy and I almost forgot. Let me make these last changes.
Great, please merge it then so it won't get forgotten again until it's time for the release :)
commit e9b4f2b317128cfb5f4469d6d3b925072d670aa0 Author: George Kiagiadakis <george.kiagiadakis@collabora.com> Date: Thu Feb 8 15:20:09 2018 +0200 tests: audio: add unit test for GstAudioBuffer & GstAudioMeta https://bugzilla.gnome.org/show_bug.cgi?id=751605 commit c946e323f6ccd4ed0dd2b2da7920e600f8de28b5 Author: George Kiagiadakis <george.kiagiadakis@collabora.com> Date: Wed Feb 7 14:36:01 2018 +0200 libs: audio: Implement GstAudioBuffer & GstAudioMeta Library bits to support non-interleaved audio https://bugzilla.gnome.org/show_bug.cgi?id=751605