GNOME Bugzilla – Bug 691370
caps intersection is broken for channel-layout / bitmasks
Last modified: 2015-03-01 14:34:20 UTC
Channel layout currently uses bitmask, but this breaks caps intersection. Example: caps1 = [ { channels: 4, layout: FL, FR, RL, RR }, { channels: 4, layout: FL, FR, SL, SR } ] caps2 = { channels: 4, layout: FL, FR, SL, SR } I assume, that after gst_caps_intersect I should get intersection = { channels: 4, layout: FL, FR, SL, SR } // same as second struct of caps1 This is not the case though, what I really get is [ { channels: 4, layout: FL, FR }, { channels: 4, layout: FL, FR } ] This probably happens because GST_BITMASK intersection is done using bitwise and.
Any chance you could make a unit test?
Created attachment 233069 [details] [review] Unit test Here you go
Sebastian, any idea what to do with this? (Why should it not be treated like a normal int? Is there a reason we didn't make it a uint64 instead?)
<slomo> __tim: no idea what to do about the bitmask bug, still had no time to look at it closely... but it's not a normal int because we need different intersection/subset (0x01 is a subset of 0xff but for integers it isn't)
Yeah, except this subset is exactly what doesn't work for channel layout (see the description). If you intersect { channels: 4, layout: FL FR RL RR } with { channels: 4, layout: FL FR SR SR } and get { channels: 4, layout: FL FR } as result it is clearly wrong. It shouldn't intersect in this case.
Which means some bitmask are set, and some (like layout) are not, what about having bitset and bitmask types ? (impossible without API break imho).
GST_TYPE_BITMASK and GST_TYPE_BITSET? If you change the way GST_TYPE_BITMASK intersects (so that audio layout works) and introduce GST_TYPE_BITSET (that intersects just like bitmask currently does) there wouldn't have to be API breaks.
(In reply to comment #5) > Yeah, except this subset is exactly what doesn't work for channel layout (see > the description). > > If you intersect > { channels: 4, layout: FL FR RL RR } > with > { channels: 4, layout: FL FR SR SR } > and get > { channels: 4, layout: FL FR } > as result it is clearly wrong. It shouldn't intersect in this case. The intersection is correct... but elements are supposed to reject this because channels=4 and length(layout)=2 is invalid. This is similar to what we had in 0.10 with the channels field and the channel-layout array: the caps where only valid if the channel-layout had as many elements as channels. Or am I misunderstanding the problem here?
So who's responsibility is it to check and skip the invalid structure during intersect? Because gstaudioconvert takes the first structure after intersection and doesn't skip it if channel count doesn't match channel layout and prints the following warning WARNING **: audioconvert1: Invalid downstream channel-mask with too few bits set 01:27:53 WARN gst.audioconvert gstaudioconvert.c:632 - - - - - unexpected output channel layout Should every code that calls gst_caps_intersect go through the result and filter out invalid structures?
<__tim> slomo, I am not sure I agree about the channel mask thing <slomo> __tim: how was this handled in 0.10? <__tim> what is 'this' exactly now? <__tim> the array was treated as a whole <slomo> __tim: ah right, nevermind :) <__tim> one problem is that we have added a new 'bitmask' type when we really meant a 'channelmask' type, no? <slomo> how do you mean that? <__tim> something that might be correct for a bitmask in general might not be correct for our channel masks <slomo> __tim: i think the problem is more that there are different semantics for unfixed channel masks (the current intersection works for that) and fixed channel masks (where we have the problem) <__tim> really? <__tim> but we would never express unfixed possible masks as all bits set, would we? <slomo> __tim: and the relation between the channels and channel-mask fields in the caps ($channel-mask needs $channels bits set for fixed caps, and >= $channels bits for unfixed caps) <__tim> but rather a list of possible combinations? <__tim> (or just no field for 'everything goes') <slomo> no, unfixed "everything possible" is all bits set <__tim> right, and that's where the problems start, no? :) <__tim> I note that there is not a single element that advertises a channel mask with all bits set on its template caps <__tim> and that special case is really the only justication for doing a bitwise intersection, no? <slomo> __tim: yes, not only "all bits set" but also the case when more bits are set <slomo> __tim: i think this is not used anywhere yet so could be changed (and we just require a list of channel-masks instead) <slomo> __tim: i probably was trying to be too clever there :) <__tim> :) <__tim> well, would you ever do something like channels=2, channel-mask=FR,FL,RR,RL to indicate 'any two channels of those' ? <__tim> I don't think that's something you'd do in practice <slomo> __tim: so the change would be to replace the intersection, union, subset, etc operations with the ones for integers <slomo> __tim: or am i missing something? <slomo> currently they all do set operations on the bits <__tim> I think so <slomo> ok
commit fb3b53328f19489848f6d25473ad41b75d7f6ca6 Author: Sebastian Dröge <sebastian.droege@collabora.co.uk> Date: Wed Feb 13 17:00:23 2013 +0100 value: Remove set-style bitmask intersection/union/subtraction functions Set operations on the bitmasks don't make much sense and result in invalid caps when used as a channel-mask. They are now handled exactly like integers. This functionality was not used anywhere except for tests. https://bugzilla.gnome.org/show_bug.cgi?id=691370
I'm actually wondering what to do in osxaudio. I'm using a virtual Soundflower[1] 64-channel device to exercise osxaudio. Those are 64 channels with GST_AUDIO_CHANNEL_POSITION_NONE, and theoretically I'd like to be able for a source to push its stereo {L,R} audio into two of those channels. (Core Audio happily accepts that.) Does this even make sense? [1] https://rogueamoeba.com/freebies/soundflower/
To clarify, my concern is about the caps that osxaudiosink should return. I understand a fake 64-channel device might be a non-issue. Here's a real-life scenario with a 5.1 Audio Unit. The resulting osxaudiosink caps will be: audio/x-raw, format=(string)F32LE, layout=(string)interleaved, rate=(int)96000, channels=(int)5, channel-mask=(bitmask)0x000000000000003e; audio/x-raw, format=(string){ S8, U8, S16LE, S16BE, U16LE, U16BE, S24_32LE, S24_32BE, U24_32LE, U24_32BE, S32LE, S32BE, U32LE, U32BE, S24LE, S24BE, U24LE, U24BE, S20LE, S20BE, U20LE, U20BE, S18LE, S18BE, U18LE, U18BE, F32LE, F32BE, F64LE, F64BE }, layout=(string)interleaved, rate=(int)[ 1, 2147483647 ], channels=(int)[ 1, 5 ], channel-mask=(bitmask)0x000000000000003e (First preferred caps, then template caps.) In the template caps, I'm allowing [ 1, 5] channels, and I want channel-mask to indicate that I'm willing to take any of the channels in a 5.1 configuration. However, unless I want to add structures of all possible subsets of channel-mask, it's impossible. Another option is to omit channel-mask entirely (from the template-caps), and fail gst_osx_audio_ring_buffer_acquire. This is similar to the check done in gst_audio_get_channel_reorder_map. As much as I could understand, that's my only option at the moment.
I'm really not sure I understand the usecase. Why would you want to allow *any* channels in 5.1 configurations?
Not *any* channels at all, but a subset of 5.1, e.g. (L,R)-stereo or (L,R,C). I realize that some sinks might not be able to easily accept audio buffers that are a subset, i.e. they'll need someone streaming stereo to generate 5.1 buffers with silence chunks for the missing channels. Other sinks can fill in the silence themselves easily, so how can those sinks advertise their channels, while at the same time advertising they can negotiate a subset of these channels?
Then why not do what multichannel encoders do? Generate structures with all acceptable combinations.
That'll be (2^channels) structures in the caps. It's possible but sounds cumbersome.
How did you arrive to that number? If you have 6 channels it will either be L R C SR SL LFE (or L R C RR RL LFE). That's at most two structures for 6 channels. Or do you expect other channels in 5.1 configuration?
... or any combination of them, e.g. (L,R). I'm calculating the number of possible subsets of channel-mask.