Bug 590363 – a52dec negotiates wrong channel downmix




Summary:	a52dec negotiates wrong channel downmix


Status:	RESOLVED DUPLICATE of bug 570791

Product:	GStreamer
Classification:	Platform
Component:	gst-plugins-ugly
Version:	git master
Hardware:	Other All

Importance:	Normal minor
Target Milestone:	git master
Assigned To:	GStreamer Maintainers
QA Contact:	GStreamer Maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2009-07-31 11:43 UTC by LRN
Modified:	2011-05-20 06:51 UTC

See Also:
GNOME target:	---
GNOME version:	---

Description LRN 2009-07-31 11:43:04 UTC

a52dec picks the first structure in the caps allowed downstream and uses it.
Instead, a52dec should:
1) Search for a structure that exactly matches its natural channel count and channel layout and, if one is found, output audio as-is.
2) If 1) fails, search for a structure that matches its natural channel count (but not layout) and, if one is found, output audio with this channel count (moving its channels to match the layout in the caps?).
3) If 2) fails, search for a structure with a channel count lower than its natural channel count and output that channel count (with downmixing?).
4) If 3) fails, search for a structure that has the smaller channel count and ... upmix? Or maybe just use its layout to output the native channel count?
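
The search order in steps 1) to 3) could be sketched roughly as follows. This is only a model: caps structures are represented as plain dicts with a "channels" count and an optional "positions" tuple, and `pick_structure` is a hypothetical name, not GStreamer or a52dec API. Preferring the largest lower channel count in step 3 is an assumption, and step 4 is left unimplemented because the proposal leaves it open.

```python
# Hypothetical model of the proposed search order; none of these names
# are GStreamer API. A structure is a dict: {"channels": int, "positions": tuple}.

def pick_structure(native_positions, downstream_structures):
    """Return (structure, action) following steps 1-3, or None if nothing fits."""
    native = tuple(native_positions)
    native_count = len(native)

    # Step 1: same channel count and identical layout -> output as-is.
    for s in downstream_structures:
        if s["channels"] == native_count and s.get("positions") == native:
            return s, "passthrough"

    # Step 2: same channel count, different layout -> move channels around.
    for s in downstream_structures:
        if s["channels"] == native_count:
            return s, "reorder"

    # Step 3: lower channel count -> downmix.
    # Assumption: prefer the largest of the lower counts.
    lower = [s for s in downstream_structures if s["channels"] < native_count]
    if lower:
        return max(lower, key=lambda s: s["channels"]), "downmix"

    # Step 4 is an open question in the proposal, so report failure here.
    return None
```

With a 6-channel native layout and downstream caps listing stereo first and the exact 6-channel layout second, this sketch returns the 6-channel structure, whereas the behaviour described above would have taken stereo.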

P.S. This is really confusing, and somewhat wrong. libavcodec, for example, defines a special stereo downmix layout, which tells the decoder to downmix audio with more than two channels to stereo instead of picking the front-left and front-right channels and pushing them as-is. For example, there could be 6-channel audio (left, right, center, lfe, rear left, rear right) and 5-channel downstream caps with channel-positions="left, right, center, rear left, rear right". It is ambiguous. Should the decoder downmix 6->5 (in this case, mix LFE into all other channels) or just pick 5 channels (output 5 channels as-is and discard LFE)? This is not addressed by the current channel layout signaling system.
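
To make the 6->5 ambiguity concrete, here is a small sketch of both readings applied to one frame of samples. The channel order is taken from the example above; the function names and the `lfe_gain` value are purely illustrative assumptions and are not taken from a52dec, liba52 or libavcodec.

```python
# Channel order assumed from the example:
# left, right, center, lfe, rear left, rear right.
LFE_INDEX = 3

def drop_lfe(frame):
    """Reading A: pass the five other channels through and discard LFE."""
    return [s for i, s in enumerate(frame) if i != LFE_INDEX]

def mix_lfe(frame, lfe_gain=0.5):
    """Reading B: add a scaled copy of LFE to every remaining channel.

    lfe_gain is an arbitrary illustrative value, not a real decoder coefficient.
    """
    lfe = frame[LFE_INDEX]
    return [s + lfe_gain * lfe for i, s in enumerate(frame) if i != LFE_INDEX]
```

Both functions produce five channels that satisfy the same downstream caps, which is the point: the caps alone do not say which of the two the decoder is expected to do.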

P.P.S. The situation is worsened by the fact that audioconvert (which is usually placed after any audio decoder) does not know how to convey downstream caps exactly and discards channel layout information (and often merges channel count information). Because of that, it cannot be used for channel reordering (as in 3), and channel layout cannot be negotiated (as in 2) when audioconvert is placed after the decoder.

Comment 1 Sebastian Dröge (slomo) 2009-08-14 08:29:24 UTC

(In reply to comment #0)
> a52dec picks first structure in the caps allowed downstream and uses it.
> Instead, a52dec should
> 1) Search for a structure that matches its natural channel count and channel
> layout exactly and (if it is found) output audio as-is.
> 2) if 1) fails, search for a structure that matches its natural channel count
> (but not layout) and (if it is found) output audio with this channel count,
> (moving its channels to match the layout in the caps?)
> 3) if 2) fails, search for a structure that does have channel count less than
> its natural channel count and output that channel count (with downmixing?).
> 4) If 3) fails, search for a structure that does have the smaller channel count
> and ... upmix? Or maybe just use its layout to output native channel count?

Step 2) won't always work, as a52dec can't convert to every possible channel layout. For example, it can't convert to (REAR_LEFT,REAR_RIGHT). Not that this is a useful channel layout, but it would be a problem ;)
In step 2 you could check if there's a structure with the natural channel count and a permutation of the natural channel layout.

Step 3) should rather be a search for a structure with a higher channel count and a "compatible" channel layout, i.e. one that a52dec can convert to. The next step would be the same with a lower channel count.

And if all that fails, return NOT_NEGOTIATED ;)

Also note that if downstream doesn't specify a channel layout but only a channel count, this means that you can provide whatever channel layout you want (i.e. downstream doesn't care at all what the channel layout is; the volume element is an example).
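
Put together, the revised order from this comment might look like the sketch below. It is a model only: structures are dicts with a "channels" count and an optional "positions" tuple, `can_convert` is a hypothetical predicate standing in for "a layout a52dec can convert to", and `NOT_NEGOTIATED` is a plain string standing in for the real flow return. Treating a structure without positions as acceptable for any channel count is an assumption; it presumes the decoder can produce some layout with that many channels.

```python
NOT_NEGOTIATED = "not-negotiated"  # stand-in for the GStreamer flow return

def negotiate(native, structures, can_convert):
    """native: tuple of position names; structures: dicts with "channels"
    and optional "positions" (a tuple); can_convert: hypothetical predicate
    telling whether the decoder can produce a given layout."""
    n = len(native)

    def layout(s):
        return s.get("positions")  # None: downstream does not care

    def compatible(s):
        return layout(s) is None or can_convert(layout(s))

    passes = [
        # Native count with the exact layout, or with no layout constraint.
        ("passthrough",
         lambda s: s["channels"] == n and layout(s) in (None, native)),
        # Native count with a permutation of the native layout.
        ("reorder",
         lambda s: s["channels"] == n and layout(s) is not None
         and sorted(layout(s)) == sorted(native)),
        # Higher channel count with a layout the decoder can convert to.
        ("higher", lambda s: s["channels"] > n and compatible(s)),
        # Lower channel count with a layout the decoder can convert to.
        ("lower", lambda s: s["channels"] < n and compatible(s)),
    ]
    for action, matches in passes:
        for s in structures:  # caps order expresses downstream preference
            if matches(s):
                return s, action
    return None, NOT_NEGOTIATED
```

Each pass walks the structures in caps order, so downstream preference is still respected within a pass; only the priority between passes is imposed by the decoder.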

> P.S. This is really confusing, and somewhat wrong. libavcodec, for example,
> defines special stereo downmix layout, which is used to tell the decoder to
> downmix more-than-2-channel-audio to stereo instead of picking front-left and
> front-right channels and pushing them as-is. For example, there could be a
> 6-channel audio (left, right, center, lfe, rear left, rear right) and a
> 5-channel downstream caps with channel-positions="left, right, center, rear
> left, rear right". It is ambiguous. Should the decoder downmix 6->5 (in this
> case - mix LFE into all other channels) or just pick 5 channels (output 5
> channels as is and discard LFE)? This is not addressed by current channel
> layout signaling system.

a52dec also has special N->2 downmix matrices, and they're used when downmixing in a52dec IIRC. In your 6->5 example, IMHO a52dec should use its internal downmixing matrix to downmix to the 5 channels; if no such thing exists, it should probably just drop LFE.
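
For readers unfamiliar with downmix matrices, a minimal sketch of the idea follows: each output channel is a weighted sum of the input channels. The coefficients below are made up for illustration and are not liba52's actual tables; the input order (L, R, C, LFE, RL, RR) is likewise an assumption.

```python
def downmix(frame, matrix):
    """Apply a downmix matrix to one frame of samples.

    The matrix has one row per output channel and one column per input channel.
    """
    return [sum(c * x for c, x in zip(row, frame)) for row in matrix]

# Illustrative 5.1 -> stereo coefficients, input order L, R, C, LFE, RL, RR.
# These values are an assumption for the example, not liba52's real matrix.
G = 0.5
STEREO_FROM_5_1 = [
    [1.0, 0.0, G, 0.0, G, 0.0],  # left output
    [0.0, 1.0, G, 0.0, 0.0, G],  # right output
]

# "Just drop LFE" for the 6->5 case is the same operation with a selection
# matrix: each output row copies exactly one non-LFE input channel.
DROP_LFE = [[1.0 if j == i else 0.0 for j in range(6)] for i in (0, 1, 2, 4, 5)]
```

Seen this way, the two options discussed for the 6->5 case differ only in which matrix is applied, so the fallback costs nothing extra in the mixing code itself.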

> P.P.S. Situation is worsened by the fact that audioconvert (which is usually
> placed after any audio decoder) does not know how to convey downstream caps
> exactly and discards channel layout information (and often merges channel count
> information). Because of that it is not possible to use it for channel
> reordering (as in 3) and it is not possible to negotiate channel layout (as in
> 2) when audioconvert is placed after the decoder.

audioconvert tries to pass the channel layout downstream as-is if possible; then it tries to pass a permutation of the channel layout downstream; and if everything fails, it does magic mixing (magic because it has no special knowledge about the channels except their positions, i.e. it can't use a special 5.1->2 downmix matrix as should be used for MPEG, for example).
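
The three fallbacks described here can be modelled in a few lines. This is a sketch of the behaviour as described in this comment, not audioconvert's actual code; `conversion_strategy` and `reorder` are hypothetical names, and layouts are plain tuples of position names.

```python
def conversion_strategy(src_positions, dst_positions):
    """Pick the cheapest conversion between two layouts, in the order above."""
    if tuple(src_positions) == tuple(dst_positions):
        return "passthrough"
    if sorted(src_positions) == sorted(dst_positions):
        return "reorder"
    return "mix"  # generic position-based mixing, no codec-specific matrix

def reorder(frame, src_positions, dst_positions):
    """Reorder one frame of samples from the source to the destination layout.

    Only valid when the destination is a permutation of the source.
    """
    if sorted(src_positions) != sorted(dst_positions):
        raise ValueError("layouts are not permutations of each other")
    index = {pos: i for i, pos in enumerate(src_positions)}
    return [frame[index[pos]] for pos in dst_positions]
```

The "mix" branch is where codec knowledge is lost: a generic converter only sees positions, so it cannot know which codec-specific matrix the stream was meant to be downmixed with.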

If audioconvert is after a52dec, what caps do you get? You should probably get the downstream caps as the first structures and then the "audio ANY caps" afterwards. In that case you can still do your special handling in a52dec.
If this is not the case, audioconvert should be fixed, I guess ;)

Do you want to work on this? :)

Comment 2 Stefan Sauer (gstreamer, gtkdoc dev) 2010-07-09 11:38:59 UTC

Maybe a52dec could also send a dynamic downmix matrix as an event so that audioconvert would pick it up, or are the "special N->2 downmix matrices" you mentioned static ones?

Comment 3 Tim-Philipp Müller 2010-07-09 13:54:21 UTC

Stefan: negotiation bugs aside, this doesn't really help, because the problem is usually that the sink doesn't advertise the correct number of channels and things are then downmixed outside of GStreamer (alsa, pulse etc.).

Comment 4 Sebastian Dröge (slomo) 2011-05-20 06:51:53 UTC

Thanks for the bug report. This particular bug has already been reported into our bug tracking system, but please feel free to report any further bugs you find.

*** This bug has been marked as a duplicate of bug 570791 ***