Bug 360246 – [audioconvert] Optionally apply dithering

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 360246 - [audioconvert] Optionally apply dithering


Summary:	[audioconvert] Optionally apply dithering


Status:	RESOLVED FIXED

Product:	GStreamer
Classification:	Platform
Component:	gst-plugins-base
Version:	git master
Hardware:	Other All

Importance:	Normal enhancement
Target Milestone:	0.10.14
Assigned To:	Sebastian Dröge (slomo)
QA Contact:	GStreamer Maintainers

URL:
Whiteboard:

Duplicates:	428935 (view as bug list)
Depends on:
Blocks:

Reported:	2006-10-06 18:58 UTC by René Stadler
Modified:	2007-06-28 20:36 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
dither.diff (56.90 KB, patch) 2007-05-03 21:17 UTC, Sebastian Dröge (slomo)	none	Details \| Review
dither.diff (56.31 KB, patch) 2007-05-07 15:41 UTC, Sebastian Dröge (slomo)	none	Details \| Review
dither.diff (57.53 KB, patch) 2007-06-27 21:30 UTC, Sebastian Dröge (slomo)	committed	Details \| Review

Description René Stadler 2006-10-06 18:58:48 UTC

Dithering is used to reduce the harmonic distortion that can occur as part of the conversion of audio data from e.g. 32-bit floating point to 16-bit integer format (data coming from effect processing converted for playback or final long term storage).

See http://en.wikipedia.org/wiki/Dithering for more information about (audio) dithering.  At least one obvious implementation under LGPL is available, as part of the WaveGain program (http://www.rarewares.org/others.html).

It's a useful feature to have at some point in the future.  The target is most probably the audioconvert element, as this is where any reduction of the bandwidth actually happens.

Comment 1 Stefan Sauer (gstreamer, gtkdoc dev) 2007-02-02 08:40:20 UTC

I would like to see this added. If you can come up with a patch I will review and support it.

Maybe first Bug #339837 needs to be solved and then the dithering should also support 64 bit float.

Comment 2 Sebastian Dröge (slomo) 2007-02-05 13:01:42 UTC

René, are you already working on this? Otherwise I'll take a look at the wavegain dithering code and try to port it to audioconvert in the next days.

Comment 3 René Stadler 2007-02-05 14:49:11 UTC

I looked into this more closely, but there isn't any code yet really.  Feel free to go ahead.

It might make sense to place this functionality into a separate element after all.  Having it in audioconvert makes it easier to use it, but does not cover all use cases it seems.

Comment 4 Stefan Sauer (gstreamer, gtkdoc dev) 2007-02-05 15:09:05 UTC

dithering should only be active when going from float/double to 16bit int or when going from >16bit int to 16bit int. It sounds bad when used with 8bit (noise is too loud). The implementation should use a enum to select:
???-mode={None, Dithering, NoiseShaping }

http://en.wikipedia.org/wiki/Dithering
http://en.wikipedia.org/wiki/Noise_shaping
http://www.harmony-central.com/articles/tips/dithering/
http://www.digitalsignallabs.com/noiseb.ps

Comment 5 Sebastian Dröge (slomo) 2007-02-05 16:36:54 UTC

René, which use cases do you mean?

Thanks for the URLs, Stefan. I'll look at them later tonight :)
At least RPDF/TPDF should be fairly easy to implement, for noise shaping no idea yet...

Comment 6 René Stadler 2007-02-06 15:00:26 UTC

There is the edge case of having a (16 bit) integer-only pipeline with a single processing step.  In this case, the conversion to float and back is a no-op and can be skipped.

Also, since this is technically just an effect filter like any other, people might expect to be able to use it as such.  This is not possible if it's tied to the conversion as part of audioconvert.

Like already said, being part of audioconvert makes it easier to use it (and probably harder to misuse it).  I must say that I can't really decide on either way, so consider myself neutral :-)

Comment 7 Sebastian Dröge (slomo) 2007-02-06 15:14:41 UTC

So the best would probably be to add a "audiodither" element that would be added by decodebin/playbin if useful and specified by some property (or even default?).

This element then would implement dithering and noise shaping, selected by a property.

Comment 8 Tim-Philipp Müller 2007-04-12 08:55:58 UTC

*** Bug 428935 has been marked as a duplicate of this bug. ***

Comment 9 David Schleef 2007-04-13 21:47:01 UTC

I have some test code for dithering, error feedback, and noise shaping that works fairly well.  I think it would be a good idea to put these into audioconvert and enable them by default.  I see audioconvert as being an element that Just Works for the average user -- if we want to provide duplicate functionality somewhere else that is more configurable, great.

The techniques tend to need to be used together, since they counteract each other's bad features.  When all used together, the main badness is that it adds a lot of noise at high frequencies.  This is ok if the output is going straight to a sound card and then to speakers or headphones, but is very bad if you intend to do any kind of post-processing.  Like, say, letting alsa's dmix touch it.

Another problem that I'm trying to get around is that the literature designs filters based on the standard F-weighted equi-loudness curves.  This assumes that the average person cannot hear above about 16 kHz, so it pushes most of the noise to that region.  Someone with better than average hearing (you know, about *half* of people) will hear this clearly and find it stressful and/or annoying.

Hopefully I'll find a pointer to research exploring the 90% percentile of hearing, or similar.  I think that would be more useful.

Comment 10 Sebastian Dröge (slomo) 2007-04-14 07:58:55 UTC

David, can you put your test code online somewhere?

I planned to do this as a small part of my SoC project and already have everything needed in my head, only the code is missing but should be possible to be written in a day :)
Having a look at your sample code for some additional ideas might be nice though unless you have everything ready anyway...

And contrary to what I said before I planned it as an addition to audioconvert but wasn't sure if it should be enabled by default or not only because of people maybe assuming the old behaviour (just look at the unit test of audioconvert).

Comment 11 David Schleef 2007-04-14 22:06:31 UTC

git clone git://people.freedesktop.org/~ds/cog

in cog/gst/gstdither.c.

The code in gstdither.c quantizes int16_t audio into values that can be directly shifted into int8_t.  I.e., the low 8 bits of each output sample is 0.  I tried integrating this code into audioconvert.c and got frustrated -- I'd want to rewrite audioconvert.c into four stages: unpack, mix, quantize, pack, with better control over intermediate buffers.

Since audioconvert uses int32_t as the unpacked format, the code needs to be adjusted to check for overflows and/or use int64_t as intermediates.  Another possibility is to shift the 32 bit values right by 8 bits (thus losing 8 LSBs) before quantizing -- converting 32->24 doesn't need dithering anyway.

There's also code for converting float to int16_t, but it creates the float samples by converting from int16_t.

The random number generator (for dithering) is from liboil.  Liboil has good random number generation, but it's used very poorly here.  Ideally, audioconvert should have a random number generator context that it uses only to generate dithering, and reset it when other element state gets reset (READY->PAUSED).  Liboil doesn't have contextual RNG, whereas glib does.  So I'd recommend using glib's RNG until liboil is fixed.

Comment 12 Sebastian Dröge (slomo) 2007-04-15 10:00:51 UTC

My plan was to put the dithering and noise shaping in the pack functions which would get an additional parameter (saved in the audioconvert structure) containing the random number generator for dithering, the errors for noise shaping, etc.

And then we would get many many different functions...
I wanted to implement for dithering:
a) RPDF
b) TPDF
c) high frequency TPDF
d) maybe gaussian

and for noise shaping 1st order, 3rd order and maybe 16 or 20th order. Coefficients for the latter would be generated by dmaker[0].

Unfortunately this would give us either 5*3 unpack functions per target format or a few function calls per sample in the unpack functions.

Your idea of refactoring audioconvert into unpack, mix, quantize and pack sounds better, I'll care for it unless you want to do it :)

For the random numbers I planned to use GRandom anyway for now as it's fairly fast...

As default settings the best would probably be:
- X to Y bits with Y >= X: nothing
- X to Y bits with Y >= 20: nothing
- X to Y bits with Y < 20: TPDF + 1st order noise shaping

Then either have just this implemented in audioconvert and create and additional element which has all this configurable or have this as default in audioconvert and add some parameters for every other possibility...


[0] http://shibatch.sourceforge.net/

Comment 13 Sebastian Dröge (slomo) 2007-04-16 17:05:47 UTC

If the default in audioconvert is to add dithering and noise shaping and it can't be disabled at all we have a problem with the unit test btw.

So probably the better solution would be to have it enabled by default and a few parameters to change the methods used (and disable it for the unit test).

Comment 14 David Schleef 2007-04-16 18:36:20 UTC

It shouldn't matter.  Multiple runs should still produce the same result, although the result will be different from before the change.

Comment 15 Sebastian Dröge (slomo) 2007-04-16 18:46:45 UTC

This would require that we initialize GRandom always with the same value, that's fine.

Still I would prefer to have everything in audioconvert and some parameters to set it all up and then have sane defaults. Otherwise we would have nearly the same element twice somewhere :/

Where did you get your coefficients for noise shaping from btw?


And you thought about the refactoring like this?
unpack: blow everything up to 32 bit ints (or doubles)
mix: mix the channels
quantize: add dithering and noise shaping, correctly round the values and set the lower n bits to zero while still working on 32 bit ints
pack: shift and write to the target format


Using 64 bit integers as intermediate format sounds like a large performance degradation though but would make everything much easier. I guess the best would be to stay with 32 bit ints/doubles as intermediate format until everybody uses a 64 bit architecture or what do you think? :)

Comment 16 David Schleef 2007-04-16 19:06:48 UTC

I designed the filter myself.  It's one of my superpowers.

32-bit intermediates should be fine for quantizing 28-bit to anything smaller, plus anything less than 28 bits.  There's no point in using 64 bit intermediates except to make the code slightly less complicated and a lot slower.

Comment 17 Sebastian Dröge (slomo) 2007-04-20 12:40:43 UTC

OK, I have everything almost finished, it only needs some polishing and harder testing next week, then I'll attach a patch.

This will include a split of the pack functions into quantize and pack, 4 dithering modes (none, rpdf, tpdf and tpdf high frequency) and 4 noise shaping modes (none, simple one-pole, medium 6-pole with David's coefficients, high quality 20-pole with coefficients generated by dmaker for the most common sampling rates and the one nearest to the current sampling rate is chosen).

Dithering is tpdf by default when target depth < source depth and target depth < 20, noise shaping is disabled by default.
This decision is made because of the following reasons:
a) soundcards' AD converter currently only have a SNR up to something lower than what 20 bit samples provide, thus the noise generated by the AD converter is already higher than the noise that would be added by dithering and as such adding dithering is useless (and would even add more noise).
b) high frequency tpdf and noise shaping should only be used on the conversion step, otherwise (if used more than once one the same samples) it will cause distortions.

Apps like buzztard for example would then use audioconvert with default settings for everything but the last conversion and would use tpdf-hf and some kind of noise shaping for the last conversion. The same could probably be done for playbin too.


Whenever noise shaping is used audioconvert uses double as intermediate format as otherwise the computation would become a bit ugly to impossible. For only dithering or none of both the current intermediate formats are used and for both disabled the results are exactly the same as before.

Comment 18 Sebastian Dröge (slomo) 2007-04-20 12:41:45 UTC

Ah, one question David... are your noise shaping coefficients working good for all sample rates or will they move the noise to undesired frequencies for some?

Comment 19 David Schleef 2007-04-20 17:04:29 UTC

It is designed for 44100.  It will work ok for 48000.  I don't recommend it for anything below 44100.

Below 44100, (actually, around 32000), you'd probably want to use a (roughly) flat curve.  You'll get some advantage by reshaping out of 1-5 khz, but it's minimal.

Comment 20 Sebastian Dröge (slomo) 2007-04-21 06:40:28 UTC

What about sampling rates above 48000? For example 96000?

Also do you mean with a flat curve? How would this be done?

Comment 21 Sebastian Dröge (slomo) 2007-05-03 21:17:01 UTC

Created attachment 87492 [details] [review]
dither.diff

Ok, here is the current version.

The only thing that needs improvement is the FIXME in gstaudioquantize.c at the top.

For some reason when either doing dithering with integer as intermediate format or dithering + noise shaping with double as intermediate format one gets harmonics.
When doing dithering only with double as intermediate format (just change the NS stuff in line 323 to NONE_FUNC for example) there are no harmonics and a constant, white, noise floor. Does anybody know why this happens?

Would be nice if someone could review the patch :)

Comment 22 Stefan Sauer (gstreamer, gtkdoc dev) 2007-05-04 06:19:49 UTC

One comment regading the source, we should not use 'this' as it is a c++ keyword. I recommend using 'self' like elsewhere.

The code look good to me although its becomming a complex beast. maybe it would help to replace things like:
if ((ctx->in.is_int || ctx->out.is_int) && ctx->ns == NOISE_SHAPING_NONE) {...
with macros
if (CTX_IS_NO_CONVERT(ctx)) { ...
to express what the condition is meant to check for (please use appropriate name :)).

Comment 23 Sebastian Dröge (slomo) 2007-05-07 15:41:45 UTC

Created attachment 87719 [details] [review]
dither.diff

Ok, here now with the macro Stefan suggested and a bugfix for the noise shaping. Now there are no (very small) harmonics when using noise shaping instead of fairly large ones. The harmonics for dithering on integer are still there though :/

Comment 24 Sebastian Dröge (slomo) 2007-05-15 08:10:02 UTC

Ok, does someone have an oppinion on the latest patch, maybe you David? IMHO this can be committed as is. It's not yet _perfect_ but definitely an improvement to what was there before.

Comment 25 David Schleef 2007-05-15 19:30:36 UTC

The patch is hard to follow, I'll have to spend some time with the actual code.  However, I've noticed:

 - the scale factor is still (1<<31)-1 instead of 1<<31

 - there's at least one place doing a float divide by a constant instead of a multiply.  (Note that the compiler does the right thing for x = y*(1/12345.0))

 - Don't label things with "high quality" (e.g., NOISE_SHAPING_HQ), since whether or not the quality is high depends on your standards.  Describe exactly what it is, then add the modifier "(highest quality)" if appropriate.

 - Put references in the source for any research you've done, especially where the filter coefficients came from.

After writing liboil, I find it very difficult to review mathematical code that isn't written dirt-simple.  I'd encourage you to rewrite this in such a way that there's an easy-to-understand reference implementation, and whatever complicated optimized implementation, and some way to measure that they give the same results.

Comment 26 Sebastian Dröge (slomo) 2007-06-27 21:30:59 UTC

Created attachment 90786 [details] [review]
dither.diff

new patch, some more docs/explainations...
HQ is now just HIGH, does anybody have a better name?

I guess this can be committed now unless someone strongly disagrees

Comment 27 Sebastian Dröge (slomo) 2007-06-28 20:36:04 UTC

Ok, committed:

2007-06-28  Sebastian Dröge  <slomo@circular-chaos.org>

	* gst/audioconvert/Makefile.am:
	* gst/audioconvert/audioconvert.c: (audio_convert_get_func_index),
	(check_default), (audio_convert_prepare_context),
	(audio_convert_clean_context), (audio_convert_convert):
	* gst/audioconvert/audioconvert.h:
	* gst/audioconvert/gstaudioconvert.c:
	(gst_audio_convert_dithering_get_type),
	(gst_audio_convert_ns_get_type), (gst_audio_convert_class_init),
	(gst_audio_convert_init), (gst_audio_convert_set_caps),
	(gst_audio_convert_set_property), (gst_audio_convert_get_property):
	* gst/audioconvert/gstaudioconvert.h:
	* gst/audioconvert/gstaudioquantize.c:
	(gst_audio_quantize_setup_noise_shaping),
	(gst_audio_quantize_free_noise_shaping),
	(gst_audio_quantize_setup_dither),
	(gst_audio_quantize_free_dither),
	(gst_audio_quantize_setup_quantize_func),
	(gst_audio_quantize_setup), (gst_audio_quantize_free):
	* gst/audioconvert/gstaudioquantize.h:
	Implement dithering and noise shaping in audioconvert. By default now
	TPDF dithering (and no noise shaping) will be used when converting
	from a higher bit depth to 20 bit depth or smaller, otherwise
	everything will be as it is now.
	For the last audioconvert in a pipeline it would make sense to
	use some kind of noise shaping, enabling it by default for all
	conversions would give undesired results though. Fixes #360246.
	* tests/check/elements/audioconvert.c: (setup_audioconvert),
	(GST_START_TEST):
	Adjust unit test for the new audioconvert.