GNOME Bugzilla – Bug 746661
audioconvert: slow dithering on architectures without 64-bit integer divide (e.g. armv7)
Last modified: 2015-03-24 15:53:07 UTC
audioconvert's quantization functions (e.g. gst_audio_quantize_quantize_signed_tpdf_none) are slow on architectures that don't have 64-bit integer divide (e.g. ARMv7). The culprit is gst_fast_random_int32_range: static inline gint32 gst_fast_random_int32_range (gint32 start, gint32 end) { gint64 tmp = gst_fast_random_uint32 (); tmp = (tmp * (end - start)) / G_MAXUINT32 + start; return (gint32) tmp; } A 64-bit integer is divided by G_MAXUINT32. On armv7 iOS, this calls library function __divdi3, which is 20 times (!) slower compared to same iOS device running an arm64 build. According to Sebastian Dröge, since gst_fast_random_int32_range is only used for dither noise generation, we might replace it with whatever works better for that specific purpose.
According to what I see, in all integer quantizations: dither = (1<<(scale - 1)) or (1<<(scale)) ... [1] gst_fast_random_int32_range (bias - dither, bias + dither) or [2] gst_fast_random_int32_range (bias - dither, bias + dither - 1) Therefore, (end - start) is always either 2^n or 2^n-1 (for some 'n'). So why can't we just have two functions: return start + (gst_fast_random_uint32 () & (end - start - 1)); // for [1] and return start + (gst_fast_random_uint32 () & (end - start)); // for [2] ?
(For [2], we'll be going out of bounds slightly, so we should do something in that case -- e.g. conditionally subtract.)
Created attachment 300167 [details] [review] Patch - audioconvert: Avoid int division in quantization I'm not sure why: 1) we're adding bias twice in RPDF 2) some dither values are chosen out of 2^n, others out of (2^n)+1 About (2), RPDF is supposed to be simply rolling a 6-face dice twice instead of a 12-face dice once (TPDF), right? I took the liberty to unify it. Let me know if I've done something stupid. GIT COMMIT MESSAGE ------------------ Since range size is always 2^n, we can simply use modulo (implemented with a bitmask). The previous implementation used 64-bit integer division, which is done in software on ARMv7. Although the divisor was constant, the division could not be transformed into "multiplication by magic number" since the dividend was 64-bit. Further, we fix TPDF's noise range to be the same as RPDF's. Previously, RPDF's noise ranged: { bias - dither, bias + dither } while TPDF's noise ranged: { bias - dither/2, bias + dither/2 - 1 } + { bias - dither/2, bias + dither/2 - 1 } = { 2 * bias - dither, 2 * bias + dither - 2 } Now, both range: { bias - dither, bias + dither - 1 } Removing unused (expensive) gst_fast_random_(u)int32_range functions.
In the message above, I swapped TPDF and RPDF -- should've read: I'm not sure why: 1) we're adding bias twice in TPDF 2) some dither values are chosen out of 2^n, others out of (2^n)+1 About (2), TPDF is supposed to be simply rolling a 6-face dice twice instead of a 12-face dice once (RPDF), right?
How does this perform on 64-bit arch compared to the old code? Ie on armv8/amd64? If it's slower, it may make sense to have some kind of #ifdef.
On armv7, gst_audio_quantize_quantize_signed_tpdf_none is down to 0.5% from 23%. I've just tested on arm64 -- gst_audio_quantize_quantize_signed_tpdf_none is down to 0.5% from 1.0%. I'd be surprised if shift + bitwise-and will be slower than integer division, on any architecture. What I'm more concerned about is correctness. Please have a look, and I'm also waiting for Sebastian to have a look since he wrote this (back in 2007...).
Seems I need to do some archaeology and research :) I'm pretty sure the values used there are correct, I did some experiments with measuring the quantization noise with the different dither methods and different parameters there.
Review of attachment 300167 [details] [review]: ::: gst/audioconvert/gstaudioquantize.c @@ +173,3 @@ #define ADD_DITHER_TPDF_I() \ + rand = bias + RANDOM_INT_DITHER(dither) \ + + RANDOM_INT_DITHER(dither); \ This is wrong now, see INIT_DITHER_TPDF_I(). It was halving the bias, and then adding it here twice. It's still halving now but you only add the bias once now. Just change it in INIT_DITHER_TPDF_I(). @@ -169,3 @@ #define ADD_DITHER_TPDF_I() \ - rand = gst_fast_random_int32_range (bias - dither, \ - bias + dither - 1) \ The -1 here is very curious, I don't know yet where it comes from. It doesn't seem to make any sense. @@ +198,1 @@ rand = tmp_rand - last_random[chan_pos]; \ Note that here you have bias twice again. Once in tmp_rand, once in last_random[chan_pos]. Maybe just add the bias once, and not have it in last_random[chan_pos]
The -1 is bogus AFAICS, there's no reason for it to be there. Patch looks good except for the things mentioned above :)
Created attachment 300205 [details] [review] Patch - audioconvert: Avoid int division in quantization Changes: * Get rid of half-bias * Don't discard bias in TPDF_HF * Correct git commit message about TPDF having double-bias
Created attachment 300207 [details] [review] Patch - Eliminate unsigned quantizers
commit 3dc3aa4e3b47341c199c6e7e2e8974b340031288 Author: Ilya Konstantinov <ilya.konstantinov@gmail.com> Date: Tue Mar 24 17:28:51 2015 +0200 audioconvert: Eliminate unsigned quantizers audio_convert_convert unpacks to default format (signed) before calling quantize, and the unsigned variants were equivalent to signed anyway, so we just get rid of them. commit 7b398701cf003d4346b4f58a4597d4bafe78c839 Author: Ilya Konstantinov <ilya.konstantinov@gmail.com> Date: Tue Mar 24 03:01:22 2015 +0200 audioconvert: Avoid int division in quantization Since range size is always 2^n, we can simply use modulo (implemented with a bitmask). The previous implementation used 64-bit integer division, which is done in software on ARMv7. Although the divisor was constant, the division could not be transformed into "multiplication by magic number" since the dividend was 64-bit. The now-unused and not-so-fast gst_fast_random_(u)int32_range functions were removed. Also, implementing bug fixes: 1) ADD_DITHER_TPDF_HF_I no longer discards bias. 2) We change TPDF's noise range to be the same as RPDF's. Previously, RPDF's noise ranged: { bias - dither, bias + dither } while TPDF's noise ranged: { bias/2 - dither/2, bias/2 + dither/2 - 1 } + { bias/2 - dither/2, bias/2 + dither/2 - 1 } = { bias - dither, bias + dither - 2 } Now, both range: { bias - dither, bias + dither - 1 } https://bugzilla.gnome.org/show_bug.cgi?id=746661