Bug 746661 – audioconvert: slow dithering on architectures without 64-bit integer divide (e.g. armv7)

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 746661 - audioconvert: slow dithering on architectures without 64-bit integer divide (e.g. armv7)


Summary:	audioconvert: slow dithering on architectures without 64-bit integer divide (...


Status:	RESOLVED FIXED

Product:	GStreamer
Classification:	Platform
Component:	gst-plugins-base
Version:	git master
Hardware:	Other All

Importance:	Normal normal
Target Milestone:	1.5.1
Assigned To:	GStreamer Maintainers
QA Contact:	GStreamer Maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2015-03-23 18:28 UTC by Ilya Konstantinov
Modified:	2015-03-24 15:53 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Patch - audioconvert: Avoid int division in quantization (5.12 KB, patch) 2015-03-24 02:04 UTC, Ilya Konstantinov	none	Details \| Review
Patch - audioconvert: Avoid int division in quantization (5.99 KB, patch) 2015-03-24 15:43 UTC, Ilya Konstantinov	committed	Details \| Review
Patch - Eliminate unsigned quantizers (3.50 KB, patch) 2015-03-24 15:44 UTC, Ilya Konstantinov	committed	Details \| Review

Description Ilya Konstantinov 2015-03-23 18:28:46 UTC

audioconvert's quantization functions (e.g. gst_audio_quantize_quantize_signed_tpdf_none) are slow on architectures that don't have 64-bit integer divide (e.g. ARMv7).

The culprit is gst_fast_random_int32_range:

  static inline gint32
  gst_fast_random_int32_range (gint32 start, gint32 end)
  {
    gint64 tmp = gst_fast_random_uint32 ();

    tmp = (tmp * (end - start)) / G_MAXUINT32 + start;

    return (gint32) tmp;
  }

A 64-bit integer is divided by G_MAXUINT32. On armv7 iOS, this calls library function __divdi3, which is 20 times (!) slower compared to same iOS device running an arm64 build.

According to Sebastian Dröge, since gst_fast_random_int32_range is only used for dither noise generation, we might replace it with whatever works better for that specific purpose.

Comment 1 Ilya Konstantinov 2015-03-23 18:47:47 UTC

According to what I see, in all integer quantizations:
dither = (1<<(scale - 1))
                or
         (1<<(scale)) 
...
[1] gst_fast_random_int32_range (bias - dither, bias + dither)
   or
[2] gst_fast_random_int32_range (bias - dither, bias + dither - 1)

Therefore, (end - start) is always either 2^n or 2^n-1 (for some 'n').
So why can't we just have two functions:

 return start + (gst_fast_random_uint32 () & (end - start - 1)); // for [1]

and

 return start + (gst_fast_random_uint32 () & (end - start)); // for [2]

?

Comment 2 Ilya Konstantinov 2015-03-23 19:07:24 UTC

(For [2], we'll be going out of bounds slightly, so we should do something in that case -- e.g. conditionally subtract.)

Comment 3 Ilya Konstantinov 2015-03-24 02:04:50 UTC

Created attachment 300167 [details] [review]
Patch - audioconvert: Avoid int division in quantization

I'm not sure why:
 1) we're adding bias twice in RPDF
 2) some dither values are chosen out of 2^n, others out of (2^n)+1

About (2), RPDF is supposed to be simply rolling a 6-face dice twice instead of a 12-face dice once (TPDF), right?

I took the liberty to unify it. Let me know if I've done something stupid.

GIT COMMIT MESSAGE
------------------

Since range size is always 2^n, we can simply use modulo (implemented
with a bitmask).

The previous implementation used 64-bit integer division, which is
done in software on ARMv7. Although the divisor was constant, the
division could not be transformed into "multiplication by magic number"
since the dividend was 64-bit.

Further, we fix TPDF's noise range to be the same as RPDF's.
Previously, RPDF's noise ranged:
  { bias - dither, bias + dither }
while TPDF's noise ranged:
  { bias - dither/2, bias + dither/2 - 1 } +
  { bias - dither/2, bias + dither/2 - 1 } =
  { 2 * bias - dither, 2 * bias + dither - 2 }
Now, both range:
  { bias - dither, bias + dither - 1 }

Removing unused (expensive) gst_fast_random_(u)int32_range functions.

Comment 4 Ilya Konstantinov 2015-03-24 02:06:22 UTC

In the message above, I swapped TPDF and RPDF -- should've read:

I'm not sure why:
 1) we're adding bias twice in TPDF
 2) some dither values are chosen out of 2^n, others out of (2^n)+1

About (2), TPDF is supposed to be simply rolling a 6-face dice twice instead of a 12-face dice once (RPDF), right?

Comment 5 Olivier Crête 2015-03-24 02:11:38 UTC

How does this perform on 64-bit arch compared to the old code? Ie on armv8/amd64? If it's slower, it may make sense to have some kind of #ifdef.

Comment 6 Ilya Konstantinov 2015-03-24 02:51:32 UTC

On armv7, gst_audio_quantize_quantize_signed_tpdf_none is down to 0.5% from 23%.

I've just tested on arm64 -- gst_audio_quantize_quantize_signed_tpdf_none is down to 0.5% from 1.0%.

I'd be surprised if shift + bitwise-and will be slower than integer division, on any architecture.

What I'm more concerned about is correctness. Please have a look, and I'm also waiting for Sebastian to have a look since he wrote this (back in 2007...).

Comment 7 Sebastian Dröge (slomo) 2015-03-24 13:35:56 UTC

Seems I need to do some archaeology and research :) I'm pretty sure the values used there are correct, I did some experiments with measuring the quantization noise with the different dither methods and different parameters there.

Comment 8 Sebastian Dröge (slomo) 2015-03-24 14:08:48 UTC

Review of attachment 300167 [details] [review]:

::: gst/audioconvert/gstaudioquantize.c
@@ +173,3 @@
 #define ADD_DITHER_TPDF_I()                                             \
+        rand = bias + RANDOM_INT_DITHER(dither)                         \
+                    + RANDOM_INT_DITHER(dither);                        \

This is wrong now, see INIT_DITHER_TPDF_I(). It was halving the bias, and then adding it here twice. It's still halving now but you only add the bias once now. Just change it in INIT_DITHER_TPDF_I().

@@ -169,3 @@
 #define ADD_DITHER_TPDF_I()                                             \
-        rand = gst_fast_random_int32_range (bias - dither,              \
-                   bias + dither - 1)                                   \

The -1 here is very curious, I don't know yet where it comes from. It doesn't seem to make any sense.

@@ +198,1 @@
         rand = tmp_rand - last_random[chan_pos];                        \

Note that here you have bias twice again. Once in tmp_rand, once in last_random[chan_pos]. Maybe just add the bias once, and not have it in last_random[chan_pos]

Comment 9 Sebastian Dröge (slomo) 2015-03-24 14:26:45 UTC

The -1 is bogus AFAICS, there's no reason for it to be there. Patch looks good except for the things mentioned above :)

Comment 10 Ilya Konstantinov 2015-03-24 15:43:58 UTC

Created attachment 300205 [details] [review]
Patch - audioconvert: Avoid int division in quantization

Changes:

* Get rid of half-bias
* Don't discard bias in TPDF_HF
* Correct git commit message about TPDF having double-bias

Comment 11 Ilya Konstantinov 2015-03-24 15:44:48 UTC

Created attachment 300207 [details] [review]
Patch - Eliminate unsigned quantizers

Comment 12 Sebastian Dröge (slomo) 2015-03-24 15:52:43 UTC

commit 3dc3aa4e3b47341c199c6e7e2e8974b340031288
Author: Ilya Konstantinov <ilya.konstantinov@gmail.com>
Date:   Tue Mar 24 17:28:51 2015 +0200

    audioconvert: Eliminate unsigned quantizers
    
    audio_convert_convert unpacks to default format (signed) before calling
    quantize, and the unsigned variants were equivalent to signed anyway,
    so we just get rid of them.

commit 7b398701cf003d4346b4f58a4597d4bafe78c839
Author: Ilya Konstantinov <ilya.konstantinov@gmail.com>
Date:   Tue Mar 24 03:01:22 2015 +0200

    audioconvert: Avoid int division in quantization
    
    Since range size is always 2^n, we can simply use modulo (implemented
    with a bitmask).
    
    The previous implementation used 64-bit integer division, which is
    done in software on ARMv7. Although the divisor was constant, the
    division could not be transformed into "multiplication by magic number"
    since the dividend was 64-bit.
    
    The now-unused and not-so-fast gst_fast_random_(u)int32_range functions
    were removed.
    
    Also, implementing bug fixes:
    
    1) ADD_DITHER_TPDF_HF_I no longer discards bias.
    
    2) We change TPDF's noise range to be the same as RPDF's. Previously,
    RPDF's noise ranged:
      { bias - dither, bias + dither }
    while TPDF's noise ranged:
      { bias/2 - dither/2, bias/2 + dither/2 - 1 } +
      { bias/2 - dither/2, bias/2 + dither/2 - 1 } =
      { bias - dither, bias + dither - 2 }
    Now, both range:
      { bias - dither, bias + dither - 1 }
    
    https://bugzilla.gnome.org/show_bug.cgi?id=746661