Bug 772864 – audioconvert: mask calculation optimization

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 772864 - audioconvert: mask calculation optimization


Summary:	audioconvert: mask calculation optimization


Status:	RESOLVED FIXED

Product:	GStreamer
Classification:	Platform
Component:	gst-plugins-base
Version:	1.8.3
Hardware:	Other Linux

Importance:	Normal enhancement
Target Milestone:	1.11.1
Assigned To:	GStreamer Maintainers
QA Contact:	GStreamer Maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2016-10-13 14:21 UTC by Petr Kulhavy
Modified:	2016-11-01 18:02 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Proposed fix (2.28 KB, patch) 2016-10-14 20:43 UTC, Petr Kulhavy	committed	Details \| Review

Description Petr Kulhavy 2016-10-13 14:21:53 UTC

The n_bits_set() function in gstaudioconvert.c, i.e. the "popcount", uses the worst algorithm available, with complexity of O(n).

There are better algorithms available, see: https://en.wikipedia.org/wiki/Hamming_weight
I'd opt for the popcount_3 or popcount_4 algorithms mentioned at Wikipedia.

GCC also has the __builtin_popcountl() function, which translates to a CPU instruction if the underlying hardware supports it.

Comment 1 Petr Kulhavy 2016-10-13 14:40:41 UTC

Just to make clear why this is an issue:
In a common case with a stereo mask (0x03) it takes about 4k (!) cycles to calculate find_suitable_mask(). This is because n_bits_set() always takes 64 cycles and it is called about 61-times in find_suitable_mask().


A better approach would be something like this:

static guint64 find_suitable_mask (guint64 mask, gint n_chans)
{
  guint64 intersection;
  gint i, n_bits;

  n_bits = n_bits_set (mask)

  g_assert (n_bits >= n_chans);

  intersection = mask;
  for (i = 1 << 63; n_bits > n_chans && i; i >>= 1);
  {
    n_bits -= !!(intersection & i);
    intersection = intersection & (~i);
  }

  if (i)
    return intersection;
  return 0;
}

Comment 2 Tim-Philipp Müller 2016-10-13 14:44:16 UTC

Please feel free to propose a patch, just keep in mind that it's likely not worthwhile to do anything that increases the complexity or maintenance burden here since that function is only ever called during negotiation and the performance impact here is likely pretty much ~0 already in practice anyway, so it seems unlikely to me that there'd be any performance gains here, we just get to feel better about using a more clever algorithm ;)

Comment 3 Tim-Philipp Müller 2016-10-13 14:45:17 UTC

Did this actually come up in a profile for you?

Comment 4 Petr Kulhavy 2016-10-13 14:52:48 UTC

(In reply to Tim-Philipp Müller from comment #2)
> Please feel free to propose a patch, just keep in mind that it's likely not
> worthwhile to do anything that increases the complexity or maintenance
> burden here since that function is only ever called during negotiation and
> the performance impact here is likely pretty much ~0 already in practice
> anyway, so it seems unlikely to me that there'd be any performance gains
> here, we just get to feel better about using a more clever algorithm ;)

See my proposal in comment 1.

I agree with you that it's affecting just the setup-phase, which is not as critical as playback. However it's still worth to reduce the initialization/set-up time in an obvious case like this: 64 loop cycles instead of 4k loop cycles ;-)

Comment 5 Petr Kulhavy 2016-10-13 14:59:50 UTC

(In reply to Tim-Philipp Müller from comment #3)
> Did this actually come up in a profile for you?

No, just by reading the code ;-)

This was just a casual find, in fact I'm looking for something else. Trying to optimize the endian conversion, which in my opinion is taking too much CPU time. So I might file some more enhancement proposals for the audio convert.

Comment 6 Tim-Philipp Müller 2016-10-13 15:12:27 UTC

Great. Please attach patches in git format-patch format then :)

Comment 7 Petr Kulhavy 2016-10-13 15:21:05 UTC

(In reply to Tim-Philipp Müller from comment #6)
> Great. Please attach patches in git format-patch format then :)

Will do! Just regarding the popcount algorithm, can we assume that only few bits in the mask are set? I.e. the number of channels is rather low? Thanks.

Comment 8 Tim-Philipp Müller 2016-10-13 15:36:20 UTC

I think it would be reasonable to optimise for the common case, which is < 8 channels, if that makes a difference.

Comment 9 Petr Kulhavy 2016-10-14 20:43:46 UTC

Created attachment 337741 [details] [review]
Proposed fix

This patch reduces the complexity to the number of set bits in the mask :-)

Comment 10 Sebastian Dröge (slomo) 2016-11-01 18:02:33 UTC

commit ca7e31f80d8b801e95ce019a14f773728c1acca8
Author: Petr Kulhavy <brain@jikos.cz>
Date:   Fri Oct 14 22:31:41 2016 +0200

    audioconvert: optimize mask calculation
    
    find_suitable_mask() had complexity O(n^2) on the number of bits.
    For common case like 2-channel audio the mask was calculated in about 4k loop
    cycles.
    
    Optimize both n_bits_set() and find_suitable_mask() to O(n) where n is the
    number of bits set in the mask.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=772864