GNOME Bugzilla – Bug 772864
audioconvert: mask calculation optimization
Last modified: 2016-11-01 18:02:48 UTC
The n_bits_set() function in gstaudioconvert.c, i.e. the "popcount", uses the worst algorithm available, with complexity of O(n). There are better algorithms available, see: https://en.wikipedia.org/wiki/Hamming_weight I'd opt for the popcount_3 or popcount_4 algorithms mentioned at Wikipedia. GCC also has the __builtin_popcountl() function, which translates to a CPU instruction if the underlying hardware supports it.
Just to make clear why this is an issue: In a common case with a stereo mask (0x03) it takes about 4k (!) cycles to calculate find_suitable_mask(). This is because n_bits_set() always takes 64 cycles and it is called about 61-times in find_suitable_mask(). A better approach would be something like this: static guint64 find_suitable_mask (guint64 mask, gint n_chans) { guint64 intersection; gint i, n_bits; n_bits = n_bits_set (mask) g_assert (n_bits >= n_chans); intersection = mask; for (i = 1 << 63; n_bits > n_chans && i; i >>= 1); { n_bits -= !!(intersection & i); intersection = intersection & (~i); } if (i) return intersection; return 0; }
Please feel free to propose a patch, just keep in mind that it's likely not worthwhile to do anything that increases the complexity or maintenance burden here since that function is only ever called during negotiation and the performance impact here is likely pretty much ~0 already in practice anyway, so it seems unlikely to me that there'd be any performance gains here, we just get to feel better about using a more clever algorithm ;)
Did this actually come up in a profile for you?
(In reply to Tim-Philipp Müller from comment #2) > Please feel free to propose a patch, just keep in mind that it's likely not > worthwhile to do anything that increases the complexity or maintenance > burden here since that function is only ever called during negotiation and > the performance impact here is likely pretty much ~0 already in practice > anyway, so it seems unlikely to me that there'd be any performance gains > here, we just get to feel better about using a more clever algorithm ;) See my proposal in comment 1. I agree with you that it's affecting just the setup-phase, which is not as critical as playback. However it's still worth to reduce the initialization/set-up time in an obvious case like this: 64 loop cycles instead of 4k loop cycles ;-)
(In reply to Tim-Philipp Müller from comment #3) > Did this actually come up in a profile for you? No, just by reading the code ;-) This was just a casual find, in fact I'm looking for something else. Trying to optimize the endian conversion, which in my opinion is taking too much CPU time. So I might file some more enhancement proposals for the audio convert.
Great. Please attach patches in git format-patch format then :)
(In reply to Tim-Philipp Müller from comment #6) > Great. Please attach patches in git format-patch format then :) Will do! Just regarding the popcount algorithm, can we assume that only few bits in the mask are set? I.e. the number of channels is rather low? Thanks.
I think it would be reasonable to optimise for the common case, which is < 8 channels, if that makes a difference.
Created attachment 337741 [details] [review] Proposed fix This patch reduces the complexity to the number of set bits in the mask :-)
commit ca7e31f80d8b801e95ce019a14f773728c1acca8 Author: Petr Kulhavy <brain@jikos.cz> Date: Fri Oct 14 22:31:41 2016 +0200 audioconvert: optimize mask calculation find_suitable_mask() had complexity O(n^2) on the number of bits. For common case like 2-channel audio the mask was calculated in about 4k loop cycles. Optimize both n_bits_set() and find_suitable_mask() to O(n) where n is the number of bits set in the mask. https://bugzilla.gnome.org/show_bug.cgi?id=772864