GNOME Bugzilla – Bug 371670
Faster g_bit_* operations
Last modified: 2014-01-01 19:27:27 UTC
There are three bit operation functions in glib: g_bit_storage g_bit_nth_lsf g_bit_nth_msf These are made inlineable, but the implementation is the naive loop. There are a couple of ways to make some of these faster: - Bithacks: http://graphics.stanford.edu/~seander/bithacks.html None of them look appealing though, as they all are sizeof(long)-dependent and the fast ones use memory lookups. - gcc builtin functions: __builtin_ffsl, __builtin_clzl, __builtin_ctzl. These use hardware instructions. Attaching test that uses these for the implementation. I also suggest adding a g_bit_* function to get the number of ones in a long (__builtin_popcountl).
Created attachment 76112 [details] test This test case includes implementions using gcc builtins, correct naive implementations, and a wrapper around glib's to fix bugs mentioned in bug 371631. Now I agree that the naive approach /may/ be faster for nth_lsf and nth_msf's amortized time when going over all bits and if the architecture doesn't have a instruction for the builtins (or maybe even if it has?). But for g_bit_storage the builtin is definitely superior. gcc -O2 generates this code for it: builtin_bit_storage: movl 4(%esp), %eax movl $1, %edx testl %eax, %eax je .L61 bsrl %eax, %eax movb $32, %dl xorl $31, %eax subl %eax, %edx .L61: movl %edx, %eax ret Anyway, to fix bug 371631 one can pick the naive_* implementations from this test.
I think we should pick the naive implementations and maybe add the gcc builtin version for g_bit_storage, if it is better as you say. If you commit this, please also add your testcases somewhere in tests/, maybe as a separate c file.
Committed to both branches. A quick Google search made me assume that gcc >= 4.0 is enough to rely on __builtin_clzl. 2007-01-03 Behdad Esfahbod <behdad@gnome.org> * glib/gutils.h: Fix bug in g_bit_nth_lsf (#371631) and use __builtin_clzl for g_bit_storage if available (#371670). * tests/Makefile.am: * tests/bit-test.c: New test, to test g_bit_* operations against naive and builtin implementations.
I experimented with this builtin a while ago, because as it happens I also noticed the poor implementation of g_bit_storage(). During my experimentation, I found that the darn 1-complement gcc does after the "bsr" to match the builtin API can be undone by xor-ing the result again with 0x1F, which makes the optimizer remove both xor's. Then just add 1 and you're done. (__builtin_clzl(value) ^ (8 * sizeof(value) - 1)) + 1 This would also avoid the partial register access your compiler produces for some reason.
Created attachment 79656 [details] [review] patch Thanks Daniel. That indeed makes the generated code very sweet. Attaching patch. The patch doesn't change the implementation in the test, for good reason. Matthias, mind if I reopen?
Why would I ? A better implementation is always appreciated
Cool. Also I just verified that the Intel Compiler accepts this builtin, too (though the generated code isn't as sweet). This matters because it masquerades as GCC.
Matthias, trunk only? It's optimization, so doesn't really qualify for branch.
Agreed
2007-01-09 Behdad Esfahbod <behdad@gnome.org> * glib/gutils.h: Use a more optimized g_bit_storage() when gcc is available. (#371670, Daniel Elstner)