After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 371670 - Faster g_bit_* operations
Faster g_bit_* operations
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: general
unspecified
Other Linux
: Normal enhancement
: ---
Assigned To: gtkdev
gtkdev
Depends on:
Blocks:
 
 
Reported: 2006-11-06 19:19 UTC by Behdad Esfahbod
Modified: 2014-01-01 19:27 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
test (3.00 KB, text/plain)
2006-11-06 20:49 UTC, Behdad Esfahbod
  Details
patch (506 bytes, patch)
2007-01-07 17:48 UTC, Behdad Esfahbod
committed Details | Review

Description Behdad Esfahbod 2006-11-06 19:19:07 UTC
There are three bit operation functions in glib:

  g_bit_storage
  g_bit_nth_lsf
  g_bit_nth_msf

These are made inlineable, but the implementation is the naive loop.  There are a couple of ways to make some of these faster:

  - Bithacks: http://graphics.stanford.edu/~seander/bithacks.html
    None of them look appealing though, as they all are sizeof(long)-dependent and the fast ones use memory lookups.

  - gcc builtin functions: __builtin_ffsl, __builtin_clzl, __builtin_ctzl.  These use hardware instructions.  Attaching test that uses these for the implementation.

I also suggest adding a g_bit_* function to get the number of ones in a long (__builtin_popcountl).
Comment 1 Behdad Esfahbod 2006-11-06 20:49:51 UTC
Created attachment 76112 [details]
test

This test case includes implementions using gcc builtins, correct naive implementations, and a wrapper around glib's to fix bugs mentioned in bug 371631.

Now I agree that the naive approach /may/ be faster for nth_lsf and nth_msf's amortized time when going over all bits and if the architecture doesn't have a instruction for the builtins (or maybe even if it has?).  But for g_bit_storage the builtin is definitely superior.  gcc -O2 generates this code for it:

builtin_bit_storage:
        movl    4(%esp), %eax
        movl    $1, %edx
        testl   %eax, %eax
        je      .L61
        bsrl    %eax, %eax
        movb    $32, %dl
        xorl    $31, %eax
        subl    %eax, %edx
.L61:
        movl    %edx, %eax
        ret


Anyway, to fix bug 371631 one can pick the naive_* implementations from this test.
Comment 2 Matthias Clasen 2007-01-01 06:51:06 UTC
I think we should pick the naive implementations and maybe add the gcc builtin version for g_bit_storage, if it is better as you say. 

If you commit this, please also add your testcases somewhere in tests/, maybe
as a separate c file.
Comment 3 Behdad Esfahbod 2007-01-03 20:11:26 UTC
Committed to both branches.  A quick Google search made me assume that gcc >= 4.0 is enough to rely on __builtin_clzl.

2007-01-03  Behdad Esfahbod  <behdad@gnome.org>

        * glib/gutils.h: Fix bug in g_bit_nth_lsf (#371631) and use 
        __builtin_clzl for g_bit_storage if available (#371670).

        * tests/Makefile.am:
        * tests/bit-test.c: New test, to test g_bit_* operations against
        naive and builtin implementations.

Comment 4 Daniel Elstner 2007-01-07 08:19:39 UTC
I experimented with this builtin a while ago, because as it happens I also noticed the poor implementation of g_bit_storage().  During my experimentation, I found that the darn 1-complement gcc does after the "bsr" to match the builtin API can be undone by xor-ing the result again with 0x1F, which makes the optimizer remove both xor's.  Then just add 1 and you're done.

  (__builtin_clzl(value) ^ (8 * sizeof(value) - 1)) + 1

This would also avoid the partial register access your compiler produces for some reason.
Comment 5 Behdad Esfahbod 2007-01-07 17:48:49 UTC
Created attachment 79656 [details] [review]
patch

Thanks Daniel.  That indeed makes the generated code very sweet.  Attaching patch.  The patch doesn't change the implementation in the test, for good reason.  Matthias, mind if I reopen?
Comment 6 Matthias Clasen 2007-01-07 18:55:01 UTC
Why would I ? A better implementation is always appreciated
Comment 7 Daniel Elstner 2007-01-07 22:44:51 UTC
Cool.  Also I just verified that the Intel Compiler accepts this builtin, too (though the generated code isn't as sweet).  This matters because it masquerades as GCC.
Comment 8 Behdad Esfahbod 2007-01-08 22:15:58 UTC
Matthias, trunk only?  It's optimization, so doesn't really qualify for branch.
Comment 9 Matthias Clasen 2007-01-09 18:51:05 UTC
Agreed
Comment 10 Behdad Esfahbod 2007-01-09 19:06:48 UTC
2007-01-09  Behdad Esfahbod  <behdad@gnome.org>

        * glib/gutils.h: Use a more optimized g_bit_storage() when gcc is
        available.  (#371670, Daniel Elstner)