Bug 371670 – Faster g_bit_* operations

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 371670 - Faster g_bit_* operations


Summary:	Faster g_bit_* operations


Status:	RESOLVED FIXED

Product:	glib
Classification:	Platform
Component:	general
Version:	unspecified
Hardware:	Other Linux

Importance:	Normal enhancement
Target Milestone:	---
Assigned To:	gtkdev
QA Contact:	gtkdev

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2006-11-06 19:19 UTC by Behdad Esfahbod
Modified:	2014-01-01 19:27 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
test (3.00 KB, text/plain) 2006-11-06 20:49 UTC, Behdad Esfahbod		Details
patch (506 bytes, patch) 2007-01-07 17:48 UTC, Behdad Esfahbod	committed	Details \| Review

Description Behdad Esfahbod 2006-11-06 19:19:07 UTC

There are three bit operation functions in glib:

  g_bit_storage
  g_bit_nth_lsf
  g_bit_nth_msf

These are made inlineable, but the implementation is the naive loop.  There are a couple of ways to make some of these faster:

  - Bithacks: http://graphics.stanford.edu/~seander/bithacks.html
    None of them look appealing though, as they all are sizeof(long)-dependent and the fast ones use memory lookups.

  - gcc builtin functions: __builtin_ffsl, __builtin_clzl, __builtin_ctzl.  These use hardware instructions.  Attaching test that uses these for the implementation.

I also suggest adding a g_bit_* function to get the number of ones in a long (__builtin_popcountl).

Comment 1 Behdad Esfahbod 2006-11-06 20:49:51 UTC

Created attachment 76112 [details]
test

This test case includes implementions using gcc builtins, correct naive implementations, and a wrapper around glib's to fix bugs mentioned in bug 371631.

Now I agree that the naive approach /may/ be faster for nth_lsf and nth_msf's amortized time when going over all bits and if the architecture doesn't have a instruction for the builtins (or maybe even if it has?).  But for g_bit_storage the builtin is definitely superior.  gcc -O2 generates this code for it:

builtin_bit_storage:
        movl    4(%esp), %eax
        movl    $1, %edx
        testl   %eax, %eax
        je      .L61
        bsrl    %eax, %eax
        movb    $32, %dl
        xorl    $31, %eax
        subl    %eax, %edx
.L61:
        movl    %edx, %eax
        ret


Anyway, to fix bug 371631 one can pick the naive_* implementations from this test.

Comment 2 Matthias Clasen 2007-01-01 06:51:06 UTC

I think we should pick the naive implementations and maybe add the gcc builtin version for g_bit_storage, if it is better as you say. 

If you commit this, please also add your testcases somewhere in tests/, maybe
as a separate c file.

Comment 3 Behdad Esfahbod 2007-01-03 20:11:26 UTC

Committed to both branches.  A quick Google search made me assume that gcc >= 4.0 is enough to rely on __builtin_clzl.

2007-01-03  Behdad Esfahbod  <behdad@gnome.org>

        * glib/gutils.h: Fix bug in g_bit_nth_lsf (#371631) and use 
        __builtin_clzl for g_bit_storage if available (#371670).

        * tests/Makefile.am:
        * tests/bit-test.c: New test, to test g_bit_* operations against
        naive and builtin implementations.

Comment 4 Daniel Elstner 2007-01-07 08:19:39 UTC

I experimented with this builtin a while ago, because as it happens I also noticed the poor implementation of g_bit_storage().  During my experimentation, I found that the darn 1-complement gcc does after the "bsr" to match the builtin API can be undone by xor-ing the result again with 0x1F, which makes the optimizer remove both xor's.  Then just add 1 and you're done.

  (__builtin_clzl(value) ^ (8 * sizeof(value) - 1)) + 1

This would also avoid the partial register access your compiler produces for some reason.

Comment 5 Behdad Esfahbod 2007-01-07 17:48:49 UTC

Created attachment 79656 [details] [review]
patch

Thanks Daniel.  That indeed makes the generated code very sweet.  Attaching patch.  The patch doesn't change the implementation in the test, for good reason.  Matthias, mind if I reopen?

Comment 6 Matthias Clasen 2007-01-07 18:55:01 UTC

Why would I ? A better implementation is always appreciated

Comment 7 Daniel Elstner 2007-01-07 22:44:51 UTC

Cool.  Also I just verified that the Intel Compiler accepts this builtin, too (though the generated code isn't as sweet).  This matters because it masquerades as GCC.

Comment 8 Behdad Esfahbod 2007-01-08 22:15:58 UTC

Matthias, trunk only?  It's optimization, so doesn't really qualify for branch.

Comment 9 Matthias Clasen 2007-01-09 18:51:05 UTC

Agreed

Comment 10 Behdad Esfahbod 2007-01-09 19:06:48 UTC

2007-01-09  Behdad Esfahbod  <behdad@gnome.org>

        * glib/gutils.h: Use a more optimized g_bit_storage() when gcc is
        available.  (#371670, Daniel Elstner)