GNOME Bugzilla – Bug 63621
Add GAtomicInt for atomic integer operations
Last modified: 2011-02-18 16:07:08 UTC
See http://moniker.gnome.org/archives/gtk-devel-list/2001-November/msg00044.html
The following patch implements atomic ints. It has fast implementations for the following architectures: i386/powerpc/sparc/alpha, and it is tested on all of them (mostly with Linux) on compilefarm.sf.net.

Q: So why doesn't the patch use inline functions in headers to implement atomic ints instead of extern functions?
A: Only gcc understands the asm syntax used, so we would risk having different libs linked together where some use the asm versions and others the fallback implementation (because some are compiled with gcc and others aren't), and mixing them is of course unsafe.

Q: Why is there no special version for sparc v9, which can handle atomic operations much better than older sparcs?
A: Most environments don't seem to be able to assemble v9 code on Solaris and Linux, even if the processor itself is v9.

Q: Why are all implementations in one file instead of being scattered across arch-specific dirs or files as in glibc or other projects?
A: It just fits better into the GLib structure. While we have arch-specific files (threads, exec), I think this setup is better than having 5 nearly empty files, but that's just taste.

Q: How fast are the asm atomic ints compared to the generic implementation?
A: I'll also attach a speed test program.
Created attachment 7529: Patch to implement GAtomic for GLib
Created attachment 7530: gatomic-speed-test.c: Test Program for speed of GAtomic
Looking this over, my main concern would be over the return values; all of the functions currently return TRUE if the result of the operation is 0.

* Is having a return value at all for the operations too general? When we want to do an atomic_inc, we typically don't need a return value at all. Is there a performance penalty for having the return value on some architectures?

* Is having the return value be whether the result is 0 or not too specific? Are there cases where something else is needed?

Linux-2.5 has:

  atomic_add           [ no return, 63 ]
  atomic_sub           [ no return, 46 ]
  atomic_sub_and_test  [ TRUE if result is 0, 1 ]
  atomic_inc           [ no return, 693 ]
  atomic_dec           [ no return, 263 ]
  atomic_dec_and_test  [ TRUE if result is 0, 143 ]
  atomic_inc_and_test  [ TRUE if result is 0, 0 ]
  atomic_add_negative  [ TRUE if result is < 0, 18 ]

where the numbers are the total number of uses in the kernel source tree. (All 18 uses of add_negative are in the kernel's semaphore implementation; this is the only use of a test for something other than 0.)

I think having the return value be the new value (or possibly the old value) is more natural, but if that is significantly more expensive than dec-and-test semantics, then the latter probably make sense.

Aside from the question of "what is needed", probably the other interesting question is: what is the performance difference between

  - dec, no test
  - dec, test for 0
  - dec, return old value

on various platforms?
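To make the three decrement variants from the comment above concrete, here is a minimal sketch. It assumes C11 `<stdatomic.h>` as a stand-in for the patch's platform-specific assembler, and the function names are hypothetical, not part of any patch in this bug.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Variant 1: decrement, nothing reported back to the caller. */
void dec_no_test(atomic_int *v)
{
    atomic_fetch_sub(v, 1);
}

/* Variant 2: decrement and report only whether the new value is 0. */
bool dec_and_test(atomic_int *v)
{
    /* atomic_fetch_sub returns the value held before the subtraction,
       so the new value is 0 exactly when the old value was 1. */
    return atomic_fetch_sub(v, 1) == 1;
}

/* Variant 3: decrement and hand the old value back to the caller. */
int dec_return_old(atomic_int *v)
{
    return atomic_fetch_sub(v, 1);
}
```

With C11 atomics all three compile down to the same read-modify-write primitive; whether they differ in cost depends on the architecture, which is the question the comment raises.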
If sub/inc/dec are just macros, I'm not sure I'd bother with them. When I saw them in the sample code I assumed they existed because they could be optimized in some way vs. just adding.
Returning the new/old value is expensive on some platforms, but the dec_and_test_zero stuff is rather important to fast refcounting. The code needs work, however. The spins are not fair and don't work in the presence of real-time threads. They should probably be buzz locks. Also, you'd want to check the rules on page faults for various platforms; not all get faulted atomic ops atomic. That doesn't matter to a kernel, but it does matter to a user app.
OK, expanding out Alan's comments some, based on IRC conversation:

- It makes sense to have atomic_inc()/atomic_dec_and_test() operations with those semantics since they can be a lot faster than the generic operations, and are needed for refcounting.

- Buzz locks are apparently locks which spin for a while, then fall back to a heavyweight lock, to avoid a) deadlock issues with priorities and b) spinning for long periods of time if the process holding the lock gets swapped out. Supposedly they are used within Mozilla, but I don't see that in the code, which just seems to fall back to heavyweight locks in the absence of atomic operations. Maybe it's how the non-native thread stuff in NSPR works?

- For the page fault problem, basically, what is being said is that copying the way the kernel does things won't always work. On some platforms the kernel may have an atomic operation, but it is necessary to use a heavyweight lock in user-space.
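A minimal sketch of the "spin for a while, then fall back to a heavyweight lock" idea described above, assuming POSIX threads (compile with `-pthread`). The `BuzzLock` type, its functions and the spin limit are hypothetical illustrations, not code from Mozilla, NSPR or the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define BUZZ_SPIN_LIMIT 100   /* hypothetical tuning value */

typedef struct {
    pthread_mutex_t mutex;
} BuzzLock;

int buzz_lock_init(BuzzLock *lock)
{
    return pthread_mutex_init(&lock->mutex, NULL);
}

void buzz_lock_acquire(BuzzLock *lock)
{
    /* Phase 1: spin a bounded number of times without blocking. */
    for (int i = 0; i < BUZZ_SPIN_LIMIT; i++) {
        if (pthread_mutex_trylock(&lock->mutex) == 0)
            return;
    }

    /* Phase 2: give up spinning and block in the heavyweight lock, so a
       swapped-out or lower-priority holder cannot make us spin forever. */
    int rc = pthread_mutex_lock(&lock->mutex);
    assert(rc == 0);
    (void) rc;
}

void buzz_lock_release(BuzzLock *lock)
{
    pthread_mutex_unlock(&lock->mutex);
}
```

The bounded spin keeps the uncontended and briefly contended cases cheap, while the blocking fallback hands scheduling decisions back to the operating system.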
The problem here is to find a good balance between a useful and consistent interface and speed (which is of course most important here, because it's the only reason to implement this interface in the first place). One possibility is to implement only what we really need, i.e. 'inc' and 'dec_and_test' (and then not call the thing GAtomicInt, but GRefCount or something). Or we implement a super-generic GAtomicInt, which would suck speed-wise on at least i386. I originally chose a way in between, but in hindsight the first solution would have been best. So what about this: we implement GRefCount with the operations g_ref_count_inc and g_ref_count_dec_and_test?

--------------------------------------------------------------------

Now for the implementation side: I checked glibc. Its code is 1. userspace and 2. LGPL and thus should be safe to use. It contains sysdeps/.../atomicity.h files, which could be used as a starting point. They all define 'exchange_and_add' and 'atomic_add' (for many platforms, alas except i386). So we could have:

  struct _GRefCount
  {
    volatile guint32 value;
  };

  void
  g_ref_count_inc (GRefCount *refcount)
  {
    atomic_add (&refcount->value, 1);
  }

  gboolean
  g_ref_count_dec_and_test (GRefCount *refcount)
  {
    return exchange_and_add (&refcount->value, -1) == 1;
  }

For i386 we would have to see whether the implementation

  __asm__ __volatile__ ("lock; addl %2,%0; sete %1"
                        :"=m" (atomic->value), "=qm" (retval)
                        :"ir" (add), "m" (atomic->value)
                        : "memory");

is safe to use. Alan?

[ There also is the function compare_and_swap declared in atomicity.h, which is _very_ useful for all kinds of multithreading stuff. We might as well export this... Why doesn't libc? ]

[ The downside of all this is of course that only gcc users get the better performance (or is this a plus? ;-) ]
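The refcount-only interface proposed above could be used roughly as follows. This is a sketch under assumptions: C11 `<stdatomic.h>` replaces the glibc `atomicity.h` primitives, and `RefCount`, `Object` and all function names are hypothetical stand-ins for the proposed GRefCount API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct {
    atomic_uint value;
} RefCount;                 /* stand-in for the proposed GRefCount */

typedef struct {
    RefCount ref;
    int payload;
} Object;                   /* hypothetical refcounted object */

void ref_count_inc(RefCount *rc)
{
    atomic_fetch_add(&rc->value, 1);
}

bool ref_count_dec_and_test(RefCount *rc)
{
    unsigned int old = atomic_fetch_sub(&rc->value, 1);
    assert(old > 0);        /* dropping a reference nobody holds is a bug */
    return old == 1;        /* TRUE when the count has just reached 0 */
}

Object *object_new(int payload)
{
    Object *obj = malloc(sizeof *obj);
    if (obj == NULL)
        return NULL;
    atomic_init(&obj->ref.value, 1);   /* the creator holds one reference */
    obj->payload = payload;
    return obj;
}

Object *object_ref(Object *obj)
{
    ref_count_inc(&obj->ref);
    return obj;
}

/* Returns true if this call released the last reference and freed obj. */
bool object_unref(Object *obj)
{
    if (!ref_count_dec_and_test(&obj->ref))
        return false;
    free(obj);
    return true;
}
```

Only the thread that sees dec_and_test return TRUE frees the object, which is why the combined decrement-and-test has to be a single atomic step.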
I've been seeing comments about dumping the generic functions in favor of the refcounting-oriented functions. I don't completely disagree. However, I would be most interested in the following two functions:

  gboolean g_test_and_set (gboolean *flag);
  gboolean g_test_and_clear (gboolean *flag);

g_test_and_set would atomically get the value of flag and set it to TRUE. The same goes for g_test_and_clear, except that it would set the value to FALSE. These two functions are very simple but also very useful.
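A minimal sketch of the two requested operations, assuming C11 `atomic_bool` in place of `gboolean` (a plain `gboolean *` cannot be used with `<stdatomic.h>` directly). The unprefixed function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Atomically fetch the old value of the flag and set it to true. */
bool test_and_set(atomic_bool *flag)
{
    return atomic_exchange(flag, true);
}

/* Atomically fetch the old value of the flag and set it to false. */
bool test_and_clear(atomic_bool *flag)
{
    return atomic_exchange(flag, false);
}
```

Both reduce to an atomic exchange, so a caller can tell whether it was the one that flipped the flag by checking the returned old value.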
New patch attached. The interface now looks like this:

  gint32 g_atomic_exchange_and_add (GAtomic *atomic, gint32 val);
  void g_atomic_add_fallback (GAtomic *atomic, gint32 val);
  gboolean g_atomic_compare_and_swap (GAtomic *atomic, gint32 oldval, gint32 newval);
  void g_atomic_set (GAtomic *atomic, gint32 val);
  gint32 g_atomic_get (GAtomic *atomic);
  void g_atomic_inc (GAtomic *atomic);
  gboolean g_atomic_dec_and_test (GAtomic *atomic);

I took the 'exchange_and_add' and the 'add' functions from libc. They really form a good, though not very convenient, interface. 'compare_and_swap' follows without cost from libc as well. It isn't strictly necessary, but nice to have. 'inc' and 'dec_and_test' are convenient wrappers to be used for reference counting.

For the implementation I used code snippets from libc this time. It's LGPL and userspace and thus safe to use. Until now I have only implemented the fast assembler routines for i486 and sparc64, but we can add other archs later without problems. [Actually it seems there is no safe way to implement this without locking on i386, but on anything better than i486 it works.]

One very important rule is that all parts of a program must use either the fast assembler or the slow locking functions; a mixture is unsafe. To ensure this, I determine at configure time whether to use the fast assembler versions. If they are found to be available, they are also used as inline functions in the gatomic.h header for programs using GLib. If, for example, GLib is compiled with gcc on i486, then the fast version will be used. If another compiler is then used to compile a program using GLib, that program will not get the inlined versions, but it will use the fallback implementations, which in turn will use the assembler version, so we are on the safe side. When GLib isn't compiled with gcc, then the slow locking versions are always used, even if programs using GLib are later compiled with gcc.

Of course other compilers can be added to that list; maybe the Sun people will add assembler code for their platform/compiler.

Regarding the timing: using inlined assembler versions of g_atomic_inc and g_atomic_dec_and_test is _ten times_ as fast as using the locking versions on a Pentium, so it really pays to use them.
Created attachment 14993: new patch for GAtomic
Note that to see the effect on i386, you have to call

  ./configure --build=i486-linux

I'll also attach a new version of the speed test program. The results so far:

i486: (my athlon 1.4 GHz)
  glib     :   2.618763 sec
  fallback :  33.455265 sec

alpha: (sourceforge compilefarm, COMPAQ AlphaServer DS20E 666 MHz)
  glib     :  35.722361 sec
  fallback : 153.221434 sec

[ alpha is not yet in my patch above, but I have implemented it locally since then ]
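The attached speed test program is not reproduced in this bug. As a rough illustration of the kind of comparison reported above (atomic increments versus increments guarded by a lock), here is a minimal sketch assuming C11 atomics, a POSIX mutex and `clock()`. The `SpeedResult` type and `run_speed_test` are hypothetical and make no claim to match the attachment.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

typedef struct {
    double atomic_sec;   /* CPU time spent in the atomic loop */
    double locked_sec;   /* CPU time spent in the mutex-protected loop */
    long atomic_count;   /* final counter values, for sanity checking */
    long locked_count;
} SpeedResult;

static pthread_mutex_t speed_mutex = PTHREAD_MUTEX_INITIALIZER;

SpeedResult run_speed_test(long iterations)
{
    SpeedResult r;
    atomic_long atomic_counter = 0;
    long locked_counter = 0;

    /* "glib" case: a lock-free atomic increment per iteration. */
    clock_t start = clock();
    for (long i = 0; i < iterations; i++)
        atomic_fetch_add(&atomic_counter, 1);
    r.atomic_sec = (double) (clock() - start) / CLOCKS_PER_SEC;

    /* "fallback" case: the same increment protected by a mutex. */
    start = clock();
    for (long i = 0; i < iterations; i++) {
        pthread_mutex_lock(&speed_mutex);
        locked_counter++;
        pthread_mutex_unlock(&speed_mutex);
    }
    r.locked_sec = (double) (clock() - start) / CLOCKS_PER_SEC;

    r.atomic_count = atomic_load(&atomic_counter);
    r.locked_count = locked_counter;
    return r;
}
```

This version is single-threaded and uncontended, so it only measures the per-operation overhead of each approach, not behaviour under contention.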
Created attachment 15014: new speed test program
A new version of the patch. It now includes a special-case implementation for when G_THREADS_ENABLED is undefined. The alpha code is also included now. For sparc64, configure now tests whether the v9 instructions really work. Some compilers, most notably gcc <= 3.0, can't compile them. For sparc64 the speed values are as follows (tested on a TI UltraSparc II (BlackBird) on compilefarm.sourceforge.net):

  glib     :  32.938069 sec
  fallback : 234.396596 sec
Created attachment 15017: yet another patch for GAtomic
Adding myself to keep in touch for GStreamer and to make our atomic stuff easily transferable once GAtomic hits the streets. The currently used general API is a much better choice for GStreamer than the refcount-only API, as GStreamer should use atomic ints quite a bit more generally.
Benjamin, can we expect more feedback regarding the atomic needs and concerns of GStreamer? Or are you fine with Sebastian's latest patch?
We (mainly Owen and GStreamer's Wim Taymans) had a discussion about this at GUADEC. The only atomic stuff that GStreamer would need and can't easily do with this API are atomic list operations. But they are outside the scope of an atomic int I think. So I would say the current implementation is fine for most of what GStreamer does and the rest is so special that it's probably better to implement in GStreamer (and maybe add it to glib later if the need arises).
Hmm, it looks like you've based the assembly implementations on obsolete parts of glibc. glibc now has atomic_exchange_and_add() defined for almost every platform, in terms of an 'atomic_compare_and_exchange_bool_acq' primitive when it's not defined natively. include/atomic.h seems to be the current glibc internal header file for atomic operations. It has a large number of different operations... Inside glibc, ignoring NPTL, the operations used seem to be:

  atomic_exchange_and_add
  atomic_increment
  atomic_decrement
  atomic_compare_and_exchange_bool_acq
  atomic_decrement_and_test

One difference from what's implemented here is that most of the macros use GCC-specific magic to work on all widths - the compare-and-exchange operation is most useful when done on pointer types.

I suspect the public interfaces in the last patch are OK for a start. I'd really like to see broader coverage for architecture coverage before we put this in GLib ... compare-and-exchange-based fallback implementations might reduce the pain of that.

One thing we have to be careful about is the "small functions and macros of 10 lines or less" restriction in the LGPL. Who knows *exactly* what it means, but I think copying more than 10 line macros from glibc and putting them in our public headers probably should be avoided.
> One difference from what's implemented here is that most
> of the macros use GCC-specific magic to work on all
> widths - the compare-and-exchange operation is most useful
> when done on pointer types.

So would you like to have the full coverage for all bit-sizes or just extra functions for gint32 and gpointer? I don't think the glibc magic of determining the operand size at compile time is in the spirit of GLib.

> I suspect the public interfaces in the last patch are
> OK for a start. I'd really like to see broader coverage
> for architecture coverage before we put this in GLib ...
> compare-and-exchange-based fallback implementations might
> reduce the pain of that.

I'll have a look.

> One thing we have to be careful about is the "small functions
> and macros of 10 lines or less" restriction in the LGPL.
> Who knows *exactly* what it means, but I think copying
> more than 10 line macros from glibc and putting them in
> our public headers probably should be avoided.

Where is the problem? glibc and GLib are both LGPL. So copying should be legally OK. IANAL though.
- I don't think the glibc magic compile time width handling is "not in the spirit of GLib", but it certainly isn't *possible* with GLib, since we are compiler-neutral. My comment was basically that a "GAtomicInt" specific compare-and-exchange wouldn't be that useful.

- The point is not using more-than-10-lines macros inside of GLib, but putting them in the GLib public headers; if we write such a macro ourselves, we can say it's allowed to use it as a special exception to the LGPL, but we can't do that if we borrow code from glibc.
Tim mentions wanting to change gobject refcounting to be atomic. If we do that, then

  typedef struct _GAtomic GAtomic;
  struct _GAtomic
  {
    volatile gint32 value;
  };

is a problem, because I bet there are at least a few

  g_assert (object->ref_count > 0)

bits of code out there. In order to preserve source and binary compatibility, we'd really need a 32-bit or sizeof(int) (either is compatible enough) integral type.
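To illustrate the compatibility point above: if the counter stays a plain integer field, existing checks of the form `object->ref_count > 0` keep compiling, while updates can still be atomic. The sketch below assumes the GCC/Clang `__atomic` builtins (which operate on ordinary integer objects); `LegacyObject` and its functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    int ref_count;          /* plain integral field, layout unchanged */
} LegacyObject;             /* hypothetical ABI-fixed structure */

void legacy_object_ref(LegacyObject *obj)
{
    /* Existing-style sanity check still works on the plain field. */
    assert(obj->ref_count > 0);
    __atomic_fetch_add(&obj->ref_count, 1, __ATOMIC_SEQ_CST);
}

/* Returns true when the last reference has just been dropped. */
bool legacy_object_unref(LegacyObject *obj)
{
    return __atomic_fetch_sub(&obj->ref_count, 1, __ATOMIC_SEQ_CST) == 1;
}
```

A wrapper struct would force every such direct field access to be rewritten, which is the source-compatibility problem the comment points out.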
New patch, this time with a changed interface. There's no longer an extra type GAtomic; we work directly on gint32 or gpointer, respectively. That of course makes it a bit easier to abuse by just setting atomic vars, which is not safe in general, but all in all that is outweighed by better backward compatibility. The interface:

  gint32 g_atomic_int_exchange_and_add (gint32 *atomic, gint32 val);
  void g_atomic_int_add (gint32 *atomic, gint32 val);
  gboolean g_atomic_int_compare_and_swap (gint32 *atomic, gint32 oldval, gint32 newval);
  gboolean g_atomic_pointer_compare_and_swap (gpointer *atomic, gpointer oldval, gpointer newval);
  void g_atomic_int_set (gint32 *atomic, gint32 val);
  void g_atomic_pointer_set (gpointer *atomic, gpointer val);

exchange_and_add and add are not implemented for pointers, as that would be pointless in my eyes. What do you think?

There's only one inline implementation (i486) right now, until we decide on the interface. Then it should be quite straightforward to add other architectures.

Do we need g_atomic_int_get and g_atomic_pointer_get? I don't think so, as reading a 32-bit integer or a pointer should be an atomic operation on all platforms GLib runs on.

The function names are a bit long now; we might better use g_atomic_int_cmp_and_xchg instead of g_atomic_int_compare_and_swap.

A new speed test program is also attached.

As for the copying from libc: I have no idea whether it is safe to copy those 3-liners. Do you think I should simply ask for permission on the libc list?
Created attachment 20596: yet another patch for atomic ints
Created attachment 20597: new speed test program
PING? Any chance of this getting in? What is blocking here? Implementations for archs other than i386? That should, however, not prevent us from getting the API in before the freeze.
I think the blocking issue for the freeze is convincing ourselves that it is *possible* to implement this API in a reasonably fast manner on the important platforms; then it's OK. Important platforms, in my opinion:

  i386
  Sparc (because it's slow to begin with...)
  ia64
  x86_64
  ppc32/64 (because of OS X more than the IBM stuff)

Everything else is pretty much marginal.
Regarding the feasibility of compare_and_swap for 32 bit (and 64 bit for platforms with 64-bit pointers). (All other proposed functions could be implemented by means of compare_and_swap.)

  i386             - no way without locking
  i486 and upwards - no problem for 32 bit
  sparc            - no way without locking
  sparcv9          - no problem for 32 bit (I don't think there are still pre-v9 sparcs out there; sparc-v9 shipped beginning in 1996)
  sparc64          - no problem for 32 bit and 64 bit
  ia64             - no problem for 32 bit and 64 bit (seems to be a compiler builtin for gcc: __sync_bool_compare_and_swap_di and __sync_bool_compare_and_swap_si). I have no access to a test platform though.
  x86_64           - no problem for 32 bit (cmpxchgl) and 64 bit (cmpxchgq)
  ppc32/64         - no problem for 32 bit (stwcx) and 64 bit (stdcx)

So far native support for the following platforms is implemented:

sparc-v9 (sparc-solaris1.sourceforge.net):
------------------------------------------
glib     :  8.969010 sec
fallback : 48.226671 sec

sparc64:
-----------
No access to a test platform which can compile and link sparc64 stuff. It should work, however; it's quite similar to sparc32. We might ask the Sun people to try it.

alpha (usf-cf-alpha-linux-1.sourceforge.net):
---------------------------------------------
glib     : 22.489575 sec
fallback : 76.471542 sec

i486 (athlon; my computer; 1 GHz):
----------------------------------
glib     :  3.938015 sec
fallback : 23.864003 sec

ia64:
-----
Not done yet. No access to test platform.

x86_64 (amd64-linux1.sourceforge.net):
--------------------------------------
glib     :  2.207783 sec
fallback : 12.074824 sec

ppc32/64
---------
Not done yet. No access to test platform.

The new patch is attached.

Food for thought: exchange_and_add does not exchange at all. In GLib's terms, it gets. So get_and_add would be a more appropriate name, though of course it might confuse people who are used to the widely adopted exchange_and_add terminology. Also, compare_and_swap would be a better name than compare_and_exchange (shorter and, according to Google, more widely used).
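The remark that all other proposed functions could be implemented by means of compare_and_swap can be sketched as follows. This assumes C11 `<stdatomic.h>` for the compare-and-swap primitive itself; the unprefixed function names are hypothetical and only mirror the shape of the proposed interface.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* The one primitive everything else is built on. */
bool int_compare_and_swap(atomic_int *atomic, int oldval, int newval)
{
    return atomic_compare_exchange_strong(atomic, &oldval, newval);
}

/* Add val and return the value held before the addition. */
int int_exchange_and_add(atomic_int *atomic, int val)
{
    int old;
    do {
        old = atomic_load(atomic);
        /* Retry if another thread changed the value in between. */
    } while (!int_compare_and_swap(atomic, old, old + val));
    return old;
}

void int_add(atomic_int *atomic, int val)
{
    (void) int_exchange_and_add(atomic, val);
}

void int_inc(atomic_int *atomic)
{
    int_add(atomic, 1);
}

bool int_dec_and_test(atomic_int *atomic)
{
    return int_exchange_and_add(atomic, -1) == 1;
}
```

A loop like this is what makes a port to a new architecture cheap: only compare_and_swap needs native code, at the cost of possible retries under contention.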
Created attachment 24061: yet another patch for atomic ints
I can test on the missing platforms, probably today or tomorrow. Personally, I'd like to see inc, dec, and dec_and_test in the API, since those make it a lot clearer for refcounting, as well as a possible slight efficiency gain.
It would be nice if you could test sparc64. Run tests/atomic-test and post the numbers of gatomic-speed-test, as attached here. ppc32/64 and ia64 aren't ready yet. But I'll try to update that tomorrow.
The new status (BTW, the time values are output by gatomic-speed-test, as attached above, and show how much faster the new glib refcounting primitives are compared to the fallback with locking):

The g_atomic functions are now implemented for all major platforms; they are, however, untested on the last three (ia64, sparc64, powerpc64). I would like anyone with access to one of those platforms to give it a try. First run ./configure and check whether the right platform is detected (grep for 'inline assembler'), and then compile GLib. Then compile and run tests/atomic-test, and finally compile and run the gatomic-speed-test program attached above and post the results. Thanks.

sparc-v9 (sparc-solaris1.sourceforge.net):
------------------------------------------
glib     :  8.969010 sec
fallback : 48.226671 sec

alpha (usf-cf-alpha-linux-1.sourceforge.net):
---------------------------------------------
glib     : 22.489575 sec
fallback : 76.471542 sec

i486 (Athlon GHz; my computer):
-------------------------------
glib     :  3.938015 sec
fallback : 23.864003 sec

x86_64 (amd64-linux1.sourceforge.net):
--------------------------------------
glib     :  2.207783 sec
fallback : 12.074824 sec

powerpc32 (ppc-osx2.cf.sourceforge.net):
----------------------------------------
glib     :  4.271049 sec
fallback : 41.643611 sec

ia64:
-----
Implemented. No access to test platform.

sparc64:
--------
Implemented. No access to test platform.

powerpc64:
----------
Implemented. No access to test platform.
Created attachment 24091: yet another patch for atomic ints
Thanks for doing this work Sebastian, this is looking very good. I can test on ppc64 and ia64.

One question that is still in my mind is "how much good is inlining doing here"... as someone pointed out earlier, putting all the assembly inline does have some disadvantages:

  * We can't bug-fix / improve the assembly and have apps take advantage of that.
  * We run into the > 10-line inline function licensing issue.

I've tried modifying the speed-test program to also time trivial wrapper functions around the inline versions. I get:

  glib     :  4.376484 sec
  function :  7.325000 sec
  fallback : 23.984717 sec

which doesn't really answer the question definitively... it's slower but not a lot slower.
Created attachment 24108: Speed test with function wrapper
Tested OK on ppc64:

ppc64: (RS64-IV, 668mhz)
  glib     :  1.961480 sec
  function :  3.328593 sec
  fallback : 31.004492 sec

On ia64, compilation gives the warnings:

In file included from glib.h:33,
                 from ghash.c:33:
../glib/gatomic.h: In function `g_atomic_int_exchange_and_add':
../glib/gatomic.h:453: warning: passing arg 1 of `__sync_fetch_and_add_di' from incompatible pointer type
../glib/gatomic.h: In function `g_atomic_int_add':
../glib/gatomic.h:460: warning: passing arg 1 of `__sync_fetch_and_add_di' from incompatible pointer type
../glib/gatomic.h: In function `g_atomic_pointer_compare_and_swap':
../glib/gatomic.h:476: warning: passing arg 1 of `__sync_bool_compare_and_swap_di' from incompatible pointer type
../glib/gatomic.h:476: warning: passing arg 2 of `__sync_bool_compare_and_swap_di' makes integer from pointer without a cast
../glib/gatomic.h:476: warning: passing arg 3 of `__sync_bool_compare_and_swap_di' makes integer from pointer without a cast

The functions have the signatures:

  extern long __sync_fetch_and_add_di (long *, long);
  extern int __sync_bool_compare_and_swap_di (long *, long, long);

Using __sync_fetch_and_add_si() instead and inserting casts:

  return __sync_bool_compare_and_swap_di ((long *)atomic, (long)oldval, (long)newval);

fixes the warnings. The tests then pass, and the timings are

(Itanium 2, 1100mhz)
  glib     :  5.493671 sec
  function :  5.859069 sec
  fallback : 25.820156 sec
Sparc64, Debian unstable:

  glib     :  13.522815 sec
  function :  12.172104 sec
  fallback : 111.402083 sec

With Debian's gcc __sparcv8 is *never* defined, but __sparc_v9__ is, so the check in gatomic.h needs to be changed. Which compiler does the sourceforge machine have?
wilhelmi@sparc-solaris2:~$ gcc -v
Reading specs from /usr/local/lib/gcc-lib/sparc-sun-solaris2.9/3.3.2/specs
Configured with: ../configure --with-as=/usr/ccs/bin/as --with-ld=/usr/ccs/bin/ld --disable-nls
Thread model: posix
gcc version 3.3.2
wilhelmi@sparc-solaris2:~$ echo "" >test.c
wilhelmi@sparc-solaris2:~$ gcc -dM -E test.c >flags
wilhelmi@sparc-solaris2:~$ gcc -mcpu=v9 -dM -E test.c >flags.v9
wilhelmi@sparc-solaris2:~$ diff -u flags flags.v9
--- flags Mon Feb 9 01:18:52 2004
+++ flags.v9 Mon Feb 9 01:18:55 2004
@@ -63,5 +63,6 @@
 #define __REGISTER_PREFIX__
 #define __LDBL_DIG__ 33
 #define __NO_INLINE__ 1
+#define __sparcv8 1
 #define __FLT_MANT_DIG__ 24
 #define __VERSION__ "3.3.2"

So it seems only the gcc on sparc does it wrong. Maybe Debian patched their gcc up. In any case we should test for and accept both.
Debian didn't patch their gcc. Looks like gcc has one set of definitions for Solaris, and another set for all other sparc. My interpretation of gcc's definitions:

  Solaris 32-bit, -mcpu=v9:  -D__sparcv8
  Solaris 64-bit:            -D__sparcv9
  Non-Solaris, 32 & 64-bit:  -D__sparc_v9__
So

  defined(__sparcv8) || defined(__sparcv9) || defined(__sparc_v9__)

should be our test. __sparcv8 is only defined in the Solaris definition, so non-Solaris platforms should be safe from that screwup. My guess from looking at the gcc code is that the Sun toolchain expects __sparcv9 to imply 64-bit as well, so they couldn't safely define __sparcv9 in 32-bit mode.
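In a header, the combined test above might be wrapped like this. The preprocessor condition is the one given in the comment; the `HAVE_SPARC_V9_ATOMICS` macro and the helper function are hypothetical names used only for illustration, not what gatomic.h actually defines.

```c
#include <assert.h>
#include <stdbool.h>

/* Accept the Solaris spellings (__sparcv8 with -mcpu=v9, __sparcv9 for
   64-bit) as well as the non-Solaris one (__sparc_v9__). */
#if defined(__sparcv8) || defined(__sparcv9) || defined(__sparc_v9__)
#  define HAVE_SPARC_V9_ATOMICS 1   /* hypothetical macro name */
#else
#  define HAVE_SPARC_V9_ATOMICS 0
#endif

/* Lets ordinary code and tests observe the compile-time decision. */
bool have_sparc_v9_atomics(void)
{
    return HAVE_SPARC_V9_ATOMICS != 0;
}
```

On any non-sparc compiler none of the three macros is defined, so the sketch falls through to the 0 branch.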
Thanks for checking that. I've updated my patch accordingly.
I've attached a new patch, this time complete with docs. Changes:

I've removed the _set functions. They are not needed. While it isn't safe to set an int/pointer when it is concurrently accessed by other threads, doing so is rarely needed and can be achieved by using g_atomic_(int|pointer)_compare_and_swap.

I've renamed g_atomic_int_exchange_and_add to g_atomic_int_get_and_add. It's shorter, adheres more closely to the GLib naming scheme and is more correct (nothing is exchanged here).

The changes for both ia64 and sparc64 as proposed by Manish and Owen are incorporated.

As for the function <-> inline function argument, I think we should leave them as inline functions. Performance is the only reason to add these functions, so I would use the faster version. Of course I see the license implication, but I'm pretty sure we would get permission to use the snippets as LGPL header stuff.
Created attachment 24520: newest version of g_atomic patch
New version. Added

  g_atomic_int_get
  g_atomic_pointer_get

which also act as a memory barrier. This makes possible

  gpointer list;

  do
    {
      head = g_atomic_pointer_get (&list);
      new_head = head->next;
    }
  while (!g_atomic_pointer_compare_and_swap (&list, head, new_head));

to quickly pop the first element off a list.
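A self-contained version of the pop loop above, written as a sketch with C11 atomics instead of the g_atomic functions. The `Node` type and `list_pop` are hypothetical, and an empty-list check is added that the fragment in the comment leaves out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct Node {
    struct Node *next;
    int data;
} Node;                       /* hypothetical list element */

/* Atomically detach and return the first element, or NULL if empty. */
Node *list_pop(_Atomic(Node *) *list)
{
    Node *head;
    Node *new_head;

    do {
        head = atomic_load(list);      /* plays the role of the _get call */
        if (head == NULL)
            return NULL;               /* nothing to pop */
        new_head = head->next;
        /* Install new_head only if the list head is still `head`. */
    } while (!atomic_compare_exchange_strong(list, &head, new_head));

    return head;
}
```

A bare compare-and-swap pop like this is exposed to the ABA problem if nodes can be freed and reused while another thread is inside the loop, so it is only a starting point for real concurrent use.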
Created attachment 24611: newest version of g_atomic patch
Because of GPE (gpe.handhelds.org), ARM could be an important platform too.
Created attachment 24693: patch \infty
This last patch renames the functions as Owen wanted. The interface now looks like this:

  gint32 g_atomic_int_get (gint32 *atomic);
  void g_atomic_int_add (gint32 *atomic, gint32 val);
  gint32 g_atomic_int_exchange_and_add (gint32 *atomic, gint32 val);
  gboolean g_atomic_int_compare_and_exchange (gint32 *atomic, gint32 oldval, gint32 newval);
  gpointer g_atomic_pointer_get (gpointer *atomic);
  gboolean g_atomic_pointer_compare_and_exchange (gpointer *atomic, gpointer oldval, gpointer newval);
  void g_atomic_int_inc (gint32 *atomic);
  gboolean g_atomic_int_dec_and_test (gint32 *atomic);
2004-02-26  Sebastian Wilhelmi  <seppi@seppi.de>

  * glib/gatomic.c, glib/gatomic.h: New files to implement atomic operations for different platforms. Fixes bug #63621.

  * glib/glib.h: Include gatomic.h.

  * configure.in: Add test for assembler routines for atomic operations.

  * glib/Makefile.am: Add gatomic.c, gatomic.h.

  * tests/Makefile.am, tests/atomic-test.c: Unit test for atomic operations.

  * glib/glib-overrides.txt, glib/glib-sections.txt, glib/glib-docs.sgml, glib/tmpl/atomic_operations.sgml: Add docs for atomic operations.
According to Ulrich Drepper, we had better replace the inline assembler functions with normal functions in the .c files. It also turned out that we had better use gint instead of gint32 (they are most often the same anyway) to be able to replace reference counting for ABI-fixed structures as well. I will shortly attach a first version of that.
Created attachment 24929: patch \infty + 1
Some bugs fixed and tested on the following platforms:

sparc-v9 (sparc-solaris1.sourceforge.net):
------------------------------------------
glib     : 31.482255 sec
fallback : 44.756862 sec

alpha (usf-cf-alpha-linux-1.sourceforge.net):
---------------------------------------------
glib     : 26.235578 sec
fallback : 76.845411 sec

i486 (Athlon GHz; my computer):
-------------------------------
glib     :  6.249439 sec
fallback : 34.432714 sec

x86_64 (amd64-linux1.sourceforge.net):
--------------------------------------
glib     :  2.695916 sec
fallback : 12.063377 sec

Other platforms not tested yet. sparc-v9 has really become slower. The rest is OK. As we are not using inline functions any more, I had to do some macro magic for the case where the _add_ functions are defined in terms of cmp_xchg. Otherwise on sparc glib would have been slower than fallback ;-)
Created attachment 24934: patch \infty + 2
2004-02-29  Sebastian Wilhelmi  <seppi@seppi.de>

  * configure.in, glib/gatomic.c, glib/gatomic.h: Moved the assembler functions from gatomic.h to gatomic.c, which makes for better maintainability. Also use gint instead of gint32 to be able to use reference counting for ABI-fixed structures with gint/guint.

  * glib/gthread.h: Adapted accordingly.

  * tests/atomic-test.c: Updated to test for G_MAXINT and G_MININT.
Sebastian asked me to try this on ppc64:

ppc64: (RS64-IV, 668mhz)
  glib     :  5.707164 sec
  function :  8.714315 sec
  fallback : 34.817229 sec