GNOME Bugzilla – Bug 617491
g_once() implementation is inefficient
Last modified: 2011-05-25 15:27:48 UTC
The g_once() implementation from gthread.h reads: gpointer g_once_impl (GOnce *once, GThreadFunc func, gpointer arg); #ifdef G_ATOMIC_OP_MEMORY_BARRIER_NEEDED # define g_once(once, func, arg) g_once_impl ((once), (func), (arg)) #else /* !G_ATOMIC_OP_MEMORY_BARRIER_NEEDED*/ # define g_once(once, func, arg) \ (((once)->status == G_ONCE_STATUS_READY) ? \ (once)->retval : \ g_once_impl ((once), (func), (arg))) #endif /* G_ATOMIC_OP_MEMORY_BARRIER_NEEDED */ That is, if no memory barriers are needed, g_once() uses a simple inline == unguarded == check, but if barriers are needed the slow g_once_impl() function is called. The problem however is how G_ATOMIC_OP_MEMORY_BARRIER_NEEDED is defined in configure.in: while with non-gcc compilers, we switch on the architecture and correctly mark that no barriers are needed on x86, on gcc we do something completely different: if gcc has builtin atomic operations, we define as memory barriers needed. Problem is, even on x86, gcc has builtin atomic operations defined, so we end up taking the slow path no matter what. glib_cv_gcc_has_builtin_atomic_operations=no if test x"$GCC" = xyes; then AC_MSG_CHECKING([whether GCC supports build-in atomic intrinsics]) AC_TRY_LINK([], [int i; __sync_synchronize (); __sync_bool_compare_and_swap (&i, 0, 1); __sync_fetch_and_add (&i, 1); ], [glib_cv_gcc_has_builtin_atomic_operations=yes], [glib_cv_gcc_has_builtin_atomic_operations=no]) AC_MSG_RESULT($glib_cv_gcc_has_builtin_atomic_operations) if test $glib_cv_gcc_has_builtin_atomic_operations = yes; then glib_memory_barrier_needed=yes else case $host_cpu in i386) AC_MSG_RESULT([none]) glib_memory_barrier_needed=no ;; i?86) AC_MSG_RESULT([i486]) AC_DEFINE_UNQUOTED(G_ATOMIC_I486, 1, [i486 atomic implementation]) glib_memory_barrier_needed=no ;; ...
Do you have a patch ?
The bug also affects all g_atomic_* operations. Previously they were macros expanding to nothing (normal access of the var) if barriers are not needed. Now they expand to function calls. I think the correct way to use gcc atomic operations is to use the builtin intrinsic in the g_atomic_* macros, not in a function implementation, such that the compiler can do the right thing.
The configure part is completely wrong as it conditions the processor switch on using gcc! At this point I suggest simply reverting the patch that "fixed" bug 531902.
Created attachment 161161 [details] [review] Proposed fix The patch of bug #531902 had wrong lines that defining glib_memory_barrier_needed. This patch removes those lines and implements g_atomic_(pointer|int)_(get|set) if G_ATOMIC_OP_MEMORY_BARRIER_NEEDED is defined.
Hmm, so two things: 1) I believe we should let the compiler figure out when barriers are needed: put __sync_synchronize() unconditionally, and leave it to the compiler to replace that by a NOP. Don't try to outsmart gcc. You can only lose. 2) The gcc intrinsics should always be called as macros, never as functions. There's just no point.
I agree. Lennart, can you prepare a patch? We still need to know if barriers are needed such that we can do double-checked locking.
Created attachment 188325 [details] [review] a patch for proper use of gcc builtins
Created attachment 188326 [details] [review] slightly faster bitlock g_bitlock_unlock does 2 atomic operations where a single gcc builtin suffices.
Review of attachment 188325 [details] [review]: Please use "git format-patch" style patches; it's much easier to review a patch when I can see the developer rationale for a change. ::: configure.ac @@ +2587,3 @@ main (void) { + /* its not like this actually runs or anything... */ The previous use of "it's" was correct as a contraction in "it is not like..." @@ +3635,2 @@ g_memory_barrier_needed="$glib_memory_barrier_needed" +g_gcc_atomic_ops="$glib_cv_gcc_has_builtin_atomic_operations" There's no AC_SUBST for this...so why are you setting it? ::: glib/gatomic-gcc.c @@ +28,1 @@ { This is OK, but it may be nicer to just do: #undefine g_atomic_int_exchange_and_add Then it's a lot clearer to the reader what's going on. ::: glib/gatomic.h @@ +57,3 @@ gpointer newval); +#if defined(__GNUC__) && defined(G_ATOMIC_OP_USE_GCC_BUILTINS) Nothing actually #defines G_ATOMIC_OP_USE_GCC_BUILTINS that I see.
(In reply to comment #9) > { > + /* its not like this actually runs or anything... */ > > The previous use of "it's" was correct as a contraction in "it is not like..." Its correct, but it throws off syntax highlighting in vi. > @@ +3635,2 @@ > g_memory_barrier_needed="$glib_memory_barrier_needed" > +g_gcc_atomic_ops="$glib_cv_gcc_has_builtin_atomic_operations" > > There's no AC_SUBST for this...so why are you setting it? This is getting funneled into glibconfig.h. It is a little confusing how this works, but it most certainly does. This is entirely analogous to how we treat the 'memory barrier needed' define. > ::: glib/gatomic-gcc.c > @@ +28,1 @@ > { > > This is OK, but it may be nicer to just do: > > #undefine g_atomic_int_exchange_and_add > > Then it's a lot clearer to the reader what's going on. Maybe. I'm just expanding the existing practice in gatomic-gcc.c. > ::: glib/gatomic.h > @@ +57,3 @@ > gpointer newval); > > +#if defined(__GNUC__) && defined(G_ATOMIC_OP_USE_GCC_BUILTINS) > > Nothing actually #defines G_ATOMIC_OP_USE_GCC_BUILTINS that I see. As mentioned above, it ends up in glibconfig.h.
(In reply to comment #10) > (In reply to comment #9) > > > { > > + /* its not like this actually runs or anything... */ > > > > The previous use of "it's" was correct as a contraction in "it is not like..." > > Its correct, but it throws off syntax highlighting in vi. Change it to "it is" then? > > @@ +3635,2 @@ > > g_memory_barrier_needed="$glib_memory_barrier_needed" > > +g_gcc_atomic_ops="$glib_cv_gcc_has_builtin_atomic_operations" > > > > There's no AC_SUBST for this...so why are you setting it? > > This is getting funneled into glibconfig.h. It is a little confusing how this > works, but it most certainly does. This is entirely analogous to how we treat > the 'memory barrier needed' define. Oh I see now, yes.
Review of attachment 188326 [details] [review]: Just use g_atomic_int_dec_and_test()?
(In reply to comment #12) > Review of attachment 188326 [details] [review]: > > Just use g_atomic_int_dec_and_test()? Err, my bad. I meant g_atomic_int_exchange_and_add()
(In reply to comment #13) > (In reply to comment #12) > > Review of attachment 188326 [details] [review] [details]: > > > > Just use g_atomic_int_dec_and_test()? > > Err, my bad. I meant g_atomic_int_exchange_and_add() I guess that would work too
__sync_fetch_and_and is also useful for g_datalist_unset_flags, and __sync_fetch_and_or for g_datalist_set_flags. These are used for things like weak refs and toggle refs, so they are not unimportant.
I've moved 'expand the set of atomic ops' to bug 650823.
ok, I guess I'll close this bug and proceed in the other one.
An unfortunate minor regression was introduced here. The new implementation does this: #define g_atomic_pointer_get(atomic) \ __extension__ ({ __sync_synchronize(); *(atomic); }) but in gthread.c we see some code like this: gboolean g_once_init_enter_impl (volatile gsize *value_location) { gboolean need_init = FALSE; g_mutex_lock (g_once_mutex); if (g_atomic_pointer_get (value_location) == NULL) which means that we end up comparing gsize to NULL. The old macros dealt with this situation by always casting the result to gpointer. Perhaps the new macros should do the same.
Since I'm lazy, can someone just briefly describe at a high level why these things are being optimized? What app had a bottleneck etc?
It was a regression from another patch. Fixed now.
> Since I'm lazy, can someone just briefly describe at a high level why these > things are being optimized? What app had a bottleneck etc? Probably not the original motivation for these patches, but GStreamer for example heavily uses atomic ops, so we're really thankful for these optimisations.