After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 617491 - g_once() implementation is inefficient
g_once() implementation is inefficient
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: gthread
2.28.x
Other Linux
: Normal minor
: ---
Assigned To: gtkdev
gtkdev
Depends on:
Blocks:
 
 
Reported: 2010-05-03 04:32 UTC by Behdad Esfahbod
Modified: 2011-05-25 15:27 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Proposed fix (1.54 KB, patch)
2010-05-16 11:49 UTC, Hiroyuki Ikezoe
none Details | Review
a patch for proper use of gcc builtins (5.27 KB, patch)
2011-05-22 04:35 UTC, Matthias Clasen
committed Details | Review
slightly faster bitlock (657 bytes, patch)
2011-05-22 04:36 UTC, Matthias Clasen
none Details | Review

Description Behdad Esfahbod 2010-05-03 04:32:18 UTC
The g_once() implementation from gthread.h reads:

gpointer g_once_impl (GOnce *once, GThreadFunc func, gpointer arg); 
 
#ifdef G_ATOMIC_OP_MEMORY_BARRIER_NEEDED 
# define g_once(once, func, arg) g_once_impl ((once), (func), (arg)) 
#else /* !G_ATOMIC_OP_MEMORY_BARRIER_NEEDED*/ 
# define g_once(once, func, arg) \ 
  (((once)->status == G_ONCE_STATUS_READY) ? \ 
   (once)->retval : \ 
   g_once_impl ((once), (func), (arg))) 
#endif /* G_ATOMIC_OP_MEMORY_BARRIER_NEEDED */ 


That is, if no memory barriers are needed, g_once() uses a simple inline == unguarded == check, but if barriers are needed the slow g_once_impl() function is called.

The problem however is how G_ATOMIC_OP_MEMORY_BARRIER_NEEDED is defined in configure.in: while with non-gcc compilers, we switch on the architecture and correctly mark that no barriers are needed on x86, on gcc we do something completely different: if gcc has builtin atomic operations, we define as memory barriers needed.  Problem is, even on x86, gcc has builtin atomic operations defined, so we end up taking the slow path no matter what.


glib_cv_gcc_has_builtin_atomic_operations=no
if test x"$GCC" = xyes; then
  AC_MSG_CHECKING([whether GCC supports build-in atomic intrinsics])
  AC_TRY_LINK([],
              [int i;
               __sync_synchronize ();
               __sync_bool_compare_and_swap (&i, 0, 1);
               __sync_fetch_and_add (&i, 1);
              ],
              [glib_cv_gcc_has_builtin_atomic_operations=yes],
              [glib_cv_gcc_has_builtin_atomic_operations=no])

  AC_MSG_RESULT($glib_cv_gcc_has_builtin_atomic_operations)
  if test $glib_cv_gcc_has_builtin_atomic_operations = yes; then
    glib_memory_barrier_needed=yes
  else
    case $host_cpu in
      i386)
        AC_MSG_RESULT([none])
        glib_memory_barrier_needed=no
        ;;
      i?86)
        AC_MSG_RESULT([i486])
        AC_DEFINE_UNQUOTED(G_ATOMIC_I486, 1,
                           [i486 atomic implementation])
        glib_memory_barrier_needed=no
        ;;
...
Comment 1 Matthias Clasen 2010-05-09 01:25:24 UTC
Do you have a patch ?
Comment 2 Behdad Esfahbod 2010-05-11 17:42:23 UTC
The bug also affects all g_atomic_* operations.  Previously they were macros expanding to nothing (normal access of the var) if barriers are not needed.  Now they expand to function calls.

I think the correct way to use gcc atomic operations is to use the builtin intrinsic in the g_atomic_* macros, not in a function implementation, such that the compiler can do the right thing.
Comment 3 Behdad Esfahbod 2010-05-11 17:46:14 UTC
The configure part is completely wrong as it conditions the processor switch on using gcc!  At this point I suggest simply reverting the patch that "fixed" bug 531902.
Comment 4 Hiroyuki Ikezoe 2010-05-16 11:49:07 UTC
Created attachment 161161 [details] [review]
Proposed fix

The patch of bug #531902 had wrong lines that defining glib_memory_barrier_needed.

This patch removes those lines and implements g_atomic_(pointer|int)_(get|set) if G_ATOMIC_OP_MEMORY_BARRIER_NEEDED is defined.
Comment 5 Lennart Poettering 2010-06-16 13:05:24 UTC
Hmm, so two things:

1) I believe we should let the compiler figure out when barriers are needed: put  __sync_synchronize() unconditionally, and leave it to the compiler to replace that by a NOP. Don't try to outsmart gcc. You can only lose.

2) The gcc intrinsics should always be called as macros, never as functions. There's just no point.
Comment 6 Behdad Esfahbod 2010-06-16 20:44:31 UTC
I agree.  Lennart, can you prepare a patch?

We still need to know if barriers are needed such that we can do double-checked locking.
Comment 7 Matthias Clasen 2011-05-22 04:35:19 UTC
Created attachment 188325 [details] [review]
a patch for proper use of gcc builtins
Comment 8 Matthias Clasen 2011-05-22 04:36:43 UTC
Created attachment 188326 [details] [review]
slightly faster bitlock

g_bitlock_unlock does 2 atomic operations where a single gcc builtin suffices.
Comment 9 Colin Walters 2011-05-22 16:29:16 UTC
Review of attachment 188325 [details] [review]:

Please use "git format-patch" style patches; it's much easier to review a patch when I can see the developer rationale for a change.

::: configure.ac
@@ +2587,3 @@
 main (void)
 {
+  /* its not like this actually runs or anything... */

The previous use of "it's" was correct as a contraction in "it is not like..."

@@ +3635,2 @@
 g_memory_barrier_needed="$glib_memory_barrier_needed"
+g_gcc_atomic_ops="$glib_cv_gcc_has_builtin_atomic_operations"

There's no AC_SUBST for this...so why are you setting it?

::: glib/gatomic-gcc.c
@@ +28,1 @@
 {

This is OK, but it may be nicer to just do:

#undefine g_atomic_int_exchange_and_add

Then it's a lot clearer to the reader what's going on.

::: glib/gatomic.h
@@ +57,3 @@
 						gpointer           newval);
 
+#if defined(__GNUC__) && defined(G_ATOMIC_OP_USE_GCC_BUILTINS)

Nothing actually #defines G_ATOMIC_OP_USE_GCC_BUILTINS that I see.
Comment 10 Matthias Clasen 2011-05-22 16:35:07 UTC
(In reply to comment #9)

>  {
> +  /* its not like this actually runs or anything... */
> 
> The previous use of "it's" was correct as a contraction in "it is not like..."

Its correct, but it throws off syntax highlighting in vi.

> @@ +3635,2 @@
>  g_memory_barrier_needed="$glib_memory_barrier_needed"
> +g_gcc_atomic_ops="$glib_cv_gcc_has_builtin_atomic_operations"
> 
> There's no AC_SUBST for this...so why are you setting it?

This is getting funneled into glibconfig.h. It is a little confusing how this works, but it most certainly does. This is entirely analogous to how we treat the 'memory barrier needed' define.

> ::: glib/gatomic-gcc.c
> @@ +28,1 @@
>  {
> 
> This is OK, but it may be nicer to just do:
> 
> #undefine g_atomic_int_exchange_and_add
> 
> Then it's a lot clearer to the reader what's going on.

Maybe. I'm just expanding the existing practice in gatomic-gcc.c.


> ::: glib/gatomic.h
> @@ +57,3 @@
>                          gpointer           newval);
> 
> +#if defined(__GNUC__) && defined(G_ATOMIC_OP_USE_GCC_BUILTINS)
> 
> Nothing actually #defines G_ATOMIC_OP_USE_GCC_BUILTINS that I see.

As mentioned above, it ends up in glibconfig.h.
Comment 11 Colin Walters 2011-05-22 16:53:53 UTC
(In reply to comment #10)
> (In reply to comment #9)
> 
> >  {
> > +  /* its not like this actually runs or anything... */
> > 
> > The previous use of "it's" was correct as a contraction in "it is not like..."
> 
> Its correct, but it throws off syntax highlighting in vi.

Change it to "it is" then?

> > @@ +3635,2 @@
> >  g_memory_barrier_needed="$glib_memory_barrier_needed"
> > +g_gcc_atomic_ops="$glib_cv_gcc_has_builtin_atomic_operations"
> > 
> > There's no AC_SUBST for this...so why are you setting it?
> 
> This is getting funneled into glibconfig.h. It is a little confusing how this
> works, but it most certainly does. This is entirely analogous to how we treat
> the 'memory barrier needed' define.

Oh I see now, yes.
Comment 12 Behdad Esfahbod 2011-05-22 17:25:00 UTC
Review of attachment 188326 [details] [review]:

Just use g_atomic_int_dec_and_test()?
Comment 13 Behdad Esfahbod 2011-05-22 17:25:50 UTC
(In reply to comment #12)
> Review of attachment 188326 [details] [review]:
> 
> Just use g_atomic_int_dec_and_test()?

Err, my bad.  I meant g_atomic_int_exchange_and_add()
Comment 14 Matthias Clasen 2011-05-22 17:49:37 UTC
(In reply to comment #13)
> (In reply to comment #12)
> > Review of attachment 188326 [details] [review] [details]:
> > 
> > Just use g_atomic_int_dec_and_test()?
> 
> Err, my bad.  I meant g_atomic_int_exchange_and_add()

I guess that would work too
Comment 15 Alexander Larsson 2011-05-22 20:23:31 UTC
__sync_fetch_and_and is also useful for g_datalist_unset_flags, and __sync_fetch_and_or for g_datalist_set_flags. These are used for things like weak refs and toggle refs, so they are not unimportant.
Comment 16 Matthias Clasen 2011-05-22 21:43:41 UTC
I've moved 'expand the set of atomic ops' to bug 650823.
Comment 17 Matthias Clasen 2011-05-23 01:43:46 UTC
ok, I guess I'll close this bug and proceed in the other one.
Comment 18 Allison Karlitskaya (desrt) 2011-05-24 05:59:11 UTC
An unfortunate minor regression was introduced here.  The new implementation does this:


#define g_atomic_pointer_get(atomic) \
  __extension__ ({ __sync_synchronize(); *(atomic); })


but in gthread.c we see some code like this:

gboolean
g_once_init_enter_impl (volatile gsize *value_location)
{
  gboolean need_init = FALSE;
  g_mutex_lock (g_once_mutex);
  if (g_atomic_pointer_get (value_location) == NULL)


which means that we end up comparing gsize to NULL.


The old macros dealt with this situation by always casting the result to gpointer.  Perhaps the new macros should do the same.
Comment 19 Colin Walters 2011-05-25 14:46:39 UTC
Since I'm lazy, can someone just briefly describe at a high level why these things are being optimized?  What app had a bottleneck etc?
Comment 20 Behdad Esfahbod 2011-05-25 15:12:35 UTC
It was a regression from another patch.  Fixed now.
Comment 21 Tim-Philipp Müller 2011-05-25 15:27:48 UTC
> Since I'm lazy, can someone just briefly describe at a high level why these
> things are being optimized?  What app had a bottleneck etc?

Probably not the original motivation for these patches, but GStreamer for example heavily uses atomic ops, so we're really thankful for these optimisations.