After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 166020 - use GAtomic for refcounting
use GAtomic for refcounting
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: gobject
2.25.x
Other All
: Normal normal
: ---
Assigned To: gtkdev
gtkdev
: 311496 489682 (view as bug list)
Depends on:
Blocks: 343278 533427 607513
 
 
Reported: 2005-02-02 10:04 UTC by Wim Taymans
Modified: 2012-09-05 09:01 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
proposed patch using GAtomic (67.92 KB, patch)
2005-02-02 10:11 UTC, Wim Taymans
none Details | Review
updated patch without inlining of atomic operations (33.77 KB, patch)
2005-02-03 09:48 UTC, Wim Taymans
none Details | Review
Patch to add inlining (35.49 KB, patch)
2005-02-03 16:08 UTC, Wim Taymans
none Details | Review
add refcounting testcases (13.92 KB, patch)
2005-02-03 18:47 UTC, Wim Taymans
none Details | Review
the patch (14.17 KB, patch)
2005-03-10 05:02 UTC, Matthias Clasen
none Details | Review
atomic refcounting patch (18.94 KB, patch)
2005-03-31 16:56 UTC, Wim Taymans
none Details | Review
adds testcases (14.38 KB, patch)
2005-07-08 14:05 UTC, Wim Taymans
committed Details | Review
new patch to make gobject refcounting atomic (16.05 KB, patch)
2005-07-08 14:08 UTC, Wim Taymans
none Details | Review
includes gclosure.c (30.68 KB, patch)
2005-07-12 13:23 UTC, Wim Taymans
none Details | Review
updated patch (33.27 KB, patch)
2005-07-12 14:49 UTC, Wim Taymans
none Details | Review
another test (4.98 KB, patch)
2005-07-15 15:03 UTC, Matthias Clasen
committed Details | Review
updated patch (36.72 KB, patch)
2005-07-15 15:54 UTC, Wim Taymans
none Details | Review
gobject-notify.diff (4.48 KB, patch)
2008-05-08 09:57 UTC, Sebastian Dröge (slomo)
rejected Details | Review
proposed patch (8.85 KB, patch)
2009-03-18 15:06 UTC, Wim Taymans
none Details | Review
new proposed patch (13.13 KB, patch)
2009-03-27 18:59 UTC, Ole André Vadla Ravnås
rejected Details | Review
Make GObject property notify thread safe (8.45 KB, patch)
2010-06-29 23:56 UTC, Olivier Crête
none Details | Review
Make GObject property change notifications thread safe (17.37 KB, patch)
2010-06-30 01:01 UTC, Olivier Crête
none Details | Review
Make GObject property change notifications thread safe (improved) (17.57 KB, patch)
2010-06-30 16:02 UTC, Olivier Crête
committed Details | Review

Description Wim Taymans 2005-02-02 10:04:01 UTC
Please describe the problem:
Various structures are refcounted in an unsafe way. This makes multithreaded
applications fail at random. The most notable cases are signal emision (which
refcounts various objects) and GObject refcounting.

Steps to reproduce:
Attached path contains testcases to show the problems.



Actual results:


Expected results:


Does this happen every time?


Other information:
Comment 1 Wim Taymans 2005-02-02 10:11:21 UTC
Created attachment 36860 [details] [review]
proposed patch using GAtomic

Since the refcount field of closures is using bitfields, a lock is used to
protect the refcount.
An internal structure in gsignal.c was grew in size by making the refcount a
full gint. 
The patch allows for inlining of the atomic operations.
Atomic refcounting has a performance overhead.
This patch is ABI and API compatible with 2.6.1, gnome runs fine with it.
Comment 2 Owen Taylor 2005-02-02 17:49:20 UTC
The patch needs explanation and perhaps splitting into smaller patches.

I'd rather see it done *without* inlining the atomic ops first,
then we can look at whether inlining the atomic ops with libgobject
makes sense from a performance point of view.
Comment 3 Christian Fredrik Kalager Schaller 2005-02-02 18:12:34 UTC
Wim's original patch was without inlining and it made the testcase run things 6
times slower compared to the 2 times slower of the inlining version (compared to
current implementation)
Comment 4 Owen Taylor 2005-02-02 23:13:24 UTC
Are you saying that the current implementation makes signal emission two
times slower?

Any explanation for the why the lack of inlining would make things so 
much worse? Is it PLT overhead for calls between libraries? (Simple
function call overhead doesn't make sense ... making something that much
slower would imply that almost 80+% of the total number of function calls
in the test case are to the atomic operations.
Comment 5 Matthias Clasen 2005-02-03 05:56:21 UTC
seeing the test case might help to understand this...
Comment 6 Wim Taymans 2005-02-03 08:54:19 UTC
The slowdown is noted in the very synthetic testcase objects2.c, which
continuously refs and unrefs a single object, so yes more than 80% of the calls
are atomic operations. Running gnome desktop with the patched glib does not show
any noticable slowdown. 
Comment 7 Wim Taymans 2005-02-03 09:48:46 UTC
Created attachment 36907 [details] [review]
updated patch without inlining of atomic operations

New patch that does not include the inlining of the atomic operations.
Comment 8 Wim Taymans 2005-02-03 16:08:13 UTC
Created attachment 36927 [details] [review]
Patch to add inlining

Patch to add inlining
Comment 9 Wim Taymans 2005-02-03 18:47:12 UTC
Created attachment 36938 [details] [review]
add refcounting testcases

The problem
-----------

A typical _ref function looks like this:

_ref (object)
{
 ...
 object->refcount++;
 ..
}

which on x86 translates to:

     mov    0x4(%esi),%eax
     inc    %eax
     mov    %eax,0x4(%esi)

When two threads both call the _ref function at the same time
undefined behaviour can occur depending on where the context switch
occurs. Mostly the result is that one object does not get its
refcount incremented. The sequence is as follows:

  <thread 1>
     mov    0x4(%esi),%eax	; eax contains refcount

  ****context switch here, thread2 starts to run***
  <thread 2>
     mov    0x4(%esi),%eax	; eax contains refcount
     inc    %eax		; eax contains refcount+1
     mov    %eax,0x4(%esi)	; refcount = refcount+1
     ....

  ****context switch here, thread1 continues ***
  <thread 1>
     inc    %eax		; eax contains refcount+1
     mov    %eax,0x4(%esi)	; refcount = refcount+1

After switching back to thread1 the refcount increment performed by
thread2 is undone, resulting in a lost refcount.
Similar problems exist un the _unref functions.


Testcases
---------

The testcases in this patch exposes the following bugs:

 objects

   Create two threads that ref/unref the object. no object should be
   disposed. The testcase crashes almost instantly because of the above
   explanation.

 objects2

   ref/unref a single object from a single thread. Used to benchmark
   different refcounting implementations.

 signal1 (from signals.c)

   Fire a signal from two objects of the same type from 40 threads. Does
   not crash very often because the window is very small but it
   occasionally crashes because of unsafe unreffing of the shared closure.

 signal2 (from signals.c)

   Fire a signal from two objects of the same type from 40 threads with a
   custom handler in the object. Segfaults almost immediatly.

 signal3 (from signals.c)

   Fire a property notify signal. Segfaults almost immediatly.


Solution
--------
Solution
--------

A solution to this problem is making the increment one atomic operation
by either using a lock around the increment/decrement-test code or by
using the faster GAtomic primitives.

The proposed patch uses GAtomic to make the refcounting safe and uses
a mutex where the refcount is not manageble by a GAtomic (gclosure.h
where the ref_count is expressed in a bitfield). Ideally the lock
cases should be converted to GAtomic too but that would mean an ABI
breakage.


Performance evaluation
----------------------

Preformance can be measured with a microbenchmark (objects2) that does
nothing else than ref/unref an object from a single thread.

Tests ran on pentium 4 HT 2.8GHz.

unpatched glib 2.6.1

   >>time ./objects2
   init 0x804ef78

   real    0m6.085s
   user    0m5.653s
   sys	   0m0.016s

patched glib 2.6.1 with atomic refcount

   >>time ./objects2
   init 0x804ef98

   real    0m31.695s
   user    0m31.620s
   sys	   0m0.014s

   Performance negatively inpacted because of the extra function calls
   made for each ref/unref.

patched glib 2.6.1 with atomic refcount inlined

   >>time ./objects2
   init 0x804ef98

   real    0m13.226s
   user    0m13.204s
   sys	   0m0.016s

   Performance negatively inpacted because of the extra lock; assembly
   call that synchronizes the CPU caches.

ABI/API
-------

no ABI/API changes.
Comment 10 Christian Fredrik Kalager Schaller 2005-02-08 11:54:12 UTC
Owen and Matthias, are you happy about this 3-way split of the patch? or do you
want even further splitting?
Comment 11 Matthias Clasen 2005-02-08 13:16:00 UTC
The splitting looks fine to me. What I discussed with Owen yesterday, and what
needs to be addressed in a different way, is the closures. We certainly don't
want to take a lock every time we ref/unref a closure, since that is done a
couple of times per signal emission. Breaking the abi is not an option either.

So, if there is no way to perform an atomic increment on the low 15 bits of a
32bit value, one option we considered was to make use of the fact that
notifiers is a private pointer in the closure struct pointing to an array
of various things. So there would be two options for associating a 32bit
refcount with a closure:

a) make the refcount occupy the first 4 bytes of the memory block pointed
to by notifiers

b) change 

struct {
...
  GClosureNotifyData *notifiers;
} GCLosure;

to

struct {
  gint ref_count;
  GClosureNotifyData *notifiers;
} GClosurePrivate;


struct {
  ...
  GClosurePrivate *private;
} GClosure;


Comment 12 Wim Taymans 2005-02-08 17:46:52 UTC
Option a) might not work since the memblock pointed to by notifiers might change
at when a g_renew of the block is performed in various functions.
It _is_ possible to use _compare_and_swap to modify just the 15 bits of the
ref_count but it would involve finding out where those 15 bits are in the
struct, which is not defined by any C standards. Option b) is therefore the most
feasable solution IMO.
Comment 13 Matthias Clasen 2005-02-08 17:54:19 UTC
It would be interesting to see the how the locks affect the timing of the signal
tests, btw.
Comment 14 Matthias Clasen 2005-03-08 00:36:51 UTC
Any news on this ?
Comment 15 Matthias Clasen 2005-03-10 05:02:28 UTC
Created attachment 38488 [details] [review]
the patch
Comment 16 Matthias Clasen 2005-03-10 12:30:29 UTC
To help this effort, I have implemented option b) to get a full integer
ref_count for closures
Comment 17 Wim Taymans 2005-03-31 16:56:30 UTC
Created attachment 39517 [details] [review]
atomic refcounting patch

New patch, uses compare and swap to implement the closure refcounting.
Uses a recursive mutex to make the object_notify work reliably.
Remove atomic read in the g_assert/g_return_if_fail debug statements.
Comment 18 Wim Taymans 2005-03-31 16:57:36 UTC
patch id=39517 is against 2.6.3
Comment 19 Robert Love 2005-05-20 18:30:52 UTC
We definitely want to use a full int (as the current patch does) because
different architectures have different requirements wrt atomic operations.  Many
need a full word.  Others don't have instruction-level support for atomic
arithmetic at all and thus need them emulated, which might require creating a
lock and thus wasting another 8- or 16-bits.  SPARC32, for example, suffers from
this affliction.

Oh, and depending on usage we probably want to make the ref count volatile.
Comment 20 Matthias Clasen 2005-06-08 16:44:08 UTC
Wim, ping

My understanding from talking with timj is that you will produce another version
of the patch ?
Comment 21 Matthias Clasen 2005-07-01 15:42:50 UTC
Johan, did you mean to move this back to 2.8 API freeze, or did you confuse it
with the constructor issue ?
Comment 22 Johan (not receiving bugmail) Dahlin 2005-07-01 15:47:25 UTC
Matthias, Yes, sorry. Moving back to 2.8 Freze.
Comment 23 Wim Taymans 2005-07-08 14:05:29 UTC
Created attachment 48828 [details] [review]
adds testcases

a new patch to add testcases against current HEAD cvs
Comment 24 Wim Taymans 2005-07-08 14:08:30 UTC
Created attachment 48829 [details] [review]
new patch to make gobject refcounting atomic

This is a reviewed patch to add atomic GObject refcounting against current HEAD
CVS. 

GType is not patched anymore.
GObject last_unref merged with _unref, taking into account remarks from Tim
Janik. Toggle refs are included.

This patch does not yet contain the gclosure.c patch. To be done.
Comment 25 Wim Taymans 2005-07-12 13:23:21 UTC
Created attachment 49016 [details] [review]
includes gclosure.c 

This patch also includes gclosure.c atomic refcounting.
Comment 26 Wim Taymans 2005-07-12 13:28:11 UTC
previous patch is not yet 100% threadsafe.
Comment 27 Matthias Clasen 2005-07-12 14:23:05 UTC
Wim, is the last patch ready to be sent to timj for review ?
Comment 28 Wim Taymans 2005-07-12 14:49:51 UTC
Created attachment 49023 [details] [review]
updated patch

use atomic reads in closure_invoke_notifiers(). Adding/removing notifiers while
invoking is still not threadsafe, is this important?
Comment 29 Matthias Clasen 2005-07-15 04:32:45 UTC
Wim, 

+typedef struct
+{
+  union {
+    GClosureBits bits;
+    gint atomic;
+  } data;
+} GAtomicClosureBits;

why the extra wrapping in a struct, would

typedef union {
  GClosureBits bits;
  gint atomic;
}

not work as well ? Also, should it be gint32, since GClosureBits is
32 bits wide ? But we only have atomic operations on gint.

+#define CLOSURE_UNREF(closure, is_zero)  				\
+G_STMT_START {								\
+  GClosureBits old, new;						\
+  do {									\
+    CLOSURE_READ_BITS2 (closure, &old, &new);				\
+    if (old.ref_count == 1)	/* last unref, invalidate first */	\
+      g_closure_invalidate ((closure));				\
+    new.ref_count--;							\
+    is_zero = (new.ref_count == 0);					\
+  }									\
+  while (!CLOSURE_SWAP_BITS (closure, &old, &new));			\
+} G_STMT_END

Will this not end up invalidating the closure twice, if the loop
body gets executed more than once ? Or could it invalidate the closure,
but it could still leave the closure with a refcount of 1, if another
thread refs the closure between the invalidation and the swap ?

There are a few uses of the meta_marshal, is_invalid and in_marshal
bits in gsignal.c and gobject.c, which probably need to be done 
atomically.
Comment 30 Matthias Clasen 2005-07-15 15:03:59 UTC
Created attachment 49242 [details] [review]
another test

Here is another test which checks that the notify queue works correctly when 
multiple threads change properties simultaneously. Segfaults easily without the

patch.
Comment 31 Wim Taymans 2005-07-15 15:54:10 UTC
Created attachment 49249 [details] [review]
updated patch

This patch cleans up the struct in union and uses an atomic read in gsignal.c
to access the meta_marshal bit.
Comment 32 Matthias Clasen 2005-07-15 17:02:33 UTC
Committed the last patch and the tests. 
We may want to add a configure check that gint is 32 bits
Comment 33 Tim Janik 2005-07-30 18:24:01 UTC
matthias, comitting the patch and closing the bugreport as fixed was not of your
business, you need to get approval from me first for such changes. regardless of
whether you think the final patch still has issues or not (which it has).
Comment 34 Tim Janik 2005-07-30 19:51:50 UTC
i committed the following changes. with that, we have basic atomic reference
counting for params and gobject, but not gclosure. also, gobject atomic
reference counting conflicts the recently comitted toggle references.

Sat Jul 30 21:10:26 2005  Tim Janik  <timj@gtk.org>
        * gobject.c: reverted notify_mutex introduction, since this prevents
        parallelized setting of object properties on different objects, and
        serves no apparent purpose (to me at least).
        g_object_real_dispose(): removed non-atomic reference count 
        modifications.
        g_object_unref(): make sure the closures array is destroyed when
        destroying signal handlers.
        * gparam.c: cosmetic changes.
        * gsignal.c: comment fixup. allow 16bit blocking count.
        * gsignal.c: reverted GClosure related changes.
        * gclosure.c: reverted premature commit of atomic reference
        counting attempt.
Comment 35 Tim Janik 2005-07-30 20:07:57 UTC
wim, can you please explain why you added notify_mutex locking around accesses
to property_notify_context?
Comment 36 Matthias Clasen 2005-08-01 04:09:52 UTC
Tim, I just love you. Reverting the gclosure changes without further explanation
is not bringing us closer to a working solution. Unless you are going to commit
your superior solution soon (which I doubt).

And cut out that my-business-your-business bickering
Comment 37 Christian Fredrik Kalager Schaller 2005-08-01 08:29:25 UTC
FYI, Wim is on vacation for two weeks and might not be able to comment on this
before somewhere around the 10th.
Comment 38 Thomas Vander Stichele 2005-08-01 09:41:32 UTC
I'm a bit confused here.  Why is there a 15 day gap between Matthias commiting
and Tim complaining ? If I follow correctly, Tim is the maintainer of this chunk
of code, but somehow a commit to his code was unnoticed for 2 weeks ?

I'm trying to understand how the process should have gone in this case, with a
stable release coming up so soon I would expect the maintainer to be following
relevant bugs closely enough, and definately commits to his code.

As it stands Wim was led to believe that he did all he could to bring the patch
to the level expected, and now two weeks after that chunks get reverted at the
unfortunate moment he has no chance to reply.

How should we proceed here ? Can this still be resolved before the next stable
glib release ? What should be done on Wim's end ?
Comment 39 Matthias Clasen 2005-08-01 13:09:07 UTC
Thomas, Tims notion of maintainership is somewhat different from ours...
Comment 40 Tim Janik 2005-08-01 21:22:24 UTC
Atomic reference counting for closures went into CVS now:

Mon Aug  1 23:00:42 2005  Tim Janik  <timj@imendio.com>
* gclosure.c: turned all modifications to the first 32 integer bits in a 
closure into atomic accesses. wrapped write accesses into special macros
to keep the atomic modification logic in a single place. comment cleanups.

Aparently owen has reasonings on why we're save with GObject reference counting
and toggle references, owen please comment.
Comment 41 Tim Janik 2005-08-01 21:31:54 UTC
Matthias, if you're unhappy with my way of maintaining GObject, i'm sorry, but
if improper, unreviewed code is committed to CVS, it has to be reverted.

Thomas, unfortunately there's been some miscommunication going on between me and
Matthias. That, and because i'm always pretty busy with lots of other projects
i'm also responsible for, has lead into me not noticing the erroneous commit
until after aproximately 10 days. This was by no means wim's fault, he's been
doing a great job on tackling the atomic ref counting in the first place, and
without him, we'd not have this for 2.8. At this point, no further action from
him is required, but it'd be nice if he outlined his ideas about notify_mutex.
Comment 42 Owen Taylor 2005-08-01 23:58:04 UTC
On toggle references: there are a couple of properties of toggle references
that make them reasonably robust against thread safety issues even without
explicit locking.

 1. It only makes sense for *one* toggle reference to be in place at once,
    if two toggle references are added to the same object, then neither
    will ever be notified and the object will never be freed. 

    So, to a first approximation, we can ignore the case where two threads
    try to add toggle references to an object at the same time.

 2. Given the normal and expected use of toggle references, a toggle
    reference is removed only immediately before the the object is destroyed:
    that is, when no other reference to the object exist. So will be
    no thread safety issues with removing toggle references.

   (http://mail.gnome.org/archives/gtk-devel-list/2005-April/msg00095.html
   has some diagrams of how toggle references work)

Thus, all situations of concern are between toggle_refs_notify() and
g_object_add_toggle_ref() or between two invocations of toggle_refs_notify()

Possible concerns:

 - g_object_add_toggle_ref() sets the OBJECT_HAS_TOGGLE_REF_FLAG 
   flag, this is seen in g_object_ref() or g_object_unref() and 
   it runs toggle_refs_notify() before the object data is set.

   Because g_object_get/set_data() use a mutex, reversing the order
   of setting the flag and object data should fix this; the mutex
   establishes sufficient memory coherency even though setting the
   flag isn't done within the mutex.

   (Having a mutex lock/unlock between the initial ref() and 
   the subsequent setting the flag in g_object_add_toggle_ref()
   is also important. It makes sure that the two actions are
   seen as ordered by other threads so that the toggle reference will 
   never be notified until the calling thread drops the reference count
   it must have had to call g_object_add_toggle_ref())

 - One thread decrements the refcount from 2 to 1, another increments
   the refcount from 1 to 2, the two subsequent toggle_refs_notify()
   calls get called in the wrong order.   

   This one is much harder to deal with. I don't have any good ideas
   to fix it ... you'd probably have to switch to mutex locking for
   refcounting on objects that have toggle references.

My basic conclusion here is that toggle references are probably best not
mixed together with atomic reference counts. GStreamer should not directly
expose atomic-reference counted objects to language bindings that use
toggle references.

Note also that atomic-reference counting always causes another problem
for toggle references: the toggle notifiers an be called from arbitrary
threads, so that the toggle reference code in the language binding would
have to be written very carefully.

Weak references have a much more straightforward set of problems - 
add_weak_ref(), remove_weak_ref(), and notification (triggered by
g_object_real_dispose(), say) can all happen from any thread at
any time, so you'd have to mutex protect everything to get it to work.
be called from any thread, so they all need to be 
Comment 43 Andy Wingo 2005-08-02 14:02:11 UTC
*** Bug 311496 has been marked as a duplicate of this bug. ***
Comment 44 Andy Wingo 2005-08-02 16:41:22 UTC
AFAIK (which is not far), declaring the bitfields of GClosure as volatile is
insufficient to guarantee that reading them will return up-to-date values (as
g_atomic_int_read would). Not sure. Tim?

The notify_mutex that was in gobject.c serves the purpose of guaranteeing the
consistency of the notify queues if multiple threads are setting properties on
the same object. It is a rather coarse lock, yes. I suppose threadsafe property
setting/getting is somewhat outside of the scope of this bug though.

A bit of background to show where we're coming from -- GStreamer sets up media
processing pipelines, and when they get set to PLAYING they create threads and
start to run (always in 0.9, only in some cases in 0.8). In 0.8 this is a very
tricky operation -- once you set the thing to PLAYING, you can't really do
anything with the objects the threads touch because you have no guarantee that
what you access is valid or that what you try to set is not corrupting some data
structure.

So, in 0.9 we tried to redesign the API so that it is MT-safe for every API
call. A much simpler statement to make than the one in the past. We have a whole
mess of locks and signals and conds underneath to make sure this works while the
pipeline spins up, while it's running, when it's not, etc. But without atomic
recounting this is just not possible to do.

The same is true for signals and properties. If an API call will cause a signal
emission or a property notification (not always possible to tell if you call
into plugin code), we'd like for the GObject infrastructure in between to not
crash. So, that was the purpose of the very coarse notify_mutex, and of having
more threads in tests/refcount/signals.c.

So although concurrent object access is not normally supported in glib, the
signals and property threadsafety patches are make GObject as threadsafe as the
object implementation is -- which is more in line with GObject's existence as an
open data type (as opposed to hash tables, for instance).

The statement of the goals above is true; the implications for what we need out
of GLib is only AFAIK -- corrections of my ignorance are of course welcome...
Comment 45 Tim Janik 2005-08-02 17:28:32 UTC
a toggle-reference check is still missing in g_object_unref():

      /* decrement the last reference */
      is_zero = g_atomic_int_dec_and_test (&object->ref_count);
      
      /* may have been re-referenced meanwhile */
      if (G_LIKELY (is_zero)) 
Comment 46 Wim Taymans 2005-08-23 09:47:35 UTC
Reading Owen's comments I agree with his prososal to use an object mutex lock to
guarantee the toggle_ref_notify() order. functions would go like this:

_ref:

  if (G_UNLIKELY (OBJECT_HAS_TOGGLE_REF (object)) {
    G_OBJECT_REC_LOCK (object);
    object->refcount++;
    /* went from 1->2, notify toggle refs */
    if (object->refcount == 2)
      toggle_refs_notify (object, FALSE);
    G_OBJECT_REC_UNLOCK (object);
  }
  else {
    g_atomic_int_inc (&object->ref_count);
  }


_unref:

  if (G_UNLIKELY (OBJECT_HAS_TOGGLE_REF (object)) {
    G_OBJECT_REC_LOCK (object);
    object->refcount--;
    /* went from 2->1, notify toggle refs */
    if (object->refcount == 1) {
      toggle_refs_notify (object, TRUE);
    }
    else if (object->refcount == 0) {
       ...
    }
    G_OBJECT_REC_UNLOCK (object);
  }
  else {
    gboolean is_zero;

    is_zero = g_atomic_int_dec_and_test (&object->ref_count);
    /* may have been re-referenced meanwhile */
    if (G_LIKELY (is_zero)) {
      ...
    }
  }

  - a global recursive object mutex is used to make sure the refcounting on 
    toggle reffed objects is atomic together with the notify. The mutex is
    recursive so that the toggle ref handler can _ref/_unref the object. 

  - non toggle reffed objects update the refcount field atomically. It is 
    assumed that modifying the toggle ref on an object happens from one thread.

Any comments before I try to implement this?
Comment 47 Matthias Clasen 2005-08-23 14:01:57 UTC
Wim, I believe Tim had a patch in the queue for handling the toggle references.
I fear that he lost interest before finishing it, though. Then we could be in
for a long wait again...
Comment 48 Sebastian Dröge (slomo) 2008-05-06 12:52:15 UTC
Is this bug still valid?
Comment 49 Sebastian Dröge (slomo) 2008-05-08 08:43:37 UTC
Yes it is... some parts of the patch are obsolete now though. I'll be looking at which parts are still valid in the next days, port it to latest SVN and attach patches for all separate parts of the patch.
Comment 50 Sebastian Dröge (slomo) 2008-05-08 09:57:17 UTC
Created attachment 110569 [details] [review]
gobject-notify.diff

Only part of the old patch that is still valid.

The GObject notification stuff uses the global property_notify_context variable and if more than one notification is emitted at the same time from different threads weird crashes will happen. We have quite some bugs on GStreamer caused by this.
Comment 51 Sebastian Dröge (slomo) 2008-05-08 09:59:06 UTC
Some other places that use refcounts without atomic operations are:
glib/ghook.c
glib/gcache.c
glib/gmain.c

I'm not sure if these are problems and should be fixed though.
Comment 52 Sebastian Dröge (slomo) 2008-05-08 10:10:32 UTC
The two testcases are already in SVN since some time it seems.
Comment 53 Sebastian Dröge (slomo) 2008-05-14 06:01:30 UTC
*** Bug 489682 has been marked as a duplicate of this bug. ***
Comment 54 Matthias Clasen 2008-05-15 05:06:02 UTC
Moving to the right component. You probably want to poke timj on irc to get the patch reviewed.
Comment 55 Tim Janik 2008-05-16 08:29:02 UTC
(In reply to comment #50)
> Created an attachment (id=110569) [edit]
> gobject-notify.diff
> 
> Only part of the old patch that is still valid.
> 
> The GObject notification stuff uses the global property_notify_context variable
> and if more than one notification is emitted at the same time from different
> threads weird crashes will happen. We have quite some bugs on GStreamer caused
> by this.

Please take a look at GObjectNotifyContext property_notify_context:
struct _GObjectNotifyContext {
  GQuark                       quark_notify_queue;
  GObjectNotifyQueueDispatcher dispatcher;
  GTrashStack                 *_nqueue_trash; /* unused */
};
quark_notify_queue and dispatcher are setup during class_init, retain their values after, and _nqueue_trash is unused. So this can't be the cause of crshes you're seeing for concurrent notifications.

Just to make sure, you're talking about concurrent object notifications in different threads from *different* objects right? I.e. you're not trying to emit notifications from the same object in different threads simultaneously?
Comment 56 Sebastian Dröge (slomo) 2008-05-16 09:37:54 UTC
Ok, so I've probably misunderstood the code. From looking at it again it seems that this will only lead to problems if it happens on the same object instance from different threads at the same time.

So this is simply something you shouldn't do?
Comment 57 Tim Janik 2008-05-16 09:58:56 UTC
(In reply to comment #56)
> Ok, so I've probably misunderstood the code. From looking at it again it seems
> that this will only lead to problems if it happens on the same object instance
> from different threads at the same time.
> 
> So this is simply something you shouldn't do?

Objects and classes are simialr to other GLib data structures in that regard. E.g. if you use a GHashTable in 2 threads, you need to make sure to lock access around it in both so only one thread at a time uses the structure. The same is true for object and class structures.
Comment 58 Sebastian Dröge (slomo) 2008-05-16 10:32:40 UTC
Ok, then this is simply no GLib/GObject bug... let's close it here then ;)

Unfortunately this can't be easily worked around in gstreamer so maybe we have to implement our own thread safe properties there in the future.
Comment 59 Wim Taymans 2009-03-18 15:01:47 UTC
I disagree. Setting properties on the same object from multiple threads should work as expected without the crashing that happens now. 

Any locking that needs to be done to protect the object data structures is of course not GObject's task but at least we should make it to the object implementation.

I was poking at this some more and came up with a very simple patch that I will attach next.

  
Comment 60 Wim Taymans 2009-03-18 15:06:45 UTC
Created attachment 130890 [details] [review]
proposed patch

This patch adds a testcase that modifies a property on a testobject from multiple threads. Without the patch this crashes instantly.

Also included is a patch to the gobjectnotifyqueue.c so that it does not keep the notify queue on the object itself but as a threadlocal variable.

The effect is that when multiple threads freeze and thaw the notify queue they will see a different queue instead of stepping on eachothers toes. The test runs fine after applying this patch.
Comment 61 Ole André Vadla Ravnås 2009-03-27 18:59:45 UTC
Created attachment 131515 [details] [review]
new proposed patch

Updated patch based on Wim's proposed patch.

Takes nested notifies on multiple objects into account (test included) and makes the previous test a bit shorter by using G_DEFINE_TYPE() and removing unused dispose() method. (Guess the former should be done for the existing tests as well?)
Comment 62 Olivier Crête 2009-04-10 18:46:42 UTC
I can reproduce this bug pretty easily, is it possible to have it reviewed/committed?
Comment 63 Laurent Glayal 2009-04-28 09:41:35 UTC
Same thing, the patch avoids spurious segfaults in our code.
Comment 64 Tim Janik 2009-10-06 14:12:13 UTC
(In reply to comment #59)
> I disagree. Setting properties on the same object from multiple threads should
> work as expected without the crashing that happens now. 

If those multiple threads all make sure to hold a global or object-specific lock around accessing object properties, there won't occour any problems.

(In reply to comment #60)
> Also included is a patch to the gobjectnotifyqueue.c so that it does not keep
> the notify queue on the object itself but as a threadlocal variable.

This breaks our API. Currently, thread A may freeze an obejct, thread B can set a property on it, and thread A may then thaw and emit notifies on the object. Such scenario easily occour e.g. with using thread pools. With your patch, thawing doesn't take effect in other threads, and property notifications can be arbitrarily duplicated.
Comment 65 Sebastian Dröge (slomo) 2009-10-06 14:57:16 UTC
(In reply to comment #64)
> (In reply to comment #59)
> > I disagree. Setting properties on the same object from multiple threads should
> > work as expected without the crashing that happens now. 
> 
> If those multiple threads all make sure to hold a global or object-specific
> lock around accessing object properties, there won't occour any problems.

That doesn't work. What if the same object is used at different places? You propose to always export an object-specific lock everywhere that should be used then? Doesn't sound like a good solution

> (In reply to comment #60)
> > Also included is a patch to the gobjectnotifyqueue.c so that it does not keep
> > the notify queue on the object itself but as a threadlocal variable.
> 
> This breaks our API. Currently, thread A may freeze an obejct, thread B can set
> a property on it, and thread A may then thaw and emit notifies on the object.
> Such scenario easily occour e.g. with using thread pools. With your patch,
> thawing doesn't take effect in other threads, and property notifications can be
> arbitrarily duplicated.

That's actually a very interesting use case... could you point to some code examples where this is done? Just curious

But of course you have a point here, this API behaviour would be broken which is not nice. Did the docs say something about this being possible? ;)
Comment 66 Benjamin Otte (Company) 2010-01-20 07:22:21 UTC
So, I want progress on this. Apparently there is 3 possible behaviors we could implement:
1) Guarantee that the notify is emitted from the thread that modified the property (current patch). This does not keep the guarantee that a frozen object doesn't emit notify signals - or if we want to keep it we must block threads, and we don't want that.
2) Emit all notify events whenever the last object thaws the queue. This can cause multiple emissions of notify events in different threads at the same time, or can just cause one notify event depending on race conditions. But it keeps the current API guarantees.
3) WONTFIX this and not support concurrent property access at all.

I have to say that I do like the third solution most and would prefer GStreamer having a nicer API, since both 1) and 2) are ugly solutions to the problem and I can't find a non-ugly one.
But in the end this is quite a common bug and rather simple to fix inside gobject and GStreamer is happy no matter what solution we use. So I'd just pick solution 2) and explicitly state that the thread emitting the notify signal is undefined for concurrent access. And I'd pick that solution because it's the one that can guarantee that:
- at least one notify is emitted for every property modification
- no notify is emitted when the object is frozen
- all notifies are emitted when the object is unfrozen
Comment 67 Olivier Crête 2010-06-29 23:56:43 UTC
Created attachment 164938 [details] [review]
Make GObject property notify thread safe

Here is an implementation of the second option, the property notify will be emitted by the last thread to thaw. They will be emitted only once, etc. It seem to keep the current behavior in the single threaded case. And well, the behavior in the multi-threaded case was to segfault randomly, so this is definitely an improvement.
Comment 68 Colin Walters 2010-06-30 00:48:39 UTC
Review of attachment 164938 [details] [review]:

This appears to be missing the new test files.
Comment 69 Olivier Crête 2010-06-30 01:01:33 UTC
Created attachment 164940 [details] [review]
Make GObject property change notifications thread safe

Oops, forgot to git-add them..
Comment 70 Benjamin Otte (Company) 2010-06-30 08:44:27 UTC
There's some brokenness from evolution in your patch it seems:
- The count variable in g_object_notify_queue_freeze() seems unused
- g_object_notify_queue_thaw() does not seem to return a boolean
- g_critical("Free queue for %p is larger than 65526", object); should be 65536. I think I'd prefer if it said "Called g_object_notify_queue_freeze() too often. Forgot to call g_object_notify_queue_thaw() or infinite loop?" or something like this, so Joe Random Hacker has a clue what it means.

A few observations:
- You seem to try to keep the warnings in the current code as they are. As warnings are just a means to tell the user something is fundamentally broken, we just need be sure we do emit warnings, but not that they are the same, which should make the code a lot simpler:
- We don't need the create parameter to g_object_notify_queue_freeze(), as we only use FALSE to be able to emit the warning (when the queue must exist anyway). So if we emit a warning  in g_object_notify_queue_thaw(), everything should be fine.
- The object->ref_count check in g_object_notify_queue_thaw() probably needs to use g_atomic_int_get().
- Related to that check, I think we can move it to the other check at top of the file. Allowing reducing the freeze count to anything but zero looks broken to me. Or am I missing something weird in the object destruction process?

And my biggest worry:
- Have you (or anybody) benchmarked the effect of a big global lock that we take 2+ times for every property notify?
Comment 71 Olivier Crête 2010-06-30 16:02:18 UTC
Created attachment 164972 [details] [review]
Make GObject property change notifications thread safe (improved)

(In reply to comment #70)
> There's some brokenness from evolution in your patch it seems:
> - The count variable in g_object_notify_queue_freeze() seems unused

Fixed

> - g_object_notify_queue_thaw() does not seem to return a boolean

Oops.

> - g_critical("Free queue for %p is larger than 65526", object); should be
> 65536. I think I'd prefer if it said "Called g_object_notify_queue_freeze() too
> often. Forgot to call g_object_notify_queue_thaw() or infinite loop?" or
> something like this, so Joe Random Hacker has a clue what it means.

Improved the warning.

> A few observations:
> - You seem to try to keep the warnings in the current code as they are. As
> warnings are just a means to tell the user something is fundamentally broken,
> we just need be sure we do emit warnings, but not that they are the same, which
> should make the code a lot simpler:

I tried to not change what already works.

> - We don't need the create parameter to g_object_notify_queue_freeze(), as we
> only use FALSE to be able to emit the warning (when the queue must exist
> anyway). So if we emit a warning  in g_object_notify_queue_thaw(), everything
> should be fine.

Actually, _queue_freeze() is now also used as a getter function because the _queue_thaw() function takes a GObjectNotifyQueue as a parameter and it seemed like the best way to get it. 

> - The object->ref_count check in g_object_notify_queue_thaw() probably needs to
> use g_atomic_int_get().

None of the other uses of g_return_if_fail() with the ref_count in gobject.c seem to use the g_atomic_int_get().. but I added it anyway. Better safe than sorry.

> - Related to that check, I think we can move it to the other check at top of
> the file. Allowing reducing the freeze count to anything but zero looks broken
> to me. Or am I missing something weird in the object destruction process?

Moved

> And my biggest worry:
> - Have you (or anybody) benchmarked the effect of a big global lock that we
> take 2+ times for every property notify?

No, but it already takes a lock twice every time..  g_datalist_id_{get,set}* functions also have a global lock inside. We could reduce the number of times the global lock is taken by adding a lock inside the GObjectNotifyQueue struct and then using it the _add_() function and by making the freeze_count atomic (so we don't have to take the lock in _thaw() unless its the last one). That said, I'm not sure at all if its worth it. In single threaded cases, there is no cost. In multi-threaded cases, I think its a lot better to contend than to crash...
Comment 72 Chris Shoemaker 2010-07-22 14:28:31 UTC
It's really great to see the renewed efforts to fix this bug.  (Thanks Olivier!)

I realize the patch hasn't been benchmarked yet, but since I don't see any alternate proposals for implementing correct locking behavior, could we go with the thread-safe solution for now, and optimize it only if necessary?
Comment 73 Benjamin Otte (Company) 2010-07-22 20:01:45 UTC
commit f6d3e224dfa9e8b69403a8c79a27a58b5c9f66b7
Author: Olivier Crête <olivier.crete@collabora.co.uk>
Date:   Thu Jul 22 21:52:54 2010 +0200

    notify: Add tests for threadsafe object notifies
    
    https://bugzilla.gnome.org/show_bug.cgi?id=166020

commit 0483ef000ae323415bb0c7b762dd587f1384b1f3
Author: Benjamin Otte <otte@redhat.com>
Date:   Thu Jul 22 21:51:28 2010 +0200

    notify: Make dedup code clearer
    
    A for loops seems easier to understand to me than gotos.

commit 9026b11e380a9b6479c0fcb852a5aba297ed9dd4
Author: Olivier Crête <olivier.crete@collabora.co.uk>
Date:   Thu Jul 22 21:26:32 2010 +0200

    notify: Make GObject property change notifications thread safe
    
    Adds locking around object property change notification handling. The
    notifications are only emitted after all threads have called
    g_object_thaw_notify().
    
    https://bugzilla.gnome.org/show_bug.cgi?id=166020

commit 65797f7e54061a21866261d00a1c5533cd6a4f81
Author: Olivier Crête <olivier.crete@collabora.co.uk>
Date:   Thu Jul 22 21:20:35 2010 +0200

    notify: Remove g_object_notify_queue_from_object()
    
    The function will not be safe with object locking in place, so we remove
    it. The workaround is somewhat ugly, but it works.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=166020


commit 83026092eba98c2e6e0a96112555f980382a99e7
Author: Olivier Crête <olivier.crete@collabora.co.uk>
Date:   Thu Jul 22 20:25:00 2010 +0200

    notify: Refactor g_object_notify_queue_thaw()
    
    This adds better error reporting and simplifies the code for adding
    thread safety.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=166020

commit 0201a81f04ae5fd46c6db25859e90dde1fc35d4b
Author: Olivier Crête <olivier.crete@collabora.co.uk>
Date:   Thu Jul 22 09:02:18 2010 +0200

    notify: Remove unused g_object_notify_queue_clear()
    
    https://bugzilla.gnome.org/show_bug.cgi?id=166020

commit a2c5bba31d19dcd7f3726c99280ff794cbf744a9
Author: Benjamin Otte <otte@redhat.com>
Date:   Thu Jul 22 08:52:25 2010 +0200

    notify: Emit a g_critical() instead g_return_if_fail()ing
    
    This does not change the code semantically in any way but avoids a
    return in the middle of the code.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=166020
Comment 74 Benjamin Otte (Company) 2010-07-22 20:02:43 UTC
Comment on attachment 164972 [details] [review]
Make GObject property change notifications thread safe (improved)

I basically committed this in a split form and did some cleanups.
Comment 75 Frederic Peters 2010-07-23 08:41:53 UTC
I don't know the history but gobjectnotifyqueue.c (.c!) is listed in "GObject library header files for public installation" (gobject/Makefile.am, gobject_public_h_sources), and it's included in gtk/gtkwidget.c

The removal of g_object_notify_queue_from_object and g_object_notify_queue_clear makes gtk+ build  fail.

libtool: link: gcc -std=gnu99 -DGDK_PIXBUF_DISABLE_DEPRECATED -DG_DISABLE_DEPRECATED -g -O2 -Wall -o .libs/gtk-query-immodules-3.0 queryimmodules.o -pthread  -L/home/jhbuild/sid-ia32/build/lib ./.libs/libgtk-x11-3.0.so ../gdk/.libs/libgdk-x11-3.0.so /home/jhbuild/sid-ia32/build/lib/libpangocairo-1.0.so -lX11 -lXcomposite -lXdamage -lXfixes /home/jhbuild/sid-ia32/build/lib/libatk-1.0.so /home/jhbuild/sid-ia32/build/lib/libcairo.so /home/jhbuild/sid-ia32/build/lib/libgdk_pixbuf-2.0.so -lm -lpng12 /home/jhbuild/sid-ia32/build/lib/libgio-2.0.so /home/jhbuild/sid-ia32/build/lib/libpangoft2-1.0.so /home/jhbuild/sid-ia32/build/lib/libpango-1.0.so /usr/lib/libfreetype.so /home/jhbuild/sid-ia32/build/lib/libfontconfig.so /home/jhbuild/sid-ia32/build/lib/libgobject-2.0.so /home/jhbuild/sid-ia32/build/lib/libgmodule-2.0.so /home/jhbuild/sid-ia32/build/lib/libgthread-2.0.so -lrt /home/jhbuild/sid-ia32/build/lib/libglib-2.0.so -pthread -Wl,-rpath -Wl,/home/jhbuild/sid-ia32/build/lib
./.libs/libgtk-x11-3.0.so: undefined reference to `g_object_notify_queue_from_object'
./.libs/libgtk-x11-3.0.so: undefined reference to `g_object_notify_queue_clear'
Comment 76 Benjamin Otte (Company) 2010-07-23 08:53:56 UTC
Woah, omg, you are right.
It'd never have occured to me that we indeed _install_ a _source_ file.

I've pushed commits to fix this now.
Comment 77 Ralf Ebert 2010-09-08 12:42:27 UTC
Thanks for fixing this; the commits can be applied to glib-2-24 using 'git cherry-pick a2c5bba~1..f6d3e22'; this fixed bug 628775 for me.
Comment 78 Hieu T. Le 2012-09-05 03:57:31 UTC
I want to move this patch to GLib 2-20, I don't understand when to use TRUE/FALSE for the create parameter.
Comment 79 Hieu T. Le 2012-09-05 09:01:04 UTC
Especially in following function

void
g_object_notify (GObject     *object,
		 const gchar *property_name)
{
  GParamSpec *pspec;
  
  g_return_if_fail (G_IS_OBJECT (object));
  g_return_if_fail (property_name != NULL);
  if (g_atomic_int_get (&object->ref_count) == 0)
    return;
  
  g_object_ref (object);
  /* We don't need to get the redirect target
   * (by, e.g. calling g_object_class_find_property())
   * because g_object_notify_queue_add() does that
   */
  pspec = g_param_spec_pool_lookup (pspec_pool,
				    property_name,
				    G_OBJECT_TYPE (object),
				    TRUE);

  if (!pspec)
    g_warning ("%s: object class `%s' has no property named `%s'",
	       G_STRFUNC,
	       G_OBJECT_TYPE_NAME (object),
	       property_name);
  else
    {
      GObjectNotifyQueue *nqueue;
      
      nqueue = g_object_notify_queue_freeze (object, &property_notify_context,
                                             TRUE);

>> TRUE or FALSE here????!!

      g_object_notify_queue_add (object, nqueue, pspec);
      g_object_notify_queue_thaw (object, nqueue);
    }
  g_object_unref (object);
}