GNOME Bugzilla – Bug 166020
use GAtomic for refcounting
Last modified: 2012-09-05 09:01:04 UTC
Please describe the problem: Various structures are refcounted in an unsafe way. This makes multithreaded applications fail at random. The most notable cases are signal emision (which refcounts various objects) and GObject refcounting. Steps to reproduce: Attached path contains testcases to show the problems. Actual results: Expected results: Does this happen every time? Other information:
Created attachment 36860 [details] [review] proposed patch using GAtomic Since the refcount field of closures is using bitfields, a lock is used to protect the refcount. An internal structure in gsignal.c was grew in size by making the refcount a full gint. The patch allows for inlining of the atomic operations. Atomic refcounting has a performance overhead. This patch is ABI and API compatible with 2.6.1, gnome runs fine with it.
The patch needs explanation and perhaps splitting into smaller patches. I'd rather see it done *without* inlining the atomic ops first, then we can look at whether inlining the atomic ops with libgobject makes sense from a performance point of view.
Wim's original patch was without inlining and it made the testcase run things 6 times slower compared to the 2 times slower of the inlining version (compared to current implementation)
Are you saying that the current implementation makes signal emission two times slower? Any explanation for the why the lack of inlining would make things so much worse? Is it PLT overhead for calls between libraries? (Simple function call overhead doesn't make sense ... making something that much slower would imply that almost 80+% of the total number of function calls in the test case are to the atomic operations.
seeing the test case might help to understand this...
The slowdown is noted in the very synthetic testcase objects2.c, which continuously refs and unrefs a single object, so yes more than 80% of the calls are atomic operations. Running gnome desktop with the patched glib does not show any noticable slowdown.
Created attachment 36907 [details] [review] updated patch without inlining of atomic operations New patch that does not include the inlining of the atomic operations.
Created attachment 36927 [details] [review] Patch to add inlining Patch to add inlining
Created attachment 36938 [details] [review] add refcounting testcases The problem ----------- A typical _ref function looks like this: _ref (object) { ... object->refcount++; .. } which on x86 translates to: mov 0x4(%esi),%eax inc %eax mov %eax,0x4(%esi) When two threads both call the _ref function at the same time undefined behaviour can occur depending on where the context switch occurs. Mostly the result is that one object does not get its refcount incremented. The sequence is as follows: <thread 1> mov 0x4(%esi),%eax ; eax contains refcount ****context switch here, thread2 starts to run*** <thread 2> mov 0x4(%esi),%eax ; eax contains refcount inc %eax ; eax contains refcount+1 mov %eax,0x4(%esi) ; refcount = refcount+1 .... ****context switch here, thread1 continues *** <thread 1> inc %eax ; eax contains refcount+1 mov %eax,0x4(%esi) ; refcount = refcount+1 After switching back to thread1 the refcount increment performed by thread2 is undone, resulting in a lost refcount. Similar problems exist un the _unref functions. Testcases --------- The testcases in this patch exposes the following bugs: objects Create two threads that ref/unref the object. no object should be disposed. The testcase crashes almost instantly because of the above explanation. objects2 ref/unref a single object from a single thread. Used to benchmark different refcounting implementations. signal1 (from signals.c) Fire a signal from two objects of the same type from 40 threads. Does not crash very often because the window is very small but it occasionally crashes because of unsafe unreffing of the shared closure. signal2 (from signals.c) Fire a signal from two objects of the same type from 40 threads with a custom handler in the object. Segfaults almost immediatly. signal3 (from signals.c) Fire a property notify signal. Segfaults almost immediatly. Solution -------- Solution -------- A solution to this problem is making the increment one atomic operation by either using a lock around the increment/decrement-test code or by using the faster GAtomic primitives. The proposed patch uses GAtomic to make the refcounting safe and uses a mutex where the refcount is not manageble by a GAtomic (gclosure.h where the ref_count is expressed in a bitfield). Ideally the lock cases should be converted to GAtomic too but that would mean an ABI breakage. Performance evaluation ---------------------- Preformance can be measured with a microbenchmark (objects2) that does nothing else than ref/unref an object from a single thread. Tests ran on pentium 4 HT 2.8GHz. unpatched glib 2.6.1 >>time ./objects2 init 0x804ef78 real 0m6.085s user 0m5.653s sys 0m0.016s patched glib 2.6.1 with atomic refcount >>time ./objects2 init 0x804ef98 real 0m31.695s user 0m31.620s sys 0m0.014s Performance negatively inpacted because of the extra function calls made for each ref/unref. patched glib 2.6.1 with atomic refcount inlined >>time ./objects2 init 0x804ef98 real 0m13.226s user 0m13.204s sys 0m0.016s Performance negatively inpacted because of the extra lock; assembly call that synchronizes the CPU caches. ABI/API ------- no ABI/API changes.
Owen and Matthias, are you happy about this 3-way split of the patch? or do you want even further splitting?
The splitting looks fine to me. What I discussed with Owen yesterday, and what needs to be addressed in a different way, is the closures. We certainly don't want to take a lock every time we ref/unref a closure, since that is done a couple of times per signal emission. Breaking the abi is not an option either. So, if there is no way to perform an atomic increment on the low 15 bits of a 32bit value, one option we considered was to make use of the fact that notifiers is a private pointer in the closure struct pointing to an array of various things. So there would be two options for associating a 32bit refcount with a closure: a) make the refcount occupy the first 4 bytes of the memory block pointed to by notifiers b) change struct { ... GClosureNotifyData *notifiers; } GCLosure; to struct { gint ref_count; GClosureNotifyData *notifiers; } GClosurePrivate; struct { ... GClosurePrivate *private; } GClosure;
Option a) might not work since the memblock pointed to by notifiers might change at when a g_renew of the block is performed in various functions. It _is_ possible to use _compare_and_swap to modify just the 15 bits of the ref_count but it would involve finding out where those 15 bits are in the struct, which is not defined by any C standards. Option b) is therefore the most feasable solution IMO.
It would be interesting to see the how the locks affect the timing of the signal tests, btw.
Any news on this ?
Created attachment 38488 [details] [review] the patch
To help this effort, I have implemented option b) to get a full integer ref_count for closures
Created attachment 39517 [details] [review] atomic refcounting patch New patch, uses compare and swap to implement the closure refcounting. Uses a recursive mutex to make the object_notify work reliably. Remove atomic read in the g_assert/g_return_if_fail debug statements.
patch id=39517 is against 2.6.3
We definitely want to use a full int (as the current patch does) because different architectures have different requirements wrt atomic operations. Many need a full word. Others don't have instruction-level support for atomic arithmetic at all and thus need them emulated, which might require creating a lock and thus wasting another 8- or 16-bits. SPARC32, for example, suffers from this affliction. Oh, and depending on usage we probably want to make the ref count volatile.
Wim, ping My understanding from talking with timj is that you will produce another version of the patch ?
Johan, did you mean to move this back to 2.8 API freeze, or did you confuse it with the constructor issue ?
Matthias, Yes, sorry. Moving back to 2.8 Freze.
Created attachment 48828 [details] [review] adds testcases a new patch to add testcases against current HEAD cvs
Created attachment 48829 [details] [review] new patch to make gobject refcounting atomic This is a reviewed patch to add atomic GObject refcounting against current HEAD CVS. GType is not patched anymore. GObject last_unref merged with _unref, taking into account remarks from Tim Janik. Toggle refs are included. This patch does not yet contain the gclosure.c patch. To be done.
Created attachment 49016 [details] [review] includes gclosure.c This patch also includes gclosure.c atomic refcounting.
previous patch is not yet 100% threadsafe.
Wim, is the last patch ready to be sent to timj for review ?
Created attachment 49023 [details] [review] updated patch use atomic reads in closure_invoke_notifiers(). Adding/removing notifiers while invoking is still not threadsafe, is this important?
Wim, +typedef struct +{ + union { + GClosureBits bits; + gint atomic; + } data; +} GAtomicClosureBits; why the extra wrapping in a struct, would typedef union { GClosureBits bits; gint atomic; } not work as well ? Also, should it be gint32, since GClosureBits is 32 bits wide ? But we only have atomic operations on gint. +#define CLOSURE_UNREF(closure, is_zero) \ +G_STMT_START { \ + GClosureBits old, new; \ + do { \ + CLOSURE_READ_BITS2 (closure, &old, &new); \ + if (old.ref_count == 1) /* last unref, invalidate first */ \ + g_closure_invalidate ((closure)); \ + new.ref_count--; \ + is_zero = (new.ref_count == 0); \ + } \ + while (!CLOSURE_SWAP_BITS (closure, &old, &new)); \ +} G_STMT_END Will this not end up invalidating the closure twice, if the loop body gets executed more than once ? Or could it invalidate the closure, but it could still leave the closure with a refcount of 1, if another thread refs the closure between the invalidation and the swap ? There are a few uses of the meta_marshal, is_invalid and in_marshal bits in gsignal.c and gobject.c, which probably need to be done atomically.
Created attachment 49242 [details] [review] another test Here is another test which checks that the notify queue works correctly when multiple threads change properties simultaneously. Segfaults easily without the patch.
Created attachment 49249 [details] [review] updated patch This patch cleans up the struct in union and uses an atomic read in gsignal.c to access the meta_marshal bit.
Committed the last patch and the tests. We may want to add a configure check that gint is 32 bits
matthias, comitting the patch and closing the bugreport as fixed was not of your business, you need to get approval from me first for such changes. regardless of whether you think the final patch still has issues or not (which it has).
i committed the following changes. with that, we have basic atomic reference counting for params and gobject, but not gclosure. also, gobject atomic reference counting conflicts the recently comitted toggle references. Sat Jul 30 21:10:26 2005 Tim Janik <timj@gtk.org> * gobject.c: reverted notify_mutex introduction, since this prevents parallelized setting of object properties on different objects, and serves no apparent purpose (to me at least). g_object_real_dispose(): removed non-atomic reference count modifications. g_object_unref(): make sure the closures array is destroyed when destroying signal handlers. * gparam.c: cosmetic changes. * gsignal.c: comment fixup. allow 16bit blocking count. * gsignal.c: reverted GClosure related changes. * gclosure.c: reverted premature commit of atomic reference counting attempt.
wim, can you please explain why you added notify_mutex locking around accesses to property_notify_context?
Tim, I just love you. Reverting the gclosure changes without further explanation is not bringing us closer to a working solution. Unless you are going to commit your superior solution soon (which I doubt). And cut out that my-business-your-business bickering
FYI, Wim is on vacation for two weeks and might not be able to comment on this before somewhere around the 10th.
I'm a bit confused here. Why is there a 15 day gap between Matthias commiting and Tim complaining ? If I follow correctly, Tim is the maintainer of this chunk of code, but somehow a commit to his code was unnoticed for 2 weeks ? I'm trying to understand how the process should have gone in this case, with a stable release coming up so soon I would expect the maintainer to be following relevant bugs closely enough, and definately commits to his code. As it stands Wim was led to believe that he did all he could to bring the patch to the level expected, and now two weeks after that chunks get reverted at the unfortunate moment he has no chance to reply. How should we proceed here ? Can this still be resolved before the next stable glib release ? What should be done on Wim's end ?
Thomas, Tims notion of maintainership is somewhat different from ours...
Atomic reference counting for closures went into CVS now: Mon Aug 1 23:00:42 2005 Tim Janik <timj@imendio.com> * gclosure.c: turned all modifications to the first 32 integer bits in a closure into atomic accesses. wrapped write accesses into special macros to keep the atomic modification logic in a single place. comment cleanups. Aparently owen has reasonings on why we're save with GObject reference counting and toggle references, owen please comment.
Matthias, if you're unhappy with my way of maintaining GObject, i'm sorry, but if improper, unreviewed code is committed to CVS, it has to be reverted. Thomas, unfortunately there's been some miscommunication going on between me and Matthias. That, and because i'm always pretty busy with lots of other projects i'm also responsible for, has lead into me not noticing the erroneous commit until after aproximately 10 days. This was by no means wim's fault, he's been doing a great job on tackling the atomic ref counting in the first place, and without him, we'd not have this for 2.8. At this point, no further action from him is required, but it'd be nice if he outlined his ideas about notify_mutex.
On toggle references: there are a couple of properties of toggle references that make them reasonably robust against thread safety issues even without explicit locking. 1. It only makes sense for *one* toggle reference to be in place at once, if two toggle references are added to the same object, then neither will ever be notified and the object will never be freed. So, to a first approximation, we can ignore the case where two threads try to add toggle references to an object at the same time. 2. Given the normal and expected use of toggle references, a toggle reference is removed only immediately before the the object is destroyed: that is, when no other reference to the object exist. So will be no thread safety issues with removing toggle references. (http://mail.gnome.org/archives/gtk-devel-list/2005-April/msg00095.html has some diagrams of how toggle references work) Thus, all situations of concern are between toggle_refs_notify() and g_object_add_toggle_ref() or between two invocations of toggle_refs_notify() Possible concerns: - g_object_add_toggle_ref() sets the OBJECT_HAS_TOGGLE_REF_FLAG flag, this is seen in g_object_ref() or g_object_unref() and it runs toggle_refs_notify() before the object data is set. Because g_object_get/set_data() use a mutex, reversing the order of setting the flag and object data should fix this; the mutex establishes sufficient memory coherency even though setting the flag isn't done within the mutex. (Having a mutex lock/unlock between the initial ref() and the subsequent setting the flag in g_object_add_toggle_ref() is also important. It makes sure that the two actions are seen as ordered by other threads so that the toggle reference will never be notified until the calling thread drops the reference count it must have had to call g_object_add_toggle_ref()) - One thread decrements the refcount from 2 to 1, another increments the refcount from 1 to 2, the two subsequent toggle_refs_notify() calls get called in the wrong order. This one is much harder to deal with. I don't have any good ideas to fix it ... you'd probably have to switch to mutex locking for refcounting on objects that have toggle references. My basic conclusion here is that toggle references are probably best not mixed together with atomic reference counts. GStreamer should not directly expose atomic-reference counted objects to language bindings that use toggle references. Note also that atomic-reference counting always causes another problem for toggle references: the toggle notifiers an be called from arbitrary threads, so that the toggle reference code in the language binding would have to be written very carefully. Weak references have a much more straightforward set of problems - add_weak_ref(), remove_weak_ref(), and notification (triggered by g_object_real_dispose(), say) can all happen from any thread at any time, so you'd have to mutex protect everything to get it to work. be called from any thread, so they all need to be
*** Bug 311496 has been marked as a duplicate of this bug. ***
AFAIK (which is not far), declaring the bitfields of GClosure as volatile is insufficient to guarantee that reading them will return up-to-date values (as g_atomic_int_read would). Not sure. Tim? The notify_mutex that was in gobject.c serves the purpose of guaranteeing the consistency of the notify queues if multiple threads are setting properties on the same object. It is a rather coarse lock, yes. I suppose threadsafe property setting/getting is somewhat outside of the scope of this bug though. A bit of background to show where we're coming from -- GStreamer sets up media processing pipelines, and when they get set to PLAYING they create threads and start to run (always in 0.9, only in some cases in 0.8). In 0.8 this is a very tricky operation -- once you set the thing to PLAYING, you can't really do anything with the objects the threads touch because you have no guarantee that what you access is valid or that what you try to set is not corrupting some data structure. So, in 0.9 we tried to redesign the API so that it is MT-safe for every API call. A much simpler statement to make than the one in the past. We have a whole mess of locks and signals and conds underneath to make sure this works while the pipeline spins up, while it's running, when it's not, etc. But without atomic recounting this is just not possible to do. The same is true for signals and properties. If an API call will cause a signal emission or a property notification (not always possible to tell if you call into plugin code), we'd like for the GObject infrastructure in between to not crash. So, that was the purpose of the very coarse notify_mutex, and of having more threads in tests/refcount/signals.c. So although concurrent object access is not normally supported in glib, the signals and property threadsafety patches are make GObject as threadsafe as the object implementation is -- which is more in line with GObject's existence as an open data type (as opposed to hash tables, for instance). The statement of the goals above is true; the implications for what we need out of GLib is only AFAIK -- corrections of my ignorance are of course welcome...
a toggle-reference check is still missing in g_object_unref(): /* decrement the last reference */ is_zero = g_atomic_int_dec_and_test (&object->ref_count); /* may have been re-referenced meanwhile */ if (G_LIKELY (is_zero))
Reading Owen's comments I agree with his prososal to use an object mutex lock to guarantee the toggle_ref_notify() order. functions would go like this: _ref: if (G_UNLIKELY (OBJECT_HAS_TOGGLE_REF (object)) { G_OBJECT_REC_LOCK (object); object->refcount++; /* went from 1->2, notify toggle refs */ if (object->refcount == 2) toggle_refs_notify (object, FALSE); G_OBJECT_REC_UNLOCK (object); } else { g_atomic_int_inc (&object->ref_count); } _unref: if (G_UNLIKELY (OBJECT_HAS_TOGGLE_REF (object)) { G_OBJECT_REC_LOCK (object); object->refcount--; /* went from 2->1, notify toggle refs */ if (object->refcount == 1) { toggle_refs_notify (object, TRUE); } else if (object->refcount == 0) { ... } G_OBJECT_REC_UNLOCK (object); } else { gboolean is_zero; is_zero = g_atomic_int_dec_and_test (&object->ref_count); /* may have been re-referenced meanwhile */ if (G_LIKELY (is_zero)) { ... } } - a global recursive object mutex is used to make sure the refcounting on toggle reffed objects is atomic together with the notify. The mutex is recursive so that the toggle ref handler can _ref/_unref the object. - non toggle reffed objects update the refcount field atomically. It is assumed that modifying the toggle ref on an object happens from one thread. Any comments before I try to implement this?
Wim, I believe Tim had a patch in the queue for handling the toggle references. I fear that he lost interest before finishing it, though. Then we could be in for a long wait again...
Is this bug still valid?
Yes it is... some parts of the patch are obsolete now though. I'll be looking at which parts are still valid in the next days, port it to latest SVN and attach patches for all separate parts of the patch.
Created attachment 110569 [details] [review] gobject-notify.diff Only part of the old patch that is still valid. The GObject notification stuff uses the global property_notify_context variable and if more than one notification is emitted at the same time from different threads weird crashes will happen. We have quite some bugs on GStreamer caused by this.
Some other places that use refcounts without atomic operations are: glib/ghook.c glib/gcache.c glib/gmain.c I'm not sure if these are problems and should be fixed though.
The two testcases are already in SVN since some time it seems.
*** Bug 489682 has been marked as a duplicate of this bug. ***
Moving to the right component. You probably want to poke timj on irc to get the patch reviewed.
(In reply to comment #50) > Created an attachment (id=110569) [edit] > gobject-notify.diff > > Only part of the old patch that is still valid. > > The GObject notification stuff uses the global property_notify_context variable > and if more than one notification is emitted at the same time from different > threads weird crashes will happen. We have quite some bugs on GStreamer caused > by this. Please take a look at GObjectNotifyContext property_notify_context: struct _GObjectNotifyContext { GQuark quark_notify_queue; GObjectNotifyQueueDispatcher dispatcher; GTrashStack *_nqueue_trash; /* unused */ }; quark_notify_queue and dispatcher are setup during class_init, retain their values after, and _nqueue_trash is unused. So this can't be the cause of crshes you're seeing for concurrent notifications. Just to make sure, you're talking about concurrent object notifications in different threads from *different* objects right? I.e. you're not trying to emit notifications from the same object in different threads simultaneously?
Ok, so I've probably misunderstood the code. From looking at it again it seems that this will only lead to problems if it happens on the same object instance from different threads at the same time. So this is simply something you shouldn't do?
(In reply to comment #56) > Ok, so I've probably misunderstood the code. From looking at it again it seems > that this will only lead to problems if it happens on the same object instance > from different threads at the same time. > > So this is simply something you shouldn't do? Objects and classes are simialr to other GLib data structures in that regard. E.g. if you use a GHashTable in 2 threads, you need to make sure to lock access around it in both so only one thread at a time uses the structure. The same is true for object and class structures.
Ok, then this is simply no GLib/GObject bug... let's close it here then ;) Unfortunately this can't be easily worked around in gstreamer so maybe we have to implement our own thread safe properties there in the future.
I disagree. Setting properties on the same object from multiple threads should work as expected without the crashing that happens now. Any locking that needs to be done to protect the object data structures is of course not GObject's task but at least we should make it to the object implementation. I was poking at this some more and came up with a very simple patch that I will attach next.
Created attachment 130890 [details] [review] proposed patch This patch adds a testcase that modifies a property on a testobject from multiple threads. Without the patch this crashes instantly. Also included is a patch to the gobjectnotifyqueue.c so that it does not keep the notify queue on the object itself but as a threadlocal variable. The effect is that when multiple threads freeze and thaw the notify queue they will see a different queue instead of stepping on eachothers toes. The test runs fine after applying this patch.
Created attachment 131515 [details] [review] new proposed patch Updated patch based on Wim's proposed patch. Takes nested notifies on multiple objects into account (test included) and makes the previous test a bit shorter by using G_DEFINE_TYPE() and removing unused dispose() method. (Guess the former should be done for the existing tests as well?)
I can reproduce this bug pretty easily, is it possible to have it reviewed/committed?
Same thing, the patch avoids spurious segfaults in our code.
(In reply to comment #59) > I disagree. Setting properties on the same object from multiple threads should > work as expected without the crashing that happens now. If those multiple threads all make sure to hold a global or object-specific lock around accessing object properties, there won't occour any problems. (In reply to comment #60) > Also included is a patch to the gobjectnotifyqueue.c so that it does not keep > the notify queue on the object itself but as a threadlocal variable. This breaks our API. Currently, thread A may freeze an obejct, thread B can set a property on it, and thread A may then thaw and emit notifies on the object. Such scenario easily occour e.g. with using thread pools. With your patch, thawing doesn't take effect in other threads, and property notifications can be arbitrarily duplicated.
(In reply to comment #64) > (In reply to comment #59) > > I disagree. Setting properties on the same object from multiple threads should > > work as expected without the crashing that happens now. > > If those multiple threads all make sure to hold a global or object-specific > lock around accessing object properties, there won't occour any problems. That doesn't work. What if the same object is used at different places? You propose to always export an object-specific lock everywhere that should be used then? Doesn't sound like a good solution > (In reply to comment #60) > > Also included is a patch to the gobjectnotifyqueue.c so that it does not keep > > the notify queue on the object itself but as a threadlocal variable. > > This breaks our API. Currently, thread A may freeze an obejct, thread B can set > a property on it, and thread A may then thaw and emit notifies on the object. > Such scenario easily occour e.g. with using thread pools. With your patch, > thawing doesn't take effect in other threads, and property notifications can be > arbitrarily duplicated. That's actually a very interesting use case... could you point to some code examples where this is done? Just curious But of course you have a point here, this API behaviour would be broken which is not nice. Did the docs say something about this being possible? ;)
So, I want progress on this. Apparently there is 3 possible behaviors we could implement: 1) Guarantee that the notify is emitted from the thread that modified the property (current patch). This does not keep the guarantee that a frozen object doesn't emit notify signals - or if we want to keep it we must block threads, and we don't want that. 2) Emit all notify events whenever the last object thaws the queue. This can cause multiple emissions of notify events in different threads at the same time, or can just cause one notify event depending on race conditions. But it keeps the current API guarantees. 3) WONTFIX this and not support concurrent property access at all. I have to say that I do like the third solution most and would prefer GStreamer having a nicer API, since both 1) and 2) are ugly solutions to the problem and I can't find a non-ugly one. But in the end this is quite a common bug and rather simple to fix inside gobject and GStreamer is happy no matter what solution we use. So I'd just pick solution 2) and explicitly state that the thread emitting the notify signal is undefined for concurrent access. And I'd pick that solution because it's the one that can guarantee that: - at least one notify is emitted for every property modification - no notify is emitted when the object is frozen - all notifies are emitted when the object is unfrozen
Created attachment 164938 [details] [review] Make GObject property notify thread safe Here is an implementation of the second option, the property notify will be emitted by the last thread to thaw. They will be emitted only once, etc. It seem to keep the current behavior in the single threaded case. And well, the behavior in the multi-threaded case was to segfault randomly, so this is definitely an improvement.
Review of attachment 164938 [details] [review]: This appears to be missing the new test files.
Created attachment 164940 [details] [review] Make GObject property change notifications thread safe Oops, forgot to git-add them..
There's some brokenness from evolution in your patch it seems: - The count variable in g_object_notify_queue_freeze() seems unused - g_object_notify_queue_thaw() does not seem to return a boolean - g_critical("Free queue for %p is larger than 65526", object); should be 65536. I think I'd prefer if it said "Called g_object_notify_queue_freeze() too often. Forgot to call g_object_notify_queue_thaw() or infinite loop?" or something like this, so Joe Random Hacker has a clue what it means. A few observations: - You seem to try to keep the warnings in the current code as they are. As warnings are just a means to tell the user something is fundamentally broken, we just need be sure we do emit warnings, but not that they are the same, which should make the code a lot simpler: - We don't need the create parameter to g_object_notify_queue_freeze(), as we only use FALSE to be able to emit the warning (when the queue must exist anyway). So if we emit a warning in g_object_notify_queue_thaw(), everything should be fine. - The object->ref_count check in g_object_notify_queue_thaw() probably needs to use g_atomic_int_get(). - Related to that check, I think we can move it to the other check at top of the file. Allowing reducing the freeze count to anything but zero looks broken to me. Or am I missing something weird in the object destruction process? And my biggest worry: - Have you (or anybody) benchmarked the effect of a big global lock that we take 2+ times for every property notify?
Created attachment 164972 [details] [review] Make GObject property change notifications thread safe (improved) (In reply to comment #70) > There's some brokenness from evolution in your patch it seems: > - The count variable in g_object_notify_queue_freeze() seems unused Fixed > - g_object_notify_queue_thaw() does not seem to return a boolean Oops. > - g_critical("Free queue for %p is larger than 65526", object); should be > 65536. I think I'd prefer if it said "Called g_object_notify_queue_freeze() too > often. Forgot to call g_object_notify_queue_thaw() or infinite loop?" or > something like this, so Joe Random Hacker has a clue what it means. Improved the warning. > A few observations: > - You seem to try to keep the warnings in the current code as they are. As > warnings are just a means to tell the user something is fundamentally broken, > we just need be sure we do emit warnings, but not that they are the same, which > should make the code a lot simpler: I tried to not change what already works. > - We don't need the create parameter to g_object_notify_queue_freeze(), as we > only use FALSE to be able to emit the warning (when the queue must exist > anyway). So if we emit a warning in g_object_notify_queue_thaw(), everything > should be fine. Actually, _queue_freeze() is now also used as a getter function because the _queue_thaw() function takes a GObjectNotifyQueue as a parameter and it seemed like the best way to get it. > - The object->ref_count check in g_object_notify_queue_thaw() probably needs to > use g_atomic_int_get(). None of the other uses of g_return_if_fail() with the ref_count in gobject.c seem to use the g_atomic_int_get().. but I added it anyway. Better safe than sorry. > - Related to that check, I think we can move it to the other check at top of > the file. Allowing reducing the freeze count to anything but zero looks broken > to me. Or am I missing something weird in the object destruction process? Moved > And my biggest worry: > - Have you (or anybody) benchmarked the effect of a big global lock that we > take 2+ times for every property notify? No, but it already takes a lock twice every time.. g_datalist_id_{get,set}* functions also have a global lock inside. We could reduce the number of times the global lock is taken by adding a lock inside the GObjectNotifyQueue struct and then using it the _add_() function and by making the freeze_count atomic (so we don't have to take the lock in _thaw() unless its the last one). That said, I'm not sure at all if its worth it. In single threaded cases, there is no cost. In multi-threaded cases, I think its a lot better to contend than to crash...
It's really great to see the renewed efforts to fix this bug. (Thanks Olivier!) I realize the patch hasn't been benchmarked yet, but since I don't see any alternate proposals for implementing correct locking behavior, could we go with the thread-safe solution for now, and optimize it only if necessary?
commit f6d3e224dfa9e8b69403a8c79a27a58b5c9f66b7 Author: Olivier Crête <olivier.crete@collabora.co.uk> Date: Thu Jul 22 21:52:54 2010 +0200 notify: Add tests for threadsafe object notifies https://bugzilla.gnome.org/show_bug.cgi?id=166020 commit 0483ef000ae323415bb0c7b762dd587f1384b1f3 Author: Benjamin Otte <otte@redhat.com> Date: Thu Jul 22 21:51:28 2010 +0200 notify: Make dedup code clearer A for loops seems easier to understand to me than gotos. commit 9026b11e380a9b6479c0fcb852a5aba297ed9dd4 Author: Olivier Crête <olivier.crete@collabora.co.uk> Date: Thu Jul 22 21:26:32 2010 +0200 notify: Make GObject property change notifications thread safe Adds locking around object property change notification handling. The notifications are only emitted after all threads have called g_object_thaw_notify(). https://bugzilla.gnome.org/show_bug.cgi?id=166020 commit 65797f7e54061a21866261d00a1c5533cd6a4f81 Author: Olivier Crête <olivier.crete@collabora.co.uk> Date: Thu Jul 22 21:20:35 2010 +0200 notify: Remove g_object_notify_queue_from_object() The function will not be safe with object locking in place, so we remove it. The workaround is somewhat ugly, but it works. https://bugzilla.gnome.org/show_bug.cgi?id=166020 commit 83026092eba98c2e6e0a96112555f980382a99e7 Author: Olivier Crête <olivier.crete@collabora.co.uk> Date: Thu Jul 22 20:25:00 2010 +0200 notify: Refactor g_object_notify_queue_thaw() This adds better error reporting and simplifies the code for adding thread safety. https://bugzilla.gnome.org/show_bug.cgi?id=166020 commit 0201a81f04ae5fd46c6db25859e90dde1fc35d4b Author: Olivier Crête <olivier.crete@collabora.co.uk> Date: Thu Jul 22 09:02:18 2010 +0200 notify: Remove unused g_object_notify_queue_clear() https://bugzilla.gnome.org/show_bug.cgi?id=166020 commit a2c5bba31d19dcd7f3726c99280ff794cbf744a9 Author: Benjamin Otte <otte@redhat.com> Date: Thu Jul 22 08:52:25 2010 +0200 notify: Emit a g_critical() instead g_return_if_fail()ing This does not change the code semantically in any way but avoids a return in the middle of the code. https://bugzilla.gnome.org/show_bug.cgi?id=166020
Comment on attachment 164972 [details] [review] Make GObject property change notifications thread safe (improved) I basically committed this in a split form and did some cleanups.
I don't know the history but gobjectnotifyqueue.c (.c!) is listed in "GObject library header files for public installation" (gobject/Makefile.am, gobject_public_h_sources), and it's included in gtk/gtkwidget.c The removal of g_object_notify_queue_from_object and g_object_notify_queue_clear makes gtk+ build fail. libtool: link: gcc -std=gnu99 -DGDK_PIXBUF_DISABLE_DEPRECATED -DG_DISABLE_DEPRECATED -g -O2 -Wall -o .libs/gtk-query-immodules-3.0 queryimmodules.o -pthread -L/home/jhbuild/sid-ia32/build/lib ./.libs/libgtk-x11-3.0.so ../gdk/.libs/libgdk-x11-3.0.so /home/jhbuild/sid-ia32/build/lib/libpangocairo-1.0.so -lX11 -lXcomposite -lXdamage -lXfixes /home/jhbuild/sid-ia32/build/lib/libatk-1.0.so /home/jhbuild/sid-ia32/build/lib/libcairo.so /home/jhbuild/sid-ia32/build/lib/libgdk_pixbuf-2.0.so -lm -lpng12 /home/jhbuild/sid-ia32/build/lib/libgio-2.0.so /home/jhbuild/sid-ia32/build/lib/libpangoft2-1.0.so /home/jhbuild/sid-ia32/build/lib/libpango-1.0.so /usr/lib/libfreetype.so /home/jhbuild/sid-ia32/build/lib/libfontconfig.so /home/jhbuild/sid-ia32/build/lib/libgobject-2.0.so /home/jhbuild/sid-ia32/build/lib/libgmodule-2.0.so /home/jhbuild/sid-ia32/build/lib/libgthread-2.0.so -lrt /home/jhbuild/sid-ia32/build/lib/libglib-2.0.so -pthread -Wl,-rpath -Wl,/home/jhbuild/sid-ia32/build/lib ./.libs/libgtk-x11-3.0.so: undefined reference to `g_object_notify_queue_from_object' ./.libs/libgtk-x11-3.0.so: undefined reference to `g_object_notify_queue_clear'
Woah, omg, you are right. It'd never have occured to me that we indeed _install_ a _source_ file. I've pushed commits to fix this now.
Thanks for fixing this; the commits can be applied to glib-2-24 using 'git cherry-pick a2c5bba~1..f6d3e22'; this fixed bug 628775 for me.
I want to move this patch to GLib 2-20, I don't understand when to use TRUE/FALSE for the create parameter.
Especially in following function void g_object_notify (GObject *object, const gchar *property_name) { GParamSpec *pspec; g_return_if_fail (G_IS_OBJECT (object)); g_return_if_fail (property_name != NULL); if (g_atomic_int_get (&object->ref_count) == 0) return; g_object_ref (object); /* We don't need to get the redirect target * (by, e.g. calling g_object_class_find_property()) * because g_object_notify_queue_add() does that */ pspec = g_param_spec_pool_lookup (pspec_pool, property_name, G_OBJECT_TYPE (object), TRUE); if (!pspec) g_warning ("%s: object class `%s' has no property named `%s'", G_STRFUNC, G_OBJECT_TYPE_NAME (object), property_name); else { GObjectNotifyQueue *nqueue; nqueue = g_object_notify_queue_freeze (object, &property_notify_context, TRUE); >> TRUE or FALSE here????!! g_object_notify_queue_add (object, nqueue, pspec); g_object_notify_queue_thaw (object, nqueue); } g_object_unref (object); }