GNOME Bugzilla – Bug 781601
Race in g_dbus_server_stop causing spurious GDBus test failures
Last modified: 2017-05-11 08:11:55 UTC
Since 2.52.1 I'm seeing spurious test failures from the GDBus tests due to CRITICALs happening in g_dbus_server_stop. I think a GCancellable gets cancelled while another thread is finalizing one of its GCancellableSources. GLib-CRITICAL **: g_source_set_ready_time: assertion 'source->ref_count > 0' failed
+ Trace 237386
I think the fix for bug 778049 (commit e4ce400e8f7) uncovered this problem as g_source_is_destroyed just accesses the flags directly and doesn't bother checking the refcount, so the call to g_source_set_ready_time was avoided.
I’m looking at this now. Definitely looks like a race; it can be reproduced about 50% of times when running: G_TEST_SRCDIR=$srcdir/glib/gio/tests G_TEST_BUILDDIR=$builddir/glib/gio/tests libtool exec gdb --args /opt/gnome/build/glib/gio/tests/gdbus-peer Backtrace with threads: (gdb) t a a bt
+ Trace 237387
Thread 1 (Thread 0x7ffff7fcf400 (LWP 4904))
Created attachment 350339 [details] [review] gmain: Allow GSource methods to be called from a finalize() callback Temporarily increase the ref count of a GSource to 1 while calling its finalize() callback, so that the finalize() implementation can call GSource methods (like g_source_set_ready_time()) without causing critical warnings. It’s safe to call those methods at this point, as the source has been destroyed, but nothing has been freed. This is an indirect way of fixing a race between GCancellable and GCancellableSource, whereby the GCancellable::cancelled callback for the GCancellableSource is not disconnected until the GCancellableSource’s finalize() function is called. Previously, this meant there was a window in which the GCancellableSource’s ref count was 0, but the ::cancelled callback was still connected, and could legitimately be called as a result of another thread calling g_cancellable_cancel() on the GCancellable. The callback calls g_source_set_ready_time() on the GSource, and there’s no thread-safe way of checking whether the GSource has been destroyed. Instead, we have to change GSource so its ref count is only decremented to 0 inside the locked section in g_source_unref_internal() *after* the finalize() function has been called, and hence after the GCancellable::cancelled callback has been disconnected. The use of g_cancellable_disconnect() ensures that the callback disconnection is thread safe. Signed-off-by: Philip Withnall <withnall@endlessm.com>
Addendum to the big commit message: ideally, what we want to do is call the code from cancellable_source_finalize() as soon as the GCancellableSource is destroyed (i.e. before the final g_source_unref() call on it), but there is no way to achieve this in GSource — and I can’t think of a way to add support for it without breaking API, since we can’t change the size of GSourceFuncs. We can only subtly change the semantics of GSourceFuncs.finalize; this change should be safe, since it’s relaxing a previous restriction (being unable to call GSource methods from within a finalize() implementation) rather than adding any new restrictions.
Review of attachment 350339 [details] [review]: I can't say I like it much, but sine it seems to fix the problem. Thanks for the detailed commit message.
Review of attachment 350339 [details] [review]: Looks sane to me. ::: glib/gmain.c @@ +2121,3 @@ if (context) LOCK_CONTEXT (context); + source->ref_count--; We don't technically need to do this, right? But we might as well I guess, otherwise people in gdb/debuggers could get confused.
Review of attachment 350339 [details] [review]: ::: glib/gmain.c @@ +2121,3 @@ if (context) LOCK_CONTEXT (context); + source->ref_count--; I don’t *think* we need to do it, but I guess there could be parts of g_source_unref_internal() below which expect the ref count to be 0. Debugging is also a good reason to keep things symmetric here.
Attachment 350339 [details] pushed as 281e301 - gmain: Allow GSource methods to be called from a finalize() callback
Can this be backported to 2.52?
Pushed to glib-2-52 as c589084.