After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 781601 - Race in g_dbus_server_stop causing spurious GDBus test failures
Race in g_dbus_server_stop causing spurious GDBus test failures
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: gio
2.52.x
Other Linux
: Normal normal
: ---
Assigned To: gtkdev
gtkdev
Depends on:
Blocks:
 
 
Reported: 2017-04-21 23:06 UTC by Jan Alexander Steffens (heftig)
Modified: 2017-05-11 08:11 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
gmain: Allow GSource methods to be called from a finalize() callback (3.27 KB, patch)
2017-04-24 20:46 UTC, Philip Withnall
committed Details | Review

Description Jan Alexander Steffens (heftig) 2017-04-21 23:06:35 UTC
Since 2.52.1 I'm seeing spurious test failures from the GDBus tests due to CRITICALs happening in g_dbus_server_stop. I think a GCancellable gets cancelled while another thread is finalizing one of its GCancellableSources.

GLib-CRITICAL **: g_source_set_ready_time: assertion 'source->ref_count > 0' failed

  • #0 _g_log_abort
    from /build/glib2/src/glib/glib/.libs/libglib-2.0.so.0
  • #1 g_logv
    from /build/glib2/src/glib/glib/.libs/libglib-2.0.so.0
  • #2 g_log
    from /build/glib2/src/glib/glib/.libs/libglib-2.0.so.0
  • #3 g_closure_invoke
    from /build/glib2/src/glib/gobject/.libs/libgobject-2.0.so.0
  • #4 signal_emit_unlocked_R
    from /build/glib2/src/glib/gobject/.libs/libgobject-2.0.so.0
  • #5 g_signal_emit_valist
    from /build/glib2/src/glib/gobject/.libs/libgobject-2.0.so.0
  • #6 g_signal_emit
    from /build/glib2/src/glib/gobject/.libs/libgobject-2.0.so.0
  • #7 g_cancellable_cancel
    from /build/glib2/src/glib/gio/.libs/libgio-2.0.so.0
  • #8 set_active
    from /build/glib2/src/glib/gio/.libs/libgio-2.0.so.0
  • #9 g_dbus_server_stop
    from /build/glib2/src/glib/gio/.libs/libgio-2.0.so.0
  • #10 test_nonce_tcp
  • #11 g_test_run_suite_internal
    from /build/glib2/src/glib/glib/.libs/libglib-2.0.so.0
  • #12 g_test_run_suite_internal
    from /build/glib2/src/glib/glib/.libs/libglib-2.0.so.0
  • #13 g_test_run_suite
    from /build/glib2/src/glib/glib/.libs/libglib-2.0.so.0
  • #14 g_test_run
    from /build/glib2/src/glib/glib/.libs/libglib-2.0.so.0
  • #15 main

I think the fix for bug 778049 (commit e4ce400e8f7) uncovered this problem as g_source_is_destroyed just accesses the flags directly and doesn't bother checking the refcount, so the call to g_source_set_ready_time was avoided.
Comment 1 Philip Withnall 2017-04-24 18:18:43 UTC
I’m looking at this now. Definitely looks like a race; it can be reproduced about 50% of times when running:

G_TEST_SRCDIR=$srcdir/glib/gio/tests G_TEST_BUILDDIR=$builddir/glib/gio/tests libtool exec gdb --args /opt/gnome/build/glib/gio/tests/gdbus-peer

Backtrace with threads:

(gdb) t a a bt

Thread 1 (Thread 0x7ffff7fcf400 (LWP 4904))

  • #0 _g_log_abort
    at /opt/gnome/source/glib/glib/gmessages.c line 549
  • #1 g_logv
    at /opt/gnome/source/glib/glib/gmessages.c line 1357
  • #2 g_log
    at /opt/gnome/source/glib/glib/gmessages.c line 1398
  • #3 g_return_if_fail_warning
    at /opt/gnome/source/glib/glib/gmessages.c line 2687
  • #4 g_source_set_ready_time
    at /opt/gnome/source/glib/glib/gmain.c line 1846
  • #5 cancellable_source_cancelled
    at /opt/gnome/source/glib/gio/gcancellable.c line 653
  • #6 g_cclosure_marshal_VOID__VOID
    at /opt/gnome/source/glib/gobject/gmarshal.c line 875
  • #7 g_closure_invoke
    at /opt/gnome/source/glib/gobject/gclosure.c line 804
  • #8 signal_emit_unlocked_R
    at /opt/gnome/source/glib/gobject/gsignal.c line 3635
  • #9 g_signal_emit_valist
    at /opt/gnome/source/glib/gobject/gsignal.c line 3391
  • #10 g_signal_emit
    at /opt/gnome/source/glib/gobject/gsignal.c line 3447
  • #11 g_cancellable_cancel
    at /opt/gnome/source/glib/gio/gcancellable.c line 508
  • #12 set_active
    at /opt/gnome/source/glib/gio/gsocketservice.c line 163
  • #13 g_socket_service_stop
    at /opt/gnome/source/glib/gio/gsocketservice.c line 299
  • #14 g_dbus_server_stop
    at /opt/gnome/source/glib/gio/gdbusserver.c line 613
  • #15 test_nonce_tcp
    at /opt/gnome/source/glib/gio/tests/gdbus-peer.c line 1352
  • #16 test_case_run
    at /opt/gnome/source/glib/glib/gtestutils.c line 2161
  • #17 g_test_run_suite_internal
    at /opt/gnome/source/glib/glib/gtestutils.c line 2244
  • #18 g_test_run_suite_internal
    at /opt/gnome/source/glib/glib/gtestutils.c line 2256
  • #19 g_test_run_suite
    at /opt/gnome/source/glib/glib/gtestutils.c line 2332
  • #20 g_test_run
    at /opt/gnome/source/glib/glib/gtestutils.c line 1599
  • #21 main
    at /opt/gnome/source/glib/gio/tests/gdbus-peer.c line 1756

Comment 2 Philip Withnall 2017-04-24 20:46:30 UTC
Created attachment 350339 [details] [review]
gmain: Allow GSource methods to be called from a finalize() callback

Temporarily increase the ref count of a GSource to 1 while calling its
finalize() callback, so that the finalize() implementation can call
GSource methods (like g_source_set_ready_time()) without causing
critical warnings. It’s safe to call those methods at this point, as the
source has been destroyed, but nothing has been freed.

This is an indirect way of fixing a race between GCancellable and
GCancellableSource, whereby the GCancellable::cancelled callback for the
GCancellableSource is not disconnected until the GCancellableSource’s
finalize() function is called. Previously, this meant there was a window
in which the GCancellableSource’s ref count was 0, but the ::cancelled
callback was still connected, and could legitimately be called as a
result of another thread calling g_cancellable_cancel() on the
GCancellable. The callback calls g_source_set_ready_time() on the
GSource, and there’s no thread-safe way of checking whether the GSource
has been destroyed. Instead, we have to change GSource so its ref count
is only decremented to 0 inside the locked section in
g_source_unref_internal() *after* the finalize() function has been
called, and hence after the GCancellable::cancelled callback has been
disconnected. The use of g_cancellable_disconnect() ensures that the
callback disconnection is thread safe.

Signed-off-by: Philip Withnall <withnall@endlessm.com>
Comment 3 Philip Withnall 2017-04-24 20:49:31 UTC
Addendum to the big commit message: ideally, what we want to do is call the code from cancellable_source_finalize() as soon as the GCancellableSource is destroyed (i.e. before the final g_source_unref() call on it), but there is no way to achieve this in GSource — and I can’t think of a way to add support for it without breaking API, since we can’t change the size of GSourceFuncs. We can only subtly change the semantics of GSourceFuncs.finalize; this change should be safe, since it’s relaxing a previous restriction (being unable to call GSource methods from within a finalize() implementation) rather than adding any new restrictions.
Comment 4 Matthias Clasen 2017-04-25 00:58:42 UTC
Review of attachment 350339 [details] [review]:

I can't say I like it much, but sine it seems to fix the problem.

Thanks for the detailed commit message.
Comment 5 Colin Walters 2017-04-25 01:06:05 UTC
Review of attachment 350339 [details] [review]:

Looks sane to me.

::: glib/gmain.c
@@ +2121,3 @@
 	  if (context)
 	    LOCK_CONTEXT (context);
+          source->ref_count--;

We don't technically need to do this, right?  But we might as well I guess, otherwise people in gdb/debuggers could get confused.
Comment 6 Philip Withnall 2017-04-25 09:20:23 UTC
Review of attachment 350339 [details] [review]:

::: glib/gmain.c
@@ +2121,3 @@
 	  if (context)
 	    LOCK_CONTEXT (context);
+          source->ref_count--;

I don’t *think* we need to do it, but I guess there could be parts of g_source_unref_internal() below which expect the ref count to be 0. Debugging is also a good reason to keep things symmetric here.
Comment 7 Philip Withnall 2017-04-25 09:21:34 UTC
Attachment 350339 [details] pushed as 281e301 - gmain: Allow GSource methods to be called from a finalize() callback
Comment 8 Jan Alexander Steffens (heftig) 2017-05-09 15:44:29 UTC
Can this be backported to 2.52?
Comment 9 Philip Withnall 2017-05-11 08:11:55 UTC
Pushed to glib-2-52 as c589084.