After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 665211 - GDBusConnection singleton access can race with disposal
GDBusConnection singleton access can race with disposal
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: gdbus
2.30.x
Other Linux
: Normal normal
: ---
Assigned To: Simon McVittie
gtkdev
Depends on: 548954
Blocks:
 
 
Reported: 2011-11-30 15:28 UTC by Simon McVittie
Modified: 2012-01-02 17:30 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Test case (3.14 KB, text/plain)
2011-11-30 19:05 UTC, Simon McVittie
  Details
GDBusConnection: use GWeakRef to make the singletons thread-safe (3.52 KB, patch)
2011-12-07 15:06 UTC, Simon McVittie
committed Details | Review
Add test for GDBusConnection singleton access racing with destruction (4.34 KB, patch)
2011-12-07 15:07 UTC, Simon McVittie
committed Details | Review

Description Simon McVittie 2011-11-30 15:28:48 UTC
I've been investigating a threaded GDBus crash on and off for several weeks, and I think I've pinned down how it can happen. Not sure yet how to solve it or reproduce it reliably, though...

Consider this ASCII-art sequence diagram:

    "unref thread"        message_bus_lock        "get thread"

    g_object_unref                             g_bus_get_sync
      g_atomic_int_get(ref_count)      G          lock
        returns 1                      |          message_bus_get_singleton
      g_dbus_connection_dispose        |          singleton is non-NULL
        lock (will block)              |          g_object_ref
        ...                            |            ref_count 1 -> 2
        (lock obtained)              U G          unlock
                                     |
        the_system_bus = NULL        |            enter initable_init
        conn->worker = NULL          |
        unlock                       U            worker == NULL => SIGSEGV
      g_atomic_int_get(ref_count)
        returns 2
        (do not finalize)

My crash is for the system bus, but the other singleton (the_session_bus) is equally vulnerable to it.

Weak refs would not be a sufficient solution, because weak refs aren't notified until after dispose() has finished.

One possibility would be to make the_system_bus and the_session_bus into toggle refs, which get notified before dispose().

If we did that, and there was also a language binding using a toggle ref to the GDBusConnection, then the GDBusConnection would never be freed, because toggle refs are strong, and if you have more than one, they'll never be notified. I believe this is to avoid reentrancy in toggle_refs_notify().

It's unclear whether this would be a problem or not - the corresponding globals in libdbus are never freed unless the connection gets closed (and you're not allowed to close them from the libdbus side, so they only ever close if the peer closes them).
Comment 1 Simon McVittie 2011-11-30 15:31:44 UTC
(In reply to comment #0)
> Weak refs would not be a sufficient solution, because weak refs aren't
> notified until after dispose() has finished.

If the GObject gurus were willing to add a new sort of weak ref that is triggered before anything is dispose()'d (at the same stage of g_object_unref as toggle refs), that would be a sufficient solution here, though.
Comment 2 Simon McVittie 2011-11-30 19:05:53 UTC
Created attachment 202476 [details]
Test case

This test case reliably (5/5) segfaults on my laptop, using Debian's GLib 2.30.2-4. The number of runs and the sleep times might need adjustment on other hardware.

Backtrace for the "get" thread:

  • #0 _g_dbus_worker_ref
    at /tmp/buildd/glib2.0-2.30.2/./gio/gdbusprivate.c line 440
  • #1 _g_dbus_worker_send_message
    at /tmp/buildd/glib2.0-2.30.2/./gio/gdbusprivate.c line 1577
  • #2 g_dbus_connection_send_message_unlocked
    at /tmp/buildd/glib2.0-2.30.2/./gio/gdbusconnection.c line 1614
  • #3 g_dbus_connection_send_message_with_reply_unlocked
    at /tmp/buildd/glib2.0-2.30.2/./gio/gdbusconnection.c line 1903
  • #4 g_dbus_connection_send_message_with_reply
    at /tmp/buildd/glib2.0-2.30.2/./gio/gdbusconnection.c line 2012
  • #5 g_dbus_connection_send_message_with_reply_sync
    at /tmp/buildd/glib2.0-2.30.2/./gio/gdbusconnection.c line 2166
  • #6 g_dbus_connection_call_sync_internal
    at /tmp/buildd/glib2.0-2.30.2/./gio/gdbusconnection.c line 5428
  • #7 g_dbus_connection_call_sync
    at /tmp/buildd/glib2.0-2.30.2/./gio/gdbusconnection.c line 5650
  • #8 ensure_connection_works
  • #9 get_sync_in_thread
  • #10 g_thread_create_proxy
    at /tmp/buildd/glib2.0-2.30.2/./glib/gthread.c line 1962
  • #11 start_thread
    at pthread_create.c line 304
  • #12 clone
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S line 112
  • #13 ??

Comment 3 Simon McVittie 2011-11-30 19:18:28 UTC
(In reply to comment #2)
> Backtrace for the "get" thread

This isn't exactly the same crash as in the diagram in Comment #0 - in this one, get + initable_init() won the race, but then the connection was disposed in the main thread anyway, and ensure_connection_works() failed (despite its thread owning a ref). I've seen both in automated crash reports, and the cause is basically the same.
Comment 4 Benjamin Otte (Company) 2011-12-04 13:07:47 UTC
I think one correct thing to do here is in dispose:

LOCK();
if (refcount != 1)
  return;
the_system_bus = NULL;
UNLOCK();

gobject has code in place that ensures finalize is only called if there's no references after a dispose.


Another correct thing to do would be using a toggle reference that ensures the object is no longer the system bus before dispose is called. But I have no clue how that interacts with bindings.
Comment 5 Simon McVittie 2011-12-05 11:25:00 UTC
(In reply to comment #4)
> I think one correct thing to do here is in dispose:
> 
> LOCK();
> if (refcount != 1)
>   return;
> the_system_bus = NULL;
> UNLOCK();

Assuming you meant "if (refcount != 1) { UNLOCK(); return; }": yes, that would work, I think. My misgivings about this are:

* refcount is (in theory) private;
* having the object abort dispose() and not chain up seems like a
  rule-breaking thing for a subclass to do;
* if you have a subclass of GDC (although I don't think that's possible
  right now, at least without using g-i, because the struct is private),
  then it's too late to avoid discarding the subclass's state.

> Another correct thing to do would be using a toggle reference that ensures the
> object is no longer the system bus before dispose is called. But I have no clue
> how that interacts with bindings.

As far as I can tell from reading gobject.c, the answer is: if a binding uses a toggle ref and GDC also uses a toggle ref, then the GDC object will never die, because toggle refs only trigger when the refcount is exactly 1. This is already the case if two languages both use toggle refs for their proxy objects, though. I don't know which bindings, if any, actually use toggle refs.

I'm currently looking into atomic/thread-safe weak refs similar to boost::shared_ptr, over on Bug #548954. If that fails or turns out to have its own problems, I'll try the two approaches you suggested here.
Comment 6 Simon McVittie 2011-12-07 15:06:33 UTC
Created attachment 202994 [details] [review]
GDBusConnection: use GWeakRef to make the singletons thread-safe

---

This requires my patch series from Bug #548954, and seems to work... but Bug #548954 will need thorough review from GObject experts, since it touches GObject core code.
Comment 7 Simon McVittie 2011-12-07 15:07:41 UTC
Created attachment 202995 [details] [review]
Add test for GDBusConnection singleton access racing with destruction

---

This is basically Attachment #202476 [details], incorporated into an existing GTester test.
Comment 8 David Zeuthen (not reading bugmail) 2011-12-07 15:38:05 UTC
I like the GWeakRef approach. Let's wait for bug 548954 to get resolved then we can commit this one. Thanks.
Comment 9 Allison Karlitskaya (desrt) 2012-01-02 17:29:43 UTC
Comment on attachment 202994 [details] [review]
GDBusConnection: use GWeakRef to make the singletons thread-safe

A slightly modified version of this patch has been committed.
Comment 10 Allison Karlitskaya (desrt) 2012-01-02 17:30:39 UTC
Attachment 202995 [details] pushed as 34e3881 - Add test for GDBusConnection singleton access racing with destruction