GNOME Bugzilla – Bug 665211
GDBusConnection singleton access can race with disposal
Last modified: 2012-01-02 17:30:42 UTC
I've been investigating a threaded GDBus crash on and off for several weeks, and I think I've pinned down how it can happen. Not sure yet how to solve it or reproduce it reliably, though... Consider this ASCII-art sequence diagram: "unref thread" message_bus_lock "get thread" g_object_unref g_bus_get_sync g_atomic_int_get(ref_count) G lock returns 1 | message_bus_get_singleton g_dbus_connection_dispose | singleton is non-NULL lock (will block) | g_object_ref ... | ref_count 1 -> 2 (lock obtained) U G unlock | the_system_bus = NULL | enter initable_init conn->worker = NULL | unlock U worker == NULL => SIGSEGV g_atomic_int_get(ref_count) returns 2 (do not finalize) My crash is for the system bus, but the other singleton (the_session_bus) is equally vulnerable to it. Weak refs would not be a sufficient solution, because weak refs aren't notified until after dispose() has finished. One possibility would be to make the_system_bus and the_session_bus into toggle refs, which get notified before dispose(). If we did that, and there was also a language binding using a toggle ref to the GDBusConnection, then the GDBusConnection would never be freed, because toggle refs are strong, and if you have more than one, they'll never be notified. I believe this is to avoid reentrancy in toggle_refs_notify(). It's unclear whether this would be a problem or not - the corresponding globals in libdbus are never freed unless the connection gets closed (and you're not allowed to close them from the libdbus side, so they only ever close if the peer closes them).
(In reply to comment #0) > Weak refs would not be a sufficient solution, because weak refs aren't > notified until after dispose() has finished. If the GObject gurus were willing to add a new sort of weak ref that is triggered before anything is dispose()'d (at the same stage of g_object_unref as toggle refs), that would be a sufficient solution here, though.
Created attachment 202476 [details] Test case This test case reliably (5/5) segfaults on my laptop, using Debian's GLib 2.30.2-4. The number of runs and the sleep times might need adjustment on other hardware. Backtrace for the "get" thread:
+ Trace 229187
(In reply to comment #2) > Backtrace for the "get" thread This isn't exactly the same crash as in the diagram in Comment #0 - in this one, get + initable_init() won the race, but then the connection was disposed in the main thread anyway, and ensure_connection_works() failed (despite its thread owning a ref). I've seen both in automated crash reports, and the cause is basically the same.
I think one correct thing to do here is in dispose: LOCK(); if (refcount != 1) return; the_system_bus = NULL; UNLOCK(); gobject has code in place that ensures finalize is only called if there's no references after a dispose. Another correct thing to do would be using a toggle reference that ensures the object is no longer the system bus before dispose is called. But I have no clue how that interacts with bindings.
(In reply to comment #4) > I think one correct thing to do here is in dispose: > > LOCK(); > if (refcount != 1) > return; > the_system_bus = NULL; > UNLOCK(); Assuming you meant "if (refcount != 1) { UNLOCK(); return; }": yes, that would work, I think. My misgivings about this are: * refcount is (in theory) private; * having the object abort dispose() and not chain up seems like a rule-breaking thing for a subclass to do; * if you have a subclass of GDC (although I don't think that's possible right now, at least without using g-i, because the struct is private), then it's too late to avoid discarding the subclass's state. > Another correct thing to do would be using a toggle reference that ensures the > object is no longer the system bus before dispose is called. But I have no clue > how that interacts with bindings. As far as I can tell from reading gobject.c, the answer is: if a binding uses a toggle ref and GDC also uses a toggle ref, then the GDC object will never die, because toggle refs only trigger when the refcount is exactly 1. This is already the case if two languages both use toggle refs for their proxy objects, though. I don't know which bindings, if any, actually use toggle refs. I'm currently looking into atomic/thread-safe weak refs similar to boost::shared_ptr, over on Bug #548954. If that fails or turns out to have its own problems, I'll try the two approaches you suggested here.
Created attachment 202994 [details] [review] GDBusConnection: use GWeakRef to make the singletons thread-safe --- This requires my patch series from Bug #548954, and seems to work... but Bug #548954 will need thorough review from GObject experts, since it touches GObject core code.
Created attachment 202995 [details] [review] Add test for GDBusConnection singleton access racing with destruction --- This is basically Attachment #202476 [details], incorporated into an existing GTester test.
I like the GWeakRef approach. Let's wait for bug 548954 to get resolved then we can commit this one. Thanks.
Comment on attachment 202994 [details] [review] GDBusConnection: use GWeakRef to make the singletons thread-safe A slightly modified version of this patch has been committed.
Attachment 202995 [details] pushed as 34e3881 - Add test for GDBusConnection singleton access racing with destruction