GNOME Bugzilla – Bug 792499
deadlock on startup with TCP session bus
Last modified: 2018-01-31 22:33:29 UTC
Created attachment 366773 [details] [review] flip off enable-proxy for dbus session connections So I'm trying to use a remote TCP session bus (a bad idea, maybe, but the machines I'm using it between are already sharing $HOME, so are in the same security domain, and I want to share a bus between my desktop and the compute server in the attic). The bus is set up and works, but all glib applications deadlock at startup. It's evident why if you look at a backtrace (here of p11tool --list-all):
+ Trace 238331
At around frame 27, bus initialization starts: at frame 26 we must assume that something takes out the properties_lock. At around frame 22 the GSocketClient decides that this is a potentially proxiable address, so hunts down a proxy resolver. This calls into GIO, which initializes all its modules at frame 13. Many of these modules (like the GVFS volume monitor) use GDBus, which isn't yet fully initialized, so we recurse back into initable_init(), and promptly deadlock on the properties_lock we already took out. It is not clear what level the fix for this belongs at. For a long time I thought it might be GVFS, but I don't think it's acceptable to say that GVFS modules may not use GDBus. Rather, I think we should simply dictate that GDBus session busses cannot be passed over proxies (I've never heard of anyone trying to pass a DBus session bus over a SOCKS proxy, while use over TCP is not unheard of and is a supported configuration for DBus). Since proxied GDBus would lock up anyway due to this bug, we're not actually losing any functionality here. The attached patch flips off the enable-proxy property on the GSocketClient used to connect to the dbus session, preventing the deadlock.
This is an interesting one, and I can’t come up with anything better than you solution. I’ve been thinking along the lines of: • The overall problem is that D-Bus cannot be used as part of the initialisation of the singleton GDBusConnections. • One way of achieving that would be to make g_initable_init() on a GDBusConnection instance return G_IO_ERROR_WOULD_BLOCK rather than deadlocking on the init_lock. • I don’t think it’s possible to propagate this failure back up through all the callers to the original g_dbus_address_get_stream_sync() call, since it passes through a load of singleton getters like g_proxy_resolver_get_default(), which can’t fail. • We can’t make init_lock recursive, because then we’d end up with an infinite loop rather than a deadlock. So, as you’ve found, the next reasonable approach seems to me to be ensuring that no D-Bus traffic happens while holding the init_lock. Colin, can I get a second opinion on this please?
Review of attachment 366773 [details] [review]: ::: gio/gdbusaddress.c @@ +679,3 @@ + g_value_init (&f, G_TYPE_BOOLEAN); + g_value_set_boolean (&f, FALSE); + g_object_set_property (G_OBJECT (client), "enable-proxy", &f); You can just call `g_socket_client_set_enable_proxy (client, FALSE)` instead. Please also add a comment to the code, pointing to this bug and briefly explaining why proxy support has to be disabled.
(In reply to Philip Withnall from comment #1) > So, as you’ve found, the next reasonable approach seems to me to be ensuring > that no D-Bus traffic happens while holding the init_lock. In theory, forcing the connection to be non-proxiable is neither necessary nor sufficient for ensuring that no D-Bus traffic happens though. (Not necessary since the problem in the stack trace isn't with the proxy resolver itself, it's with the fact that _g_io_modules_ensure_loaded() causes all the other giomodules to be loaded then too. And not sufficient since it's possible there could be some other DBus-using giomodule that would need to get loaded by gdbus initialization in the future.) In practice, the GProxyResolver that gets loaded is probably going to be GProxyResolverGNOME though, which makes GSettings calls in its init func, so, Game Over, man! :-/
(In reply to Dan Winship from comment #3) > (In reply to Philip Withnall from comment #1) > > So, as you’ve found, the next reasonable approach seems to me to be ensuring > > that no D-Bus traffic happens while holding the init_lock. > > In theory, forcing the connection to be non-proxiable is neither necessary > nor sufficient for ensuring that no D-Bus traffic happens though. (Not > necessary since the problem in the stack trace isn't with the proxy resolver > itself, it's with the fact that _g_io_modules_ensure_loaded() causes all the > other giomodules to be loaded then too. And not sufficient since it's > possible there could be some other DBus-using giomodule that would need to > get loaded by gdbus initialization in the future.) > > In practice, the GProxyResolver that gets loaded is probably going to be > GProxyResolverGNOME though, which makes GSettings calls in its init func, > so, Game Over, man! :-/ Yeah, but the proper fix here is to make all the GIO default modules stuff failable on construction, which is a lot of work and API disruption for the sake of getting proxiable D-Bus-over-TCP connections (and probably some other corner cases) working. Evidently nobody has tried this before, or this bug would have been reported sooner. I don’t see any way that people could have got this working *without* hitting this bug, unless none of their default GIO modules used D-Bus. So I’d be happy with disabling proxying for D-Bus addresses, in the knowledge that it is a bit of a hack.
(In reply to Philip Withnall from comment #4) > Yeah, but the proper fix here is to make all the GIO default modules stuff > failable on construction, which is a lot of work and API disruption for the > sake of getting proxiable D-Bus-over-TCP connections (and probably some > other corner cases) working. No, that wouldn't even necessarily help; if the GProxyResolver needs to use D-Bus to decide whether or not to proxy the connection (eg, if you're using GProxyResolverGNOME and you have automatic proxy resolution set up so it needs to make a D-Bus call to the glibpacrunner daemon), then you end up with an infinite loop where creating a (TCP) GDBusConnection requires querying GProxyResolver which requires creating a GDBusConnection which requires querying GProxyResolver which... (I guess my earlier comment wasn't clear, but I was agreeing that disabling proxy resolution for GDBusConnection is probably the right/only fix.)
OK, I’ve pushed a fix which is based on my review of Nix’s above: commit 4752d72a601787ef5a2249674a3ff887f42320e6 (HEAD -> master, origin/master, origin/HEAD) Author: Philip Withnall <withnall@endlessm.com> Date: Wed Jan 31 22:21:55 2018 +0000 gdbusaddress: Disable proxy support for D-Bus addresses See the discussion in the bug report: with proxy support enabled, a proxy resolver is created. Doing that will load all the GIO modules, and typically at least one of them will try to use GDBus during initialisation, which will cause a deadlock. Using a TCP address with GDBusAddress is still supported, but accessing it over a proxy is not. Document this. Signed-off-by: Philip Withnall <withnall@endlessm.com> https://bugzilla.gnome.org/show_bug.cgi?id=792499