After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 792499 - deadlock on startup with TCP session bus
deadlock on startup with TCP session bus
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: gdbus
2.54.x
Other Linux
: Normal normal
: ---
Assigned To: gtkdev
gtkdev
Depends on:
Blocks:
 
 
Reported: 2018-01-13 17:31 UTC by Nix
Modified: 2018-01-31 22:33 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
flip off enable-proxy for dbus session connections (713 bytes, patch)
2018-01-13 17:31 UTC, Nix
needs-work Details | Review

Description Nix 2018-01-13 17:31:37 UTC
Created attachment 366773 [details] [review]
flip off enable-proxy for dbus session connections

So I'm trying to use a remote TCP session bus (a bad idea, maybe, but the machines I'm using it between are already sharing $HOME, so are in the same security domain, and I want to share a bus between my desktop and the compute server in the attic). The bus is set up and works, but all glib applications deadlock at startup.

It's evident why if you look at a backtrace (here of p11tool --list-all):

  • #0 syscall
    from /lib/libc.so.6
  • #1 g_mutex_lock_slowpath
    at /usr/src/gnome/glib/../x86_64-mutilate/glib/gthread-posix.c line 1313
  • #2 g_mutex_lock
    at /usr/src/gnome/glib/../x86_64-mutilate/glib/gthread-posix.c line 1337
  • #3 initable_init
    at /usr/src/gnome/glib/../x86_64-mutilate/gio/gdbusconnection.c line 2465
  • #4 g_bus_get_sync
    at /usr/src/gnome/glib/../x86_64-mutilate/gio/gdbusconnection.c line 7274
  • #5 initable_init
    at /usr/src/gnome/glib/../x86_64-mutilate/gio/gdbusproxy.c line 1928
  • #6 g_initable_new_valist
    at /usr/src/gnome/glib/../x86_64-mutilate/gio/ginitable.c line 248
  • #7 g_initable_new
    at /usr/src/gnome/glib/../x86_64-mutilate/gio/ginitable.c line 162
  • #8 gvfs_dbus_daemon_proxy_new_for_bus_sync
    from /usr/lib/gvfs/libgvfscommon.so
  • #9 g_proxy_volume_monitor_register
    from /usr/lib/gio/modules/libgioremote-volume-monitor.so
  • #10 g_io_module_load_module
    at /usr/src/gnome/glib/../x86_64-mutilate/gio/giomodule.c line 327
  • #11 g_type_module_use
    at /usr/src/gnome/glib/../x86_64-mutilate/gobject/gtypemodule.c line 244
  • #12 g_io_modules_scan_all_in_directory_with_scope
    at /usr/src/gnome/glib/../x86_64-mutilate/gio/giomodule.c line 518
  • #13 _g_io_modules_ensure_loaded
    at /usr/src/gnome/glib/../x86_64-mutilate/gio/giomodule.c line 1098
  • #14 _g_io_module_get_default
    at /usr/src/gnome/glib/../x86_64-mutilate/gio/giomodule.c line 850
  • #15 g_proxy_resolver_get_default
    at /usr/src/gnome/glib/../x86_64-mutilate/gio/gproxyresolver.c line 79
  • #16 g_proxy_address_enumerator_set_property
    at /usr/src/gnome/glib/../x86_64-mutilate/gio/gproxyaddressenumerator.c line 656
  • #17 object_set_property
    at /usr/src/gnome/glib/../x86_64-mutilate/gobject/gobject.c line 1439
  • #18 g_object_new_internal
    at /usr/src/gnome/glib/../x86_64-mutilate/gobject/gobject.c line 1831
  • #19 g_object_new_valist
    at /usr/src/gnome/glib/../x86_64-mutilate/gobject/gobject.c line 2120
  • #20 g_object_new
    at /usr/src/gnome/glib/../x86_64-mutilate/gobject/gobject.c line 1640
  • #21 g_network_address_connectable_proxy_enumerate
    at /usr/src/gnome/glib/../x86_64-mutilate/gio/gnetworkaddress.c line 1108
  • #22 g_socket_client_connect
    at /usr/src/gnome/glib/../x86_64-mutilate/gio/gsocketclient.c line 982
  • #23 g_dbus_address_connect
    at /usr/src/gnome/glib/../x86_64-mutilate/gio/gdbusaddress.c line 678
  • #24 g_dbus_address_try_connect_one
    at /usr/src/gnome/glib/../x86_64-mutilate/gio/gdbusaddress.c line 784
  • #25 g_dbus_address_get_stream_sync
    at /usr/src/gnome/glib/../x86_64-mutilate/gio/gdbusaddress.c line 969
  • #26 initable_init
    at /usr/src/gnome/glib/../x86_64-mutilate/gio/gdbusconnection.c line 2504
  • #27 g_bus_get_sync
    at /usr/src/gnome/glib/../x86_64-mutilate/gio/gdbusconnection.c line 7274
  • #28 g_application_impl_register
    at /usr/src/gnome/glib/../x86_64-mutilate/gio/gapplicationimpl-dbus.c line 543
  • #29 g_application_register
    at /usr/src/gnome/glib/../x86_64-mutilate/gio/gapplication.c line 2053
  • #30 C_Initialize
    from /usr/lib/pkcs11/opensc-pkcs11.so
  • #31 initialize_module_inlock_reentrant
    at /usr/src/p11-kit/x86_64-mutilate/p11-kit/modules.c line 703
  • #32 managed_C_Initialize
    at /usr/src/p11-kit/x86_64-mutilate/p11-kit/modules.c line 1547
  • #33 binding_C_Initialize
    at /usr/src/p11-kit/x86_64-mutilate/p11-kit/virtual.c line 139
  • #34 ffi_closure_unix64_inner
    at src/x86/ffi64.c line 825
  • #35 ffi_closure_unix64
    from /usr/lib/libffi.so.6
  • #36 p11_kit_modules_initialize
    at /usr/src/p11-kit/x86_64-mutilate/p11-kit/modules.c line 2117
  • #37 p11_kit_modules_load_and_initialize
    at /usr/src/p11-kit/x86_64-mutilate/p11-kit/modules.c line 2171
  • #38 auto_load
    at /usr/src/gnutls/x86_64-mutilate/lib/pkcs11.c line 750
  • #39 gnutls_pkcs11_init
    at /usr/src/gnutls/x86_64-mutilate/lib/pkcs11.c line 823
  • #40 cmd_parser
    at /usr/src/gnutls/x86_64-mutilate/src/p11tool.c line 187
  • #41 main
    at /usr/src/gnutls/x86_64-mutilate/src/p11tool.c line 75

At around frame 27, bus initialization starts: at frame 26 we must assume that something takes out the properties_lock. At around frame 22 the GSocketClient decides that this is a potentially proxiable address, so hunts down a proxy resolver. This calls into GIO, which initializes all its modules at frame 13. Many of these modules (like the GVFS volume monitor) use GDBus, which isn't yet fully initialized, so we recurse back into initable_init(), and promptly deadlock on the properties_lock we already took out.

It is not clear what level the fix for this belongs at. For a long time I thought it might be GVFS, but I don't think it's acceptable to say that GVFS modules may not use GDBus. Rather, I think we should simply dictate that GDBus session busses cannot be passed over proxies (I've never heard of anyone trying to pass a DBus session bus over a SOCKS proxy, while use over TCP is not unheard of and is a supported configuration for DBus). Since proxied GDBus would lock up anyway due to this bug, we're not actually losing any functionality here.

The attached patch flips off the enable-proxy property on the GSocketClient used to connect to the dbus session, preventing the deadlock.
Comment 1 Philip Withnall 2018-01-15 11:55:22 UTC
This is an interesting one, and I can’t come up with anything better than you solution. I’ve been thinking along the lines of:

 • The overall problem is that D-Bus cannot be used as part of the initialisation of the singleton GDBusConnections.
 • One way of achieving that would be to make g_initable_init() on a GDBusConnection instance return G_IO_ERROR_WOULD_BLOCK rather than deadlocking on the init_lock.
 • I don’t think it’s possible to propagate this failure back up through all the callers to the original g_dbus_address_get_stream_sync() call, since it passes through a load of singleton getters like g_proxy_resolver_get_default(), which can’t fail.
 • We can’t make init_lock recursive, because then we’d end up with an infinite loop rather than a deadlock.

So, as you’ve found, the next reasonable approach seems to me to be ensuring that no D-Bus traffic happens while holding the init_lock.

Colin, can I get a second opinion on this please?
Comment 2 Philip Withnall 2018-01-15 11:57:48 UTC
Review of attachment 366773 [details] [review]:

::: gio/gdbusaddress.c
@@ +679,3 @@
+      g_value_init (&f, G_TYPE_BOOLEAN);
+      g_value_set_boolean (&f, FALSE);
+      g_object_set_property (G_OBJECT (client), "enable-proxy", &f);

You can just call `g_socket_client_set_enable_proxy (client, FALSE)` instead.

Please also add a comment to the code, pointing to this bug and briefly explaining why proxy support has to be disabled.
Comment 3 Dan Winship 2018-01-15 18:30:41 UTC
(In reply to Philip Withnall from comment #1)
> So, as you’ve found, the next reasonable approach seems to me to be ensuring
> that no D-Bus traffic happens while holding the init_lock.

In theory, forcing the connection to be non-proxiable is neither necessary nor sufficient for ensuring that no D-Bus traffic happens though. (Not necessary since the problem in the stack trace isn't with the proxy resolver itself, it's with the fact that _g_io_modules_ensure_loaded() causes all the other giomodules to be loaded then too. And not sufficient since it's possible there could be some other DBus-using giomodule that would need to get loaded by gdbus initialization in the future.)

In practice, the GProxyResolver that gets loaded is probably going to be GProxyResolverGNOME though, which makes GSettings calls in its init func, so, Game Over, man! :-/
Comment 4 Philip Withnall 2018-01-19 12:50:58 UTC
(In reply to Dan Winship from comment #3)
> (In reply to Philip Withnall from comment #1)
> > So, as you’ve found, the next reasonable approach seems to me to be ensuring
> > that no D-Bus traffic happens while holding the init_lock.
> 
> In theory, forcing the connection to be non-proxiable is neither necessary
> nor sufficient for ensuring that no D-Bus traffic happens though. (Not
> necessary since the problem in the stack trace isn't with the proxy resolver
> itself, it's with the fact that _g_io_modules_ensure_loaded() causes all the
> other giomodules to be loaded then too. And not sufficient since it's
> possible there could be some other DBus-using giomodule that would need to
> get loaded by gdbus initialization in the future.)
> 
> In practice, the GProxyResolver that gets loaded is probably going to be
> GProxyResolverGNOME though, which makes GSettings calls in its init func,
> so, Game Over, man! :-/

Yeah, but the proper fix here is to make all the GIO default modules stuff failable on construction, which is a lot of work and API disruption for the sake of getting proxiable D-Bus-over-TCP connections (and probably some other corner cases) working. Evidently nobody has tried this before, or this bug would have been reported sooner. I don’t see any way that people could have got this working *without* hitting this bug, unless none of their default GIO modules used D-Bus.

So I’d be happy with disabling proxying for D-Bus addresses, in the knowledge that it is a bit of a hack.
Comment 5 Dan Winship 2018-01-22 16:13:16 UTC
(In reply to Philip Withnall from comment #4)
> Yeah, but the proper fix here is to make all the GIO default modules stuff
> failable on construction, which is a lot of work and API disruption for the
> sake of getting proxiable D-Bus-over-TCP connections (and probably some
> other corner cases) working.

No, that wouldn't even necessarily help; if the GProxyResolver needs to use D-Bus to decide whether or not to proxy the connection (eg, if you're using GProxyResolverGNOME and you have automatic proxy resolution set up so it needs to make a D-Bus call to the glibpacrunner daemon), then you end up with an infinite loop where creating a (TCP) GDBusConnection requires querying GProxyResolver which requires creating a GDBusConnection which requires querying GProxyResolver which...

(I guess my earlier comment wasn't clear, but I was agreeing that disabling proxy resolution for GDBusConnection is probably the right/only fix.)
Comment 6 Philip Withnall 2018-01-31 22:33:29 UTC
OK, I’ve pushed a fix which is based on my review of Nix’s above:

commit 4752d72a601787ef5a2249674a3ff887f42320e6 (HEAD -> master, origin/master, origin/HEAD)
Author: Philip Withnall <withnall@endlessm.com>
Date:   Wed Jan 31 22:21:55 2018 +0000

    gdbusaddress: Disable proxy support for D-Bus addresses
    
    See the discussion in the bug report: with proxy support enabled, a
    proxy resolver is created. Doing that will load all the GIO modules, and
    typically at least one of them will try to use GDBus during
    initialisation, which will cause a deadlock.
    
    Using a TCP address with GDBusAddress is still supported, but accessing
    it over a proxy is not.
    
    Document this.
    
    Signed-off-by: Philip Withnall <withnall@endlessm.com>
    
    https://bugzilla.gnome.org/show_bug.cgi?id=792499