After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 796804 - g_module_*() calls in _nm_utils_init() constructor cause deadlock
g_module_*() calls in _nm_utils_init() constructor cause deadlock
Status: RESOLVED FIXED
Product: NetworkManager
Classification: Platform
Component: general
git master
Other Linux
: Normal critical
: ---
Assigned To: NetworkManager maintainer(s)
NetworkManager maintainer(s)
Depends on:
Blocks:
 
 
Reported: 2018-07-13 11:19 UTC by Philip Withnall
Modified: 2018-12-21 07:47 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
libnm: avoid deadlock during g_module_open() in _nm_utils_init() (2.79 KB, patch)
2018-07-13 14:03 UTC, Thomas Haller
none Details | Review

Description Philip Withnall 2018-07-13 11:19:02 UTC
See https://gitlab.gnome.org/GNOME/glib/issues/1443.

I suspect we’re going to fix GLib to document that calling g_module_*() from with a constructor (like _nm_utils_init()) is not allowed, as it results in reversing the locking order between dlopen() and g_module_open(), since the constructor function is called from within dlopen().

There is a possibility that we will be able to fix it in GLib by eliminating or reducing the locking in g_module_open(), but that won’t fix the deadlock when NetworkManager is running against older GLib versions.

So the safest thing for NetworkManager to do is drop its use of g_module_open() from constructor functions like _nm_utils_init().

That does mean you unfortunately lose your check for using libnm-util and libnm-glib in the same process. Perhaps it could be moved to a common class_init() method instead?
Comment 1 Thomas Haller 2018-07-13 13:36:07 UTC
note that recently I dropped several __attribute__((constructor)) functions in [1]. The offending one, is however still there.

> There is a possibility that we will be able to fix it in GLib by eliminating or 
> reducing the locking in g_module_open(), but that won’t fix the deadlock when 
> NetworkManager is running against older GLib versions.

Well, fixing libnm won't also fix it for users who use older libnm version ;-) 


This is anyway just a failsafe resulting in aborting the application with g_error(). If the situation arises, that the process is borked, it doesn't really matter whether we creash during __attribute__((constructor)) or later. Fine with me, moving the g_module_open() somewhere else.


[1] https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=a0f4f4021
Comment 2 Thomas Haller 2018-07-13 14:03:03 UTC
Created attachment 373050 [details] [review]
libnm: avoid deadlock during g_module_open() in _nm_utils_init()

_nm_utils_init() is a __attribute__((constructor)) function,
that is, it runs during dlopen().

On the other head, g_module_open() itself calls dlopen().

It is prone to deadlock. Don't do it.

The check is only an aggressive assertion to crash the application
if it wrongly loads libnm and libnm-util/libnm-glib at the same time.
If that happens, all is lost already. We can just as well call the
assertion later. It's not supposed to fail anyway.
Comment 3 Beniamino Galvani 2018-08-01 12:59:19 UTC
(In reply to Thomas Haller from comment #2)
> Created attachment 373050 [details] [review] [review]
> libnm: avoid deadlock during g_module_open() in _nm_utils_init()

Looks good to me.
Comment 5 Дилян Палаузов 2018-12-20 16:56:52 UTC
With Evolution 3.31.3/NetworkManager-1.14.4 I get this deadlock:

 Thread #1: lock order "0x23660670 before 0x4227968" violated

Observed (incorrect) order is: acquisition of lock at 0x4227968
   at 0x4C338AC: mutex_lock_WRK (hg_intercepts.c:909)
   by 0x40127CE: _dl_open (dl-open.c:537)
   by 0x14846F25: dlopen_doit (dlopen.c:66)
   by 0x11618B0B: _dl_catch_exception (dl-error-skeleton.c:196)
   by 0x11618B7E: _dl_catch_error (dl-error-skeleton.c:215)
   by 0x14847584: _dlerror_run (dlerror.c:163)
   by 0x14846FC0: dlopen@@GLIBC_2.2.5 (dlopen.c:87)
   by 0x30D3379C: libmodman::module_manager::load_file(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool) (module_manager.cpp:251)
   by 0x30D343DB: libmodman::module_manager::load_dir(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool) (module_manager.cpp:328)
   by 0x30D22718: libproxy::proxy_factory::proxy_factory() (proxy.cpp:165)
   by 0x30D22BBA: px_proxy_factory_new (proxy.cpp:449)
   by 0x30B144B1: g_libproxy_resolver_init (glibproxyresolver.c:79)
   by 0xA5ED9AD: g_type_create_instance (gtype.c:1864)
   by 0xA5D0277: g_object_new_internal (gobject.c:1805)
   by 0xA5D1924: g_object_new_with_properties (gobject.c:1973)
   by 0xA5D2330: g_object_new (gobject.c:1645)
   by 0xA27A8EB: try_implementation (giomodule.c:830)
   by 0xA27AA52: _g_io_module_get_default (giomodule.c:929)
   by 0x513742E: camel_service_init (camel-service.c:1081)
   by 0xA5ED972: g_type_create_instance (gtype.c:1858)
   by 0xA5D0277: g_object_new_internal (gobject.c:1805)
   by 0xA5D1FF3: g_object_new_valist (gobject.c:2128)
   by 0xA276FC5: g_initable_new_valist (ginitable.c:244)
   by 0xA27708B: g_initable_new (ginitable.c:162)
   by 0x513BF56: session_add_service (camel-session.c:448)
   by 0x2776AF5F: mail_session_add_service (e-mail-session.c:1186)
   by 0x282359FF: mail_ui_session_add_service (e-mail-ui-session.c:516)
   by 0x513BB67: camel_session_add_service (camel-session.c:927)
   by 0x277699C5: mail_session_add_from_source (e-mail-session.c:464)
   by 0x2776F4D8: mail_session_constructed (e-mail-session.c:1052)
   by 0xA5D049A: g_object_new_internal (gobject.c:1845)
   by 0xA5D1FF3: g_object_new_valist (gobject.c:2128)
   by 0xA5D231B: g_object_new (gobject.c:1648)
   by 0x2823923A: e_mail_ui_session_new (e-mail-ui-session.c:1015)
   by 0x28286DBA: mail_backend_constructed.lto_priv.206 (e-mail-backend.c:1229)
   by 0x2D1B62E1: mail_shell_backend_constructed (e-mail-shell-backend.c:844)
   by 0xA5D0919: g_object_new_internal (gobject.c:1777)
   by 0xA5D1FF3: g_object_new_valist (gobject.c:2128)
   by 0xA5D231B: g_object_new (gobject.c:1648)
   by 0x6DD6296: extensible_load_extension (e-extensible.c:93)
   by 0x6E2320E: e_type_traverse (e-data-server-util.c:2941)
   by 0x6E231F1: e_type_traverse (e-data-server-util.c:2935)
   by 0x6E231F1: e_type_traverse (e-data-server-util.c:2935)
   by 0x6DD6461: e_extensible_load_extensions (e-extensible.c:138)
   by 0x6DD652F: e_extensible_list_extensions (e-extensible.c:180)
   by 0x4E711A2: e_shell_load_modules (e-shell.c:2264)
   by 0x4046C1: main (main.c:656)

 followed by a later acquisition of lock at 0x23660670
   at 0x4C338AC: mutex_lock_WRK (hg_intercepts.c:909)
   by 0xA5EB626: g_type_class_ref (gtype.c:2935)
   by 0x311AE78B: register_error_domain (nm-errors.c:48)
   by 0x311AE9DE: _nm_dbus_errors_init (nm-errors.c:63)
   by 0x3117B3A6: _nm_utils_init (nm-utils.c:253)
   by 0x400F079: call_init.part.0 (dl-init.c:72)
   by 0x400F185: _dl_init (dl-init.c:118)
   by 0x4012F52: dl_open_worker (dl-open.c:506)
   by 0x11618B0B: _dl_catch_exception (dl-error-skeleton.c:196)
   by 0x4012839: _dl_open (dl-open.c:588)
   by 0x14846F25: dlopen_doit (dlopen.c:66)
   by 0x11618B0B: _dl_catch_exception (dl-error-skeleton.c:196)
   by 0x11618B7E: _dl_catch_error (dl-error-skeleton.c:215)
   by 0x14847584: _dlerror_run (dlerror.c:163)
   by 0x14846FC0: dlopen@@GLIBC_2.2.5 (dlopen.c:87)
   by 0x30D3379C: libmodman::module_manager::load_file(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool) (module_manager.cpp:251)
   by 0x30D343DB: libmodman::module_manager::load_dir(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool) (module_manager.cpp:328)
   by 0x30D22718: libproxy::proxy_factory::proxy_factory() (proxy.cpp:165)
   by 0x30D22BBA: px_proxy_factory_new (proxy.cpp:449)
   by 0x30B144B1: g_libproxy_resolver_init (glibproxyresolver.c:79)
   by 0xA5ED9AD: g_type_create_instance (gtype.c:1864)
   by 0xA5D0277: g_object_new_internal (gobject.c:1805)
   by 0xA5D1924: g_object_new_with_properties (gobject.c:1973)
   by 0xA5D2330: g_object_new (gobject.c:1645)
   by 0xA27A8EB: try_implementation (giomodule.c:830)
   by 0xA27AA52: _g_io_module_get_default (giomodule.c:929)
   by 0x513742E: camel_service_init (camel-service.c:1081)
   by 0xA5ED972: g_type_create_instance (gtype.c:1858)
   by 0xA5D0277: g_object_new_internal (gobject.c:1805)
   by 0xA5D1FF3: g_object_new_valist (gobject.c:2128)
   by 0xA276FC5: g_initable_new_valist (ginitable.c:244)
   by 0xA27708B: g_initable_new (ginitable.c:162)
   by 0x513BF56: session_add_service (camel-session.c:448)
   by 0x2776AF5F: mail_session_add_service (e-mail-session.c:1186)
   by 0x282359FF: mail_ui_session_add_service (e-mail-ui-session.c:516)
   by 0x513BB67: camel_session_add_service (camel-session.c:927)
   by 0x277699C5: mail_session_add_from_source (e-mail-session.c:464)
   by 0x2776F4D8: mail_session_constructed (e-mail-session.c:1052)
   by 0xA5D049A: g_object_new_internal (gobject.c:1845)
   by 0xA5D1FF3: g_object_new_valist (gobject.c:2128)
   by 0xA5D231B: g_object_new (gobject.c:1648)
   by 0x2823923A: e_mail_ui_session_new (e-mail-ui-session.c:1015)
   by 0x28286DBA: mail_backend_constructed.lto_priv.206 (e-mail-backend.c:1229)
   by 0x2D1B62E1: mail_shell_backend_constructed (e-mail-shell-backend.c:844)
   by 0xA5D0919: g_object_new_internal (gobject.c:1777)
   by 0xA5D1FF3: g_object_new_valist (gobject.c:2128)
   by 0xA5D231B: g_object_new (gobject.c:1648)
   by 0x6DD6296: extensible_load_extension (e-extensible.c:93)
   by 0x6E2320E: e_type_traverse (e-data-server-util.c:2941)
   by 0x6E231F1: e_type_traverse (e-data-server-util.c:2935)
   by 0x6E231F1: e_type_traverse (e-data-server-util.c:2935)
   by 0x6DD6461: e_extensible_load_extensions (e-extensible.c:138)
   by 0x6DD652F: e_extensible_list_extensions (e-extensible.c:180)
   by 0x4E711A2: e_shell_load_modules (e-shell.c:2264)
   by 0x4046C1: main (main.c:656)

Required order was established by acquisition of lock at 0x23660670
   at 0x4C338AC: mutex_lock_WRK (hg_intercepts.c:909)
   by 0xA5EB66B: g_type_class_ref (gtype.c:2935)
   by 0x6DD626A: extensible_load_extension (e-extensible.c:87)
   by 0x6E2320E: e_type_traverse (e-data-server-util.c:2941)
   by 0x6DD6461: e_extensible_load_extensions (e-extensible.c:138)
   by 0xA5D0170: g_object_new_internal (gobject.c:1777)
   by 0xA5D1924: g_object_new_with_properties (gobject.c:1973)
   by 0xA5D2330: g_object_new (gobject.c:1645)
   by 0x6E08296: e_source_registry_init (e-source-registry.c:1760)
   by 0xA5ED9AD: g_type_create_instance (gtype.c:1864)
   by 0xA5D0277: g_object_new_internal (gobject.c:1805)
   by 0xA5D1924: g_object_new_with_properties (gobject.c:1973)
   by 0xA5D2330: g_object_new (gobject.c:1645)
   by 0x6E08958: e_source_registry_new_sync (e-source-registry.c:305)
   by 0x4E6DCEC: shell_initable_init (e-shell.c:1836)
   by 0xA276FE1: g_initable_new_valist (ginitable.c:248)
   by 0xA27708B: g_initable_new (ginitable.c:162)
   by 0x4044A3: main (main.c:403)

 followed by a later acquisition of lock at 0x4227968
   at 0x4C338AC: mutex_lock_WRK (hg_intercepts.c:909)
   by 0x1484709F: dlsym (dlsym.c:68)
   by 0x97C133D: g_module_symbol (gmodule-dl.c:163)
   by 0x6DD8EC0: module_load (e-module.c:128)
   by 0xA5F0BF0: g_type_module_use (gtypemodule.c:244)
   by 0xA5F0C9E: g_type_module_use_plugin (gtypemodule.c:310)
   by 0xA5EA384: type_data_ref_Wm (gtype.c:1240)
   by 0xA5EB694: g_type_class_ref (gtype.c:2944)
   by 0x6DD626A: extensible_load_extension (e-extensible.c:87)
   by 0x6E2320E: e_type_traverse (e-data-server-util.c:2941)
   by 0x6DD6461: e_extensible_load_extensions (e-extensible.c:138)
   by 0xA5D0170: g_object_new_internal (gobject.c:1777)
   by 0xA5D1924: g_object_new_with_properties (gobject.c:1973)
   by 0xA5D2330: g_object_new (gobject.c:1645)
   by 0x6E08296: e_source_registry_init (e-source-registry.c:1760)
   by 0xA5ED9AD: g_type_create_instance (gtype.c:1864)
   by 0xA5D0277: g_object_new_internal (gobject.c:1805)
   by 0xA5D1924: g_object_new_with_properties (gobject.c:1973)
   by 0xA5D2330: g_object_new (gobject.c:1645)
   by 0x6E08958: e_source_registry_new_sync (e-source-registry.c:305)
   by 0x4E6DCEC: shell_initable_init (e-shell.c:1836)
   by 0xA276FE1: g_initable_new_valist (ginitable.c:248)
   by 0xA27708B: g_initable_new (ginitable.c:162)
   by 0x4044A3: main (main.c:403)

 Lock at 0x23660670 was first observed
   at 0x4C378ED: pthread_mutex_init (hg_intercepts.c:787)
   by 0xA8A227C: g_rec_mutex_impl_new (gthread-posix.c:282)
   by 0xA8A2337: g_rec_mutex_lock (gthread-posix.c:302)
   by 0xA5EDD80: g_type_add_interface_static (gtype.c:2842)
   by 0xA2F0BB6: g_dbus_connection_get_type_once (gdbusconnection.c:529)
   by 0xA2F23E4: g_dbus_connection_get_type (gdbusconnection.c:529)
   by 0x404139: main (main.c:466)
 Address 0x23660670 is 0 bytes inside a block of size 40 alloc'd
   at 0x4C3056F: malloc (vg_replace_malloc.c:299)
   by 0xA8A224F: g_rec_mutex_impl_new (gthread-posix.c:276)
   by 0xA8A2337: g_rec_mutex_lock (gthread-posix.c:302)
   by 0xA5EDD80: g_type_add_interface_static (gtype.c:2842)
   by 0xA2F0BB6: g_dbus_connection_get_type_once (gdbusconnection.c:529)
   by 0xA2F23E4: g_dbus_connection_get_type (gdbusconnection.c:529)
   by 0x404139: main (main.c:466)
 Block was alloc'd by thread #1

 Lock at 0x4227968 was first observed
   at 0x4C338AC: mutex_lock_WRK (hg_intercepts.c:909)
   by 0x40127CE: _dl_open (dl-open.c:537)
   by 0x14846F40: dlopen_doit (dlopen.c:66)
   by 0x11618B0B: _dl_catch_exception (dl-error-skeleton.c:196)
   by 0x11618B7E: _dl_catch_error (dl-error-skeleton.c:215)
   by 0x14847584: _dlerror_run (dlerror.c:163)
   by 0x14846FC0: dlopen@@GLIBC_2.2.5 (dlopen.c:87)
   by 0x97C1B9B: g_module_open (gmodule-dl.c:125)
   by 0x74D03D8: _gtk_module_has_mixed_deps (gtkmodules.c:590)
   by 0x74B31B1: pre_parse_hook (gtkmain.c:654)
   by 0xA866647: g_option_context_parse (goption.c:1937)
   by 0x74B3797: gtk_init_with_args (gtkmain.c:988)
   by 0x40418E: main (main.c:477)
 Address 0x4227968 is 2312 bytes inside data symbol "_rtld_local"
Comment 6 Thomas Haller 2018-12-20 19:29:14 UTC
> With Evolution 3.31.3/NetworkManager-1.14.4 I get this deadlock:

Hi,

your issue is not the same as this bug (also, this bug is marked as closed and fixed in libnm 1.14.4).

The original issue, was that libnm tried to load a module (dlopen) while being loaded. libnm no longer does that.



I think it's a combination of:

- glib: g_type_class_ref() calls type_data_ref_Wm() while locking on class_init_rec_mutex
- evolution-data-server: calling g_module_symbol() from module_load() (which gets called during g_type_class_ref()).
- glibc: holding the dlopen lock while running constructor functions.

Of course, none of these issues are easily avoided. Maybe evolution should show more self restrained regarding deadlocks when loading plugins.


Please open a new bug, but I don't think the issue is with libnm. All that libnm does here is calling g_type_class_ref() from a __attribute__((constructor)) function (which in turn is run during dlopen() of libnm). That is not easily avoided...
Comment 7 Дилян Палаузов 2018-12-21 07:47:17 UTC
Moved to https://gitlab.gnome.org/GNOME/evolution/issues/272.