GNOME Bugzilla – Bug 674885
type initialisation deadlock in GObject
Last modified: 2018-05-24 14:09:05 UTC
Imagine we have two very simple classes, A and B. A has a GObject property "foo" of type B. This means that the class init function for A contains a call to B's get_type() function (in order to create the paramspec of the correct type). Now imagine we have two threads, #1 and #2. #1 calls a_get_type(). #2 calls b_get_type(). The typical get_type() boilerplate (as in G_DEFINE_TYPE) uses g_once_init_enter() to guard the initialisation of the static GType copy. Let's say that thread #2 gets there first: it's now in the critical section of b_get_type() and the once is "owned" by thread #2. Thread #2 sleeps immediately after entering the once. Meanwhile, thread #1 is in a_get_type() which causes a_class_init() to be called. The 'class_init_rec_mutex' (in gtype.c) is held. a_class_init() makes its call to b_get_type(). Since thread #2 is in the critical section of the once, we sleep. Meanwhile, in thread #2, we've just entered the once. Now we have to initialise our GType. It's rather typical to call g_type_add_interface_static() here. That wants to acquire 'class_init_rec_mutex', which is held by thread #1, so now thread #2 sleeps. We're deadlocked.
Created attachment 212894 [details] an example in the real world (A = GDBusProxy, B = GDBusConnection)
Since we cannot expect people to stop using type functions from inside of class_init and we also cannot expect people to stop registering interfaces from their get_type() we have a few options open to us: 1) Figure out a way to drop the class_init_rec_mutex when calling class init functions. 2) Figure out a way to make g_type_add_interface_static() not acquire the lock. 3) Replace g_once_init_enter() in G_DEFINE_TYPE with something that does the same thing for the fast path, but for the slow cases attempts to acquire the class_init_rec_mutex instead of sleeping on a condition variable. '1' is probably impossible. '2' is tricky due to our ability to add interfaces to classes that are already in use (see a comment in the code there). There are really two key core issues here: 1) Through g_once_init_enter() we end up creating (effectively) many separate temporary mutexes. That's generally a dangerous behaviour, but it's fine as long as we don't attempt to acquire any other locks while holding these mutexes. 2) Our get_type() boilerplate calls functions that attempt to acquire locks.
So getting rid of 'class_init_rec_mutex' might not be so impossible. Its use in gtype is limited to two things: 1) In g_type_ref() to prevent the class from being initialised twice. 2) Various other places in the same file in order to avoid lock inversion issues with the type_rw_lock. If we can eliminate #1 by replacing it with g_once_init_enter keyed specifically to the type in question then we can kill the requirement that we hold a recursive mutex entirely. This would only become a hazard if class_init of A depends on class_init of B and class_init of B depends on class_init of A. If that is true then you don't need threads to get involved before you realise that you have a problem, so I think we could well be in the clear... ...and I'd love to be able to kill off a recursive mutex. :)
> ...and I'd love to be able to kill off a recursive mutex. :) That would be nice, indeed.
*** Bug 670479 has been marked as a duplicate of this bug. ***
*** Bug 683519 has been marked as a duplicate of this bug. ***
*** Bug 694439 has been marked as a duplicate of this bug. ***
*** Bug 697128 has been marked as a duplicate of this bug. ***
Note, it's possible for applications to work around this by calling e.g. g_type_ensure (G_TYPE_DBUS_CONNECTION); and similar early in main().
In my case it is dconfsettings module who calls g_dbus_connection_get_type() in a thread. My app did not ask for it, so it is spontaneous. I'm not sure calling g_type_ensure() will solve the solution if it happens in the same time than dconf does.
(In reply to comment #10) > In my case it is dconfsettings module who calls g_dbus_connection_get_type() in > a thread. My app did not ask for it, so it is spontaneous. I'm not sure calling > g_type_ensure() will solve the solution if it happens in the same time than > dconf does. You can do it before creating a GSettings object.
I never create a GSettings object... it's all totally spontaneous in my case...
Oh, actually it is EDS doing it when I create a ESourceRegistry. So indeed g_type_ensure() before that should fix my problem. Thanks :-)
*** Bug 726733 has been marked as a duplicate of this bug. ***
*** Bug 772798 has been marked as a duplicate of this bug. ***
I suspect *most* of these cases could be worked around in glib, since the thread is typically created from a wrapper function. dconf would require something like: diff --git a/gdbus/dconf-gdbus-thread.c b/gdbus/dconf-gdbus-thread.c index e397e3a..23bd290 100644 --- a/gdbus/dconf-gdbus-thread.c +++ b/gdbus/dconf-gdbus-thread.c @@ -304,6 +304,9 @@ dconf_gdbus_get_bus_for_sync (GBusType bus_type, { g_assert_cmpint (bus_type, <, G_N_ELEMENTS (dconf_gdbus_get_bus_data)); + /* https://bugzilla.gnome.org/show_bug.cgi?id=674885 */ + g_type_ensure (G_TYPE_DBUS_CONNECTION); + /* I'm not 100% sure we have to lock as much as we do here, but let's * play it safe. *
Bug 772848 might be a duplicate. I can see this in about 1 out of 3 logins with the split-up gnome-settings-daemon (in 3.23.2), which raises the severity of this problem.
Created attachment 337611 [details] [review] untested patch I didn't test this, but it's along the lines of what we need to do I think.
Created attachment 337612 [details] [review] updated (but still not tested) patch for GDBus
Created attachment 338815 [details] [test code] race between g_bus_get_sync and g_dbus_proxy_new_for_bus_sync
Created attachment 338816 [details] [Build script] race between g_bus_get_sync and g_dbus_proxy_new_for_bus_sync
Created attachment 338817 [details] [run script] race between g_bus_get_sync and g_dbus_proxy_new_for_bus_sync
Created attachment 338818 [details] [review] [patch] race between g_bus_get_sync and g_dbus_proxy_new_for_bus_sync
(In reply to Colin Walters from comment #19) > Created attachment 337612 [details] [review] [review] updated (but still not tested) > patch for GDBus This patch still have a problem for me. - test_gdbusconnection.c (attached) is still blocked with this patch.
I add test code and patch for deadlock between g_bus_get_sync and g_dbus_proxy_new_for_bus_sync. (Commit #20, #21, #22, #23) It is just work-around code. But it works fine for me. I couldn't find essential patch for this problem.
Created attachment 338820 [details] [review] 0001-gdbus-Initialize-types-at-async-entrypoints.patch How about this version of your patch? I built it on top of mine, which should be a bit more future proof, and easier to grep _g_dbus_initialize() invocations to delete if we ever fix the core bug.
(In reply to Colin Walters from comment #26) > Created attachment 338820 [details] [review] [review] > 0001-gdbus-Initialize-types-at-async-entrypoints.patch How about this > version of your patch? I built it on top of mine, which should be a bit > more future proof, and easier to grep _g_dbus_initialize() invocations to > delete if we ever fix the core bug. I've tested on ubuntu 14.04(2.40.2) and Fedora 24(2.48.2). I made this patch from git mainstream e37a3c94739385ae5b37bc19668142bec04d8c1c (https://git.gnome.org/browse/glib)
(In reply to insun.pyo from comment #27) > (In reply to Colin Walters from comment #26) > > Created attachment 338820 [details] [review] [review] [review] > > 0001-gdbus-Initialize-types-at-async-entrypoints.patch > > How about this > > version of your patch? I built it on top of mine, which should be a bit > > more future proof, and easier to grep _g_dbus_initialize() invocations to > > delete if we ever fix the core bug. > > I've tested on ubuntu 14.04(2.40.2) and Fedora 24(2.48.2). > I made this patch from git mainstream > e37a3c94739385ae5b37bc19668142bec04d8c1c > (https://git.gnome.org/browse/glib) I am so sorry. I misunderstood your opinion. Please ignore my comment.
Review of attachment 338820 [details] [review]: I wonder whether you call g_type_ensure (G_TYPE_DBUS_CONNECTION); from anywhere. The _g_dbus_initialize () didn't calls any g_type_ensure. Please let me know what code fix this dead-lock problem. ========================================================== _g_dbus_initialize (void) { static volatile gsize initialized = 0; if (g_once_init_enter (&initialized)) { volatile GQuark g_dbus_error_domain; const gchar *debug; g_dbus_error_domain = G_DBUS_ERROR; (g_dbus_error_domain); /* To avoid -Wunused-but-set-variable */ debug = g_getenv ("G_DBUS_DEBUG"); if (debug != NULL) { const GDebugKey keys[] = { { "authentication", G_DBUS_DEBUG_AUTHENTICATION }, { "transport", G_DBUS_DEBUG_TRANSPORT }, { "message", G_DBUS_DEBUG_MESSAGE }, { "payload", G_DBUS_DEBUG_PAYLOAD }, { "call", G_DBUS_DEBUG_CALL }, { "signal", G_DBUS_DEBUG_SIGNAL }, { "incoming", G_DBUS_DEBUG_INCOMING }, { "return", G_DBUS_DEBUG_RETURN }, { "emission", G_DBUS_DEBUG_EMISSION }, { "address", G_DBUS_DEBUG_ADDRESS } }; _gdbus_debug_flags = g_parse_debug_string (debug, keys, G_N_ELEMENTS (keys)); if (_gdbus_debug_flags & G_DBUS_DEBUG_PAYLOAD) _gdbus_debug_flags |= G_DBUS_DEBUG_MESSAGE; } g_once_init_leave (&initialized, 1); } } =====================================================================
Hi insun.pyo, you need to combine with this patch: https://bugzilla.gnome.org/show_bug.cgi?id=674885#c19
So I'm hitting something that mcatanzaro suggested may be related to this after upgrading my desktop to Rawhide. When I boot and try to log in - if the whole system doesn't just immediately hang, which it often does, but I think that's nouveau's fault or something - GNOME starts apparently fine, but my regular startup apps don't run, and after a couple of minutes, the 'Oh no! Something has gone wrong.' screen appears. If I look at the journal, I see these messages at the time this happens: Nov 17 14:32:01 adam.happyassassin.net gnome-session-binary[2039]: WARNING: Application 'org.gnome.SettingsDaemon.MediaKeys.desktop' failed to register before timeout Nov 17 14:32:01 adam.happyassassin.net gnome-session-binary[2039]: Unrecoverable failure in required component org.gnome.SettingsDaemon.MediaKeys.desktop Nov 17 14:32:01 adam.happyassassin.net gnome-session[2039]: gnome-session-binary[2039]: WARNING: Application 'org.gnome.SettingsDaemon.Keyboard.desktop' failed to register before timeout Nov 17 14:32:01 adam.happyassassin.net gnome-session-binary[2039]: WARNING: Application 'org.gnome.SettingsDaemon.Keyboard.desktop' failed to register before timeout Nov 17 14:32:01 adam.happyassassin.net gnome-session-binary[2039]: Unrecoverable failure in required component org.gnome.SettingsDaemon.Keyboard.desktop Nov 17 14:32:01 adam.happyassassin.net gnome-session[2039]: gnome-session-binary[2039]: WARNING: Application 'org.gnome.SettingsDaemon.Housekeeping.desktop' failed to register before timeout Nov 17 14:32:01 adam.happyassassin.net gnome-session-binary[2039]: WARNING: Application 'org.gnome.SettingsDaemon.Housekeeping.desktop' failed to register before timeout Nov 17 14:32:01 adam.happyassassin.net gnome-session-binary[2039]: Unrecoverable failure in required component org.gnome.SettingsDaemon.Housekeeping.desktop Nov 17 14:32:01 adam.happyassassin.net gnome-session[2039]: gnome-session-binary[2039]: WARNING: Application 'org.gnome.SettingsDaemon.Color.desktop' failed to register before timeout Nov 17 14:32:01 adam.happyassassin.net gnome-session-binary[2039]: WARNING: Application 'org.gnome.SettingsDaemon.Color.desktop' failed to register before timeout Nov 17 14:32:01 adam.happyassassin.net gnome-session-binary[2039]: Unrecoverable failure in required component org.gnome.SettingsDaemon.Color.desktop Nov 17 14:32:01 adam.happyassassin.net gnome-session[2039]: gnome-session-binary[2039]: WARNING: Application 'org.gnome.SettingsDaemon.Clipboard.desktop' failed to register before timeout Nov 17 14:32:01 adam.happyassassin.net gnome-session-binary[2039]: WARNING: Application 'org.gnome.SettingsDaemon.Clipboard.desktop' failed to register before timeout Nov 17 14:32:01 adam.happyassassin.net gnome-session-binary[2039]: Unrecoverable failure in required component org.gnome.SettingsDaemon.Clipboard.desktop Nov 17 14:32:01 adam.happyassassin.net gnome-session[2039]: gnome-session-binary[2039]: WARNING: Application 'org.gnome.SettingsDaemon.A11yKeyboard.desktop' failed to register before timeout Nov 17 14:32:01 adam.happyassassin.net gnome-session-binary[2039]: WARNING: Application 'org.gnome.SettingsDaemon.A11yKeyboard.desktop' failed to register before timeout Nov 17 14:32:01 adam.happyassassin.net gnome-session-binary[2039]: Unrecoverable failure in required component org.gnome.SettingsDaemon.A11yKeyboard.desktop Nov 17 14:32:01 adam.happyassassin.net gnome-session[2039]: gnome-session-binary[2039]: WARNING: Application 'org.gnome.SettingsDaemon.XSettings.desktop' failed to register before timeout Nov 17 14:32:01 adam.happyassassin.net gnome-session-binary[2039]: WARNING: Application 'org.gnome.SettingsDaemon.XSettings.desktop' failed to register before timeout Nov 17 14:32:01 adam.happyassassin.net gnome-session-binary[2039]: Unrecoverable failure in required component org.gnome.SettingsDaemon.XSettings.desktop Nov 17 14:32:01 adam.happyassassin.net gnome-session[2039]: gnome-session-binary[2039]: WARNING: Application 'org.gnome.SettingsDaemon.XRANDR.desktop' failed to register before timeout Nov 17 14:32:01 adam.happyassassin.net gnome-session-binary[2039]: WARNING: Application 'org.gnome.SettingsDaemon.XRANDR.desktop' failed to register before timeout Nov 17 14:32:01 adam.happyassassin.net gnome-session-binary[2039]: Unrecoverable failure in required component org.gnome.SettingsDaemon.XRANDR.desktop Nov 17 14:32:01 adam.happyassassin.net gnome-session[2039]: gnome-session-binary[2039]: WARNING: Application 'org.gnome.SettingsDaemon.Sharing.desktop' failed to register before timeout Nov 17 14:32:01 adam.happyassassin.net gnome-session-binary[2039]: WARNING: Application 'org.gnome.SettingsDaemon.Sharing.desktop' failed to register before timeout Nov 17 14:32:01 adam.happyassassin.net gnome-session-binary[2039]: Unrecoverable failure in required component org.gnome.SettingsDaemon.Sharing.desktop Nov 17 14:32:01 adam.happyassassin.net gnome-session[2039]: gnome-session-binary[2039]: WARNING: Application 'org.gnome.SettingsDaemon.PrintNotifications.desktop' failed to register before timeout Nov 17 14:32:01 adam.happyassassin.net gnome-session-binary[2039]: WARNING: Application 'org.gnome.SettingsDaemon.PrintNotifications.desktop' failed to register before timeout Nov 17 14:32:01 adam.happyassassin.net gnome-session-binary[2039]: Unrecoverable failure in required component org.gnome.SettingsDaemon.PrintNotifications.desktop Nov 17 14:32:01 adam.happyassassin.net gnome-session[2039]: gnome-session-binary[2039]: WARNING: Application 'org.gnome.SettingsDaemon.Power.desktop' failed to register before timeout Nov 17 14:32:01 adam.happyassassin.net gnome-session-binary[2039]: WARNING: Application 'org.gnome.SettingsDaemon.Power.desktop' failed to register before timeout Nov 17 14:32:01 adam.happyassassin.net gnome-session-binary[2039]: Unrecoverable failure in required component org.gnome.SettingsDaemon.Power.desktop With vanilla Rawhide packages, shortly after that, the desktop hangs and I can no longer move the mouse. I can ssh into the system, though. I tried building a glib2 package with both Colin's patches applied. With that glib2, the main bug still happens: I still get the log messages and the "Oh no!" screen. However, right after that happens, a couple of my usual startup apps (but not all of them) run, and the desktop doesn't hang, it keeps working - except that the 'Oh no!' window is blocking the right hand monitor. If I try and alt-f4 it I get dumped back to gdm without ceremony. This also happens with a brand new user account.
Here's a funny thing: after poking at this for a bit I decided to switch to lightdm+xfce for the moment so I could get some work done. I switched the DM to lightdm and rebooted, then logged in - but forgot to change the session to Xfce...and lightdm started a perfectly working GNOME session. It seems to be GNOME-on-X11 not GNOME-Wayland, but it's working. I'm pretty sure I saw this bug when I tried a GNOME-on-X11 session from gdm. I can confirm that later.
(In reply to Adam Williamson from comment #32) > Here's a funny thing: after poking at this for a bit I decided to switch to > lightdm+xfce for the moment so I could get some work done. I switched the DM > to lightdm and rebooted, then logged in - but forgot to change the session > to Xfce...and lightdm started a perfectly working GNOME session. It seems to > be GNOME-on-X11 not GNOME-Wayland, but it's working. I'm pretty sure I saw > this bug when I tried a GNOME-on-X11 session from gdm. I can confirm that > later. It's a race, you just got lucky. It could happen in gnome-settings-daemon, but launching one binary means we'd hit it less often than when launching 21 of them.
Created attachment 340271 [details] [review] gdbus: Initialize types earlier to break proxy <-> connection deadlock This will help us break generic GType deadlocks between people using GDBus in different threads (which is supported), not just by GType usage in the GDBus thread. This should fix the common cases we're seeing in the wild, although I have some lingering concerns that if someone e.g. referenced e.g. `G_TYPE_DBUS_AUTH_MECHANISM_SHA1` etc. we'd need to add those too.
https://bugzilla.gnome.org/show_bug.cgi?id=674885#c34 combined with https://bugzilla.gnome.org/show_bug.cgi?id=674885#c26 fixes the test case in https://bugzilla.gnome.org/show_bug.cgi?id=674885#c20 for me. Bastien, I suspect your case could also be fixed by patching dconf like https://bugzilla.gnome.org/show_bug.cgi?id=674885#c16 but...hm, we may also need: diff --git a/gio/gdbusconnection.c b/gio/gdbusconnection.c index 5408427..5e64523 100644 --- a/gio/gdbusconnection.c +++ b/gio/gdbusconnection.c @@ -7171,6 +7171,8 @@ get_uninitialized_connection (GBusType bus_type, ret = NULL; + _g_dbus_initialize (); + G_LOCK (message_bus_lock); singleton = message_bus_get_singleton (bus_type, error); if (singleton == NULL) Adam, can you collect stack traces from your hung login?
I pushed https://git.gnome.org/browse/dconf/commit/?id=701d19d12d4e0599340c9bd1eb2b3e25a40d780b
I hope we explore Allison's suggestion in comment #3 instead of committing workarounds.
Created attachment 340274 [details] [review] gdbus: Initialize types at async entrypoints This isn't a comprehensive fix, but should cover a lot of cases for GDBus.
Ok, I went through and changed https://bugzilla.gnome.org/show_bug.cgi?id=674885#c38 so we call initialize as early as possible (particularly we want to be before g_return_if_fail (G_IS_BLAH ()) since those call into gtype) and that we do so in all entrypoints for g_dbus_connection_new() and g_dbus_proxy_new().
(In reply to Michael Catanzaro from comment #37) > I hope we explore Allison's suggestion in comment #3 instead of committing > workarounds. I don't think "instead of" is a viable plan - there are only a few mortals who can edit gtype.c.
Bastien: I tried it at least 20 times with GDM, and it failed *every single time*. I tried it once with lightdm and it worked right away. Pretty lucky! I've also never once seen this happen with the old g-s-d in, what, 7 years of using this system.
Colin: what exactly do you need traces from, at what point?
Created attachment 340303 [details] gsd-keyboard lockup gsd-keyboard trace locked up on login
Created attachment 340304 [details] gsd-sharing lockedup gsd-sharing locked up on login
(In reply to Adam Williamson from comment #42) > Colin: what exactly do you need traces from, at what point? Same thing Yanko attached in https://bugzilla.gnome.org/show_bug.cgi?id=674885#c43
Starting a g-s-d patch here: https://bugzilla.gnome.org/show_bug.cgi?id=774813
Created attachment 341179 [details] gsd-keyboard backtrace
Created attachment 341180 [details] gsd-sharing backtrace
Comment on attachment 341179 [details] gsd-keyboard backtrace > >Thread 4 (Thread 0x7fbc58e89700 (LWP 1971)): >#0 0x00007fbc67adfd3d in poll () at ../sysdeps/unix/syscall-template.S:84 >#1 0x00007fbc68013236 in g_main_context_iterate.isra () at /lib64/libglib-2.0.so.0 >#2 0x00007fbc680135c2 in g_main_loop_run () at /lib64/libglib-2.0.so.0 >#3 0x00007fbc68600796 in gdbus_shared_thread_func () at /lib64/libgio-2.0.so.0 >#4 0x00007fbc6803adc3 in g_thread_proxy () at /lib64/libglib-2.0.so.0 >#5 0x00007fbc67db172a in start_thread (arg=0x7fbc58e89700) at pthread_create.c:333 > __res = <optimized out> > pd = 0x7fbc58e89700 > now = <optimized out> > unwind_buf = > {cancel_jmp_buf = {{jmp_buf = {140446922217216, 5377119303540517941, 0, 140446938998799, 140446922217920, 140446922217216, -5339038981702177739, -5339138088172540875}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}} > not_first_call = <optimized out> > pagesize_m1 = <optimized out> > sp = <optimized out> > freesize = <optimized out> > __PRETTY_FUNCTION__ = "start_thread" >#6 0x00007fbc67aeb3af in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105 > >Thread 3 (Thread 0x7fbc5968a700 (LWP 1970)): >#0 0x00007fbc67adfd3d in poll () at ../sysdeps/unix/syscall-template.S:84 >#1 0x00007fbc68013236 in g_main_context_iterate.isra () at /lib64/libglib-2.0.so.0 >#2 0x00007fbc6801334c in g_main_context_iteration () at /lib64/libglib-2.0.so.0 >#3 0x00007fbc68013391 in glib_worker_main () at /lib64/libglib-2.0.so.0 >#4 0x00007fbc6803adc3 in g_thread_proxy () at /lib64/libglib-2.0.so.0 >#5 0x00007fbc67db172a in start_thread (arg=0x7fbc5968a700) at pthread_create.c:333 > __res = <optimized out> > pd = 0x7fbc5968a700 > now = <optimized out> > unwind_buf = > {cancel_jmp_buf = {{jmp_buf = {140446930609920, 5377119303540517941, 0, 140446938998447, 140446930610624, 140446930609920, -5339035683704165323, -5339138088172540875}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}} > not_first_call = <optimized out> > pagesize_m1 = <optimized out> > sp = <optimized out> > freesize = <optimized out> > __PRETTY_FUNCTION__ = "start_thread" >#6 0x00007fbc67aeb3af in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105 > >Thread 2 (Thread 0x7fbc59e8b700 (LWP 1969)): >#0 0x00007fbc67adfd3d in poll () at ../sysdeps/unix/syscall-template.S:84 >#1 0x00007fbc68013236 in g_main_context_iterate.isra () at /lib64/libglib-2.0.so.0 >#2 0x00007fbc6801334c in g_main_context_iteration () at /lib64/libglib-2.0.so.0 >#3 0x00007fbc59e92fad in dconf_gdbus_worker_thread () at /usr/lib64/gio/modules/libdconfsettings.so >#4 0x00007fbc6803adc3 in g_thread_proxy () at /lib64/libglib-2.0.so.0 >#5 0x00007fbc67db172a in start_thread (arg=0x7fbc59e8b700) at pthread_create.c:333 > __res = <optimized out> > pd = 0x7fbc59e8b700 > now = <optimized out> > unwind_buf = > {cancel_jmp_buf = {{jmp_buf = {140446939002624, 5377119303540517941, 0, 140734096377423, 140446939003328, 140446939002624, -5339036783752664011, -5339138088172540875}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}} > not_first_call = <optimized out> > pagesize_m1 = <optimized out> > sp = <optimized out> > freesize = <optimized out> > __PRETTY_FUNCTION__ = "start_thread" >#6 0x00007fbc67aeb3af in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105 > >Thread 1 (Thread 0x7fbc6b38d100 (LWP 1880)): >#0 0x00007fbc67adfd3d in poll () at ../sysdeps/unix/syscall-template.S:84 >#1 0x00007fbc68013236 in g_main_context_iterate.isra () at /lib64/libglib-2.0.so.0 >#2 0x00007fbc680135c2 in g_main_loop_run () at /lib64/libglib-2.0.so.0 >#3 0x00007fbc69bc4635 in gtk_main () at gtkmain.c:1301 > loop = 0x5649921a7fd0 >#4 0x0000564990ddcd05 in main (argc=<optimized out>, argv=<optimized out>) at ../../plugins/common/daemon-skeleton-gtk.h:205 > error = 0x0 >Detaching from program: /usr/libexec/gsd-keyboard, process 1880
Comment on attachment 341180 [details] gsd-sharing backtrace sorry, those traces were garbage, they were from gdm's processes, not my user's. mcatanzaro says it's not important for me to provide the traces, so I'm going to go on to something else. logging in with lightdm always works around this, for me. no idea what that means.
*** Bug 779199 has been marked as a duplicate of this bug. ***
Something that looks a lot like this seems to affect the no-user-created-yet g-i-s session in current Fedora 26 with g-i-s 3.23.92: https://bugzilla.redhat.com/show_bug.cgi?id=1431879#c8 I think I do still hit it sometimes when booting my own desktop, too - about half the time (I don't boot up often, though).
(In reply to Adam Williamson from comment #52) > Something that looks a lot like this seems to affect the no-user-created-yet > g-i-s session in current Fedora 26 with g-i-s 3.23.92: > > https://bugzilla.redhat.com/show_bug.cgi?id=1431879#c8 > > I think I do still hit it sometimes when booting my own desktop, too - about > half the time (I don't boot up often, though). I thought this "fixed itself" because I never saw it again on my own machine, but it seems that it's highly random. In any case, a work-around is in https://bugzilla.gnome.org/show_bug.cgi?id=774813
*** Bug 772848 has been marked as a duplicate of this bug. ***
*** Bug 627724 has been marked as a duplicate of this bug. ***
Even after this patch, gnome-initial-setup 3.23.92's New User mode still fails to start on Ubuntu GNOME 17.04 Beta. I was able to workaround the problem by reducing the final line of /usr/share/gnome-session/sessions/gnome-initial-setup.session to RequiredComponents=setup-shell;gnome-initial-setup;org.gnome.SettingsDaemon.Keyboard;org.gnome.SettingsDaemon.Power; which disables accessibility and probably other important stuff.
(In reply to Jeremy Bicha from comment #56) > Even after this patch, gnome-initial-setup 3.23.92's New User mode still > fails to start on Ubuntu GNOME 17.04 Beta. Did you try the patch in bug #774813?
(In reply to Michael Catanzaro from comment #57) > Did you try the patch in bug #774813? Yes, that's what I meant. I used this patch: https://git.gnome.org/browse/gnome-settings-daemon/commit/?id=e5c6735c I even tried with this too but it still didn't solve the issue: https://git.gnome.org/browse/dconf/commit/?id=701d19d I see that gnome-initial-setup's New User mode being broken is now one of a handful of blockers for Fedora 26 Alpha.
Yes, but that bug is the other case, where g-i-s is clearly broken 100% of the time. This case is much less deterministic, and we might not block Alpha on it, if it turns out that it sometimes runs and sometimes fails. I haven't had time to look into it further yet.
I'm not using Fedora 26 but gnome-initial-setup's New User Mode is still broken 100% of the time for me.
Jeremy, can you attach a stack?
Created attachment 348194 [details] [review] gdbus-gtype-race-workaround: combined patchset
Since I had to re-read this bug again carefully to find the patches I'd previously tested, for simplicity in the future I'm going to combine them into a single `git-format-patch`. Current version can be found ↑ in comment 62.
Conceptually, we're trying to work around this race by moving the get_type() calls that are subject to this to the main thread. Another approach we might be able to take is moving the get_type() for dependencies outside of the "once init", i.e.: ``` GType g_dbus_proxy_get_type (void) { g_type_ensure (G_TYPE_DBUS_CONNECTION); static volatile gsize g_define_type_id__volatile = 0; if (g_once_init_enter (&g_define_type_id__volatile)) { ... } } ``` Something like: ``` G_DEFINE_TYPE_WITH_CODE_AND_PRELUDE (GDBusProxy, g_dbus_proxy, G_TYPE_OBJECT, g_type_ensure (G_TYPE_DBUS_CONNECTION), /* Prelude */ G_ADD_PRIVATE (GDBusProxy) /* the previous "CODE" */ G_IMPLEMENT_INTERFACE (G_TYPE_DBUS_INTERFACE, dbus_interface_iface_init) G_IMPLEMENT_INTERFACE (G_TYPE_INITABLE, initable_iface_init) G_IMPLEMENT_INTERFACE (G_TYPE_ASYNC_INITABLE, async_initable_iface_init)) ```
Review of attachment 348194 [details] [review]: I think we should land this right after 2.52
Created attachment 348245 [details] journal from gnome-initial-setup New User fail to start Colin, could you give me specific instructions? I applied the 348194 patch to my copy of glib but gnome-initial-setup still fails to start. I'm attaching my journal log.
So I've filed https://bugzilla.redhat.com/show_bug.cgi?id=1434154 downstream for what seems to be a case of this to which the gnome-initial-setup session is peculiarly vulnerable. Basically it seems like the g-i-s session almost *always* fails to start properly in a way that looks a lot like this bug - multiple "Unrecoverable failure in required component" / "failed to register before timeout" errors. This is the specific case Jeremy is also encountering.
This is really bad. Jeremy, could you try getting a backtrace with gdb to show where gnome-settings-daemon is stuck when it deadlocks?
Michael, could you give me specific instructions about how to do that?
Ctrl+Alt+F3 to get to a virtual terminal $ sudo gdb -p `pidof gnome-settings-daemon` (gdb) set logging on (gdb) thread apply all bt full (If there are two gnome-settings-daemon instances at the point of the hang, you'll need to find its pid manually using top or something.)
I'll see if I can get a trace from the Fedora case in a bit, but I'm also working on a FreeIPA blocker so my attention is kinda divided :/
(In reply to Michael Catanzaro from comment #70) > Ctrl+Alt+F3 to get to a virtual terminal > > $ sudo gdb -p `pidof gnome-settings-daemon` > (gdb) set logging on > (gdb) thread apply all bt full > > (If there are two gnome-settings-daemon instances at the point of the hang, > you'll need to find its pid manually using top or something.) 1) There's no gnome-settings-daemon binary anymore 2) There's likely going to be at least 2 sets of gsd-* binaries, for gdm and the g-i-s setup. Anyway, unfixable without backtraces of those hangs.
Well, you can get the backtraces *yourself*, you know. We've given enough information about how to reproduce the bug: you take a recent Fedora 26 Workstation image, run an install without creating a user, and boot the installed system. That's it. That's all the work you have to do. Everyone who's tried that so far has run straight into the bug.
(In reply to Adam Williamson from comment #73) > Well, you can get the backtraces *yourself*, you know. We've given enough > information about how to reproduce the bug: you take a recent Fedora 26 > Workstation image, run an install without creating a user, and boot the > installed system. That's it. That's all the work you have to do. Everyone > who's tried that so far has run straight into the bug. The gnome-settings-daemon patch from bug 774813 only got into a package in Fedora 26 around 6 hours ago [1]. So, yes, I need backtraces from Jeremy's system which apparently already had patches applied. [1]: gnome-settings-daemon-3.24.0-1* at: https://koji.fedoraproject.org/koji/packageinfo?packageID=5615
(In reply to Jeremy Bicha from comment #66) > Created attachment 348245 [details] > journal from gnome-initial-setup New User fail to start Mar 18 22:26:40 bob gsd-smartcard[1998]: Unable to register client: GDBus.Error:org.freedesktop.DBus.Error.ServiceUnknown: The name org.gnome.SessionManager was not provided by any .service files This isn't the same bug as here. For whatever reason, gnome-settings-daemon in the gnome-initial-setup doesn't have access to gnome-session via D-Bus. If it was the same bug as discussed here, then you wouldn't see any D-Bus errors, as it would hang before trying to contact gnome-session. Please file a new bug against gnome-initial-setup.
(In reply to Bastien Nocera from comment #75) > Please file a new bug against gnome-initial-setup. Done. https://bugzilla.gnome.org/780362
Review of attachment 348194 [details] [review]: .
Comment on attachment 348194 [details] [review] gdbus-gtype-race-workaround: combined patchset https://git.gnome.org/browse/glib/commit/?id=5b4b827e9940d724580101546c601c2e3c77ff82
Leaving this open since the underlying issue isn't fixed, and I wouldn't be surprised if variants of this were lurking in other threaded apps/libraries (e.g. GStreamer)? We can also try a somewhat more comprehensive workaround with a new API: https://bugzilla.gnome.org/show_bug.cgi?id=674885#c64
(In reply to Colin Walters from comment #79) > Leaving this open since the underlying issue isn't fixed, and I wouldn't be > surprised if variants of this were lurking in other threaded apps/libraries > (e.g. GStreamer)? GStreamer still has an explicit initialisation function, so that's unlikely.
I get this with mingw32+master: make[4]: Entering directory '/home/xy/glib/gio' CC libgio_2_0_la-gdbusprivate.lo gdbusprivate.c: In function 'ensure_required_types': gdbusprivate.c:231:16: error: 'G_TYPE_DBUS_CONNECTION' undeclared (first use in this function) ensure_type (G_TYPE_DBUS_CONNECTION); ^~~~~~~~~~~~~~~~~~~~~~ gdbusprivate.c:231:16: note: each undeclared identifier is reported only once for each function it appears in gdbusprivate.c:232:16: error: 'G_TYPE_DBUS_PROXY' undeclared (first use in this function) ensure_type (G_TYPE_DBUS_PROXY); ^~~~~~~~~~~~~~~~~
@lazka: Probably better to track that as a new bug. Anyways looks like this happens because `#ifdef G_OS_UNIX` at the top pulls in a few headers. Can you try this patch? diff --git a/gio/gdbusprivate.c b/gio/gdbusprivate.c index eeb12d9..a068029 100644 --- a/gio/gdbusprivate.c +++ b/gio/gdbusprivate.c @@ -27,6 +27,8 @@ #include "gsocket.h" #include "gdbusprivate.h" #include "gdbusmessage.h" +#include "gdbusconnection.h" +#include "gdbusproxy.h" #include "gdbuserror.h" #include "gdbusintrospection.h" #include "gtask.h"
(In reply to Colin Walters from comment #82) > @lazka: Probably better to track that as a new bug. Anyways looks like this > happens because `#ifdef G_OS_UNIX` at the top pulls in a few headers. Can > you try this patch? Thanks, the patch works here.
https://git.gnome.org/browse/glib/commit/?id=ff7f32f643e961a116ee1a6e333752a64a8f617e
Created attachment 352182 [details] backtrace I get about 1:20 times in the gnome-software self tests
I believe the trace in comment #85 is conceptually not specific to GDBus at all; going back to the bug description, here we have (A = GSocket, B = GSocketAddress). Now, again because GDBus is a thing that's going to be creating threads early, it's going to commonly provoke these races. So we could work around this indeed in the same way we did others. However, the fact that we've now found a second case independent of GDBus probably argues more strongly for a more generic approach like https://bugzilla.gnome.org/show_bug.cgi?id=674885#c64
Richard, you should be able to work around this specific one by doing: ``` g_type_ensure (G_TYPE_SOCKET_FAMILY); g_type_ensure (G_TYPE_SOCKET_TYPE); g_type_ensure (G_TYPE_SOCKET_PROTOCOL); g_type_ensure (G_TYPE_SOCKET_ADDRESS); ``` early on in main(). In general, the work around is to do a g_type_ensure() for all types that are installed as properties of a class (in this case GSocket) that may be initialized from multiple threads.
Created attachment 352183 [details] [review] 0001-gtype-Add-DEFINE_TYPE-with-prelude-to-workaround-gty.patch OK, I wrote a patch for this that I compile tested. Anyone object to the approach? If not, I'll polish it some more and test more.
Created attachment 352185 [details] [review] 0001-gdbus-Init-more-types-to-work-around-gtype-thread-is.patch
I know this is pretty tricky to fix, and I guess committing more workarounds is inevitable. But it seems inadvisable to add new public API to GLib just as a workaround for this bug.
Review of attachment 352185 [details] [review]: ++
We don't have to make it public, but the burden here is pretty low; we're just adding the ability to put code outside of the type init once. And all of these are just *convenience* macros - one could write out all of the type init by hand. So we're not giving anyone the ability to do anything they couldn't already do. Happy to hear other ideas (apart from fixing gtype.c, since I looked and have low confidence in my ability to do that personally).
I think the approach (encoding the type dependency tree) makes sense
Actually, now that I look at the gsocket diff, there's another use case here: move the g_networking_init () into the prelude as well. It's probably not a deadlock case, but it's definitely best practice to avoid nested locking if it's not necessary.
Ok, if there are no objections I'll commit the patch fromhttps://bugzilla.gnome.org/show_bug.cgi?id=674885#c88 soon.
Review of attachment 352183 [details] [review]: Given that Allison proposed a more comprehensive solution in comment #3 which hasn’t yet been explored (unless I’ve missed that exploration in the plethora of comments here), I would prefer it if this workaround was not made into public API. However, in the interests of getting a practical fix out there for this 5-year-old bug, I’m all in favour of this workaround approach if it uses a private API. (In reply to Colin Walters from comment #92) > We don't have to make it public, but the burden here is pretty low; we're > just adding the ability to put code outside of the type init once. And all > of these are just *convenience* macros - one could write out all of the type > init by hand. So we're not giving anyone the ability to do anything they > couldn't already do. Sure, by making the API public we’re not allowing people to do anything they couldn’t already do. However, we are adding maintenance burden.
Created attachment 353748 [details] [review] 0001-gtype-Add-private-DEFINE_TYPE-with-prelude-to-workar.patch OK, yeah, it's likely that most of the types subject to this are from GLib; applications defining their own types can easily work around it. In the GObject library world my instinct is (aside from GStreamer) there aren't many libraries with objects used from multiple threads. We can certainly start with a private macro easily and promote it if need be.
Review of attachment 353748 [details] [review]: ++
https://git.gnome.org/browse/glib/commit/?id=017f78d77f0bf2bed749e21199ea89d75e56ab69
Comment on attachment 353748 [details] [review] 0001-gtype-Add-private-DEFINE_TYPE-with-prelude-to-workar.patch >From 7f703b1f6eec4080b1b1b1d641ae6cdae6fc1fe3 Mon Sep 17 00:00:00 2001 >From: Colin Walters <walters@verbum.org> >Date: Fri, 19 May 2017 15:54:39 -0400 >Subject: [PATCH] gtype: Add private DEFINE_TYPE with prelude to workaround > gtype deadlocks > >And use it in GSocket as a demo. >--- > gio/gsocket.c | 24 +++++++++++++++++------- > gobject/gtype-private.h | 11 +++++++++++ > gobject/gtype.h | 16 ++++++++++++++-- > 3 files changed, 42 insertions(+), 9 deletions(-) > >diff --git a/gio/gsocket.c b/gio/gsocket.c >index d2f4970..d9ea3e2 100644 >--- a/gio/gsocket.c >+++ b/gio/gsocket.c >@@ -52,6 +52,9 @@ > #include <sys/uio.h> > #endif > >+#define GOBJECT_COMPILATION >+#include "gobject/gtype-private.h" /* For _PRELUDE type define */ >+#undef GOBJECT_COMPILATION > #include "gcancellable.h" > #include "gdatagrambased.h" > #include "gioenumtypes.h" >@@ -267,13 +270,20 @@ struct _GSocketPrivate > } recv_addr_cache[RECV_ADDR_CACHE_SIZE]; > }; > >-G_DEFINE_TYPE_WITH_CODE (GSocket, g_socket, G_TYPE_OBJECT, >- G_ADD_PRIVATE (GSocket) >- g_networking_init (); >- G_IMPLEMENT_INTERFACE (G_TYPE_INITABLE, >- g_socket_initable_iface_init); >- G_IMPLEMENT_INTERFACE (G_TYPE_DATAGRAM_BASED, >- g_socket_datagram_based_iface_init)); >+_G_DEFINE_TYPE_EXTENDED_WITH_PRELUDE (GSocket, g_socket, G_TYPE_OBJECT, 0, >+ /* Need a prelude for https://bugzilla.gnome.org/show_bug.cgi?id=674885 */ >+ g_type_ensure (G_TYPE_SOCKET_FAMILY); >+ g_type_ensure (G_TYPE_SOCKET_TYPE); >+ g_type_ensure (G_TYPE_SOCKET_PROTOCOL); >+ g_type_ensure (G_TYPE_SOCKET_ADDRESS); >+ /* And networking init is appropriate for the prelude */ >+ g_networking_init (); >+ , /* And now the regular type init code */ >+ G_ADD_PRIVATE (GSocket) >+ G_IMPLEMENT_INTERFACE (G_TYPE_INITABLE, >+ g_socket_initable_iface_init); >+ G_IMPLEMENT_INTERFACE (G_TYPE_DATAGRAM_BASED, >+ g_socket_datagram_based_iface_init)); > > static int > get_socket_errno (void) >diff --git a/gobject/gtype-private.h b/gobject/gtype-private.h >index ad56238..1bcd5b2 100644 >--- a/gobject/gtype-private.h >+++ b/gobject/gtype-private.h >@@ -90,6 +90,17 @@ void _g_closure_invoke_va (GClosure *closure, > int n_params, > GType *param_types); > >+/** >+ * _G_DEFINE_TYPE_EXTENDED_WITH_PRELUDE: >+ * >+ * See also G_DEFINE_TYPE_EXTENDED(). This macro is generally only >+ * necessary as a workaround for classes which have properties of >+ * object types that may be initialized in distinct threads. See: >+ * https://bugzilla.gnome.org/show_bug.cgi?id=674885 >+ * >+ * Currently private. >+ */ >+#define _G_DEFINE_TYPE_EXTENDED_WITH_PRELUDE(TN, t_n, T_P, _f_, _P_, _C_) _G_DEFINE_TYPE_EXTENDED_BEGIN_PRE (TN, t_n, T_P) {_P_;} _G_DEFINE_TYPE_EXTENDED_BEGIN_REGISTER (TN, t_n, T_P, _f_){_C_;} _G_DEFINE_TYPE_EXTENDED_END() > > G_END_DECLS > >diff --git a/gobject/gtype.h b/gobject/gtype.h >index d010a31..ea4f761 100644 >--- a/gobject/gtype.h >+++ b/gobject/gtype.h >@@ -1943,7 +1943,8 @@ static void type_name##_class_intern_init (gpointer klass) \ > } > #endif /* GLIB_VERSION_MAX_ALLOWED >= GLIB_VERSION_2_38 */ > >-#define _G_DEFINE_TYPE_EXTENDED_BEGIN(TypeName, type_name, TYPE_PARENT, flags) \ >+/* Added for _G_DEFINE_TYPE_EXTENDED_WITH_PRELUDE */ >+#define _G_DEFINE_TYPE_EXTENDED_BEGIN_PRE(TypeName, type_name, TYPE_PARENT) \ > \ > static void type_name##_init (TypeName *self); \ > static void type_name##_class_init (TypeName##Class *klass); \ >@@ -1962,7 +1963,11 @@ type_name##_get_instance_private (TypeName *self) \ > GType \ > type_name##_get_type (void) \ > { \ >- static volatile gsize g_define_type_id__volatile = 0; \ >+ static volatile gsize g_define_type_id__volatile = 0; >+ /* Prelude goes here */ >+ >+/* Added for _G_DEFINE_TYPE_EXTENDED_WITH_PRELUDE */ >+#define _G_DEFINE_TYPE_EXTENDED_BEGIN_REGISTER(TypeName, type_name, TYPE_PARENT, flags) \ > if (g_once_init_enter (&g_define_type_id__volatile)) \ > { \ > GType g_define_type_id = \ >@@ -1982,6 +1987,13 @@ type_name##_get_type (void) \ > return g_define_type_id__volatile; \ > } /* closes type_name##_get_type() */ > >+/* This was defined before we had G_DEFINE_TYPE_WITH_CODE_AND_PRELUDE, it's simplest >+ * to keep it. >+ */ >+#define _G_DEFINE_TYPE_EXTENDED_BEGIN(TypeName, type_name, TYPE_PARENT, flags) \ >+ _G_DEFINE_TYPE_EXTENDED_BEGIN_PRE(TypeName, type_name, TYPE_PARENT) \ >+ _G_DEFINE_TYPE_EXTENDED_BEGIN_REGISTER(TypeName, type_name, TYPE_PARENT, flags) \ >+ > #define _G_DEFINE_INTERFACE_EXTENDED_BEGIN(TypeName, type_name, TYPE_PREREQ) \ > \ > static void type_name##_default_init (TypeName##Interface *klass); \ >-- >2.9.4 >
(In reply to Philip Withnall from comment #98) > Review of attachment 353748 [details] [review] [review]: > > ++ Except it didn't help. I have git checkout of branch glib-2-54 at commit 97293636afb1133a19c4011860460b336945e2c3 and I get this deadlock:
+ Trace 238159
Thread 3 (Thread 0x7fe312015700 (LWP 29361))
Thread 2 (Thread 0x7fe312816700 (LWP 29360))
Thread 1 (Thread 0x7fe3150cc840 (LWP 29359))
Created attachment 363504 [details] [review] proposed patch Why not to admit that the class_init_rec_mutex is important for the whole call of get_type() function implementations, rather than adding more and more workarounds for currently known issues which may not work in some cases anyway? This patch contains a suggested fix. I reverted locally Colin's change (down to 2.54.0 release) and applied only this patch and I was not able to reproduce this issue with 10.000 runs of the test application, while I had been able to reproduce the issue with the same test application and glib in commit as stated in the previous comment. I didn't write developer documentation for the newly added functions yet, because I want to hear from you back, whether this is acceptable. It's correct change, both I believe it and the test proved it, but the final decision is up to you.
Review of attachment 363504 [details] [review]: You cannot simply change the macros: maintainers use them against *their* version of GLib, but users run the code they generate against *their* version of GLib, which may or may not have those symbols. Additionally, there's even code that gets generated with those macros, like the one from gdbus-codegen, which means yet another level of indirection. You'd need to rebuild every single GLib-based project and bump their minimum required version, transitively. This simply cannot happen. When we introduced the new instance private data API we had to split the G_DEFINE_TYPE macros and hide them behind a compile-time version check. This is really painful, but it would be the *minimum* requirement for making this work. I'm not going to mark this as 'needs-work' or 'rejected', because I'm not a GLib maintainer.
(In reply to Emmanuele Bassi (:ebassi) from comment #103) > You cannot simply change the macros: maintainers use them against *their* > version of GLib, but users run the code they generate against *their* > version of GLib, which may or may not have those symbols. I'm not sure why it would make any difference. Of course, building against 2.54.3 and trying to run this code against 2.54.0 will break, but that's sort of expected (though maybe the glib2 policy on the stable branches is different) and I do not know how you'd like to check for this in runtime without some overhead (see the end). Otherwise, it's a macro and as such doesn't cause trouble. Only rebuilt code will catch the changes, but as had been mentioned above, it looks like only glib/GDBus objects are affected, thus it's all fine. > When we introduced the new instance private data API we had to split the > G_DEFINE_TYPE macros and hide them behind a compile-time version check. This > is really painful, but it would be the *minimum* requirement for making this > work. I'm not sure it's applicable here. Could you give me a link to a related commit, please?
Created attachment 363510 [details] [review] 0001-gdbus-Use-_g_dbus_initialize-in-more-places.patch
(In reply to Milan Crha from comment #104) > (In reply to Emmanuele Bassi (:ebassi) from comment #103) > > You cannot simply change the macros: maintainers use them against *their* > > version of GLib, but users run the code they generate against *their* > > version of GLib, which may or may not have those symbols. > > I'm not sure why it would make any difference. Of course, building against > 2.54.3 and trying to run this code against 2.54.0 will break, but that's > sort of expected That's not expected *at all*. The macros expand code at compilation time, you're adding a run time dependency where none existed before. See bug 703191, which led to commit a4c352cd99738095ba34e04a86a2ffa9cc659cfe.
Milan, you should be able to work around this in Evolution though by calling e.g. `g_bus_get_sync (G_BUS_TYPE_SESSION)` very early on in `main()`. As for the patch in Comment 102 - it's worth considering I'd say. @ebassi: Hmm, Oh I see you're referring to e.g.: #if GLIB_VERSION_MAX_ALLOWED >= GLIB_VERSION_2_38 #define _G_DEFINE_TYPE_EXTENDED_CLASS_INIT(TypeName, type_name) \ Yeah, we should try to be friendly to people who don't want new injected runtime requirements. That said my high level feeling here is still that GDBus is 70% of the source of problems, with 25% GSocket, and there's some long tail of 5% other. So a few tactical workarounds for the GDBus case go a *long* way, like the `g_bus_get_sync()` early in main approach.
(In reply to Emmanuele Bassi (:ebassi) from comment #106) > That's not expected *at all*. > > The macros expand code at compilation time, you're adding a run time > dependency where none existed before. Okay. It depends on the stable branch policy and I agree it's not always expected. The correct thing would be to bump a soname version, which is not possible (good to do) in stable branches. I'm fine if it comes to the development version only, it's better than keep it broken forever. > See bug 703191, which led to commit a4c352cd99738095ba34e04a86a2ffa9cc659cfe. Thanks, I'll have a look. (In reply to Colin Walters from comment #107) > Milan, you should be able to work around this in Evolution though by calling > e.g. `g_bus_get_sync (G_BUS_TYPE_SESSION)` very early on in `main()`. It's done already [1] for GDBusConnection. That we know these two (and it seems it used to be only that one in the past), will really just workaround it until other affected types are found. [1] https://git.gnome.org/browse/evolution-data-server/commit/?id=fe7affe900a6d815830 Note this is not evolution, but evolution-data-server.
Just for the record, I extended the workaround for 3.26.3+, 3.27.3+: https://git.gnome.org/browse/evolution-data-server/commit/?id=291aed2ce
Milan, glib does not bump soname even in master... binary compat is guaranteed indefinitely. I think it would be acceptable to fix this bug in a manner that requires a rebuild of affected applications, so long as backwards compat is guaranteed. That is, I think it would be perfectly fine to use a variant of Milan's patch that uses GLIB_CHECK_VERSION to decide whether to use g_once_init_enter/leave or the new g_type_type_init_enter/leave. But I don't know whether it would be problematic for, in the same application, some code to be using g_once_init_enter/leave in class_init and other code to be using g_type_type_init_enter/leave in class_init.
Created attachment 363560 [details] [review] proposed patch ][ This time considering also GLIB_VERSION_MAX_ALLOWED.
(In reply to Colin Walters from comment #105) > Created attachment 363510 [details] [review] [review] > 0001-gdbus-Use-_g_dbus_initialize-in-more-places.patch I tried with it and it helps for the test app too, but it's still a workaround about knowing "this this this and this object causes trouble most often, then initialize them beforehand". It would not work for those like the following, I'm afraid: https://bugzilla.redhat.com/attachment.cgi?id=1122177
But if you use GLIB_VERSION_MAX_ALLOWED, almost nobody will ever get the fix... that's used to disable deprecation warnings when support for old GLib is required. Most apps don't use it at all. E.g. Epiphany will never use that, because we always want all deprecation warnings. I think you need to use GLIB_CHECK_VERSION.
Review of attachment 363560 [details] [review]: (Right?)
(In reply to Michael Catanzaro from comment #113) > But if you use GLIB_VERSION_MAX_ALLOWED, almost nobody will ever get the > fix... that's used to disable deprecation warnings when support for old GLib > is required. Most apps don't use it at all. E.g. Epiphany will never use > that, because we always want all deprecation warnings. No, when you do not define it in your lib/app, then you get everything. > I think you need to use GLIB_CHECK_VERSION. No, that's for glib users, not for glib itself. I know which version this code is added to, this really would not make sense. Please see commit from comment #106 and eventually also the related bug report.
(In reply to Milan Crha from comment #115) > (In reply to Michael Catanzaro from comment #113) > > But if you use GLIB_VERSION_MAX_ALLOWED, almost nobody will ever get the > > fix... that's used to disable deprecation warnings when support for old GLib > > is required. Most apps don't use it at all. E.g. Epiphany will never use > > that, because we always want all deprecation warnings. > > No, when you do not define it in your lib/app, then you get everything. But apps that do define it will now not get this important fix, for an arbitrary reason. E.g. WebKit sets MAX_ALLOWED to something fairly old. What a shame. > > I think you need to use GLIB_CHECK_VERSION. > > No, that's for glib users, not for glib itself. I know which version this > code is added to, this really would not make sense. Um... yes. Good point. My comment did not make sense. Emmanuele, is there a requirement that code compiled against newer GLib headers work properly when run with old versions of GLib?
(In reply to Michael Catanzaro from comment #116) > Emmanuele, is there a requirement that code compiled against newer GLib > headers work properly when run with old versions of GLib? I already said that it's the case for code generated by these macros. If you recompile your code against a newer version of GLib without bumping up your dependency on GLib (something that is *exceedingly* common), then you end up using newer symbols, which impose a runtime dependency — a dependency of which the app developer is completely unaware of. I thought I made myself clear when I linked a bug about this exact case, that led to a commit fixing this exact case.
Comment on attachment 363560 [details] [review] proposed patch ][ I mark the patch unreviewed in the light of the latest comments (my change was done because of Emmanuele's concerns, after all). Michael, if you set MAX_ALLOWED, then you get what you want, there's nothing bad about it. And in case of WebKit, which is basically single-threaded main/GUI thread library (widgets should be created only in the main/GUI thread), you are pretty much safe. The backtrace from a downstream bug (comment #112) shows deadlock when WebKit had been loading plugins. Again, nothing what would WebKit influence on its own. The WebKit background processes can be a different story though.
Review of attachment 363560 [details] [review]: This looks quite reasonable to me. I haven't tested it, but I can't think of why it wouldn't work; in the Linux kernel for example this is quite a common pattern. I think a lot of what @ebassi is saying is that this won't really work until quite a while down the line when multiple things are built with it, but since a lot of the problems are in GLib itself, as soon as we land this it'd immediately fix several cases.
Although hmm...this would probably add a lot more contention around the class init mutex right? Since it's very common to call the type init functions over and over; previously we'd just do a g_atomic_pointer_get(). Maybe what we should do is try to combine the once init __GNUC__ fast path and class init mutex, like... ``` gboolean g_type_type_init_enter (volatile gsize *location) { #ifdef __GNUC__ if (g_atomic_pointer_get (location)) return FALSE; #endif g_rec_mutex_lock (&class_init_rec_mutex); if (!g_once_init_enter (location)) { g_rec_mutex_unlock (&class_init_rec_mutex); return FALSE; } return TRUE; } ``` ?
Review of attachment 363560 [details] [review]: ::: gobject/gtype.c @@ +4903,3 @@ +/** + * g_type_type_init_enter: If these functions are public API they need to be listed in docs/reference/gobject/gobject-sections.txt. @@ +4930,3 @@ + +/** + * g_type_type_init_enter: Should be `g_type_type_init_leave:` @@ +4936,3 @@ + * + * Counter part of g_type_type_init_enter(), wrapper for g_once_init_leave(), + * releasing also the class initialization mutex, previously acquired Nitpick: ‘also releasing’ ::: gobject/gtype.h @@ +1325,3 @@ guint g_type_get_type_registration_serial (void); +#if GLIB_VERSION_MAX_ALLOWED >= GLIB_VERSION_2_56 I think this block deserves a comment explaining why it is conditional on 2.56, and why a #define is used at all.
Created attachment 364071 [details] [review] proposed patch ]I[ Added changes suggested by Philip. I'm sorry for the stupid typos. With respect of the docs change, eds/evo lets gtk-doc generate those files from sources, which needs less work from the devels. I'm not sure of the Colin's suggestion, though it's true that the get_type() functions are used all around the code more than plenty of times (think of the macros to verify that the passed-in object is of the proper type and such). Having GNUC tweaks only is not a good idea, from my point of view, because it changes behaviour in one build system only, while the consistency between build systems/tools should be preferred.
We already have the __GNUC__ case in the implementation of once: #ifdef __GNUC__ # define g_once_init_enter(location) \ Ultimately this traces back to the initial code drop: https://git.gnome.org/browse/glib/commit/?id=c9ccc828f1eff2ce39229ff7b294f24b3462e937 (gcc isn't a build system, it's a compiler, but I know what you mean) Is that worth doing? I'd say so, but I admit to having no data offhand. I just glanced at the glibc implementation of pthread_once and it does something similar. (Why actually isn't g_once an alias for pthread_once nowadays anyways?)
This didn’t get done for 2.56. ⇒ 2.58
In the committed attachment 353748 [details] [review] (adding the _PRELUDE to the G_DEFINE_*TYPE macros), I wanted to confirm whether or not the g_type_ensure() calls are intended to be outside of the g_once_init_enter(). Is that a necessary part of the fix? I'm curious because in my work to make _get_type() functions have the proper fast path in a post -fstack-protector world (bug #795180), I moved them into a secondary function protected by the g_once_init_enter().
*** Bug 796031 has been marked as a duplicate of this bug. ***
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/glib/issues/541.