GNOME Bugzilla – Bug 667494
GObject/GType deadlock
Last modified: 2013-08-18 10:15:47 UTC
I encountered a (rare) deadlock in GObject/GType during the startup phase of the maki IRC daemon. (You can find its code at http://redmine.ikkoku.de/projects/sushi-maki/repository if you need it.) It is pretty hard to reproduce and happens only 1 in 20 times or so. First some explanation: Thread 3 is started explicitly by the application and does some GUPnP initialization. It goes through g_type_class_ref(), which in turn does: g_static_rec_mutex_lock (&class_init_rec_mutex); Thread 2 is apparently started implicitly by GLib/GIO (probably due to g_bus_own_name()?) and handles GDBus connection stuff. The problem is now that Thread 2 and 3 apparently reach g_socket_address_get_type() more or less at the same time. Thread 2 wins the race and does the initialization protected by g_once_init_enter(), while Thread 3 does a g_cond_wait() inside g_once_init_enter(). During the initialization, Thread 2 now calls g_type_add_interface_static() (from G_IMPLEMENT_INTERFACE) and deadlocks, because this also tries to lock class_init_rec_mutex. Consequently, it can never do the g_cond_broadcast() in g_once_init_leave() and never wakes up Thread 3, which holds the class_init_rec_mutex lock forever. (In this example, this also causes a deadlock in Thread 1, which calls g_cancellable_new().) Here is the trace:
+ Trace 229406
Thread 2 (Thread 0x7fee0363e700 (LWP 10226))
Thread 1 (Thread 0x7fee07e36820 (LWP 10223))
I am not sure whether this is actually a bug in GLib or in GUPnP, so feel free to reassign it. Library versions: ii libglib2.0-0 2.30.0-0ubuntu4 GLib library of C routines ii libgupnp-1.0-4 0.18.1-2~oneiric1 GObject-based library for UPnP
Sounds like gupnp is doing something out of an init function that causes recursion into the type system (somewhat indirectly, via gdbus and a new thread). Haven't looked at the code in detail.
Is that using gupnp-igd somehow? That's the only place where threads come in.
http://redmine.ikkoku.de/projects/sushi-maki/repository/revisions/54f4022c5f171939ae39060375104b9239850d83/entry/source/plugins/upnp.c Looks like it has some weird code to *conditionally* use gupnp-igd if present. Don't know if in this particular case gupnp-igd is used or not. (In reply to comment #0) > Thread 3 is started explicitly by the application and does some GUPnP > initialization. > It goes through g_type_class_ref(), which in turn does: > g_static_rec_mutex_lock (&class_init_rec_mutex); > > Thread 2 is apparently started implicitly by GLib/GIO (probably due to > g_bus_own_name()?) and handles GDBus connection stuff. GUPnP is *not* thread-safe so you want to ensure that all gio/mainloop work is done in the same thread as GUPnP. Otherwise you'll get in a hell of troubles.
(In reply to comment #3) > http://redmine.ikkoku.de/projects/sushi-maki/repository/revisions/54f4022c5f171939ae39060375104b9239850d83/entry/source/plugins/upnp.c > > Looks like it has some weird code to *conditionally* use gupnp-igd if present. > Don't know if in this particular case gupnp-igd is used or not. BTW, you really don't want to have these #ifdef HAVE_GUPNP if that code doesn't do anything without GUPnP. Better just not build that plugin instead of having those ugly #ifdefs. Just a suggestion.
Khris, Unfortunately we wont have time to read through the sources of your code thoroughly so the best way to proceed would be for you to provide a simple test application (based on your existing code) that reproduces this issue.
(In reply to comment #5) > Khris, Unfortunately we wont have time to read through the sources of your code > thoroughly so the best way to proceed would be for you to provide a simple test > application (based on your existing code) that reproduces this issue. Sorry, I meant 'Michael' of course. Was looking at another bug at the same time.. :(
> Is that using gupnp-igd somehow? That's the only place where threads come in. Yes, it uses GUPnP-IGD, but since it hangs in gupnp_context_new() that part of the code is not reached. > GUPnP is *not* thread-safe so you want to ensure that all gio/mainloop work is done in the same thread as GUPnP. Otherwise you'll get in a hell of troubles. GUPnP is only used from within the same thread (the one the initialization is done in). I can not do all GIO stuff in the same thread, because I have per-server/connection threads. This should not cause trouble, right? > Khris, Unfortunately we wont have time to read through the sources of your code > thoroughly so the best way to proceed would be for you to provide a simple test > application (based on your existing code) that reproduces this issue. OK, will do. Thanks so far!
(In reply to comment #7) > > Is that using gupnp-igd somehow? That's the only place where threads come in. > > Yes, it uses GUPnP-IGD, but since it hangs in gupnp_context_new() that part of > the code is not reached. Yeah, I also saw that you're not using the threaded version. > > > GUPnP is *not* thread-safe so you want to ensure that all gio/mainloop work is > done in the same thread as GUPnP. Otherwise you'll get in a hell of troubles. > > GUPnP is only used from within the same thread (the one the initialization is > done in). I can not do all GIO stuff in the same thread, because I have > per-server/connection threads. > > This should not cause trouble, right? Ehm. Not sure. If you don't use separate main contexts you end up with events in threads where you aren't expecting them. As Zeeshan wrote, a reduced testcase would be helpful.
Created attachment 204908 [details] Minimal testcase I managed to produce a minimal testcase. I could not reproduce the exact same deadlock, however -- apparently the timing is very hard to get right. (Fiddling with the g_usleep()s in the code could make it possible to produce the same deadlock, but I gave up after some time.) The good (well, depends) news is that I was able to produce another deadlock, this time in gupnp_simple_igd_new(). Just extract the attached tarball and do "make" followed by "make run", this produces a deadlock within a few hundred iterations for me. See the trace:
+ Trace 229424
Thread 1 (Thread 0x2b5041e78120 (LWP 27739))
This is more or less the same as the initial deadlock, but this time with g_dbus_connection_get_type().
Thank you. Changing state, info provided.
I fail to get it locked. What's your version of gupnp-igd? Also, can you try if you get to a similar situation when you just instanciate two GUPnPControlPoints?
I've the feeling that in the second case GUPnP is just delaying the code execution long enough for the two GDBus functions to collide and is otherwise not involved at all.
Also does it help if you do GDBusConnection *system_bus; system_bus = g_bus_get_sync (G_BUS_TYPE_SYSTEM, NULL, NULL); directly after g_type_init () ?
Michael, does comment#13 help you ?
Sorry, I didn't have much time in the last few weeks. I will try to test it this weekend and come back to you.
Ok, according to my colleague comment 13 helped him. Can you check if that also works for you? Then it's related to bug 665211
Sorry for taking so long. Here are my results: I recently upgraded GLib and the problem still exists with version 2.32. Yes, the workaround in comment 13 fixes the problem -- because then the initialization stuff in g_dbus_connection_get_type() is not called from two threads concurrently anymore. However, since the original deadlock was caused by g_socket_address_get_type() I do not really see the connection to the bug you mentioned. I think the underlying problem is that the initialization in g_*_get_type() can deadlock if done by two threads concurrently, regardless of the specific type (see comment 0).
Created attachment 212428 [details] [review] Work-around a deadlock when quickly creating/releasing GDBusConnections
Can you check if that patch helps? I'm not really convinced that this is our bug and we're just triggering something lower here.
Actually I finally managed to get the lock myself and the proposed patch makes it go away. Just not a proper solution.
Created attachment 212437 [details] GUPnP-free minimal example Boiled down version of what happens within gupnp. Locks up for me on the first run.
Comment on attachment 212428 [details] [review] Work-around a deadlock when quickly creating/releasing GDBusConnections Attachment 212428 [details] pushed as a378c4c - Work-around a deadlock when quickly creating/releasing GDBusConnections
So is this FIXED then? Or still NEW because the bug itself is not fixed properly?
I'd consider this fixed