After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 667494 - GObject/GType deadlock
GObject/GType deadlock
Status: RESOLVED FIXED
Product: GUPnP
Classification: Other
Component: gupnp
unspecified
Other Linux
: Normal normal
: ---
Assigned To: GUPnP Maintainers
GUPnP Maintainers
Depends on:
Blocks:
 
 
Reported: 2012-01-08 15:09 UTC by Michael Kuhn
Modified: 2013-08-18 10:15 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Minimal testcase (1.54 KB, application/x-gzip)
2012-01-09 21:50 UTC, Michael Kuhn
  Details
Work-around a deadlock when quickly creating/releasing GDBusConnections (2.43 KB, patch)
2012-04-20 14:56 UTC, Jens Georg
committed Details | Review
GUPnP-free minimal example (2.33 KB, text/x-csrc)
2012-04-20 16:14 UTC, Jens Georg
  Details

Description Michael Kuhn 2012-01-08 15:09:51 UTC
I encountered a (rare) deadlock in GObject/GType during the startup phase of the maki IRC daemon.
(You can find its code at http://redmine.ikkoku.de/projects/sushi-maki/repository if you need it.)
It is pretty hard to reproduce and happens only 1 in 20 times or so.

First some explanation:

Thread 3 is started explicitly by the application and does some GUPnP initialization.
It goes through g_type_class_ref(), which in turn does:
  g_static_rec_mutex_lock (&class_init_rec_mutex);

Thread 2 is apparently started implicitly by GLib/GIO (probably due to g_bus_own_name()?) and handles GDBus connection stuff.

The problem is now that Thread 2 and 3 apparently reach g_socket_address_get_type() more or less at the same time.
Thread 2 wins the race and does the initialization protected by g_once_init_enter(), while Thread 3 does a g_cond_wait() inside g_once_init_enter().
During the initialization, Thread 2 now calls g_type_add_interface_static() (from G_IMPLEMENT_INTERFACE) and deadlocks, because this also tries to lock class_init_rec_mutex.
Consequently, it can never do the g_cond_broadcast() in g_once_init_leave() and never wakes up Thread 3, which holds the class_init_rec_mutex lock forever.
(In this example, this also causes a deadlock in Thread 1, which calls g_cancellable_new().)

Here is the trace:

Thread 2 (Thread 0x7fee0363e700 (LWP 10226))

  • #0 __lll_lock_wait
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S line 136
  • #1 _L_lock_883
    from /lib/x86_64-linux-gnu/libpthread.so.0
  • #2 __pthread_mutex_lock
    at pthread_mutex_lock.c line 61
  • #3 g_static_rec_mutex_lock
    at /build/buildd/glib2.0-2.30.0/./glib/gthread.c line 1450
  • #4 g_type_add_interface_static
    at /build/buildd/glib2.0-2.30.0/./gobject/gtype.c line 2807
  • #5 g_socket_address_get_type
    at /build/buildd/glib2.0-2.30.0/./gio/gsocketaddress.c line 72
  • #6 g_unix_socket_address_get_type
    at /build/buildd/glib2.0-2.30.0/./gio/gunixsocketaddress.c line 60
  • #7 g_unix_socket_address_new_with_type
    at /build/buildd/glib2.0-2.30.0/./gio/gunixsocketaddress.c line 426
  • #8 g_dbus_address_connect
    at /build/buildd/glib2.0-2.30.0/./gio/gdbusaddress.c line 569
  • #9 g_dbus_address_try_connect_one
    at /build/buildd/glib2.0-2.30.0/./gio/gdbusaddress.c line 767
  • #10 g_dbus_address_get_stream_sync
    at /build/buildd/glib2.0-2.30.0/./gio/gdbusaddress.c line 961
  • #11 initable_init
    at /build/buildd/glib2.0-2.30.0/./gio/gdbusconnection.c line 2316
  • #12 async_init_thread
    at /build/buildd/glib2.0-2.30.0/./gio/gasyncinitable.c line 268
  • #13 run_in_thread
    at /build/buildd/glib2.0-2.30.0/./gio/gsimpleasyncresult.c line 843
  • #14 io_job_thread
    at /build/buildd/glib2.0-2.30.0/./gio/gioscheduler.c line 180
  • #15 g_thread_pool_thread_proxy
    at /build/buildd/glib2.0-2.30.0/./glib/gthreadpool.c line 319
  • #16 g_thread_create_proxy
    at /build/buildd/glib2.0-2.30.0/./glib/gthread.c line 1962
  • #17 start_thread
    at pthread_create.c line 304
  • #18 clone
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S line 112
  • #19 ??

Thread 1 (Thread 0x7fee07e36820 (LWP 10223))

  • #0 __lll_lock_wait
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S line 136
  • #1 _L_lock_883
    from /lib/x86_64-linux-gnu/libpthread.so.0
  • #2 __pthread_mutex_lock
    at pthread_mutex_lock.c line 61
  • #3 g_static_rec_mutex_lock
    at /build/buildd/glib2.0-2.30.0/./glib/gthread.c line 1450
  • #4 g_type_class_ref
    at /build/buildd/glib2.0-2.30.0/./gobject/gtype.c line 2900
  • #5 g_object_newv
    at /build/buildd/glib2.0-2.30.0/./gobject/gobject.c line 1398
  • #6 g_object_new
    at /build/buildd/glib2.0-2.30.0/./gobject/gobject.c line 1322
  • #7 sashimi_new
    at ../source/sashimi.c line 413
  • #8 maki_server_new
    at ../source/server.c line 613
  • #9 main
    at ../source/maki.c line 208

I am not sure whether this is actually a bug in GLib or in GUPnP, so feel free to reassign it.

Library versions:

ii  libglib2.0-0             2.30.0-0ubuntu4          GLib library of C routines
ii  libgupnp-1.0-4           0.18.1-2~oneiric1        GObject-based library for UPnP
Comment 1 Matthias Clasen 2012-01-09 12:26:35 UTC
Sounds like gupnp is doing something out of an init function that causes recursion into the type system (somewhat indirectly, via gdbus and a new thread).

Haven't looked at the code in detail.
Comment 2 Jens Georg 2012-01-09 12:41:48 UTC
Is that using gupnp-igd somehow? That's the only place where threads come in.
Comment 3 Zeeshan Ali 2012-01-09 17:03:32 UTC
http://redmine.ikkoku.de/projects/sushi-maki/repository/revisions/54f4022c5f171939ae39060375104b9239850d83/entry/source/plugins/upnp.c

Looks like it has some weird code to *conditionally* use gupnp-igd if present. Don't know if in this particular case gupnp-igd is used or not.

(In reply to comment #0)
> Thread 3 is started explicitly by the application and does some GUPnP
> initialization.
> It goes through g_type_class_ref(), which in turn does:
>   g_static_rec_mutex_lock (&class_init_rec_mutex);
> 
> Thread 2 is apparently started implicitly by GLib/GIO (probably due to
> g_bus_own_name()?) and handles GDBus connection stuff.

GUPnP is *not* thread-safe so you want to ensure that all gio/mainloop work is done in the same thread as GUPnP. Otherwise you'll get in a hell of troubles.
Comment 4 Zeeshan Ali 2012-01-09 17:10:26 UTC
(In reply to comment #3)
> http://redmine.ikkoku.de/projects/sushi-maki/repository/revisions/54f4022c5f171939ae39060375104b9239850d83/entry/source/plugins/upnp.c
> 
> Looks like it has some weird code to *conditionally* use gupnp-igd if present.
> Don't know if in this particular case gupnp-igd is used or not.

BTW, you really don't want to have these

#ifdef HAVE_GUPNP

if that code doesn't do anything without GUPnP. Better just not build that plugin instead of having those ugly #ifdefs. Just a suggestion.
Comment 5 Zeeshan Ali 2012-01-09 17:18:44 UTC
Khris, Unfortunately we wont have time to read through the sources of your code thoroughly so the best way to proceed would be for you to provide a simple test application (based on your existing code) that reproduces this issue.
Comment 6 Zeeshan Ali 2012-01-09 17:19:32 UTC
(In reply to comment #5)
> Khris, Unfortunately we wont have time to read through the sources of your code
> thoroughly so the best way to proceed would be for you to provide a simple test
> application (based on your existing code) that reproduces this issue.

Sorry, I meant 'Michael' of course. Was looking at another bug at the same time.. :(
Comment 7 Michael Kuhn 2012-01-09 18:28:23 UTC
> Is that using gupnp-igd somehow? That's the only place where threads come in.

Yes, it uses GUPnP-IGD, but since it hangs in gupnp_context_new() that part of the code is not reached.

> GUPnP is *not* thread-safe so you want to ensure that all gio/mainloop work is
done in the same thread as GUPnP. Otherwise you'll get in a hell of troubles.

GUPnP is only used from within the same thread (the one the initialization is done in). I can not do all GIO stuff in the same thread, because I have per-server/connection threads.

This should not cause trouble, right?

> Khris, Unfortunately we wont have time to read through the sources of your code
> thoroughly so the best way to proceed would be for you to provide a simple test
> application (based on your existing code) that reproduces this issue.

OK, will do. Thanks so far!
Comment 8 Jens Georg 2012-01-09 20:26:39 UTC
(In reply to comment #7)
> > Is that using gupnp-igd somehow? That's the only place where threads come in.
> 
> Yes, it uses GUPnP-IGD, but since it hangs in gupnp_context_new() that part of
> the code is not reached.

Yeah, I also saw that you're not using the threaded version.

> 
> > GUPnP is *not* thread-safe so you want to ensure that all gio/mainloop work is
> done in the same thread as GUPnP. Otherwise you'll get in a hell of troubles.
> 
> GUPnP is only used from within the same thread (the one the initialization is
> done in). I can not do all GIO stuff in the same thread, because I have
> per-server/connection threads.
> 
> This should not cause trouble, right?

Ehm. Not sure. If you don't use separate main contexts you end up with events in threads where you aren't expecting them.

As Zeeshan wrote, a reduced testcase would be helpful.
Comment 9 Michael Kuhn 2012-01-09 21:50:19 UTC
Created attachment 204908 [details]
Minimal testcase

I managed to produce a minimal testcase. I could not reproduce the exact same deadlock, however -- apparently the timing is very hard to get right. (Fiddling with the g_usleep()s in the code could make it possible to produce the same deadlock, but I gave up after some time.)

The good (well, depends) news is that I was able to produce another deadlock, this time in gupnp_simple_igd_new(). Just extract the attached tarball and do "make" followed by "make run", this produces a deadlock within a few hundred iterations for me.

See the trace:

Thread 1 (Thread 0x2b5041e78120 (LWP 27739))

  • #0 __lll_lock_wait
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S line 136
  • #1 _L_lock_883
    from /lib/x86_64-linux-gnu/libpthread.so.0
  • #2 __pthread_mutex_lock
    at pthread_mutex_lock.c line 61
  • #3 g_static_rec_mutex_lock
    at /build/buildd/glib2.0-2.30.0/./glib/gthread.c line 1450
  • #4 g_type_add_interface_static
    at /build/buildd/glib2.0-2.30.0/./gobject/gtype.c line 2807
  • #5 g_dbus_connection_get_type
    at /build/buildd/glib2.0-2.30.0/./gio/gdbusconnection.c line 452
  • #6 get_uninitialized_connection
    at /build/buildd/glib2.0-2.30.0/./gio/gdbusconnection.c line 6460
  • #7 g_bus_get
    at /build/buildd/glib2.0-2.30.0/./gio/gdbusconnection.c line 6590
  • #8 g_bus_own_name
    at /build/buildd/glib2.0-2.30.0/./gio/gdbusnameowning.c line 621
  • #9 main
    at gobject-deadlock.c line 69

This is more or less the same as the initial deadlock, but this time with g_dbus_connection_get_type().
Comment 10 Jens Georg 2012-01-09 22:53:29 UTC
Thank you.

Changing state, info provided.
Comment 11 Jens Georg 2012-01-15 12:15:10 UTC
I fail to get it locked. What's your version of gupnp-igd?

Also, can you try if you get to a similar situation when you just instanciate two GUPnPControlPoints?
Comment 12 Jens Georg 2012-01-15 12:58:40 UTC
I've the feeling that in the second case GUPnP is just delaying the code execution long enough for the two GDBus functions to collide and is otherwise not involved at all.
Comment 13 Jens Georg 2012-01-17 12:27:41 UTC
Also does it help if you do

GDBusConnection *system_bus;
system_bus = g_bus_get_sync (G_BUS_TYPE_SYSTEM, NULL, NULL);

directly after g_type_init () ?
Comment 14 Akhil Laddha 2012-02-29 08:57:10 UTC
Michael, does comment#13 help you ?
Comment 15 Michael Kuhn 2012-02-29 21:00:11 UTC
Sorry, I didn't have much time in the last few weeks. I will try to test it this weekend and come back to you.
Comment 16 Jens Georg 2012-03-30 10:56:03 UTC
Ok, according to my colleague comment 13 helped him. Can you check if that also works for you? Then it's related to bug 665211
Comment 17 Michael Kuhn 2012-04-08 09:37:09 UTC
Sorry for taking so long. Here are my results:

I recently upgraded GLib and the problem still exists with version 2.32.

Yes, the workaround in comment 13 fixes the problem -- because then the initialization stuff in g_dbus_connection_get_type() is not called from two threads concurrently anymore.

However, since the original deadlock was caused by g_socket_address_get_type() I do not really see the connection to the bug you mentioned.

I think the underlying problem is that the initialization in g_*_get_type() can deadlock if done by two threads concurrently, regardless of the specific type (see comment 0).
Comment 18 Jens Georg 2012-04-20 14:56:08 UTC
Created attachment 212428 [details] [review]
Work-around a deadlock when quickly creating/releasing GDBusConnections
Comment 19 Jens Georg 2012-04-20 14:57:34 UTC
Can you check if that patch helps? I'm not really convinced that this is our bug and we're just triggering something lower here.
Comment 20 Jens Georg 2012-04-20 15:51:37 UTC
Actually I finally managed to get the lock myself and the proposed patch makes it go away. Just not a proper solution.
Comment 21 Jens Georg 2012-04-20 16:14:47 UTC
Created attachment 212437 [details]
GUPnP-free minimal example

Boiled down version of what happens within gupnp. Locks up for me on the first run.
Comment 22 Jens Georg 2012-05-03 14:13:22 UTC
Comment on attachment 212428 [details] [review]
Work-around a deadlock when quickly creating/releasing GDBusConnections

Attachment 212428 [details] pushed as a378c4c - Work-around a deadlock when quickly creating/releasing GDBusConnections
Comment 23 Tobias Mueller 2012-09-24 22:27:16 UTC
So is this FIXED then? Or still NEW because the bug itself is not fixed properly?
Comment 24 Jens Georg 2013-08-18 10:15:47 UTC
I'd consider this fixed