Bug 629945 - GDBus deadlock in g_bus_get_sync()
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: gdbus
Version: unspecified
Hardware: Other All
Importance: Normal normal
Target Milestone: ---
Assigned To: David Zeuthen (not reading bugmail)
QA Contact: gtkdev
Duplicates: 637376
Depends on:
Blocks:
Reported: 2010-09-17 18:05 UTC by Allison Karlitskaya (desrt)
Modified: 2011-03-30 07:02 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Allison Karlitskaya (desrt) 2010-09-17 18:05:33 UTC
The problem from bug 627724 that was worked around in commit 0b74058f is still happening.

I got a spin (with 100% CPU) on the /gdbus/connection/filter test while distchecking today.

  • #0 sched_yield
    from /lib64/libc.so.6
  • #1 _g_dbus_shared_thread_ref
    at gdbusprivate.c line 355
  • #2 _g_dbus_worker_new
    at gdbusprivate.c line 1439
  • #3 initable_init
    at gdbusconnection.c line 2247
  • #4 g_bus_get_sync
    at gdbusconnection.c line 6121
  • #5 test_connection_filter
    at gdbus-connection.c line 815
  • #6 test_case_run
    at gtestutils.c line 1174
  • #7 g_test_run_suite_internal
    at gtestutils.c line 1223
  • #8 g_test_run_suite_internal
    at gtestutils.c line 1233
  • #9 g_test_run_suite_internal
    at gtestutils.c line 1233
  • #10 g_test_run_suite
    at gtestutils.c line 1282
  • #11 main
    at gdbus-connection.c line 1186

Comment 1 David Zeuthen (not reading bugmail) 2010-09-18 14:44:52 UTC
Ugh, need stack traces from all stacks I'm afraid - the trace in comment 0 doesn't show what's racing against what....
Comment 2 David Zeuthen (not reading bugmail) 2010-09-18 14:45:08 UTC
s/all stacks/all threads/ ...
Comment 3 David Zeuthen (not reading bugmail) 2010-09-18 14:45:59 UTC
(additionally, we should really use a condition variable instead of busy-waiting)
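
For illustration, here is a minimal sketch of waiting on a condition variable rather than spinning on sched_yield(). It uses plain pthreads instead of GLib primitives, and SharedThreadState, worker_main and start_worker_and_wait are hypothetical names, not the code in gdbusprivate.c.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the shared worker thread's startup state. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  ready_cond;
    bool            ready;
} SharedThreadState;

static SharedThreadState state = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false
};

/* Worker thread: finish setup, then announce readiness to all waiters. */
static void *worker_main(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&state.lock);
    state.ready = true;
    pthread_cond_broadcast(&state.ready_cond);
    pthread_mutex_unlock(&state.lock);
    return NULL;
}

/* Caller: block until the worker is ready; no CPU is burned while waiting. */
static bool start_worker_and_wait(void)
{
    pthread_t thread;
    if (pthread_create(&thread, NULL, worker_main, NULL) != 0)
        return false;

    pthread_mutex_lock(&state.lock);
    while (!state.ready)   /* the loop guards against spurious wakeups */
        pthread_cond_wait(&state.ready_cond, &state.lock);
    pthread_mutex_unlock(&state.lock);

    int rc = pthread_join(thread, NULL);
    assert(rc == 0);
    return state.ready;
}
```

The waiter sleeps inside pthread_cond_wait() until the worker signals, so a stalled worker would show up as a blocked thread rather than a 100% CPU spin.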
Comment 4 Allison Karlitskaya (desrt) 2010-09-23 15:59:40 UTC
i've seen very many crashes from this particular test executable, including a segfault inside libc's malloc().  i think some heap corruption could be to blame for all of this.

meanwhile, i got another deadlock.  here's two more backtraces from the same test executable.  it appears that this happens inside of the fork() when creating the session bus:

desrt     6328  0.0  0.0 105452  2504 pts/10   Sl+  11:52   0:00  |   \_ /home/desrt/code/glib/gio/tests/.libs/lt-gdbus-connection
desrt     6343  0.0  0.0  29560   480 pts/10   S+   11:52   0:00  |       \_ /home/desrt/code/glib/gio/tests/.libs/lt-gdbus-connection


process 6328 trace:

Thread 2 (Thread 0x7f61155f1710 (LWP 6347))

  • #0 __lll_lock_wait
    from /lib64/libpthread.so.0
  • #1 _L_lock_868
    from /lib64/libpthread.so.0
  • #2 pthread_mutex_lock
    from /lib64/libpthread.so.0
  • #3 g_source_attach
    at gmain.c line 911
  • #4 schedule_callbacks
    at gdbusconnection.c line 3351
  • #5 distribute_signals
    at gdbusconnection.c line 3387
  • #6 on_worker_message_received
    at gdbusconnection.c line 2031
  • #7 _g_dbus_worker_emit_message_received
    at gdbusprivate.c line 509
  • #8 _g_dbus_worker_queue_or_deliver_received_message
    at gdbusprivate.c line 537
  • #9 _g_dbus_worker_do_read_cb
    at gdbusprivate.c line 788
  • #10 complete_in_idle_cb
    at gsimpleasyncresult.c line 702
  • #11 g_main_dispatch
    at gmain.c line 2149
  • #12 g_main_context_dispatch
    at gmain.c line 2702
  • #13 g_main_context_iterate
    at gmain.c line 2780
  • #14 g_main_loop_run
    at gmain.c line 2988
  • #15 gdbus_shared_thread_func
    at gdbusprivate.c line 277
  • #16 g_thread_create_proxy
    at gthread.c line 1897
  • #17 start_thread
    from /lib64/libpthread.so.0
  • #18 clone
    from /lib64/libc.so.6

Comment 5 David Zeuthen (not reading bugmail) 2010-09-23 19:58:50 UTC
Should be fixed with

 http://git.gnome.org/browse/glib/commit/?id=f0b04acfd31b768151a88db3f8d3347f55b2a7b3

The root problem fixed by this commit was that the worker would issue callbacks on an already finalized connection. This happened only sporadically. I can now run ./gdbus-connection in a continuous loop without any problems, like this:

 $ while true; do .libs/lt-gdbus-connection || break; done

Btw, note that the slow tests (/gdbus/connection/{flush,large_message}) are now in ./gdbus-connection-slow as per this commit

 http://git.gnome.org/browse/glib/commit/?id=1f6a9f1e2d2ebc5f37e7c526344d7aa26cee148d

I'm still seeing a few (extremely rare) failures such as

 /gdbus/connection/basic: OK
 /gdbus/connection/life-cycle: **
ERROR:gdbus-connection.c:223:test_connection_life_cycle: assertion failed: (!quit_mainloop_fired)
 cleaning up bus with pid 21794
 Aborted (core dumped)

which I'm investigating right now... (I think this one is due to the test using a 1sec timeout - might not be enough...).
Comment 6 David Zeuthen (not reading bugmail) 2010-09-23 20:33:15 UTC
(In reply to comment #5)
> I'm still seeing a few (extremely rare) failures such as
> 
>  /gdbus/connection/basic: OK
>  /gdbus/connection/life-cycle: **
> ERROR:gdbus-connection.c:223:test_connection_life_cycle: assertion failed:
> (!quit_mainloop_fired)
>  cleaning up bus with pid 21794
>  Aborted (core dumped)
> 
> which I'm investigating right now... (I think this one is due to the test using
> a 1sec timeout - might not be enough...).

Yup, this was easily reproducible by me restarting a google-chrome instance loaded with many tabs. I've now bumped the timeouts to 30 seconds and all is good:

http://git.gnome.org/browse/glib/commit/?id=71b1d738e2c0fe900b6bb6672aa464ce74b489b3
Comment 7 David Zeuthen (not reading bugmail) 2010-09-23 20:35:06 UTC
Closing this bug for now since it's extremely likely that the lockups were caused by memory corruption from GDBusWorker callbacks being invoked on an already finalized GDBusConnection.
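
As a rough illustration of the failure mode (not the code from the commit referenced in comment 5), the sketch below models a worker that refuses to deliver callbacks once it has been stopped. Worker, Connection, worker_stop and worker_emit_message are hypothetical names, and the model is single-threaded. Real code would also need locking so that a stop cannot race with a callback that is already in flight.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef void (*MessageCallback)(void *user_data);

/* Hypothetical stand-in for the worker that feeds messages to a connection. */
typedef struct {
    bool            stopped;      /* set before the owner is finalized */
    MessageCallback on_message;
    void           *user_data;    /* borrowed pointer to the owner */
} Worker;

/* Hypothetical stand-in for the connection that owns the worker. */
typedef struct {
    int received;
} Connection;

static void connection_on_message(void *user_data)
{
    ((Connection *)user_data)->received++;
}

/* Must be called by the owner before it frees itself. */
static void worker_stop(Worker *w)
{
    w->stopped = true;
}

/* Delivers one message; returns false if it was dropped after a stop. */
static bool worker_emit_message(Worker *w)
{
    if (w->stopped)
        return false;             /* owner may be gone: never touch user_data */
    w->on_message(w->user_data);
    return true;
}

/* Returns the number of messages the connection saw before finalization. */
static int demo_stop_before_finalize(void)
{
    Connection *c = calloc(1, sizeof *c);
    assert(c != NULL);
    Worker w = { false, connection_on_message, c };

    worker_emit_message(&w);      /* delivered while the connection is alive */
    worker_stop(&w);              /* finalization starts: silence the worker */
    int received = c->received;
    free(c);

    bool delivered = worker_emit_message(&w);   /* late message is dropped */
    return delivered ? -1 : received;
}
```

Without the stopped check, the late call would write into freed memory, which is the kind of heap corruption suspected in comment 4.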
Comment 8 David Zeuthen (not reading bugmail) 2010-09-23 21:38:23 UTC
(In reply to comment #5)
> I'm still seeing a few (extremely rare) failures such as
> 
>  /gdbus/connection/basic: OK
>  /gdbus/connection/life-cycle: **
> ERROR:gdbus-connection.c:223:test_connection_life_cycle: assertion failed:
> (!quit_mainloop_fired)
>  cleaning up bus with pid 21794
>  Aborted (core dumped)
> 
> which I'm investigating right now... (I think this one is due to the test using
> a 1sec timeout - might not be enough...).

Actually this one was because we didn't invoke the GDestroyNotify in the correct context... basically, the test variables could be set while the loop was _not_ running.. so mainloop_quit() on the non-running loop would do nothing.. so we'd end up timing out.. this is of course rare, but it did happen 1 in 1000 or 10000 runs.. see

 http://git.gnome.org/browse/glib/commit/?id=7036415cc1a32bbd9cc08e516196dbd704f8b5eb

for the fix.
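
The ordering problem described in this comment can be modelled with a toy loop. This is a sketch under stated assumptions: ToyTest and the toy_* and wait_* functions are hypothetical, the model is not GLib code, and the guarded wait is only one possible test-side defence. The actual fix, per the commit above, was to invoke the GDestroyNotify in the correct context.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model: only the state needed to show the lost quit. */
typedef struct {
    bool loop_running;   /* models the main loop's "is running" flag */
    bool notify_fired;   /* models the test variable set by the notify */
} ToyTest;

/* Models the destroy notify: record that it fired, then quit the loop. */
static void toy_destroy_notify(ToyTest *t)
{
    t->notify_fired = true;
    t->loop_running = false;   /* quit: no effect if the loop has not started */
}

/* Models running the loop with a timeout. Returns true on timeout. */
static bool toy_run_with_timeout(ToyTest *t, int max_iterations)
{
    t->loop_running = true;    /* starting the loop discards any earlier quit */
    for (int i = 0; i < max_iterations && t->loop_running; i++) {
        /* nothing else is pending in this model, so nobody calls quit */
    }
    return t->loop_running;
}

/* Racy wait: always runs the loop, even if the notify already fired. */
static bool wait_racy(ToyTest *t)
{
    return toy_run_with_timeout(t, 1000);
}

/* Guarded wait: skip the loop when the condition is already satisfied. */
static bool wait_guarded(ToyTest *t)
{
    if (t->notify_fired)
        return false;
    return toy_run_with_timeout(t, 1000);
}

/* Fires the notify before waiting, which is the rare ordering at issue. */
static bool early_notify_times_out(bool guarded)
{
    ToyTest t = { false, false };
    toy_destroy_notify(&t);
    assert(t.notify_fired);
    return guarded ? wait_guarded(&t) : wait_racy(&t);
}
```

In this model, early_notify_times_out(false) reports a timeout because the quit was issued before the loop started, while early_notify_times_out(true) does not.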
Comment 9 Milan Crha 2011-03-30 07:02:30 UTC
*** Bug 637376 has been marked as a duplicate of this bug. ***