GNOME Bugzilla – Bug 629945
GDBus deadlock in g_bus_get_sync()
Last modified: 2011-03-30 07:02:30 UTC
The problem from bug 627724 that was worked around in commit 0b74058f is still happening. I got a spin (with 100% CPU) on the /gdbus/connection/filter test while distchecking today.
+ Trace 223766
Ugh, need stack traces from all stacks I'm afraid - the trace in comment 0 doesn't show what's racing against what....
s/all stacks/all threads/ ...
(additionally, we should really use a condition variable instead of busy-waiting)
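As a rough illustration of the suggestion above, here is a minimal sketch of waiting on a condition variable instead of spinning. It is not the GDBus code: it uses plain POSIX threads rather than GLib's GMutex/GCond so that it stays self-contained, and the names `Handoff`, `producer` and `wait_for_value` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical shared state: a result plus a "ready" flag,
 * protected by a mutex and announced via a condition variable. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            ready;
    int             value;
} Handoff;

/* Worker thread: publishes the value, then wakes the waiter. */
void *producer(void *data)
{
    Handoff *h = data;

    pthread_mutex_lock(&h->lock);
    h->value = 42;
    h->ready = true;
    pthread_cond_signal(&h->cond);
    pthread_mutex_unlock(&h->lock);
    return NULL;
}

/* Blocks (sleeping, not spinning) until the worker has set `ready`.
 * Returns the published value, or -1 if the thread could not be created. */
int wait_for_value(void)
{
    Handoff h;
    pthread_t thread;
    int result;

    pthread_mutex_init(&h.lock, NULL);
    pthread_cond_init(&h.cond, NULL);
    h.ready = false;
    h.value = 0;

    if (pthread_create(&thread, NULL, producer, &h) != 0)
        return -1;

    pthread_mutex_lock(&h.lock);
    /* The loop guards against spurious wakeups; the thread sleeps
     * inside pthread_cond_wait() instead of burning CPU. */
    while (!h.ready)
        pthread_cond_wait(&h.cond, &h.lock);
    result = h.value;
    pthread_mutex_unlock(&h.lock);

    pthread_join(thread, NULL);
    pthread_cond_destroy(&h.cond);
    pthread_mutex_destroy(&h.lock);
    return result;
}
```

The waiter re-checks the flag under the lock after every wakeup, so it does not matter whether the worker signals before or after the waiter starts waiting.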
i've seen very many crashes from this particular test executable, including a segfault inside libc's malloc(). i think some heap corruption could be to blame for all of this.

meanwhile, i got another deadlock. here are two more backtraces from the same test executable. it appears that this happens inside of the fork() when creating the session bus:

  desrt 6328 0.0 0.0 105452 2504 pts/10 Sl+ 11:52 0:00 | \_ /home/desrt/code/glib/gio/tests/.libs/lt-gdbus-connection
  desrt 6343 0.0 0.0 29560 480 pts/10 S+ 11:52 0:00 | \_ /home/desrt/code/glib/gio/tests/.libs/lt-gdbus-connection

process 6328 trace:
+ Trace 223862
Thread 2 (Thread 0x7f61155f1710 (LWP 6347))
Should be fixed with http://git.gnome.org/browse/glib/commit/?id=f0b04acfd31b768151a88db3f8d3347f55b2a7b3

The root problem fixed by this commit was that the worker would issue callbacks on an already finalized connection. This happened only sporadically. I can now run ./gdbus-connection in a continuous loop without any problems, like this:

  $ while true; do .libs/lt-gdbus-connection || break; done

Btw, note that the slow tests (/gdbus/connection/{flush,large_message}) are now in ./gdbus-connection-slow as per this commit: http://git.gnome.org/browse/glib/commit/?id=1f6a9f1e2d2ebc5f37e7c526344d7aa26cee148d

I'm still seeing a few (extremely rare) failures such as

  /gdbus/connection/basic: OK
  /gdbus/connection/life-cycle: **
  ERROR:gdbus-connection.c:223:test_connection_life_cycle: assertion failed: (!quit_mainloop_fired)
  cleaning up bus with pid 21794
  Aborted (core dumped)

which I'm investigating right now... (I think this one is due to the test using a 1sec timeout - might not be enough...).
(In reply to comment #5)
> I'm still seeing a few (extremely rare) failures such as
>
> /gdbus/connection/basic: OK
> /gdbus/connection/life-cycle: **
> ERROR:gdbus-connection.c:223:test_connection_life_cycle: assertion failed:
> (!quit_mainloop_fired)
> cleaning up bus with pid 21794
> Aborted (core dumped)
>
> which I'm investigating right now... (I think this one is due to the test using
> a 1sec timeout - might not be enough...).

Yup, I could easily reproduce this by restarting a google-chrome instance loaded with many tabs. I've now bumped the timeouts to 30 seconds and all is good: http://git.gnome.org/browse/glib/commit/?id=71b1d738e2c0fe900b6bb6672aa464ce74b489b3
Closing this bug for now since it's extremely likely that the lockups were caused by memory corruption from GDBusWorker callbacks being issued for an already finalized GDBusConnection.
(In reply to comment #5)
> I'm still seeing a few (extremely rare) failures such as
>
> /gdbus/connection/basic: OK
> /gdbus/connection/life-cycle: **
> ERROR:gdbus-connection.c:223:test_connection_life_cycle: assertion failed:
> (!quit_mainloop_fired)
> cleaning up bus with pid 21794
> Aborted (core dumped)
>
> which I'm investigating right now... (I think this one is due to the test using
> a 1sec timeout - might not be enough...).

Actually this one was because we didn't invoke the GDestroyNotify in the correct context. Basically, the test variables could be set while the loop was _not_ running, so mainloop_quit() on the non-running loop would do nothing, and we'd end up timing out. This is of course rare, but it did happen in roughly 1 in 1000 or 10000 runs. See http://git.gnome.org/browse/glib/commit/?id=7036415cc1a32bbd9cc08e516196dbd704f8b5eb for the fix.
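To make the lost-quit scenario above concrete, here is a toy model in plain C. It is only a sketch of the behaviour described in the comment, not GLib's main loop: `ToyLoop`, `toy_loop_quit` and `toy_loop_run` are hypothetical names, and the iteration cap stands in for the test's timeout.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a main loop: just a "running" flag. */
typedef struct {
    bool running;
    int  iterations;
} ToyLoop;

/* Asks the loop to stop. On a loop that is not running this changes
 * nothing that a later toy_loop_run() will see. */
void toy_loop_quit(ToyLoop *loop)
{
    loop->running = false;
}

/* Runs until someone quits the loop or max_iterations (the "timeout")
 * is reached. quit_at is the iteration at which a callback quits the
 * loop, or -1 for "never". Returns true if stopped by quit, false on
 * timeout. */
bool toy_loop_run(ToyLoop *loop, int max_iterations, int quit_at)
{
    loop->running = true; /* entering run discards any earlier quit */
    for (loop->iterations = 0; loop->iterations < max_iterations; loop->iterations++) {
        if (loop->iterations == quit_at)
            toy_loop_quit(loop);
        if (!loop->running)
            return true;
    }
    return false;
}

/* The notification fires before the loop runs: the quit is lost
 * and the run ends by timeout. Returns true if it timed out. */
bool quit_before_run_times_out(void)
{
    ToyLoop loop = { false, 0 };

    toy_loop_quit(&loop);
    return !toy_loop_run(&loop, 1000, -1);
}

/* The notification fires while the loop is running: the quit works. */
bool quit_during_run_stops(void)
{
    ToyLoop loop = { false, 0 };

    return toy_loop_run(&loop, 1000, 3);
}
```

In this model the only way to stop the loop is to call quit from a callback that runs inside the loop, which matches the fix's idea of invoking the notification in the correct context.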
*** Bug 637376 has been marked as a duplicate of this bug. ***