GNOME Bugzilla – Bug 666129
The test suite is hanging in gdbus tests
Last modified: 2014-01-20 13:18:33 UTC
The issue happens on the Ubuntu builders (which use an older kernel version). Since glib 2.30 the test suite started to hang regularly. Retries usually worked, but it seems to hang consistently since the 2.31.2 to 2.31.4 update.

The hanging tests are gdbus codegen-peer-to-peer and delivery-in-thread.

Stack trace of a hang:

Thread 2 (Thread 0x40743b70 (LWP 13956)):
#0  0x40020410 in __kernel_vsyscall ()
#1  0x4038d3ee in poll () from /lib/i386-linux-gnu/libc.so.6
#2  0x4007705b in g_poll (fds=0x8065530, nfds=3, timeout=-1) at /build/buildd/glib2.0-2.31.4.tested/./glib/gpoll.c:132
#3  0x4006961e in g_main_context_poll (n_fds=3, fds=0x8065530, timeout=<optimized out>, context=0x80615b8, priority=<optimized out>) at /build/buildd/glib2.0-2.31.4.tested/./glib/gmain.c:3415
#4  g_main_context_iterate (dispatch=1, block=1074229296, context=0x80615b8, self=<optimized out>) at /build/buildd/glib2.0-2.31.4.tested/./glib/gmain.c:3116
#5  g_main_context_iterate (context=0x80615b8, block=1074229296, dispatch=1, self=<optimized out>) at /build/buildd/glib2.0-2.31.4.tested/./glib/gmain.c:3058
#6  0x40069adb in g_main_loop_run (loop=0x8061668) at /build/buildd/glib2.0-2.31.4.tested/./glib/gmain.c:3315
#7  0x4023e26a in gdbus_shared_thread_func (user_data=0x80615a0) at /build/buildd/glib2.0-2.31.4.tested/./gio/gdbusprivate.c:276
#8  0x4008c863 in g_thread_proxy (data=0x805c350) at /build/buildd/glib2.0-2.31.4.tested/./glib/gthread.c:801
#9  0x40489d31 in start_thread () from /lib/i386-linux-gnu/libpthread.so.0
#10 0x4039c0de in clone () from /lib/i386-linux-gnu/libc.so.6
Backtrace stopped: Not enough registers or memory available to unwind further

Thread 1 (Thread 0x405063f0 (LWP 13932)):
#0  0x40020410 in __kernel_vsyscall ()
#1  0x4038d3ee in poll () from /lib/i386-linux-gnu/libc.so.6
#2  0x4007705b in g_poll (fds=0x80674c8, nfds=1, timeout=-1) at /build/buildd/glib2.0-2.31.4.tested/./glib/gpoll.c:132
#3  0x4006961e in g_main_context_poll (n_fds=1, fds=0x80674c8, timeout=<optimized out>, context=0x8056c48, priority=<optimized out>) at /build/buildd/glib2.0-2.31.4.tested/./glib/gmain.c:3415
#4  g_main_context_iterate (dispatch=1, block=1074229296, context=0x8056c48, self=<optimized out>) at /build/buildd/glib2.0-2.31.4.tested/./glib/gmain.c:3116
#5  g_main_context_iterate (context=0x8056c48, block=1074229296, dispatch=1, self=<optimized out>) at /build/buildd/glib2.0-2.31.4.tested/./glib/gmain.c:3058
#6  0x40069adb in g_main_loop_run (loop=0x8057138) at /build/buildd/glib2.0-2.31.4.tested/./glib/gmain.c:3315
#7  0x08049d3f in test_delivery_in_thread () at /build/buildd/glib2.0-2.31.4.tested/./gio/tests/gdbus-threading.c:239
#8  0x4008b5f0 in test_case_run (tc=0x8057720) at /build/buildd/glib2.0-2.31.4.tested/./glib/gtestutils.c:1612
#9  g_test_run_suite_internal (suite=0x8056f50, path=0x400d989e "")
The kernel used on this builder is version 2.6.24.
Please also attach the debug output. It looks this way for me:

$ ./gdbus-threading
/gdbus/delivery-in-thread: OK
/gdbus/method-calls-in-thread

Also please run the test case with G_DBUS_DEBUG=all in the environment ... WARNING: it will generate a ton of output on stdout (ca. 45,000 lines) but should still succeed.
The builder log at the hang time is:

TEST: gdbus-peer... (pid=26946)
/gdbus/peer-to-peer: OK
/gdbus/delayed-message-processing: OK
/gdbus/nonce-tcp: OK
/gdbus/tcp-anonymous: OK
/gdbus/credentials: OK
/gdbus/overflow: OK
/gdbus/codegen-peer-to-peer:

It hangs there until it is stopped.
Created attachment 203407 [details]
test debug log

The log is from a G_DBUS_DEBUG=all run of gdbus-threading in a builder environment which was hanging...
In gdbus-threading, we see this:

static gpointer
test_delivery_in_thread_func (gpointer _data)
{
  ...
  g_main_loop_quit (loop);
  return NULL;
}

static void
test_delivery_in_thread (void)
{
  GThread *thread;

  thread = g_thread_new ("deliver", test_delivery_in_thread_func, NULL);

  /* run the event loop - it is needed to dispatch D-Bus messages */
  g_main_loop_run (loop);

  g_thread_join (thread);
}

Not cool.
I'm guessing that the old kernel was somewhat more likely to allow the created thread to execute to completion before returning control to the main thread than new kernels are...
Created attachment 203416 [details] [review]
gdbus tests: remove buggy use of GMainLoop

g_main_loop_quit() only quits mainloops that are currently running -- not ones that may run in the future. The way the gdbus-threading tests are written can possibly result in a call to g_main_loop_quit() before g_main_loop_run() has started.

The mainloops aren't actually used for anything other than signalling the completion of the threads, so just use g_thread_join() for that.
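To make the ordering problem concrete, here is a minimal sketch in plain C with pthreads rather than GLib. ToyLoop and toy_loop_quit() are hypothetical stand-ins that model only the one property described above (a quit request affects a loop that is already running); they are not the GMainLoop implementation. run_worker_and_join() shows the join-based completion that the patch description proposes, with pthread_join() playing the role of g_thread_join().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a main loop. It models a single property:
 * quitting only has an effect on a loop that is already running. */
typedef struct {
    bool running;
} ToyLoop;

/* Returns true if a running loop was stopped, false if the request
 * arrived before the loop started and was therefore lost. */
static bool toy_loop_quit(ToyLoop *loop)
{
    bool was_running;

    assert(loop != NULL);
    was_running = loop->running;
    loop->running = false;
    return was_running;
}

/* Worker: does its "work" and signals completion simply by returning. */
static void *worker_func(void *data)
{
    int *result = data;

    *result = 42;
    return NULL;
}

/* Join-based completion: the caller cannot miss the worker finishing,
 * no matter how early the worker runs. Returns the worker's result,
 * or -1 if the thread could not be created or joined. */
static int run_worker_and_join(void)
{
    pthread_t thread;
    int result = 0;

    if (pthread_create(&thread, NULL, worker_func, &result) != 0)
        return -1;
    if (pthread_join(thread, NULL) != 0)
        return -1;
    return result;
}
```

In this model, calling toy_loop_quit() on a loop that has not started reports that the request was lost, which corresponds to the hang: the main thread would then enter its loop with nobody left to stop it. The join-based variant has no such window because joining a thread that has already finished returns immediately.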
Many other GDBus testcases use GMainLoop in the same broken way... this is probably entirely the cause of all the reported deadlocks we've been hearing about.
/gdbus/codegen-peer-to-peer sits spinning in the while loop at the top of codegen_test_peer() because codegen_service_thread_func() creates its main loop (service_loop), which then gets NULL'd out by the main thread here:

  service_thread = g_thread_new ("codegen_test_peer", codegen_service_thread_func, NULL);
  service_loop = NULL;
  while (service_loop == NULL)
    g_thread_yield ();
Created attachment 203472 [details] [review]
Initialize service_loop before running the service thread

This seems to fix the hang in /gdbus/codegen-peer-to-peer
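The ordering this patch title describes can be sketched as follows, again in plain C with pthreads instead of GLib. ServiceState, service_thread_func() and start_service_and_wait() are hypothetical names for illustration. Note one deliberate difference from the test code quoted above: this sketch waits on a condition variable under a mutex instead of spinning with a yield, which is a swapped-in technique and not something the attached patch is known to do.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the state shared between the two threads. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  ready;
    const char     *service_loop;   /* plays the role of the loop pointer */
} ServiceState;

/* Service thread: publishes its "loop" under the lock and signals. */
static void *service_thread_func(void *data)
{
    ServiceState *state = data;

    assert(state != NULL);
    pthread_mutex_lock(&state->lock);
    state->service_loop = "service loop";
    pthread_cond_signal(&state->ready);
    pthread_mutex_unlock(&state->lock);
    return NULL;
}

/* Returns true once the service thread has published its loop,
 * false if the thread could not be created. */
static bool start_service_and_wait(void)
{
    ServiceState state;
    pthread_t thread;
    bool published;

    pthread_mutex_init(&state.lock, NULL);
    pthread_cond_init(&state.ready, NULL);

    /* Reset the pointer BEFORE the thread exists, so the reset can never
     * overwrite a value the thread has already stored. */
    state.service_loop = NULL;

    if (pthread_create(&thread, NULL, service_thread_func, &state) != 0) {
        pthread_cond_destroy(&state.ready);
        pthread_mutex_destroy(&state.lock);
        return false;
    }

    /* Wait for publication; the predicate loop handles spurious wakeups
     * and the case where the thread signalled before we started waiting. */
    pthread_mutex_lock(&state.lock);
    while (state.service_loop == NULL)
        pthread_cond_wait(&state.ready, &state.lock);
    published = (state.service_loop != NULL);
    pthread_mutex_unlock(&state.lock);

    pthread_join(thread, NULL);
    pthread_cond_destroy(&state.ready);
    pthread_mutex_destroy(&state.lock);
    return published;
}
```

With the reset placed after thread creation, as in the quoted test, a fast service thread can store its pointer first and have it wiped, leaving the waiter stuck forever. Moving the reset ahead of thread creation closes that window regardless of how the scheduler orders the two threads.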
Comment on attachment 203416 [details] [review] gdbus tests: remove buggy use of GMainLoop Attachment 203416 [details] pushed as 0a7930d - gdbus tests: remove buggy use of GMainLoop
The following fix has been pushed: 162bafe Initialize service_loop before running the service thread
Created attachment 203547 [details] [review] Initialize service_loop before running the service thread Bug:
This got closed by accident and we've yet to go through all the testcases systematically ensuring that they're safe, so I'll reopen it.
Nothing more happened here, so let's close it after all.