Bug 666129 - the test suite is hanging in gdbus tests
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: build
Version: 2.31.x
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: gtkdev
gtkdev
Depends on:
Blocks:
 
 
Reported: 2011-12-13 21:29 UTC by Sebastien Bacher
Modified: 2014-01-20 13:18 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
test debug log (26.98 KB, text/plain)
2011-12-13 22:25 UTC, Sebastien Bacher
gdbus tests: remove buggy use of GMainLoop (3.86 KB, patch)
2011-12-14 01:56 UTC, Allison Karlitskaya (desrt)
Status: committed
Initialize service_loop before running the service thread (1012 bytes, patch)
2011-12-14 14:04 UTC, Chris Coulson
Status: none
Initialize service_loop before running the service thread (1010 bytes, patch)
2011-12-15 02:25 UTC, Matthias Clasen
Status: committed

Description Sebastien Bacher 2011-12-13 21:29:56 UTC
The issue happens on the Ubuntu builders (which use an older kernel version). Since glib 2.30 the test suite has been hanging regularly. Retries usually worked, but it seems to hang consistently since the update from 2.31.2 to 2.31.4.

The hanging tests are gdbus codegen-peer-to-peer and delivery-in-thread

Stack trace of a hang:

"Thread 2 (Thread 0x40743b70 (LWP 13956)):
 #0  0x40020410 in __kernel_vsyscall ()
 #1  0x4038d3ee in poll () from /lib/i386-linux-gnu/libc.so.6
 #2  0x4007705b in g_poll (fds=0x8065530, nfds=3, timeout=-1)
     at /build/buildd/glib2.0-2.31.4.tested/./glib/gpoll.c:132
 #3  0x4006961e in g_main_context_poll (n_fds=3, fds=0x8065530, 
     timeout=<optimized out>, context=0x80615b8, priority=<optimized out>)
     at /build/buildd/glib2.0-2.31.4.tested/./glib/gmain.c:3415
 #4  g_main_context_iterate (dispatch=1, block=1074229296, context=0x80615b8, 
     self=<optimized out>)
     at /build/buildd/glib2.0-2.31.4.tested/./glib/gmain.c:3116
 #5  g_main_context_iterate (context=0x80615b8, block=1074229296, dispatch=1, 
     self=<optimized out>)
     at /build/buildd/glib2.0-2.31.4.tested/./glib/gmain.c:3058
 #6  0x40069adb in g_main_loop_run (loop=0x8061668)
     at /build/buildd/glib2.0-2.31.4.tested/./glib/gmain.c:3315
 #7  0x4023e26a in gdbus_shared_thread_func (user_data=0x80615a0)
     at /build/buildd/glib2.0-2.31.4.tested/./gio/gdbusprivate.c:276
 #8  0x4008c863 in g_thread_proxy (data=0x805c350)
     at /build/buildd/glib2.0-2.31.4.tested/./glib/gthread.c:801
 #9  0x40489d31 in start_thread () from /lib/i386-linux-gnu/libpthread.so.0
 #10 0x4039c0de in clone () from /lib/i386-linux-gnu/libc.so.6
 Backtrace stopped: Not enough registers or memory available to unwind further
 Thread 1 (Thread 0x405063f0 (LWP 13932)):
 #0  0x40020410 in __kernel_vsyscall ()
 #1  0x4038d3ee in poll () from /lib/i386-linux-gnu/libc.so.6
 #2  0x4007705b in g_poll (fds=0x80674c8, nfds=1, timeout=-1)
     at /build/buildd/glib2.0-2.31.4.tested/./glib/gpoll.c:132
 #3  0x4006961e in g_main_context_poll (n_fds=1, fds=0x80674c8, 
     timeout=<optimized out>, context=0x8056c48, priority=<optimized out>)
     at /build/buildd/glib2.0-2.31.4.tested/./glib/gmain.c:3415
 #4  g_main_context_iterate (dispatch=1, block=1074229296, context=0x8056c48, 
     self=<optimized out>)
     at /build/buildd/glib2.0-2.31.4.tested/./glib/gmain.c:3116
 #5  g_main_context_iterate (context=0x8056c48, block=1074229296, dispatch=1, 
     self=<optimized out>)
     at /build/buildd/glib2.0-2.31.4.tested/./glib/gmain.c:3058
 #6  0x40069adb in g_main_loop_run (loop=0x8057138)
     at /build/buildd/glib2.0-2.31.4.tested/./glib/gmain.c:3315
 #7  0x08049d3f in test_delivery_in_thread ()
     at /build/buildd/glib2.0-2.31.4.tested/./gio/tests/gdbus-threading.c:239
 #8  0x4008b5f0 in test_case_run (tc=0x8057720)
     at /build/buildd/glib2.0-2.31.4.tested/./glib/gtestutils.c:1612
 #9  g_test_run_suite_internal (suite=0x8056f50, path=0x400d989e "")"
Comment 1 Sebastien Bacher 2011-12-13 21:31:09 UTC
The kernel used on this builder is version 2.6.24.
Comment 2 David Zeuthen (not reading bugmail) 2011-12-13 21:37:16 UTC
Please also attach the debug output. It looks this way for me:

$ ./gdbus-threading 
/gdbus/delivery-in-thread: OK
/gdbus/method-calls-in-thread: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS OK


Also please run the test case with G_DBUS_DEBUG=all in the environment ... WARNING: it will generate a ton of output on stdout (ca. 45,000 lines) but should still succeed.
Comment 3 Sebastien Bacher 2011-12-13 22:23:56 UTC
The builder log at the hang time is:

"TEST: gdbus-peer... (pid=26946)
  /gdbus/peer-to-peer:                                                 OK
  /gdbus/delayed-message-processing:                                   OK
  /gdbus/nonce-tcp:                                                    OK
  /gdbus/tcp-anonymous:                                                OK
  /gdbus/credentials:                                                  OK
  /gdbus/overflow:                                                     OK
  /gdbus/codegen-peer-to-peer: "

It hangs there until it is stopped.
Comment 4 Sebastien Bacher 2011-12-13 22:25:48 UTC
Created attachment 203407
test debug log

the log is from a G_DBUS_DEBUG=all run of gdbus-threading in a builder environment which was hanging...
Comment 5 Allison Karlitskaya (desrt) 2011-12-14 01:39:28 UTC
in gdbus-threading, we see this:

static gpointer
test_delivery_in_thread_func (gpointer _data)
{
  ...


  g_main_loop_quit (loop);

  return NULL;
}

static void
test_delivery_in_thread (void)
{
  GThread *thread;

  thread = g_thread_new ("deliver",
                         test_delivery_in_thread_func,
                         NULL);

  /* run the event loop - it is needed to dispatch D-Bus messages */
  g_main_loop_run (loop);

  g_thread_join (thread);
}


not cool.
Comment 6 Allison Karlitskaya (desrt) 2011-12-14 01:42:06 UTC
I'm guessing that the old kernel was somewhat more likely to allow the created thread to execute to completion before returning control to the main thread than new kernels are...
Comment 7 Allison Karlitskaya (desrt) 2011-12-14 01:56:07 UTC
Created attachment 203416
gdbus tests: remove buggy use of GMainLoop

g_main_loop_quit() only quits mainloops that are currently running --
not ones that may run in the future.  The way the gdbus-threading tests
are written can possibly result in a call to g_main_loop_quit() before
g_main_loop_run() has started.

The mainloops aren't actually used for anything other than signalling
the completion of the threads, so just use g_thread_join() for that.
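
The join-based approach described above can be sketched without GLib. This is only a minimal illustration using plain POSIX threads, not the committed patch: pthread_create() and pthread_join() stand in for g_thread_new() and g_thread_join(), and worker_func and run_worker_and_join are hypothetical names.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical worker: stands in for the body of a test thread. */
static void *worker_func(void *data)
{
  int *result = data;

  *result = 42;   /* the work the test wants to observe */
  return NULL;    /* no "quit" call needed: returning is the signal */
}

/* Starts the worker and waits for it to finish.
 * pthread_join() returns only after worker_func() has returned,
 * regardless of which thread got to run first. */
static int run_worker_and_join(void)
{
  pthread_t thread;
  int result = 0;

  if (pthread_create(&thread, NULL, worker_func, &result) != 0)
    return -1;
  if (pthread_join(thread, NULL) != 0)
    return -1;

  return result;
}
```

Because the join itself is the completion signal, it does not matter whether the worker finishes before or after the main thread starts waiting.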
Comment 8 Allison Karlitskaya (desrt) 2011-12-14 01:59:24 UTC
Many other GDBus testcases use GMainLoop in the same broken way... this is probably the entire cause of all the reported deadlocks we've been hearing about.
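
For test cases that need to wait on an event rather than on thread exit, one race-free option is a latched flag, where a signal sent early is remembered instead of lost. A rough sketch with POSIX threads follows. completion_t and its functions are hypothetical and are not part of GLib or of the patches in this bug.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical latch: once "done" is set it stays set, so a signal
 * that arrives before the waiter is not lost. */
typedef struct {
  pthread_mutex_t lock;
  pthread_cond_t  cond;
  bool            done;
} completion_t;

static void completion_init(completion_t *c)
{
  pthread_mutex_init(&c->lock, NULL);
  pthread_cond_init(&c->cond, NULL);
  c->done = false;
}

/* Called by the worker when it has finished. */
static void completion_signal(completion_t *c)
{
  pthread_mutex_lock(&c->lock);
  c->done = true;
  pthread_cond_broadcast(&c->cond);
  pthread_mutex_unlock(&c->lock);
}

/* Called by the waiter. Returns immediately if the signal was
 * already sent, otherwise blocks until it is. */
static void completion_wait(completion_t *c)
{
  pthread_mutex_lock(&c->lock);
  while (!c->done)
    pthread_cond_wait(&c->cond, &c->lock);
  pthread_mutex_unlock(&c->lock);
}
```

The difference from the broken pattern is that the waiter checks a stored state rather than relying on being asleep at the moment the signal is sent.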
Comment 9 Chris Coulson 2011-12-14 13:52:56 UTC
/gdbus/codegen-peer-to-peer sits spinning in the while loop at the top of codegen_test_peer() because codegen_service_thread_func() creates its main loop (service_loop), which the main thread can then set back to NULL here, so the loop condition never becomes false:

  service_thread = g_thread_new ("codegen_test_peer",
                                 codegen_service_thread_func,
                                 NULL);
  service_loop = NULL;
  while (service_loop == NULL)
    g_thread_yield ();
Comment 10 Chris Coulson 2011-12-14 14:04:30 UTC
Created attachment 203472
Initialize service_loop before running the service thread

This seems to fix the hang in /gdbus/codegen-peer-to-peer
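
The ordering fix can be illustrated with a small stand-alone sketch. It assumes POSIX threads and C11 atomics in place of the GThread calls and the plain global used by the real test. loop_storage, service_thread_func and start_service are hypothetical names.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the GMainLoop the service thread creates. */
static int loop_storage;

/* Shared pointer published by the service thread (assumption: atomic
 * here, whereas the original test uses a plain global). */
static _Atomic(int *) service_loop;

static void *service_thread_func(void *data)
{
  (void) data;
  atomic_store(&service_loop, &loop_storage);   /* publish the loop */
  return NULL;
}

static int start_service(void)
{
  pthread_t thread;

  /* Reset BEFORE the thread exists, so the store made by the thread
   * can never be overwritten by this assignment. */
  atomic_store(&service_loop, (int *) NULL);

  if (pthread_create(&thread, NULL, service_thread_func, NULL) != 0)
    return -1;

  /* Wait until the service thread has published its loop. */
  while (atomic_load(&service_loop) == NULL)
    sched_yield();

  pthread_join(thread, NULL);
  return atomic_load(&service_loop) == &loop_storage ? 0 : -1;
}
```

With the reset moved after pthread_create(), as in the snippet quoted in comment 9, the wait loop could spin forever whenever the new thread runs first.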
Comment 11 Allison Karlitskaya (desrt) 2011-12-14 14:57:07 UTC
Comment on attachment 203416
gdbus tests: remove buggy use of GMainLoop

Attachment 203416 pushed as 0a7930d - gdbus tests: remove buggy use of GMainLoop
Comment 12 Matthias Clasen 2011-12-15 02:25:03 UTC
The following fix has been pushed:
162bafe Initialize service_loop before running the service thread
Comment 13 Matthias Clasen 2011-12-15 02:25:07 UTC
Created attachment 203547
Initialize service_loop before running the service thread

Bug:
Comment 14 Allison Karlitskaya (desrt) 2011-12-19 18:17:04 UTC
This got closed by accident and we've yet to go through all the testcases systematically ensuring that they're safe, so I'll reopen it.
Comment 15 Matthias Clasen 2014-01-20 13:18:33 UTC
Nothing more happened here, so let's close it after all.