Bug 578295 - gtester has a race condition
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: general
Version: unspecified
Hardware: Other All
Importance: Normal normal
Target Milestone: ---
Assigned To: gtkdev
QA Contact: gtkdev
Duplicates: 572861 602782
Depends on:
Blocks:
Reported: 2009-04-07 20:02 UTC by Allison Karlitskaya (desrt)
Modified: 2010-08-08 00:02 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Allison Karlitskaya (desrt) 2009-04-07 20:02:45 UTC
every now and then gtester will randomly fail to exit.

expected output would be something like

--------------------
TEST: 1bit-mutex... (pid=27615)
  /glib/1bit-mutex:                                                    OK
PASS: 1bit-mutex
--------------------

but you see this instead:

--------------------
TEST: 1bit-mutex... (pid=27750)
  /glib/1bit-mutex:                                                    OK
--------------------

and it hangs forever.

'ps f' shows:

27732 pts/4    S+     0:00 [snip] \_ make check
27733 pts/4    S+     0:00 [snip]     \_ make check-local
27734 pts/4    S+     0:00 [snip]         \_ /bin/bash -c test -z "1bit-mutex" || ../../glib/gtester --verbose 1bit-mutex
27735 pts/4    S+     0:00 [snip]             \_ /home/desrt/code/glib/glib-2.20.1/_build/glib/.libs/lt-gtester --verbose 1bit-mutex
27750 pts/4    Z+     0:07 [snip]                 \_ [lt-1bit-mutex] <defunct>

so clearly, the lt-1bit-mutex process has quit and the gtester process missed the SIGCHLD.

here's a backtrace of what gtester is doing:

  • #0 __kernel_vsyscall
  • #1 poll
    from /lib/tls/i686/cmov/libc.so.6
  • #2 IA__g_poll
    at ../../glib/gpoll.c line 127
  • #3 g_main_context_iterate
    at ../../glib/gmain.c line 2761
  • #4 IA__g_main_context_iteration
    at ../../glib/gmain.c line 2511
  • #5 main
    at ../../glib/gtester.c line 404

notice that it's watching no fds at all.  that's just not cool.  there's no way that this process could possibly wake up except by receiving a signal.

if the SIGCHLD arrives -just- before the poll syscall is made, it won't wake up the poll, and there is no way to avoid entering the poll (once the signal handler returns, control may already have passed to libc's poll() call).
Comment 1 Allison Karlitskaya (desrt) 2009-04-07 20:04:42 UTC
one more note, though:  i thought i could wake the process by resizing the terminal window (SIGWINCH), but that isn't enough to wake it either.  so it's not merely a case of the SIGCHLD handler failing to wake up the poll() but a matter of the SIGCHLD handler not running at all....
Comment 2 Colin Walters 2009-04-07 20:15:45 UTC
Sounds like a generic race condition, probably not hit very often outside of things like gtester because most people are using the glib main loop which will have other sources set up.

signalfd to the rescue?

http://www.kernel.org/doc/man-pages/online/pages/man2/signalfd.2.html
Comment 3 Emilio Pozuelo Monfort 2010-03-13 00:13:54 UTC
Looks similar to bug 572861
Comment 4 Emilio Pozuelo Monfort 2010-03-15 18:10:16 UTC
Can't we just launch tests synchronously and avoid this issue altogether? The race condition may still exist but that would avoid it in gtester and I can't see why it needs to launch tests asynchronously.

With respect to the race condition, there's bug 398418 tracking it. Let's keep this bug open in case we want to switch to synchronous forks. If not, we can just dup it.
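
For illustration, a synchronous launch could look roughly like the sketch below. It uses plain fork/exec/waitpid rather than GLib's spawn API, and `run_test_sync` is a hypothetical name. Whether gtester can actually give up its asynchronous handling is not settled in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run argv[0] with its arguments and block until it
 * exits. Returns the exit status (127 if exec failed), or -1 if fork
 * failed or the child was killed by a signal.
 */
int run_test_sync(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);               /* only reached if exec failed */
    }

    int status = 0;
    pid_t r;
    do {
        /* Blocks in the kernel until the child exits: no main loop and no
         * SIGCHLD handler, so there is no wakeup to miss. */
        r = waitpid(pid, &status, 0);
    } while (r < 0 && errno == EINTR);

    if (r != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The trade-off is that the parent cannot service anything else while the test runs.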
Comment 5 Emilio Pozuelo Monfort 2010-04-20 13:35:50 UTC
Bug 572861 and this one are the same, but I can't mark bugs as duplicates...
Comment 6 Emilio Pozuelo Monfort 2010-04-20 13:38:23 UTC
After reading bug 572861, I thought about another workaround, which works: call g_thread_init(NULL) in gtester's main function. However that requires gtester to link to libgthread, which seems to need changes in the build system (maybe move gtester from glib/ to a new tools/), but that's probably too much for a workaround :)
Comment 7 Christian Dywan 2010-04-21 12:03:49 UTC
*** Bug 572861 has been marked as a duplicate of this bug. ***
Comment 8 Allison Karlitskaya (desrt) 2010-05-28 18:19:11 UTC
*** Bug 602782 has been marked as a duplicate of this bug. ***
Comment 9 Allison Karlitskaya (desrt) 2010-08-07 13:45:03 UTC
The link-to-gthread solution is the easiest, but unfortunately, it's vaguely impossible.

We need glib/ to be compiled before gtester builds, but we also need gtester to be built before entering glib/tests/.  Therefore gtester needs to be in glib/.

We need glib/ to build gthread/, so glib/ has to come before it.

So unless we have a two-pass build system, or do some very serious shake-ups, we can't link gtester to libgthread.

I think the best way to fix this is to enable threads without enabling threads.

ie: get the mainloop to think that we've switched threads on, but without requiring linking to libgthread.
Comment 10 Tommi Komulainen 2010-08-07 20:19:50 UTC
As I mentioned in bug 572861, adding a dummy timeout in gtester.c main avoids this and doesn't require threads. I don't quite understand why it works though.

Why doesn't the child watch work the same way as the main context wake_up_pipe (with the SIGCHLD handler always writing to a pipe that wakes up the main context), instead of doing odd-looking special casing between single and multiple threads?
Comment 11 Allison Karlitskaya (desrt) 2010-08-07 22:12:49 UTC
Tommi: this is my plan for now.

The problem with the wake-up pipe is that it is only enabled in threaded situations.
Comment 12 Allison Karlitskaya (desrt) 2010-08-08 00:02:44 UTC
Okay.  Did that.

Of course, we should fix this properly.  See bug #398418