After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 398418 - GChildWatch race condition?
GChildWatch race condition?
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: mainloop
unspecified
Other All
: Normal major
: ---
Assigned To: gtkdev
gtkdev
Depends on:
Blocks:
 
 
Reported: 2007-01-19 17:00 UTC by Marc-Andre Lureau
Modified: 2011-09-10 00:39 UTC
See Also:
GNOME target: ---
GNOME version: Unversioned Enhancement


Attachments
g_spawn_async_with_pipes_full (12.98 KB, patch)
2007-01-19 17:04 UTC, Marc-Andre Lureau
none Details | Review
simple test case, fork, add a chidwatcher callback to prove the race (642 bytes, text/plain)
2007-01-22 17:21 UTC, Marc-Andre Lureau
  Details
[gmain] Avoid race condition with child watching (7.91 KB, patch)
2010-05-28 23:19 UTC, Colin Walters
none Details | Review

Description Marc-Andre Lureau 2007-01-19 17:00:10 UTC
g_spawn_async_with_pipes return the PID of the child process, which can be watch by GChildWatchFunc from gmain.h.

Unfortunately, in non-multithreaded applications, the SIGCHLD event is received by a (posix) signal handler. But it can happens that the process already died when the prepare function (that set the handler) is called. Furthermore, for different reasons, using signal handler is not reliable. It is widely accepted that a helper process should waitpid() and forward the SIGCHLD event (or other child signals) to the parent process (see "babysit" is dbus for example).

I suggest that we add a *status fd to g_spawn_async... And forward status report with the help of the intermediate child (see the following patch).

This way, the parent process could then add an io_channel watcher and get asynchronous child status report reliably.
Comment 1 Marc-Andre Lureau 2007-01-19 17:04:25 UTC
Created attachment 80712 [details] [review]
g_spawn_async_with_pipes_full

This patch 
- introduces a new function: g_spawn_async_with_pipes_full
- modify fork_exec_with_pipes to waitpid() of the grandchild in intermediate_child and write exit status to the parent pipe.

IS A REQUEST FOR COMMENTS, IT HAS NOT BEEN TESTED
Comment 2 Marc-Andre Lureau 2007-01-19 18:01:01 UTC
this is maybe not a good idea after all. that is probably a bug in my signal handler or something. I'll reopen the bug if necessary. ;)
Comment 3 Owen Taylor 2007-01-19 19:05:38 UTC
Are you perhaps just forgetting to pass the DO_NOT_REAP_CHILD flag to 
g_spawn...()? It was a while ago, but we spent a lot of time considering
possible race conditions for child watches, and I believe they should
be quite reliable.

Comment 4 Marc-Andre Lureau 2007-01-20 14:59:39 UTC
Yes they are reliable :) Sorry for the confusion, it was an error in my code ;)... Anyway, Glib does not provide a way to install the SIGCHLD signal handler before we fork and add the GChildWatchSource with resulting PID.

beside, it is easy to have side effects/interactions with external code: if another signal handler is installed, then the GChildSource will never be waken up.
(think of a possible process instance sharing with qt for example, thanks to gmainloop support in qt:)

I suggest either that we augment the API to install the handler before we fork/spawn and/or update the docs to prevent possible errors.

I don't know why DBus and other projects *have to use* an intermediate process (it's not exactly for reaping only). I was inspired by such code to extend the intermediate process we use to reap the grandchild in glib fork code. Thus, we have a reliable way to get the child status instead of waiting/using nasty signals :). That would change the way gchildsource should be defined... I don't have the time/needs to discuss such change ..

I closed the bug because it should be filled differently I guess.
Comment 5 Marc-Andre Lureau 2007-01-22 15:05:33 UTC
Actually, I should add one comment for the record. J. Hedberg found the following race condition:

In single thread mode, if the signal is handled just before the poll() is entered, then the mainloop will not be waken up (the child watcher will be notified when the poll will wake up for another reason) - the current signal handler just increment a variable.

One way to fix it would be to do the same trick with pipe in MT: signal handler should write on a fd, and poll should include the fd. This way, there is no race condition. 
Comment 6 Owen Taylor 2007-01-22 15:31:32 UTC
> Anyway, Glib does not provide a way to install the SIGCHLD signal handler
> before we fork and add the GChildWatchSource with resulting PID.

There is NO race condition from that, though admittedly it's fairly subtle:
note that GChildWatchSource.count starts at 0, while child_watch_count starts
at 1, so we *always* waitpid() the child once after adding the signal 
handler. If the child exited before the handler was installed, that initial
waitpid() will catch the child exiting.

> In single thread mode, if the signal is handled just before the poll() is
> entered, then the mainloop will not be waken up (the child watcher will be
> notified when the poll will wake up for another reason) - the current signal
> handler just increment a variable.

Hmm, I can't believe that we missed that race condition when writing the
code ... we went through a *lot* of thinking, testing, and iteration,
but I can't think of a flaw in your logic :-(

Can you write a test case that demonstrates the problem? It should be
easy to force the race with a custom source that sleeps in it's prepare()
function.
Comment 7 Marc-Andre Lureau 2007-01-22 17:19:32 UTC
Confirmed! There is definetely a race condition.

 

Index: gmain.c
===================================================================
--- gmain.c     (revision 5305)
+++ gmain.c     (working copy)
@@ -3614,6 +3614,9 @@ check_for_child_exited (GSource *source)
       child_watch_source->count = count;
     }
 
+  g_printf ("sleeping a bit\n");
+  g_usleep (10 * G_USEC_PER_SEC);
+  g_printf ("I am not tired anymore, and noticed nothing\n");
   return child_watch_source->child_exited;
 }


Kill the process while the prepare function is sleeping, you'll see. The callback is not called!...

(test case attached)
Comment 8 Marc-Andre Lureau 2007-01-22 17:21:25 UTC
Created attachment 80907 [details]
simple test case, fork, add a chidwatcher callback to prove the race

you have to kill the child process while the prepare() is sleeping to produce the race
Comment 9 Marc-Andre Lureau 2007-01-29 22:43:37 UTC
I take the liberty to change the status/description of the bug. I do not have the time to provide a fix unfortunately. 
Comment 10 Emilio Pozuelo Monfort 2010-03-15 18:08:08 UTC
This is a big issue for gtester, see bug 578295.
Comment 11 Colin Walters 2010-05-28 18:23:46 UTC
This bug is obsolete now that we've turned on threading by default in glib, correct?
Comment 12 Emilio Pozuelo Monfort 2010-05-28 18:28:06 UTC
I think so.
Comment 13 Colin Walters 2010-05-28 20:40:12 UTC
Executive summary from IRC discussion:

Bug still exists and is valid for processes which only link against glib, not gobject.

Suggested resolution: Make the current pipe trick used in the multithreaded case the default, and remove the single-threaded "optimization".
Comment 14 Colin Walters 2010-05-28 23:19:14 UTC
Created attachment 162244 [details] [review]
[gmain] Avoid race condition with child watching

Marc-Andre Lureau writes:
In single thread mode, if the signal is handled just before the poll() is
entered, then the mainloop will not be waken up (the child watcher will be
notified when the poll will wake up for another reason) - the current signal
handler just increment a variable.

This is a proof-of-concept patch which simply removes the single-threaded
workaround.  This requires consumers of g_child_watch to have called
g_thread_init(), which is slightly backwards incompatible.

A better fix here would be to require a main context fd, and have e.g.
the 'B' byte when written on it cause the main loop to iterate over
all of its children and call waitpid().  (We could also try writing the
pid over the pipe).
Comment 15 Emilio Pozuelo Monfort 2010-05-31 09:51:06 UTC
Colin,

I tried to call g_thread_init() from gtester and linking it to libgthread-2.0.la a while ago to workaround the race condition but it didn't work as glib was being built before gthread IIRC. If that's correct, I guess moving gtester from glib/ to e.g. tools/ would solve it.

Not sure if the compat break is acceptable though.
Comment 16 Allison Karlitskaya (desrt) 2010-08-08 00:04:28 UTC
I just closed bug 578295 with a workaround.

Leaving this one open so we can one day solve it properly. :)
Comment 17 Colin Walters 2011-04-27 22:00:55 UTC
So, using signalfd() would actually fix this bug (and allow us to lose a helper thread), which is fairly compelling.  That would leave this bug to only apply to non-Linux UNIX, which I don't care about myself.
Comment 18 Allison Karlitskaya (desrt) 2011-09-10 00:39:01 UTC
The single-threaded case of GMainLoop is gone so this bug is fixed.