GNOME Bugzilla – Bug 688402
gmain: Add private API to create Unix child watch that uses waitid()
Last modified: 2013-02-03 10:37:16 UTC
Needed by GSubprocess.
Created attachment 229056 [details] [review] gmain: Add private API to create Unix child watch that uses waitid() This avoids collecting the zombie child, which means that the PID can't be reused. This prevents possible race conditions that might occur were one to send e.g. SIGTERM to a child. This race condition has always existed due to the way we called waitpid() for the app, but the window was widened when we moved the waitpid() calls into a separate thread. If waitid() isn't available, we return NULL, and consumers of this private API (namely, GSubprocess) will need to handle that. See https://bugzilla.gnome.org/show_bug.cgi?id=672102
In waitpid_status_from_siginfo(), WIFEXITED() and friends need to be used on a local variable, because on darwin they can only be used on an lvalue.
Hit submit too quickly... After fixing this locally, it builds but later blocks on resource generation. Thinking this was something else I disabled tests and went on bulding gtk and gimp. GIMP startup then blocks in g_spawn_async_with_pipes(), in a read().
If I understand the problem correctly perhaps another easier approach would be to dispatch the SIGTERM that we send via the GLib worker thread -- that way we could ensure that we didn't already waitpid() the process that we are TERMing...
In more detail, imagine we have two threads: GLib worker thread and the "control" thread (which has to be the thread both requesting the SIGTERM send and the thread that would receive the child watch callback). When we go to register the SIGTERM to be sent (in the control thread) we check if we have a pending dispatch for that child watch already and fail to register it if so. We hold a lock to actually "register" the signal. Then, from the GLib worker thread, after the SIGTERM has been registered, we have two possibilities: either the worker thread first sees the SIGTERM as having been registered or it first sees the child as needing to be collected. In the first case, we send SIGTERM to the pid. We didn't call waitpid() yet, so no race -- the signal will just go to the zombie. In the second case we are probably best to check for pending signals to be delivered to processes that we just called waitpid() on and remove them from the table. Getting the locking right here will be important -- we probably even want to hold the lock while in waitpid() in order to make sure we don't clear a signal that got registered after waitpid() returned (because then in theory it could be intended for a new instance of the pid).
(In reply to comment #5) > In more detail, imagine we have two threads: GLib worker thread and the > "control" thread (which has to be the thread both requesting the SIGTERM send > and the thread that would receive the child watch callback). > > When we go to register the SIGTERM to be sent (in the control thread) we check > if we have a pending dispatch for that child watch already and fail to register > it if so. I think you had a good idea here, and it can actually be a lot simpler than what you suggested...just hold the unix signal lock, loop over the child watch sources, if we have one, and it hasn't exited, call kill(). Otherwise, don't. I have a new GSubprocess patch which i'll attach to the original bug, and we can probably just close this one.
Indeed, the latest gsubprocess branch does not take the approach here, so we don't need this API.