Bug 688402 – gmain: Add private API to create Unix child watch that uses waitid()

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 688402 - gmain: Add private API to create Unix child watch that uses waitid()


Summary:	gmain: Add private API to create Unix child watch that uses waitid()


Status:	RESOLVED WONTFIX

Product:	glib
Classification:	Platform
Component:	mainloop
Version:	unspecified
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	gtkdev
QA Contact:	gtkdev

URL:
Whiteboard:

Depends on:
Blocks:	672102

Reported:	2012-11-15 14:33 UTC by Colin Walters
Modified:	2013-02-03 10:37 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
gmain: Add private API to create Unix child watch that uses waitid() (9.92 KB, patch) 2012-11-15 14:34 UTC, Colin Walters	none	Details \| Review

Description Colin Walters 2012-11-15 14:33:53 UTC

Needed by GSubprocess.

Comment 1 Colin Walters 2012-11-15 14:34:45 UTC

Created attachment 229056 [details] [review]
gmain: Add private API to create Unix child watch that uses waitid()

This avoids collecting the zombie child, which means that the PID
can't be reused.  This prevents possible race conditions that might
occur were one to send e.g. SIGTERM to a child.

This race condition has always existed due to the way we called
waitpid() for the app, but the window was widened when we moved the
waitpid() calls into a separate thread.

If waitid() isn't available, we return NULL, and consumers of this
private API (namely, GSubprocess) will need to handle that.

See https://bugzilla.gnome.org/show_bug.cgi?id=672102

Comment 2 Michael Natterer 2012-11-15 14:41:15 UTC

In waitpid_status_from_siginfo(), WIFEXITED() and friends need to be
used on a local variable, because on darwin they can only be used
on an lvalue.

Comment 3 Michael Natterer 2012-11-15 14:43:52 UTC

Hit submit too quickly...

After fixing this locally, it builds but later blocks on resource
generation. Thinking this was something else I disabled tests and
went on bulding gtk and gimp.

GIMP startup then blocks in g_spawn_async_with_pipes(), in a read().

Comment 4 Allison Karlitskaya (desrt) 2012-11-15 15:08:55 UTC

If I understand the problem correctly perhaps another easier approach would be to dispatch the SIGTERM that we send via the GLib worker thread -- that way we could ensure that we didn't already waitpid() the process that we are TERMing...

Comment 5 Allison Karlitskaya (desrt) 2012-11-15 15:16:31 UTC

In more detail, imagine we have two threads:  GLib worker thread and the "control" thread (which has to be the thread both requesting the SIGTERM send and the thread that would receive the child watch callback).

When we go to register the SIGTERM to be sent (in the control thread) we check if we have a pending dispatch for that child watch already and fail to register it if so.  We hold a lock to actually "register" the signal.

Then, from the GLib worker thread, after the SIGTERM has been registered, we have two possibilities:  either the worker thread first sees the SIGTERM as having been registered or it first sees the child as needing to be collected.

In the first case, we send SIGTERM to the pid.  We didn't call waitpid() yet, so no race -- the signal will just go to the zombie.

In the second case we are probably best to check for pending signals to be delivered to processes that we just called waitpid() on and remove them from the table.  Getting the locking right here will be important -- we probably even want to hold the lock while in waitpid() in order to make sure we don't clear a signal that got registered after waitpid() returned (because then in theory it could be intended for a new instance of the pid).

Comment 6 Colin Walters 2012-11-15 20:45:31 UTC

(In reply to comment #5)
> In more detail, imagine we have two threads:  GLib worker thread and the
> "control" thread (which has to be the thread both requesting the SIGTERM send
> and the thread that would receive the child watch callback).
> 
> When we go to register the SIGTERM to be sent (in the control thread) we check
> if we have a pending dispatch for that child watch already and fail to register
> it if so.

I think you had a good idea here, and it can actually be a lot simpler than what you suggested...just hold the unix signal lock, loop over the child watch sources, if we have one, and it hasn't exited, call kill().  Otherwise, don't.

I have a new GSubprocess patch which i'll attach to the original bug, and we can probably just close this one.

Comment 7 Allison Karlitskaya (desrt) 2013-02-03 10:37:16 UTC

Indeed, the latest gsubprocess branch does not take the approach here, so we don't need this API.