Bug 154827 - g_child_watch_add doesn't work if target process has already become zombie
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: general
Version: 2.4.x
Hardware: Other Linux
Importance: Normal major
Target Milestone: ---
Assigned To: gtkdev
QA Contact: gtkdev
Depends on:
Blocks: 150251 157195
Reported: 2004-10-07 21:18 UTC by Gustavo Carneiro
Modified: 2011-02-18 16:09 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
test case (662 bytes, text/plain)
2004-10-07 21:20 UTC, Gustavo Carneiro
Patch to fix test case in unix (950 bytes, patch)
2004-10-30 12:15 UTC, Gustavo Carneiro

Description Gustavo Carneiro 2004-10-07 21:18:40 UTC
g_child_watch_add doesn't work if the child process being monitored terminates
too quickly. Or perhaps the real condition is 'if the process no longer exists';
I don't know exactly in which order the events occur...
I have a test case ;-)
Comment 1 Gustavo Carneiro 2004-10-07 21:20:38 UTC
Created attachment 32368
test case

The test case demonstrates what happens: sometimes the notification occurs,
sometimes it doesn't. If I change the program to {"/bin/sleep", "1", NULL},
then it always works.
Comment 2 Gustavo Carneiro 2004-10-08 13:55:59 UTC
My tests in pygtk tell me that the problem is not really how quickly the child
process terminates.

I did a test (sorry, in pygtk, no C code) where the child sleeps 1 second before
quitting, and the parent waits 2 seconds before calling g_child_watch_add.  In
this case the callback is never called.

Conclusion: g_child_watch_add doesn't work if the target process no longer
exists. It is imperative that we fix this; otherwise the API is almost useless.
Comment 3 Gustavo Carneiro 2004-10-08 14:07:53 UTC
Without looking at glib code, I think the following pseudo-code should solve the
problem without race conditions:

  g_child_watch_add(pid, cb, data):
    1. src = setup_child_watch_notifier(pid, cb, data)
    2. pid1 = waitpid(pid, NULL, WNOHANG)
    3. if (pid1 == pid):
           /* child exited */
           destroy_child_watch_notifier(src)
           cb(data)
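Steps 2 and 3 of this proposal can be sketched in plain POSIX C, outside GLib. This is a hypothetical illustration, not GLib code: `exit_cb`, `deliver_if_already_exited`, `record_status` and `demo_late_watch` are invented names, and unlike the pseudo-code the callback receives the real wait status. In the demo, `waitid(..., WEXITED | WNOWAIT)` is only a way to wait for the child to exit while leaving it a zombie, which reproduces the "watch added too late" case deterministically.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical callback type for this sketch; not a GLib type. */
typedef void (*exit_cb)(pid_t pid, int status, void *data);

/* Steps 2 and 3 of the proposal: once the watch is registered, poll once
 * with WNOHANG. If the child is already a zombie, waitpid() reaps it and
 * returns its pid, so the callback can run at once with the real status.
 * Returns true if the callback was invoked. */
static bool deliver_if_already_exited(pid_t pid, exit_cb cb, void *data)
{
    int status;
    pid_t r = waitpid(pid, &status, WNOHANG);

    if (r == pid) {
        cb(pid, status, data);
        return true;
    }
    /* r == 0: still running, so the normal SIGCHLD path must handle it.
     * r == -1: error, e.g. ECHILD if the child was already reaped. */
    return false;
}

/* Example callback: stores the raw wait status in *(int *)data. */
static void record_status(pid_t pid, int status, void *data)
{
    (void)pid;
    *(int *)data = status;
}

/* Demo: a child that exits with code 3 before anyone watches it. */
static int demo_late_watch(void)
{
    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0)
        _exit(3);

    /* Wait for the exit but leave the child as a zombie (WNOWAIT),
     * like a parent that sets up the watch late. */
    siginfo_t info;
    int rc = waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT);
    assert(rc == 0);

    int status = -1;
    bool delivered = deliver_if_already_exited(pid, record_status, &status);
    assert(delivered);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling `demo_late_watch()` returns 3. Doing step 1 before step 2 is what avoids the race: a child that exits before the poll is caught by the poll, and one that exits after it is left to the SIGCHLD path, which is why the proposal destroys the notifier once the poll has already reaped the child.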
Comment 4 Owen Taylor 2004-10-08 17:05:24 UTC
I don't understand why the child no longer exists in your example. With
DO_NOT_REAP_CHILD you should get a zombie process until the child watch
waits for it.

You *CANNOT* reliably wait for a process that no longer exists, because
the PID may have been reused for a different process. Plus you can
no longer get the exit status. I don't think we should try to make
GChildWatch work for the case where the child has exited and been
reaped.
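The distinction between a zombie and a reaped child can be checked with plain POSIX calls. The helper `reap_twice` below is hypothetical, not part of GLib: it forks a child, waits for it to exit without reaping it, and then calls `waitpid()` twice. The first call still collects the exit status from the zombie; the second fails because the child is gone and its PID no longer refers to a child of this process.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that exits with `code`, waits until it is a zombie, then
 * calls waitpid() twice. The first status is stored in *first_status.
 * Returns errno from the second call, or 0 if that call succeeded. */
static int reap_twice(int code, int *first_status)
{
    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0)
        _exit(code);

    /* Block until the child exits without reaping it: it stays a zombie,
     * keeping both its PID and its exit status. */
    siginfo_t info;
    int rc = waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT);
    assert(rc == 0);

    /* A late waiter can still collect the status from the zombie... */
    pid_t r = waitpid(pid, first_status, WNOHANG);
    assert(r == pid);

    /* ...but after reaping, the PID no longer refers to our child. */
    r = waitpid(pid, NULL, WNOHANG);
    return r == -1 ? errno : 0;
}
```

`reap_twice(7, &st)` leaves `WEXITSTATUS(st) == 7` and returns `ECHILD`. A zombie, which is what DO_NOT_REAP_CHILD produces, can still be waited for; a reaped child cannot, and its PID may later be assigned to an unrelated process.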
Comment 5 Gustavo Carneiro 2004-10-08 17:17:11 UTC
I did a new test, where the child sleeps 1 second before quitting, and the
parent waits 10 seconds before calling g_child_watch_add.  I did a "ps x" and
saw the child zombie process. The child notification callback is never called.
Comment 6 Gustavo Carneiro 2004-10-30 12:15:15 UTC
Created attachment 33252
Patch to fix test case in unix

This fixes the problem on Unix.
One thing bothers me, though. Notice the commented-out code:
+/*   if (g_child_watch_check (source)) */
+/*	  g_message("Child %i exited", pid); */
In principle, it should be enough to call g_child_watch_check to fix the
problem.  However, that function has some weird child_watch_count guard that
never allows the function to do its work.  Perhaps that is the root of the
problem...
Comment 7 Matthias Clasen 2004-11-08 15:42:13 UTC
2004-11-08  Matthias Clasen  <mclasen@redhat.com>
	
	* glib/gmain.c: Initialize child_watch_count to 1, so 
	that we don't miss the very first child if it exits 
	before we set up the child watch. In that case we had 
	previously source->count == child_watch_count == 0, 
	causing g_child_watch_check() to skip the waitpid() 
	call.  (#154827, Gustavo Carneiro)

	* glib/gmain.c (g_child_watch_source_init_single) 
	(g_child_watch_source_init_multi_threaded): Use sigaction()
	instead of signal().  (#136867, Jonas Jonsson, patch by
	Archana Shah)
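The counter guard named in comment 6 and in the first ChangeLog item can be illustrated with a small model. This is a hypothetical simplification, not the glib/gmain.c code: `model_watch`, `model_check` and `model_first_check` are invented names, a boolean stands in for `waitpid(pid, &status, WNOHANG)` returning the pid, and the model assumes (beyond what the ChangeLog states) that the SIGCHLD handler increments `child_watch_count`. Following the ChangeLog, a new source starts with `count == 0` and `waitpid()` is skipped when the two counters are equal.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one child watch source. */
struct model_watch {
    int count;         /* counter value seen at the last check; 0 when new */
    bool exited;       /* set once the child has been reaped */
    int waitpid_calls; /* how often the guard let the check reach waitpid() */
};

/* Check step. `child_is_zombie` stands in for waitpid(..., WNOHANG)
 * returning the pid. waitpid() is attempted only when the global counter
 * differs from the value this watch last saw. */
static bool model_check(struct model_watch *w, int child_watch_count,
                        bool child_is_zombie)
{
    if (w->exited)
        return true;
    if (w->count != child_watch_count) {
        w->waitpid_calls++;
        if (child_is_zombie)
            w->exited = true;
        w->count = child_watch_count;
    }
    return w->exited;
}

/* The child exited before the watch existed, so (in this model) no SIGCHLD
 * was ever counted and the global counter still holds its initial value.
 * Returns how many times waitpid() would have been attempted. */
static int model_first_check(int initial_count)
{
    struct model_watch w = {0, false, 0};
    bool exited = model_check(&w, initial_count, true);
    assert(exited == (w.waitpid_calls == 1));
    return w.waitpid_calls;
}
```

`model_first_check(0)` returns 0: with both counters at 0 the guard skips `waitpid()`, so the zombie is never noticed. `model_first_check(1)` returns 1, which mirrors the fix of initializing `child_watch_count` to 1.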