GNOME Bugzilla – Bug 145597
child-test failure on HP-UX 11.x, Solaris 2.5.1-9, AIX 4.3.x, 5.x, IRIX 6.5
Last modified: 2004-12-22 21:47:04 UTC
$ cd tests
$ ./child-test
child 19773 (ttl 10) exited, status 0
[hang]
I ran a quick test on Solaris 9/SPARC, and this bug does not occur if I disable threads. The following platforms do pass child-test successfully:
  HP-UX 10.20 (doesn't have threads)
  Tru64 UNIX 4.0D, 5.1
  Redhat Linux 7.1, 9
  Redhat Enterprise Linux 2.1, 3.0
The ps output while the process is hung:
$ ps -fu china
...
china 19293   464  0 21:48:51 pts/3  0:00 /opt/build/glib-2.4.2/tests/.libs/child-test
china 19309 19293  0                 0:00 <defunct>
...
So it looks like nobody is catching SIGCHLD: child 19309 has exited, but it stays <defunct> because its parent, child-test (19293), never reaps it.
On Solaris, when signal() is used, the handler is reset to SIG_DFL after it has been called. The error does not occur if you use sigset() in gmain.c for the SIGCHLD handling. What goes wrong in child-test:
1) The first child is spawned and the SIGCHLD handler is installed.
2) The second child is spawned and the SIGCHLD handler is installed a second time (this has no additional effect).
...
3) The first child exits; the handler is called, and the SIGCHLD disposition is reset to the default.
4) The second child exits, but no SIGCHLD handler is installed anymore, so child-test hangs waiting for a signal that never reaches its handler.
So, if you want a handler to stay installed for more than one signal, you need to use sigset() on Solaris.
Regards
BTW, the hang doesn't occur only on Solaris; HP-UX 11.x hangs as well. Should we just replace:
  signal (SIGCHLD, g_child_watch_signal_handler);
with:
  sigset (SIGCHLD, g_child_watch_signal_handler);
Yes, this happens on systems whose signal() implementation follows System V semantics and which do not provide the BSD behaviour. From the Linux signal(3) manual:

  PORTABILITY
  The original Unix signal() would reset the handler to SIG_DFL, and System V (and the Linux kernel and libc4,5) does the same. On the other hand, BSD does not reset the handler, but blocks new instances of this signal from occurring during a call of the handler. The glibc2 library follows the BSD behaviour.

The best way to solve this problem is to add a test to "configure" that determines the behaviour of signal():

  /* check if signal handler is set to SIG_DFL after a signal */
  #include <signal.h>
  #include <stdlib.h>
  #include <unistd.h>

  volatile sig_atomic_t sig;

  void handler(int x) { sig++; }

  int main(void)
  {
      sig = 0;
      signal(SIGUSR1, handler);
      kill(getpid(), SIGUSR1);
      sleep(1);
      kill(getpid(), SIGUSR1);
      exit(sig);
  }

If the program does not exit with status 2, sigset() must be used. Another solution may be to test for the existence of sigset() and always use sigset() if it is defined in libc, and signal() if not (Linux, for example, does not have sigset).
Ok, I'll work on a patch based on this. Thanks. BTW, why the call to sleep(1)? I'd rather not introduce a 1s sleep in the autoconf script.
You may be right. I just wanted to be sure that the signals are delivered one after another and that the signal handler has already run when the second signal is generated. I tested the program without the sleep() on a Solaris box and a Linux box, and it works fine.
Created attachment 29715: Patch to determine signal behavior
Created attachment 30402: Updated patch
The configure.in patch in #29715 wouldn't work.
Take a look at bug 136867
Ok, looks like the same bug. So, what solution should we use?
*** This bug has been marked as a duplicate of 136867 ***