Bug 136867 - child-test is still hanging
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: general
Version: 2.3.x
Hardware: Other HP-UX
Importance: Normal major
Target Milestone: ---
Assigned To: gtkdev
QA Contact: gtkdev
Duplicates: 145597
Depends on:
Blocks: 157195
 
 
Reported: 2004-03-11 09:21 UTC by Jonas Jonsson
Modified: 2011-02-18 16:09 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Debugging patch (1.14 KB, patch), 2004-03-17 16:28 UTC, Owen Taylor
Program that doesn't work on HP-UX (898 bytes, application/octet-stream), 2004-06-03 07:49 UTC, Jonas Jonsson
Patch with signal () replaced by sigaction () (2.02 KB, patch), 2004-08-06 11:23 UTC, Archana Shah
Patch which reinstalls signal using signal () call (1.32 KB, patch), 2004-08-06 11:24 UTC, Archana Shah

Description Jonas Jonsson 2004-03-11 09:21:39 UTC
Much as described in http://bugzilla.gnome.org/show_bug.cgi?id=136539,
child-test hangs.  However, it's not really the same thing here.

When running tests, it's only the first child that exits:

adbjsjn@lilith:glib-2.3.6/tests>./child-test 
child 10295 (ttl 10) exited, status 0


A "ps" from another x-terminal:

 adbjsjn 10282 10055  0 10:08:56 pts/2     0:00 ...glib/work/main.d/glib-2.
 adbjsjn 10032 10014  0 10:01:32 pts/2     0:00 -sh
 adbjsjn 10296 10282  0 10:08:56 pts/2     0:00 <defunct>
 adbjsjn 10055 10032  0 10:01:33 pts/2     0:00 bash

From gdb:

adbjsjn@lilith:tests/.libs>gdb child-test 
Detected 64-bit executable.
Invoking /opt/langtools/bin/gdb64.
HP gdb 3.2 for PA-RISC 2.0 (wide), HP-UX 11.00.
Copyright 1986 - 2001 Free Software Foundation, Inc.
Hewlett-Packard Wildebeest 3.2 (based on GDB) is covered by the
GNU General Public License. Type "show copying" to see the conditions to
change it and/or distribute copies. Type "show warranty" for warranty/support.
..
(gdb) r
Starting program:.../glib-2.3.6/tests/.libs/child-test 
[New process 10206]
Detaching after fork from process 10206
[New process 10209]
Detaching after fork from process 10209
[New process 10210]
Detaching after fork from process 10210
child 10209 (ttl 10) exited, status 0

Program received signal SIGINT, Interrupt.
0x800003ffff5dcc74 in _poll_sys+0x2c () from /lib/pa20_64/libc.2
(gdb) Quit
(gdb) 

/usr/local/pa64/bin/gcc -v
Reading specs from /usr/local/pa64/lib/gcc-lib/hppa64-hp-hpux11.11/3.3.2/specs
Configured with: /scratch/root/gcc-pkg/3.3.1/hpux-11/gcc-3.3.2/configure
--enable-languages=c,c++ --enable-threads=posix --disable-nls --with-gnu-as
--with-gnu-ld --with-as=/usr/local/pa64/bin/as
--with-ld=/usr/local/pa64/bin/ld --host=hppa64-hp-hpux11.11
--target=hppa64-hp-hpux11.11 --prefix=/usr/local/pa64
Thread model: posix
gcc version 3.3.2
Comment 1 Jonas Jonsson 2004-03-11 09:23:59 UTC
And glib is 2.3.6 ...
Comment 2 Sebastian Wilhelmi 2004-03-11 10:28:47 UTC
The following patch is actually unrelated to this bug report, but it
implements a new version of g_child_watch. Could you please try it
and see whether it works around your problem as well?

http://bugzilla.gnome.org/showattachment.cgi?attach_id=25511
Comment 3 Jonas Jonsson 2004-03-11 12:59:15 UTC
It DOES work!

gmake[4]: Entering directory
`/alcesys/build/garnome-0.30.1/platform/glib/work/main.d/glib-2.3.6/tests'
PASS: atomic-test
PASS: array-test
PASS: cxx-test
whee! created pid: 28126 (ttl 4)
whee! created pid: 28127 (ttl 2)
whee! created pid: 28129 (ttl 5)
whee! created pid: 28128 (ttl 3)
child 28127 (ttl 2) exited, status 0
child 28128 (ttl 3) exited, status 0
child 28126 (ttl 4) exited, status 0
child 28129 (ttl 5) exited, status 0
whee! created pid: 28130 (ttl 2)
whee! created pid: 28131 (ttl 6)
whee! created pid: 28132 (ttl 4)
child 28130 (ttl 2) exited, status 0
child 28132 (ttl 4) exited, status 0
child 28131 (ttl 6) exited, status 0
whee! created pid: 28134 (ttl 2)
whee! created pid: 28133 (ttl 2)
child 28133 (ttl 2) exited, status 0
child 28134 (ttl 2) exited, status 0
PASS: child-test
Comment 4 Owen Taylor 2004-03-14 17:57:40 UTC
I have some reservations about Sebastian's new approach,
so I'd like to try to get debugging on what is going on
here; this isn't a problem with threading, since in
2.3.6, child-test isn't threaded.

Can you add some debugging into:

 g_child_watch_signal_handler()

At the beginning, add:
 
 write (2, "SIG\n", 4);

And in 

 g_child_watch_prepare
 g_child_watch_check
 g_child_watch_dispatch
 
add 

 g_printerr ("prepare: Checking for %d, counts = %d\n",
             ((GChildWatchSource *)source)->pid,
             ((GChildWatchSource *)source)->count,
              child_watch_count, 

(Same for check and dispatch, but with check:, dispatch: instead)

And see what that logs? I suspect we have some simple logic
error in the code, but I can't figure it out offhand.
Comment 5 Jonas Jonsson 2004-03-15 09:43:25 UTC
>And in 
>
> g_child_watch_prepare
> g_child_watch_check
> g_child_watch_dispatch
> 
>add 
>
> g_printerr ("prepare: Checking for %d, counts = %d\n",
>             ((GChildWatchSource *)source)->pid,
>             ((GChildWatchSource *)source)->count,
>              child_watch_count, 
>
>(Same for check and dispatch, but with check:, dispatch: instead)

No, I couldn't add this.  GChildWatchSource doesn't have a count
member (at least in my glib-2.3.6 tarball), and 'child_watch_count'
is unknown to my compiler.  Should it be a gint?  Where should
I initialize the variable, and where should it be incremented or
decremented?  Should there be something after child_watch_count in the
g_printerr() call?  I assumed not, but then will g_printerr() handle
the extra argument when the format string only has conversions for
two of them?

I would really like to get this going on HP-UX, but I'm afraid I can't
put in the amount of time needed.  I might be able to put in half an
hour here, half an hour there ...
Comment 6 Jonas Jonsson 2004-03-17 15:33:48 UTC
Now with glib-2.4.0

So, I've tried this with HP's Ansi C-compiler, same result with
original code.

what /usr/bin/cc
/usr/bin/cc:
        $Revision: 92453-07 linker linker crt0.o B.11.16.01 030316 $
        LINT B.11.11.08 CXREF B.11.11.08
        HP92453-01 B.11.11.08 HP C Compiler
         $ PATCH/11.00:PHCO_27774  Oct  3 2002 09:45:59 $ 

CFLAGS = -Ae +DA2.0W -g

When running the program in gdb, I get this output:
tests/.libs>gdb64 child-test 
HP gdb 3.2 for PA-RISC 2.0 (wide), HP-UX 11.00.
Copyright 1986 - 2001 Free Software Foundation, Inc.
Hewlett-Packard Wildebeest 3.2 (based on GDB) is covered by the
GNU General Public License. Type "show copying" to see the conditions to
change it and/or distribute copies. Type "show warranty" for
warranty/support.
..
(gdb) r
Starting program:
/alcesys/build/garnome-0.30.1/platform/glib/work/main.d/glib-2.4.0/tests/.libs/child-test

[New process 19453]
Detaching after fork from process 19453
[New process 19456]
warning: reading `r3' register: No data
warning: reading `r3' register: No data
Detaching after fork from process 19456
[New process 19457]
warning: reading `r3' register: No data
warning: reading `r3' register: No data
Detaching after fork from process 19457
child 19456 (ttl 10) exited, status 0

Program received signal SIGINT, Interrupt.
0x800003ffff5cac74 in _poll_sys+0x2c () from /lib/pa20_64/libc.2
(gdb) kill
Kill the program being debugged? (y or n) y
(gdb) quit
Comment 7 Owen Taylor 2004-03-17 16:28:56 UTC
Created attachment 25733 [details] [review]
Debugging patch
Comment 8 Owen Taylor 2004-03-17 16:30:59 UTC
I've attached a patch that adds the debugging output
as described above. Could you apply this patch and then
run child-test (not under gdb; the gdb output is confusing
rather than helpful here)?

For comparison, I (on Linux) get:

===
prepare: Checking pid 5413, counts = 0/0
prepare: Checking pid 5414, counts = 0/0
SIG
check: Checking pid 5413, counts = 0/1
check: Checking pid 5414, counts = 0/1
child 5413 (ttl 10) exited, status 0
prepare: Checking pid 5414, counts = 1/1
SIG
check: Checking pid 5414, counts = 1/2
child 5414 (ttl 20) exited, status 0
===
Comment 9 Jonas Jonsson 2004-03-18 12:45:06 UTC
prepare: Checking pid 12255, counts = 0/0
prepare: Checking pid 12256, counts = 0/0
SIG
check: Checking pid 12255, counts = 0/1
check: Checking pid 12256, counts = 0/1
child 12255 (ttl 10) exited, status 0
prepare: Checking pid 12256, counts = 1/1
gmake[4]: *** [check-TESTS] Error 130

Looking with a ps -fu ME from another terminal shows one process
(12256) as <defunct>. After the third prepare, nothing happens .... 
on HP-UX

This is compiled as from my initial report (gcc).  Unless there's any
change in result, I won't report the result from HP-UX Ansi C ...
Comment 10 Owen Taylor 2004-03-18 14:36:20 UTC
Could you add a line to tests/child-test.c - after

 sleep(ttl)

add:

  g_printerr ("Exiting, ttl=%d pid=%d\n", ttl, getpid());

And try it again?
Comment 11 Jonas Jonsson 2004-03-22 13:27:21 UTC
./child-test 

prepare: Checking pid 10846, counts = 0/0
prepare: Checking pid 10847, counts = 0/0
Exiting ..., ttl=10, pid=10846
SIG
check: Checking pid 10846, counts = 0/1
check: Checking pid 10847, counts = 0/1
child 10846 (ttl 10) exited, status 0
prepare: Checking pid 10847, counts = 1/1
Exiting ..., ttl=20, pid=10847
Comment 12 Owen Taylor 2004-03-22 20:04:02 UTC
After some investigation, it looks like this is a very old
BSD / SysV compatibility issue: SysV resets handlers installed
with signal() after they are called, while BSD (and current Linux)
doesn't.

Could you, to test this theory, put:

   signal (SIGCHLD, g_child_watch_signal_handler);

as the very last line of 
g_child_watch_signal_handler() and see if that fixes the
problem?

I think the best long-term solution is probably to switch
to using sigaction() rather than signal() to install
the signal handler.
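
A rough way to observe the SysV behaviour described above, even on a
system whose signal() keeps handlers installed, is to request the reset
explicitly through sigaction() with SA_RESETHAND. This is only a sketch
and not GLib code: the handler and the function name
handler_survives_delivery are made up for illustration, and it assumes
SIGCHLD is not blocked in the calling thread.

```c
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <string.h>

/* Hypothetical counter, incremented each time the handler runs. */
static volatile sig_atomic_t handler_calls = 0;

static void count_sigchld(int signum)
{
    (void)signum;
    handler_calls++;
}

/*
 * Install count_sigchld for SIGCHLD, deliver one SIGCHLD, then ask the
 * kernel which handler is installed now.
 *
 * sysv_style != 0 emulates the old SysV signal() semantics with
 * SA_RESETHAND: the disposition falls back to SIG_DFL once the handler
 * has been entered. sysv_style == 0 keeps the handler installed, as
 * BSD-style signal() and plain sigaction() do.
 *
 * Returns 1 if the handler is still installed, 0 if it was reset,
 * and -1 on error.
 */
int handler_survives_delivery(int sysv_style)
{
    struct sigaction action, current;

    memset(&action, 0, sizeof action);
    action.sa_handler = count_sigchld;
    sigemptyset(&action.sa_mask);
    action.sa_flags = sysv_style ? SA_RESETHAND : 0;

    if (sigaction(SIGCHLD, &action, NULL) != 0)
        return -1;

    /* In a single-threaded program raise() returns after the handler ran. */
    if (raise(SIGCHLD) != 0)
        return -1;

    /* A NULL second argument only queries the current disposition. */
    if (sigaction(SIGCHLD, NULL, &current) != 0)
        return -1;

    return current.sa_handler == count_sigchld;
}
```

With sysv_style set, the handler should be gone after the first SIGCHLD.
Since the default action for SIGCHLD is to ignore the signal, a second
child exit would then pass unnoticed, which matches the hang seen in
child-test.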
Comment 13 Jonas Jonsson 2004-06-03 07:49:42 UTC
Created attachment 28285 [details]
Program that doesn't work on HP-UX

Yes, it seems to be the case here.  The attached program runs fine on Linux
(RH) and cygwin but fails on HP-UX (11.23).  Very soon (probably next week),
I'll have a machine again, and then this problem *WILL* be solved :).
Comment 14 Archana Shah 2004-08-04 05:33:03 UTC
This is happening on Solaris as well. There, gnome-terminal does not exit
because SIGCHLD is handled only once, so only one window gets closed when we
type 'exit' and all the rest just hang. There are two possible solutions;
either of them fixes this bug.

One solution is to use sigaction() instead of signal().
Here is the change:

 g_child_watch_source_init_multi_threaded (void)
 {
   GError *error = NULL;
+  struct sigaction action;

   g_assert (g_thread_supported());

@@ -3630,7 +3631,10 @@ g_child_watch_source_init_multi_threaded
   if (g_thread_create (child_watch_helper_thread, NULL, FALSE, &error) == NULL)
    g_error ("Cannot create a thread to monitor child exit status: %s\n",
error->message);
   child_watch_init_state = CHILD_WATCH_INITIALIZED_THREADED;
-  signal (SIGCHLD, g_child_watch_signal_handler);
+  action.sa_handler = g_child_watch_signal_handler ;
+  sigemptyset (&action.sa_mask);
+  action.sa_flags = SA_RESTART | SA_NOCLDSTOP;
+  sigaction (SIGCHLD, &action, NULL);
 }


The other solution is to re-install the signal handler every time the signal
is caught. The change that has to be made for this is:

@@ -3551,6 +3551,8 @@ g_child_watch_signal_handler (int signum
 {
   child_watch_count ++;

+  signal (SIGCHLD, g_child_watch_signal_handler);
+
   if (child_watch_init_state == CHILD_WATCH_INITIALIZED_THREADED)
     {
       write (child_watch_wake_up_pipe[1], "B", 1);
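
To try the sigaction() variant outside GLib, a self-contained sketch along
these lines may help. It is not the GLib implementation: the names
on_sigchld, install_sigchld_handler and reap_children_with_sigaction are
hypothetical, and the handler here reaps children itself with waitpid(),
which the GLib handler in the patches above does not do. It forks a few
children with staggered lifetimes, much like child-test, and waits until
the handler has seen all of them exit.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical counter, playing a role similar to child_watch_count. */
static volatile sig_atomic_t children_reaped = 0;

/* Reap every child that has exited; several exits may share one SIGCHLD. */
static void on_sigchld(int signum)
{
    int saved_errno = errno;

    (void)signum;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        children_reaped++;
    errno = saved_errno;
}

/* Same flags as the patch: the handler stays installed after each delivery. */
static int install_sigchld_handler(void)
{
    struct sigaction action;

    memset(&action, 0, sizeof action);
    action.sa_handler = on_sigchld;
    sigemptyset(&action.sa_mask);
    action.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &action, NULL);
}

/*
 * Fork n short-lived children (keep n small) and return how many of them
 * the handler reaped, or -1 if the setup fails.
 */
int reap_children_with_sigaction(int n)
{
    sigset_t block_mask, old_mask, wait_mask;
    int i;

    children_reaped = 0;
    if (install_sigchld_handler() != 0)
        return -1;

    /* Block SIGCHLD so it can only be delivered inside sigsuspend(). */
    sigemptyset(&block_mask);
    sigaddset(&block_mask, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block_mask, &old_mask) != 0)
        return -1;
    wait_mask = old_mask;
    sigdelset(&wait_mask, SIGCHLD);

    for (i = 0; i < n; i++) {
        pid_t pid = fork();

        if (pid < 0) {
            n = i; /* wait only for the children that were started */
            break;
        }
        if (pid == 0) {
            /* Child: live for (i + 1) * 50 ms, then exit. */
            struct timespec ttl = { 0, (i + 1) * 50L * 1000L * 1000L };

            nanosleep(&ttl, NULL);
            _exit(0);
        }
    }

    /* Sleep until the handler has accounted for every child. */
    while (children_reaped < n)
        sigsuspend(&wait_mask);

    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    return children_reaped;
}
```

Calling reap_children_with_sigaction(3) from a small main() should return
once all three children have been reaped. If the handler were reset after
its first invocation, as with SysV-style signal(), the loop around
sigsuspend() would be expected to block after the first child, much like
the hang reported here.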
Comment 15 Archana Shah 2004-08-06 11:21:51 UTC
I am attaching both the patches here
Comment 16 Archana Shah 2004-08-06 11:23:55 UTC
Created attachment 30273 [details] [review]
Patch with signal () replaced by sigaction ()
Comment 17 Archana Shah 2004-08-06 11:24:53 UTC
Created attachment 30274 [details] [review]
Patch which reinstalls signal using signal () call
Comment 18 Ivan Noris 2004-10-18 07:16:48 UTC
The "sigaction" patch fixes the problem for gnome-terminal in GNOME 2.6, glib
updated to 2.4.7. I didn't try the other patch.

The child-test still hangs though.
Testing on Solaris 9/SPARC.
Comment 19 Matthias Clasen 2004-11-08 15:41:25 UTC
I changed glib to use sigaction now. 

Please reopen if there are still issues.
Comment 20 Matthias Clasen 2004-11-08 15:47:27 UTC
*** Bug 145597 has been marked as a duplicate of this bug. ***