After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 748885 - [review] minor cleanups killing child processes [ th/waitpid-bgo748885 ]
[review] minor cleanups killing child processes [ th/waitpid-bgo748885 ]
Status: RESOLVED FIXED
Product: NetworkManager
Classification: Platform
Component: general
git master
Other Linux
: Normal normal
: ---
Assigned To: NetworkManager maintainer(s)
NetworkManager maintainer(s)
Depends on:
Blocks:
 
 
Reported: 2015-05-04 12:06 UTC by Thomas Haller
Modified: 2015-05-05 14:45 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Thomas Haller 2015-05-04 12:06:04 UTC
Please review




I came up with this when investigating http://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=f53fda9fd6dfc68d8c70e3c273b02eac70d95f43

-- no idea why that failure is in the logs. But the revert still seems right to me.
Comment 1 Beniamino Galvani 2015-05-05 07:58:28 UTC
> dns-manager: use nm_utils_kill_child_sync() to wait for netconfig to exit

We are calling g_spawn_async_with_pipes() to launch netconfig without
the G_SPAWN_DO_NOT_REAP_CHILD and therefore the child process is
automatically free'd after termination; I think this is the reason why
the previous implementation checked for ECHILD.

Now with nm_utils_kill_child_sync() I get these errors even if
netconfig returned successfully:

[NetworkManagerUtils.c:742] nm_utils_kill_child_sync(): kill child process 'netconfig' (26762): sending SIGKILL...
NetworkManagerUtils.c:769] nm_utils_kill_child_sync(): kill child process 'netconfig' (26762): after sending no signal (0) and SIGKILL, waitpid failed with No child processes (10) (1000174 usec elapsed)
<warn>  could not commit DNS changes: Error waiting for netconfig to exit: No child processes

I think that we should pass the flag G_SPAWN_DO_NOT_REAP_CHILD to
g_spawn_async_with_pipes() to be sure that the process is still there
(in zombie state if already terminated)	when we wait for it.
Comment 2 Thomas Haller 2015-05-05 08:37:34 UTC
(In reply to Beniamino Galvani from comment #1)
> > dns-manager: use nm_utils_kill_child_sync() to wait for netconfig to exit
> 
> We are calling g_spawn_async_with_pipes() to launch netconfig without
> the G_SPAWN_DO_NOT_REAP_CHILD and therefore the child process is
> automatically free'd after termination; I think this is the reason why
> the previous implementation checked for ECHILD.

Good catch. When using nm_utils_kill_child_sync() you must not have a g_child_watch() and use  G_SPAWN_DO_NOT_REAP_CHILD.


> Now with nm_utils_kill_child_sync() I get these errors even if
> netconfig returned successfully:
> 
> [NetworkManagerUtils.c:742] nm_utils_kill_child_sync(): kill child process
> 'netconfig' (26762): sending SIGKILL...
> NetworkManagerUtils.c:769] nm_utils_kill_child_sync(): kill child process
> 'netconfig' (26762): after sending no signal (0) and SIGKILL, waitpid failed
> with No child processes (10) (1000174 usec elapsed)
> <warn>  could not commit DNS changes: Error waiting for netconfig to exit:
> No child processes
> 
> I think that we should pass the flag G_SPAWN_DO_NOT_REAP_CHILD to
> g_spawn_async_with_pipes() to be sure that the process is still there
> (in zombie state if already terminated)	when we wait for it.

Repushed. Better now?
Comment 3 Beniamino Galvani 2015-05-05 09:20:18 UTC
(In reply to Thomas Haller from comment #2)
> Repushed. Better now?

I tested the handling of netconfig termination and it works as expected.
The branch looks OK to me.
Comment 4 Dan Williams 2015-05-05 14:20:38 UTC
Did you (or if not, can you) do a quick audit to make sure that everywhere that passes DO_NOT_REAP_CHILD also has a g_child_watch_add()?  Otherwise we're left with zombies, right?

Anyway, branch looks OK to me now.
Comment 5 Thomas Haller 2015-05-05 14:39:15 UTC
(In reply to Dan Williams from comment #4)
> Did you (or if not, can you) do a quick audit to make sure that everywhere
> that passes DO_NOT_REAP_CHILD also has a g_child_watch_add()?  Otherwise
> we're left with zombies, right?
> 
> Anyway, branch looks OK to me now.

Not exactly.

If (and only if) we don't specify DO_NOT_REAP_CHILD, we must reap them ourselves with waitpid().

You can optionally use g_child_watch_add() to get a notification when to call waitpid(), but that is not strictly speaking necessary, as long as you eventually waitpid().


But OTOH, if you use g_child_watch_add(), you must specify DO_NOT_REAP_CHILD.

And if you use nm_utils_kill_child_*sync(), you must specify DO_NOT_REAP_CHILD and g_source_remove() all pending child watches.
Comment 6 Thomas Haller 2015-05-05 14:45:42 UTC
merged as http://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=4d4f5fff5ce48a92dfdc19e8ecfb39a40c95b687

(In reply to Dan Williams from comment #4)
> Did you (or if not, can you) do a quick audit to make sure that everywhere
> that passes DO_NOT_REAP_CHILD also has a g_child_watch_add()?


I kinda looked at our child process already, so I think it should be all correct.




Note that 

commit f53fda9fd6dfc68d8c70e3c273b02eac70d95f43
Author: Thomas Haller <thaller@redhat.com>
Date:   Fri May 1 23:14:38 2015

    Revert "core: treat ECHILD as child already terminated"
    
    This reverts commit 268da271cc2955400fa0068294a64a75e0422cee.
    
    This breaks the test, but it is not clear that this is the
    right fix. Revert for now, needs still investigation.


is not solved by this... (I don't know the cause there).