GNOME Bugzilla – Bug 748885
[review] minor cleanups killing child processes [ th/waitpid-bgo748885 ]
Last modified: 2015-05-05 14:45:51 UTC
Please review I came up with this when investigating http://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=f53fda9fd6dfc68d8c70e3c273b02eac70d95f43 -- no idea why that failure is in the logs. But the revert still seems right to me.
> dns-manager: use nm_utils_kill_child_sync() to wait for netconfig to exit We are calling g_spawn_async_with_pipes() to launch netconfig without the G_SPAWN_DO_NOT_REAP_CHILD and therefore the child process is automatically free'd after termination; I think this is the reason why the previous implementation checked for ECHILD. Now with nm_utils_kill_child_sync() I get these errors even if netconfig returned successfully: [NetworkManagerUtils.c:742] nm_utils_kill_child_sync(): kill child process 'netconfig' (26762): sending SIGKILL... NetworkManagerUtils.c:769] nm_utils_kill_child_sync(): kill child process 'netconfig' (26762): after sending no signal (0) and SIGKILL, waitpid failed with No child processes (10) (1000174 usec elapsed) <warn> could not commit DNS changes: Error waiting for netconfig to exit: No child processes I think that we should pass the flag G_SPAWN_DO_NOT_REAP_CHILD to g_spawn_async_with_pipes() to be sure that the process is still there (in zombie state if already terminated) when we wait for it.
(In reply to Beniamino Galvani from comment #1) > > dns-manager: use nm_utils_kill_child_sync() to wait for netconfig to exit > > We are calling g_spawn_async_with_pipes() to launch netconfig without > the G_SPAWN_DO_NOT_REAP_CHILD and therefore the child process is > automatically free'd after termination; I think this is the reason why > the previous implementation checked for ECHILD. Good catch. When using nm_utils_kill_child_sync() you must not have a g_child_watch() and use G_SPAWN_DO_NOT_REAP_CHILD. > Now with nm_utils_kill_child_sync() I get these errors even if > netconfig returned successfully: > > [NetworkManagerUtils.c:742] nm_utils_kill_child_sync(): kill child process > 'netconfig' (26762): sending SIGKILL... > NetworkManagerUtils.c:769] nm_utils_kill_child_sync(): kill child process > 'netconfig' (26762): after sending no signal (0) and SIGKILL, waitpid failed > with No child processes (10) (1000174 usec elapsed) > <warn> could not commit DNS changes: Error waiting for netconfig to exit: > No child processes > > I think that we should pass the flag G_SPAWN_DO_NOT_REAP_CHILD to > g_spawn_async_with_pipes() to be sure that the process is still there > (in zombie state if already terminated) when we wait for it. Repushed. Better now?
(In reply to Thomas Haller from comment #2) > Repushed. Better now? I tested the handling of netconfig termination and it works as expected. The branch looks OK to me.
Did you (or if not, can you) do a quick audit to make sure that everywhere that passes DO_NOT_REAP_CHILD also has a g_child_watch_add()? Otherwise we're left with zombies, right? Anyway, branch looks OK to me now.
(In reply to Dan Williams from comment #4) > Did you (or if not, can you) do a quick audit to make sure that everywhere > that passes DO_NOT_REAP_CHILD also has a g_child_watch_add()? Otherwise > we're left with zombies, right? > > Anyway, branch looks OK to me now. Not exactly. If (and only if) we don't specify DO_NOT_REAP_CHILD, we must reap them ourselves with waitpid(). You can optionally use g_child_watch_add() to get a notification when to call waitpid(), but that is not strictly speaking necessary, as long as you eventually waitpid(). But OTOH, if you use g_child_watch_add(), you must specify DO_NOT_REAP_CHILD. And if you use nm_utils_kill_child_*sync(), you must specify DO_NOT_REAP_CHILD and g_source_remove() all pending child watches.
merged as http://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=4d4f5fff5ce48a92dfdc19e8ecfb39a40c95b687 (In reply to Dan Williams from comment #4) > Did you (or if not, can you) do a quick audit to make sure that everywhere > that passes DO_NOT_REAP_CHILD also has a g_child_watch_add()? I kinda looked at our child process already, so I think it should be all correct. Note that commit f53fda9fd6dfc68d8c70e3c273b02eac70d95f43 Author: Thomas Haller <thaller@redhat.com> Date: Fri May 1 23:14:38 2015 Revert "core: treat ECHILD as child already terminated" This reverts commit 268da271cc2955400fa0068294a64a75e0422cee. This breaks the test, but it is not clear that this is the right fix. Revert for now, needs still investigation. is not solved by this... (I don't know the cause there).