After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 723743 - g_child_watch_add() doesn't check for non-pids
g_child_watch_add() doesn't check for non-pids
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: mainloop
2.38.x
Other Linux
: Normal normal
: ---
Assigned To: gtkdev
gtkdev
Depends on:
Blocks:
 
 
Reported: 2014-02-06 09:09 UTC by Simon McVittie
Modified: 2017-10-12 11:23 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
g_child_watch_source_new: POSIX pid must be positive (3.68 KB, patch)
2014-02-06 18:36 UTC, Simon McVittie
committed Details | Review
g_child_watch_new: enforce one-watch-per-pid (1.77 KB, patch)
2014-02-06 18:37 UTC, Simon McVittie
none Details | Review
g_child_watch_source_new: Document restrictions for POSIX platforms (5.58 KB, patch)
2017-10-11 13:31 UTC, Simon McVittie
committed Details | Review

Description Simon McVittie 2014-02-06 09:09:06 UTC
While trying to construct a minimal test-case for the g_spawn_* weirdness in <http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=737380>, I discovered that g_child_watch_source_new (p) does not check whether the value of p is "a real PID", or a pseudo-PID like -1; and both g_child_watch_source_new() and dispatch_unix_signals() will call waitpid (source->pid, ., .) regardless.

This means that a call to g_child_watch_source_new(), g_child_watch_add_full() or g_child_watch_add() with non-positive pid will result in the behaviour that dispatch_unix_signals() warns about:

    "Most likely the process is ignoring SIGCHLD, or some other thread is
    invoking waitpid() with a nonpositive first argument; either behavior
    can break applications that use g_spawn_sync either directly or
    indirectly."

If use of those functions with pid <= 0 is going to cause problems, shouldn't they g_return_[val_]if_fail (pid > 0) on Unix platforms?
Comment 1 Simon McVittie 2014-02-06 09:21:00 UTC
Unfortunately, I don't think this can necessarily explain what's happening to the GApplication test, unless there are bugs in the implementation of g_spawn_async_with_pipes() that are causing it to return with neither an error set, nor a valid child_pid filled in, because it does this:

  g_spawn_async_with_pipes (NULL, args, NULL,
                            G_SPAWN_DO_NOT_REAP_CHILD,
                            NULL, NULL, &pid, NULL,
                            &data->stdout_pipe, NULL, &error);
  g_assert_no_error (error);

  g_child_watch_add (pid, child_quit, data);

When I have a compiled GLib tree I'll try inserting a g_assert_cmpint (pid, >, 0); but the only machine where I can reproduce this is a 444 MHz mipsel, and compiling GLib there takes several hours.

The test failure was

> # Start of gapplication tests
> ok 1 /gapplication/no-dbus
> ok 2 /gapplication/basic
> # GLib-FATAL-WARNING: In call to g_spawn_sync(), exit status of a
> child process was requested but ECHILD was received by waitpid().
> Most likely the process is ignoring SIGCHLD, or some other thread is
> invoking waitpid() with a nonpositive first argument; either behavior
> can break applications that use g_spawn_sync either directly or
> indirectly.
> (/«PKGBUILDDIR»/debian/build/deb/gio/tests/.libs/lt-gapplication:10577):
> GLib-WARNING **: In call to g_spawn_sync(), exit status of a child
> process was requested but ECHILD was received by waitpid(). Most
> likely the process is ignoring SIGCHLD, or some other thread is
> invoking waitpid() with a nonpositive first argument; either behavior
> can break applications that use g_spawn_sync either directly or
> indirectly.

and when I investigated it, I found that it was attempting D-Bus autolaunch (with g_spawn_sync()) in the GDBus thread, in parallel with the main thread executing the code I mentioned above. Bug #723506 is that I don't think it should be attempting D-Bus autolaunch without a $DISPLAY, which works around the test failure.
Comment 2 Allison Karlitskaya (desrt) 2014-02-06 09:46:09 UTC
Patch happily accepted to add the non-negative check that you mention.
Comment 3 Simon McVittie 2014-02-06 18:05:22 UTC
(In reply to comment #0)
>     "Most likely the process is ignoring SIGCHLD, or some other thread is
>     invoking waitpid() with a nonpositive first argument; either behavior
>     can break applications that use g_spawn_sync either directly or
>     indirectly."

While looking into adding this I noticed that that message can also appear if you try to watch the same child twice. When dispatch_unix_signals_unlocked() gets to the most recently-added instance of that pid (because we prepend to the list), waitpid() will return that pid and all is fine. When it gets to the older instance, waitpid() fails with ECHILD and you get the warning.

The documentation for g_child_watch_add() does say:

    GLib supports only a single callback per process id

so perhaps that should be diagnosed as programmer error too?
Comment 4 Simon McVittie 2014-02-06 18:35:22 UTC
(In reply to comment #0)
> While trying to construct a minimal test-case for the g_spawn_* weirdness in
> <http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=737380>

I discovered when I accidentally scp'd the wrong version of gmain.c onto my test machine that this is in fact Bug #711090, which was fixed in master recently but hasn't made it to the stable-branch yet...
Comment 5 Simon McVittie 2014-02-06 18:36:21 UTC
Created attachment 268326 [details] [review]
g_child_watch_source_new: POSIX pid must be positive

If we used a non-positive pid, we'd call waitpid(that_pid, ...)
which is exactly the situation this function can't deal with.

On Windows, GPid is a HANDLE (pointer), so I don't think the same thing
applies.
Comment 6 Simon McVittie 2014-02-06 18:37:28 UTC
Created attachment 268327 [details] [review]
g_child_watch_new: enforce one-watch-per-pid

---

I don't know whether this is going to break anything... but the code doesn't cope unless this assumption is satisfied, and it is documented.
Comment 7 Allison Karlitskaya (desrt) 2014-02-06 22:01:17 UTC
Review of attachment 268326 [details] [review]:

Can you add the checks to the convenience APIs as well?  It seems like they will segfault if the source_new returns NULL....
Comment 8 Allison Karlitskaya (desrt) 2014-02-06 22:03:08 UTC
Review of attachment 268327 [details] [review]:

Not crazy about this due to the linear scan... and it will only catch the multiple-add in a case that termination of the child has not interceded...

I wonder if we could as well ask the kernel "is this waitable?" without actually waiting in order to catch the full range of possible bad values to this function... I'm still not sure that we would want to do that.

Did this actually bite you?
Comment 9 Simon McVittie 2014-02-07 11:20:20 UTC
(In reply to comment #7)
> Can you add the checks to the convenience APIs as well?  It seems like they
> will segfault if the source_new returns NULL....

I added an early-return in g_child_watch_add_full() too. g_child_watch_add() just calls g_child_watch_add_full() with an extra NULL argument, so there's nothing extra to do there.

(In reply to comment #8)
> Did this actually bite you?

No; it was one theory for why a test might be failing on a slow autobuilder.

I'd be fine with not checking this... but when there are these additional ways to trigger the g_warning() I quoted, the warning seems a bit misleading. Perhaps it would be better to collect all the things that will break child-watch sources into the g_child_watch_source_new() docs, and change the warning to point to that, maybe even by URL?

If you don't like the linear scan, a map { pid => watch } (perhaps a GHashTable?) would make it obvious that there can be at most one watch per pid.
Comment 10 Simon McVittie 2014-02-07 11:29:10 UTC
(In reply to comment #9)
> Perhaps it would be better to collect all the things that will break
> child-watch sources into the g_child_watch_source_new() docs, and change the
> warning to point to that, maybe even by URL?

Here's a sketch of that (untested, gtk-doc might not like my Markdown):

/**
 * g_child_watch_source_new:
...
 * On POSIX platforms, the following restrictions apply to this API:
 *
 * * @pid must be a child of this process
 * * @pid must be positive
 * * the application must not call `waitpid` with a non-positive
 *   first argument, for instance in another thread
 * * the application must not wait for @pid to exit by any other
 *   mechanism, including `waitpid(pid, ...)` or a second child-watch
 *   source for the same @pid
 * * the application must not ignore SIGCHILD
 *
 * If any of those are not followed, GLib will warn that `ECHILD` was
 * received by `waitpid`.
...
 * Since: 2.4
 **/


/**
 * g_child_watch_add_full:
...
 * GLib supports only a single callback per process id.
 * On POSIX platforms, the same restrictions mentioned for
 * g_child_watch_source_new() apply to this function.
...
 */

/**
 * g_child_watch_add:
...
 * GLib supports only a single callback per process id.
 * On POSIX platforms, the same restrictions mentioned for
 * g_child_watch_source_new() apply to this function.
...
 */

                  else if (pid == -1 && errno == ECHILD)
                    {
                      g_warning ("GChildWatchSource: Exit status of a child process was requested but ECHILD was received by waitpid(). See the g_child_watch_source_new() documentation.");

...

/**
 * g_spawn_sync:
...
 * If @exit_status is non-%NULL, the platform-specific exit status of
 * the child is stored there; see the documentation of
 * g_spawn_check_exit_status() for how to use and interpret this.
 * Note that it is invalid to pass %G_SPAWN_DO_NOT_REAP_CHILD in
 * @flags, and on POSIX platforms, the same restrictions as for
 * g_child_watch_source_new() apply.
 */

      else if (errno == ECHILD)
        {
          if (exit_status)
            {
              g_warning ("In call to g_spawn_sync(), exit status of a child process was requested but ECHILD was received by waitpid(). See the g_child_watch_source_new() documentation");
            }
          else
Comment 11 Allison Karlitskaya (desrt) 2014-02-07 16:51:50 UTC
Review of attachment 268326 [details] [review]:

You're right.  Please go ahead.
Comment 12 Allison Karlitskaya (desrt) 2014-02-07 16:52:24 UTC
Review of attachment 268327 [details] [review]:

I'm not keen on adding extra data structures that we maintain solely for the purpose of throwing criticals...
Comment 13 Simon McVittie 2014-02-11 15:01:30 UTC
(In reply to comment #12)
> I'm not keen on adding extra data structures that we maintain solely for the
> purpose of throwing criticals...

That's reasonable. As an alternative, how about the text I suggested in Comment #10? I'll turn it into a proper patch if someone agrees in principle with the approach.
Comment 14 Simon McVittie 2014-02-11 15:37:43 UTC
Review of attachment 268326 [details] [review]:

commit 125913e9, should be in 2.39.5
Comment 15 Philip Withnall 2017-10-11 11:58:42 UTC
(In reply to Simon McVittie from comment #13)
> (In reply to comment #12)
> > I'm not keen on adding extra data structures that we maintain solely for the
> > purpose of throwing criticals...
> 
> That's reasonable. As an alternative, how about the text I suggested in
> Comment #10? I'll turn it into a proper patch if someone agrees in principle
> with the approach.

Seems like a reasonable approach. Patch away at your leisure.
Comment 16 Simon McVittie 2017-10-11 13:31:37 UTC
Created attachment 361327 [details] [review]
g_child_watch_source_new: Document restrictions for POSIX platforms

The warnings issued when dealing with waitpid() raising ECHILD are
somewhat misleading: there are lots of reasons why waitpid() might
fail in this way, and we can't tell which one has happened.
In particular, passing a non-child or a non-pid, waiting for the same
pid elsewhere, or creating a duplicate watch for the same pid would
all fail in the same way.

Consolidate the restrictions into one place, and change all the other
places they were (or should have been!) mentioned to point to
that one place.
Comment 17 Philip Withnall 2017-10-12 09:36:23 UTC
Review of attachment 361327 [details] [review]:

++
Comment 18 Simon McVittie 2017-10-12 11:23:07 UTC
Comment on attachment 361327 [details] [review]
g_child_watch_source_new: Document restrictions for POSIX platforms

6e480634c
Comment 19 Simon McVittie 2017-10-12 11:23:32 UTC
Fixed in git for 2.55.0, thanks for reviewing