GNOME Bugzilla – Bug 672102
GSubprocess class
Last modified: 2013-10-17 19:08:23 UTC
Created attachment 209785 [details] Proposed GSubprocess class So the selling points for this over g_spawn_* are: 0) Is a GObject so fits in with the memory management you're using anyways 1) Supports GCancellable* 2) Operates in terms of G{Input,Output}Stream or file paths, not file descriptors 3) Distinguishes between API that returns valid UTF-8 (gchar*) versus API that returns byte arrays (GBytes) Attaching a stub header for discussion.
if "exit_code/exit_status" is the "status" return from waitpid(), then this has the same "needs the WIFEXITED/WEXITSTATUS/etc macros in order to be usable" problem of the current APIs... which apparently I forgot to file a bug about. Anyway, should fix that. I assume all of the things that have _set_ methods are object properties? Cancelling the cancellable passed to g_subprocess_start() kills the process? Or what? It seems a little weird... Maybe there should be a gio-style async API that invokes a callback when the process exits?
(In reply to comment #1) > if "exit_code/exit_status" is the "status" return from waitpid(), then this has > the same "needs the WIFEXITED/WEXITSTATUS/etc macros in order to be usable" > problem of the current APIs... which apparently I forgot to file a bug about. > Anyway, should fix that. Ah, right - we should have an API that sets a GError if the process exits with a non-zero code. But there also should be an API that doesn't, and you can query the exit code. > I assume all of the things that have _set_ methods are object properties? Yeah. > Cancelling the cancellable passed to g_subprocess_start() kills the process? Or > what? It seems a little weird... That was my thought yes. Actually that leads into a larger question about the lifecycle API. Do we want to support killing the process when the GSubprocess is unref'd? Should that be the default? Also something I want to add as an option, which is optionally calling prctl(PR_SET_PDEATHSIG) on Linux. On non-Linux Unix we'd need to spawn off a helper process with just sits there awaiting termination of either (it checks for termination of the parent via a pipe, and termination of the child via waitpid()). > Maybe there should be a gio-style async API that invokes a callback when the > process exits? Well, if you're using streams, you do want to know when the process exits, but it's not done until you've finished reading/writing. However one clearly important use case is "asynchronously run this process and give me the entire output as a GBytes". Like this: void g_subprocess_spawn_async_utf8 (GSubprocess *subprocess, GCancellable *cancellable, GAsyncReadyCallback callback, gpointer user_data); gboolean g_subprocess_spawn_async_utf8_finish (GSubprocess *subprocess, gchar **output_utf8, GError **error);
Interesting. First, here's some relevant art from udisks2 http://udisks.freedesktop.org/docs/latest/UDisksSpawnedJob.html and is typically used like this http://udisks.freedesktop.org/docs/latest/UDisksDaemon.html#udisks-daemon-launch-spawned-job-sync for example, from here http://cgit.freedesktop.org/udisks/tree/src/udiskslinuxpartition.c?id=1.97.0#n785 It's pretty custom but does exactly what I want (I often need to pass data to helpers via stdin, for example passwords. I also need to control what user to run as, easier just to pass a parameter than setup a child_func. It's also a little over-complicated since it also exports a D-Bus object but thought I'd mention it anyway. For GVfs, I did something similar, see http://git.gnome.org/browse/gvfs/tree/monitor/udisks2/gvfsudisks2utils.c?id=1.13.0#n343
(In reply to comment #0) > Created an attachment (id=209785) [details] > Proposed GSubprocess class > > So the selling points for this over g_spawn_* are: > > 0) Is a GObject so fits in with the memory management you're using anyways > 1) Supports GCancellable* > 2) Operates in terms of G{Input,Output}Stream or file paths, not file > descriptors > 3) Distinguishes between API that returns valid UTF-8 (gchar*) versus API that > returns byte arrays (GBytes) > > Attaching a stub header for discussion. A thing I'd like (which my udisks and gvfs stuff mentioned in comment 3 above lacks) is an extensible way to configure how the child process is being run - for example, setting the uid or euid. Looks like it's already set up this way, I mean, you have a split between the constructor and starting the child process. To provide an example, here's what I'm thinking about sub = g_subprocess_new_for_command_line (argv); .. g_subprocess_set_impersonate_uid (sub, uid_to_impersonate); .. g_subprocess_start (sub); where g_subprocess_set_impersonate_uid() is something we can add later (properly G_OS_UNIX only). I think having this in addition to child_setup is important because it's something you usually need to do all the time. In system daemons, anyway.
Created attachment 214272 [details] [review] GMemoryInputStream: Add API to accept GBytes And s/Chunk/GBytes/ internally. GBytes is really a perfect match for GMemoryInputStream.
Created attachment 214273 [details] [review] GSubprocess: New class for spawning child processes There are a number of nice things this class brings: 0) g_subprocess_append_args() is REALLY nice to use from C, as opposed to building up an argument array all at once. 1) Operates in terms of G{Input,Output}Stream, not file descriptors 2) Sets a GError for you if the child exits with nonzero code; Unix consumers don't have to copy the bits to do the WIFEXITED/WEXITCODE dance. 3) Creates the child watch source on the thread-default main context, matching modern GIO API. 4) The g_subprocess_set_standard_output_file_path() is a very handy API that can work on both Unix and Windows (but is quite efficient on Unix). In the future: 5) Has helper functions that distinguish between binary data versus valid UTF-8.
Just a few quick notes: + new_envp = g_environ_setenv (self->child_envp, variable, value, overwrite); + g_strfreev (self->child_envp); + self->child_envp = new_envp; The g_strfreev is an invalid free, since g_environ_setenv has possibly called g_renew on the self->child_envp it was passed. And if it wasn't changed, new_envp is now invalid. Just doing self->child_envp = g_environ_setenv (self->child_envp, variable, value, overwrite); works fine. Same problem here: + new_envp = g_environ_unsetenv (self->child_envp, variable); + g_strfreev (self->child_envp); + self->child_envp = new_envp; Also, I don't like that g_subprocess_setenv() inits the child_envp is with g_get_environ() if it's NULL... I'd prefer if you'd have to do that explicitly (the API for that exists, as g_subprocess_override_environ()). g_subprocess_set_child_setup() takes a GSpawnChildSetupFunc and one user data... is it usually not necessary to refer to the GSubprocess from the child setup? Because otherwise you'll always have to create a struct with the GSubprocess and your own extra user data... +#ifdef G_OS_UNIX + if (self->stdin_path) + { + self->stdin_fd = open (self->stdin_path, O_RDONLY); All other places use g_open(), why not this one too? + self->stdout_fd = g_open (self->stdout_path, O_CREAT | O_APPEND | O_WRONLY, + 0666); + self->stderr_fd = g_open (self->stderr_path, O_CREAT | O_APPEND | O_WRONLY, + 0666); Why 0666 ?
(In reply to comment #7) > g_subprocess_set_child_setup() takes a GSpawnChildSetupFunc and one user > data... is it usually not necessary to refer to the GSubprocess from the child > setup? Because otherwise you'll always have to create a struct with the > GSubprocess and your own extra user data... It's unsafe to access the GSubprocess from the child setup function. The reason is because other threads may be holding locks (e.g. the GType lock), and calling any function, say g_subprocess_set_standard_error_to_devnull() inside there may deadlock (because that function uses G_IS_SUBPROCESS() inside a macro which calls into GType). The things you want to use the child setup function for are Unix stuff like setsid(). > Why 0666 ? Matches what _g_local_file_output_stream_append() does. Fixed everything else, thanks!
Created attachment 214361 [details] [review] GMemoryInputStream: Add API to accept GBytes Ensure we keep the len == -1 -> len = strlen(str)
Created attachment 214362 [details] [review] GMemoryOutputStream: Add API to return data as a GBytes New; complements the GMemoryInputStream changes.
Created attachment 214363 [details] [review] GSubprocess: New class for spawning child processes Rebased; lots of bug fixes, new API, more tests. Notable changes: * You can now get a stdin GOutputStream returned * There's convenience API for splicing the output buffers into provided streams (e.g. pipe subprocess standard output into a GSocket, or a GMemoryOutputStream, or a GConverterOutputStream) * More convenience API: g_subprocess_run_sync_get_stdout_utf8()
Created attachment 214397 [details] [review] GSubprocess: New class for spawning child processes Pile of updates: * Add it to the docs * Require an executable to be given for creation * bugfixes for argv0 handling * A few more tests
(In reply to comment #4) > ( > A thing I'd like (which my udisks and gvfs stuff mentioned in comment 3 above > lacks) is an extensible way to configure how the child process is being run - > for example, setting the uid or euid. > > I think having this in addition to child_setup is important because it's > something you usually need to do all the time. In system daemons, anyway. Ok...let's keep this in mind for adding later. (In reply to comment #3) > Interesting. First, here's some relevant art from udisks2 > > http://udisks.freedesktop.org/docs/latest/UDisksSpawnedJob.html Looks like GSubprocess does all of this, except we don't presently map signal numbers to strings, and right now the fallback waitpid() blocks. I think I could change that to spawn a thread or something though.
Comment on attachment 214361 [details] [review] GMemoryInputStream: Add API to accept GBytes definitely makes sense >+ bytes = g_bytes_new_with_free_func (data, len, destroy, NULL); last argument needs to be "data", not "NULL" >+ const guint8*chunk_data; missing a space
Comment on attachment 214362 [details] [review] GMemoryOutputStream: Add API to return data as a GBytes looks right
Review of attachment 214397 [details] [review]: generally looks nice ::: gio/giotypes.h @@ +144,3 @@ + * Since: 2.34 + */ +typedef struct _GSubprocess GSubprocess; although the interruptions for gtk-doc comments make it non-obvious, giotypes.h is mostly in alphabetical order, so GSubprocess does not belong between GResolver and GResource. ::: gio/gsubprocess.c @@ +59,3 @@ + gchar **child_envp; + + gboolean detached : 1; gboolean is a (signed) int, so a 1-bit field is undefined. Use guint. @@ +90,3 @@ + GPid pid; + + int exit_status; you mostly use "gint", but there are some scattered "int"s @@ +276,3 @@ + * Set the arguments to be used by the child process. This will + * overwrite the entire argument vector, including the + * #GSubprocess::executable property and any previous calls to one one colon for properties, two colons for signals @@ +277,3 @@ + * overwrite the entire argument vector, including the + * #GSubprocess::executable property and any previous calls to + * g_subprocess_append_args(). It does not overwrite a previous call to g_subprocess_set_child_argv0() though, which is weird @@ +280,3 @@ + * + * Due to limitations of gobject-introspection, the provided argument + * is annotated as an array of strings, not an array of bytestrings. Why can't we annotate them as bytestrings? And if they're strings, wouldn't type=filename be more correct than type=utf8? (Hm... gspawn describes the arguments as being "in the GLib filename encoding", but doesn't actually annotate them as such.) @@ +283,3 @@ + * + * For language bindings, you can use calls to + * g_subprocess_append_arg() indivdually to provide non-UTF8 typo (individually) @@ +288,3 @@ + * For more information, see g_subprocess_append_args_va(). + * + * It invalid to call this function after g_subprocess_start() has "it IS invalid". (likewise in 20 other places this was copied and pasted to) @@ +323,3 @@ + * + * See the discussion of %G_SPAWN_FILE_AND_ARGV_ZERO in the + * g_spawn_async_with_pipes() documentation. I don't think referring to gspawn helps here; you've already described the behavior in sufficient detail, and the way gspawn implements it (a single array containing both executable and argv) is different enough from GSubprocess that reading the docs there would just be confusing. @@ +401,3 @@ + * g_subprocess_append_args_va: + * @self: a #GSubprocess + * @args: List of %NULL-terminated arguments %NULL-terminated list of arguments @@ +415,3 @@ + * + * Note that you do not need to include %NULL itself into the argument + * array, as required by g_spawn_async_with_pipes(); it is inserted Uh? g_spawn_async_with_pipes() takes a %NULL-terminated array, and this takes a %NULL-terminated varargs list @@ +418,3 @@ + * automatically. + * + * For more information, about how arguments are processed, see the "For more information about how arguments are processed on Windows" ? @@ +448,3 @@ + * @detached: %TRUE if the child shouldn't be watched + * + * Unlike the g_spawn_async_with_pipes(), the default for #GSubprocess s/the// @@ +454,3 @@ + * If you don't plan to monitor the child status via a function like + * g_subprocess_add_watch() or g_subprocess_wait_sync(), you should + * set this flag. you *must* set this flag (since finalize() will yell at you otherwise) @@ +472,3 @@ + +/** + * g_subprocess_set_search_path: this sounds like it's used to set PATH, not to set whether or not PATH gets used @@ +477,3 @@ + * + * See the documentation for %G_SPAWN_SEARCH_PATH for details; setting + * @do_search_path to %FALSE is equivalent to not providing that flag. what's the default? Also, this is simple enough that you could just describe it here rather than referring the user to the gspawn docs @@ +499,3 @@ + * @do_leave_descriptors_open: %TRUE if file descriptors should be left open + * + * See the documentation for %G_SPAWN_LEAVE_DESCRIPTORS_OPEN for again, it would be better to just describe the behavior here. (and assume I'm saying that for most of the other following functions too...) @@ +517,3 @@ +} + +/**** Envronment control ****/ typo @@ +579,3 @@ + g_return_if_fail (variable != NULL); + + self->child_envp = g_environ_unsetenv (self->child_envp, variable); you need to initialize self->child_envp if it's NULL here too @@ +718,3 @@ + g_clear_object (&self->stdin_stream); + g_free (self->stdin_path); + self->stdin_path = NULL; g_clear_pointer! @@ +743,3 @@ + */ +void +g_subprocess_set_standard_input_unix_fd (GSubprocess *self, need #ifdef G_OS_UNIX around these @@ +1010,3 @@ + self->stdout_path = NULL; + self->stdout_to_devnull = FALSE; + self->stderr_to_stdout = FALSE; is that (stderr_to_stdout = FALSE) supposed to be there? it's inconsistent with the other set_standard_output_* functions, and not documented @@ +1057,3 @@ + * The default is for the child process to inherit the standard error + * of the current process. Specify %TRUE for it to be merged with + * standard output. (and if you also redirect standard output via g_subprocess_set_standard_output_file() or related functions, that redirection will also affect standard error) @@ +1155,3 @@ + * g_subprocess_start: + * @self: a #GSubprocess + * @cancellable: a #GCancellable which is used for... ? (likewise on other start/run methods) @@ +1161,3 @@ + * the current configuration. You must have at least initialized + * an argument vector using g_subprocess_append_args() or a related + * function. "You must have at least set #GSubprocess:executable." Further args aren't needed. (Likewise on other start/run methods.) @@ +1164,3 @@ + * + * It invalid to call this function after g_subprocess_start() has + * already been called. "to call this a second time after..." ? @@ +1165,3 @@ + * It invalid to call this function after g_subprocess_start() has + * already been called. + * Returns: %TRUE if the subprocess was started, %FALSE on error (in which case @error will be set) (likewise on the other start/run functions) @@ +1227,3 @@ + { + ecode = dup2 (a, b); + } while (ecode == -1 && errno == EINTR); should have a newline after the "}", but you could also just remove the {}s @@ +1285,3 @@ + +/** + * g_subprocess_start_with_pipes: so... would it be more consistent to have g_subprocess_set_standard_input_pipe (GSubprocess *subprocess, GOutputStream *stream); etc? And then get rid of start_with_pipes()? @@ +1300,3 @@ + * + * See the documentation for g_spawn_async_with_pipes() for more + * information about how the child process has been run. "will be run" @@ +1306,3 @@ + * e.g. g_subprocess_add_watch(), or later query its status + * synchronously via g_subprocess_wait_sync(). Failing to do so leads + * to suboptimal behavior.</warning> be concrete. "leads to zombies" @@ +1310,3 @@ + * If both input and output streams are given, using synchronous I/O + * on both sides risks deadlock. Use asynchronous I/O such as + * g_input_stream_read_async() or higher level wrappers. "using synchronous I/O from the same thread on both sides..." @@ +1313,3 @@ + * + * It invalid to call this function after + * g_subprocess_start_with_pipes() has already been called. weird that this one says "start_with_pipes" when all the other doc comments only mention "start" @@ +1367,3 @@ + if (self->stdin_path) + { + self->stdin_fd = g_open (self->stdin_path, O_RDONLY); This gets leaked. You need to close it after spawning (or on error). (Likewise the other fds) @@ +1419,3 @@ + { + g_set_error (error, G_IO_ERROR, G_IO_ERROR_NOT_SUPPORTED, + "FIXME not supported yet"); well, if you added g_subprocess_set_standard_output_stream(), that would solve this too... @@ +1434,3 @@ + && self->stderr_path == NULL); + + if (self->child_argv0) this code would be a lot simpler if self->child_argv was always the actual child argv, and you had a separate self->executable rather than a separate self->child_argv0 for the case where they're different @@ +1519,3 @@ + if (stdin_pipe_fd != -1) + { + GOutputStream *child_stdout = g_unix_output_stream_new (stdin_pipe_fd, TRUE); FIXME @@ +1562,3 @@ + * + * <warning>Due to the way GLib presently implements subprocess + * monitoring works on Unix, attempting to use e.g. kill() to send a needs grammar @@ +1576,3 @@ + g_return_val_if_fail (G_IS_SUBPROCESS (self), 0); + g_return_val_if_fail (self->state == G_SUBPROCESS_STATE_RUNNING + || self->state == G_SUBPROCESS_STATE_TERMINATED, 0); on Windows, self->pid will be a pointer to freed memory after the process is TERMINATED, so maybe this should either return 0 or g_return_if_fail in that case? @@ +1590,3 @@ + * This function is similar to using g_child_watch_add(), except it + * operates on the thread default main context. See + * g_main_context_get_thread_default(). elsewhere we say: "except it operates on <link linkend="g-main-context-push-thread-default">the thread-default main context</link>" Although also, I don't think there are any other functions that return a new GSource* where it has already been attached... there needs to be some method to return a non-attached source, in case the caller needs to attach it somewhere else @@ +1700,3 @@ + * + * Unlike the g_spawn_async_with_pipes() family of functions, this + * function by default sets an error if the child exits abnormally I don't think that's a good comparison. gspawn simply doesn't have any function that corresponds to this one @@ +1741,3 @@ + g_set_error (error, G_IO_ERROR, G_IO_ERROR_SUBPROCESS_EXIT_ABNORMAL, + _("Child process %ld exited with code %ld"), + (long) self->pid, (long) self->exit_status); Since you can't use g_subprocess_get_exit_code() from bindings (because you don't have WEXITSTATUS), this means there's no way to find out the specific exit status if it's not 0. What about: g_set_error (error, G_SUBPROCESS_EXIT_ERROR, WEXITSTATUS (self->exit_status), _("Child process %ld exited with code %ld"), (long) self->pid, (long) self->exit_status); (And should we use self->child_argv[0] instead of self->pid ?) @@ +1782,3 @@ + +/** + * g_subprocess_get_exit_code: I'd call this "exit status". "exit code" sounds to me like it means the value passed to exit(), not the value returned from waitpid(). (Although POSIX seems to consistently refer to them both as "status" even though they're different things. Thanks!) I dunno. Might be just me... @@ +1825,3 @@ + * Synchronously wait for the subprocess to terminate. This function + * will also invoke g_subprocess_query_success(), meaning that by + * default @error will be set if the subprocess exits abornmally. "abnormally" @@ +1827,3 @@ + * default @error will be set if the subprocess exits abornmally. + * + * Returns: %TRUE if child exited successfully, %FALSE otherwise %FALSE if it exited with an error or @cancellable was cancelled. Oh, except you don't actually do that. Which is presumably a bug. @@ +1850,3 @@ + loop = g_main_loop_new (context, TRUE); + + source = g_subprocess_add_watch (self, g_subprocess_on_sync_watch, loop); and then: cancellable_source = g_cancellable_source_new (cancellable); g_source_add_child_source (source, cancellable_source); g_source_unref (cancellable_source); @@ +1853,3 @@ + + g_main_loop_run (loop); + if (g_cancellable_set_error_if_cancelled (cancellable, error)) goto out; @@ +1860,3 @@ + out: + if (pushed_thread_default) + g_main_context_pop_thread_default (context); you don't actually need any of these checks here; source and context are always non-NULL, and context is always pushed @@ +1912,3 @@ + +/** + * g_subprocess_start_and_splice_async: this is really inconsistent with g_subprocess_set_standard_input_stream() (which is exactly the same thing, on the stdin side, but with a totally different API) ::: gio/gsubprocess.h @@ +36,3 @@ +#define G_IS_SUBPROCESS(o) (G_TYPE_CHECK_INSTANCE_TYPE ((o), G_TYPE_SUBPROCESS)) + +GType g_subprocess_get_type (void) G_GNUC_CONST; Should be marked GLIB_AVAILABLE_IN_2_34? (so that G_TYPE_SUBPROCESS causes appropriate warnings) Also, I guess we're still figuring out how to use these macros, but if we wanted to clean up the .h file a little bit, the only functions you really *need* to mark as GLIB_AVAILABLE_IN_2_34 are g_subprocess_get_type(), g_subprocess_new(), and g_subprocess_new_with_args(), because it's not possible to call any of the other GSubprocess methods without calling one of those first.
Just one more nit: g_subprocess_override_environ() is a bit strange name... maybe just g_subprocess_set_environ[ment] ? Also, how about adding a g_subprocess_take_environ[ment] method that takes ownership instead of dup'ing?
(In reply to comment #16) > > although the interruptions for gtk-doc comments make it non-obvious, giotypes.h > is mostly in alphabetical order, so GSubprocess does not belong between > GResolver and GResource. It looks like it's about 1/2 alphabetized, 1/2 ordered by time-of-addition-to-glib. I moved it to the end under the second scheme. > @@ +280,3 @@ > + * > + * Due to limitations of gobject-introspection, the provided argument > + * is annotated as an array of strings, not an array of bytestrings. > > Why can't we annotate them as bytestrings? We don't sanely support nesting types, and there's no shorthand type for bytestring at the moment. > And if they're strings, wouldn't > type=filename be more correct than type=utf8? I guess "filename" really is effectively treated as "bytestring" as far as introspection goes. > (Hm... gspawn describes the > arguments as being "in the GLib filename encoding", but doesn't actually > annotate them as such.) Neither does g_build_filename() for that matter. > @@ +415,3 @@ > + * > + * Note that you do not need to include %NULL itself into the argument > + * array, as required by g_spawn_async_with_pipes(); it is inserted > > Uh? g_spawn_async_with_pipes() takes a %NULL-terminated array, and this takes a > %NULL-terminated varargs list My goal with that bit was to make clear that the GSubprocess doesn't need to provide the trailing NULL as required by e.g. execve() but clearly I was unsuccessful, so let's just delete this. > @@ +1285,3 @@ > + > +/** > + * g_subprocess_start_with_pipes: > > so... would it be more consistent to have > > g_subprocess_set_standard_input_pipe (GSubprocess *subprocess, GOutputStream > *stream); > > etc? And then get rid of start_with_pipes()? That'd preclude caller-controlled I/O. I did originally have an API like: g_subprocess_request_standard_output_pipe (GSubprocess *self, GOutputStream **out_stdout), and the returned stream would be a "stub" stream that was only filled in once the process was started with the pipe fd, but it seemed too magical. > @@ +1306,3 @@ > + * e.g. g_subprocess_add_watch(), or later query its status > + * synchronously via g_subprocess_wait_sync(). Failing to do so leads > + * to suboptimal behavior.</warning> > > be concrete. "leads to zombies" Maybe. I think what I'll probably do is spawn a thread in this case to call waitpid(), and just remove the g_warning(). It'd be efficient enough; if your app is spawning processes over and over and you happen to notice the threads, you could read the docs and fix your app. > @@ +1419,3 @@ > + { > + g_set_error (error, G_IO_ERROR, G_IO_ERROR_NOT_SUPPORTED, > + "FIXME not supported yet"); > > well, if you added g_subprocess_set_standard_output_stream(), that would solve > this too... But that would imply getting rid of the _set_standard_input_file() efficiency win on Unix. > this code would be a lot simpler if self->child_argv was always the actual > child argv, and you had a separate self->executable rather than a separate > self->child_argv0 for the case where they're different Seems like it's special casing either way. > > @@ +1519,3 @@ > + if (stdin_pipe_fd != -1) > + { > + GOutputStream *child_stdout = g_unix_output_stream_new (stdin_pipe_fd, > TRUE); > > FIXME Right, I could use your advice here; should I: 1) Investigate trying to get file handles out of g_spawn() that I could use with GWin32{Input,Output}Stream 2) Write a new G*Stream that operates on a GIOChannel? 3) Something else? > @@ +1576,3 @@ > + g_return_val_if_fail (G_IS_SUBPROCESS (self), 0); > + g_return_val_if_fail (self->state == G_SUBPROCESS_STATE_RUNNING > + || self->state == G_SUBPROCESS_STATE_TERMINATED, 0); > > on Windows, self->pid will be a pointer to freed memory after the process is > TERMINATED, so maybe this should either return 0 or g_return_if_fail in that > case? Hmm, interesting. This relates to the problem of us calling waitpid() automatically on Unix. This needs more thought. For now I've just changed it to return_if_fail in that case. > Although also, I don't think there are any other functions that return a new > GSource* where it has already been attached... Do you see an issue with the precedent here? > there needs to be some method to > return a non-attached source, in case the caller needs to attach it somewhere > else Makes sense; I added a _create_watch(). > > @@ +1741,3 @@ > + g_set_error (error, G_IO_ERROR, G_IO_ERROR_SUBPROCESS_EXIT_ABNORMAL, > + _("Child process %ld exited with code %ld"), > + (long) self->pid, (long) self->exit_status); > > Since you can't use g_subprocess_get_exit_code() from bindings (because you > don't have WEXITSTATUS), this means there's no way to find out the specific > exit status if it's not 0. How about: GLIB_AVAILABLE_IN_2_34 gint g_subprocess_get_unix_exit_status (GSubprocess *self, gboolean *out_exited, gboolean *out_signaled, gint *out_code); > @@ +1860,3 @@ > + out: > + if (pushed_thread_default) > + g_main_context_pop_thread_default (context); > > you don't actually need any of these checks here; source and context are always > non-NULL, and context is always pushed Yeah, but I prefer to *always* write code in this way, so that if someone later modifies the function to insert a failable call in the middle, they don't have to add cleanup for the stuff before. > this is really inconsistent with g_subprocess_set_standard_input_stream() > (which is exactly the same thing, on the stdin side, but with a totally > different API) Well...right but as I said above, I consider this "convenience" API on top of the more fundamental API that returns streams to you, rather than you providing the streams. Forcing the caller to provide the streams gives them less control over I/O. > Also, I guess we're still figuring out how to use these macros, but if we > wanted to clean up the .h file a little bit, the only functions you really > *need* to mark as GLIB_AVAILABLE_IN_2_34 are g_subprocess_get_type(), > g_subprocess_new(), and g_subprocess_new_with_args(), because it's not possible > to call any of the other GSubprocess methods without calling one of those > first. Yeah, probably...not sure. It doesn't hurt to annotate them though either. (Everything I didn't comment on I fixed)
(In reply to comment #18) > > And if they're strings, wouldn't > > type=filename be more correct than type=utf8? > > I guess "filename" really is effectively treated as "bytestring" as far as > introspection goes. I meant that "filename" is basically assumed to be equivalent to "the encoding used by command line arguments", since if it wasn't, then you wouldn't be able to refer to some of your files from the command line. Anyway, they ought to be annotated the same as g_spawn's argv, and I believe g_spawn's documentation is correct (that they're filename encoding) and its annotations wrong. (Or maybe it's time to stop supporting people with non-UTF-8 filesystems...) > I did originally have an API like: > > g_subprocess_request_standard_output_pipe (GSubprocess *self, GOutputStream > **out_stdout) Ah, yeah, that's what I meant I guess. The way I suggested wouldn't work at all. > and the returned stream would be a "stub" stream that was only > filled in once the process was started with the pipe fd, but it seemed too > magical. You can do it non-magically; just call pipe() yourself in g_subprocess_request_standard_output_pipe() and make a GOutputStream from one side and store the other side in self->stdout_fd. (Well... on unix anyway.) > > + * to suboptimal behavior.</warning> > > > > be concrete. "leads to zombies" > > Maybe. I think what I'll probably do is spawn a thread in this case to call > waitpid(), and just remove the g_warning(). That works too. > > + g_set_error (error, G_IO_ERROR, G_IO_ERROR_NOT_SUPPORTED, > > + "FIXME not supported yet"); > > > > well, if you added g_subprocess_set_standard_output_stream(), that would solve > > this too... (I think I had originally suggested that function, earlier, but then I removed it, which is why this part reads like I'm reiterating an earlier point...) > But that would imply getting rid of the _set_standard_input_file() efficiency > win on Unix. No, don't change the input side at all. I just meant, add g_subprocess_set_standard_output_stream() ("Cause the child process's standard output to be written to the given GOutputStream"), and implement it via a second async splice at the end of g_subprocess_start_with_pipes(). And then, the Windows handler for self->stdout_path would just implement that in terms of self->stdout_stream, the same way the stdin_path handler uses stdin_stream. > > this code would be a lot simpler if self->child_argv was always the actual > > child argv, and you had a separate self->executable rather than a separate > > self->child_argv0 for the case where they're different > > Seems like it's special casing either way. Ah, right. I was thinking g_spawn worked like execv where you pass the executable name separately... > Right, I could use your advice here; should I: > > 1) Investigate trying to get file handles out of g_spawn() that I could use > with GWin32{Input,Output}Stream > 2) Write a new G*Stream that operates on a GIOChannel? > 3) Something else? A giowin32 pipe GIOChannel creates a separate thread, so #2 isn't great if we can do something better. If you had the raw fds, you could call _get_osfhandle() on them to get handles. Maybe do some glib-private.h action with gspawn-win32.c? > > Although also, I don't think there are any other functions that return a new > > GSource* where it has already been attached... > > Do you see an issue with the precedent here? As long as you have the non-attaching version, I guess it's hard to argue that providing additional convenience methods is bad... > > there needs to be some method to > > return a non-attached source, in case the caller needs to attach it somewhere > > else > > Makes sense; I added a _create_watch(). Maybe make it _create_source(), for consistency with other objects. > How about: > > GLIB_AVAILABLE_IN_2_34 > gint g_subprocess_get_unix_exit_status (GSubprocess *self, > gboolean *out_exited, > gboolean *out_signaled, > gint *out_code); sure. (Is out_code the signal number if out_signaled is TRUE?) Hm... but is that actually "unix"?
Attachment 214361 [details] pushed as 1bedf24 - GMemoryInputStream: Add API to accept GBytes Attachment 214362 [details] pushed as 44d4990 - GMemoryOutputStream: Add API to return data as a GBytes
(In reply to comment #17) > Just one more nit: g_subprocess_override_environ() is a bit strange name... > maybe just g_subprocess_set_environ[ment] ? Changed. > Also, how about adding a > g_subprocess_take_environ[ment] method that takes ownership instead of dup'ing? Hmm...the main case I see for _take() style API is when it's a major efficiency win; for example, GBytes would be pretty bad without a way to avoid a memcpy(). But in this case how often are you setting environment variables?
(In reply to comment #21) > (In reply to comment #17) > > Also, how about adding a > > g_subprocess_take_environ[ment] method that takes ownership instead of dup'ing? > > Hmm...the main case I see for _take() style API is when it's a major efficiency > win; for example, GBytes would be pretty bad without a way to avoid a memcpy(). > But in this case how often are you setting environment variables? I guess just about every time you'd use the g_subprocess_set_environ() API you've got an envp that you've just built yourself. So an extra copy seems wasteful, although of course it's not *that* bad. Just'd be a nice saving, IMHO.
(In reply to comment #19) > > You can do it non-magically; just call pipe() yourself in > g_subprocess_request_standard_output_pipe() and make a GOutputStream from one > side and store the other side in self->stdout_fd. (Well... on unix anyway.) Yeah, on Unix. > No, don't change the input side at all. I just meant, add > g_subprocess_set_standard_output_stream() ("Cause the child process's standard > output to be written to the given GOutputStream"), and implement it via a > second async splice at the end of g_subprocess_start_with_pipes(). And then, > the Windows handler for self->stdout_path would just implement that in terms of > self->stdout_stream, the same way the stdin_path handler uses stdin_stream. So...I think I want to back away a bit from wrapping splice() calls here; the main reason is that we don't have a sane way to get the caller status information (bytes transferred, errors). Right now for input splicing we return that error from the query_success() api after the child exited. But with child output arguments, the child could exit before we finish reading the output. In that case, the caller could get the status, but we'd lose our ability to propagate I/O errors. So we face the uncomfortable choice of either making the child watch mean "exited and I/O done" which is pretty gross, or losing errors. I may just remove the _set_*_file() functions for now; they do seem handy on Unix, but we can always add them later. > A giowin32 pipe GIOChannel creates a separate thread, so #2 isn't great if we > can do something better. > > If you had the raw fds, you could call _get_osfhandle() on them to get handles. > Maybe do some glib-private.h action with gspawn-win32.c? Ok. This one really needs help from someone who has access to a Windows build/test environment.
Created attachment 214599 [details] [review] GSubprocess: [rebase] Update for comments
Created attachment 214600 [details] [review] GSubprocess: Delete _set_standard_{output,error}_file_path() for now
if (self->stdin_fd >= 0) safe_dup2 (self->stdin_fd, 0); + else if (self->internal_stdin_fd >= 0) + safe_dup2 (self->internal_stdin_fd, 0); [etc] Given bug 638768, shouldn't these checks be for "!= -1" instead of >= 0 ? (Also, should safe_dup2 just return for a == b? I'm not sure dup2 is guaranteed to no-op this case...)
Created attachment 214614 [details] [review] gmain: Add private API to create Unix child watch that uses waitid() This avoids collecting the zombie child, which means that the PID can't be reused. This prevents possible race conditions that might occur were one to send e.g. SIGTERM to a child. This race condition has always existed due to the way we called waitpid() for the app, but the window was widened when we moved the waitpid() calls into a separate thread.
Created attachment 214615 [details] [review] GSubprocess: Add request_exit() and terminate() API This is now race-free when we use waitid().
Comment on attachment 214614 [details] [review] gmain: Add private API to create Unix child watch that uses waitid() From googling man pages, it looks like BSDs and OS X do not have waitid(), so you'll need a configure check and a fallback code path (which I guess will just have to be racy). Although also, it looks like FreeBSD (but not the other BSDs or OS X) accepts WNOWAIT as an option to waitpid(). >+ * The glibc bits/waitstatus.h implies that this is portable enough. >+ */ >+static gint >+waitpid_status_from_siginfo (siginfo_t *siginfo) I'm not sure if the comment in bits/waitstatus.h means it's portable to all glibc platforms, or portable to all platforms in general... maybe add: #if ((!WIFEXITED (__G_MAKE_EXITSTATUS (1, 0))) || (WIFSIGNALED (__G_MAKE_EXITSTATUS (1, 0))) || (WEXITSTATUS (__G_MAKE_EXITSTATUS (1, 0)) != 1) || (WIFEXITED (__G_MAKE_EXITSTATUS (0, 1))) || (!WIFSIGNALED (__G_MAKE_EXITSTATUS (0, 1))) || (WTERMSIG (__G_MAKE_EXITSTATUS (0, 1)) != 1)) # error "__G_MAKE_EXITSTATUS definition is wrong for platform" #endif >+ else if (siginfo->si_code == CLD_KILLED || siginfo->si_code == CLD_STOPPED) we don't pass WSTOPPED to waitid(), so we shouldn't get back CLD_STOPPED (and if we do, __G_MAKE_EXITSTATUS doesn't generate the right value). However, we probably do need to include CLD_DUMPED?
Comment on attachment 214615 [details] [review] GSubprocess: Add request_exit() and terminate() API So the way this works now is that the child process is always reaped when the GSubprocess is finalized, never sooner? OK. >+ * g_subprocess_terminate: It's weird to have the one that *doesn't* use SIGTERM have "terminate" in its name. I'd either call them "terminate" and "kill", or "request_exit" and "kill". >+#ifdef G_OS_UNIX >+ (void) kill (self->pid, SIGKILL); >+#endif Looks like on Windows you can do "forcibly terminate" via: TerminateProcess (self->pid, 1); (which causes it to exit with status "1"). Hm... and not quite related, but also, it looks like a GChildWatchSource's status on Windows is a raw exit code, not a sys/wait.h-style status. There's no API to do a "request exit". http://stackoverflow.com/questions/2055753/how-to-gracefully-terminate-a-process has some discussion of how you can implement it for GUI processes, but there's no equivalent for console/daemon processes. Maybe the two functions should just both do the same thing on Windows.
(In reply to comment #30) > (From update of attachment 214615 [details] [review]) > So the way this works now is that the child process is always reaped when the > GSubprocess is finalized, never sooner? OK. Yeah. The way to think of this is that if we'd had waitid()/WNOWAIT from the start, we could have made g_spawn_close_pid() meaningful on Unix too (it'd do the dispatch-waitpid-to-helper-thread dance). Now Unix-with-WNOWAIT behaves similarly to Windows; the GPid is valid as long as the GSubprocess exists. > >+ * g_subprocess_terminate: > > It's weird to have the one that *doesn't* use SIGTERM have "terminate" in its > name. I'd either call them "terminate" and "kill", or "request_exit" and > "kill". I actually was modeling these mentally on the Windows TerminateProcess API. I've always thought the unix SIGTERM/SIGKILL names were bad. But you're right that it'd be confusing for Unix people... how about "force_exit"? > TerminateProcess (self->pid, 1); > > (which causes it to exit with status "1"). Yeah, we're already using this in gtestdbus.c (which really wants to use GSubprocess). > Hm... and not quite related, but > also, it looks like a GChildWatchSource's status on Windows is a raw exit code, > not a sys/wait.h-style status. Right; the waitid() stuff only applies for G_OS_UNIX. > There's no API to do a "request exit". > http://stackoverflow.com/questions/2055753/how-to-gracefully-terminate-a-process > has some discussion of how you can implement it for GUI processes, but there's > no equivalent for console/daemon processes. Maybe the two functions should just > both do the same thing on Windows. I changed the _request_exit() API return a boolean that says whether or not it's supported. This would allow people to write higher level API that does e.g. "if _request_exit() { g_sleep(5) }; _force_exit()".
Created attachment 214703 [details] [review] gmain: Add private API to create Unix child watch that uses waitid() Add configure check for waitid(), cleanly fall back if not available
Created attachment 214704 [details] [review] gio: Add private API to create win32 streams from fds This will be used by GSubprocess.
Created attachment 214705 [details] [review] GSubprocess: New class for spawning child processes Now works on Windows.
Review of attachment 214705 [details] [review]: ::: gio/gsubprocess.c @@ +150,3 @@ + if (self->state > G_SUBPROCESS_STATE_BUILDING + && !self->detached + && !self->reaped_child) glib style is to put && at the end of the line ::: gio/gsubprocess.h @@ +232,3 @@ +void g_subprocess_force_exit (GSubprocess *self); + +/**** High level wrapers ****/ wrappers missing a p
Created attachment 214813 [details] [review] glib-compile-resources: Port to GSubprocess
Created attachment 214814 [details] [review] GSubprocess: Port gsettings test
Review of attachment 214705 [details] [review]: ::: gio/gsubprocess.c @@ +174,3 @@ + +static void +g_subprocess_set_property (GObject *object, There could be a lot more properties here, for language bindings where you can something like; gio.Subprocess(executable=..., stdin_bytes=glib.Bytes("foobar"), stdout_stream=stream, env={'PATH': 'foo' }, cwd="...') Which would be really nice for js/python. ::: gio/gsubprocess.h @@ +96,3 @@ + +GLIB_AVAILABLE_IN_2_34 +void g_subprocess_set_environment (GSubprocess *self, Having a variant for bindings that takes a GHashTable would be nice. @@ +125,3 @@ + +GLIB_AVAILABLE_IN_2_34 +void g_subprocess_set_standard_input_to_devnull (GSubprocess *self, Unix specific. g_subprocess_standard_input_silence() might be better. @@ +141,3 @@ + +GLIB_AVAILABLE_IN_2_34 +void g_subprocess_set_standard_output_to_devnull (GSubprocess *self, Ditto, see above. @@ +151,3 @@ + +GLIB_AVAILABLE_IN_2_34 +void g_subprocess_set_standard_error_to_devnull (GSubprocess *self, Ditto, see above.
(In reply to comment #38) > gio.Subprocess(executable=..., > stdin_bytes=glib.Bytes("foobar"), > stdout_stream=stream, > env={'PATH': 'foo' }, > cwd="...') > > Which would be really nice for js/python. Yeah...I can add more properties. > ::: gio/gsubprocess.h > @@ +96,3 @@ > + > +GLIB_AVAILABLE_IN_2_34 > +void g_subprocess_set_environment (GSubprocess *self, > > Having a variant for bindings that takes a GHashTable would be nice. Ok, sure. g_subprocess_set_environment_hash()? > @@ +125,3 @@ > + > +GLIB_AVAILABLE_IN_2_34 > +void g_subprocess_set_standard_input_to_devnull (GSubprocess > *self, > > Unix specific. g_subprocess_standard_input_silence() might be better. Well, this matches the existing G_SPAWN_STDOUT_TO_DEV_NULL. One of the fundamental decisions I made with GSubprocess is that as much as possible, it should just be a wrapper around g_spawn_async_with_pipes(). That this wrapping "leaks" is intentional; there's lots of documentation for g_spawn_async_with_pipes(), and lots and lots of code out there using it. (The reason I don't just take GSpawnFlags is I do need to control a few of them internally, like G_SPAWN_DO_NOT_REAP_CHILD).
+ if (source == NULL) + { + source = g_child_watch_source_new (self->pid); + trampoline_data->have_wnowait = FALSE; + } + else + { + trampoline_data->have_wnowait = TRUE; + } + g_source_set_priority (source, priority); + trampoline_data = g_new (GSubprocessWatchTrampolineData, 1); Need to move the g_new up before the |if|.
(In reply to comment #40) > Need to move the g_new up before the |if|. Thanks, fixed: http://git.gnome.org/browse/glib/commit/?h=wip/gsubprocess&id=f111cf0d76496dd011d5f20a1b6652be2259af7c
(In reply to comment #39) > (In reply to comment #38) > > > gio.Subprocess(executable=..., > > stdin_bytes=glib.Bytes("foobar"), > > stdout_stream=stream, > > env={'PATH': 'foo' }, > > cwd="...') > > > > Which would be really nice for js/python. > > Yeah...I can add more properties. Following up on this...the code difference between: var proc = new Gio.Subprocess({'executable': '/usr/bin/myprogram'}); proc.set_standard_input_bytes("foobar"); proc.set_environment(["PATH=/foo:/bar", "OTHERENV=42"]); proc.set_working_directory("/some/dir"); is pretty small compared to: var proc = new Gio.Subprocess({'executable': '/usr/bin/myprogram', 'standard-input': new Gio.MemoryInputStream(new GLib.Bytes("foobar")), 'environment': ["PATH=/foo:/bar", "OTHERENV=42"], 'working-directory': "/some/dir"}); Most of the character count compression in your example comes from the fact that you chose the most abbreviated names possible for properties ("stdin", "cwd", "env"). If we used compressed names for properties, but the long names for methods, that'd be weird. I think it'd be better to aim for GSubprocess being a reliable "done right" base interface upon which higher level forms can be composed. So concretely, if while you're writing your app you find yourself constantly spawning different programs in alternative directories, it's pretty trivial to write a: GSubprocess * my_subprocess_new_in_dir (const char *executable, const char *cwd); wrapper. And even easier in language bindings. So again my goal isn't exactly to make spawning processes *easy* - it's to make *doing it wrong hard*. That's why e.g. the high level wrapper for "get output as a string" also validates as UTF-8. We can always add more convenience API later.
Some feedback just on the API: 1) The myriad ways for adding arguments is a bit distracting and yet it misses the two methods that I'd have found most useful to have: - from string (with shell-style parsing). I can understand the objection to this on security basis, but it is quite a good option for constant strings. I'd expect a constructor for this, I guess... - append printf-formatted argument In general, I'd drop both of the "append args" functions (both the varargs and the va_list one) as well as the separate argv0 setter and maybe add a shell-parsing constructor and a 'append arg printf'. Even the 'set argv' is sort of dubious since I think there is not much use for undoing appends of arguments you already added but I agree with the principle of turning some of these things into properties, so I'd expect to see both C accessors there as a matter of completeness. 2) 'detached' evokes visions of 'joinable' on our old thread API. 3) It's a bit sad to see the old 'for i from 3 to inf: close (i)' logic here. Are we not far enough along with O_CLOEXEC that we can stop doing this? 4) Environment should be a strv property and have a getter. set_environment() docs should point the reader at g_environ_*(). 5) Make cwd a property and add a getter. 6) Child setup functions are evil and not very useful. The only use I can see for one is the case where we don't do the 'close ALL the descriptors!' thing and someone else feels that they need to do so. 7) Setting io-priority as a property of a class is not something done anywhere else. The convention is to provide _full() versions of various APIs that accept an additional io_priority parameter, if needed. I'd say the real convention, though, is not to worry about it at all. :) 8) Too many (weirdly overlapping) ways of setting stdin. to_devnull() should be called from_devnull() but even then it's very confusing since it is actually meaning "do not inherit" stdin (when you consider the interaction with the other stdin options). I suggest that the default is that we have stdin from /dev/null with no API to set it to inherit. We can have a few setters for establishing other sources (although I wish you'd standardise on one thing like a GInputStream). Then we can have a separate API for "run, block and attach stdio" as a sane way of consuming an inherited stdin. Failing that, I'd actually be happy to force the user to wire through the stdin of their own process manually. That actually seems more reasonable to me than to assume that it should be automatically inherited. 9) add_watch() functions return integer source ids, not GSource* 10) No exit code on the prototype of your source func? This is different than oldschool child watches. 11) wait_sync should have an async variant if you want to use that instead of the source. In fact, we should just get rid of the source as a public API and have people use the async API for waiting. I think people understand how to use async APIs more than they know what to do with a GSource* and we also have better language support for them. 12) The run-sync-and-get-output functions are weird. First, having 'utf8' in the name of the one that returns a string is a bit odd. Second, why does this one not return stderr as well? 13) Exit status handling is a bit odd. I don't like _query_ in the name of functions unless they are going to do some work. _is_ or _was_ could be better ('get_was_successful'?). get_unix_exit_status() is in the header but missing from the .c file. Maybe a better way to deal with this would be to have a single API to return the exit status (ie: what main() returned) on each platform and a separate unix-only API to query the 'status code' from wait() instead of having a globally-available-but-without-uniform-semantics API. 14) Some docs refer to a now-nonexistent get_exit_code().
(In reply to comment #43) > Some feedback just on the API: > > 1) The myriad ways for adding arguments is a bit distracting and yet it misses > the two methods that I'd have found most useful to have: > > - from string (with shell-style parsing). I can understand the objection > to this on security basis, Yeah, I deliberately left it out. > but it is quite a good option for constant > strings. I'd expect a constructor for this, I guess... Can you give me an example use case? The varargs _append() has been "good enough" in my experience. Look at the code that I've ported inside GLib, I think it works out quite well enough without needing to parse strings. > - append printf-formatted argument Yeah, I can see the use case. > In general, I'd drop both of the "append args" functions (both the varargs > and the va_list one) In favor of parsing strings? That makes it very easy to have applications that don't handle filenames with spaces/quotes, etc. > as well as the separate argv0 setter What would replace that? > and maybe add > a shell-parsing constructor I'm pretty opposed to parsing shell syntax; enough that I'd rather people have to do it explicitly via pairing g_shell_parse_argv() and g_subprocess_append_args(). > Even the 'set argv' is sort of dubious since I think there is not much use > for undoing appends of arguments you already added but I agree with the > principle of turning some of these things into properties, so I'd expect > to see both C accessors there as a matter of completeness. Can you elaborate on this? C getters for all of the properties? > 2) 'detached' evokes visions of 'joinable' on our old thread API. Yeah, though I may just drop this one, now that we auto-collect by default, and there's no penalty to not setting detached. > 3) It's a bit sad to see the old 'for i from 3 to inf: close (i)' logic here. > Are we not far enough along with O_CLOEXEC that we can stop doing this? Hmm? I don't have that anywhere in GSubprocess. Do you mean inside gspawn? In that case...I don't think we can ever drop it because we have backwards compatibility concerns; apps could rely on us closing fds. A separate bug though. > 4) Environment should be a strv property and have a getter. set_environment() > docs should point the reader at g_environ_*(). Ok. > 5) Make cwd a property and add a getter. Ok. > 6) Child setup functions are evil and not very useful. It's mostly the Unix stuff like setsid(). Or setting up master/slave ptys, or doing Linux-specific stuff like prctl(PR_SET_PDEATHSIG). There's quite a wide variety of stuff here. > 7) Setting io-priority as a property of a class is not something done anywhere > else. The convention is to provide _full() versions of various APIs that > accept an additional io_priority parameter, if needed. Hmm...it is weird I admit. So maybe set_input_stream() should just take an io priority? > I'd say the real convention, though, is not to worry about it at all. :) Heh. > 8) Too many (weirdly overlapping) ways of setting stdin. to_devnull() should > be called from_devnull() but even then it's very confusing since it is > actually meaning "do not inherit" stdin (when you consider the interaction > with the other stdin options). > > I suggest that the default is that we have stdin from /dev/null with > no API to set it to inherit. We can have a few setters for establishing > other sources (although I wish you'd standardise on one thing like a > GInputStream). Then we can have a separate API for "run, block and > attach stdio" as a sane way of consuming an inherited stdin. > > Failing that, I'd actually be happy to force the user to wire through > the stdin of their own process manually. That actually seems more > reasonable to me than to assume that it should be automatically inherited. > > 9) add_watch() functions return integer source ids, not GSource* The major difference here is that the source attaches to the thread-default main context. I could perhaps have a version like g_child_watch_source_new() that uses the GLib default of NULL context plus integer source ids. > 10) No exit code on the prototype of your source func? This is different than > oldschool child watches. Right...the idea is to discourage the user from accessing it directly, because it has platform-specific semantics, and you really want to call _query_success(). > 11) wait_sync should have an async variant if you want to use that instead of > the source. In fact, we should just get rid of the source as a public > API and have people use the async API for waiting. > > I think people understand how to use async APIs more than they know what > to do with a GSource* and we also have better language support for them. This is an interesting discussion. > > 12) The run-sync-and-get-output functions are weird. First, having 'utf8' in > the name of the one that returns a string is a bit odd. Second, why does > this one not return stderr as well? Well, because it's utf8 =) It's unfortunate that a lot of other I/O APIs that return a "string" like g_data_input_stream_read_line() don't verify UTF-8, because the general model we have in GLib is to verify data coming in. It can return stderr if you merge the streams using g_subprocess_set_stderr_to_stdout(), and the docs actually mention this. > 13) Exit status handling is a bit odd. I don't like _query_ in the name of > functions unless they are going to do some work. _is_ or _was_ could be > better ('get_was_successful'?). get_unix_exit_status() is in the header > but missing from the .c file. Hmm. get_was_successful() doesn't seem better to me, but I'm willing to consider alternative names. > Maybe a better way to deal with this would be to have a single API to > return the exit status (ie: what main() returned) on each platform and > a separate unix-only API to query the 'status code' from wait() instead of > having a globally-available-but-without-uniform-semantics API. I thought about that, but the cases where you want to inspect the exit code are quite rare in my experience. (It should be available of course) Let me try going through and fixing up the things I agree with, and I'll post a new comment with stuff we should iterate on.
Ok, I've pushed some updates based on your comments: http://git.gnome.org/browse/glib/log/?h=wip/gsubprocess Let me list your comments that I didn't address: 1) Shell input arguments: You're going to have to try very hard to convince me this is a good idea. Concrete examples of cases where it's noticeably easier than using append_args() would help a lot, and I honestly doubt it exists. I've seen real-world code parsing shell that was exploitable or just broken, and I want to make that not the path of least resistance. 2) Dropping argv0 setter: I don't know how this could be done without just defaulting to it on, which would be *really* confusing for people who have used g_spawn_(). Particularly since it only matters on Unix, and that only very rarely. 3) wait_sync should have an async variant: There are pros and cons here, but what leans me away from an async operation is simply that the operation isn't *doing* anything conceptually, it's just waiting for an event. 4) get_was_successful() vs query_success(): Dunno, if you feel the first is really better we can flip it...the "get_was_" part just feels awkward/wordy to me. 5) 'detached' evokes visions of 'joinable': No strong opinion.
(In reply to comment #45) > Ok, I've pushed some updates based on your comments: > > http://git.gnome.org/browse/glib/log/?h=wip/gsubprocess > > Let me list your comments that I didn't address: > > 1) Shell input arguments: You're going to have to try very hard to convince me > this is a good idea. Concrete examples of cases where it's noticeably easier > than using append_args() would help a lot, and I honestly doubt it exists. I've been doing a bit of data gathering across the GNOME codebase. First, there are use cases where append_args() would involve only a few more characters to quote strings individually. Example: http://git.gnome.org/browse/nautilus/tree/src/nautilus-application.c?id=217185b485edcd8983d44500b2a69bf20992ed78#n276 Then there's the cases that tediously build an argument array, which are MUCH better using append_args() instead: http://git.gnome.org/browse/nautilus/tree/libnautilus-private/nautilus-file-utilities.c?id=217185b485edcd8983d44500b2a69bf20992ed78#n264 (And the case I fixed inside GLib itself) Then there's stuff like this, which is an excellent illustration of just how difficult it is to safely quote shell fragment strings: http://git.gnome.org/browse/gnome-desktop/tree/libgnome-desktop/gnome-desktop-thumbnail.c?id=97ab041ef5fb120d152cd3d3d192f848b459961c#n1159 The sane way to do this code would be to create an argument array. So far, no code that really wants shell syntax.
Ryan, what would help me a lot here is a concrete *positive* suggestions. In IRC discussion, the discussion seemed to boil down to "this whole thing needs redesign". But that doesn't help me figure out how to make it *better*. One case I'm optimizing for is code like this: http://git.gnome.org/browse/glib/commit/?h=wip/gsubprocess&id=c39b089b937bf7ed5d27e3e43928b698591a5f97 The list of things discussed: 1) Split into GSubprocessBuilder/GSubprocess (modeling on GSocketClient/GSocketConnection?) The bonus to this I guess is that each individual class is easier to understand, but supposing we did that, the downside it would double lines of code for the above example. It's true that there are restrictions in GSubprocess as it is now, like that it's invalid to call e.g. g_subprocess_get_pid() before you've called g_subprocess_start(). Studying the GSocketClient split, it appears one reason this split makes sense here is because there are many different API calls to go from a GSocketClient to GSocketConnect: g_socket_client_connect(), g_socket_client_connect_to_host(), g_socket_client_connect_to_service(), g_socket_client_connect_to_uri(), etc. But this doesn't really hold for GSubprocess...there's really only one way to start a process. 2) GStringList - Not opposed exactly, but this has the same problem as 1) - it's another data structure you'd have to create and free, which would double lines of code. (If we did *both* 1) and 2), the example above would go from 3 lines to 9). So...positive suggestions for moving this forward?
[[engage braindump mode. incoherent ramblings ahead.]] I see GSubprocess as a running or just-recently-dead subprocess (ie: something on which it always makes sense to call get_pid()). This is, indeed, quite similar to GSocketConnection which is a connected or just-recently-disconnected socket. imho, it would always do IO via GIO streams, but we could have convenience API for dealing with shunting those streams to/from GBytes, utf8 strings, etc. Maybe we skip the convenience API and rather make sure that all of these things are possible via the generic IO stream API instead. Creating it would either be done via a GSubprocessLauncher or via a GInitable sort of interface (ie: g_subprocess_new() would actually spawn the thing, modulo failability). Perhaps we could have a mix: direct creation for the 'simple' cases and a launcher context class for advanced options. I don't want to see a string-list class folded into this. If we don't like GPtrArray enough then we should either fix it or create something better that is suited specifically for strings. The API for the argument list should either be this new class or (ideally) just a gchar**. For static or near-static argument lists (which is probably fairly common) the native C type is by far the easiest thing anyway. We can have varargs too, I guess... The default constructor would probably take only the argv (in whatever form). Maybe others would take envp. Maybe that's getting into the launch-context territory. Maybe the interaction between the launch context and the subprocess looks less like the launch context creating the subprocess but the subprocess constructor taking the launch context as an argument (implying reusability). GSubprocess * g_subprocess_new (GError **error, const gchar *argv0, ...); GSubprocess * g_subprocess_new_full (GSubprocessContext *context, GError **error, const gchar *argv0, ...); GSubprocess * g_subprocess_new_strv (const gchar * const *argv, GError **error,); GSubprocess * g_subprocess_new_full_strv (const gchar * const *argv, GSubprocessContext *context, GError **error); ya... those varargs ones look ugly. I think they're slightly pointless too when it's so easy to construct an array: { const gchar *argv[] = { args, go, here }; g_subprocess_new (argv); } So maybe we just have that? Outside of the streams, the most important operation on GSubprocess is waitpid(). This should be exposed as a normal GIO-style async call (probably also with a sync variant). Using your example: { const gchar *args[] = { gdk_pixbuf_pixdata, real_file, tmp_file2, NULL }; subprocess = g_subprocess_new (args, error); if (!subprocess) goto cleanup; /* it is running now */ #ifdef we_have_convenience_funcs g_subprocess_stderr_to_devnull(subprocess); g_subprocess_stdout_to_devnull(subprocess); #else g_input_stream_to_devnull (g_subprocess_get_stderr (subprocess)); g_input_stream_to_devnull (g_subprocess_get_stdout (subprocess)); #endif /* could block if the process tries to read stdin? */ g_output_stream_close (g_subprocess_get_stdin (subprocess)); /* eof */ if (!g_subprocess_wait_sync (subprocess, error)) goto cleanup; cleanup: g_object_unref (subprocess); /* did your code require this, or no? */ }
(In reply to comment #48) > I see GSubprocess as a running or just-recently-dead subprocess (ie: something > on which it always makes sense to call get_pid()). This is, indeed, quite > similar to GSocketConnection which is a connected or just-recently-disconnected > socket. (This may or may not be relevant to GSubprocess, but...) that's not true. There's no high-level API that will return an unconnected GSocketConnection, but internally, GSocketClient now creates the GSocketConnection first, and then calls g_socket_connection_connect{,_async} on it. > I don't want to see a string-list class folded into this. If we don't like > GPtrArray enough then we should either fix it or create something better that > is suited specifically for strings. The API for the argument list should > either be this new class or (ideally) just a gchar**. The API should be whatever results in the best combination of simplicity and non-bugginess for its users. > Using your example: > > { > const gchar *args[] = { gdk_pixbuf_pixdata, real_file, tmp_file2, NULL }; That's not valid C90. Maybe we need to start not caring about that, but...
(In reply to comment #49) > > > { > > const gchar *args[] = { gdk_pixbuf_pixdata, real_file, tmp_file2, NULL }; > > That's not valid C90. Maybe we need to start not caring about that, but... Yeah, and it's not just the C90 compatibility, it's that it's annoying to dynamically compute arguments that way. Because you have to reserve slots, then fill them in: GSubprocess *proc; const gchar *args[] = { NULL, NULL }; args[0] = g_strdup_printf ("--path=%s", some_path); proc = g_subprocess_new (args); g_free (args[0]); It's tedious and painful. (In reply to comment #48) > > I don't want to see a string-list class folded into this. Well, it's not a whole class for string lists. The _append_args() is just one of *many* possible options to configure before launching. You need to figure out how to expose all of these knobs, and note most of them really need to be set *before* launching, as in: > #ifdef we_have_convenience_funcs > g_subprocess_stderr_to_devnull(subprocess); > g_subprocess_stdout_to_devnull(subprocess); > #else > g_input_stream_to_devnull (g_subprocess_get_stderr (subprocess)); > g_input_stream_to_devnull (g_subprocess_get_stdout (subprocess)); > #endif I really hope you're not suggesting that we default to getting the subprocess output as a pipe, then implement /dev/null inside the launching process by reading and discarding it.... To implement what you're talking about, we'd have to have some weird in between "running" state. Because we can't give you something valid out of g_subprocess_get_pid() until we've actually run the thing, but we can't set up the child fds 0/1/2 to point to /dev/null directy after running...
> GStringList Unless the introspection situation is worked out, I'm -1 on that. I do not want to do: let args = new GLib.StringList(); args.append_strings(['unzip', '-d', ...]); let proc = new GLib.SubProcess({ arguments: args });
Would still really like to get this one in. The Util.spawn() calls in gnome-shell for example are just lame.
Making this depend on bug 679691 - I'll rebase once that lands.
Created attachment 218473 [details] [review] gmain: Add private API to create Unix child watch that uses waitid() Rebased
Created attachment 218474 [details] [review] gio: Add private API to create win32 streams from fds Rebased
Created attachment 218475 [details] [review] GSubprocess: New class for spawning child processes * Rebased on top of g_spawn_check_exit_status(); the GSubprocess API g_subprocess_check_exit_status() now just wraps that, and we consistently use "exit status" as a term in variable names etc. * Dropped the _set_standard_input_stream() and _set_standard_input_file_path() for now. They are convenient for some users, but it muddled the conceptual clarity of GSubprocess being "just" a g_spawn_async_with_pipes() except using GIO streams, and it was confusing that _query_success() checked both the process exit status *and* whether or not we wrote everything to stdin. Users can implement helper functions on top of course.
Created attachment 218476 [details] [review] glib-compile-resources: Port to GSubprocess Rebased
Created attachment 218477 [details] [review] GSubprocess: Port gsettings test Rebased
Review of attachment 218475 [details] [review]: ::: gio/gsubprocess.c @@ +597,3 @@ + * Note that on Unix, this option is completely independent of file + * descriptors which have the close-on-exec flag set. In other words, + * if you set @do_leave_descriptors_open to %FALSE, descriptors which You mean %TRUE here, I assume ? @@ +669,3 @@ + * + * It is invalid to call this function after g_subprocess_start() has + * been called. Should probably copy the sentence from the setenv doc that talks about taking a snapshot. Also, missing Since: tag. @@ +806,3 @@ + * g_subprocess_set_standard_input_inherit: + * @self: a #GSubprocess + * @inherit: If %TRUE, redirect input from null stream, if %FALSE, inherit Whats 'null stream' ? @@ +836,3 @@ + self->stdin_fd = -1; + + self->stdin_inherit = inherit; This looks odd. Why set stdin_inherit twice ? @@ +867,3 @@ + self->stdout_fd = -1; + + self->stdout_to_devnull = to_devnull; Same comment here. I get what you are trying to do, but I find it more confusing than helpful. @@ +984,3 @@ + * + * It is invalid to call this function after g_subprocess_start() has + * been called. I think these sentences should say "after the #GSubProcess has been started" - it is e.g. also invalid to call after g_subprocess_start_with_pipes, or other variants. @@ +1016,3 @@ + * already been called. + * + * Returns: %TRUE if subprocess was started, %FALSE on error (and @error will be set) Should be: * Returns: ... * * Since: ... @@ +1043,3 @@ + * + * Since: 2.34 + * Returns: %TRUE if subprocess was executed successfully, %FALSE on error (and @error will be set) Put Since: after Returns: @@ +1154,3 @@ + * @out_stdin_stream: (out) (allow-none): Return location for standard input pipe + * @out_stdout_stream: (out) (allow-none): Return location for standard output pipe + * @out_stderr_stream: (out) (allow-none): Return location for standard error pipe I find it inconsistent to give the arguments names like 'stream' but then documenting them as 'pipe'. Better to say 'stream' in both places, in particular since your text further down talks about 'input and output streams' as well. Same goes for the function name, I guess. @@ +1392,3 @@ + * + * This function creates a source via g_subprocess_create_source() and + * attaches it the to the %NULL main context. The %NULL main context ?! That sounds dubious. 'default' maybe ? @@ +1632,3 @@ + * been called. + * + * Returns: %TRUE if the operation is supported, %FALSE otherwise. Seems that it also returns %TRUE if the process has already terminated, regardless whether the operation is supported... Also, missing Since: tag.
After reviewing the patch in some depth for the first time, I find myself agreeing with some of the api criticism from up-bug. - I agree with your horror of shell parsing - lets just not do it - The varargs set_args variants do seem somewhat unnecessary - Having a getter for things that should be properties would be good - I'm with Ryan on 'detached' - should find a different name or just do away with it, if there's no penalty - Not supporting child setup seems a small price to pay for not having to explain all the pitfalls and weird restrictions that such a function operates under - g_subprocess_set_standard_output_to_devnull might be nicer as g_subprocess_set_standard_output_discard ? Redoing this as gio-style async api instead of sources that you have to manually deal with sounds very attractive to me.
(In reply to comment #60) > Redoing this as gio-style async api instead of sources that you have to > manually deal with sounds very attractive to me. I looked into this, but I'm struggling with how to deal with the streams. Basically, when you start a process successfully, you get streams. If you didn't start it, you don't get streams. Trying to have some API to return streams to the user *before* the process is started would require weird magic, and trying to return streams *after* is nonsensical. So we could try to do it in the "middle": void g_subprocess_run (GSubprocess *self, GOutputStream **out_stdin_stream, GInputStream **out_stdout_stream, GInputStream **out_stderr_stream, GAsyncReadyCallback callback, gpointer user_data, GError **error); This is really weird though because the async starter can throw an exception (unlike any other function in GIO that I know of). Basically any nontrivial GSubprocess user is going to need to work on it in all 3 states (BUILDING, RUNNING, TERMINATED). If the GAsyncReadyCallback happens in RUNNING, then do we have a separate API call like g_subprocess_wait()? And if the GAsyncReadyCallback is for TERMINATED, then how do you configure the streams?)
Created attachment 218592 [details] [review] GSubprocess: New class for spawning child processes This rebase includes various updates for review comments, and most importantly removes the GSource API - waiting for process completion is now an async op.
Created attachment 218593 [details] [review] glib-compile-resources: Port to GSubprocess Rebased
Created attachment 218594 [details] [review] GSubprocess: Port gsettings test Rebased
As mentioned a couple of times earlier, I'd feel a lot better if: a) we didn't have stringlist-building functionality built in b) a GSubprocess represented a running or recently deceased subprocess; not the potential to spawn a future subprocess Ignoring the problem of how we get the list of strings for the time being (which I don't consider to be a substantial problem due to the existence of gchar*[] and GPtrArray) I'd imagine seeing something like: GSubprocess * g_subprocess_new (const gchar * const *argv, const gchar * const *envp, GError **error); This would actually create the subprocess (in exactly the way that g_thread_new() immediately starts the new thread). If that failed then we'd see NULL returned with error set accordingly. The issue of pipes would be handled quite easily: in the event that the process was not created, we have no GSubprocess and therefore no way to ask about the pipes (which don't exist). In the event that the process was created, we have pipes (or, to the user, G{Input,Output}Streams), accessible on the API. For more exotic uses (like requiring control over cwd, post-fork() hooks, etc) I'd imagine we have a reusable GSubprocessContext class. It could also store environment variables, which might be a handy abstraction. Then we'd have GSubprocess * g_subprocess_context_spawn (const gchar * const *argv, GError **error); as a way of spawning 'complicated' cases of subprocesses. I'm not sure that even this would be required, though...
That gets dangerously close to GAppLaunchContext
(In reply to comment #65) > In the event that the process was created, we have > pipes (or, to the user, G{Input,Output}Streams), accessible on the API. No - we need the ability to redirect input or output to /dev/null *before* spawning. To repeat from https://bugzilla.gnome.org/show_bug.cgi?id=672102#c50 I really hope you're not suggesting that we default to getting the subprocess output as a pipe, then implement /dev/null inside the launching process by reading and discarding it.... > For more exotic uses (like requiring control over cwd, post-fork() hooks, etc) How is control over cwd "exotic"? > I'd imagine we have a reusable GSubprocessContext class. It could also store > environment variables, which might be a handy abstraction. Then we'd have > > GSubprocess * g_subprocess_context_spawn (const gchar * const *argv, > GError **error); I assume the GSubprocessContext * class was meant to be in the argument list there? But I dunno...what's the ***reason*** why you want a separate context class? You think it's simpler to...implement? Understand? It's certainly not fewer lines of code. You'd be adding at this rate two new things to memory manage for C users (a GPtrArray for arguments ,and a context object). Those issues pale though besides the issue of the pipe setup.
Indeed, I was thinking that you'd either close() the fd (for the case of stdin, you want it to read EOF) or splice it to /dev/null -- in both cases, after the fact. Maybe you prefer a flags field for doing that up front *shrug*. I don't imagine that the performance of directing the console output of a spawned command to /dev/null is a pressing concern... If you think cwd is important it could be part of the arguments to _new(). And yes.. indeed, I meant to include a GSubprocessContext* argument.
(In reply to comment #65) > a) we didn't have stringlist-building functionality built in I don't understand what this means. (In reply to comment #65) > b) a GSubprocess represented a running or recently deceased subprocess; not > the potential to spawn a future subprocess Why do we care about the distinction? I don't think it's a very likely scenario where we want to keep a context around right after we launch the subprocess. Compare: static GSubprocess * launch_my_helper (void) { GSubprocessContext *context = g_subprocess_context_new (); GSubprocess *subprocess; g_subprocess_context_set_argv (context, "/usr/libexec/a-special-helper", "--force", "--replace", "--debug"); g_subprocess_context_set_stdout (context, G_SUBPROCESS_DEV_NULL); g_subprocess_context_set_stderr (context, G_SUBPROCESS_DEV_NULL); subprocess = g_subprocess_context_launch (context); g_object_unref (context); return subprocess; } vs. static GSubprocess * launch_my_helper (void) { GSubprocess *proc = g_subprocess_new (); g_subprocess_set_argv (proc, "/usr/libexec/a-special-helper"); g_subprocess_set_stdout (proc, G_SUBPROCESS_DEV_NULL); g_subprocess_set_stderr (proc, G_SUBPROCESS_DEV_NULL); g_subprocess_launch (proc); return proc; } What does the context buy us? Are we optimizing for the case where we're launching a lot of the same exact process lots of times? A GSubprocessFactory?
(In reply to comment #68) > Maybe you prefer a flags field for doing that up front *shrug*. I don't > imagine that the performance of directing the console output of a spawned > command to /dev/null is a pressing concern... Ugh...no. I can't believe you're even suggesting it. It just makes my stomach twist to even think of putting name on code that does that. And besides the performance problem, the other issue is - what happens when the parent exits before the child? All of a sudden the child would get EPIPE. Not ok. If you have a proposed function signature that includes pipe redirections, we can discuss it.
enum GSubprocessFlags { G_SUBPROCESS_FLAGS_NONE = 0, G_SUBPROCESS_FLAGS_STDIN_EOF = (1u << 0), G_SUBPROCESS_FLAGS_INHERIT_STDIN = (1u << 1), G_SUBPROCESS_FLAGS_STDOUT_TO_DEV_NULL = (1u << 2), G_SUBPROCESS_FLAGS_STDERR_TO_DEV_NULL = (1u << 3) }; GSubprocess * g_subprocess_new (const gchar *cwd, const gchar * const *argv, const gchar * const *envp, GSubprocessFlags flags, GError **error);
s/STD(...)_TO_DEV_NULL/SILENCE_STD\1/
Now that I think about it, it seems that the reasonable defaults are: - EOF for stdin - pass-through for stderr - for stdout pipe (from the argument of this process producing output for us to consume) or pass-through (from the argument of consistency with stderr). so we'd actually have flags along the lines of: - inherit stdin, stdin from pipe - silence stderr, stderr to pipe - silence stdout, stdout to pipe (or stdout to stdout, if we decide this way)
Ok so some background: I am writing some build software, and I'm in the process of switching it to gjs, so I can smoke some of the crack I'm selling and thus be sure it's quality. What I immediately ran into was that I really want g_subprocess_set_standard_output_file(). The build master program will run other programs as children, logging their output to files, and monitoring the log files for interesting things with GFileMonitor. I *could* just splice every child myself, but...as discussed before, not a fan =) Python's subprocess.py supports this in a hacky way - you can do e.g.: f = open('/tmp/foo', 'w') proc = subprocess.Popen(['/bin/sed', ...], stdin=otherfile, stdout=f) and it will attempt to extract the native file descriptor from the file object. Now, we *could* do something similar by saying that for local files, the streams returned from g_file_read() or whatever will be accepted by GSubprocess, but it's gross. Note it's actually impossible to implement g_subprocess_set_standard_output_file() in gjs becuase it has to be done inside the GSpawnChildSetupFunc, and gjs really can't support fork() (because the runtime uses threads). Where I'm getting at with this is that it's very likely we're going to want additional API beyond whatever subset we add initially to g_subprocess_new(). So that leads us to g_subprocess_new_full() but... =( I don't like much the idea of a whole new GObject which is just a bag of subprocess options. It's a style thing I guess... TL;DR: I find Python's subprocess API extremely practical and useful, and feel my original API proposals here were a good GObject-using take on it. You can think of the BUILDING state as replicating keyword arguments for C (and no, raw g_object_new() is not the same since there's no type checking).
On the disposition of stdin/stdout/stderr: one thing to note is that a good default depends on whether you're running a program sync or async. If you're running sync, it's quite reasonable to have all 3 streams be the same as the parent (notably including stdin). If you're running async, you probably want either /dev/null, pipes, or files. Does that lead us to g_subprocess_new_sync() and g_subprocess_new_async()?
(In reply to comment #74) > Ok so some background: I am writing some build software, and I'm in the process > of switching it to gjs, so I can smoke some of the crack I'm selling and thus > be sure it's quality. > > What I immediately ran into was that I really want > g_subprocess_set_standard_output_file(). The build master program will run > other programs as children, logging their output to files, and monitoring the > log files for interesting things with GFileMonitor. Is there something wrong with adding it, and making it take a GFile *? We can do some interesting things with this: * Instead of g_subprocess_set_standard_out_to_dev_null (), we can simply do g_subprocess_set_standard_output (g_file_new_for_path ("/dev/null"));, and detect that case (we could even add a special G_FILE_DEV_NULL that has that path encoded). * We can mark that we want to output a stream by passing a special GFile * marker. Maybe a little bit magic, but I like the stdout=subprocess.PIPE approached by Python, which is the same thing. > Where I'm getting at with this is that it's very likely we're going to want > additional API beyond whatever subset we add initially to g_subprocess_new(). > So that leads us to g_subprocess_new_full() but... =( I don't see how, to be honest. > I don't like much the idea of a whole new GObject which is just a bag of > subprocess options. It's a style thing I guess... I don't either. I don't see what it buys us, other than API complexities. > TL;DR: I find Python's subprocess API extremely practical and useful, and feel > my original API proposals here were a good GObject-using take on it. You can > think of the BUILDING state as replicating keyword arguments for C (and no, raw > g_object_new() is not the same since there's no type checking). There's a bit of that, but it's runtime type checking.
(In reply to comment #73) > - inherit stdin, stdin from pipe > - silence stderr, stderr to pipe > - silence stdout, stdout to pipe (or stdout to stdout, if we decide this way) Long story short, http://git.gnome.org/browse/glib/log/?h=wip/gsubprocess-ostreams has an entirely new GSubprocess API proposal. There's no BUILDING state, which has some advantages, like we can be a GInitable. The disadvantage though is that it's not fewer lines of code anymore since we have to allocate a GPtrArray... So while I think the new design has a lot of conceptual cleanliness, it does suffer a fair amount in terms of lines of code for each user.
We do we stand on this ?
Ryan was doing some work on updating my last branch to a new API. Ryan, can you upload the code somewhere and summarize the status?
Created attachment 220913 [details] [review] GSubprocess: make it build with i686-pc-mingw32 (mingw.org) (In reply to comment #77) > http://git.gnome.org/browse/glib/log/?h=wip/gsubprocess-ostreams > > has an entirely new GSubprocess API proposal. Had a go at this today in the hopes of it being useful for gtester (which is unix only atm) with i686-pc-mingw32 (mingw.org MinGW+MSYS) and had to apply this attached patch. There seems to be something weird going on though, almost like something got opened in text mode instead of binary mode: /c/dev/gnome.org/gnome-windows/checkout/glib/git/src/gio/tests $ ./gsubprocess /gsubprocess/noop: OK /gsubprocess/noop-all-to-null: OK /gsubprocess/noop-no-wait: OK /gsubprocess/noop-stdin-inherit: OK /gsubprocess/exit1: OK /gsubprocess/echo1: ** ERROR:gsubprocess.c:236:test_echo1: assertion failed (result == "hello\nworld!\n"): ("hello\r\nworld!\r\n" == "hello\nworld!\n") This application has requested the Runtime to terminate it in an unusual way. Please contact the application's support team for more information.
Ryan, can I get a ping on this?
For anybody still curious, the current status is at: http://git.gnome.org/browse/glib/log/?h=wip/gsubprocess-desrt After comments from Ryan at the Boston Summit. Why he didn't update the bug I'll never know...
There are two issues I see with the current gsubprocess-desrt. First, the GSubprocessFlags ends up being a mess. The disposition flags for each of stdin/stdout/stderr actively conflict; it never makes sense to specify multiple; so they're more like enums than flag members. They're just one property for the GObject, so that level is fine. Second, not exposing a child setup func is really problematic for a number of nontrivial users. Trying to cram everything we could do a GSpawnChildSetupFunc into flags just isn't going to work. It probably would be nice to capture some of the high level use cases, but how would we handle e.g. pseudoterminal calls like grantpt()? On the other hand, people probably don't know just how tricky writing one is. A quick grep shows e.g. brasero calling into GType from the child setup, meaning it can deadlock. Other users of child setup, from a quick grep: evolution-data-server: Calls setsid() gcr: Unsets FD_CLOEXEC from some fds gdm: Lots of stuff: signal handling, prctl, setuid... gnome-desktop: Carries an ancient copy of egg with egg_spawn gnome-user-share: Uses setexeccon() as a workaround for SELinux policy NetworkManager: setpgid(), signal handling tracker: setrlimit() for CPU limits vte: Lots of magic around pseudoterminals, signal handling, and proxies a further child setup API to vte users webkitgtk-tarball-releases: prctl(), setenv(LC_ALL) It's kind of lame to introduce a fancy new GSubprocess API that just can't do what more demanding process users need.
(In reply to comment #83) > Second, not exposing a child setup func is really problematic for a number of > nontrivial users. Trying to cram everything we could do a GSpawnChildSetupFunc > into flags just isn't going to work. It probably would be nice to capture some > of the high level use cases, but how would we handle e.g. pseudoterminal calls > like grantpt()? I'd like to use setrlimit in a couple of places from a child setup func as well, so this API isn't as good for me either.
Another obvious use of child_setup() is changing identity - including setting the audit_uid to make the audit trail convey that this or that action is done on behalf of another uid. This is often used in system daemons, especially when servicing unprivileged users.
Ok, http://git.gnome.org/browse/glib/log/?h=wip/gsubprocess-2.36 has a few new commits that do three things broadly: 1) Add child-setup-func and child-setup-data properties 2) Rename g_subprocess_new to g_subprocess_new_simple, and drop the "cwd" and "env" arguments. It turned out all of the tests were passing NULL for both. 3) Add some tests for passing a GFile and GFileDescriptorBased to GSubprocess
The GPtrArray stuff is a bit disappointing for C code. Why are we using that instead of using C99 initializers? char *args = { xmllint, "--nonet", "--noblanks", "--output", tmp_file, real_file, NULL }; GSubprocess *proc = g_subprocess_new_simple (args, 0, error);
I would find having a closure as a GObject property to be in poor taste. I find having two separate properties as pointers to a callback and user_data to be downright objectionable...
(In reply to comment #88) > I would find having a closure as a GObject property to be in poor taste. I > find having two separate properties as pointers to a callback and user_data to > be downright objectionable... Both are certainly a bit ugly, and without precedent (AFAICS) in Gio. While it would make sense to push back on G_TYPE_POINTER in other Gio APIs, the child setup is special. What do you suggest? I could do a closure property, and have a wrapper C API to make it less annoying for C users. Actually, a closure property isn't doable, because that could end up calling into the gtype machinery from the child setup, which could deadlock if another thread in the the parent happened to be holding the GType lock.
(In reply to comment #87) > The GPtrArray stuff is a bit disappointing for C code. Why are we using that > instead of using C99 initializers? > > char *args = { xmllint, "--nonet", "--noblanks", "--output", tmp_file, > real_file, NULL }; > GSubprocess *proc = g_subprocess_new_simple (args, 0, error); You *can* do that in a GSubprocess-using codebase, nothing stops you. But GLib needs to build in C89 mode for MSVC.
uh, C89 has array initializers...
(In reply to comment #90) > But GLib > needs to build in C89 mode for MSVC. http://blogs.gnome.org/rbultje/2012/09/27/microsoft-visual-studio-support-in-ffmpeg-and-libav/
(In reply to comment #89) > (In reply to comment #88) > > I would find having a closure as a GObject property to be in poor taste. I > > find having two separate properties as pointers to a callback and user_data to > > be downright objectionable... > > Both are certainly a bit ugly, and without precedent (AFAICS) in Gio. While it > would make sense to push back on G_TYPE_POINTER in other Gio APIs, the child > setup is special. > > What do you suggest? I could do a closure property, and have a wrapper C API > to make it less annoying for C users. So unless someone has any better ideas, I'll continue working on this with the pointer pair for child setup.
Created attachment 228644 [details] [review] gmain: Add private API to create Unix child watch that uses waitid() Rebased on master
Created attachment 228645 [details] [review] gio: Add private API to create win32 streams from fds No changes, just rebased
Created attachment 228646 [details] [review] gspawn: support creating pipes with O_CLOEXEC Add a new flag, G_SPAWN_CLOEXEC_PIPES, for creating the stdin/out/err pipes with O_CLOEXEC (for the usual reasons).
Created attachment 228647 [details] [review] GSubprocess: New class for spawning child processes There are a number of nice things this class brings: 0) On Unix, if WNOWAIT is available, has race-free termination API (we don't reap the child until the GSubprocess is finalized, so the GPid is always valid) 1) Operates in terms of G{Input,Output}Stream, not file descriptors 2) Async API instead of GSource 3) Makes some simple cases easy, like synchronously spawning a process with an argument list 4) Makes hard cases possible, like asynchronously running a process with stdout/stderr merged, output directly to a file path Much rewriting and code review from Ryan Lortie <desrt@desrt.ca>
Ok, so now we have a GSubprocessContext class that contains all the advanced options. There are also currently two "simple" functions: g_subprocess_new_simple_arg{l,v}. It feels pretty clean. But the API is less convenient to use for the medium-complexity cases, and less convenient for bindings. Although the context class was a win for spawning multiple of the same process...not sure how often that happens in reality though. I haven't added docs or updated the symbol list intentionally, since it's tedious work, but when this is looking like the final API, I'll finish that up.
Created attachment 228653 [details] [review] [rebase] Fix G_SPAWN_FILE_AND_ARGV_ZERO I can't think of a way to easily test this in the test suite, and doing some manual code self-review I noticed it was broken.
Comment on attachment 228644 [details] [review] gmain: Add private API to create Unix child watch that uses waitid() >+typedef enum { >+ _G_CHILD_WATCH_FLAGS_NONE = 0, >+ _G_CHILD_WATCH_FLAGS_WNOWAIT = (1 << 0) >+} _GChildWatchFlags; This is private ABI; we don't need to make it nicely extensible, because we can just change it completely later if we want something else. Just have g_child_watch_source_new_nowait(). > GSource * > g_child_watch_source_new (GPid pid) > { >+ return g_child_watch_source_new_with_flags (pid, 0); > } Hm... although that doesn't let you do this, and it seems like to avoid a race, you can't just have g_child_watch_source_new_nowait() call g_child_watch_source_new() and then edit it. You'd need a g_child_watch_source_new_unlocked() call that they both used... (Anyway, even if you did keep it like this, we still don't need a flags argument. It could just be "gboolean nowait").
Comment on attachment 228645 [details] [review] gio: Add private API to create win32 streams from fds >+G_GNUC_INTERNAL >+GInputStream * >+g_win32_input_stream_new_from_fd (gint fd, >+ gboolean close_fd); >+ >+GOutputStream * >+g_win32_output_stream_new_from_fd (gint fd, I assume it's a mistake that one is G_GNUC_INTERAL and the other isn't? >+ if (close (win32_stream->priv->fd) < 0) >+ { >+ g_set_error_literal (error, G_IO_ERROR, >+ g_io_error_from_errno (errno), >+ g_strerror (errno)); you need to errsv the errno (in both input and output)
Comment on attachment 228646 [details] [review] gspawn: support creating pipes with O_CLOEXEC >+ G_SPAWN_CLOEXEC_PIPES = 1 << 8 need to add that to the gtk-docs above as well. But otherwise looks good.
Attachment 228644 [details] pushed as 93bf37c - gmain: Add private API to create Unix child watch that uses waitid() Attachment 228645 [details] pushed as 292de8c - gio: Add private API to create win32 streams from fds Attachment 228646 [details] pushed as 2054cca - gspawn: support creating pipes with O_CLOEXEC
Created attachment 229004 [details] [review] GSubprocess: New class for spawning child processes Rebased to master, updated gio.symbols and gio-sections.txt Much rewriting and code review from Ryan Lortie <desrt@desrt.ca>
Review of attachment 228647 [details] [review]: Generally seems nice. Would need to see it in use in the real world to have a sense of whether the GSubprocessContext change is a win or not. Although the simplified API seems to mostly work for glib-compile-resources, so that's good... ::: gio/glib-compile-resources.c @@ +295,3 @@ { + int fd; + GSubprocess *proc; ugh. i was going to complain about mixed tabs and spaces, but the file is already totally inconsistent. still, mclasen prefers spaces for new code @@ -334,3 @@ - { - g_set_error (error, G_IO_ERROR, G_IO_ERROR_FAILED, - _("Error processing input file with xmllint:\n%s"), stderr_child); The new code loses the more detailed error messages of the original. ::: gio/gsubprocess.c @@ +210,3 @@ + worker_context = GLIB_PRIVATE_CALL (g_get_worker_context) (); + waitpid_source = g_child_watch_source_new (self->pid); + g_source_set_callback (waitpid_source, g_subprocess_unix_waitpid_dummy, NULL, NULL); g_source_set_dummy_callback (waitpid_source); @@ +320,3 @@ + + if (g_cancellable_set_error_if_cancelled (cancellable, error)) + return FALSE; you also should error out if self->context is NULL @@ +414,3 @@ + self->context->envp, + spawn_flags, + child_setup, &child_data, you can't use that child_setup() function on Windows, since it would get run in the calling process, not the child... gspawn has gspawn-win32-helper to deal with this stuff. You probably need to use that as well. @@ +441,3 @@ +/** + * g_subprocess_new: + * missing @context, @error ok, actually there are 2 more bugs just in this one doc comment, so I'm going to ignore the rest since you still haven't written most of the docs anyway @@ +449,3 @@ + * Since: 2.36 + */ +GLIB_AVAILABLE_IN_2_36 you only need that in the .h file @@ +470,3 @@ + * + * On some Unix versions, it is possible for there to be a race + * condition where waitpid() may have been called to collect the child the patches in bug 687061 could help to avoid that... @@ +598,3 @@ + GSource *cancellable_source; + + data->cancellable = g_object_ref (cancellable); you leak this, but more generically, you should just use GTask :) @@ +601,3 @@ + + cancellable_source = g_cancellable_source_new (cancellable); + g_source_add_child_source (source, cancellable_source); need to call "g_source_set_dummy_callback (cancellable_source);" (this gotcha brought to you by the separation of libglib and libgobject) hm... or maybe we should just change gmain.c to not require a callback to be set for child sources... @@ +700,3 @@ + context = g_main_context_new (); + g_main_context_push_thread_default (context); + pushed_thread_default = TRUE; you don't actually need pushed_thread_default. you can just key the pop off context != NULL ::: gio/gsubprocesscontext.c @@ +140,3 @@ + * Array of arguments passed to child process; must have at least + * one element. The first element has special handling - if it is + * an not absolute path ( as determined by g_path_is_absolute() ), s/an not/not an/ ::: gio/gsubprocesscontext.h @@ +46,3 @@ +/* Argument control */ +GLIB_AVAILABLE_IN_2_36 +void g_subprocess_context_set_args (GSubprocessContext *self, the spacing in the header files is generally wacky. eg, the spaces here between GSubprocessContext and *self don't serve to align it with anything else...
note that that's a review of the older patch, not the one you attached in between my reviews
(In reply to comment #105) > Review of attachment 228647 [details] [review]: > > Generally seems nice. Would need to see it in use in the real world Right...this is the kind of high level feedback I need now before I go the effort of dotting the "i"s and crossing the "t"s in the docs and such. So I dunno...just imagine using the API or something =) Actually, let me attach an (untested) gnome-shell patch which illustrates how much less gross it is.
Created attachment 229008 [details] [review] untested demonstration gnome-shell patch
(In reply to comment #107) > Right...this is the kind of high level feedback I need now before I go the > effort of dotting the "i"s and crossing the "t"s in the docs and such. > > So I dunno...just imagine using the API or something =) > > Actually, let me attach an (untested) gnome-shell patch which illustrates how > much less gross it is. I'd love to see the API applied to the code in gnome-settings-daemon, 21 different uses of g_spawn*(), including some (limited) stdout/stderr capture for the pk helpers.
Yeah, I'm sure it's much better than direct g_spawn, I was just saying I didn't know if it was better than the original patch with no GSubprocessContext
(In reply to comment #109) > (In reply to comment #107) > > Right...this is the kind of high level feedback I need now before I go the > > effort of dotting the "i"s and crossing the "t"s in the docs and such. > > > > So I dunno...just imagine using the API or something =) > > > > Actually, let me attach an (untested) gnome-shell patch which illustrates how > > much less gross it is. > > I'd love to see the API applied to the code in gnome-settings-daemon, 21 > different uses of g_spawn*(), including some (limited) stdout/stderr capture > for the pk helpers. I stuck a preparatory patch in https://bugzilla.gnome.org/show_bug.cgi?id=688354 A sample diff (also untested) will be attached here.
Created attachment 229015 [details] [review] untested gnome-settings-daemon patch Note...g_subprocess_context_newl is sitting in my local git repo.
Oops, the g-s-d patch also needs: g_subprocess_context_set_cwd (ctx, g_get_home_dir()) Though I doubt it matters since almost certainly the g-s-d cwd will be $HOME anyways.
(In reply to comment #103) > Attachment 228644 [details] pushed as 93bf37c - gmain: Add private API to create Unix > child watch that uses waitid() > Attachment 228645 [details] pushed as 292de8c - gio: Add private API to create win32 > streams from fds > Attachment 228646 [details] pushed as 2054cca - gspawn: support creating pipes with > O_CLOEXEC These three patches were causing at least two bugs (both spotted by Mitch on macos). They've been reverted for now, until we can figure out the issue.
Created attachment 229094 [details] [review] gio: Add private API to create win32 streams from fds No changes, just rebased on master
Created attachment 229095 [details] [review] gspawn: support creating pipes with O_CLOEXEC No changes, just rebased
Created attachment 229096 [details] [review] GSubprocess: New class for spawning child processes * Uses new GLib private API g_main_send_signal() to race-free send a signal to a child process * Includes g_subprocess_context_new{a,v} for convenience of C users. While we originally had g_ptr_array_addv(), I'd like to avoid having three separate allocations.
Created attachment 229097 [details] [review] glib-compile-resources: Port to GSubprocess It is cleaner code.
Review of attachment 229097 [details] [review]: ::: gio/glib-compile-resources.c @@ +295,3 @@ { + int fd; + GSubprocess *proc; You've got tab characters here instead of spaces (and others below).
Note I still didn't fix most of the style issues from https://bugzilla.gnome.org/show_bug.cgi?id=672102#c105 and https://bugzilla.gnome.org/show_bug.cgi?id=672102#c119 - I will, but if we decide to e.g. rename GSubprocessContext, move some things into object properties, or whatever, I want to do that first...
Still waiting for feedback, but in the meantime: http://git.gnome.org/browse/libgsystem/commit/?id=c17376d4acbddfa1909d24167d3ce24531b3db1a I already fixed one bug there found in real-world usage (stdout/stderr defaulted to NULL, not INHERIT).
(In reply to comment #114) > These three patches were causing at least two bugs (both spotted by Mitch on > macos). They've been reverted for now, until we can figure out the issue. might be from bug 690069
I've been working on GSubprocess the past couple of days. I finally got past API nit-picks and now I find some problems with implementation. The waiting is racy with respect to cancellation. If you cancel a wait that's in progress just as the child comes in, it will be reaped but we will still signal a cancellation (and possibly lose the exit status forever). This is an extension of a general problem with child watch sources: it is not safe to cancel a child watch because it may be signalled to dispatch but not yet have been check/dispatched (and there's no way to check this case from outside). So imagine the case where we create a GSubprocess, wait_async() it, cancel that wait, then finalize the subprocess (which is not difficult to imagine as a scenario). It's not safe to cancel the non-reaping first watch and reschedule it a second time, although we could handle those two cases separately: 1) Where WNOWAIT is handled: it's actually safe to cancel the watch in this case because we don't reap the process, so it's still there for us to reap later. 2) Where WNOWAIT is not handled, our original watch will reap the child, so we do not need to schedule a second watch to reap it, but we will need to keep the first one around... This means that in the second case we could have a GSource* that needs to persist after wait_async() is cancelled. OK. This is starting to get a bit weird... I think a better approach might be to add a dedicated API for dealing with this as a glib__private__. Or rather, a pair: 1) an API to register an (ideally) non-reaping watch and report it back to the GSubprocess from the glib worker thread (so that it can be absolutely synchronised with the call to waitpid()) 2) an API to tell gmain that the GSubprocess is going away and we should either reap the process if it's already exited or leave it in the to-be-reaped list but without a callback. Another interesting possibility: we could stop trying to call waitid(). waitid() only gets us race-free signal delivery in the case that we support it. We could just hold the unix signal lock and delivery the signal from gmain.c in the case that the child has not yet been reaped (or just deliver it from the worker thread). I recall some patches to this effect being discussed at some other time but I don't remember what happened there... In that case, we could simplify things quite a lot: when we create the GSubprocess we immediately register it with gmain and let gmain take a reference. When the child exits, gmain tells the GSubprocess (from the worker thread, in case the user does not call GSubprocess from a mainloop of their own) and then drops its reference. This means that instead of having the (possibly zombie) process existing for the lifetime of the GSubprocess (in the case where we have waitid) we turn the story around: the GSubprocess exists (possibly with the only ref held by gmain.c) for as long as the process exists. I kinda like the idea that the pid exists (in zombie state) for as long as GSubprocess is around. I don't like the fact that this complicates things so much and is not guaranteed to work on all systems... I'm still considering which is best....
Couldn't GSubprocess just create a GChildWatchSource right away when it spawns the process, and keep it around until the process exits or the GSubprocess is destroyed, and then g_subprocess_wait() would just use that rather than creating a new one, and if you cancel the wait, then it just stops watching the source, but leaves it around. > I kinda like the idea that the pid exists (in zombie state) for as long as > GSubprocess is around. I don't like the fact that this complicates things so > much and is not guaranteed to work on all systems... I guess another possibility would be that you could just have g_subprocess_get_pid(), which will return -1 after the process has exited, and any code that needs to avoid being racy can use that. (Although also, the fact that the process is guaranteed to be a zombie until the GSubprocess is finalized means that zombies may persist arbitrarily long when using a garbage collected language...)
(In reply to comment #124) > Couldn't GSubprocess just create a GChildWatchSource right away when it spawns > the process, and keep it around until the process exits or the GSubprocess is > destroyed, and then g_subprocess_wait() would just use that rather than > creating a new one, and if you cancel the wait, then it just stops watching the > source, but leaves it around. Ya... That's what I was thinking along the lines of... The dispatching on completion could just be done via idles to the main context on which the async call was done... > I guess another possibility would be that you could just have > g_subprocess_get_pid(), which will return -1 after the process has exited, and > any code that needs to avoid being racy can use that. Ya. I was thinking that this would be a given. We need to do this at least on systems that don't do WNOWAIT... > (Although also, the fact that the process is guaranteed to be a zombie until > the GSubprocess is finalized means that zombies may persist arbitrarily long > when using a garbage collected language...) This is a pretty good argument... Something hanging around in 'ps' for an arbitrarily long time is a pretty user-visible effect...
Okay. I forget whose idea it was but I remember the way to do race-free signal delivery now: send an idle to the glib worker thread and from that idle make sure we didn't get notified of the terminating child yet, only killing it if it is still alive. Also: send the original watch via the worker thread as well. Now we have strict sequence. The idle would only have to be lower priority than the child watch and we're guaranteed that if waitpid() gets called (in the glib worker thread) then we will first see the child notification sent to GSubprocess and then see our own request to kill it. We can easily work it out based on that.
http://git.gnome.org/browse/glib/commit/?h=wip/subprocess-2013&id=d8913abf3b05f701da53cbe39f661a68ffb4d217 seems like a lot of contortions to go back to global flags instead of per-stream disposition, but I still think per-stream is a nicer API. I do like the Context -> Launcher change though. For: http://git.gnome.org/browse/glib/commit/?h=wip/subprocess-2013&id=61156955e688e0350ed48da6c3dc199e0f52d2e2 * We go back to defaulting to not closing FDs. I would really love to be able to make that change, but I think it is going to bite nontrivial users badly in the ass when they blindly port from g_spawn_async() to GSubprocess and end up leaking file descriptors into child processes. It's a really non-obvious semantic change. Think about projects like NetworkManager that get fds from random shitty libraries like libnl, pygobject programs that pull in random shitty python C extension libraries. Maybe we can just document it very obviously and go with the change. * g_source_set_callback (source, (GSourceFunc) g_subprocess_exited, g_object_ref (self), g_object_unref); Kind of odd that the source holds a strong ref, I'd expect it to be weak. * +g_subprocess_dispatch_signal Looks like you need some ifdefs G_OS_UNIX here. * +static GSList *subprocess_watches; I am confused...is there a patch missing to glib-private.[ch] or something? I don't see what modifies this list.
(In reply to comment #127) > http://git.gnome.org/browse/glib/commit/?h=wip/subprocess-2013&id=d8913abf3b05f701da53cbe39f661a68ffb4d217 > > seems like a lot of contortions to go back to global flags instead of > per-stream disposition, but I still think per-stream is a nicer API. I do like > the Context -> Launcher change though. Discussed this on IRC. Flags are pretty much always a win in terms of lines of code and clarity when reading code calling GSubprocess APIs even if they do introduce some theoretical conceptual problems (ie: what does it mean if we set both STDOUT_PIPE and STDOUT_SILENCE?) The tradeoff favours keeping flags. > For: > http://git.gnome.org/browse/glib/commit/?h=wip/subprocess-2013&id=61156955e688e0350ed48da6c3dc199e0f52d2e2 > > * We go back to defaulting to not closing FDs. I would really love to be able > to make that change, but I think it is going to bite nontrivial users badly in > the ass when they blindly port from g_spawn_async() to GSubprocess and end up > leaking file descriptors into child processes. It's a really non-obvious > semantic change. Think about projects like NetworkManager that get fds from > random shitty libraries like libnl, pygobject programs that pull in random > shitty python C extension libraries. > > Maybe we can just document it very obviously and go with the change. I will do this, also suggesting the use of a child setup function if it's a problem. If this really turns into a bad situation then we can add a flag for it. I think the default should be to assume that libraries are well-behaved, though, even if this definition of 'good behaviour' is relatively new. > * g_source_set_callback (source, (GSourceFunc) g_subprocess_exited, > g_object_ref (self), g_object_unref); > Kind of odd that the source holds a strong ref, I'd expect it to be weak. Ya... I take a strong reference here because it simplifies the state tracking of GSubprocess. I wrote a comment about that. In reality, though, a GWeakRef would actually work fairly well here, I guess... Probably it would mean that we want to take an extra ref when doing async operations, then. At least that way we can be sure the list of pending watches is empty on finalize. > * +g_subprocess_dispatch_signal > Looks like you need some ifdefs G_OS_UNIX here. I thought I did. Will look again. > * +static GSList *subprocess_watches; > > I am confused...is there a patch missing to glib-private.[ch] or something? I > don't see what modifies this list. This was from some earlier experiments with trying to keep the waitid() approach before I came to the conclusion that it would be much easier to just entirely avoid modifying gmain.c and switch to the new approach. I'll nuke this part when I do my cleanup rebase.
(In reply to comment #128) > > * We go back to defaulting to not closing FDs... > > > > Maybe we can just document it very obviously and go with the change. > > I will do this, also suggesting the use of a child setup function if it's a > problem. > > If this really turns into a bad situation then we can add a flag for it. I > think the default should be to assume that libraries are well-behaved, though, > even if this definition of 'good behaviour' is relatively new. add this to .gdbinit: set $F_GETFD = 1 set $FD_CLOEXEC = 1 define nocloexec set $fd = 1 while $fd < 255 set $flags = fcntl ($fd, $F_GETFD) if $flags != -1 && !($flags & $FD_CLOEXEC) printf "fd %d is not CLOEXEC\n", $fd end set $fd = $fd + 1 end end then attach to random processes and type "nocloexec". Looks like there's still a lot of bad behavior. (Also, looks like I have a GNetworkMonitor bug to fix.) And anyway, it seems like it's very rare that you'd actually want the behavior "leave all fds open in the child no matter where they came from". You'd generally only want a specific set of fds that you'd set up to be passed on. So why not make the callers specify the fds to keep open, and close all the rest?
(In reply to comment #129) > > then attach to random processes and type "nocloexec". Looks like there's still > a lot of bad behavior. One thing to keep in mind is it's not just libraries or programs that are forgetting to set O_CLOEXEC. In a multithreaded library like GLib, it's libraries and programs that aren't doing it *atomically*. The set of programs that call plain old unix pipe() instead of g_unix_open_pipe(FD_CLOEXEC) or equivalent internal wrapper is likely quite huge. For example, I just downloaded the Python-2.7.3 release tarball, and there in Modules/posixmodule.c is racy legacy code that pairs pipe() with fcntl(.., FD_CLOEXEC). And think about stuff like PAM modules which get linked into the GDM process. > And anyway, it seems like it's very rare that you'd actually want the behavior > "leave all fds open in the child no matter where they came from". You'd > generally only want a specific set of fds that you'd set up to be passed on. So > why not make the callers specify the fds to keep open, and close all the rest? Yeah, that's basically what G_SPAWN_LEAVE_DESCRIPTORS_OPEN is at a low level - we could add an explicit g_subprocess_keep_fd() API that would do what you're saying more nicely though.
Besides Python, if you just use some basic "git grep" around GNOME, you'll find stuff like: https://bugzilla.gnome.org/show_bug.cgi?id=692330 Oh and raw pipe() calls aside, everyone invoking g_spawn_async_with_pipes will need to update their calls to include G_SPAWN_CLOEXEC_PIPES which they can't even do because it's not in glib yet. GLib being the low level library that it is that gets reused in a wide variety of situations, I think strongly argues for conservatism, particularly since no one has shown the cost of the internal fcntl(FD_CLOEXEC) loop to be expensive in a real-world workload. Consider all the work that simply invoking another process involves in the dynamic linker...
I really don't like workarounds for problems. What if we came up with some clever method of using GSubprocess to yell at people who are doing the wrong thing? Dan's discovery above was a good thing, and it wouldn't have happened unless we were having this discussion now... I'd rather make it easier for people to find and fix these problems than to continue to work around them... The only compelling argument I take from the above is that it may be impossible to do this atomically on non-Linux platforms and someone could be opening an fd in another thread. That's a pretty thin race, but a race none-the-less, and with no simple fix.
(In reply to comment #132) > I really don't like workarounds for problems. Keep in mind that while the core leaking problem has existed forever, it became significantly worse with the introduction of threads. It's quite possible that when e.g. that Python code was written, threads didn't exist. Ultimately, if we have an explicit interface to retain fds, then I wouldn't call the post-fork FD_CLOEXEC loop a workaround, just the necessary way to implement things on Unix.
(In reply to comment #133) > Ultimately, if we have an explicit interface to retain fds, then I wouldn't > call the post-fork FD_CLOEXEC loop a workaround, just the necessary way to > implement things on Unix. I start to see it this way... But why do we bother at all with CLOEXEC, then?
(In reply to comment #134) > (In reply to comment #133) > > Ultimately, if we have an explicit interface to retain fds, then I wouldn't > > call the post-fork FD_CLOEXEC loop a workaround, just the necessary way to > > implement things on Unix. > > I start to see it this way... > > But why do we bother at all with CLOEXEC, then? To unbreak software that does plain fork()+exec(), like Firefox.
When doing further work using my old GSubprocess branch, I've run into the issue that I want to open a "nonstandard" pipe between the parent and child - e.g. a pipe where the child gets fd 3 (or 42, whatever), and I pass --with-password-fd=3 to tell it which one to use. Prior art here is for things like qemu's -net tap,fd= which should be a fd for a host tap device. gpg has --status-fd for receiving structured output, etc. The problem is that I can't easily make it a method on GSubprocessLauncher because after that method makes a pipe, I get the fd number, and then need to append it as an argument. But I can't do that because the launcher's argv is immutable. Only options I can think of: 1) Add gs_subprocess_launcher_keep_fd(), and create the pipe externally with g_unix_open_pipe(). The problems with this are that it's unnecessarily Unix specific, but worse g_unix_open_pipe() isn't really introspectable because it uses the raw FD_CLOEXEC flag, and I don't know what that is from gjs. One cooler option I'd like to add is one to make a socketpair, so you have one fd that's both readable and writable. On Windows there's some equivalent of this. If we have a a proper abstraction the same method could work on Unix and Windows. 2) Add gs_subprocess_argv_append(), more generally allow mutation of argv. This is probably what I'll do, since after all you can change everything else too.
(In reply to comment #136) > 1) Add gs_subprocess_launcher_keep_fd(), and create the pipe externally with > g_unix_open_pipe(). The problems with this are that it's unnecessarily Unix > specific Yeah, but in the Windows case you'd probably really rather have gs_subprocess_launcher_keep_handle() anyway since that's more generic/idiomatic. > but worse g_unix_open_pipe() isn't really introspectable because it > uses the raw FD_CLOEXEC flag, and I don't know what that is from gjs. Seems like that needs to be fixed anyway. (GLIB_CHECK_VALUE() -> GLIB_SYSDEF_FD_CLOEXEC -> G_UNIX_FD_CLOEXEC) > One cooler option I'd like to add is one to make a socketpair, so you have one > fd that's both readable and writable. On Windows there's some equivalent of > this. If we have a a proper abstraction the same method could work on Unix and > Windows. Hm... In the parent: iostream = g_subprocess_launcher_create_ipc_stream (launcher, &stream_id); argv[1] = "--with-password-fd"; argv[2] = stream_id; g_subprocess_launcher_set_args (launcher, argv) and in the child: iostream = g_ipc_io_stream_new_from_id (argv[2]); ? (stream_id would be g_strdup_printf ("%d", fd) on unix and g_strdup_printf ("%p", handle) on windows.)
Created attachment 249428 [details] [review] GSubprocess: New class for spawning child processes Latest changes, and rebased to master again.
Review of attachment 249428 [details] [review]: Basic preliminary review. I'd like Colin to take a look at this too. ::: gio/gsubprocess.c @@ +437,3 @@ + + s = snprintf (self->identifier, sizeof self->identifier, "%"G_GUINT64_FORMAT, identifier); + g_assert (0 < s && s < sizeof self->identifier); I believe sizeof needs to have parens for MSVC quirks. @@ +526,3 @@ + */ +GSubprocess * +g_subprocess_new (GSubprocessFlags flags, I complained about this to Ryan on IRC, but I'm not happy with the inconsistency in this API vs. subprocess_newv. I think that the one-line savings we get with varargs isn't worth the inconsistency. Considering that C99 supports compound literals ((char *[]) { "foo", "bar", NULL }), and that we only lose a line to the initializer, the consistency in the API is much more important. @@ +581,3 @@ + +const gchar * +g_subprocess_get_identifier (GSubprocess *self) This needs documentation. Why is it a gchar? Will it always be just the pid? @@ +884,3 @@ + * the process after calling this function. + * + * On Unix, this function sends %SIGKILL. SIGKILL or SIGQUIT? @@ +913,3 @@ + * followed by g_subprocess_get_exit_status(). + * + * It is an error to call this function before g_subprocess_wait() has Will we return -1 or anything well-defined like that? @@ +916,3 @@ + * returned. + * + * Returns: the (meaningless) waitpid() exit status from the kernel Meaningless? Why do we have a function to retrieve meaningless stuff? @@ +950,3 @@ + g_return_val_if_fail (self->pid == 0, FALSE); + + return WIFEXITED (self->status) && WEXITSTATUS (self->status) == 0; This and the rest of them should have a win32 implementation. @@ +970,3 @@ + **/ +gboolean +g_subprocess_get_if_exited (GSubprocess *self) Hm, not too excited about this name. @@ +1030,3 @@ + +/** + * g_subprocess_get_term_sig: This is all low-level stuff. I think we should have a way to get the low-level "meaningless" status for a platform, and then simply have is_exited, get_exit_code, was_killed, etc. Think about what high-level operations we care about, then encode those into the API, allowing users to do something low-level if we want that. We can always add new API later, but we can't take it out, and we really need to think about Win32 support. @@ +1132,3 @@ + if (str->len + COMMUNICATE_READ_SIZE > str->allocated_len) + { + str->allocated_len = MAX(COMMUNICATE_READ_SIZE, str->allocated_len * 2); We shouldn't keep growing by 2 infinitely, I don't think. ::: gio/gsubprocess.h @@ +1,3 @@ +/* GIO - GLib Input, Output and Streaming Library + * + * Copyright (C) 2012 Colin Walters <walters@verbum.org> Drop the (C), update the year, and you should probably add your name, Ryan. ::: gio/gsubprocesslauncher.c @@ +121,3 @@ + GSubprocessLauncher *launcher = G_SUBPROCESS_LAUNCHER (object); + + g_assert (prop_id == 1); I don't know if gobject checks to make sure the property id is valid, but considering that we have G_OBJECT_WARN_INVALID_PROPERTY_ID, I'm imagining that some binding poking at property IDs might crash, and that's wrong. @@ +176,3 @@ + g_object_class_install_property (gobject_class, 1, + g_param_spec_flags ("flags", "Flags", "GSubprocessFlags for launched processes", + G_TYPE_SUBPROCESS_FLAGS, 0, G_PARAM_WRITABLE | It's weird to have a write-only property. Why can't we read it back?
(In reply to comment #139) > Considering that C99 supports compound literals ((char *[]) { "foo", "bar", > NULL }), and that we only lose a line to the initializer, the consistency in > the API is much more important. FWIW I saw something recently suggesting that MS is finally planning to add some C99/C11 syntax to MSVC, though I can't find that now...
At a very high level the new code looks good. Did you have any comments about my need in https://bugzilla.gnome.org/show_bug.cgi?id=672102#c136 ? That can come later though.
Review of attachment 249428 [details] [review]: ::: gio/gsubprocess.c @@ +1565,3 @@ + +/** + * g_subprocess_communicate_bytes: Why isn't this just named g_subprocess_communicate()? When would you want the char*/gsize pair version? Now I do see a potential use for a UTF8-validating version, since doing stuff like spawning "git rev-parse" is something I do a lot - and if that happened to spew garbage, I want to not return garbage to the user's terminal/GtkTextView. I had something like that in a long ago version of this code.
https://git.gnome.org/browse/glib/log/?h=wip/le-gsubprocess is now updated. I added commit messages to describe my changes, although I think we should just rebase it all into one patch for ultimate landing.
as an aside, one thing i've found useful in the past is to keep individual commits with fine-grained messages, but before pushing to master do, e.g., $ git merge --no-ff --no-commit wip/le-gsubprocess $ git commit Then enter a message describing the changeset as a whole. This will make cgit show all the micro commits squashed together into one big diff, with the as-a-whole message describing the big diff for the merge commit entry, but will also show the individual micro commits as separate entries.
(In reply to comment #144) > as an aside, one thing i've found useful in the past is to keep individual > commits with fine-grained messages, but before pushing to master do, e.g., > > $ git merge --no-ff --no-commit wip/le-gsubprocess > $ git commit pretty sure just "git merge --no-ff wip/le-gsubprocess" works we've started doing this in NM too
Attachment 229094 [details] pushed as c515c3e - gio: Add private API to create win32 streams from fds Attachment 249428 [details] pushed as 5b48dc4 - GSubprocess: New class for spawning child processes OK. Done.