GNOME Bugzilla – Bug 691581
g_output_stream_real_splice_async doesn't use overriden read/write_async functions
Last modified: 2013-09-29 21:49:37 UTC
Currently g_output_stream_real_splice_async runs the sync splice function in a new thread. This ignores any subclasses' custom read_async or write_async functions which can lead to complicated workarounds to safely handle reading from say a GInputStream subclass in a new thread. I'd like to implement this to call each class's individual read_async and write_async. However the comment in the code suggests this should only be done when both the read and write async functions are overridden (gio/goutputstream.c:1549). Wouldn't this also be good to do even if only one of them are overridden? Especially the input stream's?
Yes
Created attachment 236604 [details] [review] Patch to use the async read/write functions of the invovled streams Here's my patch for this issue. I'm a little concerned about the buffer size being 8192 bytes. Being that it's allocated every time g_output_stream_splice_async() is called, I'm thinking that may kill performance on smaller splices. The sync version uses that size buffer, but it's on the stack so it's not as expensive. Thoughts/comments welcome.
Comment on attachment 236604 [details] [review] Patch to use the async read/write functions of the invovled streams > Here's my patch for this issue. I'm a little concerned about the buffer size > being 8192 bytes. Being that it's allocated every time > g_output_stream_splice_async() is called, I'm thinking that may kill > performance on smaller splices. The sync version uses that size buffer, but > it's on the stack so it's not as expensive. Thoughts/comments welcome. Are you really expecting to do that much splicing that it would become a bottleneck? Anyway, I don't have any better suggestions. >Subject: [PATCH 1/3] GOutputStream: Remove unused SpliceUserData struct it's better to attach each patch separately. (git-bz (git://git.fishsoup.net/git-bz) makes this easy) anyway, this first patch is clearly right. (Do you have a gnome git account? If so, you can commit this patch now.) >Subject: [PATCH 2/3] GOutputStream: Split _close_async for internal use >+_real_close_async_cb (GObject *source_object, I don't know why _g_output_stream_close_internal() has a leading underscore, but it goes against the convention in the rest of the file (and most of glib). Drop the leading underscores on your functions. (Both in this patch and the next.) (You can fix _g_output_stream_close_internal() too if you want...) >+ if (stream->priv->closed) >+ { >+ g_task_return_boolean (task, TRUE); You have this case in both _close_async() and _close_async_internal() now. You could just remove it from _close_async(). Also, I think it would make sense to have a _finish() function to go with _close_async_internal(), rather than having people call _close_async_internal() but then muck with its GTask directly to get the result. Naming-wise, this would probably work better as g_output_stream_internal_close_async() and g_output_stream_internal_close_finish(). >Subject: [PATCH 3/3] GOutputStream: Use async read/write of streams in > splice_async() So there's at least one crasher bug in this patch: >+ g_input_stream_close_async (op->source, g_task_get_priority (task), >+ g_task_get_cancellable (task), _real_splice_async_close_input_cb, op); "op" should be "task" at the end there. This suggests that we need more splice_async test cases in gio/tests/ to go with this new code. (Or, possibly, it just means that you didn't run "make check"...) Also, here and in lots of other places, the second and later lines should be indented all the way to the "(": g_input_stream_close_async (op->source, g_task_get_priority (task), g_task_get_cancellable (task), real_splice_async_close_input_cb, task); >- /* TODO: In the case where both source and destintion have >- non-threadbased async calls we can use a true async copy here */ In the case where both of them are going to do fallback, it would be better to still use the sync-in-a-thread version. (Or even better: to do the copying synchronously in a thread, but then call real_splice_async_complete() back in the original thread, and do the closes asynchronously there.) There's no "g_input_stream_async_read_is_via_threads()" currently, but you could split the logic out of g_input_stream_read_async(), and likewise with GOutputStream to create them.
(In reply to comment #3) > (From update of attachment 236604 [details] [review]) > > Here's my patch for this issue. I'm a little concerned about the buffer size > > being 8192 bytes. Being that it's allocated every time > > g_output_stream_splice_async() is called, I'm thinking that may kill > > performance on smaller splices. The sync version uses that size buffer, but > > it's on the stack so it's not as expensive. Thoughts/comments welcome. > > Are you really expecting to do that much splicing that it would become a > bottleneck? > > Anyway, I don't have any better suggestions. Not personally, no. Just brainstorming potential issues. > >Subject: [PATCH 1 [details]/3] GOutputStream: Remove unused SpliceUserData struct > > it's better to attach each patch separately. (git-bz > (git://git.fishsoup.net/git-bz) makes this easy) > > anyway, this first patch is clearly right. (Do you have a gnome git account? If > so, you can commit this patch now.) Done. > >Subject: [PATCH 3 [details]/3] GOutputStream: Use async read/write of streams in > > splice_async() > > So there's at least one crasher bug in this patch: > > >+ g_input_stream_close_async (op->source, g_task_get_priority (task), > >+ g_task_get_cancellable (task), _real_splice_async_close_input_cb, op); > > "op" should be "task" at the end there. > > This suggests that we need more splice_async test cases in gio/tests/ to go > with this new code. (Or, possibly, it just means that you didn't run "make > check"...) It appears that g_io_stream_splice() as used in the test cases doesn't use g_output_stream_splice()'s close flags. I'll add a test for that. I'll address the rest of the comments and cook up some new patches in the near future. Thanks for the review.
Created attachment 237302 [details] [review] GInputStream: Add g_input_stream_async_read_is_via_threads() In implementing a better g_output_stream_splice_async() and possibly other situtations it's helpful to know whether the input stream's read function internally uses threads. If it and the output stream's write async functions use threads, then the splice function could spawn a single thread for better efficiency. This patch adds a function to determine whether an input stream's g_input_stream_read_async() function internally uses threads.
Created attachment 237303 [details] [review] GOutputStream: Add g_output_stream_async_write_is_via_threads() In implementing a better g_output_stream_splice_async() and possibly other situtations it's helpful to know whether the output stream's write function internally uses threads. If it and the input stream's read async functions use threads, then the splice function could spawn a single thread for better efficiency. This patch adds a function to determine whether an output stream's g_output_stream_write_async() function internally uses threads.
Created attachment 237304 [details] [review] tests: Add testcases for g_output_stream_splice_async() Previously, no testcases tested the close flags of g_output_stream_splice_async. This patch adds tests for that and also tests various combinations of threaded and non-threaded GInputStream async reads and GOutputStream async writes.
Created attachment 237305 [details] [review] GOutputStream: Rename _g_output_stream_close_internal() for consistency
Created attachment 237306 [details] [review] GOutputStream: Split _close_async for internal use Refactor g_output_stream_close_async() into itself and an internal variant for potential use inside other operations (splice_async). The internal version must be called between g_output_stream_set_pending() and g_output_stream_clear_pending().
Created attachment 237307 [details] [review] GOutputStream: Use async read/write of streams in splice_async() There are some corner cases where using the sync version of read/write in a thread could cause thread-safety issues. In these cases it's possible to override the output stream's splice_async() function, but for input streams one would need to do some acrobatics to stay thread-safe. Alternatively, some implementations may not even override their sync read/write functions. This patch refactors the default splice_async() implementation to call the sync read and write functions in a thread only when both async versions are thread-based. When one or both are non-threaded, it calls the virtual write_async() and read_async() functions of the involved streams within the same thread.
(In reply to comment #3) > it's better to attach each patch separately. (git-bz > (git://git.fishsoup.net/git-bz) makes this easy) Thanks for this suggestion. It did indeed make this much easier. > >Subject: [PATCH 3 [details]/3] GOutputStream: Use async read/write of streams > > in splice_async() > > In the case where both of them are going to do fallback, it would be better to > still use the sync-in-a-thread version. (Or even better: to do the copying > synchronously in a thread, but then call real_splice_async_complete() back in > the original thread, and do the closes asynchronously there.) I opted to fallback to the old functionality if both async functions are threaded. I'm willing to improve it to transfer the data in a single thread and then close asynchronously, but I want to make sure this first set is correct and makes it in.
Is there something more I can do to help move this along?
Comment on attachment 237302 [details] [review] GInputStream: Add g_input_stream_async_read_is_via_threads() >+ * Returns: %TRUE if @stream's read_async function uses threads. >+ **/ This would need "Since: 2.36" as well. But since it's very late in the release cycle, I'd prefer to add this as private API for now; you can create an uninstalled "gioprivate.h" header to prototype this and the corresponding output stream function. >+ return (class->read_async == g_input_stream_real_read_async && >+ !(G_IS_POLLABLE_INPUT_STREAM (stream) && >+ g_pollable_input_stream_can_poll (G_POLLABLE_INPUT_STREAM (stream)))); the last line should be indented two spaces more (so it's inside the "(" on the previous line)
Comment on attachment 237303 [details] [review] GOutputStream: Add g_output_stream_async_write_is_via_threads() same comments as on the input stream patch, plus you should update g_output_stream_real_write_async() to use the new method.
Comment on attachment 237304 [details] [review] tests: Add testcases for g_output_stream_splice_async() just nitpicks here: >+test_copy_chunks_splice_cb (GObject *source, >+ GAsyncResult *res, >+ gpointer user_data) line up the args: GObject source, GAsyncResult *res, gpointer user_data) >+ g_assert (g_input_stream_async_read_is_via_threads (data.istream) == >+ ((data.flags & TEST_THREADED_ISTREAM) != 0)); ah... well, you wouldn't be able to do this if that was private API, but that's OK I guess. >+ /* We do not hold a ref in data struct, this is to make sure the operation >+ * keeps the iostream objects alive until it finishes */ put the "*/" on its own line
Comment on attachment 237306 [details] [review] GOutputStream: Split _close_async for internal use >+/* Must always be called inside >+ * g_output_stream_set_pending()/g_output_stream_clear_pending(). */ fix the */ again
Created attachment 242496 [details] [review] GInputStream: Add g_input_stream_async_read_is_via_threads() In implementing a better g_output_stream_splice_async() and possibly other situtations it's helpful to know whether the input stream's read function internally uses threads. If it and the output stream's write async functions use threads, then the splice function could spawn a single thread for better efficiency. This patch adds a function to determine whether an input stream's g_input_stream_read_async() function internally uses threads.
Created attachment 242497 [details] [review] GOutputStream: Add g_output_stream_async_write_is_via_threads() In implementing a better g_output_stream_splice_async() and possibly other situtations it's helpful to know whether the output stream's write function internally uses threads. If it and the input stream's read async functions use threads, then the splice function could spawn a single thread for better efficiency. This patch adds a function to determine whether an output stream's g_output_stream_write_async() function internally uses threads.
Created attachment 242498 [details] [review] tests: Add testcases for g_output_stream_splice_async() Previously, no testcases tested the close flags of g_output_stream_splice_async. This patch adds tests for that and also tests various combinations of threaded and non-threaded GInputStream async reads and GOutputStream async writes.
Created attachment 242499 [details] [review] GOutputStream: Rename _g_output_stream_close_internal() for consistency
Created attachment 242500 [details] [review] GOutputStream: Split _close_async for internal use Refactor g_output_stream_close_async() into itself and an internal variant for potential use inside other operations (splice_async). The internal version must be called between g_output_stream_set_pending() and g_output_stream_clear_pending().
Created attachment 242501 [details] [review] GOutputStream: Use async read/write of streams in splice_async() There are some corner cases where using the sync version of read/write in a thread could cause thread-safety issues. In these cases it's possible to override the output stream's splice_async() function, but for input streams one would need to do some acrobatics to stay thread-safe. Alternatively, some implementations may not even override their sync read/write functions. This patch refactors the default splice_async() implementation to call the sync read and write functions in a thread only when both async versions are thread-based. When one or both are non-threaded, it calls the virtual write_async() and read_async() functions of the involved streams within the same thread.
Sorry for the delay. I've updated the above patches per your comments.
Comment on attachment 242498 [details] [review] tests: Add testcases for g_output_stream_splice_async() >+ g_assert (g_file_get_contents (data->output_path, &received_data, >+ &length, NULL)); It's better to do: g_file_get_contents (data->output_path, &received_data, &length, &error); g_assert_no_error (error); because in that case, if it fails, the assertion message will include the error message, making it clearer exactly what failed >+ file = g_file_new_tmp ("test-inputXXXXXX", &stream, NULL); >+ g_file_set_contents (data.input_path, data.data, strlen (data.data), >+ NULL); >+ data.istream = G_INPUT_STREAM (g_file_read (file, NULL, NULL)); likewise throughout; it's not hard to use &error everywhere and add a "g_assert_no_error (error);", after each one. (sorry, should have caught this the first time)
patches don't apply cleanly, and git-bz warns about them introducing whitespace errors.
fixed up and pushed to master Attachment 242496 [details] pushed as 94a232a - GInputStream: Add g_input_stream_async_read_is_via_threads() Attachment 242497 [details] pushed as dec3bfe - GOutputStream: Add g_output_stream_async_write_is_via_threads() Attachment 242498 [details] pushed as e967a76 - tests: Add testcases for g_output_stream_splice_async() Attachment 242499 [details] pushed as 416ca8a - GOutputStream: Rename _g_output_stream_close_internal() for consistency Attachment 242500 [details] pushed as 87e5617 - GOutputStream: Split _close_async for internal use Attachment 242501 [details] pushed as 4e9e7d0 - GOutputStream: Use async read/write of streams in splice_async()