GNOME Bugzilla – Bug 604086
Use splice(2) when doing local file copies
Last modified: 2010-02-15 12:27:57 UTC
When using splice() for local file copies we can transfer data between file descriptors without copying the data from kernel- to user-space.
This will happen in g_local_file_copy (), and I already started on this task.
I was thinking about this a little when working on gsocket and thinking about splice/sendfile and related APIs on other platforms. One idea I had was that we could add a GFileDescriptorBased interface, which would be implemented by G{Input,Output}StreamUnix, GLocalFile{Input,Output}Stream, GSocket{Input,Output}Stream, and anything else that's backed by a file descriptor, and then g_output_stream_splice() could check to see if the two streams you passed were splice(2)-compatible (or sendfile(2)-compatible, etc) and do that if appropriate. And on the subject of "splice(2)-compatible", the splice man page says "one of the descriptors must refer to a pipe" which would seem to make it inappropriate for g_local_file_copy()? (Unless the man page is just wrong.)
Having an interface for that seems nice. For the "must refer to a pipe" thing, i think you are misunderstanding how splice() works. You're not meant to splice directly from a to b and have the kernel do this all by itself. Instead it uses the traditional copy loop: while (read(a, buffer) != eof) { while (more in buffer) write (b, buffer) } However, instead of allocating buffer in userspace we allocate the buffer in kernel space by opening a pipe and using the pipe buffer as storage, then we read turns to splice into pipe and write to splice from pipe.
After a short discussion with alex, we both agreed that having an interface for fd based objects is indeed the way to go, and I am on that. I'll post patches.
Created attachment 153204 [details] [review] GFileDescriptorBased interface
Created attachment 153205 [details] [review] Implement interface in GLocalFileInputStream
Created attachment 153206 [details] [review] Implement interface in GLocalFileOutputFile
Created attachment 153207 [details] [review] Add support for splice(2)
(In reply to comment #8) > Created an attachment (id=153207) [details] [review] > Add support for splice(2) >+ total_size = -1; >+ /* avoid performance impact of querying total size when it's not needed */ >+ if (progress_callback) >+ { >+ struct stat sbuf; >+ >+ if (fstat (fd_in, &sbuf) == 0) >+ total_size = sbuf.st_size; >+ } >+ >+ if (total_size == -1) >+ total_size = 0; Small note: The "== -1" belongs inside the if().
(In reply to comment #5) > Created an attachment (id=153204) [details] [review] > GFileDescriptorBased interface + * g_file_descriptor_based_get_handle: + * @fd_based: a #GFileDescriptorBased. + * + * Gets the underlying file descriptor. + * + * Returns: The file descriptor + **/ +int +g_file_descriptor_based_get_handle (GFileDescriptorBased *fd_based) There's a "Since: 2.24" missing.
Created attachment 153397 [details] [review] GFileDescriptorBased interface
Created attachment 153398 [details] [review] Add support for splice(2)
Created attachment 153403 [details] [review] GFileDescriptorBased interface Add "Since: 2.24" to docs.
I think gfiledescriptorbased.[ch] should be in unix_sources/giounixinclude_HEADERS, to make it clear that it's only for real unix file descriptors, so we don't have to worry about exactly which kind of "file descriptor" it uses on Windows. (and I'd call the getter "g_file_descriptor_based_get_fd", not "get_handle", because "handle" sounds like you're trying to be abstract, and we're not).
agree with dan Also, splice_stream_with_progress() should be able to fail at runtime (if splice is not supported on e.g. the kernel or a specific filesystem, or if pipe() fails) and then fall back to copy_stream_with_progress. Otherwise it looks good to me.
Just for the record some very rough benchmarks using "time gvfs-copy": Non-Splice: 0.02s user 2.98s system 81% cpu 3.678 total 0.02s user 2.78s system 22% cpu 12.430 total 0.03s user 2.66s system 74% cpu 3.605 total 0.02s user 2.50s system 33% cpu 7.434 total Splice 0.00s user 0.94s system 73% cpu 1.296 total 0.00s user 1.55s system 85% cpu 1.814 total 0.00s user 0.99s system 73% cpu 1.347 total 0.01s user 0.91s system 12% cpu 7.238 total Splice (copy & sync) 0.00s user 1.44s system 85% cpu 1.683 total 0.00s user 1.01s system 23% cpu 4.311 total 0.00s user 1.20s system 81% cpu 1.462 total 0.00s user 0.94s system 21% cpu 4.340 total Splice (copy & sync & echo 3 > /proc/sys/vm/drop_caches) 0.00s user 1.92s system 68% cpu 2.802 total 0.00s user 1.23s system 27% cpu 4.512 total 0.00s user 1.25s system 76% cpu 1.621 total 0.01s user 1.16s system 23% cpu 5.046 total
Created attachment 153624 [details] [review] GFileDescriptorBased interface
Created attachment 153625 [details] [review] Implement interface in GLocalFileInputStream
Created attachment 153626 [details] [review] Implement interface in GLocalFileOutputFile
Created attachment 153627 [details] [review] Add support for splice(2) Changes of the patches: * Install GFileDescriptorBased header as unix specific header file. It still compiles on !unix platform as well since it is used internally. * Also convert GLocalFileIOStream to use the interface instead of the private _get_fd call of GLocalFileOutputStream. * Fallback support to splice_stream_with_progress
+#ifdef HAVE_SPLICE +#include "gfiledescriptorbased.h" +#endif I don't think you have to condition this, as it always exists (its used for all local files, even on win32 after all). Not that it matters, just means less ugly ifdefs. I don't see offset_in/out being changed ever, and the splice manpage says nothing about it being update to the new position. Is this just a problem with the docs? Also, i think do_splice should handle EINTR
Created attachment 153817 [details] [review] Add support for splice(2) (In reply to comment #21) > +#ifdef HAVE_SPLICE > +#include "gfiledescriptorbased.h" > +#endif > I don't think you have to condition this, as it always exists (its used for all > local files, even on win32 after all). Not that it matters, just means less > ugly ifdefs. Yeah, I totally agree. That is some leftover from my attempts to compile *everything* only on unix systems. Fixed. > I don't see offset_in/out being changed ever, and the splice manpage says > nothing about it being update to the new position. Is this just a problem with > the docs? The docs are unclear here. The offsets are updated by splice(). I had the other way around in the first place and I got some funny results ;-) > Also, i think do_splice should handle EINTR Doh! Yeah of course.
looks good to commit
Pushed to master. Thanks everybody for the review.