GNOME Bugzilla – Bug 791457
Slow transfer rate when writing to smb/cifs
Last modified: 2018-05-07 05:15:56 UTC
I have an SMB share on a gigabit ethernet network, mounted by the system (not via gvfs) with the following options: vers=3.0,rsize=1048576,wsize=1048576,rw,iocharset=utf8,dir_mode=0777,file_mode=0666

The problem is that when I copy a file from the local disk to the SMB share using nautilus, the data transfer rate never exceeds 9MB/s. I have also tried thunar and nemo, and they behave exactly the same way. However, when nautilus reads from the SMB share, the transfer rate reaches 100MB/s.

Using gvfs (i.e. accessing smb://<host> directly from nautilus) behaves better but is still slower than expected: ~51MB/s. Using cp I get a similar transfer rate: ~50MB/s. I have also tried KDE's dolphin and get ~40MB/s.

Finally, I measured the transfer rate for different block sizes using dd, with the following results:

blocksize | transfer rate
----------+--------------
1.5kB     |  3.0 MB/s
4kB       |  7.2 MB/s
16kB      | 25.3 MB/s
32kB      | 38.6 MB/s
64kB      | 51.2 MB/s
128kB     | 64.5 MB/s
256kB     | 78.4 MB/s
512kB     | 89.0 MB/s
1MB       | 90.7 MB/s
2MB       | 88.8 MB/s
4MB       | 86.1 MB/s
8MB       | 88.1 MB/s

So I thought the problem could be related to writing too-small blocks, and I started checking what nautilus does. I found out it uses g_file_copy (https://developer.gnome.org/gio/stable/GFile.html#g-file-copy), but I didn't understand how this function works and found nothing there specifying how many bytes are written at once.

Everything was tested on Ubuntu 17.10.
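For reference, a minimal (hypothetical) use of g_file_copy(), which is roughly what a file manager does under the hood; the paths are placeholders:

/* Hypothetical example: copy one file with GIO, as a file manager would.
 * Paths are placeholders. Build against gio-2.0. */
#include <gio/gio.h>

int
main (void)
{
  GFile *src = g_file_new_for_path ("/home/user/bigfile.iso");
  GFile *dst = g_file_new_for_path ("/mnt/share/bigfile.iso");
  GError *error = NULL;

  if (!g_file_copy (src, dst, G_FILE_COPY_OVERWRITE,
                    NULL /* cancellable */,
                    NULL, NULL /* progress callback + data */,
                    &error))
    {
      g_printerr ("Copy failed: %s\n", error->message);
      g_error_free (error);
    }

  g_object_unref (src);
  g_object_unref (dst);
  return 0;
}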
file_copy_fallback is what you are looking for: https://git.gnome.org/browse/glib/tree/gio/gfile.c#n3040

It has several branches. gvfs uses copy_stream_with_progress (https://git.gnome.org/browse/glib/tree/gio/gfile.c#n2773), whose buffer was bumped to 256kB quite recently, see https://bugzilla.gnome.org/show_bug.cgi?id=773823. But I suppose splice_stream_with_progress is used in this case, which uses just a 64kB buffer: https://git.gnome.org/browse/glib/tree/gio/gfile.c#n2893

So maybe we should also bump this buffer to 256kB, or there must be another bottleneck... It is hard to reach the maximal transfer rate in all cases, but we should reach a speed comparable to cp.
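For anyone unfamiliar with the splice path: it copies through a kernel pipe instead of a userspace buffer, so the pipe capacity effectively acts as the copy buffer. A simplified sketch of that pattern follows (an illustration only, not GLib's actual splice_stream_with_progress code):

/* Sketch only, not GLib's code: copy fd_in to fd_out via a kernel pipe
 * with splice(), moving up to buf_size bytes per iteration. Linux-only. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>

static int
splice_copy (int fd_in, int fd_out, size_t buf_size)
{
  int pipefd[2];

  if (pipe (pipefd) != 0)
    return -1;

  for (;;)
    {
      /* Pull up to buf_size bytes from the source file into the pipe. */
      ssize_t n = splice (fd_in, NULL, pipefd[1], NULL, buf_size,
                          SPLICE_F_MORE | SPLICE_F_MOVE);
      if (n == 0)
        break;  /* end of file */
      if (n < 0 && errno == EINTR)
        continue;
      if (n < 0)
        goto fail;

      /* Push everything we pulled from the pipe to the destination file. */
      while (n > 0)
        {
          ssize_t w = splice (pipefd[0], NULL, fd_out, NULL, n,
                              SPLICE_F_MORE | SPLICE_F_MOVE);
          if (w < 0 && errno == EINTR)
            continue;
          if (w <= 0)
            goto fail;
          n -= w;
        }
    }

  close (pipefd[0]);
  close (pipefd[1]);
  return 0;

fail:
  close (pipefd[0]);
  close (pipefd[1]);
  return -1;
}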
I increased the buffer for splice_stream_with_progress to 256kB and noticed no change. Then I added #undef HAVE_SPLICE at the beginning of gfile.c to force it to use read/write (like cp and dd), and I achieved a throughput of ~73MB/s. It seems there is a problem in the cifs filesystem when writing using splice. I still have to investigate this further, but it looks like the problem is in the kernel, not here.

btw, I also tried copying a file on my local hard disk (SSD, ext4), and these are the (strange?) results I got with the different system calls:

splice 64kB       ~153MB/s
splice 256kB      ~160MB/s
read/write 256kB  ~168MB/s

It seems strange, as splice is supposed to be faster than plain read/write...
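For comparison, the read/write fallback that #undef HAVE_SPLICE forces is essentially the classic userspace copy loop, which is roughly what cp and dd do. A minimal sketch (buffer size chosen by the caller; this is not the actual gfile.c code):

/* Sketch only: plain read()/write() copy loop with a caller-chosen buffer. */
#include <unistd.h>
#include <stdlib.h>
#include <errno.h>

static int
rw_copy (int fd_in, int fd_out, size_t buf_size)
{
  char *buf = malloc (buf_size);
  if (buf == NULL)
    return -1;

  for (;;)
    {
      ssize_t n = read (fd_in, buf, buf_size);
      if (n == 0)
        break;  /* end of file */
      if (n < 0 && errno == EINTR)
        continue;
      if (n < 0)
        { free (buf); return -1; }

      /* Write out the whole chunk, handling short writes. */
      for (ssize_t off = 0; off < n; )
        {
          ssize_t w = write (fd_out, buf + off, n - off);
          if (w < 0 && errno == EINTR)
            continue;
          if (w < 0)
            { free (buf); return -1; }
          off += w;
        }
    }

  free (buf);
  return 0;
}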
I have submitted a patch to the kernel that fixes the first bottleneck: https://patchwork.kernel.org/patch/10134653/

With the patch applied, the transfer rate I obtain using nautilus is:

64kB buffer:   ~56MB/s (current value in glib)
256kB buffer:  ~80MB/s
512kB buffer:  ~87MB/s
1024kB buffer: ~97MB/s

So bumping this buffer to at least 256kB seems desirable. (I'll attach a patch with this change.)

Just for fun, I ran the same benchmark on my local SSD with the following results:

64kB buffer:   ~153MB/s
256kB buffer:  ~160MB/s
1024kB buffer: ~195MB/s

That is nearly a 22% improvement from a 256kB buffer to a 1MB buffer in both cases. So now I wonder: is there any drawback to increasing the buffer to 1MB? And if there is, would it make sense to choose the buffer size dynamically based on some parameter?
Created attachment 366049 [details] [review] gio: bump splice copy buffer size to 256k by default
I can’t immediately think of any problems with using a 1MB buffer size. Ondrej, can you? The original commit which introduced splice() support (bb4f63d6390fe5efd183f259e5bd891f89de9e24) doesn’t mention anything about the choice of buffer size, and neither does the bug about it (bug #604086).
See also https://github.com/coreutils/coreutils/blob/master/src/ioblksize.h
(In reply to Philip Withnall from comment #5)
> I can’t immediately think of any problems with using a 1MB buffer size.
> Ondrej, can you?

I don't know... The only thing that comes to mind is that progress may be reported less frequently on network mounts when big buffers are combined with poor connectivity...
OK, well it looks like we should definitely be increasing the buffer size to 128KiB as per comment #6, and keeping an eye on ioblksize.h in future. Andrés, what file system are you using on your local SSD (comment #3)? I assume it’s ext4. Could you also do some tests with copying to/from FAT USB sticks and to/from DAV network shares with various buffer sizes? I’m currently tempted to go with a 1MB buffer size, since it provides measurable improvements for CIFS and your local SSD, but before we go with that I want to verify that it doesn’t degrade performance for other common file transfer use cases. (Hopefully it’ll improve them too.)
A coworker recently ran the ioblksize.h tests on a Yoga 900 and got the following results. tl;dr: 128KiB is best for syscall overhead (/dev/zero → /dev/null), but 256KiB is best for copying files on an SSD (ext4). There’s not much between them though. He decided to go with a 1MiB buffer size for the task he was working on.

Syscall overhead:

blocksize (B) | transfer rate (GB/s)
--------------+---------------------
         1024 | 3.4
         2048 | 5.9
         4096 | 9.1
         8192 | 11.1
        16384 | 13.3
        32768 | 14.2
        65536 | 15.7
       131072 | 16.1 (highest)
       262144 | 15.9
       524288 | 15.5
      1048576 | 15.3

Writing to disk:

blocksize (B) | transfer rate (MB/s)
--------------+---------------------
         1024 | 37.0
         2048 | 40.1
         4096 | 465
         8192 | 447
        16384 | 432
        32768 | 458
        65536 | 458
       131072 | 443
       262144 | 466 (highest)
       524288 | 450
      1048576 | 463
      2097152 | 452
      4194304 | 438
      8388608 | 457
     16777216 | 435
     33554432 | 451
He also says:
 * 1MiB is an upper bound on the buffer size which could be used (in his case) because it’s the largest pipe buffer size which an unprivileged process can set (by default)
 * coreutils ensures its buffers are page-aligned, but he didn’t find that this made a significant difference in throughput
Yes, my local SSD is using ext4. Here are some benchmarks.

Copying a 1GB file from the local SSD to a USB stick:

blocksize (kB)| transfer rate (MB/s)
--------------+----------------------
           64 | 2.54
          128 | 2.47
          256 | 2.57
          512 | 2.41
         1024 | 2.48

Note: I did not empty the cache between runs in this test, but I assumed it wouldn't make a difference. I also did some tests with a smaller file, and the results are nearly the same for every block size (2.71 MB/s at 64kB, 2.79 MB/s at 1024kB).

In the following benchmarks, reads and writes were done against a tmpfs, the cache was emptied before each test with "sync; echo 3 >/proc/sys/vm/drop_caches", and I ran most of them 3 times and calculated the mean value and the standard deviation. I used "time -f %e" to measure the time needed by "gio copy" to copy a 2GB file.

Read from ext4 SSD:

blocksize (kB)| transfer rate (MB/s)
--------------+----------------------
           64 | 389.85 σ=1.13
          128 | 386.91 σ=2.49
          256 | 392.09 σ=1.89
          512 | 402.67 σ=5.42
         1024 | 424.90 σ=0.88

Write to ext4 SSD:

blocksize (kB)| transfer rate (MB/s)
--------------+----------------------
           64 | 285.66 σ=3.41
          256 | 287.55 σ=20.92
         1024 | 282.84 σ=9.75

Read from cifs:

blocksize (kB)| transfer rate (MB/s)
--------------+----------------------
           64 | 106.23 σ=1.03
          128 | 109.07 σ=1.58
          256 | 106.52 σ=4.50
          512 | 109.01 σ=1.36
         1024 | 108.58 σ=1.09

Write to cifs:

blocksize (kB)| transfer rate (MB/s)
--------------+----------------------
           64 | 51.52 σ=0.99
          128 | 65.41 σ=0.57
          256 | 76.64 σ=0.16
          512 | 85.72 σ=1.33
         1024 | 90.21 σ=0.72

Read from nfs:

blocksize (kB)| transfer rate (MB/s)
--------------+----------------------
           64 | 108.80 σ=1.61
          128 | 107.96 σ=0.83
          256 | 108.85 σ=0.96
          512 | 107.49 σ=0.36
         1024 | 108.67 σ=0.67

Write to nfs:

blocksize (kB)| transfer rate (MB/s)
--------------+----------------------
           64 | 85.12
          128 | 85.25 σ=0.13
          256 | 84.51 σ=0.84
          512 | 85.01 σ=0.61
         1024 | 84.81 σ=0.65

So apparently the buffer size makes no difference except for writing to cifs and reading from the ext4 SSD. The first could make sense because the cifs module is not smart enough to wait for more data in order to minimize control traffic, but I have no idea about the second. It also caught my attention that reading/writing from/to the SSD shows a bit of noise with 64kB, 128kB or 256kB buffers, but not with 512kB or 1MB buffers.
Seems like 1MiB buffers are a safe choice to go with, given they beat other buffer sizes, or are not particularly slower, on all of those benchmarks. Thanks for doing the benchmarks. :-) Andrés, do you want to update your patch to go with a 1MiB buffer size? Please make sure it includes a comment which points to the analysis here.
Review of attachment 366049 [details] [review]: (Marking as needs-work accordingly.)
Created attachment 367162 [details] [review] gio: bump splice copy buffer size to 1024k
Review of attachment 367162 [details] [review]:

Please include a comment in the code pointing to the analysis in this bug.

::: gio/gfile.c
@@ +2909,3 @@
   if (!g_unix_open_pipe (buffer, FD_CLOEXEC, error))
     return FALSE;
+  fcntl(buffer[1], F_SETPIPE_SZ, 1024*1024);

I suspect you need to check the return value of this call. If the kernel pipe size limits are lower than 1MiB, the do_splice() call below will end up trying to splice 1MiB of data into a pipe which is not big enough, which will probably fail. (I haven’t analysed the failure mode though.)

fcntl() returns the actual set capacity of the pipe, so you should pass that to the do_splice() call. If fcntl() returns EPERM or EBUSY, it’s probably safest to call fcntl(F_GETPIPE_SZ) and use the pipe size from that in the call to do_splice() below.
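A minimal sketch of the handling suggested above (an illustration only, not the patch that was eventually committed; the helper name and fallback constant are invented):

/* Sketch only: try to grow the splice pipe to 1 MiB; if that fails, fall
 * back to the pipe's existing capacity, which is what do_splice() should
 * then use per iteration. Not GLib's actual committed code. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

static size_t
choose_splice_buffer_size (int pipe_write_fd)
{
  long size;

  /* F_SETPIPE_SZ returns the capacity actually set (possibly rounded up),
   * or -1 on error (e.g. EPERM if it exceeds /proc/sys/fs/pipe-max-size). */
  size = fcntl (pipe_write_fd, F_SETPIPE_SZ, 1024 * 1024);
  if (size <= 0)
    {
      /* Fall back to whatever capacity the pipe already has. */
      size = fcntl (pipe_write_fd, F_GETPIPE_SZ);
      if (size <= 0)
        size = 64 * 1024;  /* last resort: the old default buffer size */
    }

  return (size_t) size;
}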
Created attachment 367966 [details] [review] gio: bump splice copy buffer size to 1024k
This is an update with the latest findings:

The cifs module patch has already been accepted: https://github.com/torvalds/linux/commit/cd1aca29fa020b6e6edcd3d5b3e49ab877d1bed7

The slow transfer rate is directly related to how small the payload is in each SMB message. The theoretical transfer rate is achieved when the payload is 1 MB per message.

If the server uses samba 4.6, each message carries 1 MB of payload regardless of the buffer size used. Neither the kernel patch nor this patch makes a difference there.

If the server uses samba 4.4 (the case for the server where I discovered the problem), only 4096 bytes are sent per message. The kernel patch fixes this and allows each message to carry as much data as the buffer size. Finally, increasing the buffer to 1 MB (the glib patch) makes it behave as expected in this situation too.

I have no idea why everything works perfectly with samba 4.6 but not with 4.4.
What about samba 4.7?
(In reply to Steve Christen from comment #18)
> What about samba 4.7?

It behaves like samba 4.6 (i.e. 1MB per message using a 64kB buffer).
Review of attachment 367966 [details] [review]:

::: gio/gfile.c
@@ +2915,3 @@
+  if (buffer_size < 0)
+    {
+      buffer_size = fcntl(buffer[1], F_GETPIPE_SZ);

Having another think about it, I worry what will happen if fcntl() returns -1 (error) here. I’m going to rework this patch locally and push it.
I pushed a modified version of the patch which adds a bit more error handling, in case the fcntl() calls fail or return 0 (for some reason). The following fix has been pushed: a5778ef gio: bump splice copy buffer size to 1024k
Created attachment 368412 [details] [review]
gio: bump splice copy buffer size to 1024k

This change increases throughput when copying files for some filesystems.

(Modified by Philip Withnall <withnall@endlessm.com> to add more error handling.)