GNOME Bugzilla – Bug 794173
filesink pointlessly uses write/writev
Last modified: 2018-08-16 14:02:02 UTC
Created attachment 369437 [details] [review]
filesink: Change write function to fwrite()

filesink pointlessly uses write()/writev() system calls instead of the buffered fwrite(). This effectively bypasses all buffering and also negates any effect of the buffer properties. The attached patch uses fwrite() instead.

The performance gain is remarkable: especially when writing small buffers to a network storage, where the file is accessed by another machine at the same time, writing speed can increase several hundred times. I ran the following pipeline:

# gst-launch-1.0 fakesrc sizetype=2 sizemax=4 num-buffers=400000 ! filesink location=testfile.raw

Current code: Execution ended after 0:03:40.686780769
New patch: Execution ended after 0:00:00.322147398

This is certainly an extreme case, but also on a local SSD, speed was between 1.5 and 2 times as fast. With larger buffers there is a minor increase in speed. I didn't find any case where using fwrite() was significantly slower than the current code.
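For reference, the core of the proposed change looks roughly like this. This is a minimal sketch of the idea, not the actual code from attachment 369437, and render_with_fwrite is a hypothetical standalone function rather than the element's render vmethod:

#include <stdio.h>
#include <gst/gst.h>

/* Sketch only: map the buffer and hand it to fwrite(), which
 * coalesces small writes in libc's user-space buffer instead of
 * issuing one syscall per buffer. */
static GstFlowReturn
render_with_fwrite (FILE *file, GstBuffer *buffer)
{
  GstMapInfo info;
  GstFlowReturn ret = GST_FLOW_OK;

  if (!gst_buffer_map (buffer, &info, GST_MAP_READ))
    return GST_FLOW_ERROR;

  if (fwrite (info.data, 1, info.size, file) != info.size)
    ret = GST_FLOW_ERROR;

  gst_buffer_unmap (buffer, &info);
  return ret;
}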
Or simply reverting 5b50f39015b970af61f0f7f8b688327ea742cf0b and 5a4fb5c314a4c02e31f5716426ba57b2fe6a73e2. That should get rid of some now useless code too.
That would break buffer list handling, if I understand correctly.
Do we know anything about the fwrite() implementation? One of the problems is that fwrite() never takes ownership of the data, hence it has to copy that data into another buffer (higher CPU usage). I think it would be extra nice if we gained the same syscall reduction without losing the zero-copy writev() optimization. But that would be quite some work for such a niche use case (high-throughput disk writing). So I'd say yes, fwrite() might be the right way.

I've also tested it locally, as I know Linux 4.14 is supposed to have a patch that helps throughput with large amounts of queued IO, but it's still twice as fast without your patches here. Interesting data is the number of write calls we do:

Before:
$ strace gst-launch-1.0 fakesrc sizetype=2 sizemax=4 num-buffers=40 ! filesink location=testfile.raw 2>&1 | grep write | wc
22092 116783 1080171

After:
$ strace gst-launch-1.0 fakesrc sizetype=2 sizemax=4 num-buffers=40 ! filesink location=testfile.raw 2>&1 | grep write | wc
95 527 4522

This includes a couple of printfs of course, but it gives an idea that there is something wrong.
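To illustrate the zero-copy side of that trade-off: a buffer can consist of several non-contiguous GstMemory chunks, and writev() can hand all of them to the kernel in one syscall without first copying them into one contiguous area. A rough sketch with a hypothetical helper, not the actual filesink code (error and short-write handling omitted):

#include <sys/uio.h>
#include <gst/gst.h>

static gssize
write_buffer_vectored (gint fd, GstBuffer *buffer)
{
  guint n_mem = gst_buffer_n_memory (buffer);
  guint i;

  if (n_mem == 0)
    return 0;

  struct iovec iov[n_mem];
  GstMapInfo maps[n_mem];

  for (i = 0; i < n_mem; i++) {
    GstMemory *mem = gst_buffer_peek_memory (buffer, i);
    gst_memory_map (mem, &maps[i], GST_MAP_READ);
    iov[i].iov_base = maps[i].data;
    iov[i].iov_len = maps[i].size;
  }

  /* one syscall for all chunks, no user-space copy */
  gssize written = writev (fd, iov, n_mem);

  for (i = 0; i < n_mem; i++)
    gst_memory_unmap (gst_buffer_peek_memory (buffer, i), &maps[i]);

  return written;
}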
Twice as fast without the patch, are you sure? I'm on kernel 4.15 and it's ~2.7 times faster *with* the patch.

Regarding fwrite(), a sane implementation would only copy if the input is smaller than the internal buffer, and pass anything large straight through. I also found one implementation which does it that way. I didn't look into glibc, but AFAIK it's also highly optimized. It would be a huge task to write our own implementation that beats the one in glibc (if that's possible at all).
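For reference, that strategy looks something like the following. This is a sketch of the typical stdio approach, not glibc's actual code:

#include <unistd.h>
#include <string.h>

typedef struct {
  int fd;
  unsigned char *buf;   /* internal stream buffer */
  size_t bufsize;       /* its capacity */
  size_t fill;          /* bytes currently buffered */
} Stream;

static int
stream_flush (Stream *s)
{
  /* error/short-write handling omitted */
  if (s->fill > 0 && write (s->fd, s->buf, s->fill) < 0)
    return -1;
  s->fill = 0;
  return 0;
}

static ssize_t
stream_write (Stream *s, const void *data, size_t size)
{
  if (size >= s->bufsize) {
    /* large write: drain pending bytes, then bypass the buffer */
    if (stream_flush (s) < 0)
      return -1;
    return write (s->fd, data, size);
  }
  if (s->fill + size > s->bufsize && stream_flush (s) < 0)
    return -1;
  memcpy (s->buf + s->fill, data, size);  /* small write: copy only */
  s->fill += size;
  return (ssize_t) size;
}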
BTW, the strace count is the same for me with and without the patch using your command line. I only get a high count once after recompiling, which seems to come from re-initializing the plugin cache.
The opposite, it's twice as fast with the patch (I'll check my test, it could be a bad method, and yes, I must have been tricked by the registry update). I've also played with the existing buffer-size property, which seems to have little effect on performance.
(In reply to Nicolas Dufresne (ndufresne) from comment #6)
> I've also played with the existing buffer-size property, which seems to
> have little effect on performance.

That property also has absolutely no effect without the patch :)
We should get this fixed before 1.16 as it's a severe performance regression.

Alternatively, we could also consider doing our own buffering inside filesink: collect buffers until a threshold is reached and then writev() them all in one go, which is basically what fwrite() is doing internally.
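Roughly, such internal buffering could look like the sketch below. Hypothetical names and simplified bookkeeping, not the code that later went into filesink; IOV_MAX limits, short writes and errors are omitted:

#include <sys/uio.h>
#include <gst/gst.h>

#define FLUSH_THRESHOLD (64 * 1024)  /* assumed threshold */

typedef struct {
  int fd;
  GPtrArray *pending;   /* GstBuffer * waiting to be written */
  gsize pending_bytes;
} BatchSink;

static gboolean
batch_flush (BatchSink *sink)
{
  guint n = sink->pending->len;
  guint i;

  if (n == 0)
    return TRUE;

  struct iovec iov[n];
  GstMapInfo maps[n];

  for (i = 0; i < n; i++) {
    GstBuffer *buf = g_ptr_array_index (sink->pending, i);
    gst_buffer_map (buf, &maps[i], GST_MAP_READ);
    iov[i].iov_base = maps[i].data;
    iov[i].iov_len = maps[i].size;
  }

  /* all queued buffers leave in a single syscall */
  gboolean ok = writev (sink->fd, iov, n) >= 0;

  for (i = 0; i < n; i++) {
    GstBuffer *buf = g_ptr_array_index (sink->pending, i);
    gst_buffer_unmap (buf, &maps[i]);
    gst_buffer_unref (buf);
  }
  g_ptr_array_set_size (sink->pending, 0);
  sink->pending_bytes = 0;

  return ok;
}

static gboolean
batch_render (BatchSink *sink, GstBuffer *buffer)
{
  g_ptr_array_add (sink->pending, gst_buffer_ref (buffer));
  sink->pending_bytes += gst_buffer_get_size (buffer);

  if (sink->pending_bytes >= FLUSH_THRESHOLD)
    return batch_flush (sink);
  return TRUE;
}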
That might actually be the nicer and more performant solution, as that way buffers are not copied together in user space before they reach the kernel. fwrite() is not doing anything different than that internally anyway.
Created attachment 373323 [details] [review]
filesink: Remove buffer, deprecate line-buffer mode and don't use fflush()

fflush() has no effect because we use writev() directly, so fsync() should be used instead, which actually flushes the kernel-side buffers.

As a next step, a non-line-buffered buffering mode is to be added.
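The distinction in the commit message, as a sketch: with write()/writev() there is no stdio buffer for fflush() to drain; the pending data sits in the kernel's page cache, and fsync() is what asks the kernel to commit it to the device:

#include <unistd.h>
#include <glib.h>

/* Sketch only: the meaningful "flush" for an fd written with
 * write()/writev() is fsync(), not fflush(). */
static gboolean
flush_to_device (int fd)
{
  return fsync (fd) == 0;
}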
Created attachment 373324 [details] [review]
filesink: Implement buffering internally

We use writev(), so every call ends up going to the kernel, but for small buffers we generally would prefer to do as few write calls as possible.
With these patches, on 2048-byte buffers (audiotestsrc ! filesink) I get:

- filesink to /dev/null with patch: 8.393s (average)
- filesink to /dev/null without patch: 8.508s (average)
- fakesink: 8.253s (average)

That is, a speedup of 1.8x ((8.508-8.253)/(8.393-8.253)).

With the testcase above (fakesrc sizetype=2 sizemax=4 ! filesink) I get (setting num-buffers to a higher number):

- filesink to /dev/null with patch: 10.12s (average)
- filesink to /dev/null without patch: 11.12s (average)
- fakesink: 6.80s (average)

That is, a speedup of 1.3x ((11.12-6.8)/(10.12-6.8)).

Note that this is all /dev/null; with actual files it is worse. The numbers are 9.73s and 17.28s respectively, so a 3.6x speedup with the patches.
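The fakesink time is the cost of running the pipeline without writing anything, so the speedup figures subtract it as a baseline before taking the ratio; for the first measurement:

\[
\text{speedup} = \frac{t_{\text{no patch}} - t_{\text{fakesink}}}{t_{\text{patch}} - t_{\text{fakesink}}} = \frac{8.508 - 8.253}{8.393 - 8.253} \approx 1.8
\]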
Created attachment 373329 [details] [review]
filesink: Implement buffering internally

We use writev(), so every call ends up going to the kernel, but for small buffers we generally would prefer to do as few write calls as possible.
Attachment 373323 [details] pushed as e975e0c - filesink: Remove buffer, deprecate line-buffer mode and don't use fflush()

Attachment 373329 [details] pushed as 6b4fc62 - filesink: Implement buffering internally