GNOME Bugzilla – Bug 794173
filesink pointlessly uses write/writev
Last modified: 2018-08-16 14:02:02 UTC
Created attachment 369437 [details] [review]
filesink: Change write function to fwrite()

filesink pointlessly uses write()/writev() system calls instead of the buffered fwrite(). This effectively bypasses all buffering and also negates any effect of the buffer properties. The attached patch uses fwrite() instead.

The performance gain is remarkable: especially when writing small buffers to a network storage, where the file is accessed by another machine at the same time, writing speed can increase several hundred times. I ran the following pipeline:

# gst-launch-1.0 fakesrc sizetype=2 sizemax=4 num-buffers=400000 ! filesink location=testfile.raw

Current code: Execution ended after 0:03:40.686780769
New patch: Execution ended after 0:00:00.322147398

This is certainly an extreme case, but also on a local SSD, speed was between 1.5 and 2 times as fast. With larger buffers there is a minor increase in speed. I didn't find any case where using fwrite() was significantly slower than the current code.
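For reference, the core of the proposed change looks roughly like this. This is a minimal sketch of the idea, not the actual code from attachment 369437, and render_with_fwrite is a hypothetical standalone function rather than the element's render vmethod:

#include <stdio.h>
#include <gst/gst.h>

/* Sketch only: map the buffer and hand it to fwrite(), which
 * coalesces small writes in libc's user-space buffer instead of
 * issuing one syscall per buffer. */
static GstFlowReturn
render_with_fwrite (FILE *file, GstBuffer *buffer)
{
  GstMapInfo info;
  GstFlowReturn ret = GST_FLOW_OK;

  if (!gst_buffer_map (buffer, &info, GST_MAP_READ))
    return GST_FLOW_ERROR;

  if (fwrite (info.data, 1, info.size, file) != info.size)
    ret = GST_FLOW_ERROR;

  gst_buffer_unmap (buffer, &info);
  return ret;
}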
Or simply reverting 5b50f39015b970af61f0f7f8b688327ea742cf0b and 5a4fb5c314a4c02e31f5716426ba57b2fe6a73e2. That should get rid of some now useless code too.
That would break buffer list handling, if I understand correctly.
Do we know anything about the fwrite() implementation? One of the problems is that fwrite() never takes ownership of the data, hence it has to copy that data into another buffer (higher CPU usage). I think it would be extra nice if we gained the same syscall reduction without losing the zero-copy writev() optimization. But that would be quite some work for such a niche use case (high-throughput disk writing). So I'd say yes, fwrite() might be the right way.

I've also tested it locally, as I know Linux 4.14 is supposed to have a patch that helps throughput with large amounts of queued IO, but it's still twice as fast without your patches here. Interesting data is the number of write calls we do:

Before:
$ strace gst-launch-1.0 fakesrc sizetype=2 sizemax=4 num-buffers=40 ! filesink location=testfile.raw 2>&1 | grep write | wc
22092 116783 1080171

After:
$ strace gst-launch-1.0 fakesrc sizetype=2 sizemax=4 num-buffers=40 ! filesink location=testfile.raw 2>&1 | grep write | wc
95 527 4522

This includes a couple of printfs of course, but it gives an idea that there is something wrong.
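To illustrate the zero-copy side of that trade-off: a buffer can consist of several non-contiguous GstMemory chunks, and writev() can hand all of them to the kernel in one syscall without first copying them into one contiguous area. A rough sketch with a hypothetical helper, not the actual filesink code (error and short-write handling omitted):

#include <sys/uio.h>
#include <gst/gst.h>

static gssize
write_buffer_vectored (gint fd, GstBuffer *buffer)
{
  guint n_mem = gst_buffer_n_memory (buffer);
  guint i;

  if (n_mem == 0)
    return 0;

  struct iovec iov[n_mem];
  GstMapInfo maps[n_mem];

  for (i = 0; i < n_mem; i++) {
    GstMemory *mem = gst_buffer_peek_memory (buffer, i);
    gst_memory_map (mem, &maps[i], GST_MAP_READ);
    iov[i].iov_base = maps[i].data;
    iov[i].iov_len = maps[i].size;
  }

  /* one syscall for all chunks, no user-space copy */
  gssize written = writev (fd, iov, n_mem);

  for (i = 0; i < n_mem; i++)
    gst_memory_unmap (gst_buffer_peek_memory (buffer, i), &maps[i]);

  return written;
}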
Twice as fast without the patch, are you sure? I'm on kernel 4.15 and it's ~2.7 times faster *with* the patch.

Regarding fwrite(), a sane implementation would only copy if the input is smaller than the internal buffer, and pass anything large straight through. I also found one implementation which does it that way. I didn't look into glibc, but AFAIK it's also highly optimized. It would be a huge task to write our own implementation that beats the one in glibc (if that's possible at all).
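For reference, that strategy looks something like the following. This is a sketch of the typical stdio approach, not glibc's actual code:

#include <unistd.h>
#include <string.h>

typedef struct {
  int fd;
  unsigned char *buf;   /* internal stream buffer */
  size_t bufsize;       /* its capacity */
  size_t fill;          /* bytes currently buffered */
} Stream;

static int
stream_flush (Stream *s)
{
  /* error/short-write handling omitted */
  if (s->fill > 0 && write (s->fd, s->buf, s->fill) < 0)
    return -1;
  s->fill = 0;
  return 0;
}

static ssize_t
stream_write (Stream *s, const void *data, size_t size)
{
  if (size >= s->bufsize) {
    /* large write: drain pending bytes, then bypass the buffer */
    if (stream_flush (s) < 0)
      return -1;
    return write (s->fd, data, size);
  }
  if (s->fill + size > s->bufsize && stream_flush (s) < 0)
    return -1;
  memcpy (s->buf + s->fill, data, size);  /* small write: copy only */
  s->fill += size;
  return (ssize_t) size;
}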
BTW, the strace count is the same for me with and without the patch using your command line. I only get a high count once after recompiling, which seems to come from re-initializing the plugin cache.
The opposite, it's twice as fast with the patch (I'll check my test, it could be a bad method, and yes, I must have been tricked by the registry update). I've also played with the existing buffer-size property, which seems to have little effect on performance.
(In reply to Nicolas Dufresne (ndufresne) from comment #6)
> I've also played with the existing buffer-size property, which seems to
> have little effect on performance.

That property also has absolutely no effect without the patch :)
We should get this fixed before 1.16 as it's a severe performance regression.

Alternatively, we could also consider doing our own buffering inside filesink: collect buffers until a threshold is reached and then writev() them all in one go, which is basically what fwrite() is doing internally.
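Roughly, such internal buffering could look like the sketch below. Hypothetical names and simplified bookkeeping, not the code that later went into filesink; IOV_MAX limits, short writes and errors are omitted:

#include <sys/uio.h>
#include <gst/gst.h>

#define FLUSH_THRESHOLD (64 * 1024)  /* assumed threshold */

typedef struct {
  int fd;
  GPtrArray *pending;   /* GstBuffer * waiting to be written */
  gsize pending_bytes;
} BatchSink;

static gboolean
batch_flush (BatchSink *sink)
{
  guint n = sink->pending->len;
  guint i;

  if (n == 0)
    return TRUE;

  struct iovec iov[n];
  GstMapInfo maps[n];

  for (i = 0; i < n; i++) {
    GstBuffer *buf = g_ptr_array_index (sink->pending, i);
    gst_buffer_map (buf, &maps[i], GST_MAP_READ);
    iov[i].iov_base = maps[i].data;
    iov[i].iov_len = maps[i].size;
  }

  /* all queued buffers leave in a single syscall */
  gboolean ok = writev (sink->fd, iov, n) >= 0;

  for (i = 0; i < n; i++) {
    GstBuffer *buf = g_ptr_array_index (sink->pending, i);
    gst_buffer_unmap (buf, &maps[i]);
    gst_buffer_unref (buf);
  }
  g_ptr_array_set_size (sink->pending, 0);
  sink->pending_bytes = 0;

  return ok;
}

static gboolean
batch_render (BatchSink *sink, GstBuffer *buffer)
{
  g_ptr_array_add (sink->pending, gst_buffer_ref (buffer));
  sink->pending_bytes += gst_buffer_get_size (buffer);

  if (sink->pending_bytes >= FLUSH_THRESHOLD)
    return batch_flush (sink);
  return TRUE;
}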
That might actually be the nicer and more performant solution, as that way buffers are not copied together in user space before they reach the kernel. fwrite() is not doing anything different than that internally anyway.
Created attachment 373323 [details] [review]
filesink: Remove buffer, deprecate line-buffer mode and don't use fflush()

fflush() has no effect because we use writev() directly, so fsync() should be used instead, which actually flushes the kernel-side buffers.

As a next step, a non-line-buffered buffering mode is to be added.
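The distinction in the commit message, as a sketch: with write()/writev() there is no stdio buffer for fflush() to drain; the pending data sits in the kernel's page cache, and fsync() is what asks the kernel to commit it to the device:

#include <unistd.h>
#include <glib.h>

/* Sketch only: the meaningful "flush" for an fd written with
 * write()/writev() is fsync(), not fflush(). */
static gboolean
flush_to_device (int fd)
{
  return fsync (fd) == 0;
}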
Created attachment 373324 [details] [review]
filesink: Implement buffering internally

We use writev(), so every call ends up going to the kernel, but for small buffers we generally would prefer to do as few write calls as possible.
With these patches, on 2048-byte buffers (audiotestsrc ! filesink) I get:

- filesink to /dev/null with patch: 8.393s (average)
- filesink to /dev/null without patch: 8.508s (average)
- fakesink: 8.253s (average)

That is, a speedup of 1.8x ((8.508-8.253)/(8.393-8.253)).

With the testcase above (fakesrc sizetype=2 sizemax=4 ! filesink) I get (setting num-buffers to a higher number):

- filesink to /dev/null with patch: 10.12s (average)
- filesink to /dev/null without patch: 11.12s (average)
- fakesink: 6.80s (average)

That is, a speedup of 1.3x ((11.12-6.8)/(10.12-6.8)).

Note that this is all /dev/null; with actual files it is worse. The numbers are 9.73s and 17.28s respectively, so a 3.6x speedup with the patches.
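The fakesink time is the cost of running the pipeline without writing anything, so the speedup figures subtract it as a baseline before taking the ratio; for the first measurement:

\[
\text{speedup} = \frac{t_{\text{no patch}} - t_{\text{fakesink}}}{t_{\text{patch}} - t_{\text{fakesink}}} = \frac{8.508 - 8.253}{8.393 - 8.253} \approx 1.8
\]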
Created attachment 373329 [details] [review]
filesink: Implement buffering internally

We use writev(), so every call ends up going to the kernel, but for small buffers we generally would prefer to do as few write calls as possible.
Attachment 373323 [details] pushed as e975e0c - filesink: Remove buffer, deprecate line-buffer mode and don't use fflush()

Attachment 373329 [details] pushed as 6b4fc62 - filesink: Implement buffering internally