GNOME Bugzilla – Bug 554018
Inefficient implementation of InputStream::read_async
Last modified: 2008-09-27 03:37:26 UTC
I wrote a benchmark program to measure the speed at which I could read data, as a function of the buffer size used (the size that 'read(2)' would be called with) and the software API used. It turned out that every implementation I could think of was equally fast; even with buffer sizes as small as 4096 bytes they read at the maximum speed (110 MB/s for my RAID)... with the exception of one implementation: the one using Gio::FileInputStream. After a few hours of testing, profiling and debugging, I found the problem: inside InputStream::read_async, the callback slot is copied (a deep copy, causing another call to new) "because it is deleted again in the callback", Gio::SignalProxy_async_callback. These two calls to new (and the corresponding deletes) are expensive in a tight loop that reads from disk.

Surely it is possible not to copy the slot, and not to delete it in Gio::SignalProxy_async_callback? Both seem totally unnecessary and I cannot think of a disadvantage to simply removing the copy and the delete.
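The kind of loop I'm talking about looks roughly like this (a simplified sketch, not my actual benchmark; the file path and the 4096-byte buffer are just placeholders):

  #include <giomm.h>
  #include <vector>

  static Glib::RefPtr<Glib::MainLoop> loop;
  static Glib::RefPtr<Gio::FileInputStream> stream;
  static std::vector<char> buffer(4096);

  // Invoked (via Gio::SignalProxy_async_callback) when a chunk has been read.
  static void on_read_ready(Glib::RefPtr<Gio::AsyncResult>& result)
  {
    gssize const n = stream->read_finish(result);
    if (n > 0)
      // Each call copies the slot to the heap again, and that copy is
      // deleted again in the callback -- extra new/delete pairs per chunk.
      stream->read_async(&buffer[0], buffer.size(), sigc::ptr_fun(&on_read_ready));
    else
      loop->quit();
  }

  int main()
  {
    Gio::init();
    loop = Glib::MainLoop::create();
    stream = Gio::File::create_for_path("/some/large/file")->read();
    stream->read_async(&buffer[0], buffer.size(), sigc::ptr_fun(&on_read_ready));
    loop->run();
    return 0;
  }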
We need a pointer to a slot object that is guaranteed not to go out of scope to pass to the C API; that's why the copy is made. How much slower exactly is the reading in this case? I'm just curious, since you said you've measured it.
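In outline, it works something like this (a rough sketch of the idea, not the exact glibmm code; the function names here are only illustrative):

  #include <giomm.h>

  // The C callback: unwrap the result, invoke the user's slot, then delete
  // the heap copy again.
  static void async_callback_sketch(GObject*, GAsyncResult* res, gpointer data)
  {
    Gio::SlotAsyncReady* the_slot = static_cast<Gio::SlotAsyncReady*>(data);
    Glib::RefPtr<Gio::AsyncResult> result = Glib::wrap(res, true);
    (*the_slot)(result);
    delete the_slot;
  }

  // Roughly what read_async() has to do: the caller's slot may go out of
  // scope long before the I/O completes, so a heap-allocated copy is made
  // and its address is passed to the C API as user_data.
  static void read_async_sketch(const Glib::RefPtr<Gio::InputStream>& stream,
                                void* buffer, gsize count,
                                const Gio::SlotAsyncReady& slot)
  {
    Gio::SlotAsyncReady* slot_copy = new Gio::SlotAsyncReady(slot);
    g_input_stream_read_async(stream->gobj(), buffer, count, G_PRIORITY_DEFAULT,
                              0 /* no cancellable */,
                              &async_callback_sketch, slot_copy);
  }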
It's only 12%... but well, I have a fast CPU ;). On some machines I can imagine it would be more. However, I deleted the copy and it didn't get much faster :/. I was convinced that this was it, because it showed up in my profile, and when I put a breakpoint on the "extra" function, it was inside the deep copy of this slot. Now I'm thinking it's the accumulation of all the extra things, the wrapping and unwrapping, also in the call to read_finish... I suppose you should just close this "bug". I'm going to use the glib interface though; to me 12% makes a difference :p.
Ok, it was a false alarm... I wasted 24 hours, but I finally found what really caused the difference: a bug in my benchmark program :/.

In order to be able to read 1 GB repeatedly from disk, the program clears the disk cache by writing "1" to /proc/sys/vm/drop_caches. Since that also drops the disk caches of the shared libraries, it then continues with a "warm-up" by calling the benchmark function on some small file, and subsequently even sleeps 1 second to let other running applications load their stuff back into RAM if needed. Now, due to this async callback stuff, I was confused about where to start the timer; and in the case of InputStream::read_async the timer wasn't started at all, which means it had been running since the beginning of the warm-up! So... the delay that I had been looking for consisted mainly of that sleep(1)! lol

Sorry to have wasted some of your time in the process as well!
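For completeness, the shape of that cache-dropping and warm-up phase, and where the timer should have been started, is roughly this (a sketch, not my actual code; the warm-up file is just a placeholder, and writing to drop_caches needs root):

  #include <fstream>
  #include <string>
  #include <unistd.h>   // sync(), sleep()
  #include <glibmm.h>   // Glib::Timer

  // Make sure every run really hits the disk: flush dirty pages, then ask
  // the kernel to drop its page cache.
  static void drop_caches()
  {
    sync();
    std::ofstream ctl("/proc/sys/vm/drop_caches");
    ctl << "1\n";
  }

  // Page the shared libraries (and the benchmark's own code paths) back in
  // by reading some small file.
  static void warm_up()
  {
    std::ifstream in("/etc/hostname");
    std::string line;
    std::getline(in, line);
  }

  int main()
  {
    drop_caches();
    warm_up();
    sleep(1);            // let other running applications re-load their stuff too

    Glib::Timer timer;   // the timer has to start HERE, after the warm-up and
                         // the sleep(1); not starting it here at all is what
                         // made the read_async case look 12% slower.
    // ... run the read loop being measured ...
    // double seconds = timer.elapsed();
    return 0;
  }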