Bug 554018 - Inefficient implementation of InputStream::read_async
Status: RESOLVED NOTABUG
Product: glibmm
Classification: Bindings
Component: giomm
Version: unspecified
Hardware: Other Linux
Priority: Normal
Severity: normal
Target Milestone: ---
Assigned To: gtkmm-forge
QA Contact: gtkmm-forge
Depends on:
Blocks:
Reported: 2008-09-26 21:18 UTC by Carlo Wood
Modified: 2008-09-27 03:37 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Carlo Wood 2008-09-26 21:18:36 UTC
I wrote a benchmark program to measure the speed at which I could read data, as a function of the buffer size used (the size that 'read(2)' would be called with) and the software API used. It turned out that every implementation I could think of was equally fast: even with buffer sizes as small as 4096 bytes, they all read at the maximum speed (110 MB/s for my RAID)... with the exception of one implementation: the one using Gio::FileInputStream.

After a few hours of testing, profiling and debugging, I found the problem:

Inside InputStream::read_async, the callback slot is copied (a deep copy, causing an extra call to new) "because it is deleted again in the callback", Gio::SignalProxy_async_callback.

These two calls to new (and the matching destructions) are expensive in a tight loop that reads from disk. Surely it is possible not to copy the slot, and not to delete it in Gio::SignalProxy_async_callback? Both seem totally unnecessary, and I cannot think of a disadvantage to simply removing the copy and the delete.
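The pattern being described can be illustrated with a minimal stdlib-only sketch (this is not glibmm's actual code; the names c_style_read_async and trampoline are invented). A C-style async API can only carry a `void* user_data`, so a C++ wrapper heap-copies the caller's slot and frees that copy in a trampoline callback, costing one `new` and one `delete` per call:

```cpp
#include <functional>
#include <string>

// Stand-in for the slot type; glibmm uses sigc::slot, we use std::function.
using Slot = std::function<void(const std::string&)>;

// Stand-in for the C API (e.g. g_input_stream_read_async): it can only pass
// an opaque pointer through to the callback. Invoked synchronously here for
// simplicity.
static void c_style_read_async(void (*cb)(void*), void* user_data) {
    cb(user_data);
}

// Role of Gio::SignalProxy_async_callback: invoke the copied slot, then
// delete it -- this is the delete half of the per-read new/delete pair.
static void trampoline(void* user_data) {
    Slot* slot = static_cast<Slot*>(user_data);
    (*slot)("result");
    delete slot;
}

// Role of InputStream::read_async: deep-copy the slot onto the heap so it
// cannot go out of scope before the async callback fires.
static void read_async(const Slot& slot) {
    c_style_read_async(&trampoline, new Slot(slot));
}
```

The copy exists precisely for the lifetime reason Comment 1 gives: the caller's slot may be a stack object that is gone by the time the callback runs.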
Comment 1 Marko Anastasov 2008-09-27 00:20:25 UTC
We need a pointer to a slot object which is surely not going to go out of scope to pass to the C API; that's why the copy is made.

How much slower exactly is the reading in this case? I'm just curious, since you said you've measured it.
Comment 2 Carlo Wood 2008-09-27 01:32:40 UTC
It's only 12%... but well, I have a fast CPU ;). On some machines I can imagine it would be more. However, I removed the copy and it didn't get much faster :/.
I was convinced this was the cause, because it showed up in my profile, and when I put a breakpoint on the "extra" function it was inside the deep copy of this slot. Now I'm thinking it's the accumulation of all the extra things, the wrapping and unwrapping, also in the call to read_finish...

I suppose you should just close this "bug". I'm going to use the glib interface however; to me 12% makes a difference :p.
Comment 3 Carlo Wood 2008-09-27 03:37:26 UTC
Ok, it was a false alarm... I wasted 24 hours, but I finally found what really caused the difference: a bug in my benchmark program :/.
In order to be able to read 1 GB repeatedly from disk, the program clears the disk cache by writing "1" to /proc/sys/vm/drop_caches. Since that also drops the disk caches of the shared libraries, it then continues with a "warm-up" by calling the benchmark function on some small file, and subsequently even sleeps 1 second to let other running applications load their stuff back into RAM if needed... Now, due to this async callback stuff, I was confused about where to start the timer; and in the case of InputStream::read_async the timer wasn't started at all, which means it had been running since the beginning of the warm-up! So... the delay I'd been looking for consisted mainly of that sleep(1)! lol

Sorry to have wasted some of your time in the process as well!
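The methodology lesson here can be sketched in a few lines (all names are invented, and the cache drop and read are simulated with short sleeps so the effect is observable without root access or a real disk): the timer must start after the cache drop, warm-up, and sleep, immediately before the measured work, or the setup cost is folded into the result:

```cpp
#include <chrono>
#include <thread>

// Stands in for: writing "1" to /proc/sys/vm/drop_caches, reading a small
// warm-up file, and the sleep(1) described in Comment 3. Simulated with a
// 50 ms sleep here.
static void drop_caches_and_warm_up() {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
}

// Stands in for the timed read of the 1 GB file. Simulated with a 10 ms
// sleep here.
static void measured_read() {
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
}

// Returns the measured duration in milliseconds.
static double benchmark() {
    drop_caches_and_warm_up();
    // Start the clock only now. Starting it before the warm-up (or never
    // starting it, as in the buggy InputStream::read_async run) makes the
    // warm-up and sleep count against the API being measured.
    auto start = std::chrono::steady_clock::now();
    measured_read();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

With the timer placed correctly, the result reflects only the read itself; with the buggy placement it would also include the ~50 ms of setup, which is exactly the kind of phantom slowdown reported above.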