After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 791994 - gblur-1d: Optimize by exploiting cache lines
gblur-1d: Optimize by exploiting cache lines
Status: RESOLVED FIXED
Product: GEGL
Classification: Other
Component: operations
git master
Other All
: Normal normal
: ---
Assigned To: Default Gegl Component Owner
Default Gegl Component Owner
Depends on:
Blocks:
 
 
Reported: 2017-12-27 22:57 UTC by Debarshi Ray
Modified: 2018-01-05 01:51 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
operations/common/gblur-1d: Optimize by exploiting cache lines (17.22 KB, patch)
2017-12-27 22:58 UTC, Debarshi Ray
committed Details | Review

Description Debarshi Ray 2017-12-27 22:57:05 UTC
I have been studying the performance of gegl:gaussian-blur with pippin's help. My motivation is to speed-up operations like gegl:shadows-highlights and gegl:unsharp-mask, which are some of the slower operations used in gnome-photos.

There is an opportunity to improve the locality of memory accesses in the IIR filter by exploiting the cache lines. Having an outer loop for each colour channel prevents us from fully utilizing the cache. Manually unrolling it and processing the entire pixel at once seems better. It would increase the size of the temporary buffer, but given that people have been keeping multiple GdkPixbuf's around in memory, I don't think it would be the end of the world if we used a "sizeof (double) * n_channels * height/width" buffer instead of "sizeof (double) * height/width".

Note that until recently, the performance of the Young & van Vliet IIR blur was being dwarfed by the time taken to read and write GeglBuffer columns. It was particularly bad when it involved Babl conversions. See the series of GeglBuffer commits from pippin to address that.

Here are some readings based on perf/test-blur:

* Time taken to only read rows or columns and nothing else:
  https://rishi.fedorapeople.org/gegl-gaussian-blur-get-small-buffer

* Time taken to only write rows or columns and nothing else:
  https://rishi.fedorapeople.org/gegl-gaussian-blur-set-small-buffer

* Time taken by the entire IIR filter:
  https://rishi.fedorapeople.org/gegl-gaussian-blur-small-buffer

("small buffer" refers to the 1024x1024 buffers used by perf/test-blur because I was also playing with a 2048x2048 "big buffer" to see how it affects the outcome.)

perf/test-unsharpmask has now gone from 150 MB/s to 163 MB/s (with pippin's GeglBuffer optimization) to 200 MB/s (with my proposed improvement to cache use).
Comment 1 Debarshi Ray 2017-12-27 22:58:16 UTC
Created attachment 366030 [details] [review]
operations/common/gblur-1d: Optimize by exploiting cache lines
Comment 2 Debarshi Ray 2017-12-27 22:59:55 UTC
All the readings were taken on an Intel i7 Haswell [1] with 8G RAM and an SSD.

[1] http://www.7-cpu.com/cpu/Haswell.html
Comment 3 Øyvind Kolås (pippin) 2018-01-03 22:02:35 UTC
Also seeing ~15% speedups for many gaussian blur code paths on a

Intel(R) Core(TM) i5-7Y54 CPU @ 1.20GHz
Comment 4 Øyvind Kolås (pippin) 2018-01-05 01:47:32 UTC
I've been running with this patch in my tree with and seeing no other change than improved performance.

commit e7f98682b9bddc7f7236fabe4cbc77e701510aa4
Author: Debarshi Ray <debarshir@gnome.org>
Date:   Mon Dec 25 02:47:42 2017 +0100

    operations/common/gblur-1d: Optimize by exploiting cache lines
    
    Leapfrogging through the buffer for each colour channel doesn't fully
    utilize the data cache. When a pixel's colour channel is accessed, it
    is very likely that the other channels have also been read into the
    cache because they are on the same cache line. Therefore, computing
    the entire pixel at the same time avoids having to reload the same
    cache line again.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=791994
Comment 5 Øyvind Kolås (pippin) 2018-01-05 01:51:09 UTC
Review of attachment 366030 [details] [review]:

commited