Bug 678318 - Gamma conversion slow
Status: RESOLVED FIXED
Product: GEGL
Classification: Other
Component: babl
Version: unspecified
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: Default Gegl Component Owner
QA Contact: Default Gegl Component Owner
Depends on:
Blocks:
 
 
Reported: 2012-06-18 13:25 UTC by Alexander Larsson
Modified: 2013-05-03 13:41 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Add fast approximations of x^2.4 and x^(1/2.4) (7.41 KB, patch)
2012-06-18 13:28 UTC, Alexander Larsson
Add fast approximations of x^2.4 and x^(1/2.4) (7.42 KB, patch)
2012-06-18 13:36 UTC, Alexander Larsson
Use new approximations for gamma conversions (1.72 KB, patch)
2012-06-18 13:36 UTC, Alexander Larsson
Add simple test app to test pow-24 accuracy (1.91 KB, patch)
2012-06-18 13:37 UTC, Alexander Larsson
Newton and simd (12.85 KB, patch)
2013-05-03 12:05 UTC, Loren Merritt

Description Alexander Larsson 2012-06-18 13:25:31 UTC
When I run a simple op like color balance on an RGB 8bpp image in the Geglified GIMP, it runs pretty slowly. Simple profiling shows that most of the time is spent converting from gamma to linear and back. Looking at the code, it seems we can do this better, so I implemented the Chebyshev approximation from:

http://stackoverflow.com/questions/6475373/optimizations-for-pow-with-const-non-integer-exponent/6478839#6478839

With this code doing x^2.4 and x^(1/2.4), the color balance plugin goes from reporting about 1.5 MPixels/sec to about 2 MPixels/sec, i.e. a speedup of about 33%.

There is probably some other problem too, because I think that for an rgb24 buffer I *should* be hitting some fast path for this operation (I hope?).
Comment 1 Alexander Larsson 2012-06-18 13:28:44 UTC
Created attachment 216675
Add fast approximations of x^2.4 and x^(1/2.4)

Use a Chebyshev polynomial approximation of these to speed up
gamma conversion. Based on the post in:

http://stackoverflow.com/questions/6475373/optimizations-for-pow-with-const-non-integer-exponent/6478839#6478839
Comment 2 Alexander Larsson 2012-06-18 13:36:23 UTC
Created attachment 216676
Add fast approximations of x^2.4 and x^(1/2.4)
Comment 3 Alexander Larsson 2012-06-18 13:36:52 UTC
Created attachment 216677
Use new approximations for gamma conversions
Comment 4 Alexander Larsson 2012-06-18 13:37:20 UTC
Created attachment 216678
Add simple test app to test pow-24 accuracy
Comment 5 Øyvind Kolås (pippin) 2012-06-18 14:04:00 UTC
I've pushed the first two commits. For such inner loops the overhead of function calls starts to matter as well, so it might be beneficial to make it possible to inline all of this directly from util.h, where it is used.

There is probably some other regression in GIMP biting us; I get similar performance to what you are reporting. A couple of months ago similar tests yielded ~8 megapixels/second IIRC.
Comment 6 Alexander Larsson 2012-06-18 14:17:14 UTC
It's a lot of code to inline...
Comment 7 Behdad Esfahbod 2012-09-25 13:21:01 UTC
The code looks awesome.  Have you tried vectorizing it?
Comment 8 Alexander Larsson 2012-10-02 13:43:38 UTC
Behdad: I have not, but it might be worth doing. You could e.g. do r g b in parallel.
Comment 9 Loren Merritt 2013-05-03 12:05:19 UTC
Created attachment 243174
Newton and simd

I vectorized it. Also improved the scalar version.
Total speedup: 11x.
Comment 10 Michael Natterer 2013-05-03 12:37:07 UTC
WTF :) We keep you!
Comment 11 Øyvind Kolås (pippin) 2013-05-03 13:41:40 UTC
yay! thanks =)

commit 33bbf893b4b2c9a2ea9c2ccafc14a51c2c1757e7
Author: Loren Merritt <pengvado@akuvian.org>
Date:   Mon Apr 29 09:49:18 2013 +0000

    SSE2-optimized gamma correction
    
    7x faster than the scalar implementation.
    (4x the obvious way from simd, and the other 1.75x because I'm exploiting
    knowledge of the ieee754 float format rather than using portable frexp().)
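    To illustrate the frexp() remark, here is a minimal sketch, not taken from
    the commit, of splitting a float into mantissa and exponent by masking its
    bits. The helper name is hypothetical, and it assumes float is IEEE 754
    binary32 and that the input is positive and normal (no zero, denormal,
    infinity or NaN handling).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: like frexpf() for positive normal floats only.
 * Returns m in [0.5, 1) and stores e such that x == m * 2^e. */
static float split_float_bits(float x, int *e)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);             /* well-defined type pun */

    *e = (int)((bits >> 23) & 0xFF) - 126;      /* biased exponent, rebased for m in [0.5, 1) */

    bits = (bits & 0x807FFFFFu) | 0x3F000000u;  /* keep sign and mantissa, set exponent field to 126 */

    float m;
    memcpy(&m, &bits, sizeof m);
    return m;
}
```

    A shift and two masks like these map directly onto integer SIMD
    instructions, whereas a frexp() library call has to handle every special
    case and is applied to one value at a time.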

commit f683335fb0d06aef7170fac523ae2ab83173a7de
Author: Loren Merritt <pengvado@akuvian.org>
Date:   Thu Apr 25 17:50:53 2013 +0000

    Optimize gamma correction
    
    Switch from Chebyshev polynomial to Newton's method, which is both simpler and
    1.5x faster for the same precision.
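
    One way Newton's method can be applied to x^2.4, sketched under stated
    assumptions rather than copied from the commit: solve for y = x^(-1/5)
    using only multiplications, then use x^2.4 = (x * y)^3. The function name
    and the starting guess of 1 are hypothetical, and the sketch is only valid
    for 0 < x <= 1; a tuned version would use a better initial estimate and a
    fixed, small iteration count.

```c
#include <math.h>

/* Hypothetical helper: x^2.4 for 0 < x <= 1 via Newton's method.
 * Solves y^(-5) = x, i.e. y = x^(-1/5), with the division-free update
 *     y <- (6/5) * y - (1/5) * x * y^6
 * and then returns (x * y)^3 = x^(12/5). */
static double pow24_newton(double x)
{
    if (x <= 0.0)
        return 0.0;

    /* For x <= 1 the root is >= 1, so starting at 1 approaches it from below. */
    double y = 1.0;
    for (int i = 0; i < 100; i++) {
        double y2 = y * y;
        double y6 = y2 * y2 * y2;
        double next = (6.0 / 5.0) * y - (1.0 / 5.0) * x * y6;
        int done = fabs(next - y) <= 1e-15 * next;
        y = next;
        if (done)
            break;
    }

    double t = x * y;   /* x^(4/5) */
    return t * t * t;   /* x^(12/5) = x^2.4 */
}
```

    The update needs no division or transcendental call, which is part of why
    this style of iteration is cheap per step and vectorizes well.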