GNOME Bugzilla – Bug 678318
Gamma conversion slow
Last modified: 2013-05-03 13:41:40 UTC
When I run a simple op like color balance on an RGB 8bpp image in the GEGLified GIMP, it runs pretty slowly. Simple profiling shows that most of the time is spent converting from linear to gamma and back. Looking at the code, it seems we can do this better, so I implemented the Chebyshev approximation from: http://stackoverflow.com/questions/6475373/optimizations-for-pow-with-const-non-integer-exponent/6478839#6478839 With this code computing x^2.4 and x^(1/2.4), the color balance plugin goes from reporting about 1.5 MPixels/sec to about 2 MPixels/sec, i.e. a speedup of 33%. There is probably some other problem too, because I think for an rgb24 buffer I *should* be hitting some fast path for this operation (I hope?).
Created attachment 216675: Add fast approximations of x^2.4 and x^(1/2.4). Use a Chebyshev polynomial approximation of these to speed up gamma conversion. Based on the post at: http://stackoverflow.com/questions/6475373/optimizations-for-pow-with-const-non-integer-exponent/6478839#6478839
Created attachment 216676: Add fast approximations of x^2.4 and x^(1/2.4)
Created attachment 216677: Use new approximations for gamma conversions
Created attachment 216678: Add simple test app to test pow-24 accuracy
I've pushed the first two commits. For inner loops like these, the overhead of function calls starts to matter as well; it might be beneficial to make it possible to inline all of this directly from util.h, where it is used. There is probably some other regression in GIMP biting us: I get performance similar to what you are reporting. A couple of months ago, similar tests yielded ~8 megapixels/second, IIRC.
It's a lot of code to inline...
The code looks awesome. Have you tried vectorizing it?
Behdad: I have not, but it might be worth doing. You could, e.g., process R, G and B in parallel.
Created attachment 243174: Newton and SIMD. I vectorized it and also improved the scalar version. Total speedup: 11x.
WTF :) We keep you!
yay! thanks =)

commit 33bbf893b4b2c9a2ea9c2ccafc14a51c2c1757e7
Author: Loren Merritt <pengvado@akuvian.org>
Date:   Mon Apr 29 09:49:18 2013 +0000

    SSE2-optimized gamma correction

    7x faster than the scalar implementation. (4x comes the obvious way from SIMD, and the other 1.75x because I'm exploiting knowledge of the IEEE 754 float format rather than using portable frexp().)

commit f683335fb0d06aef7170fac523ae2ab83173a7de
Author: Loren Merritt <pengvado@akuvian.org>
Date:   Thu Apr 25 17:50:53 2013 +0000

    Optimize gamma correction

    Switch from Chebyshev polynomial to Newton's method, which is both simpler and 1.5x faster for the same precision.