Bug 678318 – Gamma conversion slow



Summary:	Gamma conversion slow


Status:	RESOLVED FIXED

Product:	GEGL
Classification:	Other
Component:	babl
Version:	unspecified
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	Default Gegl Component Owner
QA Contact:	Default Gegl Component Owner

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2012-06-18 13:25 UTC by Alexander Larsson
Modified:	2013-05-03 13:41 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Add fast approximations of x^2.4 and x^(1/2.4) (7.41 KB, patch), 2012-06-18 13:28 UTC, Alexander Larsson
Add fast approximations of x^2.4 and x^(1/2.4) (7.42 KB, patch), 2012-06-18 13:36 UTC, Alexander Larsson
Use new approximations for gamma conversions (1.72 KB, patch), 2012-06-18 13:36 UTC, Alexander Larsson
Add simple test app to test pow-24 accuracy (1.91 KB, patch), 2012-06-18 13:37 UTC, Alexander Larsson
Newton and simd (12.85 KB, patch), 2013-05-03 12:05 UTC, Loren Merritt

Description Alexander Larsson 2012-06-18 13:25:31 UTC

When I run a simple operation such as color balance on an RGB 8 bpp image in the GEGL-ified GIMP, it runs quite slowly. Simple profiling shows that most of the time is spent converting from linear to gamma and back. Looking at the code, it seems we can do better, so I implemented the Chebyshev approximation from:

http://stackoverflow.com/questions/6475373/optimizations-for-pow-with-const-non-integer-exponent/6478839#6478839

With this code computing x^2.4 and x^(1/2.4), the color balance plug-in goes from reporting about 1.5 MPixels/sec to about 2 MPixels/sec, i.e. a speed-up of 33%.

There is probably some other problem too, because I think that for an RGB24 buffer I *should* be hitting some fast path for this operation (I hope?).

Comment 1 Alexander Larsson 2012-06-18 13:28:44 UTC

Created attachment 216675
Add fast approximations of x^2.4 and x^(1/2.4)

Use a Chebyshev polynomial approximation of these functions to speed up
gamma conversion. Based on the post at:

http://stackoverflow.com/questions/6475373/optimizations-for-pow-with-const-non-integer-exponent/6478839#6478839

Comment 2 Alexander Larsson 2012-06-18 13:36:23 UTC

Created attachment 216676
Add fast approximations of x^2.4 and x^(1/2.4)

Comment 3 Alexander Larsson 2012-06-18 13:36:52 UTC

Created attachment 216677
Use new approximations for gamma conversions

Comment 4 Alexander Larsson 2012-06-18 13:37:20 UTC

Created attachment 216678
Add simple test app to test pow-24 accuracy

Comment 5 Øyvind Kolås (pippin) 2012-06-18 14:04:00 UTC

I've pushed the first two commits. For inner loops like these, function-call overhead starts to matter as well; it might be beneficial to make it possible to inline all of this directly from util.h, where it is used.

There is probably some other regression in GIMP biting us; I get performance similar to what you are reporting. A couple of months ago, similar tests yielded ~8 megapixels/second IIRC.

Comment 6 Alexander Larsson 2012-06-18 14:17:14 UTC

It's a lot of code to inline...

Comment 7 Behdad Esfahbod 2012-09-25 13:21:01 UTC

The code looks awesome.  Have you tried vectorizing it?

Comment 8 Alexander Larsson 2012-10-02 13:43:38 UTC

Behdad: I have not, but it might be worth doing. You could, e.g., do R, G and B in parallel.

Comment 9 Loren Merritt 2013-05-03 12:05:19 UTC

Created attachment 243174
Newton and simd

I vectorized it. Also improved the scalar version.
Total speedup: 11x.

Comment 10 Michael Natterer 2013-05-03 12:37:07 UTC

WTF :) We keep you!

Comment 11 Øyvind Kolås (pippin) 2013-05-03 13:41:40 UTC

yay! thanks =)

commit 33bbf893b4b2c9a2ea9c2ccafc14a51c2c1757e7
Author: Loren Merritt <pengvado@akuvian.org>
Date:   Mon Apr 29 09:49:18 2013 +0000

    SSE2-optimized gamma correction
    
    7x faster than the scalar implementation.
    (4x the obvious way from SIMD, and the other 1.75x because I'm exploiting
    knowledge of the IEEE 754 float format rather than using portable frexp().)
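The idea of exploiting the IEEE 754 float format instead of calling frexp() can be illustrated in scalar C: for a positive, normal binary32 float, the exponent and mantissa can be read straight from the bit pattern, giving the same result as frexpf() without a library call. `fast_frexpf` is a hypothetical helper that assumes `float` is IEEE 754 binary32; it is not the committed SSE2 code, which presumably applies similar masks to four lanes at once.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Split a positive, normal IEEE 754 binary32 float into m in [0.5, 1) and e
 * with x = m * 2^e, like frexpf(), but by editing the bit pattern directly. */
static float fast_frexpf(float x, int *e)
{
    assert(x > 0.0f && isnormal(x));              /* no zero, subnormals, inf or NaN */
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);               /* type-pun without aliasing UB */
    *e = (int)((bits >> 23) & 0xFFu) - 126;       /* biased exponent -> frexp-style e */
    bits = (bits & 0x007FFFFFu) | 0x3F000000u;    /* keep mantissa, exponent of 0.5f */
    float m;
    memcpy(&m, &bits, sizeof m);
    return m;
}
```

For example, fast_frexpf(8.0f, &e) returns 0.5f and sets e to 4, matching frexpf(). Skipping the special-case handling that a portable frexp() must do is what makes this cheaper, at the cost of only accepting normal positive inputs.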

commit f683335fb0d06aef7170fac523ae2ab83173a7de
Author: Loren Merritt <pengvado@akuvian.org>
Date:   Thu Apr 25 17:50:53 2013 +0000

    Optimize gamma correction
    
    Switch from Chebyshev polynomial to Newton's method, which is both simpler and
    1.5x faster for the same precision.