Bug 795686 – CIE: Add Intel SIMD versions of conversions

Summary:	CIE: Add Intel SIMD versions of conversions


Status:	RESOLVED OBSOLETE

Product:	GEGL
Classification:	Other
Component:	babl
Version:	git master
Hardware:	Other, OS: All

Importance:	Normal priority, normal severity
Target Milestone:	---
Assigned To:	Default Gegl Component Owner
QA Contact:	Default Gegl Component Owner

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2018-04-30 08:53 UTC by Debarshi Ray
Modified:	2018-05-22 12:23 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
build: Add scaffolding for SSE3 (4.36 KB, patch), 2018-04-30 09:19 UTC, Debarshi Ray; status: none
CIE: Add an SSE3 version of "RGBA float" to "CIE Lab alpha float" (10.57 KB, patch), 2018-04-30 09:19 UTC, Debarshi Ray; status: none
Test program used for measurements (1.50 KB, text/plain), 2018-05-01 07:48 UTC, Debarshi Ray
build: Add scaffolding for SSE3 (4.46 KB, patch), 2018-05-10 12:43 UTC, Debarshi Ray; status: committed
CIE: Add an SSE2 version of "RGBA float" to "CIE Lab alpha float" (9.88 KB, patch), 2018-05-10 12:47 UTC, Debarshi Ray; status: committed
CIE: Add an SSE2 version of "RGBA float" to "CIE L float" (2.87 KB, patch), 2018-05-14 10:02 UTC, Debarshi Ray; status: committed

Description Debarshi Ray 2018-04-30 08:53:23 UTC

See patches.

Comment 1 Debarshi Ray 2018-04-30 09:19:26 UTC

Created attachment 371540
build: Add scaffolding for SSE3

Comment 2 Debarshi Ray 2018-04-30 09:19:57 UTC

Created attachment 371541
CIE: Add an SSE3 version of "RGBA float" to "CIE Lab alpha float"

Comment 3 Debarshi Ray 2018-04-30 09:30:41 UTC

Patches are also in babl.git:wip/rishi/cie-simd

On an older Intel i7 Sandy Bridge, converting 15 megapixels from "RGBA float" to "CIE Lab alpha float" now takes 0.283s, compared with 0.388s using the current scalar conversions and 0.437s before bug 791837. That is an improvement of 27% and 35%, respectively.
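For reference, the two percentages are relative reductions in conversion time, computed from the Sandy Bridge timings quoted above:

```latex
\[
\frac{0.388\,\mathrm{s} - 0.283\,\mathrm{s}}{0.388\,\mathrm{s}} \approx 0.271 \approx 27\%,
\qquad
\frac{0.437\,\mathrm{s} - 0.283\,\mathrm{s}}{0.437\,\mathrm{s}} \approx 0.352 \approx 35\%
\]
```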

Those numbers aren't as awesome as I had expected them to be, but at least the gains are measurable. This is my first time writing SIMD code, so there may be room for further optimization. To me, the sticking points of these conversions seem to be the dot products of floating-point vectors and the need for relatively accurate cube roots.
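A common way to get reasonably accurate cube roots without a full `cbrtf()` call is to form a rough estimate from the IEEE 754 bit pattern and refine it with Newton-Raphson iterations. The following is a minimal scalar sketch of that technique, not the code from these patches: `cbrt_approx` is a hypothetical name, the magic constant follows the (127 - 127/3 - 0.033) * 2^23 form used by fdlibm-derived `cbrtf` implementations, and inputs are assumed to be positive normal floats (in a Lab conversion, values below the (6/29)^3 threshold take the linear branch anyway).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: approximate cube root of a positive, normal float.
 * Step 1: integer division of the bit pattern by three roughly divides
 *         log2(x) by three, giving an estimate within about 4%.
 * Step 2: two Newton-Raphson iterations for y^3 = x; each one roughly
 *         squares the relative error. */
static float
cbrt_approx (float x)
{
  assert (x > 0.0f);

  uint32_t bits;
  memcpy (&bits, &x, sizeof bits);   /* read the float's bits without aliasing UB */
  bits = bits / 3 + 709958130u;      /* (127 - 127/3 - 0.033) * 2^23 */

  float y;
  memcpy (&y, &bits, sizeof y);

  /* Newton step for f(y) = y^3 - x:  y <- (2y + x / y^2) / 3 */
  y = (2.0f * y + x / (y * y)) * (1.0f / 3.0f);
  y = (2.0f * y + x / (y * y)) * (1.0f / 3.0f);
  return y;
}
```

Each Newton step still needs a division, which helps explain why accurate cube roots remain a noticeable cost even in vectorized code.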

Comment 4 Debarshi Ray 2018-05-01 07:48:22 UTC

Created attachment 371571
Test program used for measurements

Comment 5 Debarshi Ray 2018-05-03 06:46:44 UTC

On an Intel i7 Haswell, converting 15 megapixels from "RGBA float" to "CIE Lab alpha float" now takes 0.23s, compared with 0.27s using the current scalar conversions and 0.35s before bug 791837.

Comment 6 Debarshi Ray 2018-05-10 12:43:17 UTC

Created attachment 371892
build: Add scaffolding for SSE3

Comment 7 Debarshi Ray 2018-05-10 12:47:08 UTC

Created attachment 371894
CIE: Add an SSE2 version of "RGBA float" to "CIE Lab alpha float"

This is a better implementation than the previous version. It implicitly unrolls the loop four times, avoids horizontal summation (which lowers the CPU requirement from SSE3 to SSE2), and is easier to adapt to other RGB and CIE Lab variants.
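The structure described in this comment can be illustrated with a minimal SSE2 sketch. It is not the committed patch: `rgba_to_xyz_sse2` is a hypothetical name, and the sketch assumes planar X/Y/Z output, a caller-supplied row-major 3x3 RGB-to-XYZ matrix, and a pixel count that is a multiple of four. Loading four RGBA pixels and transposing them leaves one channel of four pixels in each register, so every matrix row is evaluated with vertical multiplies and adds only. This is one way to get the four-way unrolling mentioned above, and it needs no horizontal add (`_mm_hadd_ps` is an SSE3 instruction).

```c
#include <assert.h>
#include <stddef.h>
#include <emmintrin.h> /* SSE2; also provides the SSE1 _MM_TRANSPOSE4_PS macro */

/* Hypothetical sketch: convert n RGBA float pixels to CIE XYZ with a
 * row-major 3x3 matrix m, writing planar X, Y and Z arrays.
 * Alpha is loaded but not used. */
static void
rgba_to_xyz_sse2 (const float *rgba,
                  float       *x_out,
                  float       *y_out,
                  float       *z_out,
                  size_t       n,
                  const float  m[9])
{
  /* Keeps the sketch short; a real conversion would finish the
   * remaining n % 4 pixels with scalar code. */
  assert (n % 4 == 0);

  /* Broadcast each matrix coefficient to all four lanes. */
  const __m128 m0 = _mm_set1_ps (m[0]), m1 = _mm_set1_ps (m[1]), m2 = _mm_set1_ps (m[2]);
  const __m128 m3 = _mm_set1_ps (m[3]), m4 = _mm_set1_ps (m[4]), m5 = _mm_set1_ps (m[5]);
  const __m128 m6 = _mm_set1_ps (m[6]), m7 = _mm_set1_ps (m[7]), m8 = _mm_set1_ps (m[8]);

  for (size_t i = 0; i < n; i += 4)
    {
      /* Four interleaved pixels: p0 = R0 G0 B0 A0, ..., p3 = R3 G3 B3 A3. */
      __m128 p0 = _mm_loadu_ps (rgba + 4 * i);
      __m128 p1 = _mm_loadu_ps (rgba + 4 * i + 4);
      __m128 p2 = _mm_loadu_ps (rgba + 4 * i + 8);
      __m128 p3 = _mm_loadu_ps (rgba + 4 * i + 12);

      /* Transpose to planar: p0 = R0..R3, p1 = G0..G3, p2 = B0..B3, p3 = A0..A3. */
      _MM_TRANSPOSE4_PS (p0, p1, p2, p3);

      /* Each output channel is a lane-wise multiply-add; no horizontal sums. */
      __m128 x = _mm_add_ps (_mm_add_ps (_mm_mul_ps (m0, p0), _mm_mul_ps (m1, p1)),
                             _mm_mul_ps (m2, p2));
      __m128 y = _mm_add_ps (_mm_add_ps (_mm_mul_ps (m3, p0), _mm_mul_ps (m4, p1)),
                             _mm_mul_ps (m5, p2));
      __m128 z = _mm_add_ps (_mm_add_ps (_mm_mul_ps (m6, p0), _mm_mul_ps (m7, p1)),
                             _mm_mul_ps (m8, p2));

      _mm_storeu_ps (x_out + i, x);
      _mm_storeu_ps (y_out + i, y);
      _mm_storeu_ps (z_out + i, z);
    }
}
```

Because the matrix is just nine broadcast coefficients, targeting another RGB space only means passing a different matrix. The Lab step (the cube-root-based transfer function and the final L, a, b combination) could then operate on the X, Y and Z vectors in the same lane-wise fashion.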

On an Intel i7 Haswell, converting 15 megapixels from "RGBA float" to "CIE Lab alpha float" now takes 0.13s, compared with 0.27s using the current scalar conversions and 0.35s before bug 791837.

Comment 8 Debarshi Ray 2018-05-10 14:19:35 UTC

On an older Intel i7 Sandy Bridge, converting 15 megapixels from "RGBA float" to "CIE Lab alpha float" now takes 0.22s, compared with 0.388s using the current scalar conversions and 0.437s before bug 791837.

Comment 9 Debarshi Ray 2018-05-14 10:02:27 UTC

Created attachment 372002
CIE: Add an SSE2 version of "RGBA float" to "CIE L float"

See commit bdcd090c17aebd8f for why an "RGBA float" to "CIE L float" conversion was needed in the first place.

On an Intel i7 Haswell, converting 15 megapixels from "RGBA float" to "CIE L float" with SSE2 now takes 0.056s. Earlier, the indirect conversion via "Y float" took 0.107s, and the direct conversion took 0.111s.

Comment 10 Debarshi Ray 2018-05-14 14:06:00 UTC

On an older Intel i7 Sandy Bridge, converting 15 megapixels from "RGBA float" to "CIE L float" now takes 0.073s instead of 0.149s.

Comment 11 GNOME Infrastructure Team 2018-05-22 12:23:18 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new issue on our GitLab instance: https://gitlab.gnome.org/GNOME/gegl/issues/68.