Bug 795686 - CIE: Add Intel SIMD versions of conversions
Status: RESOLVED OBSOLETE
Product: GEGL
Classification: Other
Component: babl
Version: git master
Hardware: Other All
Importance: Normal normal
Target Milestone: ---
Assigned To: Default Gegl Component Owner
QA Contact: Default Gegl Component Owner
Depends on:
Blocks:
 
 
Reported: 2018-04-30 08:53 UTC by Debarshi Ray
Modified: 2018-05-22 12:23 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
build: Add scaffolding for SSE3 (4.36 KB, patch)
2018-04-30 09:19 UTC, Debarshi Ray
Status: none
CIE: Add an SSE3 version of "RGBA float" to "CIE Lab alpha float" (10.57 KB, patch)
2018-04-30 09:19 UTC, Debarshi Ray
Status: none
Test program used for measurements (1.50 KB, text/plain)
2018-05-01 07:48 UTC, Debarshi Ray
build: Add scaffolding for SSE3 (4.46 KB, patch)
2018-05-10 12:43 UTC, Debarshi Ray
Status: committed
CIE: Add an SSE2 version of "RGBA float" to "CIE Lab alpha float" (9.88 KB, patch)
2018-05-10 12:47 UTC, Debarshi Ray
Status: committed
CIE: Add an SSE2 version of "RGBA float" to "CIE L float" (2.87 KB, patch)
2018-05-14 10:02 UTC, Debarshi Ray
Status: committed

Description Debarshi Ray 2018-04-30 08:53:23 UTC
See patches.
Comment 1 Debarshi Ray 2018-04-30 09:19:26 UTC
Created attachment 371540
build: Add scaffolding for SSE3
Comment 2 Debarshi Ray 2018-04-30 09:19:57 UTC
Created attachment 371541
CIE: Add an SSE3 version of "RGBA float" to "CIE Lab alpha float"
Comment 3 Debarshi Ray 2018-04-30 09:30:41 UTC
Patches are also in babl.git:wip/rishi/cie-simd

On an older Intel i7 Sandy Bridge, converting 15 megapixels from "RGBA float" to "CIE Lab alpha float" now takes 0.283s instead of 0.388s with the current scalar conversions, and 0.437s before bug 791837. That's an improvement of 27% and 35% respectively.

Those numbers aren't as awesome as I had expected them to be, but at least the gains are measurable. This is my first time writing SIMD code, so maybe there's room for further optimizations. To me, the need to do dot products of floating point vectors and relatively accurate cube roots seems to be the sticking point of these conversions.
Comment 4 Debarshi Ray 2018-05-01 07:48:22 UTC
Created attachment 371571
Test program used for measurements
Comment 5 Debarshi Ray 2018-05-03 06:46:44 UTC
On an Intel i7 Haswell, converting 15 megapixels from "RGBA float" to "CIE Lab alpha float" now takes 0.23s instead of 0.27s with the current scalar conversions, and 0.35s before bug 791837.
Comment 6 Debarshi Ray 2018-05-10 12:43:17 UTC
Created attachment 371892
build: Add scaffolding for SSE3
Comment 7 Debarshi Ray 2018-05-10 12:47:08 UTC
Created attachment 371894
CIE: Add an SSE2 version of "RGBA float" to "CIE Lab alpha float"

This is a better implementation than the previous version. It implicitly unrolls the loop four times; avoids the horizontal summation, which reduces the CPU requirement to SSE2; and can be more easily adapted to different RGB and CIE Lab variants.

On an Intel i7 Haswell, converting 15 megapixels from "RGBA float" to "CIE Lab alpha float" now takes 0.13s instead of 0.27s with the current scalar conversions, and 0.35s before bug 791837.
Comment 8 Debarshi Ray 2018-05-10 14:19:35 UTC
On an older Intel i7 Sandy Bridge, converting 15 megapixels from "RGBA float" to "CIE Lab alpha float" now takes 0.22s instead of 0.388s with the current scalar conversions, and 0.437s before bug 791837.
Comment 9 Debarshi Ray 2018-05-14 10:02:27 UTC
Created attachment 372002
CIE: Add an SSE2 version of "RGBA float" to "CIE L float"

See commit bdcd090c17aebd8f for the original need for a "RGBA float" to "CIE L float" conversion.

On an Intel i7 Haswell, converting 15 megapixels from "RGBA float" to "CIE L float" with SSE2 now takes 0.056s.  Earlier the indirect conversion via "Y float" took 0.107s and the direct conversion took 0.111s.
Comment 10 Debarshi Ray 2018-05-14 14:06:00 UTC
On an older Intel i7 Sandy Bridge, converting 15 megapixels from "RGBA float" to "CIE L float" now takes 0.073s instead of 0.149s.
Comment 11 GNOME Infrastructure Team 2018-05-22 12:23:18 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/gegl/issues/68.