GNOME Bugzilla – Bug 795686
CIE: Add Intel SIMD versions of conversions
Last modified: 2018-05-22 12:23:18 UTC
See patches.
Created attachment 371540 [details] [review] build: Add scaffolding for SSE3
Created attachment 371541 [details] [review] CIE: Add an SSE3 version of "RGBA float" to "CIE Lab alpha float"
Patches are also in babl.git:wip/rishi/cie-simd

On an older Intel i7 Sandybridge, converting 15 megapixels from "RGBA float" to "CIE Lab alpha float" now takes 0.283s instead of 0.388s with the current scalar conversions, and 0.437s before bug 791837. That's an improvement of 27% and 35% respectively.

Those numbers aren't as awesome as I had expected them to be, but at least the gains are measurable. This is my first time writing SIMD code, so maybe there's room for further optimizations. To me, the need to do dot products of floating point vectors and relatively accurate cube roots seems to be the sticking point of these conversions.
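
For readers wondering why the cube root is a sticking point: SSE has no cube root instruction, so it has to be built from cheaper operations. The following is only a minimal scalar sketch of one common approach (a bit-pattern initial guess refined by Newton-Raphson steps), and is not necessarily what the patches do. The name approx_cbrtf, the magic constant and the iteration count are assumptions chosen for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Approximate cube root for positive, normal floats.
 * Hypothetical helper for illustration; not taken from the patches. */
static float
approx_cbrtf (float x)
{
  uint32_t bits;
  float y;
  int i;

  /* Zero and negative inputs are out of scope for this sketch. */
  if (x <= 0.0f)
    return 0.0f;

  /* Dividing the IEEE 754 bit pattern by three roughly divides the
   * exponent by three; the constant restores the exponent bias.
   * The result is an initial guess that is off by a few percent. */
  memcpy (&bits, &x, sizeof bits);
  bits = bits / 3 + 709921077u;
  memcpy (&y, &bits, sizeof y);

  /* Newton-Raphson on f(y) = y^3 - x. Each step roughly squares the
   * relative error, so three steps reach float precision. */
  for (i = 0; i < 3; i++)
    y = (2.0f * y + x / (y * y)) / 3.0f;

  return y;
}
```

Every Newton step costs a division, which is part of why an accurate cube root is expensive compared to the multiplies and adds of the matrix part.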
Created attachment 371571 [details] Test program used for measurements
On an Intel i7 Haswell, converting 15 megapixels from "RGBA float" to "CIE Lab alpha float" now takes 0.23s instead of 0.27s with the current scalar conversions, and 0.35s before bug 791837.
Created attachment 371892 [details] [review] build: Add scaffolding for SSE3
Created attachment 371894 [details] [review] CIE: Add an SSE2 version of "RGBA float" to "CIE Lab alpha float"

This is a better implementation than the previous version. It implicitly unrolls the loop four times; avoids the horizontal summation, which reduces the CPU requirement to SSE2; and can be more easily adapted to different RGB and CIE Lab variants.

On an Intel i7 Haswell, converting 15 megapixels from "RGBA float" to "CIE Lab alpha float" now takes 0.13s instead of 0.27s with the current scalar conversions, and 0.35s before bug 791837.
On an older Intel i7 Sandybridge, converting 15 megapixels from "RGBA float" to "CIE Lab alpha float" now takes 0.22s instead of 0.388s with the current scalar conversions, and 0.437s before bug 791837.
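
To illustrate how processing four pixels per iteration can avoid the horizontal summation, here is a rough sketch of the matrix step only (RGB to XYZ, without the cube roots). It is not the code from the attachment. The function names are hypothetical, and the coefficients are the commonly quoted linear sRGB to XYZ (D65) values, used here as an assumption; babl's actual matrix may differ.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Illustrative coefficients (assumption): linear sRGB -> XYZ, D65. */
static const float M[3][3] = {
  { 0.4124564f, 0.3575761f, 0.1804375f },
  { 0.2126729f, 0.7151522f, 0.0721750f },
  { 0.0193339f, 0.1191920f, 0.9503041f }
};

/* Scalar reference: one dot product per output channel. */
static void
rgba_to_xyza_scalar (const float *src, float *dst, size_t n)
{
  for (size_t i = 0; i < n; i++, src += 4, dst += 4)
    {
      for (int c = 0; c < 3; c++)
        dst[c] = M[c][0] * src[0] + M[c][1] * src[1] + M[c][2] * src[2];
      dst[3] = src[3];
    }
}

/* Hypothetical entry point: four pixels per iteration when SSE2 is
 * available, scalar code for the remainder (or for everything). */
void
rgba_to_xyza (const float *src, float *dst, size_t n)
{
  size_t i = 0;
#if defined(__SSE2__)
  for (; i + 4 <= n; i += 4)
    {
      /* Load four interleaved RGBA pixels, one per register. */
      __m128 r = _mm_loadu_ps (src + 4 * i + 0);
      __m128 g = _mm_loadu_ps (src + 4 * i + 4);
      __m128 b = _mm_loadu_ps (src + 4 * i + 8);
      __m128 a = _mm_loadu_ps (src + 4 * i + 12);
      __m128 out[3];

      /* Transpose so that r holds four reds, g four greens, etc. */
      _MM_TRANSPOSE4_PS (r, g, b, a);

      /* Each dot product is now three vertical multiplies and two
       * vertical adds; no sum across the lanes of one register. */
      for (int c = 0; c < 3; c++)
        out[c] = _mm_add_ps (
                   _mm_add_ps (_mm_mul_ps (_mm_set1_ps (M[c][0]), r),
                               _mm_mul_ps (_mm_set1_ps (M[c][1]), g)),
                   _mm_mul_ps (_mm_set1_ps (M[c][2]), b));

      /* Transpose back to interleaved XYZA and store. */
      _MM_TRANSPOSE4_PS (out[0], out[1], out[2], a);
      _mm_storeu_ps (dst + 4 * i + 0, out[0]);
      _mm_storeu_ps (dst + 4 * i + 4, out[1]);
      _mm_storeu_ps (dst + 4 * i + 8, out[2]);
      _mm_storeu_ps (dst + 4 * i + 12, a);
    }
#endif
  rgba_to_xyza_scalar (src + 4 * i, dst + 4 * i, n - i);
}
```

Because only the matrix M is specific to the colour space, a layout like this would also be straightforward to reuse for other RGB variants by swapping the coefficients.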
Created attachment 372002 [details] [review] CIE: Add an SSE2 version of "RGBA float" to "CIE L float"

See commit bdcd090c17aebd8f for the original need for a "RGBA float" to "CIE L float" conversion.

On an Intel i7 Haswell, converting 15 megapixels from "RGBA float" to "CIE L float" with SSE2 now takes 0.056s. Earlier the indirect conversion via "Y float" took 0.107s and the direct conversion took 0.111s.
On an older Intel i7 Sandybridge, converting 15 megapixels from "RGBA float" to "CIE L float" now takes 0.073s instead of 0.149s.
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/gegl/issues/68.