After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 685637 - [audioresample] Performance improvements & ARM NEON support
[audioresample] Performance improvements & ARM NEON support
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gst-plugins-base
1.0.0
Other Linux
: Normal enhancement
: 1.1.1
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2012-10-07 01:39 UTC by Carlos Rafael Giani
Modified: 2012-10-25 15:50 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
[PATCH 1/4] Reintroduced xmmintrin.h/emmintrin.h header checks (884 bytes, patch)
2012-10-07 01:39 UTC, Carlos Rafael Giani
committed Details | Review
[PATCH 2/4] audioresample: sinc filter performance improvements (27.12 KB, patch)
2012-10-07 01:40 UTC, Carlos Rafael Giani
committed Details | Review
[PATCH 3/4] audioresample: changed inner_product_single semantics (3.53 KB, patch)
2012-10-07 01:40 UTC, Carlos Rafael Giani
reviewed Details | Review
[PATCH 4/4] audioresample: added ARM NEON support (13.84 KB, patch)
2012-10-07 01:41 UTC, Carlos Rafael Giani
needs-work Details | Review
audioresample: changed inner_product_single semantics (updated version) (2.94 KB, patch)
2012-10-15 21:29 UTC, Carlos Rafael Giani
committed Details | Review
audioresample: added ARM NEON support (updated version) (13.45 KB, patch)
2012-10-15 21:31 UTC, Carlos Rafael Giani
committed Details | Review

Description Carlos Rafael Giani 2012-10-07 01:39:19 UTC
This set of patches introduces resampler speedups and support for the
ARM NEON instruction set. The patches were originally done for Speex
upstream by Jyri Sarha, and can be found in the Speex mailing list:

   http://lists.xiph.org/pipermail/speex-dev/2011-September/008242.html
   http://lists.xiph.org/pipermail/speex-dev/2011-September/008243.html
   http://lists.xiph.org/pipermail/speex-dev/2011-September/008241.html
   http://lists.xiph.org/pipermail/speex-dev/2011-September/008240.html
   http://lists.xiph.org/pipermail/speex-dev/2011-September/008239.html
   http://lists.xiph.org/pipermail/speex-dev/2011-September/008238.html

These patches were then discovered by Branislav Katreniak
( branislav.katreniak@streamunlimited.com ) for our projects at
StreamUnlimited ( http://streamunlimited.com/ ). He did an initial
adaptation of the patches for the GStreamer audioresample element.
Tests showed a significant speedup; resampling of 44.1 to 48 kHz ran
up to 5 times faster on a 720Mhz ARM Cortex-A8 CPU.

I then adapted the patches for GStreamer 1.0 and added extra
functionality to control the behavior better and to check for ARM NEON
support in configure.ac.

Two new properties are introduced: sinc-filter-mode and
sinc-filter-auto-threshold. These control the way the sinc filter table
behaves and gets computed. The default behavior is what has been used
until now, "interpolated" (sparse filter table with cubic interpolation);
other modes are "full" (all filter values computed) and "auto" (full table
unless estimated size of said table exceeds the threshold).

The "auto" mode exists, because in extreme cases (like resampling from
48000 to 48001 Hz) the full filter table can become very large, and take
a long time to compute. Default threshold is 1 MB.

As for the default mode: I thought setting it to "interpolated" is useful
for backwards compatibility, but "auto" as default makes sense, too.
Thoughts?

In addition, the very first patch reintroduces SSE header checks
to configure.ac , which re-enables the SSE/SSE2 code paths inside the
resampler and also helps with the performance.
Comment 1 Carlos Rafael Giani 2012-10-07 01:39:55 UTC
Created attachment 225967 [details] [review]
[PATCH 1/4] Reintroduced xmmintrin.h/emmintrin.h header checks
Comment 2 Carlos Rafael Giani 2012-10-07 01:40:32 UTC
Created attachment 225968 [details] [review]
[PATCH 2/4] audioresample: sinc filter performance improvements
Comment 3 Carlos Rafael Giani 2012-10-07 01:40:53 UTC
Created attachment 225969 [details] [review]
[PATCH 3/4] audioresample: changed inner_product_single semantics
Comment 4 Carlos Rafael Giani 2012-10-07 01:41:14 UTC
Created attachment 225970 [details] [review]
[PATCH 4/4] audioresample: added ARM NEON support
Comment 5 Sebastian Dröge (slomo) 2012-10-08 10:51:04 UTC
Are these changes to the speex resampler code upstream already? If they are, please just merge the speex resample code from upstream in a single, separate commit. If they are not, please first merge the latest speex resample code from upstream in a separate commit before changing code in there to make it easier to maintain this. And also update the diff against upstream in the README file in the end.

Other than that these changes look good.
Comment 6 Carlos Rafael Giani 2012-10-12 07:02:59 UTC
(In reply to comment #5)
> Are these changes to the speex resampler code upstream already? If they are,
> please just merge the speex resample code from upstream in a single, separate
> commit. If they are not, please first merge the latest speex resample code from
> upstream in a separate commit before changing code in there to make it easier
> to maintain this. And also update the diff against upstream in the README file
> in the end.
> 
> Other than that these changes look good.

Looking at the Speex git repository, the last changes to the resampler were done in 2009-06-18 . According to the audioresample README, its Speex resampler snapshot was taken 2009-11-10. A code comparison shows only formatting changes (spaces etc.) and GStreamer specific additions (preprocessor define directives, allocators, etc.). The changes in the README are incomplete, though.

So I don't think the resampler code needs an upstream update, since upstream is pretty much dead (makes sense - Opus makes Speex obsolete). I'll see if I can update the README file.
Comment 7 Sebastian Dröge (slomo) 2012-10-12 07:37:19 UTC
Thanks, speex is dead but IIRC this resampler is also used inside opus
Comment 8 Sebastian Dröge (slomo) 2012-10-14 09:08:40 UTC
Review of attachment 225969 [details] [review]:

::: gst/audioresample/resample.c
@@ +480,3 @@
     SSE_END (INNER_PRODUCT_SINGLE)
 #endif
+    out[out_stride * out_sample++] = sum;

Why is the SATURATE32PSHR moved above the OVERRIDE_INNER_PRODUCT_SINGLE case? Shouldn't it stay at the same place because sum is changed in that case?

@@ +619,3 @@
     SSE_END (INTERPOLATE_PRODUCT_SINGLE)
 #endif
+    out[out_stride * out_sample++] = sum;

Same here
Comment 9 Sebastian Dröge (slomo) 2012-10-14 09:09:09 UTC
Comment on attachment 225968 [details] [review]
[PATCH 2/4] audioresample: sinc filter performance improvements

Will push after the GIT is open for 1.1
Comment 10 Sebastian Dröge (slomo) 2012-10-14 09:13:09 UTC
Review of attachment 225970 [details] [review]:

::: configure.ac
@@ +207,3 @@
+                 #include <arm_neon.h>
+                 int32x4_t testfunc(int16_t *a, int16_t *b) {
+                     return vmull_s16(vld1_s16(a), vld1_s16(b));

I don't think there should be a configure parameter for this. Just check if the opcodes can be used and use them if they can be used

::: gst/audioresample/resample.c
@@ +556,3 @@
     }
     sum = accum[0] + accum[1] + accum[2] + accum[3];
+#if defined(OVERRIDE_INNER_PRODUCT_DOUBLE) && defined(_USE_SSE2)

No neon optimizations for inner product double and the interpolations?
Comment 11 Carlos Rafael Giani 2012-10-15 07:43:53 UTC
> 
> I don't think there should be a configure parameter for this. Just check if the
> opcodes can be used and use them if they can be used

I tend to agree in hindsight, especially since the tests for the SSE/SSE2 headers arent configurable either.

> 
> ::: gst/audioresample/resample.c
> @@ +556,3 @@
>      }
>      sum = accum[0] + accum[1] + accum[2] + accum[3];
> +#if defined(OVERRIDE_INNER_PRODUCT_DOUBLE) && defined(_USE_SSE2)
> 
> No neon optimizations for inner product double and the interpolations?

Unfortunately, no. double precision isn't supported by NEON (perhaps with the brand new ARM64 architecture, but I have little knowledge of it), and the original author wrote only optimizations for the inner product single case. Still, better than nothing..
Comment 12 Carlos Rafael Giani 2012-10-15 21:29:26 UTC
Created attachment 226509 [details] [review]
audioresample: changed inner_product_single semantics (updated version)

Moved the SATURATE32PSHR further down as suggested - the location in the original patch does not make sense and is likely to be an error. As a side-effect, it made the patch simpler.
Comment 13 Carlos Rafael Giani 2012-10-15 21:31:42 UTC
Created attachment 226510 [details] [review]
audioresample: added ARM NEON support (updated version)

Configure option removed, it indeed did not make much sense.
Comment 14 Sebastian Dröge (slomo) 2012-10-22 07:10:43 UTC
Thanks, will push this after master is open for new features again :)