GNOME Bugzilla – Bug 685637
[audioresample] Performance improvements & ARM NEON support
Last modified: 2012-10-25 15:50:12 UTC
This set of patches introduces resampler speedups and support for the ARM NEON instruction set. The patches were originally done for Speex upstream by Jyri Sarha, and can be found in the Speex mailing list: http://lists.xiph.org/pipermail/speex-dev/2011-September/008242.html http://lists.xiph.org/pipermail/speex-dev/2011-September/008243.html http://lists.xiph.org/pipermail/speex-dev/2011-September/008241.html http://lists.xiph.org/pipermail/speex-dev/2011-September/008240.html http://lists.xiph.org/pipermail/speex-dev/2011-September/008239.html http://lists.xiph.org/pipermail/speex-dev/2011-September/008238.html These patches were then discovered by Branislav Katreniak ( branislav.katreniak@streamunlimited.com ) for our projects at StreamUnlimited ( http://streamunlimited.com/ ). He did an initial adaptation of the patches for the GStreamer audioresample element. Tests showed a significant speedup; resampling of 44.1 to 48 kHz ran up to 5 times faster on a 720Mhz ARM Cortex-A8 CPU. I then adapted the patches for GStreamer 1.0 and added extra functionality to control the behavior better and to check for ARM NEON support in configure.ac. Two new properties are introduced: sinc-filter-mode and sinc-filter-auto-threshold. These control the way the sinc filter table behaves and gets computed. The default behavior is what has been used until now, "interpolated" (sparse filter table with cubic interpolation); other modes are "full" (all filter values computed) and "auto" (full table unless estimated size of said table exceeds the threshold). The "auto" mode exists, because in extreme cases (like resampling from 48000 to 48001 Hz) the full filter table can become very large, and take a long time to compute. Default threshold is 1 MB. As for the default mode: I thought setting it to "interpolated" is useful for backwards compatibility, but "auto" as default makes sense, too. Thoughts? In addition, the very first patch reintroduces SSE header checks to configure.ac , which re-enables the SSE/SSE2 code paths inside the resampler and also helps with the performance.
Created attachment 225967 [details] [review] [PATCH 1/4] Reintroduced xmmintrin.h/emmintrin.h header checks
Created attachment 225968 [details] [review] [PATCH 2/4] audioresample: sinc filter performance improvements
Created attachment 225969 [details] [review] [PATCH 3/4] audioresample: changed inner_product_single semantics
Created attachment 225970 [details] [review] [PATCH 4/4] audioresample: added ARM NEON support
Are these changes to the speex resampler code upstream already? If they are, please just merge the speex resample code from upstream in a single, separate commit. If they are not, please first merge the latest speex resample code from upstream in a separate commit before changing code in there to make it easier to maintain this. And also update the diff against upstream in the README file in the end. Other than that these changes look good.
(In reply to comment #5) > Are these changes to the speex resampler code upstream already? If they are, > please just merge the speex resample code from upstream in a single, separate > commit. If they are not, please first merge the latest speex resample code from > upstream in a separate commit before changing code in there to make it easier > to maintain this. And also update the diff against upstream in the README file > in the end. > > Other than that these changes look good. Looking at the Speex git repository, the last changes to the resampler were done in 2009-06-18 . According to the audioresample README, its Speex resampler snapshot was taken 2009-11-10. A code comparison shows only formatting changes (spaces etc.) and GStreamer specific additions (preprocessor define directives, allocators, etc.). The changes in the README are incomplete, though. So I don't think the resampler code needs an upstream update, since upstream is pretty much dead (makes sense - Opus makes Speex obsolete). I'll see if I can update the README file.
Thanks, speex is dead but IIRC this resampler is also used inside opus
Review of attachment 225969 [details] [review]: ::: gst/audioresample/resample.c @@ +480,3 @@ SSE_END (INNER_PRODUCT_SINGLE) #endif + out[out_stride * out_sample++] = sum; Why is the SATURATE32PSHR moved above the OVERRIDE_INNER_PRODUCT_SINGLE case? Shouldn't it stay at the same place because sum is changed in that case? @@ +619,3 @@ SSE_END (INTERPOLATE_PRODUCT_SINGLE) #endif + out[out_stride * out_sample++] = sum; Same here
Comment on attachment 225968 [details] [review] [PATCH 2/4] audioresample: sinc filter performance improvements Will push after the GIT is open for 1.1
Review of attachment 225970 [details] [review]: ::: configure.ac @@ +207,3 @@ + #include <arm_neon.h> + int32x4_t testfunc(int16_t *a, int16_t *b) { + return vmull_s16(vld1_s16(a), vld1_s16(b)); I don't think there should be a configure parameter for this. Just check if the opcodes can be used and use them if they can be used ::: gst/audioresample/resample.c @@ +556,3 @@ } sum = accum[0] + accum[1] + accum[2] + accum[3]; +#if defined(OVERRIDE_INNER_PRODUCT_DOUBLE) && defined(_USE_SSE2) No neon optimizations for inner product double and the interpolations?
> > I don't think there should be a configure parameter for this. Just check if the > opcodes can be used and use them if they can be used I tend to agree in hindsight, especially since the tests for the SSE/SSE2 headers arent configurable either. > > ::: gst/audioresample/resample.c > @@ +556,3 @@ > } > sum = accum[0] + accum[1] + accum[2] + accum[3]; > +#if defined(OVERRIDE_INNER_PRODUCT_DOUBLE) && defined(_USE_SSE2) > > No neon optimizations for inner product double and the interpolations? Unfortunately, no. double precision isn't supported by NEON (perhaps with the brand new ARM64 architecture, but I have little knowledge of it), and the original author wrote only optimizations for the inner product single case. Still, better than nothing..
Created attachment 226509 [details] [review] audioresample: changed inner_product_single semantics (updated version) Moved the SATURATE32PSHR further down as suggested - the location in the original patch does not make sense and is likely to be an error. As a side-effect, it made the patch simpler.
Created attachment 226510 [details] [review] audioresample: added ARM NEON support (updated version) Configure option removed, it indeed did not make much sense.
Thanks, will push this after master is open for new features again :)