GNOME Bugzilla – Bug 636562
SSE misalignment removal in "audioresample"
Last modified: 2013-07-17 16:14:20 UTC
Audio Resample codec in media stack (gstreamer) of MeeGo should benefit from SSE because most related signal sampling codec use the same SSE friendly theory (Shannon Sampling Theory).However, in fact,such codec disables SSE implementation, because only trivial perf. gain due to the SSE misalignment. So we need to : a. remove SSE misalignment and restore the SSE perf. gain b. enable SSE implementation steps to reproduce: 1. launch simple gstreamer pipeline to play wav file: "gst-launch filesrc location=astero8k.wav ! wavparse ! audioconvert ! audioresample ! gconfaudiosink" 2. Use "sudo perf record -f -a" to profile and use "sudo perf report" to display hot spot 3. It shows that the hot function "resampler_basic_interpolate_single" occupies a lot of overhead which should be optimized by SSE implementation packed SSE instructions should be used, the overhead of function "resampler_basic_interpolate_single" should not be so high. link from MeeGo bugzilla: http://bugs.meego.com/show_bug.cgi?id=5603
Created attachment 175904 [details] prototype (optimization) code to remove SSE misalignment Yongnian tried to write several prototype functions to remove SSE misalignment for one function. Using the best prototype and after enabling SSE, the overhead is reduced by 0.6x; system CPU utilization and C0 percentage (power) is reduced as well. |-----------|--------------------------|------------|--------------| | prototype | hot function overhead | CPU util% | C0 percentage| |-----------|--------------------------|------------|--------------| | No.2 | 69.60%->26.29% (0.4x) |12.6%->9.6% | 8.5%->7.6% | |-----------|--------------------------|------------|--------------|
There's also a much more trivial way of drastically reducing the cpu load if you don't care about quality *that* much : reducing the 'filter-length' property. Especially when downscaling the sampling rate the quality difference is barely noticeable between filter-length=64 (default) and filter-length=16 at a vastly reduced cpu usage. Maybe those optimisations could be done with ORC, which would provide optimisations not only for one platform but for all platforms for which ORC has a backend (x86, mmx, sseX, arm, neon, ...).
You could also try to enable the SSE and ARM optimizations that already exist in the speex resampler. And it also might make sense to resync the resampler again with speex, there were probably some speed related changes ;)
No, apparently there were no changes in speex to the resampler since last time we synced. Anyway, a patch that enables the assembly optimizations and selects the correct functions at runtime depending on the CPUs capabilities would definitely be accepted :)
Created attachment 193559 [details] [review] audioresample: compile in SSE code, and use it where available
Still no changes in the speex resampler. Attached is a patch to use the SSE code when running on a SSE machine. It's SSE specific, but SSE seems to be the only special case for those larger functions. The switch is also hidden through macros, which keeps the code tidy and clear, but which might offend someone's sense of style :P Rough timings in seconds for decoding some Ogg/Vorbis file and resampling it twice before going to fakesink: before: 32.2 after, with SSE functions empty: 6.58 after, with SSE functions enabled: 14.5 after, with SSE functions disabled: 31.2 I don't have a machine without SSE, so I'm not totally sure it's going to work on one, testing by someone with one would be nice. Also on a different arch, just in case.
Review of attachment 193559 [details] [review]: ::: gst/audioresample/resample.c @@ +89,3 @@ #include "speex_resampler.h" #include "arch.h" +#if defined HAVE_CPU_I386 || defined HAVE_CPU_X86_64 You should also check for xmmintrin.h (SSE) emmintrin.h (SSE2) in configure and only enable the SSE(2) code if they're available @@ +90,3 @@ #include "arch.h" +#if defined HAVE_CPU_I386 || defined HAVE_CPU_X86_64 +#include "sse.h" Where do you get sse.h from? @@ +949,3 @@ +#if defined HAVE_CPU_I386 || defined HAVE_CPU_X86_64 + st->use_sse = sse_ok (); You should make usage of SSE and SSE2 conditional based on the CPU capability function in ORC. Note that there is SSE and SSE2 code in resample_sse.h and ideally you want to compile in both and select the better one.
I seem to have forgotten to add sse.h (inline assembler to run cpuid, copied from somewhere else in GStreamer). No matter though, if I'm to use ORC instead. Good point about SSE2, it's tested at build time, but not run time. They're for different functions though AFAICT. I'll post another patch soon.
Created attachment 193643 [details] [review] audioresample: use SSE/SSE2 when possible Compile in the code on i386 and x86_64, and use ORC to determine when the runtime platform can run the code.
Created attachment 193644 [details] [review] audioresample: fix SSE2 routines They were still taking floats as input, when the resampler tried to pass doubles
Comment on attachment 193644 [details] [review] audioresample: fix SSE2 routines you need to add new functions that take double parameters and use them in the double case. the float variants are still required for high-quality float resampling
Created attachment 193648 [details] [review] audioresample: use SSE/SSE2 when possible Compile in the code on i386 and x86_64, and use ORC to determine when the runtime platform can run the code.
Created attachment 193649 [details] [review] audioresample: fix SSE2 routines They were still taking floats as input, when the resampler tried to pass doubles
Part of the float->double got placed in the wrong patch by a rebase mistake, fixed patches uploaded. I'll look at where those float variants are used, I didn't see that before.
Created attachment 193652 [details] [review] audioresample: fix quality setting being ignored by the resampler state
Created attachment 193653 [details] [review] audioresample: fix SSE2 building with double precision The full double implementation was missing.
Created attachment 193654 [details] [review] audioresample: use SSE/SSE2 when possible Compile in the code on i386 and x86_64, and use ORC to determine when the runtime platform can run the code.
I found the mixed float/double case, all three cases tested and sound like they should. Patches updated to match.
commit 49ec6899f4909e94cfca2d7844800c1ac77f80a6 Author: Vincent Penquerc'h <vincent.penquerch@collabora.co.uk> Date: Thu Aug 11 18:50:08 2011 +0100 audioresample: fix quality setting being ignored by the resampler state https://bugzilla.gnome.org/show_bug.cgi?id=636562 commit 746415a6e3a40d4d688aa25bcbb7062b54dfd4a8 Author: Vincent Penquerc'h <vincent.penquerch@collabora.co.uk> Date: Thu Aug 11 15:54:15 2011 +0100 audioresample: use SSE/SSE2 when possible Compile in the code on i386 and x86_64, and use ORC to determine when the runtime platform can run the code. https://bugzilla.gnome.org/show_bug.cgi?id=636562 commit 58fd202b7d5cb0e734713be153cca63330ba87e0 Author: Vincent Penquerc'h <vincent.penquerch@collabora.co.uk> Date: Thu Aug 11 19:23:42 2011 +0100 audioresample: fix SSE2 building with double precision The full double implementation was missing. https://bugzilla.gnome.org/show_bug.cgi?id=636562
Is someone going to work on the original reason of this bug, the misalignment? A patch against latest GIT would be very useful
So far, nobody stepped up. Shall we close as INCOMPLETE?
What exactly is the misalignment problem?
Sebastian, ping
Akhil, see David's question in comment #22.
Since patches are committed, I mark as FIXED. Please reopen if this is not the case.
Re-opening, since I am not sure the patches address the original concern, and it sounds like something actually worth looking into. > What exactly is the misalignment problem? It looks like the Meego bug has more details.. Whoever is currently working on this from Meego: do you have any patches to be considered?
Still no reply. Original bug report (from meego.com) is dead. Do we close this ?
Let's close it.