After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 636562 - SSE misalignment removal in "audioresample"
SSE misalignment removal in "audioresample"
Status: RESOLVED INCOMPLETE
Product: GStreamer
Classification: Platform
Component: gst-plugins-base
0.10.30
Other Linux
: Normal enhancement
: NONE
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2010-12-06 05:43 UTC by lisayueyue
Modified: 2013-07-17 16:14 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
prototype (optimization) code to remove SSE misalignment (5.04 KB, text/plain)
2010-12-06 05:49 UTC, lisayueyue
  Details
audioresample: compile in SSE code, and use it where available (6.49 KB, patch)
2011-08-10 15:02 UTC, Vincent Penquerc'h
needs-work Details | Review
audioresample: use SSE/SSE2 when possible (9.22 KB, patch)
2011-08-11 15:25 UTC, Vincent Penquerc'h
none Details | Review
audioresample: fix SSE2 routines (3.01 KB, patch)
2011-08-11 15:25 UTC, Vincent Penquerc'h
needs-work Details | Review
audioresample: use SSE/SSE2 when possible (9.10 KB, patch)
2011-08-11 15:51 UTC, Vincent Penquerc'h
none Details | Review
audioresample: fix SSE2 routines (3.26 KB, patch)
2011-08-11 15:51 UTC, Vincent Penquerc'h
none Details | Review
audioresample: fix quality setting being ignored by the resampler state (2.98 KB, patch)
2011-08-11 17:50 UTC, Vincent Penquerc'h
committed Details | Review
audioresample: fix SSE2 building with double precision (2.84 KB, patch)
2011-08-11 18:35 UTC, Vincent Penquerc'h
committed Details | Review
audioresample: use SSE/SSE2 when possible (9.05 KB, patch)
2011-08-11 18:35 UTC, Vincent Penquerc'h
committed Details | Review

Description lisayueyue 2010-12-06 05:43:53 UTC
Audio Resample codec in media stack (gstreamer) of MeeGo should benefit from
SSE because most related signal sampling codec use the same SSE friendly theory
(Shannon Sampling Theory).However, in fact,such codec disables SSE
implementation, because only trivial perf. gain due to the SSE misalignment. 

So we need to :
a. remove SSE misalignment and restore the SSE perf. gain
b. enable SSE implementation

steps to reproduce:
1. launch simple gstreamer pipeline to play wav file: "gst-launch filesrc
location=astero8k.wav ! wavparse ! audioconvert ! audioresample !
gconfaudiosink" 
2. Use "sudo perf record -f -a" to profile and use "sudo perf report" to
display hot spot
3. It shows that the hot function "resampler_basic_interpolate_single" occupies
a lot of overhead which should be optimized by SSE implementation

packed SSE instructions should be used, the overhead of function
"resampler_basic_interpolate_single" should not be so high.

link from MeeGo bugzilla:
http://bugs.meego.com/show_bug.cgi?id=5603
Comment 1 lisayueyue 2010-12-06 05:49:38 UTC
Created attachment 175904 [details]
prototype (optimization) code to remove SSE misalignment

Yongnian tried to write several prototype functions to remove SSE misalignment for one
function. Using the best prototype and after enabling SSE, the overhead is
reduced by 0.6x; system CPU utilization and C0 percentage (power) is reduced as
well.  
|-----------|--------------------------|------------|--------------|
| prototype | hot function overhead    | CPU util%  | C0 percentage|
|-----------|--------------------------|------------|--------------|
|  No.2     | 69.60%->26.29% (0.4x)    |12.6%->9.6% |   8.5%->7.6% |
|-----------|--------------------------|------------|--------------|
Comment 2 Edward Hervey 2010-12-06 08:13:11 UTC
There's also a much more trivial way of drastically reducing the cpu load if you don't care about quality *that* much : reducing the 'filter-length' property.

Especially when downscaling the sampling rate the quality difference is barely noticeable between filter-length=64 (default) and filter-length=16 at a vastly reduced cpu usage.

Maybe those optimisations could be done with ORC, which would provide optimisations not only for one platform but for all platforms for which ORC has a backend (x86, mmx, sseX, arm, neon, ...).
Comment 3 Sebastian Dröge (slomo) 2010-12-12 10:46:33 UTC
You could also try to enable the SSE and ARM optimizations that already exist in the speex resampler. And it also might make sense to resync the resampler again with speex, there were probably some speed related changes ;)
Comment 4 Sebastian Dröge (slomo) 2010-12-12 10:55:50 UTC
No, apparently there were no changes in speex to the resampler since last time we synced.

Anyway, a patch that enables the assembly optimizations and selects the correct functions at runtime depending on the CPUs capabilities would definitely be accepted :)
Comment 5 Vincent Penquerc'h 2011-08-10 15:02:38 UTC
Created attachment 193559 [details] [review]
audioresample: compile in SSE code, and use it where available
Comment 6 Vincent Penquerc'h 2011-08-10 15:08:47 UTC
Still no changes in the speex resampler.
Attached is a patch to use the SSE code when running on a SSE machine.
It's SSE specific, but SSE seems to be the only special case for those larger functions.
The switch is also hidden through macros, which keeps the code tidy and clear, but which might offend someone's sense of style :P

Rough timings in seconds for decoding some Ogg/Vorbis file and resampling it twice before going to fakesink:

before: 32.2
after, with SSE functions empty: 6.58
after, with SSE functions enabled: 14.5
after, with SSE functions disabled: 31.2

I don't have a machine without SSE, so I'm not totally sure it's going to work on one, testing by someone with one would be nice. Also on a different arch, just in case.
Comment 7 Sebastian Dröge (slomo) 2011-08-11 07:17:01 UTC
Review of attachment 193559 [details] [review]:

::: gst/audioresample/resample.c
@@ +89,3 @@
 #include "speex_resampler.h"
 #include "arch.h"
+#if defined HAVE_CPU_I386 || defined HAVE_CPU_X86_64

You should also check for xmmintrin.h (SSE) emmintrin.h (SSE2) in configure and only enable the SSE(2) code if they're available

@@ +90,3 @@
 #include "arch.h"
+#if defined HAVE_CPU_I386 || defined HAVE_CPU_X86_64
+#include "sse.h"

Where do you get sse.h from?

@@ +949,3 @@
 
+#if defined HAVE_CPU_I386 || defined HAVE_CPU_X86_64
+  st->use_sse = sse_ok ();

You should make usage of SSE and SSE2 conditional based on the CPU capability function in ORC. Note that there is SSE and SSE2 code in resample_sse.h and ideally you want to compile in both and select the better one.
Comment 8 Vincent Penquerc'h 2011-08-11 09:32:13 UTC
I seem to have forgotten to add sse.h (inline assembler to run cpuid, copied from somewhere else in GStreamer). No matter though, if I'm to use ORC instead.
Good point about SSE2, it's tested at build time, but not run time. They're for different functions though AFAICT.
I'll post another patch soon.
Comment 9 Vincent Penquerc'h 2011-08-11 15:25:31 UTC
Created attachment 193643 [details] [review]
audioresample: use SSE/SSE2 when possible

Compile in the code on i386 and x86_64, and use ORC to determine
when the runtime platform can run the code.
Comment 10 Vincent Penquerc'h 2011-08-11 15:25:40 UTC
Created attachment 193644 [details] [review]
audioresample: fix SSE2 routines

They were still taking floats as input, when the resampler
tried to pass doubles
Comment 11 Sebastian Dröge (slomo) 2011-08-11 15:41:09 UTC
Comment on attachment 193644 [details] [review]
audioresample: fix SSE2 routines

you need to add new functions that take double parameters and use them in the double case. the float variants are still required for high-quality float resampling
Comment 12 Vincent Penquerc'h 2011-08-11 15:51:35 UTC
Created attachment 193648 [details] [review]
audioresample: use SSE/SSE2 when possible

Compile in the code on i386 and x86_64, and use ORC to determine
when the runtime platform can run the code.
Comment 13 Vincent Penquerc'h 2011-08-11 15:51:42 UTC
Created attachment 193649 [details] [review]
audioresample: fix SSE2 routines

They were still taking floats as input, when the resampler
tried to pass doubles
Comment 14 Vincent Penquerc'h 2011-08-11 15:53:16 UTC
Part of the float->double got placed in the wrong patch by a rebase mistake, fixed patches uploaded.
I'll look at where those float variants are used, I didn't see that before.
Comment 15 Vincent Penquerc'h 2011-08-11 17:50:56 UTC
Created attachment 193652 [details] [review]
audioresample: fix quality setting being ignored by the resampler state
Comment 16 Vincent Penquerc'h 2011-08-11 18:35:27 UTC
Created attachment 193653 [details] [review]
audioresample: fix SSE2 building with double precision

The full double implementation was missing.
Comment 17 Vincent Penquerc'h 2011-08-11 18:35:53 UTC
Created attachment 193654 [details] [review]
audioresample: use SSE/SSE2 when possible

Compile in the code on i386 and x86_64, and use ORC to determine
when the runtime platform can run the code.
Comment 18 Vincent Penquerc'h 2011-08-11 18:38:26 UTC
I found the mixed float/double case, all three cases tested and sound like they should.
Patches updated to match.
Comment 19 Sebastian Dröge (slomo) 2011-08-12 07:59:13 UTC
commit 49ec6899f4909e94cfca2d7844800c1ac77f80a6
Author: Vincent Penquerc'h <vincent.penquerch@collabora.co.uk>
Date:   Thu Aug 11 18:50:08 2011 +0100

    audioresample: fix quality setting being ignored by the resampler state
    
    https://bugzilla.gnome.org/show_bug.cgi?id=636562

commit 746415a6e3a40d4d688aa25bcbb7062b54dfd4a8
Author: Vincent Penquerc'h <vincent.penquerch@collabora.co.uk>
Date:   Thu Aug 11 15:54:15 2011 +0100

    audioresample: use SSE/SSE2 when possible
    
    Compile in the code on i386 and x86_64, and use ORC to determine
    when the runtime platform can run the code.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=636562

commit 58fd202b7d5cb0e734713be153cca63330ba87e0
Author: Vincent Penquerc'h <vincent.penquerch@collabora.co.uk>
Date:   Thu Aug 11 19:23:42 2011 +0100

    audioresample: fix SSE2 building with double precision
    
    The full double implementation was missing.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=636562
Comment 20 Sebastian Dröge (slomo) 2011-08-12 09:15:54 UTC
Is someone going to work on the original reason of this bug, the misalignment? A patch against latest GIT would be very useful
Comment 21 Tobias Mueller 2011-12-21 12:18:32 UTC
So far, nobody stepped up. Shall we close as INCOMPLETE?
Comment 22 David Schleef 2011-12-27 21:13:21 UTC
What exactly is the misalignment problem?
Comment 23 Akhil Laddha 2012-02-10 05:58:20 UTC
Sebastian, ping
Comment 24 Sebastian Dröge (slomo) 2012-02-10 07:19:07 UTC
Akhil, see David's question in comment #22.
Comment 25 Tobias Mueller 2012-09-30 18:28:22 UTC
Since patches are committed, I mark as FIXED. Please reopen if this is not the case.
Comment 26 Tim-Philipp Müller 2012-09-30 18:45:12 UTC
Re-opening, since I am not sure the patches address the original concern, and it sounds like something actually worth looking into.


> What exactly is the misalignment problem?

It looks like the Meego bug has more details..

Whoever is currently working on this from Meego: do you have any patches to be considered?
Comment 27 Edward Hervey 2013-07-17 15:32:31 UTC
Still no reply. Original bug report (from meego.com) is dead.

Do we close this ?
Comment 28 Tim-Philipp Müller 2013-07-17 16:14:20 UTC
Let's close it.