GNOME Bugzilla – Bug 592787
a52dec: Allow liba52 to use djbfft based IMDCT transform
Last modified: 2009-08-23 11:37:41 UTC
liba52 in a52dec-0.7.4 does not have any MMX, MMXEXT or 3DNOW based acceleration for the IMDCT transform. It does, however, have a software-based acceleration using the djbfft library (D.J. Bernstein's library for Fourier transforms, described as an "extremely fast library for floating-point convolution"). So allow liba52 to use it through the MM_ACCEL_DJBFFT flag.

The liba52 copy in the MPlayer sources does have SSE, 3dnowext, 3dnow and AltiVec implementations. Those are checked for first there, and djbfft is chosen only if none of them is available, which is good for distributions that include a port of the MPlayer changes in their system a52dec library.

The downmix and upmix code in liba52 doesn't seem to be disturbed by this additional MM_ACCEL flag and will still use the MMX, SSE or 3DNOW versions if the corresponding flags are passed on from oil_cpu_get_flags (SSE currently is not).
Created attachment 141481
[PATCH] a52dec: Allow liba52 to use djbfft based IMDCT transform

 ext/a52dec/gsta52dec.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)
I have anecdotal evidence of a 20% increase in speed of a52 decoding with this change based on very very crude and quick measurement.

Just demuxing of a 45 minute test file seems to take 1.15 seconds
Decoding without djbfft patch seems to take 13.722 - 1.15 = ~12.57 seconds
Decoding with djbfft patch seems to take 11.6 - 1.15 = ~10.45 seconds (assumes a system liba52 with djbfft library usage enabled)

Note that a52_init isn't passed any SSE flag if the CPU supports it, but liba52 does have SSE code for downmixing. I might do a patch for that later, or not..

All of this was done while ensuring test.avi is in disk cache through multiple runs and taking the last. In the end there's an excerpt from mplayer on the same file, to demonstrate that with the patched liba52 in mplayer, the result would be even better (if liba52 would be patched with mplayer patches and the plugin would pass SSE flag to a52_init):

$ time gst-launch-0.10 filesrc location=test.avi ! avidemux ! a52dec ! fakesink
Setting pipeline to PAUSED ...
No accelerated IMDCT transform found
Pipeline is PREROLLING ...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 13623398303 ns.
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...

real 0m13.722s
user 0m13.482s
sys 0m0.220s

$ time gst-launch-0.10 filesrc location=test.avi ! avidemux ! a52dec ! fakesink
Setting pipeline to PAUSED ...
Using djbfft for IMDCT transform
Pipeline is PREROLLING ...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 11475860134 ns.
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...

real 0m11.599s
user 0m11.306s
sys 0m0.267s

$ time gst-launch-0.10 filesrc location=test.avi ! avidemux ! fakesink
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 1018646297 ns.
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...

real 0m1.146s
user 0m0.847s
sys 0m0.230s

$ time mplayer -vo null -vc null -ao pcm:fast -benchmark test.avi
...
Opening audio decoder: [liba52] AC3 decoding with liba52
Using SSE optimized IMDCT transform
Using MMX optimized resampler
AUDIO: 48000 Hz, 2 ch, s16le, 192.0 kbit/12.50% (ratio: 24000->192000)
Selected audio codec: [a52] afm: liba52 (AC3-liba52)
...
BENCHMARKs: VC: 0.077s VO: 0.000s A: 5.751s Sys: 3.278s = 9.106s
BENCHMARK%: VC: 0.8491% VO: 0.0000% A: 63.1538% Sys: 35.9971% = 100.0000%

real 0m9.137s
user 0m6.013s
sys 0m1.077s
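
For readers who want to redo the arithmetic behind the "20%" figure, here is a minimal sketch in C. It only restates the subtraction used in the comment above (total wall time minus the demux-only baseline) and the resulting ratio. The helper names decode_seconds and speedup are made up for illustration and are not part of GStreamer or liba52.

```c
#include <assert.h>

/* Decode-only time: wall time of the full pipeline minus the wall time
 * of the demux-only pipeline (filesrc ! avidemux ! fakesink). */
static double decode_seconds(double total_s, double demux_s)
{
    assert(total_s >= demux_s);
    return total_s - demux_s;
}

/* Ratio of decode times: values above 1.0 mean the second run is faster. */
static double speedup(double before_s, double after_s)
{
    assert(after_s > 0.0);
    return before_s / after_s;
}
```

With the "real" times from the logs (13.722 s unpatched, 11.599 s patched, 1.146 s demux only) the ratio comes out at roughly 1.20, matching the claim of about 20% faster decoding. Expressed as time saved it is closer to 17%, since the patched run takes about 10.45 s instead of about 12.58 s.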
commit f5f10797ffa7d63c9dc5b086ce4e788c528c922c
Author: Sebastian Dröge <sebastian.droege@collabora.co.uk>
Date:   Sun Aug 23 13:35:46 2009 +0200

    a52dec: Only add the MM_ACCEL_DJBFFT flag if it's defined

    It's not defined for older liba52 versions.

commit ebfe6c07467c67c82cdd199903ddd36097564ed5
Author: Mart Raudsepp <leio@gentoo.org>
Date:   Sun Aug 23 13:34:32 2009 +0200

    a52dec: Allow liba52 to use djbfft based IMDCT transform

    liba52 in a52dec-0.7.4 does not have any MMX, MMXEXT or 3DNOW based
    IMDCT transform acceleration. It does however have a software based
    acceleration using the djbfft library (D.J. Bernstein's library for
    fourier transforms - Extremely fast library for floating-point
    convolution). So allow liba52 to use it through the MM_ACCEL_DJBFFT flag.

    The liba52 copy in MPlayer sources does have SSE, 3dnowext, 3dnow and
    AltiVec implementations, but those are checked for first, and djbfft is
    chosen only if none of those is available - good in the case of some
    distributions including a port of the MPlayer changes in their system
    a52dec library.

    The down and upmix code in liba52 doesn't seem to be disturbed by this
    additional MM_ACCEL flag and will still use MMX, SSE or 3DNOW versions
    if passed from oil_cpu_get_flags (SSE currently is not).

    Fixes bug #592787.
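
The two commits together amount to "OR in MM_ACCEL_DJBFFT, but only when the installed liba52 header defines it". The following is a rough sketch of that pattern, not the actual code in ext/a52dec/gsta52dec.c. So that it compiles without liba52 installed, it defines stand-in macros. In a real build these come from the a52dec mm_accel.h header, and the numeric values here should be treated as assumptions. The function name a52_accel_flags and the SIMULATE_OLD_LIBA52 switch are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the liba52 acceleration flags (assumed values; a real
 * build would take them from the a52dec mm_accel.h header instead). */
#ifndef MM_ACCEL_X86_MMX
#define MM_ACCEL_X86_MMX 0x80000000u
#endif
/* Define SIMULATE_OLD_LIBA52 to mimic a header without the djbfft flag. */
#if !defined(MM_ACCEL_DJBFFT) && !defined(SIMULATE_OLD_LIBA52)
#define MM_ACCEL_DJBFFT 0x00000001u
#endif

/* Build the value handed to a52_init(): start from the CPU-derived
 * MM_ACCEL_* bits and add the djbfft bit only if the header knows it. */
static uint32_t a52_accel_flags(uint32_t cpu_accel)
{
    uint32_t flags = cpu_accel;
#ifdef MM_ACCEL_DJBFFT
    flags |= MM_ACCEL_DJBFFT;
#endif
    return flags;
}

/* Usage example: the CPU bits must survive, djbfft is only added. */
static void a52_accel_flags_example(void)
{
    uint32_t flags = a52_accel_flags(MM_ACCEL_X86_MMX);
    assert(flags & MM_ACCEL_X86_MMX);
}
```

Because the flag is only added and never replaces the CPU bits, a liba52 that also has SIMD transforms (such as the MPlayer copy described above) can still prefer them. A liba52 built without djbfft support would presumably just ignore the extra bit.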