Bug 592787 – a52dec: Allow liba52 to use djbfft based IMDCT transform




Summary:	a52dec: Allow liba52 to use djbfft based IMDCT transform


Status:	RESOLVED FIXED

Product:	GStreamer
Classification:	Platform
Component:	gst-plugins-ugly
Version:	git master
Hardware:	Other Linux

Importance:	Normal minor
Target Milestone:	0.10.13
Assigned To:	GStreamer Maintainers
QA Contact:	GStreamer Maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2009-08-23 09:59 UTC by Mart Raudsepp
Modified:	2009-08-23 11:37 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
[PATCH] a52dec: Allow liba52 to use djbfft based IMDCT transform (621 bytes, patch) 2009-08-23 10:02 UTC, Mart Raudsepp	committed

Description Mart Raudsepp 2009-08-23 09:59:13 UTC

liba52 in a52dec-0.7.4 does not have any MMX, MMXEXT or 3DNOW based
IMDCT transform acceleration. It does, however, have software-based
acceleration using the djbfft library (D.J. Bernstein's library for
Fourier transforms, described as an "extremely fast library for
floating-point convolution"). So allow liba52 to use it through the
MM_ACCEL_DJBFFT flag.

The liba52 copy in the MPlayer sources does have SSE, 3DNowExt, 3DNow
and AltiVec implementations, but those are checked for first, and
djbfft is chosen only if none of them is available. That ordering is
good for distributions that include a port of the MPlayer changes in
their system a52dec library.

The downmix and upmix code in liba52 doesn't seem to be disturbed by
this additional MM_ACCEL flag and will still use the MMX, SSE or 3DNOW
versions if those flags are passed from oil_cpu_get_flags (SSE
currently is not).
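
To illustrate why the extra flag should not disturb the other acceleration bits, here is a minimal sketch of OR-ing a djbfft request into an already detected CPU mask. The `SKETCH_*` constants and the helper name are hypothetical stand-ins; a real build would take the `MM_ACCEL_*` values from liba52's mm_accel.h, and this is not the actual code in gsta52dec.c.

```c
#include <stdint.h>

/* Stand-in values for illustration only (assumption): real code uses the
 * MM_ACCEL_* macros from liba52's mm_accel.h instead of these. */
#define SKETCH_ACCEL_X86_MMX   0x80000000u
#define SKETCH_ACCEL_X86_3DNOW 0x40000000u
#define SKETCH_ACCEL_DJBFFT    0x00000001u

/* Hypothetical helper: add the djbfft request to whatever CPU
 * acceleration bits were already detected. Bitwise OR only sets a bit,
 * so the MMX/3DNOW bits in cpu_accel are preserved unchanged. */
uint32_t sketch_a52_accel_flags(uint32_t cpu_accel)
{
    return cpu_accel | SKETCH_ACCEL_DJBFFT;
}
```

The resulting mask is the kind of value that would then be handed to a52_init; which transform liba52 actually picks from it is up to the library.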

Comment 1 Mart Raudsepp 2009-08-23 10:02:08 UTC

Created attachment 141481
[PATCH] a52dec: Allow liba52 to use djbfft based IMDCT transform


---
 ext/a52dec/gsta52dec.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

Comment 2 Mart Raudsepp 2009-08-23 10:14:30 UTC

I have anecdotal evidence of a roughly 20% increase in a52 decoding speed with this change, based on a very crude and quick measurement.

Demuxing alone of a 45-minute test file seems to take 1.15 seconds.
Decoding without the djbfft patch seems to take 13.722 - 1.15 = ~12.57 seconds.
Decoding with the djbfft patch seems to take 11.6 - 1.15 = ~10.45 seconds (assumes a system liba52 with djbfft library usage enabled).
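
The arithmetic behind the 20% figure can be sketched as follows: subtract the demux-only time from both totals, then take the ratio. The function name is hypothetical and the inputs are just the wall-clock numbers quoted above, so this is a rough check and not a proper benchmark.

```c
/* Speed-up of decoding alone: remove the demux-only time from both
 * measurements, then divide the old decode time by the new one. */
double sketch_decode_speedup(double total_before_s, double total_after_s,
                             double demux_s)
{
    double before = total_before_s - demux_s; /* decode time, unpatched */
    double after = total_after_s - demux_s;   /* decode time, with djbfft */
    return before / after;
}
```

Calling `sketch_decode_speedup(13.722, 11.6, 1.15)` gives about 1.20, i.e. decoding is roughly 20% faster by this crude measure.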

Note that a52_init isn't passed any SSE flag even if the CPU supports it, although liba52 does have SSE code for downmixing. I might do a patch for that later, or not.

All of this was done while ensuring test.avi was in the disk cache, by doing multiple runs and taking the last one. At the end there's an excerpt from mplayer on the same file, to demonstrate that with mplayer's patched liba52 the result would be even better (that is, if the system liba52 were patched with the mplayer patches and the plugin passed the SSE flag to a52_init):

$ time gst-launch-0.10 filesrc location=test.avi ! avidemux ! a52dec ! fakesink
Setting pipeline to PAUSED ...
No accelerated IMDCT transform found
Pipeline is PREROLLING ...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 13623398303 ns.
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...

real 0m13.722s
user 0m13.482s
sys 0m0.220s

$ time gst-launch-0.10 filesrc location=test.avi ! avidemux ! a52dec ! fakesink
Setting pipeline to PAUSED ...
Using djbfft for IMDCT transform
Pipeline is PREROLLING ...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 11475860134 ns.
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...

real 0m11.599s
user 0m11.306s
sys 0m0.267s

$ time gst-launch-0.10 filesrc location=test.avi ! avidemux ! fakesink
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 1018646297 ns.
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...

real 0m1.146s
user 0m0.847s
sys 0m0.230s

$ time mplayer -vo null -vc null -ao pcm:fast -benchmark test.avi
...
Opening audio decoder: [liba52] AC3 decoding with liba52
Using SSE optimized IMDCT transform
Using MMX optimized resampler
AUDIO: 48000 Hz, 2 ch, s16le, 192.0 kbit/12.50% (ratio: 24000->192000)
Selected audio codec: [a52] afm: liba52 (AC3-liba52)
...
BENCHMARKs: VC: 0.077s VO: 0.000s A: 5.751s Sys: 3.278s = 9.106s
BENCHMARK%: VC: 0.8491% VO: 0.0000% A: 63.1538% Sys: 35.9971% = 100.0000%

real 0m9.137s
user 0m6.013s
sys 0m1.077s

Comment 3 Sebastian Dröge (slomo) 2009-08-23 11:37:41 UTC

commit f5f10797ffa7d63c9dc5b086ce4e788c528c922c
Author: Sebastian Dröge <sebastian.droege@collabora.co.uk>
Date:   Sun Aug 23 13:35:46 2009 +0200

    a52dec: Only add the MM_ACCEL_DJBFFT flag if it's defined
    
    It's not defined for older liba52 versions.

commit ebfe6c07467c67c82cdd199903ddd36097564ed5
Author: Mart Raudsepp <leio@gentoo.org>
Date:   Sun Aug 23 13:34:32 2009 +0200

    a52dec: Allow liba52 to use djbfft based IMDCT transform
    
    liba52 in a52dec-0.7.4 does not have any MMX, MMXEXT or 3DNOW based
    IMDCT transform acceleration. It does however have a software based
    acceleration using the djbfft library (D.J. Bernstein's library for
    fourier transforms - Extremely fast library for floating-point
    convolution). So allow liba52 to use it through the MM_ACCEL_DJBFFT
    flag.
    The liba52 copy in MPlayer sources does have SSE, 3dnowext, 3dnow
    and AltiVec implementations, but those are checked for first, and
    djbfft is chosen only if none of those is available - good in the
    case of some distributions including a port of the MPlayer changes
    in their system a52dec library.
    
    The down and upmix code in liba52 doesn't seem to be disturbed by
    this additional MM_ACCEL flag and will still use MMX, SSE or 3DNOW
    versions if passed from oil_cpu_get_flags (SSE currently is not).
    
    Fixes bug #592787.
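
The follow-up commit above says the flag is only added if it is defined, since older liba52 versions lack it. A minimal sketch of that kind of preprocessor guard is shown here; the `SKETCH_OPTIONAL_DJBFFT` macro and the helper name are hypothetical, and this is an assumption about the general approach, not a copy of what the commit does.

```c
#include <stdint.h>

/* In a real plugin the macro would come from liba52's headers; this
 * sketch includes none, so it compiles whether or not the macro exists. */
#ifdef MM_ACCEL_DJBFFT
#define SKETCH_OPTIONAL_DJBFFT MM_ACCEL_DJBFFT
#else
#define SKETCH_OPTIONAL_DJBFFT 0u /* older liba52: request nothing extra */
#endif

/* Hypothetical helper: OR in the djbfft bit only when the headers
 * provide it; otherwise the CPU mask passes through unchanged. */
uint32_t sketch_guarded_accel_flags(uint32_t cpu_accel)
{
    return cpu_accel | SKETCH_OPTIONAL_DJBFFT;
}
```

Falling back to 0 keeps the call site free of conditionals, at the cost of hiding whether djbfft was actually requested.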