Bug 734043 - videoconvert: add Orc optimization for I420 to BGRA for x86 [32 bit]
Status: RESOLVED OBSOLETE
Product: GStreamer
Classification: Platform
Component: gst-plugins-base
Version: 1.4.0
Hardware: Other All
Importance: Normal major
Target Milestone: git master
Assigned To: GStreamer Maintainers
QA Contact: GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2014-07-31 14:33 UTC by Jake Foytik
Modified: 2018-05-01 09:50 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Orc logs for this bug (5.76 KB, text/plain)
2014-10-20 10:00 UTC, Eric

Description Jake Foytik 2014-07-31 14:33:08 UTC
When testing I420->BGRA conversion, I see fast performance with the x86_64 build of GStreamer. However, the 32-bit x86 build is much slower and posts this warning:

ORC: WARNING: orccompiler.c(382): orc_program_compile_full(): program video_convert_orc_convert_I420_BGRA failed to compile, reason: register overflow for vector reg

Is it possible to get the Orc optimization in the 32-bit build?

Test pipeline:

gst-launch-1.0 videotestsrc ! video/x-raw,format=I420,width=1280,height=1024 ! videoconvert ! video/x-raw,format=BGRA ! fakesink sync=true
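
For a sense of scale, the per-frame workload of this test pipeline can be estimated with a little arithmetic. This is only a sketch: the frame rate of 30 fps is an assumption (the caps above do not set one, and 30/1 is the usual videotestsrc default), and stride padding is ignored.

```python
# Rough workload estimate for the 1280x1024 I420 -> BGRA test pipeline.
WIDTH, HEIGHT = 1280, 1024
FPS = 30  # assumption: framerate is not given in the caps above


def i420_frame_bytes(width, height):
    """Full-resolution Y plane plus U and V planes subsampled 2x2."""
    chroma_w, chroma_h = (width + 1) // 2, (height + 1) // 2
    return width * height + 2 * chroma_w * chroma_h


def bgra_frame_bytes(width, height):
    """Four bytes (B, G, R, A) per pixel."""
    return width * height * 4


in_bytes = i420_frame_bytes(WIDTH, HEIGHT)
out_bytes = bgra_frame_bytes(WIDTH, HEIGHT)
pixels_per_second = WIDTH * HEIGHT * FPS

print("I420 input bytes per frame: ", in_bytes)
print("BGRA output bytes per frame:", out_bytes)
print("pixels converted per second:", pixels_per_second)
```

Under these assumptions each frame is 1,966,080 bytes of I420 input and 5,242,880 bytes of BGRA output, so a slow non-SIMD fallback has a lot of data to get through per second.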
Comment 1 Nicolas Dufresne (ndufresne) 2014-07-31 15:00:34 UTC
It would be useful to also provide your CPU capabilities (e.g. just attach /proc/cpuinfo). Intel CPUs are not all equal in their ability to do SIMD (and hence to gain something from ORC). Just being 64-bit already allows for faster operation, around 2x for most pixel operations.
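
On Linux, the SIMD-related entries can be pulled out of /proc/cpuinfo with a few lines of script. The following is a minimal sketch; the helper name `simd_flags` and the short list of flags it looks for are illustrative choices, not part of GStreamer or Orc. Note that /proc/cpuinfo reports SSE3 as `pni`.

```python
# Hypothetical helper: list the SIMD flags found in /proc/cpuinfo text.
SIMD_FLAGS = ("mmx", "sse", "sse2", "pni", "ssse3",
              "sse4_1", "sse4_2", "avx", "avx2")


def simd_flags(cpuinfo_text):
    """Return the known SIMD flags from the first 'flags' line, in order."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            present = set(value.split())
            return [flag for flag in SIMD_FLAGS if flag in present]
    return []


# Made-up sample line for illustration; on a real Linux system, pass
# the contents of /proc/cpuinfo instead.
sample = "processor\t: 0\nflags\t\t: fpu mmx sse sse2 pni ssse3 lm\n"
print(simd_flags(sample))
```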
Comment 2 Jake Foytik 2014-07-31 15:29:55 UTC
We've been running these tests on a Windows 7 laptop. We are creating a browser plugin, which restricts us to a 32-bit build of GStreamer. Below is the processor information.

Processor 1            ID = 0
    Number of cores        4 (max 8)
    Number of threads    8 (max 16)
    Name            Intel Core i7 2860QM
    Codename        Sandy Bridge
    Specification        Intel(R) Core(TM) i7-2860QM CPU @ 2.50GHz
    Package (platform ID)    Socket 988B rPGA (0x4)
    CPUID            6.A.7
    Extended CPUID        6.2A
    Core Stepping        D2
    Technology        32 nm
    TDP Limit        45 Watts
    Tjmax            100.0 °C
    Core Speed        2394.3 MHz
    Multiplier x Bus Speed    24.0 x 99.8 MHz
    Stock frequency        2500 MHz
    Instructions sets    MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, EM64T, VT-x, AES, AVX
    L1 Data cache        4 x 32 KBytes, 8-way set associative, 64-byte line size
    L1 Instruction cache    4 x 32 KBytes, 8-way set associative, 64-byte line size
    L2 cache        4 x 256 KBytes, 8-way set associative, 64-byte line size
    L3 cache        8 MBytes, 16-way set associative, 64-byte line size
    FID/VID Control        yes


    Turbo Mode        supported, enabled
    Max non-turbo ratio    25x
    Max turbo ratio        36x
    Max efficiency ratio    8x
    Max Power        72 Watts
    Min Power        36 Watts
    O/C bins        none
    Ratio 1 core        36x
    Ratio 2 cores        35x
    Ratio 3 cores        33x
    Ratio 4 cores        33x
    TSC            2494.6 MHz
    APERF            3293.1 MHz
    MPERF            2494.5 MHz
Comment 3 Edward Hervey 2014-10-20 09:45:47 UTC
This fails on 32-bit with orc git, orc 0.4.22 and orc 0.4.18.

I have a feeling the issue was introduced by the following commit:

commit 14b5999bca16d9ac18bdcd5905c472bec2fe247e
Author: Wim Taymans <wtaymans@redhat.com>
Date:   Thu Jan 9 18:12:00 2014 +0100

    videoconvert: rework YUV->RGB fastpaths
    
    Rework the orc code to be around 10% faster and support arbitrary matrices.
    Pass the matrix parameters to the YUV->RGB functions to make them work
    for all matrices. This enables more and faster fastpath conversions.
    
    See https://bugzilla.gnome.org/show_bug.cgi?id=721701
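
To illustrate what a matrix-driven YUV->RGB conversion computes, here is a scalar reference sketch. It is not the Orc program from that commit: it uses floating point for readability, the function names are made up for this example, and the coefficients are the common BT.601 limited-range values, chosen here as an assumption. Passing the coefficients in as parameters mirrors the "arbitrary matrices" idea, and every extra coefficient is one more value a SIMD kernel has to keep live.

```python
# Scalar reference for I420 -> BGRA with the matrix passed as a parameter.
# Assumed coefficients: BT.601, limited range.
# Order: (y_gain, v_to_r, v_to_g, u_to_g, u_to_b)
BT601 = (1.164, 1.596, -0.813, -0.392, 2.017)


def clamp8(value):
    """Round to the nearest integer and clamp to the 0..255 byte range."""
    return max(0, min(255, int(round(value))))


def yuv_to_bgra(y, u, v, matrix=BT601):
    """Convert one Y'CbCr sample to a (B, G, R, A) tuple."""
    y_gain, v_to_r, v_to_g, u_to_g, u_to_b = matrix
    luma = y_gain * (y - 16)
    r = luma + v_to_r * (v - 128)
    g = luma + u_to_g * (u - 128) + v_to_g * (v - 128)
    b = luma + u_to_b * (u - 128)
    return (clamp8(b), clamp8(g), clamp8(r), 255)


def i420_to_bgra(y_plane, u_plane, v_plane, width, height, matrix=BT601):
    """Convert tightly packed I420 planes to packed BGRA bytes."""
    chroma_w = (width + 1) // 2
    out = bytearray()
    for row in range(height):
        for col in range(width):
            # Each 2x2 block of luma samples shares one U and one V sample.
            c = (row // 2) * chroma_w + (col // 2)
            out += bytes(yuv_to_bgra(y_plane[row * width + col],
                                     u_plane[c], v_plane[c], matrix))
    return bytes(out)


print(yuv_to_bgra(16, 128, 128))   # limited-range black
print(yuv_to_bgra(235, 128, 128))  # limited-range white
```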
Comment 4 Eric 2014-10-20 10:00:08 UTC
Created attachment 288906
Orc logs for this bug

orc version 0.4.18
gst version 1.4
Comment 5 Eric 2014-10-28 09:25:11 UTC
Hi, 

Any update on this?


Regards,
Eric T
Comment 6 Edward Hervey 2014-11-13 07:11:06 UTC
So there are two ways to fix this:
1) Implement orc code that doesn't use too many registers (so it works on architectures with fewer registers than x86-64)
2) Implement register spilling in orc (i.e. use memory when we exceed the number of available registers).

I'm not 100% sure we can achieve the same quality/speed results with 1) across all platforms. Maybe Wim has some feedback on this.

2) isn't as trivial as it seems (how do you figure out what's the *right* register to spill into main memory).
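
For context on option 2, one well-known answer to the "which register to spill" question comes from linear-scan register allocation: when no register is free, spill the live value whose last use is furthest away. The sketch below shows that heuristic in isolation. It is not Orc code (nothing in this thread indicates Orc implements spilling), and the interval names and function are hypothetical. The pressure is real, though: 32-bit x86 exposes 8 XMM registers, while x86-64 exposes 16.

```python
# Illustrative linear-scan allocation with a "furthest end" spill choice.
def linear_scan(intervals, num_regs):
    """intervals maps name -> (start, end). Returns (assignment, spilled)."""
    assignment = {}   # name -> register index
    spilled = []      # names that live in memory instead
    active = []       # names currently holding a register
    free = list(range(num_regs))

    for name in sorted(intervals, key=lambda n: intervals[n][0]):
        start, end = intervals[name]

        # Release registers whose values are dead before this one starts.
        for old in list(active):
            if intervals[old][1] < start:
                active.remove(old)
                free.append(assignment[old])

        if free:
            assignment[name] = free.pop(0)
            active.append(name)
            continue

        # No register free: consider the live value that ends last.
        victim = max(active, key=lambda n: intervals[n][1])
        if intervals[victim][1] > end:
            # The victim outlives the new value, so it goes to memory
            # and the new value takes over its register.
            assignment[name] = assignment.pop(victim)
            active.remove(victim)
            active.append(name)
            spilled.append(victim)
        else:
            spilled.append(name)

    return assignment, spilled


# Three overlapping values competing for two registers.
live = {"a": (0, 10), "b": (1, 3), "c": (2, 5)}
print(linear_scan(live, 2))
print(linear_scan(live, 3))
```

With two registers the long-lived value "a" is the one spilled; with three registers nothing is spilled. A real implementation would also have to weigh how often each value is used inside the inner loop, which is part of why this is harder than it first looks.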
Comment 7 Edward Hervey 2018-05-01 09:50:31 UTC
No activity for 4 years. Only applies to 32-bit x86 machines. Closing.

Re-open if a patch can be provided to fix this issue.