Bug 741987 - videoscale performance regression
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gst-plugins-base
Version: git master
Hardware: Other Linux
Importance: Normal major
Target Milestone: 1.5.1
Assigned To: GStreamer Maintainers
Reported: 2014-12-26 10:11 UTC by Nicola
Modified: 2015-03-04 15:04 UTC


Attachments
video-converter: Disable dithering by default and use LINEAR resampling (1.35 KB, patch)
2015-01-14 09:09 UTC, Sebastian Dröge (slomo)
current state (11.98 KB, text/plain)
2015-02-10 15:49 UTC, Wim Taymans
state on pandaboard (13.19 KB, text/plain)
2015-02-11 15:52 UTC, Wim Taymans

Description Nicola 2014-12-26 10:11:53 UTC
on x86_64

git master:

gst-launch-1.0 videotestsrc num-buffers=2000 ! video/x-raw,width=1920,height=1080 ! videoscale ! video/x-raw,width=320,height=240 ! fakesink sync=false
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 0:00:13.956702692
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...


1.4.5:

gst-launch-1.0 videotestsrc num-buffers=2000 ! video/x-raw,width=1920,height=1080 ! videoscale ! video/x-raw,width=320,height=240 ! fakesink sync=false
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 0:00:10.353925742
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...

So the recent refactoring in videoscale seems to have introduced a performance regression.
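
To compare runs without reading the timings by eye, the "Execution ended after" line can be parsed and turned into a ratio. The following is a minimal sketch; the helper name `elapsed_seconds` is made up for illustration, and the two input strings are simply the timing lines pasted above.

```python
import re

# Matches the summary line printed by gst-launch-1.0, for example
# "Execution ended after 0:00:13.956702692" (H:MM:SS.nanoseconds).
_ELAPSED = re.compile(r"Execution ended after (\d+):(\d{2}):(\d{2})\.(\d+)")


def elapsed_seconds(log: str) -> float:
    """Return the run time in seconds found in a gst-launch-1.0 log."""
    match = _ELAPSED.search(log)
    if match is None:
        raise ValueError("no 'Execution ended after' line found")
    hours, minutes, seconds, fraction = match.groups()
    whole = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
    return whole + float("0." + fraction)


# Timing lines copied from the two runs in this report.
master = elapsed_seconds("Execution ended after 0:00:13.956702692")
stable = elapsed_seconds("Execution ended after 0:00:10.353925742")

slowdown = master / stable
print(f"git master takes {slowdown:.2f}x as long as 1.4.5")
```

A single run per version is noisy, so the ratio is only a rough indicator; repeating each run a few times would give a more reliable figure.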
Comment 1 Nicola 2014-12-30 16:12:33 UTC
Another test: I saved some raw buffers in an MKV file with a pipeline like this:

gst-launch-1.0 v4l2src num-buffers=1000 ! video/x-raw ! matroskamux ! filesink location=/tmp/test.mkv

git master:

gst-launch-1.0 filesrc location= /tmp/test.mkv ! matroskademux ! queue ! videoscale ! video/x-raw,width=320,height=240 ! fakesink sync=false
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 0:00:01.401361272
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...

1.4.5:

gst-launch-1.0 filesrc location= /tmp/test.mkv ! matroskademux ! queue ! videoscale ! video/x-raw,width=320,height=240 ! fakesink sync=false
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 0:00:00.521376775
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...

In this case the difference is even bigger: about 0.5 seconds on 1.4.5 vs. 1.4 seconds on git master.
Comment 2 Nicola 2014-12-30 16:17:40 UTC
If I only demux the file, I get similar times on git master and 1.4.5.

git master:

 gst-launch-1.0 filesrc location= /tmp/test.mkv ! matroskademux ! queue ! fakesink sync=false
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 0:00:00.497602064
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...

1.4.5:

gst-launch-1.0 filesrc location= /tmp/test.mkv ! matroskademux  ! fakesink sync=false
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 0:00:00.499425560
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...
Comment 3 Nicola 2014-12-30 16:44:47 UTC
Here are the caps for the tested file:

caps = "video/x-raw\,\ format\=\(string\)YUY2\,\ width\=\(int\)1280\,\ height\=\(int\)720\,\ pixel-aspect-ratio\=\(fraction\)1/1\,\ interlace-mode\=\(string\)progressive\,\ chroma-site\=\(string\)mpeg2\,\ colorimetry\=\(string\)bt709\,\ framerate\=\(fraction\)10/1"
Comment 4 Wim Taymans 2015-01-07 12:28:37 UTC
git master does chroma upsampling and downsampling before scaling. Disabling these extra steps brings performance in line with 1.4. I'll see if I can add an option for this.
Comment 5 Wim Taymans 2015-01-07 15:09:40 UTC
The best thing to do is this:

 - make videoscale negotiate the same colorimetry on input and output; in the
   example it uses HD colorimetry on the input and SD colorimetry on the
   output and thus needs to do a conversion.
 - avoid chroma up/downsampling by scaling the subsampled planes directly
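
As a rough illustration of why scaling the subsampled planes directly is cheaper, the sketch below counts the input samples for an I420 frame in the 1920x1080 to 320x240 test case. The function names are hypothetical and the count is only arithmetic on plane sizes; it is not a model of the actual video-converter code.

```python
def i420_plane_sizes(width, height):
    """Return the (width, height) of the Y, U and V planes of an I420 frame."""
    # I420 is 4:2:0: chroma is halved in both directions, rounded up.
    chroma = ((width + 1) // 2, (height + 1) // 2)
    return {"Y": (width, height), "U": chroma, "V": chroma}


def sample_count(planes):
    """Total number of samples across all planes."""
    return sum(w * h for w, h in planes.values())


src = i420_plane_sizes(1920, 1080)
dst = i420_plane_sizes(320, 240)

# Direct path: every source plane is scaled straight to its destination plane.
direct_input = sample_count(src)

# Upsampling path: chroma is first expanded to full resolution, so the scaler
# sees three full-size planes.
upsampled_input = 3 * 1920 * 1080

for name in ("Y", "U", "V"):
    print(f"{name}: {src[name]} -> {dst[name]}")
print(f"direct input samples:    {direct_input}")
print(f"upsampled input samples: {upsampled_input}")
```

Under these assumptions the upsampling path feeds the scaler twice as many samples, before counting the extra work of the upsampling and downsampling steps themselves.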
Comment 6 Sebastian Dröge (slomo) 2015-01-13 14:32:27 UTC
We should probably also switch to LINEAR instead of CUBIC as default for scaling. We used LINEAR in <= 1.4, and CUBIC is noticeably slower.
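
The cost difference comes largely from the number of taps: linear interpolation blends 2 input samples per output sample, while a cubic kernel blends 4. The sketch below resamples one row both ways. It is an illustration only: the Catmull-Rom weights stand in for "cubic" and are an assumption, not the coefficients GStreamer uses, and the function names are made up.

```python
import math


def _clamp(index, length):
    """Clamp a tap index to the valid range of the row."""
    return min(max(index, 0), length - 1)


def _source_position(i, in_len, out_len):
    """Map output sample i to a (clamped) fractional input position."""
    pos = (i + 0.5) * in_len / out_len - 0.5
    return min(max(pos, 0.0), in_len - 1.0)


def resample_linear(row, out_len):
    """2-tap resampling: each output blends two neighbouring inputs."""
    out = []
    for i in range(out_len):
        pos = _source_position(i, len(row), out_len)
        base = math.floor(pos)
        t = pos - base
        right = row[_clamp(base + 1, len(row))]
        out.append(row[base] * (1.0 - t) + right * t)
    return out


def resample_cubic(row, out_len):
    """4-tap resampling with Catmull-Rom weights (assumed kernel)."""
    out = []
    for i in range(out_len):
        pos = _source_position(i, len(row), out_len)
        base = math.floor(pos)
        t = pos - base
        weights = (
            -0.5 * t**3 + t**2 - 0.5 * t,
            1.5 * t**3 - 2.5 * t**2 + 1.0,
            -1.5 * t**3 + 2.0 * t**2 + 0.5 * t,
            0.5 * t**3 - 0.5 * t**2,
        )
        taps = (row[_clamp(base + k, len(row))] for k in (-1, 0, 1, 2))
        out.append(sum(w * s for w, s in zip(weights, taps)))
    return out


row = [0, 10, 20, 30, 40, 50, 60, 70]
print(resample_linear(row, 4))
print(resample_cubic(row, 4))
```

Per output sample the cubic version does twice the multiplies and loads of the linear one, which is consistent with CUBIC being noticeably slower when the inner loop is not vectorised.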
Comment 7 Sebastian Dröge (slomo) 2015-01-14 09:09:53 UTC
Created attachment 294496
video-converter: Disable dithering by default and use LINEAR resampling

We had no dithering before 1.6 and used linear resampling, choosing more
heavy defaults causes performance regressions.
Comment 8 Sebastian Dröge (slomo) 2015-01-14 09:11:34 UTC
Not sure about dithering, I didn't measure that. But I have a case here where LINEAR vs CUBIC makes the difference between being able to run a pipeline in realtime or not: video_scale_h_ntap_4u8 (not ORCified) takes all of the CPU.
Comment 9 Wim Taymans 2015-01-27 09:54:53 UTC
commit e2864494fe20dc9a8160fb383ce76e4c11280f82
Author: Wim Taymans <wtaymans@redhat.com>
Date:   Tue Jan 27 10:28:35 2015 +0100

    video-converter: add fastpath for planar scaling
    
    Add fastpaths for scaling of planar subsampled formats.
    
    See https://bugzilla.gnome.org/show_bug.cgi?id=741987

commit ba98d06767af35db71288e428f1864b33876c377
Author: Wim Taymans <wtaymans@redhat.com>
Date:   Tue Jan 27 10:04:11 2015 +0100

    video-scaler: add support for monochroma formats
    
    Add support for scaling of images with pstride == 1. This can be used
    to scale individual planes later.
    Rework some of the scaling code to take the pstride as a parameter.

commit 3db8879f25fcd6b124d196850a58f5015d5be0e7
Author: Wim Taymans <wtaymans@redhat.com>
Date:   Tue Jan 27 09:51:47 2015 +0100

    videoscale: disable chroma and matrix operations
    
    Ignore chroma subsampling and color matrix transformations like the
    old videoscale used to do. This is to make the performance like it was
    before.
    
    See https://bugzilla.gnome.org/show_bug.cgi?id=741987



[gst-1.4][wim@wtay ~/gst/1.4]$ gst-launch-1.0 videotestsrc num-buffers=2000 ! video/x-raw,width=1920,height=1080,format=I420 ! videoscale ! video/x-raw,width=320,height=240 ! fakesink silent=1 
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 0:00:09.186009427
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...

[gst-head][wim@wtay ~/gst/head/gstreamer]$ gst-launch-1.0 videotestsrc num-buffers=2000 ! video/x-raw,width=1920,height=1080,format=I420 ! videoscale ! video/x-raw,width=320,height=240 ! fakesink silent=1 
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 0:00:07.272031544
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...
Comment 10 Nicola 2015-01-27 10:54:46 UTC
Thanks, I420 is better now. However, there are still performance regressions with other formats, for example:

git master:

gst-launch-1.0 videotestsrc num-buffers=2000 ! video/x-raw,width=1920,height=1080,format=YUY2 ! videoscale ! video/x-raw,width=320,height=240 ! fakesink sync=false
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 0:00:09.526237997
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...


1.4:

gst-launch-1.0 videotestsrc num-buffers=2000 ! video/x-raw,width=1920,height=1080,format=YUY2 ! videoscale ! video/x-raw,width=320,height=240 ! fakesink sync=false
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 0:00:07.107416794
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...
Comment 11 Sebastian Dröge (slomo) 2015-01-27 11:51:42 UTC
Let's reopen then :)
Comment 12 Wim Taymans 2015-01-28 16:42:19 UTC
git before patches:

[gst-head][wim@wtay ~/gst/head/gstreamer]$ gst-launch-1.0 videotestsrc num-buffers=2000 ! video/x-raw,width=1920,height=1080,format=YUY2 ! videoscale ! video/x-raw,width=320,height=240 ! fakesink silent=1
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 0:00:09.151741005
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...

git after patches:

[gst-head][wim@wtay ~/gst/head/gstreamer]$ gst-launch-1.0 videotestsrc num-buffers=2000 ! video/x-raw,width=1920,height=1080,format=YUY2 ! videoscale ! video/x-raw,width=320,height=240 ! fakesink silent=1
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 0:00:08.187308623
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...

1.4:

[gst-1.4][wim@wtay ~/gst/1.4]$ gst-launch-1.0 videotestsrc num-buffers=2000 ! video/x-raw,width=1920,height=1080,format=YUY2 ! videoscale ! video/x-raw,width=320,height=240 ! fakesink silent=1
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 0:00:06.979913540
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...


It seems the ORC optimizations can't make up for the reshuffling of the bytes we need to do first. Maybe we just need to put back the C version that was in 1.4 (but that's not so nice).
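
To make the "reshuffling" concrete: YUY2 stores pixels in packed groups of four bytes, Y0 U Y1 V, so a scaler that works on separate components first has to de-interleave each row and interleave it again afterwards. The sketch below shows that round trip in plain Python; the function names are hypothetical and this is not the converter's actual code.

```python
def split_yuy2(row: bytes):
    """De-interleave one packed YUY2 row into separate Y, U and V byte strings."""
    if len(row) % 4:
        raise ValueError("a YUY2 row must hold whole Y0 U Y1 V groups")
    # Layout per pixel pair: Y0 U Y1 V
    return row[0::2], row[1::4], row[3::4]


def merge_yuy2(y: bytes, u: bytes, v: bytes) -> bytes:
    """Interleave separate Y, U and V components back into a packed YUY2 row."""
    out = bytearray(len(y) * 2)
    out[0::2] = y
    out[1::4] = u
    out[3::4] = v
    return bytes(out)


# Two pixel pairs: luma 16..19, chroma U 128/130, chroma V 129/131.
packed = bytes([16, 128, 17, 129, 18, 130, 19, 131])
y, u, v = split_yuy2(packed)
print(list(y), list(u), list(v))
print(merge_yuy2(y, u, v) == packed)
```

Both passes touch every byte of the row without doing any scaling work, which is the overhead a fast path that operates on the packed layout directly can avoid.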

commit f29b966c792811009b6e3613a59f6d7a58818020
Author: Wim Taymans <wtaymans@redhat.com>
Date:   Wed Jan 28 17:32:12 2015 +0100

    video-converter: add fast-path scaler for some packed YUV formats
    
    Add fast path scaling for YUY2 and other packed YUV formats. Add a new
    method to merge the scalers of the Y and UV components into one scaler.
    Add faster horizontal 2tap scaler.
    
    See https://bugzilla.gnome.org/show_bug.cgi?id=741987

commit 47bd6a138c04f66573754f66dcbaeb8d2aace17a
Author: Wim Taymans <wtaymans@redhat.com>
Date:   Wed Jan 28 17:30:53 2015 +0100

    videoscale: don't do dithering
Comment 13 Nicola 2015-01-28 17:18:44 UTC
Here is the comparison across some formats:

- I420: git master performs better than 1.4
- YUY2: 1.4 wins
- YV12: git master wins
- UYVY: same performance
- NV21: does not work in 1.4
- NV12: git master wins
- AYUV: 1.4 wins
- RGB: 1.4 wins by a wide margin (7 secs vs 14 secs on my laptop)
- GRAY8: 1.4 wins
- YVYU: 1.4 wins
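
A comparison like this can be scripted so that the same pipeline is run once per format on each installation. The sketch below only builds the command line used earlier in this report and times the process from the outside; the helper names are made up, and it assumes `gst-launch-1.0` of the version under test is the first one on `PATH`.

```python
import shlex
import subprocess
import time

# Formats listed in the comparison above.
FORMATS = ["I420", "YUY2", "YV12", "UYVY", "NV21",
           "NV12", "AYUV", "RGB", "GRAY8", "YVYU"]


def build_command(fmt, num_buffers=2000):
    """Return the gst-launch-1.0 argument list for one source format."""
    pipeline = (
        f"videotestsrc num-buffers={num_buffers} ! "
        f"video/x-raw,width=1920,height=1080,format={fmt} ! "
        "videoscale ! video/x-raw,width=320,height=240 ! "
        "fakesink sync=false"
    )
    return ["gst-launch-1.0"] + shlex.split(pipeline)


def time_format(fmt, num_buffers=2000):
    """Run the pipeline once and return the wall-clock time in seconds."""
    start = time.monotonic()
    subprocess.run(build_command(fmt, num_buffers), check=True,
                   stdout=subprocess.DEVNULL)
    return time.monotonic() - start


# Show the command that would be run for one format.
print(" ".join(build_command("YUY2")))
```

Calling `time_format(fmt)` for each entry in `FORMATS` under both installations gives one number per format and version. The wall-clock time includes process start-up, so it will be slightly higher than the "Execution ended after" figure printed by gst-launch-1.0 itself.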
Comment 14 Nicola 2015-01-28 18:15:46 UTC
Before the 1.6 release it would also be useful to run the same comparison tests on hardware other than x86_64, for example ARM.
Comment 15 Wim Taymans 2015-02-10 15:49:26 UTC
Created attachment 296519
current state
Comment 16 Wim Taymans 2015-02-11 15:52:10 UTC
Created attachment 296612
state on pandaboard

The current state on pandaboard
Comment 17 Nicola 2015-02-11 17:28:21 UTC
Thanks for your efforts; performance no longer seems very different now.
Comment 18 Wim Taymans 2015-02-12 10:40:40 UTC
This also makes RGB15/RGB16/BGR15/BGR16 faster than 1.4:

commit 45e408735c4c9e6f9259cc747d3aaf995f0505e5
Author: Wim Taymans <wtaymans@redhat.com>
Date:   Thu Feb 12 11:38:20 2015 +0100

    video-format: add orc function for RGB15/16 unpack
Comment 19 Wim Taymans 2015-03-04 15:04:18 UTC
More speedups in git. Let's close this now.