GNOME Bugzilla – Bug 741987
videoscale performance regression
Last modified: 2015-03-04 15:04:32 UTC
On x86_64, git master:
gst-launch-1.0 videotestsrc num-buffers=2000 ! video/x-raw,width=1920,height=1080 ! videoscale ! video/x-raw,width=320,height=240 ! fakesink sync=false
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 0:00:13.956702692
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...

1.4.5:
gst-launch-1.0 videotestsrc num-buffers=2000 ! video/x-raw,width=1920,height=1080 ! videoscale ! video/x-raw,width=320,height=240 ! fakesink sync=false
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 0:00:10.353925742
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...

So the recent refactoring of videoscale seems to have introduced a performance regression (about 14.0 s vs 10.4 s for 2000 frames).
Another test: I saved some raw buffers into an MKV file with a pipeline like this:
gst-launch-1.0 v4l2src num-buffers=1000 ! video/x-raw ! matroskamux ! filesink location=/tmp/test.mkv

git master:
gst-launch-1.0 filesrc location= /tmp/test.mkv ! matroskademux ! queue ! videoscale ! video/x-raw,width=320,height=240 ! fakesink sync=false
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 0:00:01.401361272
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...

1.4.5:
gst-launch-1.0 filesrc location= /tmp/test.mkv ! matroskademux ! queue ! videoscale ! video/x-raw,width=320,height=240 ! fakesink sync=false
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 0:00:00.521376775
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...

In this case the difference is even bigger: 0.5 s with 1.4.5 vs 1.4 s with git master.
If I only demux the file, git master and 1.4 take about the same time:

git master:
gst-launch-1.0 filesrc location= /tmp/test.mkv ! matroskademux ! queue ! fakesink sync=false
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 0:00:00.497602064
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...

1.4.5:
gst-launch-1.0 filesrc location= /tmp/test.mkv ! matroskademux ! fakesink sync=false
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 0:00:00.499425560
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...
Here are the caps of the tested file:
video/x-raw, format=(string)YUY2, width=(int)1280, height=(int)720, pixel-aspect-ratio=(fraction)1/1, interlace-mode=(string)progressive, chroma-site=(string)mpeg2, colorimetry=(string)bt709, framerate=(fraction)10/1
git master does chroma upsampling and downsampling around the scaling step. Disabling these extra steps brings performance in line with 1.4. I'll see if I can add an option for this.
The best thing to do is this:
- make videoscale negotiate the same colorimetry on input and output; in this example it uses HD colorimetry on the input and SD colorimetry on the output, and thus needs to do a conversion.
- avoid chroma up/downsampling by scaling directly on the subsampled planes.
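A rough way to see why the second point matters is to count samples. The sketch below is a hypothetical helper (yuv_frame_samples is not a GStreamer function) that returns the number of samples in one frame for a given chroma subsampling; the gap between the 4:4:4 count and the native count is extra data an upsample, scale, downsample path has to push through the scaler.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: samples in one YUV frame whose two chroma planes are
 * subsampled by 2^x_shift horizontally and 2^y_shift vertically
 * (0,0 = 4:4:4; 1,0 = 4:2:2 such as YUY2; 1,1 = 4:2:0 such as I420). */
static uint64_t
yuv_frame_samples (uint32_t width, uint32_t height, unsigned x_shift,
    unsigned y_shift)
{
  assert (x_shift < 8 && y_shift < 8);
  /* Round up so odd sizes still get a chroma sample for the last pixel. */
  uint64_t cw = ((uint64_t) width + (1u << x_shift) - 1) >> x_shift;
  uint64_t ch = ((uint64_t) height + (1u << y_shift) - 1) >> y_shift;
  return (uint64_t) width * height + 2 * cw * ch;
}
```

For the 1280x720 YUY2 file whose caps are listed above, that is 2,764,800 samples per frame at 4:4:4 against 1,843,200 in native 4:2:2 (1.5x), and for 1920x1080 I420 it is 6,220,800 against 3,110,400 (2x), before counting the up/downsampling passes themselves.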
We should probably also switch to LINEAR instead of CUBIC as default for scaling. We used LINEAR in <= 1.4, and CUBIC is noticeably slower.
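To make the cost difference concrete, here is a minimal sketch of a 2-tap (linear) horizontal resampler. It is an illustration under simple assumptions (one 8-bit component per sample, 16.16 fixed-point source positions, clamping at the edges), not GStreamer's video-scaler code; scale_line_linear_u8 is a hypothetical name. Each output sample reads two input samples, while a basic cubic kernel reads four and needs twice the multiply-adds, which is one reason CUBIC is slower.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 2-tap (linear) horizontal resampler for one line of 8-bit
 * samples, using 16.16 fixed-point source positions. */
static void
scale_line_linear_u8 (const uint8_t *src, int src_w, uint8_t *dest, int dest_w)
{
  assert (src_w > 0 && dest_w > 0);
  int64_t step = ((int64_t) src_w << 16) / dest_w;
  /* Centre of output pixel 0 mapped into source coordinates, minus 0.5. */
  int64_t pos = step / 2 - (1 << 15);

  for (int i = 0; i < dest_w; i++, pos += step) {
    int64_t p = pos < 0 ? 0 : pos;
    int x0 = (int) (p >> 16);
    int frac = (int) (p & 0xffff);

    if (x0 >= src_w - 1) {      /* clamp at the right edge */
      x0 = src_w - 1;
      frac = 0;
    }
    int x1 = frac ? x0 + 1 : x0;
    /* Weighted sum of the two neighbours, rounded back to 8 bits. */
    dest[i] = (uint8_t) ((src[x0] * (65536 - frac) + src[x1] * frac + 32768) >> 16);
  }
}
```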
Created attachment 294496
video-converter: Disable dithering by default and use LINEAR resampling

We had no dithering before 1.6 and used linear resampling; choosing heavier defaults causes performance regressions.
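For a feel of what dithering adds per pixel, here is a sketch of reducing a line of 16-bit intermediate samples to 8 bits, either by plain rounding or with 2x2 ordered (Bayer) dithering. The 16-bit intermediate, the Bayer pattern and the function name are assumptions for illustration; GStreamer's converter has its own dither methods, which this does not reproduce.

```c
#include <assert.h>
#include <stdint.h>

/* 2x2 Bayer thresholds, (M + 0.5) * 64 for M = {{0, 2}, {3, 1}}, so the
 * average offset is 128, the same as plain rounding. */
static const uint16_t bayer_2x2[2][2] = { { 32, 160 }, { 224, 96 } };

/* Hypothetical sketch: with dither == 0 each sample is simply rounded;
 * otherwise a position-dependent threshold is looked up and added first,
 * which is extra work for every sample. */
static void
quantize_line_u16_to_u8 (const uint16_t *src, uint8_t *dest, int width,
    int row, int dither)
{
  assert (width >= 0);
  for (int i = 0; i < width; i++) {
    uint32_t v = src[i];
    v += dither ? bayer_2x2[row & 1][i & 1] : 128;
    v >>= 8;
    dest[i] = (uint8_t) (v > 255 ? 255 : v);
  }
}
```

A flat input of 0x8080 (128.5 in 8-bit terms) rounds to 129 everywhere without dithering, and alternates between 128 and 129 with it.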
Not sure about dithering; I didn't measure that. But I have a case here where LINEAR vs CUBIC makes the difference between being able to run a pipeline in real time or not: video_scale_h_ntap_4u8 (not ORCified) takes all of the CPU.
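As an illustration of the kind of inner loop involved, here is a hypothetical plain-C n-tap horizontal scaler for 4-byte pixels. It is modeled only on what the name video_scale_h_ntap_4u8 suggests (n taps, 4 components of u8), not on its actual source; the offsets/taps layout and the 6-bit fixed-point weights are assumptions. The work per output pixel is n_taps * 4 multiply-adds, so every extra tap costs proportionally more unless the loop is vectorized (for example with ORC).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical n-tap horizontal scaler for packed 4-byte pixels (e.g. AYUV).
 * For output pixel i, offsets[i] is the first source pixel used and
 * taps[i * n_taps .. i * n_taps + n_taps - 1] are weights in 6-bit fixed
 * point that sum to 64. */
static void
sketch_scale_h_ntap_4u8 (const uint8_t *src, uint8_t *dest,
    const int *offsets, const int16_t *taps, int n_taps, int dest_w)
{
  assert (n_taps > 0);
  for (int i = 0; i < dest_w; i++) {
    const uint8_t *s = src + 4 * offsets[i];
    const int16_t *t = taps + i * n_taps;

    for (int c = 0; c < 4; c++) {
      int sum = 32;             /* +0.5 for rounding */
      for (int k = 0; k < n_taps; k++)
        sum += t[k] * s[4 * k + c];
      /* Cubic-like kernels have negative lobes, so clamp to 0..255. */
      sum = sum < 0 ? 0 : sum >> 6;
      dest[4 * i + c] = (uint8_t) (sum > 255 ? 255 : sum);
    }
  }
}
```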
commit e2864494fe20dc9a8160fb383ce76e4c11280f82
Author: Wim Taymans <wtaymans@redhat.com>
Date:   Tue Jan 27 10:28:35 2015 +0100

    video-converter: add fastpath for planar scaling

    Add fastpaths for scaling of planar subsampled formats.

    See https://bugzilla.gnome.org/show_bug.cgi?id=741987

commit ba98d06767af35db71288e428f1864b33876c377
Author: Wim Taymans <wtaymans@redhat.com>
Date:   Tue Jan 27 10:04:11 2015 +0100

    video-scaler: add support for monochroma formats

    Add support for scaling of images with pstride == 1. This can be used to scale individual planes later.
    Rework some of the scaling code to take the pstride as a parameter.

commit 3db8879f25fcd6b124d196850a58f5015d5be0e7
Author: Wim Taymans <wtaymans@redhat.com>
Date:   Tue Jan 27 09:51:47 2015 +0100

    videoscale: disable chroma and matrix operations

    Ignore chroma subsampling and color matrix transformations like the old videoscale used to do. This is to make the performance like it was before.

    See https://bugzilla.gnome.org/show_bug.cgi?id=741987

[gst-1.4][wim@wtay ~/gst/1.4]$ gst-launch-1.0 videotestsrc num-buffers=2000 ! video/x-raw,width=1920,height=1080,format=I420 ! videoscale ! video/x-raw,width=320,height=240 ! fakesink silent=1
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 0:00:09.186009427
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...
[gst-head][wim@wtay ~/gst/head/gstreamer]$ gst-launch-1.0 videotestsrc num-buffers=2000 ! video/x-raw,width=1920,height=1080,format=I420 ! videoscale ! video/x-raw,width=320,height=240 ! fakesink silent=1
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 0:00:07.272031544
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...
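The planar fast path and the pstride == 1 support in the commits above come down to scaling each plane of a subsampled format at its own resolution, with pstride telling the scaler how far apart neighbouring samples are. Below is a hypothetical nearest-neighbour sketch of that idea; nearest-neighbour is used only to keep it short, the function names are made up, and tightly packed planes without row padding are an assumption (GstVideoInfo may pad rows).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical nearest-neighbour scaler for one component. pstride is the
 * byte distance between horizontally adjacent samples: 1 for a plane of a
 * planar format such as I420, larger for a component inside packed pixels. */
static void
scale_plane_nearest (const uint8_t *src, int src_w, int src_h, int src_stride,
    uint8_t *dest, int dest_w, int dest_h, int dest_stride, int pstride)
{
  assert (src_w > 0 && src_h > 0 && dest_w > 0 && dest_h > 0 && pstride > 0);
  for (int y = 0; y < dest_h; y++) {
    const uint8_t *srow = src + (int64_t) y * src_h / dest_h * src_stride;
    uint8_t *drow = dest + (int64_t) y * dest_stride;
    for (int x = 0; x < dest_w; x++)
      drow[(int64_t) x * pstride] = srow[(int64_t) x * src_w / dest_w * pstride];
  }
}

/* Scale a tightly packed I420 frame (Y, then U, then V) by scaling each
 * plane at its own resolution; chroma is never upsampled to 4:4:4. */
static void
scale_i420_nearest (const uint8_t *src, int sw, int sh,
    uint8_t *dest, int dw, int dh)
{
  int scw = (sw + 1) / 2, sch = (sh + 1) / 2;
  int dcw = (dw + 1) / 2, dch = (dh + 1) / 2;
  const uint8_t *su = src + sw * sh, *sv = su + scw * sch;
  uint8_t *du = dest + dw * dh, *dv = du + dcw * dch;

  scale_plane_nearest (src, sw, sh, sw, dest, dw, dh, dw, 1);
  scale_plane_nearest (su, scw, sch, scw, du, dcw, dch, dcw, 1);
  scale_plane_nearest (sv, scw, sch, scw, dv, dcw, dch, dcw, 1);
}
```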
Thanks, I420 is better now. However, there are still some performance regressions with other formats, for example YUY2:

git master:
gst-launch-1.0 videotestsrc num-buffers=2000 ! video/x-raw,width=1920,height=1080,format=YUY2 ! videoscale ! video/x-raw,width=320,height=240 ! fakesink sync=false
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 0:00:09.526237997
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...

1.4:
gst-launch-1.0 videotestsrc num-buffers=2000 ! video/x-raw,width=1920,height=1080,format=YUY2 ! videoscale ! video/x-raw,width=320,height=240 ! fakesink sync=false
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 0:00:07.107416794
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...
Let's reopen then :)
git before patches:
[gst-head][wim@wtay ~/gst/head/gstreamer]$ gst-launch-1.0 videotestsrc num-buffers=2000 ! video/x-raw,width=1920,height=1080,format=YUY2 ! videoscale ! video/x-raw,width=320,height=240 ! fakesink silent=1
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 0:00:09.151741005
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...

git after patches:
[gst-head][wim@wtay ~/gst/head/gstreamer]$ gst-launch-1.0 videotestsrc num-buffers=2000 ! video/x-raw,width=1920,height=1080,format=YUY2 ! videoscale ! video/x-raw,width=320,height=240 ! fakesink silent=1
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 0:00:08.187308623
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...

1.4:
[gst-1.4][wim@wtay ~/gst/1.4]$ gst-launch-1.0 videotestsrc num-buffers=2000 ! video/x-raw,width=1920,height=1080,format=YUY2 ! videoscale ! video/x-raw,width=320,height=240 ! fakesink silent=1
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 0:00:06.979913540
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...

It seems the ORC optimizations can't make up for the byte reshuffling we need to do first. Maybe we just need to bring back the C version from 1.4 (but that's not so nice).
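The reshuffle mentioned above can be pictured as follows: before a generic scaler can work on YUY2, each Y0 U Y1 V macropixel is unpacked into two 4:4:4 pixels, and the result has to be packed again after scaling. Here is a hypothetical sketch of the unpack step into AYUV that simply duplicates chroma without filtering (an assumption; this is not GStreamer's unpack function, and even widths are assumed).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: unpack one YUY2 line (Y0 U Y1 V per 2 pixels) into
 * AYUV (A Y U V per pixel), giving both pixels of a macropixel the same U/V. */
static void
unpack_yuy2_to_ayuv (const uint8_t *src, uint8_t *dest, int width)
{
  assert (width >= 0 && width % 2 == 0);
  for (int i = 0; i < width; i += 2) {
    const uint8_t *m = src + 2 * i;   /* 4 source bytes per 2 pixels */
    uint8_t u = m[1], v = m[3];
    uint8_t *d = dest + 4 * i;        /* 8 destination bytes per 2 pixels */

    d[0] = 0xff; d[1] = m[0]; d[2] = u; d[3] = v;
    d[4] = 0xff; d[5] = m[2]; d[6] = u; d[7] = v;
  }
}
```

For a 1920-pixel line this writes 7,680 bytes from 3,840 input bytes before any scaling work starts.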
commit f29b966c792811009b6e3613a59f6d7a58818020
Author: Wim Taymans <wtaymans@redhat.com>
Date:   Wed Jan 28 17:32:12 2015 +0100

    video-converter: add fast-path scaler for some packed YUV formats

    Add fast path scaling for YUY2 and other packed YUV formats.
    Add a new method to merge the scalers of the Y and UV components into one scaler.
    Add faster horizontal 2tap scaler.

    See https://bugzilla.gnome.org/show_bug.cgi?id=741987

commit 47bd6a138c04f66573754f66dcbaeb8d2aace17a
Author: Wim Taymans <wtaymans@redhat.com>
Date:   Wed Jan 28 17:30:53 2015 +0100

    videoscale: don't do dithering
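The fast path in commit f29b966 works on packed YUV without unpacking to 4:4:4 first. The hypothetical sketch below shows the general idea on one YUY2 line: luma is addressed per pixel with a 2-byte stride, U/V per macropixel with a 4-byte stride, and both are written straight into the output macropixels. It uses nearest-neighbour sampling for brevity, whereas the commit adds a 2-tap scaler, and it assumes even widths.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical nearest-neighbour horizontal scaler working directly on one
 * packed YUY2 line (bytes Y0 U Y1 V per 2-pixel macropixel). */
static void
scale_yuy2_line_nearest (const uint8_t *src, int src_w, uint8_t *dest,
    int dest_w)
{
  assert (src_w > 0 && src_w % 2 == 0 && dest_w % 2 == 0);
  for (int x = 0; x < dest_w; x += 2) {
    int sx0 = (int) ((int64_t) x * src_w / dest_w);       /* luma for pixel x */
    int sx1 = (int) ((int64_t) (x + 1) * src_w / dest_w); /* luma for pixel x + 1 */
    int sm = (int) ((int64_t) (x / 2) * src_w / dest_w);  /* chroma macropixel */
    uint8_t *d = dest + 2 * x;

    d[0] = src[2 * sx0];     /* Y, pixel x */
    d[1] = src[4 * sm + 1];  /* U */
    d[2] = src[2 * sx1];     /* Y, pixel x + 1 */
    d[3] = src[4 * sm + 3];  /* V */
  }
}
```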
Here is the comparison for some formats:
- I420: git master performs better than 1.4
- YUY2: 1.4 wins
- YV12: git master wins
- UYVY: same performance
- NV21: does not work in 1.4
- NV12: git master wins
- AYUV: 1.4 wins
- RGB: 1.4 wins, with a really large gap (7 s for 1.4 vs 14 s for git master on my laptop)
- GRAY8: 1.4 wins
- YVYU: 1.4 wins
Before the 1.6 release it would also be useful to run the same comparison tests on hardware other than x86_64, for example ARM.
Created attachment 296519: current state
Created attachment 296612: state on pandaboard
The current state on pandaboard
Thanks for your efforts; the performance no longer seems that different now.
This also makes RGB15/RGB16/BGR15/BGR16 faster than 1.4:

commit 45e408735c4c9e6f9259cc747d3aaf995f0505e5
Author: Wim Taymans <wtaymans@redhat.com>
Date:   Thu Feb 12 11:38:20 2015 +0100

    video-format: add orc function for RGB15/16 unpack
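For reference, unpacking RGB16 means widening each 5:6:5 field of a 16-bit word to 8 bits. Below is a plain-C sketch of that step. The function name is hypothetical, native-endian input with red in the top 5 bits and widening by bit replication are assumptions, and it is not the ORC code from the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical RGB16 (5:6:5) unpack to 8-bit ARGB. The 5- and 6-bit
 * fields are widened by replicating their top bits into the low bits,
 * so 0x1f maps to 255 and 0 stays 0. */
static void
unpack_rgb16_to_argb (const uint16_t *src, uint8_t *dest, int width)
{
  assert (width >= 0);
  for (int i = 0; i < width; i++) {
    uint16_t p = src[i];
    uint8_t r = (p >> 11) & 0x1f, g = (p >> 5) & 0x3f, b = p & 0x1f;

    dest[4 * i + 0] = 0xff;
    dest[4 * i + 1] = (uint8_t) ((r << 3) | (r >> 2));
    dest[4 * i + 2] = (uint8_t) ((g << 2) | (g >> 4));
    dest[4 * i + 3] = (uint8_t) ((b << 3) | (b >> 2));
  }
}
```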
More speedups in git. Let's close this now.