After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 721701 - videoconvert: I420 to BGRA conversion is slower than in 0.10
videoconvert: I420 to BGRA conversion is slower than in 0.10
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gst-plugins-base
1.2.2
Other Linux
: Normal major
: 1.3.1
Assigned To: GStreamer Maintainers
GStreamer Maintainers
: 721938 (view as bug list)
Depends on:
Blocks:
 
 
Reported: 2014-01-07 13:12 UTC by Peter Maersk-Moller
Modified: 2014-04-12 09:45 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Peter Maersk-Moller 2014-01-07 13:12:55 UTC
It appear that version 1.2.0-1.2.2 and perhaps previous versions introduce a dramatic performance penalty for video scaling and video format conversion of I420 to BGRA when compared to version 0.10.36. This performance penalty is present for common Ubuntu installations ranging from 12.04 until 13.10 as well as with a newly build version of GStreamer 1.2.2 with orc-0.4.18. It is checked that Orc is enabled when building gst-plugins-base.


The following examples are all executed on Ubuntu 13.10 with GStreamer 1.2.2 for version 1.0 and GStreamer 0.10.36 for version 0.10. The system is a dual Xeon Quad core configuration with a total of 8 cores.

First we create two test samples of 60 seconds each (just hit ctrl-c after running 60 seconds:

  $ gst-launch-1.0 -e -v videotestsrc pattern=18 is-live=true ! 'video/x-raw, width=1024, height=576, format=I420' ! videoconvert ! x264enc ! 'video/x-h264, profile=main' ! avimux ! filesink location=video1024x576.mp4

  $ gst-launch-1.0 -e -v videotestsrc pattern=18 is-live=true ! 'video/x-raw, width=1280, height=720, format=I420' ! videoconvert ! x264enc ! 'video/x-h264, profile=main' ! avimux ! filesink location=video1280x720.mp4

Now we start top in another window to check cpu load and try decode and convert the latter file with 0.10 and 1.0

  $ /usr/bin/gst-launch-0.10 -v filesrc location=./video1280x720.mp4 do-timestamp=true ! decodebin2 name=decoder ! ffmpegcolorspace ! 'video/x-raw-rgb, bpp=(int)32, depth=(int)32, endianness=(int)4321, red_mask=(int)65280, green_mask=(int)16711680, blue_mask=(int)-16777216, width=(int)1280, height=(int)720' ! queue ! fakesink silent=true sync=true

and

  $ /usr/local/bin/gst-launch-1.0 -v filesrc location=./video1280x720.mp4 do-timestamp=true ! decodebin name=decoder ! videoconvert ! 'video/x-raw, format=(string)BGRA, width=(int)1280, height=(int)720' ! queue ! fakesink silent=true sync=true

On my system, the 0.10 version will happily run at roughly 24-25 % CPU load while the 1.0 version runs at roughly 51%

Now we add videoscaling by using the first file we created and scale it before converting it:

  $ /usr/bin/gst-launch-0.10 -v filesrc location=./video1024x576.mp4 do-timestamp=true ! decodebin2 name=decoder ! videoscale ! ffmpegcolorspace ! 'video/x-raw-rgb, bpp=(int)32, depth=(int)32, endianness=(int)4321, red_mask=(int)65280, green_mask=(int)16711680, blue_mask=(int)-16777216, width=(int)1280, height=(int)720' ! queue ! fakesink silent=true sync=true

and

  $ /usr/local/bin/gst-launch-1.0 -v filesrc location=./video1024x576.mp4 do-timestamp=true ! decodebin name=decoder ! videoscale ! videoconvert ! 'video/x-raw, format=(string)BGRA, width=(int)1280, height=(int)720' ! queue ! fakesink silent=true sync=true

On my system, the 0.10 version will happily run at roughly 33-34 % CPU load while the 1.0 version runs at roughly 101%

Here is what I see when building the plugin base:

configure: *** Plug-ins without external dependencies that will be built:
	adder
	app
	audioconvert
	audiorate
	audioresample
	audiotestsrc
	encoding
	gio
	playback
	subparse
	tcp
	typefind
	videoconvert
	videorate
	videoscale
	videotestsrc
	volume

configure: *** Plug-ins without external dependencies that will NOT be built:

configure: *** Plug-ins that have NOT been ported:

configure: *** Plug-ins with dependencies that will be built:
	alsa
	cdparanoia
	ivorbisdec
	libvisual
	ogg
	pango
	theora
	vorbis
	ximagesink
	xvimagesink

configure: *** Plug-ins with dependencies that will NOT be built:

configure: *** Orc acceleration enabled.

So I assume Orc acceleration for video scale and videoconvert is enabled.
Comment 1 Andoni Morales 2014-01-07 14:50:08 UTC
To complete a bit more the report, I was able to reproduce the performance issue with an Intel i5 x86_64 using Ubuntu's GStreamer, but I was unable to reproduce it with the developers ppa.
Comment 2 Wim Taymans 2014-01-07 15:28:12 UTC
I420 -> BGRA is not implemented correctly in 0.10, it uses the fast path but with the wrong color matrix and wrong chroma upsampling.

In 1.0, it is implemented correctly but uses a slow path. Implementing a fast path with wrong upsampling would easy to add.
Comment 3 Andoni Morales 2014-01-07 16:07:09 UTC
(In reply to comment #2)
> I420 -> BGRA is not implemented correctly in 0.10, it uses the fast path but
> with the wrong color matrix and wrong chroma upsampling.
> 
> In 1.0, it is implemented correctly but uses a slow path. Implementing a fast
> path with wrong upsampling would easy to add.

But how do you explain that using the developers ppa instead of the default version provided with Ubuntu was on par with 0.10?

I did run some tests with cachegrind comparing 0.10, 1.0 from Ubuntu and 1.2 from the developers ppa.
0.10 and 1.2 from the ppa was more or less the same number of CPU cycles (around 20M for the colorspace conversion function) while the version from Ubuntu was using 50M cycles more the colorspace conversion.
Comment 4 Nicolas Dufresne (ndufresne) 2014-01-07 16:50:22 UTC
In 1.2.2 there was some changes that may have has an impact on the fast path. It is most likely related. Note that all this depends on proper build of ORC support.

commit ed8460519f1571bc4a1ca079847a67f077d1efaf
Author: Sebastian Dröge <sebastian@centricular.com>
Date:   Mon Dec 23 14:54:02 2013 +0100

    videoconvert: Fix I420 to BGRA fast-path alpha setting

    This fast-path was adding 128 to every component including
    alpha while it should only be done for all components except
    alpha. This caused wrong alpha values to be generated.
    
    Also remove the high-quality I420 to BGRA fast-path as it needs
    the same fix, which causes an additional instruction, which causes
    orc to emit more than 96 variables, which then just crashes.
    This can only be fixed in orc by breaking ABI and allowing more
    variables.
Comment 5 Sebastian Dröge (slomo) 2014-01-07 16:55:51 UTC
The original reporter says that it also happens with 1.2.0. And that fix there adds exactly one ORC opcode per pixel, and as a result sets a non-random alpha value ;)
Comment 6 Peter Maersk-Moller 2014-01-08 13:05:02 UTC
It appear i was a bit hasty in specifying 1.2.0 as introducing a performance penalty. Looking into it, version 1.2.0 only do so partly. The following pipeline for 1.2.0 consumes about 53%, which is roughly half of 1.2.2, but still nearly twice as much as for 0.10.36:

  $ /usr/local/bin/gst-launch-1.0 -v filesrc location=./video1280x720.mp4 do-timestamp=true ! decodebin name=decoder ! videoconvert ! 'video/x-raw, format=(string)BGRA, width=(int)1280, height=(int)720' ! queue ! fakesink silent=true sync=true

However, the following pipeline only consumes roughly 25-29% which is only a little bit more than for 0.10.36

  $ /usr/local/bin/gst-launch-1.0 -v filesrc location=./video1024x576.mp4 do-timestamp=true ! decodebin name=decoder ! videoscale ! videoconvert ! 'video/x-raw, format=(string)BGRA, width=(int)1280, height=(int)720' ! queue ! fakesink silent=true sync=true

Best regards
Comment 7 Peter Maersk-Moller 2014-01-08 17:43:21 UTC
(In reply to comment #6)

The interesting part in previous comment is, the following:

  1) Decoding 1280x720 and converting to BGRA is consuming twice as much CPU compared to 0.10.36 and half as much as compared to 1.2.2

  2) Decoding 1024x576, then scaling to 1280x720 and then converting to BGRA is using only a little bit more CPU compared to 0.10.36

For version 1.0.10 the similar tests reveals:

  1) Decoding 1280x720 and converting to BGRA is consuming 20% less CPU compared to 1.2.0, but still 80% more than 0.10.36

  2) Decoding 1024x576, then scaling to 1280x720 and then converting to BGRA is using only a little bit more CPU compared to 0.10.36 similar to version 1.2.0

best regards
Peter
Comment 8 Wim Taymans 2014-01-09 17:21:23 UTC
This should bring it comparable to 0.10

commit 14b5999bca16d9ac18bdcd5905c472bec2fe247e
Author: Wim Taymans <wtaymans@redhat.com>
Date:   Thu Jan 9 18:12:00 2014 +0100

    videoconvert: rework YUV->RGB fastpaths
    
    Rework the orc code to be around 10% faster and support arbitrary matrices.
    Pass the matrix parameters to the YUV->RGB functions to make them work
    for all matrices. This enables more and faster fastpath conversions.
    
    See https://bugzilla.gnome.org/show_bug.cgi?id=721701

commit 5ca04cb79836ce817ce88189e97a1c41f3424686
Author: Wim Taymans <wtaymans@redhat.com>
Date:   Thu Jan 9 18:08:41 2014 +0100

    videoconvert: fix I420 to BGRA fast-path some more
    
    Calculate alpha value differently so that we can avoid running out
    of registers.
Comment 9 Allison Karlitskaya (desrt) 2014-01-09 22:03:49 UTC
This seems to have broken the build under jhbuild on fedora 20:

  CC       libgstvideoconvert_la-videoconvert.lo
videoconvert.c: In function 'convert_AYUV_ARGB':
videoconvert.c:1209:7: error: too many arguments to function 'video_convert_orc_convert_AYUV_ARGB'
       width, height);
       ^
In file included from videoconvert.c:31:0:
gstvideoconvertorc.h:113:6: note: declared here
 void video_convert_orc_convert_AYUV_ARGB (guint8 * ORC_RESTRICT d1, int d1_stride, const guint8 * ORC_RESTRICT s1, int s1_stride, int n, int m);
      ^
videoconvert.c: In function 'convert_AYUV_BGRA':
videoconvert.c:1223:7: error: too many arguments to function 'video_convert_orc_convert_AYUV_BGRA'
       width, height);
       ^
In file included from videoconvert.c:31:0:
gstvideoconvertorc.h:114:6: note: declared here
 void video_convert_orc_convert_AYUV_BGRA (guint8 * ORC_RESTRICT d1, int d1_stride, const guint8 * ORC_RESTRICT s1, int s1_stride, int n, int m);
      ^
videoconvert.c: In function 'convert_AYUV_ABGR':
videoconvert.c:1237:7: error: too many arguments to function 'video_convert_orc_convert_AYUV_ABGR'
       width, height);
       ^
In file included from videoconvert.c:31:0:
gstvideoconvertorc.h:115:6: note: declared here
 void video_convert_orc_convert_AYUV_ABGR (guint8 * ORC_RESTRICT d1, int d1_stride, const guint8 * ORC_RESTRICT s1, int s1_stride, int n, int m);
      ^
videoconvert.c: In function 'convert_AYUV_RGBA':
videoconvert.c:1251:7: error: too many arguments to function 'video_convert_orc_convert_AYUV_RGBA'
       width, height);
       ^
In file included from videoconvert.c:31:0:
gstvideoconvertorc.h:116:6: note: declared here
 void video_convert_orc_convert_AYUV_RGBA (guint8 * ORC_RESTRICT d1, int d1_stride, const guint8 * ORC_RESTRICT s1, int s1_stride, int n, int m);
      ^
videoconvert.c: In function 'convert_I420_BGRA':
videoconvert.c:1268:9: error: too many arguments to function 'video_convert_orc_convert_I420_BGRA'
         width);
         ^
In file included from videoconvert.c:31:0:
gstvideoconvertorc.h:117:6: note: declared here
 void video_convert_orc_convert_I420_BGRA (guint8 * ORC_RESTRICT d1, const guint8 * ORC_RESTRICT s1, const guint8 * ORC_RESTRICT s2, const guint8 * ORC_RESTRICT s3, int n);
      ^
make[1]: *** [libgstvideoconvert_la-videoconvert.lo] Error 1
make[1]: Leaving directory `/home/desrt/.cache/jhbuild/checkout/gst-plugins-base/gst/videoconvert'
make: *** [all] Error 2
Comment 10 Nicolas Dufresne (ndufresne) 2014-01-09 22:23:10 UTC
Builds for me in manual build, it's possible that the orc code didn't get regenerated, would make clean fix the issue ?
Comment 11 Allison Karlitskaya (desrt) 2014-01-10 04:07:03 UTC
Seems that the problem is this:

[desrt@moonpix videoconvert]$ make V=1
cp ./gstvideoconvertorc-dist.c tmp-orc.c
cp ./gstvideoconvertorc-dist.h gstvideoconvertorc.h
make  all-am

from the Makefile.am:

# When 'make dist' is run at the top level, or 'make orc-update'
# in a directory including this fragment, the generated source 
# files will be copied to $(ORC_SOURCE)-dist.[ch].  These files
# should be checked in to git, since they are used if Orc is
# disabled.


The checked in -dist copies of these files are the old ones and I don't have ORC enabled, so I'm not rebuilding them.  I guess you need to check in the updates?
Comment 12 Sebastian Dröge (slomo) 2014-01-10 08:22:12 UTC
Can this be closed now? Or do we miss some other fast-paths for this bug?

commit 4ff4c1c10aebf0a1cf1623572d6d66864aebac60
Author: Sebastian Dröge <sebastian@centricular.com>
Date:   Fri Jan 10 09:21:08 2014 +0100

    videoconvert: Update disted orc files
Comment 13 Sebastian Dröge (slomo) 2014-01-10 08:23:43 UTC
(In reply to comment #11)

> I don't have ORC enabled, so I'm not rebuilding them. 

Note that without ORC many plugins will be quite slow, you really should build with ORC support.
Comment 14 Wim Taymans 2014-01-10 09:06:58 UTC
gst-launch-0.10 -v videotestsrc num-buffers=1000 ! video/x-raw-yuv\,\ format\=\(fourcc\)I420\,\ width\=\(int\)1280\,\ height\=\(int\)720\,\ pixel-aspect-ratio\=\(fraction\)1/1\,\ interlace-mode\=\(string\)progressive\,framerate\=\(fraction\)30/1 ! ffmpegcolorspace ! 'video/x-raw-rgb, bpp=(int)32, depth=(int)24, endianness=(int)4321, red_mask=(int)65280, green_mask=(int)16711680, blue_mask=(int)-16777216, width=(int)1280, height=(int)720, framerate=(fraction)30/1, pixel-aspect-ratio=(fraction)1/1, interlace-mode=(string)progressive' ! fakesink silent=1 -v

0.10: Execution ended after 12328739773 ns.

gst-launch-1.0 -v videotestsrc num-buffers=1000 ! video/x-raw\,\ format\=\(string\)I420\,\ width\=\(int\)1280\,\ height\=\(int\)720\,\ pixel-aspect-ratio\=\(fraction\)1/1\,\ interlace-mode\=\(string\)progressive\,\ colorimetry\=\(string\)bt709\,\ framerate\=\(fraction\)30/1 ! videoconvert ! 'video/x-raw, format=(string)BGRx,width=(int)1280, height=(int)720' ! fakesink

1.3 git: Execution ended after 0:00:11.873438183

gst-launch-1.0 -v videotestsrc num-buffers=1000 ! video/x-raw\,\ format\=\(string\)I420\,\ width\=\(int\)1280\,\ height\=\(int\)720\,\ pixel-aspect-ratio\=\(fraction\)1/1\,\ interlace-mode\=\(string\)progressive\,\ colorimetry\=\(string\)bt709\,\ framerate\=\(fraction\)30/1 ! videoconvert ! 'video/x-raw, format=(string)BGRx,width=(int)1280, height=(int)720' ! fakesink

1.2 git: Execution ended after 0:00:26.966103516
Comment 15 Wim Taymans 2014-01-10 09:18:39 UTC
with scaling;

gst-launch-0.10 -v videotestsrc num-buffers=1000 ! video/x-raw-yuv\,\ format\=\(fourcc\)I420\,\ width\=\(int\)1024\,\ height\=\(int\)576\,\ pixel-aspect-ratio\=\(fraction\)1/1\,\ interlace-mode\=\(string\)progressive\,framerate\=\(fraction\)30/1 ! videoscale ! ffmpegcolorspace ! 'video/x-raw-rgb, bpp=(int)32, depth=(int)24, endianness=(int)4321, red_mask=(int)65280, green_mask=(int)16711680, blue_mask=(int)-16777216, width=(int)1280, height=(int)720, framerate=(fraction)30/1, pixel-aspect-ratio=(fraction)1/1, interlace-mode=(string)progressive' ! fakesink silent=1 -v

0.10: Execution ended after 17131772544 ns.

gst-launch-1.0 -v videotestsrc num-buffers=1000 ! video/x-raw\,\ format\=\(string\)I420\,\ width\=\(int\)1024\,\ height\=\(int\)576\,\ pixel-aspect-ratio\=\(fraction\)1/1\,\ interlace-mode\=\(string\)progressive\,\ colorimetry\=\(string\)bt709\,\ framerate\=\(fraction\)30/1 ! videoscale ! videoconvert ! 'video/x-raw, format=(string)BGRx,width=(int)1280, height=(int)720' ! fakesink

1.3 git: Execution ended after 0:00:15.854307283

gst-launch-1.0 -v videotestsrc num-buffers=1000 ! video/x-raw\,\ format\=\(string\)I420\,\ width\=\(int\)1024\,\ height\=\(int\)576\,\ pixel-aspect-ratio\=\(fraction\)1/1\,\ interlace-mode\=\(string\)progressive\,\ colorimetry\=\(string\)bt709\,\ framerate\=\(fraction\)30/1 ! videoscale ! videoconvert ! 'video/x-raw, format=(string)BGRx,width=(int)1280, height=(int)720' ! fakesink

1.2: Execution ended after 0:00:31.374607459
Comment 16 Sebastian Dröge (slomo) 2014-01-10 09:29:33 UTC
Let's consider this fixed then, new bug reports for other missing fast paths please
Comment 17 Wim Taymans 2014-01-10 09:50:55 UTC
From file it's more dramatic:

0.10:

   video1280x720.mp4: Execution ended after 5664100050 ns.
   video1024x576.mp4: Execution ended after 4483866324 ns.

1.3 git:

   video1280x720.mp4: Execution ended after 0:00:02.899891548
   video1024x576.mp4: Execution ended after 0:00:03.350581054

1.2.2:

   video1280x720.mp4: Execution ended after 0:00:09.539085249
   video1024x576.mp4: Execution ended after 0:00:10.231766318

0.10 uses purely software for color conversion but uses orc for scaling.
Comment 18 Allison Karlitskaya (desrt) 2014-01-10 12:05:53 UTC
This is still causing jhbuild to break.

It seems that you've gone out of your way to allow the build to proceed without Orc, by allowing ./configure to succeed without it and checking generated files into git so that the build will work anyway.  Those files have fallen out of date, causing the build to fail.

If your intention is that Orc is hard-required to build from git then you should probably modify ./configure to require it and/or remove those generated files from git...
Comment 19 Sebastian Dröge (slomo) 2014-01-10 12:21:40 UTC
(In reply to comment #18)
> This is still causing jhbuild to break.
> 
> It seems that you've gone out of your way to allow the build to proceed without
> Orc, by allowing ./configure to succeed without it and checking generated files
> into git so that the build will work anyway.  Those files have fallen out of
> date, causing the build to fail.
> 
> If your intention is that Orc is hard-required to build from git then you
> should probably modify ./configure to require it and/or remove those generated
> files from git...

ORC is not a hard requirement, it's just generally a good idea to use it. The files in GIT are supposed to be kept up-to-date, and in tarballs they are guaranteed to be up-to-date (make dist cares for that).

Is there still a build problem remaining?
Comment 20 Tim-Philipp Müller 2014-01-10 12:22:46 UTC
Ryan, the generated files in git were updated this morning. Did you update your checkout?
Comment 21 Vincent Penquerc'h 2014-01-10 14:00:56 UTC
I see the same orc issue, and I have slomo's commit from this morning. I tried a full autogen/make clean/make to no avail.
In case it matters:
$ orcc --version
Orc Compiler 0.4.16.1
So it should be regenerating them I think, unless we require newer ?
Comment 22 Sebastian Dröge (slomo) 2014-01-10 14:08:16 UTC
*** Bug 721938 has been marked as a duplicate of this bug. ***
Comment 23 Sebastian Dröge (slomo) 2014-01-10 14:09:15 UTC
*sigh* There were changes to the .orc files after the version for which I updated the disted files. Next try:

commit b21a67f99eae83c7ed2b40f7409f67d269d8a6f4
Author: Sebastian Dröge <sebastian@centricular.com>
Date:   Fri Jan 10 15:06:23 2014 +0100

    videoconvert: Update disted orc files once again



(also you need orc >= 0.4.18)
Comment 24 Vincent Penquerc'h 2014-01-10 14:14:13 UTC
This last commit fixed the build here, thanks. Without even having to install some newer orc.
Comment 25 Allison Karlitskaya (desrt) 2014-01-10 14:17:16 UTC
Indeed -- fixed with latest changes for me as well.  Thanks!
Comment 26 Peter Maersk-Moller 2014-04-11 14:56:18 UTC
Checked with GStreamer 1.2.3 and the issue of a hefty performance penalty is still there. Vincent and Ryan reported it as fixed, but apparently that did not make it into the 1.2.3?

Just want to be sure it really is fixed in some future version.
Regards
Peter
Comment 27 Vincent Penquerc'h 2014-04-11 15:05:17 UTC
The latter comments were about the build being broken, sorry if this was unclear.

Looking at git, Wim's patches seem not to be in 1.2.3, but they are in master.
Comment 28 Tim-Philipp Müller 2014-04-11 15:14:57 UTC
The 'target_milestone' field indicates where this is fixed. 1.3.1 means it's only in master, not the 1.2 branch. The patches were too invasive to cherry-pick IIRC.
Comment 29 Peter Maersk-Moller 2014-04-12 09:45:09 UTC
Git Master works nicely. Thanks