Bug 734679 - videobox: Much slower than videocrop
Status: RESOLVED OBSOLETE
Product: GStreamer
Classification: Platform
Component: gst-plugins-good
Version: git master
Hardware: Other Linux
Importance: Normal normal
Target Milestone: git master
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2014-08-12 16:42 UTC by Stirling Westrup
Modified: 2018-11-03 14:53 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
callgrind output for the pipeline mentioned (461.47 KB, application/gzip)
2014-08-13 16:11 UTC, Stirling Westrup
videobox: Don't do matrix multiplication unless required (100.50 KB, patch)
2014-08-14 09:10 UTC, Sebastian Dröge (slomo)

Description Stirling Westrup 2014-08-12 16:42:26 UTC
I have an application that needs to do both video cropping and padding and which works with 4K videos. We chose videobox for the task because it's the only element that does reasonable padding, and the fact that it can also crop was a major bonus.

However, in time trials we've found that videobox is much, MUCH slower than videocrop, even when it's just cropping. For example, using GStreamer 1.4.0, this pipeline runs at about 45 fps on a stock Ivy Bridge server driving an SMSC zero client display:

gst-launch-1.0 -v filesrc location=Sintel-4K.mkv ! decodebin ! videoconvert ! videocrop left=100 ! videoscale ! fpsdisplaysink sync=false video-sink="xvimagesink display=:2"

The same pipeline with videobox in place of videocrop averages about 20 fps, which is a huge drop. Worse, the video in question is 24 fps, so with videobox we cannot play in real time without dropping frames.
Comment 1 Sebastian Dröge (slomo) 2014-08-12 20:27:19 UTC
That might be because videobox is interpolating the I420 (or any other subsampled YUV format) planes properly while videocrop is just offsetting them.
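To illustrate what "just offsetting" the planes could look like, here is a minimal sketch in C. It is not the actual videocrop code: the `I420View` struct and `i420_crop_view` function are hypothetical names, and the sketch assumes even crop offsets so that they map exactly onto the 2x2-subsampled chroma planes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one I420 frame: a full-resolution Y plane and two
 * chroma planes subsampled by 2 in both directions. */
typedef struct {
  const uint8_t *y, *u, *v;
  size_t y_stride, uv_stride;
  int width, height;
} I420View;

/* Crop `left` columns and `top` rows without touching any pixel data:
 * only the plane pointers and the dimensions change. Offsets must be even
 * in this sketch so that they land on whole chroma samples. */
static I420View
i420_crop_view (I420View in, int left, int top)
{
  assert (left >= 0 && top >= 0 && left % 2 == 0 && top % 2 == 0);
  assert (left < in.width && top < in.height);

  I420View out = in;
  out.y += (size_t) top * in.y_stride + (size_t) left;
  out.u += (size_t) (top / 2) * in.uv_stride + (size_t) (left / 2);
  out.v += (size_t) (top / 2) * in.uv_stride + (size_t) (left / 2);
  out.width = in.width - left;
  out.height = in.height - top;
  return out;
}
```

The cost of this kind of crop is independent of the frame size, whereas any approach that reads and rewrites every sample scales with the number of pixels.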
Comment 2 Sebastian Dröge (slomo) 2014-08-13 14:33:24 UTC
Did you already check with perf or callgrind where all the time is spent?
Comment 3 Stirling Westrup 2014-08-13 16:11:42 UTC
Created attachment 283313
callgrind output for the pipeline mentioned

Generating command:

$ valgrind --tool=callgrind gst-launch-1.0 -v filesrc location=~/Videos/4K/Sintel-4K.mkv ! decodebin ! videoconvert ! videobox left=100 ! videoscale ! fpsdisplaysink sync=false video-sink="xvimagesink display=:2"
Comment 4 Stirling Westrup 2014-08-13 16:13:51 UTC
I've not used callgrind before, so I'm not entirely sure how to read its output, but the results seem to indicate that the vast majority of the CPU time was spent in a routine called "gst_video_filter_transform <cycle 7>"

I've attached the callgrind output in case that helps.
Comment 5 Sebastian Dröge (slomo) 2014-08-14 07:15:22 UTC
You can open it in e.g. kcachegrind for working visually with it.

So in summary:
- videoscale and decoder both take about 5-6 billion instructions
- videobox takes 20 billion instructions in its transform function, all of that in copy_i420_i420

Time to optimize that function :)
Comment 6 Sebastian Dröge (slomo) 2014-08-14 07:18:04 UTC
It's doing matrix multiplication to convert between SDTV and HDTV YUV, and if no conversion is necessary it multiplies with the identity matrix.
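A rough sketch of the pattern described here, in C. This is not the videobox source: the matrix layout (3x3 fixed-point coefficients with 8 fractional bits plus an offset column), the packed pixel layout, and all function names are assumptions made for illustration. It shows a per-pixel matrix multiply and a check, done once per call rather than per pixel, that skips the work when the matrix is the identity.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative 3x4 matrix: each row is three coefficients scaled by 256
 * followed by an integer offset. This one leaves pixels unchanged. */
static const int IDENTITY[12] = {
  256, 0, 0, 0,
  0, 256, 0, 0,
  0, 0, 256, 0,
};

static uint8_t
clamp_u8 (int v)
{
  return v < 0 ? 0 : v > 255 ? 255 : (uint8_t) v;
}

/* Apply the matrix to one YUV pixel in place. Assumes an arithmetic right
 * shift for negative intermediate values. */
static void
apply_matrix (const int m[12], uint8_t *y, uint8_t *u, uint8_t *v)
{
  int iy = *y, iu = *u, iv = *v;
  *y = clamp_u8 (((m[0] * iy + m[1] * iu + m[2] * iv) >> 8) + m[3]);
  *u = clamp_u8 (((m[4] * iy + m[5] * iu + m[6] * iv) >> 8) + m[7]);
  *v = clamp_u8 (((m[8] * iy + m[9] * iu + m[10] * iv) >> 8) + m[11]);
}

static int
is_identity (const int m[12])
{
  return memcmp (m, IDENTITY, sizeof IDENTITY) == 0;
}

/* Convert n packed Y,U,V pixels in place, skipping all per-pixel work when
 * no conversion is needed. */
static void
convert_pixels (const int m[12], uint8_t *yuv, size_t n)
{
  if (is_identity (m))
    return;
  for (size_t i = 0; i < n; i++)
    apply_matrix (m, &yuv[3 * i], &yuv[3 * i + 1], &yuv[3 * i + 2]);
}
```

Whether skipping the multiplication pays off in practice depends on where the time actually goes, for example on memory traffic rather than arithmetic.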
Comment 7 Sebastian Dröge (slomo) 2014-08-14 07:19:18 UTC
That's something that happens in all the YUV functions btw
Comment 8 Sebastian Dröge (slomo) 2014-08-14 09:10:16 UTC
Created attachment 283371
videobox: Don't do matrix multiplication unless required
Comment 9 Sebastian Dröge (slomo) 2014-08-14 09:11:36 UTC
I tried the attached approach to get rid of the matrix multiplication, assuming that the compiler would optimize away the branches in the new macros. Benchmarking showed that it makes almost no difference.

Any further ideas?
Comment 10 Stirling Westrup 2014-08-19 20:29:06 UTC
I don't know if it will help, but my boss has authorized me to put a $500 (Canadian) bug bounty on this bug, for anyone who can get this working on 4K video (3840x2160) at 30 fps by midnight on August 28, Calgary time (MST = UTC-7).

We'll do the testing on our stock Haswell i7 server. 

If you're interested in attempting the bounty, please contact us first, as we'd like to keep track of who is interested in working on this. Send emails to my work address: stirling@userful.com

Thanks!
Comment 11 Wim Taymans 2015-03-05 09:48:19 UTC
If you apply the patch of Bug 737401 it will use video-converter and be slightly faster than the old videobox.
Comment 12 Stirling Westrup 2015-03-05 19:14:04 UTC
We've abandoned videobox and have written a one-pass video cropper, scaler, rotator and color space converter. We'll probably be releasing it soon, once the bugs are out.
Comment 13 Tim-Philipp Müller 2015-03-05 19:23:06 UTC
> We have written a one-pass video cropper,
> scaler, rotator and color space converter.

Ah, so basically like the new video converter API in -base (plus rotation though).
Comment 14 Stirling Westrup 2015-03-05 23:31:20 UTC
(In reply to Tim-Philipp Müller from comment #13)
> > We have written a one-pass video cropper,
> > scaler, rotator and color space converter.
> 
> Ah, so basically like the new video converter API in -base (plus rotation
> though).

I guess, although our algorithm is a heck of a lot simpler and involves no matrices. We just have an AVX2-enabled loop in which we, for each output frame pixel, read an input frame pixel, convert it to the output colorspace, and write it out.

We have one such loop for each pair of colorspaces we support.
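The general shape of such a one-pass loop might look like the following scalar C sketch. It is not the code described in this comment: it has no AVX2 and no rotation, the `Rect` type and function names are hypothetical, and it handles just one assumed format pair (packed 8-bit RGB to 8-bit grayscale, using integer weights that approximate BT.601 luma) with nearest-neighbour sampling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical crop rectangle in source pixel coordinates. */
typedef struct { int x, y, w, h; } Rect;

/* Approximate BT.601 luma from 8-bit RGB; the weights sum to 256. */
static uint8_t
rgb_to_luma (uint8_t r, uint8_t g, uint8_t b)
{
  return (uint8_t) ((77 * r + 150 * g + 29 * b) >> 8);
}

/* One pass over the output: for each output pixel, pick the nearest source
 * pixel inside the crop rectangle, convert it, and write it out. */
static void
crop_scale_convert (const uint8_t *src, size_t src_stride, Rect crop,
    uint8_t *dst, size_t dst_stride, int dst_w, int dst_h)
{
  assert (crop.w > 0 && crop.h > 0 && dst_w > 0 && dst_h > 0);

  for (int oy = 0; oy < dst_h; oy++) {
    int sy = crop.y + (int) ((int64_t) oy * crop.h / dst_h);
    const uint8_t *row = src + (size_t) sy * src_stride;
    uint8_t *out = dst + (size_t) oy * dst_stride;

    for (int ox = 0; ox < dst_w; ox++) {
      int sx = crop.x + (int) ((int64_t) ox * crop.w / dst_w);
      const uint8_t *p = row + (size_t) sx * 3;   /* 3 bytes per RGB pixel */
      out[ox] = rgb_to_luma (p[0], p[1], p[2]);
    }
  }
}
```

Because cropping, scaling and conversion happen in the same loop, each output pixel is written exactly once and no intermediate frame is allocated.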
Comment 15 Sebastian Dröge (slomo) 2015-04-13 07:03:24 UTC
Should we close this bug then?
Comment 16 Stirling Westrup 2015-04-13 18:02:54 UTC
As far as I know, the bug is still valid, but it no longer matters to me. If you feel it's worth pursuing as a bug, keep it open. Otherwise, close it.
Comment 17 Tapas Kumar Kundu 2018-01-29 15:50:49 UTC
(In reply to Stirling Westrup from comment #12)
> We've abandoned video box and have written a one-pass video cropper, scaler,
> rotator and color space converter. We'll probably be releasing it soon, once
> the bugs are out.

Could you please share this code with me? I really need this, and I am facing the same issue.
Comment 18 GStreamer system administrator 2018-11-03 14:53:51 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gst-plugins-good/issues/126.