Bug 759742 - nvenc: very high CPU load when multiple nvenc instances are initialised at the same time
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gst-plugins-bad
Version: git master
Hardware: Other Linux
Importance: Normal normal
Target Milestone: 1.7.1
Assigned To: GStreamer Maintainers
QA Contact: GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2015-12-21 14:30 UTC by Matthew Marsh
Modified: 2015-12-22 13:51 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
[nvenc] fix multiple elements in same pipeline causing high CPU use on initialization (3.07 KB, patch)
2015-12-21 14:53 UTC, Matthew Marsh
Status: none
[nvenc] fix multiple elements in same pipeline causing high CPU use on initialization (2.19 KB, patch)
2015-12-22 09:20 UTC, Matthew Marsh
Status: committed

Description Matthew Marsh 2015-12-21 14:30:36 UTC
If multiple nvenc encoding elements are used in the same pipeline, CPU use is very high during initialisation.

This can be reproduced from the command line with two pipelines, one using software encoding and the other hardware encoding. On the machine I tested, the hardware version incorrectly uses dramatically more CPU than the software version for several minutes before the load drops.

The input video is very small (50x50 px). With software encoding (the x264enc element), CPU use is low: around 20% of one core.

If I simply swap each x264enc element for an nvh264enc element, CPU use maxes out at 800% (this is an 8-core machine) for several minutes before suddenly dropping to a much more reasonable 8%.


Software Encoding Command:

LD_LIBRARY_PATH=/usr/local/cuda-7.0/targets/x86_64-linux/lib/:/usr/local/lib/ GST_PLUGIN_PATH=/usr/local/lib/gstreamer-1.5 gst-launch-1.5 videotestsrc is-live=true ! video/x-raw,width=50,height=50,framerate=15/1 ! tee name=tee1 tee1. ! queue ! x264enc tune=4 bitrate=200  ! fakesink tee1. ! queue ! x264enc tune=4 bitrate=200  ! fakesink tee1. ! queue ! x264enc tune=4 bitrate=200  ! fakesink tee1. ! queue ! x264enc tune=4 bitrate=200  ! fakesink tee1. ! queue ! x264enc tune=4 bitrate=200  ! fakesink tee1. ! queue ! x264enc tune=4 bitrate=200  ! fakesink tee1. ! queue ! x264enc tune=4 bitrate=200  ! fakesink tee1. ! queue ! x264enc tune=4 bitrate=200  ! fakesink tee1. ! queue ! x264enc tune=4 bitrate=200  ! fakesink tee1. ! queue ! x264enc tune=4 bitrate=200  ! fakesink tee1. ! queue ! x264enc tune=4 bitrate=200  ! fakesink tee1. ! queue ! x264enc tune=4 bitrate=200  ! fakesink tee1. ! queue ! x264enc tune=4 bitrate=200  ! fakesink tee1. ! queue ! x264enc tune=4 bitrate=200  ! fakesink tee1. ! queue ! x264enc tune=4 bitrate=200  ! fakesink tee1. ! queue ! x264enc tune=4 bitrate=200  ! fakesink tee1. ! queue ! x264enc tune=4 bitrate=200  ! fakesink tee1. ! queue ! x264enc tune=4 bitrate=200  ! fakesink tee1. ! queue ! x264enc tune=4 bitrate=200  ! fakesink tee1. ! queue ! x264enc tune=4 bitrate=200  ! fakesink 

Hardware Encoding Command:

LD_LIBRARY_PATH=/usr/local/cuda-7.0/targets/x86_64-linux/lib/:/usr/local/lib/ GST_PLUGIN_PATH=/usr/local/lib/gstreamer-1.5 gst-launch-1.5 videotestsrc is-live=true ! video/x-raw,width=50,height=50,framerate=15/1 ! tee name=tee1 tee1. ! queue ! nvh264enc  ! fakesink tee1. ! queue ! nvh264enc  ! fakesink tee1. ! queue ! nvh264enc  ! fakesink tee1. ! queue ! nvh264enc  ! fakesink tee1. ! queue ! nvh264enc  ! fakesink tee1. ! queue ! nvh264enc  ! fakesink tee1. ! queue ! nvh264enc  ! fakesink tee1. ! queue ! nvh264enc  ! fakesink tee1. ! queue ! nvh264enc  ! fakesink tee1. ! queue ! nvh264enc  ! fakesink tee1. ! queue ! nvh264enc  ! fakesink tee1. ! queue ! nvh264enc  ! fakesink tee1. ! queue ! nvh264enc  ! fakesink tee1. ! queue ! nvh264enc  ! fakesink tee1. ! queue ! nvh264enc  ! fakesink tee1. ! queue ! nvh264enc  ! fakesink tee1. ! queue ! nvh264enc  ! fakesink tee1. ! queue ! nvh264enc  ! fakesink tee1. ! queue ! nvh264enc  ! fakesink tee1. ! queue ! nvh264enc  ! fakesink
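Because the two commands differ only in the encoder used on each of the 20 tee branches, the pipeline description can also be generated instead of typed out. The following is a minimal C sketch under that assumption; build_repro_pipeline is a hypothetical helper, not part of GStreamer, and it emits single spaces around each "!", which gst-launch parses the same way as the double spaces above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build a pipeline description like the ones in this report: one live
 * videotestsrc feeding a tee, with `branches` copies of
 * "queue ! <encoder> ! fakesink" attached to it.
 * Returns a heap-allocated string the caller must free(), or NULL on
 * failure. (Hypothetical helper for reproducing the bug.) */
char *build_repro_pipeline(const char *encoder, int branches)
{
    const char *head =
        "videotestsrc is-live=true ! "
        "video/x-raw,width=50,height=50,framerate=15/1 ! tee name=tee1";
    const char *fmt = " tee1. ! queue ! %s ! fakesink";

    assert(encoder != NULL && branches > 0);

    /* Length of one branch, measured without writing anything. */
    int branch_len = snprintf(NULL, 0, fmt, encoder);
    if (branch_len < 0)
        return NULL;

    size_t total = strlen(head) + (size_t)branch_len * (size_t)branches + 1;
    char *out = malloc(total);
    if (out == NULL)
        return NULL;

    /* Append the head, then each branch; sprintf returns the number of
     * characters written, so p always points at the terminating NUL. */
    char *p = out;
    p += sprintf(p, "%s", head);
    for (int i = 0; i < branches; i++)
        p += sprintf(p, fmt, encoder);
    return out;
}
```

For example, build_repro_pipeline("nvh264enc", 20) yields the hardware case and build_repro_pipeline("x264enc tune=4 bitrate=200", 20) the software case; the string can be pasted after gst-launch-1.5 or handed to gst_parse_launch() in a small test program.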
Comment 1 Matthew Marsh 2015-12-21 14:53:41 UTC
Created attachment 317743
[nvenc] fix multiple elements in same pipeline causing high CPU use on initialization
Comment 2 Sebastian Dröge (slomo) 2015-12-21 15:20:34 UTC
Review of attachment 317743:

Thanks for the patch, generally looks good :) Just some minor things

::: sys/nvenc/gstnvbaseenc.c
@@ +876,3 @@
+  /* This lock is needed to prevent the situation where multiple encoder elements
+   * in the same pipeline cause the CPU to spin on initialisation of the elements. */
+  G_LOCK(initialization_lock);

Move this below the variable declarations.

Is there a smaller part of the function that could be protected with the mutex or is it really required to protect everything?
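The important property of this lock is that it is static: every nvenc element in the process shares it, so one element's initialisation is serialized against all the others, which a per-instance lock would not achieve. The patch uses GLib's G_LOCK/G_UNLOCK macros (the lock itself is presumably declared with G_LOCK_DEFINE_STATIC). The sketch below swaps in a plain POSIX mutex so it builds without GLib, and takes the lock after the variable declarations as suggested above. fake_set_format, encoder_thread and run_encoders are hypothetical stand-ins, not GStreamer code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* One lock for the whole process, shared by every encoder instance.
 * Stands in for the GLib static lock used in the patch. */
static pthread_mutex_t initialization_lock = PTHREAD_MUTEX_INITIALIZER;

/* Demo instrumentation: callers currently inside the critical section,
 * and the highest number ever observed at once. */
static atomic_int inside = 0;
static atomic_int max_inside = 0;

/* Hypothetical stand-in for one element's set_format() path. */
static int fake_set_format(void)
{
    int now;   /* declarations first ... */
    int prev;

    pthread_mutex_lock(&initialization_lock);   /* ... then take the lock */

    now = atomic_fetch_add(&inside, 1) + 1;
    prev = atomic_load(&max_inside);
    while (now > prev && !atomic_compare_exchange_weak(&max_inside, &prev, now))
        ;   /* retry until max_inside >= now */

    /* ... driver calls that misbehave when run concurrently go here ... */

    atomic_fetch_sub(&inside, 1);
    pthread_mutex_unlock(&initialization_lock);
    return 1;
}

/* Each thread plays the role of one encoder element initialising. */
static void *encoder_thread(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++)
        fake_set_format();
    return NULL;
}

/* Start n "encoders" at once and return the peak concurrency seen
 * inside the locked region. */
int run_encoders(int n)
{
    pthread_t threads[64];

    assert(n > 0 && n <= 64);
    atomic_store(&max_inside, 0);
    for (int i = 0; i < n; i++)
        pthread_create(&threads[i], NULL, encoder_thread, NULL);
    for (int i = 0; i < n; i++)
        pthread_join(threads[i], NULL);
    return atomic_load(&max_inside);
}
```

With the shared lock, run_encoders(20) reports a peak of 1 no matter how many "encoders" start together; without it, the peak would typically climb above 1.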

@@ +1073,3 @@
     GST_ERROR_OBJECT (nvenc, "Subclass failed to set output caps");
     /* FIXME: clean up */
+    G_UNLOCK(initialization_lock);

Instead of all these unlocks followed by "return FALSE", we could make this a "goto error" and define the error case below the "return TRUE" at the bottom of the function.

That way there is less risk of someone adding new code that forgets to unlock.
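A minimal sketch of the suggested single-exit pattern, again with a POSIX mutex standing in for G_LOCK and stdbool in place of gboolean/TRUE/FALSE. The steps are hypothetical placeholders; the point is that every failure is a "goto error", so the unlock appears once on the success path and once on the error path regardless of how many steps are added later.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t initialization_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical initialisation step: succeeds unless it is the step
 * selected by fail_at (0 means "no step fails"). */
static bool run_step(int id, int fail_at)
{
    return id != fail_at;
}

/* Hypothetical stand-in for the locked part of set_format(). */
bool set_format_sketch(int fail_at)
{
    assert(fail_at >= 0);

    pthread_mutex_lock(&initialization_lock);

    if (!run_step(1, fail_at))
        goto error;
    if (!run_step(2, fail_at))
        goto error;
    if (!run_step(3, fail_at))   /* e.g. "Subclass failed to set output caps" */
        goto error;

    pthread_mutex_unlock(&initialization_lock);
    return true;

error:
    /* Single place for failure cleanup: because the unlock lives here,
     * a newly added "goto error" cannot leave the lock held. */
    pthread_mutex_unlock(&initialization_lock);
    return false;
}
```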
Comment 3 Matthew Marsh 2015-12-22 09:20:01 UTC
Created attachment 317776
[nvenc] fix multiple elements in same pipeline causing high CPU use on initialization
Comment 4 Matthew Marsh 2015-12-22 09:25:48 UTC
After some testing, it appears that the only calls that need to be protected by the static lock to prevent the CPU spike are NvEncCreateBitstreamBuffer and something inside the initialize_encoder method.

I also read through the NVENC documentation, but it doesn't seem to mention any need for this, so I'm not sure why the fix works. For now I'm just happy that it does.
Comment 5 Tim-Philipp Müller 2015-12-22 13:51:01 UTC
Thanks, I've moved the comment about the lock up to where it's defined, but otherwise pushed as-is.

commit 0e34c02dd60f8d301a1b4c1bbcd3c90ee05bdb0a
Author: Matthew Marsh <matt@stonethree.com>
Date:   Tue Dec 22 11:10:31 2015 +0200

    nvenc: fix high CPU use on initialization of multiple encoders at the same time
    
    We need a static lock to protect various NVENC methods in _set_format(). Without
    this the CPU use increases dramatically on initialisation of the element when
    there are multiple elements being initialised at the same time.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=759742