GNOME Bugzilla – Bug 759742
nvenc: very high CPU load when multiple nvenc instances are initialised at the same time
Last modified: 2015-12-22 13:51:31 UTC
If multiple nvenc encoding elements are used in the same pipeline, CPU use is very high on initialisation. This can be tested with two pipelines from the command line, one using software encoding and the other hardware encoding. On the machine I tested on, the hardware version uses dramatically more CPU than the software version for several minutes before it drops down.

The input video is very small (50px x 50px). With software encoding (the x264enc element), CPU use is low, around 20% of one core. If I simply swap the x264enc element for an nvh264enc element, CPU use maxes out at 800% (this is an 8-core machine) for several minutes before suddenly dropping to a much more expected 8%.

Software Encoding Command:

LD_LIBRARY_PATH=/usr/local/cuda-7.0/targets/x86_64-linux/lib/:/usr/local/lib/ \
GST_PLUGIN_PATH=/usr/local/lib/gstreamer-1.5 \
gst-launch-1.5 videotestsrc is-live=true ! video/x-raw,width=50,height=50,framerate=15/1 ! tee name=tee1 \
  tee1. ! queue ! x264enc tune=4 bitrate=200 ! fakesink \
  tee1. ! queue ! x264enc tune=4 bitrate=200 ! fakesink \
  tee1. ! queue ! x264enc tune=4 bitrate=200 ! fakesink \
  tee1. ! queue ! x264enc tune=4 bitrate=200 ! fakesink \
  tee1. ! queue ! x264enc tune=4 bitrate=200 ! fakesink \
  tee1. ! queue ! x264enc tune=4 bitrate=200 ! fakesink \
  tee1. ! queue ! x264enc tune=4 bitrate=200 ! fakesink \
  tee1. ! queue ! x264enc tune=4 bitrate=200 ! fakesink \
  tee1. ! queue ! x264enc tune=4 bitrate=200 ! fakesink \
  tee1. ! queue ! x264enc tune=4 bitrate=200 ! fakesink \
  tee1. ! queue ! x264enc tune=4 bitrate=200 ! fakesink \
  tee1. ! queue ! x264enc tune=4 bitrate=200 ! fakesink \
  tee1. ! queue ! x264enc tune=4 bitrate=200 ! fakesink \
  tee1. ! queue ! x264enc tune=4 bitrate=200 ! fakesink \
  tee1. ! queue ! x264enc tune=4 bitrate=200 ! fakesink \
  tee1. ! queue ! x264enc tune=4 bitrate=200 ! fakesink \
  tee1. ! queue ! x264enc tune=4 bitrate=200 ! fakesink \
  tee1. ! queue ! x264enc tune=4 bitrate=200 ! fakesink \
  tee1. ! queue ! x264enc tune=4 bitrate=200 ! fakesink \
  tee1. ! queue ! x264enc tune=4 bitrate=200 ! fakesink

Hardware Encoding Command:

LD_LIBRARY_PATH=/usr/local/cuda-7.0/targets/x86_64-linux/lib/:/usr/local/lib/ \
GST_PLUGIN_PATH=/usr/local/lib/gstreamer-1.5 \
gst-launch-1.5 videotestsrc is-live=true ! video/x-raw,width=50,height=50,framerate=15/1 ! tee name=tee1 \
  tee1. ! queue ! nvh264enc ! fakesink \
  tee1. ! queue ! nvh264enc ! fakesink \
  tee1. ! queue ! nvh264enc ! fakesink \
  tee1. ! queue ! nvh264enc ! fakesink \
  tee1. ! queue ! nvh264enc ! fakesink \
  tee1. ! queue ! nvh264enc ! fakesink \
  tee1. ! queue ! nvh264enc ! fakesink \
  tee1. ! queue ! nvh264enc ! fakesink \
  tee1. ! queue ! nvh264enc ! fakesink \
  tee1. ! queue ! nvh264enc ! fakesink \
  tee1. ! queue ! nvh264enc ! fakesink \
  tee1. ! queue ! nvh264enc ! fakesink \
  tee1. ! queue ! nvh264enc ! fakesink \
  tee1. ! queue ! nvh264enc ! fakesink \
  tee1. ! queue ! nvh264enc ! fakesink \
  tee1. ! queue ! nvh264enc ! fakesink \
  tee1. ! queue ! nvh264enc ! fakesink \
  tee1. ! queue ! nvh264enc ! fakesink \
  tee1. ! queue ! nvh264enc ! fakesink \
  tee1. ! queue ! nvh264enc ! fakesink
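The two commands differ only in the encoder placed on each tee branch, so the pipeline description can be generated instead of typed out. A minimal sketch in C, assuming a hypothetical helper build_pipeline() (not a GStreamer function) that writes the description for a given encoder string and a chosen number of branches:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not a GStreamer API: writes a gst-launch style
 * pipeline description with `branches` tee branches, each feeding
 * `encoder` into a fakesink.
 * Returns 0 on success, -1 if `out` is too small. */
static int
build_pipeline (char *out, size_t size, const char *encoder, int branches)
{
  size_t used;
  int n, i;

  /* Common source part, taken from the commands in the report. */
  n = snprintf (out, size,
      "videotestsrc is-live=true ! "
      "video/x-raw,width=50,height=50,framerate=15/1 ! tee name=tee1");
  if (n < 0 || (size_t) n >= size)
    return -1;
  used = (size_t) n;

  /* Append one "tee1. ! queue ! <encoder> ! fakesink" branch per iteration. */
  for (i = 0; i < branches; i++) {
    n = snprintf (out + used, size - used,
        " tee1. ! queue ! %s ! fakesink", encoder);
    if (n < 0 || (size_t) n >= size - used)
      return -1;
    used += (size_t) n;
  }
  return 0;
}
```

Calling it once with "x264enc tune=4 bitrate=200" and once with "nvh264enc" gives the two descriptions to compare. The resulting string could be passed to gst-launch-1.5 on the command line or, in a test program, to gst_parse_launch().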
Created attachment 317743
[nvenc] fix multiple elements in same pipeline causing high CPU use on initialization
Review of attachment 317743:

Thanks for the patch, generally looks good :) Just some minor things

::: sys/nvenc/gstnvbaseenc.c
@@ +876,3 @@
+  /* This lock is needed to prevent the situation where multiple encoder elements
+   * in the same pipeline cause the CPU to spin on initalisation of the elements. */
+  G_LOCK(initialization_lock);

Move this below the variable declarations. Is there a smaller part of the function that could be protected with the mutex, or is it really required to protect everything?

@@ +1073,3 @@
     GST_ERROR_OBJECT (nvenc, "Subclass failed to set output caps");
     /* FIXME: clean up */
+    G_UNLOCK(initialization_lock);

Instead of all these unlocks followed by return FALSE, we could also make this a "goto error" and define the error case below the "return TRUE" case at the bottom of the function. Less risk of someone adding new code that forgets to unlock.
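The "goto error" suggestion can be sketched as follows. This is only an illustration of the pattern, not the actual gstnvbaseenc.c code: it assumes a POSIX mutex in place of GLib's G_LOCK so that it compiles on its own, and set_format(), step_one() and step_two() are hypothetical stand-ins.

```c
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for the static lock in the patch (which uses G_LOCK). */
static pthread_mutex_t initialization_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical steps that may fail; each returns true on success. */
static bool step_one (int input) { return input >= 0; }
static bool step_two (int input) { return input < 100; }

/* Hypothetical set_format(): every failure path jumps to `error`,
 * so there is one unlock for success and one shared unlock for failure. */
static bool
set_format (int input)
{
  pthread_mutex_lock (&initialization_lock);

  if (!step_one (input))
    goto error;
  if (!step_two (input))
    goto error;

  pthread_mutex_unlock (&initialization_lock);
  return true;

error:
  pthread_mutex_unlock (&initialization_lock);
  return false;
}
```

A failure check added later only needs a "goto error"; it cannot leave the function with the lock still held.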
Created attachment 317776
[nvenc] fix multiple elements in same pipeline causing high CPU use on initialization
After some testing, it appears that the only methods that need to be statically locked to prevent the CPU spike are NvEncCreateBitstreamBuffer and something in the initialize_encoder method. I read through the NVENC docs as well, but they don't seem to mention any need to do this, so I'm not sure why the fix works... For now I am just happy that it does.
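The shape of that narrower lock can be sketched as below. This is a hedged illustration only: it does not call NVENC and does not reproduce the CPU spike. fake_create_bitstream_buffer() is a hypothetical stand-in for a driver call such as NvEncCreateBitstreamBuffer, and a POSIX mutex is assumed in place of G_LOCK. Each encoder initialises on its own thread, and only the stand-in call is serialised by one lock shared between all instances.

```c
#include <pthread.h>

#define NUM_ENCODERS 8

/* One process-wide lock shared by every encoder instance. */
static pthread_mutex_t initialization_lock = PTHREAD_MUTEX_INITIALIZER;

/* Bookkeeping, only touched while the lock is held. */
static int in_driver_call = 0;      /* threads currently inside the call */
static int max_in_driver_call = 0;  /* highest value ever observed */
static int buffers_created = 0;

/* Hypothetical stand-in for a driver call such as NvEncCreateBitstreamBuffer. */
static void
fake_create_bitstream_buffer (void)
{
  in_driver_call++;
  if (in_driver_call > max_in_driver_call)
    max_in_driver_call = in_driver_call;
  buffers_created++;
  in_driver_call--;
}

/* Per-encoder initialisation: only the driver call is serialised. */
static void *
encoder_init (void *unused)
{
  (void) unused;
  /* ... per-instance setup that needs no lock would go here ... */
  pthread_mutex_lock (&initialization_lock);
  fake_create_bitstream_buffer ();
  pthread_mutex_unlock (&initialization_lock);
  return NULL;
}

/* Initialise all encoders concurrently; returns 0 on success. */
static int
init_all_encoders (void)
{
  pthread_t threads[NUM_ENCODERS];
  int i, started = 0, rc = 0;

  for (i = 0; i < NUM_ENCODERS; i++) {
    if (pthread_create (&threads[i], NULL, encoder_init, NULL) != 0) {
      rc = -1;
      break;
    }
    started++;
  }
  for (i = 0; i < started; i++)
    pthread_join (threads[i], NULL);
  return rc;
}
```

Because the lock is static rather than per-instance, no two encoders are ever inside the guarded call at the same time, while the rest of each instance's setup still runs in parallel.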
Thanks, I've moved the comment about the lock up to where it's defined, but otherwise pushed as-is.

commit 0e34c02dd60f8d301a1b4c1bbcd3c90ee05bdb0a
Author: Matthew Marsh <matt@stonethree.com>
Date:   Tue Dec 22 11:10:31 2015 +0200

    nvenc: fix high CPU use on initialization of multiple encoders at the same time

    We need a static lock to protect various NVENC methods in _set_format(). Without this the CPU use increases dramatically on initialisation of the element when there are multiple elements being initialised at the same time.

    https://bugzilla.gnome.org/show_bug.cgi?id=759742