Bug 785092 - 20xH264 Video Decode Render-less CPU Usage spike up to ~70% with flag sync=false
Status: RESOLVED OBSOLETE
Product: GStreamer
Classification: Platform
Component: gstreamer-vaapi
Version: unspecified
Hardware: Other Linux
Importance: Low enhancement
Target Milestone: git master
Assigned To: GStreamer Maintainers
QA Contact: GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2017-07-19 01:59 UTC by Peng.Chen
Modified: 2018-11-03 15:50 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
a patch to fix this issue (1.21 KB, patch)
2017-07-19 02:49 UTC, Peng.Chen

Description Peng.Chen 2017-07-19 01:59:46 UTC
1. Create a simple bash script that launches 20 instances of the following GStreamer command: gst-launch-1.0 -v filesrc location=/videos/1920x1080_10mbps_30fps.mp4 ! qtdemux ! vaapidecode ! fpsdisplaysink video-sink=fakesink text-overlay=false sync=false
2. Observe that CPU usage spikes up to 70%.
3. The customer expects CPU utilization not to be this high with sync=false. Further debugging is needed to find the root cause.
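The reproduction steps above could be scripted roughly as follows. This is only a sketch: the report does not include the original bash script, so the helper names `build_command` and `launch_all` are hypothetical, and the sketch assumes the 20 pipelines run concurrently. Only the pipeline string itself comes from the report.

```python
import shlex
import subprocess

# Pipeline copied from the report; the file path is the reporter's own.
PIPELINE = (
    "gst-launch-1.0 -v filesrc location=/videos/1920x1080_10mbps_30fps.mp4 "
    "! qtdemux ! vaapidecode "
    "! fpsdisplaysink video-sink=fakesink text-overlay=false sync=false"
)


def build_command(pipeline=PIPELINE):
    """Split the pipeline string into an argv list for subprocess."""
    return shlex.split(pipeline)


def launch_all(count=20):
    """Start `count` pipelines at once (assumed concurrent) and wait for all."""
    cmd = build_command()
    procs = [subprocess.Popen(cmd) for _ in range(count)]
    return [proc.wait() for proc in procs]
```

Calling `launch_all()` on a machine with gst-launch-1.0 and the test clip installed starts the 20 decoders; CPU usage can then be watched with a tool such as top.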
Comment 1 Matthew Waters (ystreet00) 2017-07-19 02:28:43 UTC
Setting sync=false on sink elements disables any synchronization that would rate limit the processing.  Therefore that command is decoding frames as fast as is possible on the hardware and is the cause of the higher CPU usage.  As a result, this is all behaving as expected.
Comment 2 Peng.Chen 2017-07-19 02:47:40 UTC
Yes, you are right. I have a patch that improves this multi-channel decoding use case and fixes this issue. I would appreciate it if you could review it. In my test, CPU usage drops to 10% with nearly the same FPS for each channel. For 1 channel, however, FPS drops from 900+ to 500+, and CPU usage drops from 14% to 7% on my platform.
Comment 3 Peng.Chen 2017-07-19 02:49:23 UTC
Created attachment 355910
a patch to fix this issue
Comment 4 Víctor Manuel Jáquez Leal 2017-07-19 10:19:46 UTC
This is performance tuning for a very specific use case (20x decoding pipelines) that, if I understand correctly, degrades the "normal" use case (single pipeline decoding).

It would be great to look for another approach without the penalty for the most common use case.
Comment 5 Nicolas Dufresne (ndufresne) 2017-07-19 12:47:23 UTC
If I understand well, this patch makes the offline processing significantly slower. Why don't you configure your process with lower priority instead?
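One way to try the lower-priority suggestion is to start each pipeline under the POSIX `nice` utility. The sketch below is an assumption about how that might look, not something taken from the thread; the helper name `with_low_priority` is hypothetical.

```python
def with_low_priority(cmd, niceness=19):
    """Return `cmd` prefixed so that it starts under the POSIX `nice` utility.

    A niceness of 19 is the lowest scheduling priority on Linux.
    """
    return ["nice", "-n", str(niceness), *cmd]


# Example use (not executed here):
#   import subprocess
#   subprocess.run(with_low_priority(["gst-launch-1.0", ...]))
```

Note that niceness only changes how the scheduler arbitrates between competing processes. On an otherwise idle machine the decoders may still show the same CPU percentage; they simply yield sooner when other work arrives.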
Comment 6 Nicolas Dufresne (ndufresne) 2017-07-19 12:49:09 UTC
Also note that in theory, fakesink will result in a frame copy (when this gets fixed), so this will be a bad perf test.
Comment 7 sreerenj 2017-07-19 19:23:42 UTC
This bug has been reported internally and we did some investigations too.

Without Peng's patch, the kernel does implicit syncing. Peng's patch adds explicit syncing. Ideally, both should behave similarly, but there seem to be some differences in the kernel:

Let me copy & paste Peng's comment on this:

"The root cause of this issue is that the Linux kernel i915 driver spins rather than sleeps while waiting for in-use GEM buffer objects to become idle.
If explicit sync/wait/map() calls are added during decoding, the spin is replaced with a sleep, which saves CPU usage."
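The difference between spinning and sleeping can be illustrated in user space. The sketch below is only an analogy, not i915 driver code: a timer marks a stand-in "buffer" as idle after a short delay, and two hypothetical waiters (`wait_spinning`, `wait_sleeping`) measure how much CPU time the process burns while waiting.

```python
import threading
import time


def wait_spinning(done):
    """Poll the flag in a tight loop; the waiter stays on the CPU."""
    start = time.process_time()
    while not done.is_set():
        pass
    return time.process_time() - start


def wait_sleeping(done):
    """Block until the flag is set; the waiter is descheduled meanwhile."""
    start = time.process_time()
    done.wait()
    return time.process_time() - start


def measure(waiter, delay=0.2):
    """Run `waiter` against an event that a timer sets after `delay` seconds."""
    done = threading.Event()
    threading.Timer(delay, done.set).start()
    return waiter(done)


if __name__ == "__main__":
    print(f"spin  CPU seconds: {measure(wait_spinning):.3f}")
    print(f"sleep CPU seconds: {measure(wait_sleeping):.3f}")
```

Both waiters finish after roughly the same wall-clock time, but the spinning one accumulates CPU time for the whole wait, which matches the kind of overhead the quoted comment attributes to the driver.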
Comment 8 Peng.Chen 2017-07-20 00:58:13 UTC
Thanks for all your comments. 20x decoding is mainly used for transcoding or video wall applications. I think this patch will help a lot for those use cases. For 1 channel, the most common use case should be a video player. About 60 fps should be enough for most players, so this patch doesn't have much effect on the player even if it lowers decoding FPS.
Comment 9 Nicolas Dufresne (ndufresne) 2017-07-20 03:04:24 UTC
Still, from your report "But for 1 channel, fps drop from 900+ to 500+", that means that if you are transcoding 1 stream in your use case, your application will be 44% slower. That's a massive cut. If this is a kernel bug, why are you looking for a solution in GStreamer? You should find a solution on the driver side.
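The 44% figure follows from the single-channel numbers in Comment 2. A quick check, treating the reported "900+" and "500+" as exactly 900 and 500 (an assumption, since the report gives only lower bounds); `relative_drop` is a hypothetical helper:

```python
def relative_drop(before, after):
    """Fraction by which a value fell, relative to its starting point."""
    return (before - after) / before


fps_drop = relative_drop(900, 500)  # single-channel decode FPS
cpu_drop = relative_drop(14, 7)     # single-channel CPU usage, in percent

print(f"FPS drop: {fps_drop:.0%}, CPU drop: {cpu_drop:.0%}")
# prints "FPS drop: 44%, CPU drop: 50%"
```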
Comment 10 Peng.Chen 2017-07-20 06:05:26 UTC
For transcoding 1 channel, the bottleneck should be encoding, which can't reach the same FPS as decoding. I am assuming this is a mutex locking strategy in the kernel: mutex_spin_on_owner() occupies a lot of CPU time in this use case. mutex_spin_on_owner() just means that another CPU on the system is holding the lock, so the waiter decided to spin instead of sleep. So far, the only solution we know of is to add a sync or wait in the driver or the middleware. We need to decide where to put this sync. It can certainly be added in the driver, but then decoding will always have this sync, and the middleware can't choose to disable it for special use cases. Do you have a better idea?
Comment 11 GStreamer system administrator 2018-11-03 15:50:26 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gstreamer-vaapi/issues/59.