GNOME Bugzilla – Bug 770225
multiqueue: High CPU usage with multiple audio tracks in playbin3
Last modified: 2016-08-30 09:30:16 UTC
Dear Edward Hervey, Tim-Philipp Müller and Jan Schmidt In playbin3(decodebin3) with GST v1.9.1, CPU usage is more high compared with playbin2(decodebin2) when multiple audio track contents. I tested 1-video and 8-audio stream by launching gst-launch. Multiqueue is set as below in playbin3. => g_object_set (dbin->multiqueue, "sync-by-running-time", TRUE, "use-interleave", TRUE, "max-size-buffers", 0, NULL); In that case, CPU usage is around 210%. And most of them are related with srcpads of multiqueue. In playbin2(decodebin2), CPU usage is around 15%. And I turned off the 'sync-by-running-time' in MQ and CPU usage is similar with playbin2. I am guessing 'sync-by-running-time' mode consumes much more CPU usage than playbin2. 1) Do you have similar phenomenon? 2) Do you have a plan to reduce CPU usage when using "sync-by-running-time"? 3) May I get any opinions or guide to reduce CPU usage? Please refer https://harmony.lge.com:8443/issue/browse/CTRGST-288 or http://gstreamer-devel.966125.n4.nabble.com/High-CPU-usage-in-multiple-audio-track-in-playbin3-td4679179.html I am always welcome any questions and opinions. Thanks.
Created attachment 333974 [details] [review] multiqueue: Re-compute the high time outside of loop
Dear All I found out that to re-computing the high time after waking up in while loop causes high CPU usage. I confirmed that CPU usage is similar with playbin2(non 'sync-by-running-time' mode) after applying this patch. Please check and review this patch. I am always welcome any questions, opinions and ideas. Thanks.
Hi, While the intention of the patch is correct, it doesn't take into account the "high_time by group" but only the global high_time.
Note that compute_high_id also does also introduce a noticeable overheader in that loop. So the idea here should be to: 1) only call compute_high_{time|id} when one of the parameters involved in the calculation changed. 2) And only call that function from wherever one of the values changed (that waiting loop is *not* modifying any of the parameters involved) Readers that don't modify the values (such as that waiting loop function) would only need to check against the stored value. Values involved in both compute_high_id and compute_high_time: * single queues being added/removed * single queue nextid/oldid being updated * single queue flow return *changing* (to/from GST_FLOW_NOT_LINKED or GST_FLOW_EOS) * after pushing a GST_EVENT_EOS * after flushing Values involved in compute_high_id: * single queue nextid/oldid being updated Values involved in compute_high_time: * single queue next_time/last_time being updated The potentially tricky part is regarding the groupid feature for compute_high_time. We might to store the high_time by groupid in addition (maybe with a lookup table)
After digging deeper, moving the computation out of the loop still results in very high CPU usage (playing a 8 video/8 audio file, results in 212% cpu usage of which only 1% is video decoding). The culprit seems to be an insane amount of futex usage.
And after digging even deeper, the real culprit was ... wake_up_next_non_linked doing wrong calculations/checks. I'll provide a patch soon
Hello Edward. Thanks for your support. I am looking forward to get the patch. Thanks.
Created attachment 334122 [details] [review] multiqueue: Fix high_time wakeup logic When calculating the high_time, cache the group value in each singlequeue. This fixes the issue by which wake_up_next_non_linked() would use the global high-time to decide whether to wake-up a waiting thread, instead of the group one, resulting in those threads constantly spinning. Tidy up a bit the waiting logic while we're at it.
With this patch, we go from 212% playing a 8 audio / 8 video file down to less than 10% (most of it being the video decoding).
Hello Edward. Awesome!! It works fine in desktop version and our hardware target after applying your patch. And I confirmed that CPU usage much more was reduced. Also, changing track for audio works well. PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ 2408 hoonhee+ 20 0 1850508 524488 65316 R 12.9 1.6 333:04.86 compiz 10209 hoonhee+ 20 0 2128104 146252 26348 S 10.6 0.4 0:11.80 lt-gst-play-1.0 Thanks for your effort.
Attachment 334122 [details] pushed as 3117525 - multiqueue: Fix high_time wakeup logic