Bug 773681 – directsoundsink: High cpu usage on windows x86

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 773681 - directsoundsink: High cpu usage on windows x86


Summary:	directsoundsink: High cpu usage on windows x86


Status:	RESOLVED FIXED

Product:	GStreamer
Classification:	Platform
Component:	gst-plugins-bad
Version:	1.x
Hardware:	Other Windows

Importance:	Normal blocker
Target Milestone:	1.12.1
Assigned To:	GStreamer Maintainers
QA Contact:	GStreamer Maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2016-10-29 21:27 UTC by Jesus
Modified:	2017-05-08 15:34 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Fix high CPU usage and sleep for at least 10ms while looping (4.46 KB, patch) 2017-05-04 13:26 UTC, Nirbheek Chauhan	none	Details \| Review
Clean-up debug log messages (4.11 KB, patch) 2017-05-04 13:26 UTC, Nirbheek Chauhan	committed	Details \| Review
Fix high CPU usage and sleep for at least 10ms while looping (4.48 KB, patch) 2017-05-04 13:41 UTC, Nirbheek Chauhan	committed	Details \| Review
directsoundsink: use gst_clock_new_single_shot_id instead of Sleep (4.25 KB, patch) 2017-05-08 00:59 UTC, Dustin Spicuzza	none	Details \| Review
Updates prior patch to use gst_clock_single_shot_id_reinit (4.94 KB, patch) 2017-05-08 15:22 UTC, Dustin Spicuzza	committed	Details \| Review

Description Jesus 2016-10-29 21:27:47 UTC

I'm using gstreamer 1.8 x86 on windows 10 and it uses 100% cpu of one core.
I tested 1.9.90 too.
The application is basically a playbin so I tried this command and got the same result:

gst-launch-1.0.exe playbin uri="http://192.168.1.121:8000"

In the mailing list I was asked to provide a profiling result. this the two first lines of the test showing that the cpu is used in libgstdirectsoundsink:

Function Name Total CPU (%) Self CPU (%) Total CPU (ms) Self CPU (ms) Module
- libgstdirectsoundsink.dll!0x00000067541cfe 61.45 % 0.22 % 3953 14 libgstdirectsoundsink.dll
[...]
+ libgstdirectsoundsink.dll!0x00000067541d15 22.31 % 0.19 % 1435 12 libgstdirectsoundsink.dll

Comment 1 Sebastian Dröge (slomo) 2016-10-31 06:56:04 UTC

That must be in gst_directsound_sink_write() then. Can you check if something weird happens in that function, like it getting into this loop many times in a row?

The same was also already reported recently on the mailing list by someone else.

Comment 2 Jesus 2016-11-01 21:46:14 UTC

What should I do?
The truth I'm a .net programmer and all I have done in gstreamer is a playbin that plays, stops and changes volume, as a library and program.

Comment 3 Sebastian Dröge (slomo) 2016-11-02 07:46:18 UTC

You could check what happens in that function there, or wait until someone who has the problem does the same. It doesn't seem to happen everywhere unfortunately and it's not clear to me what happens here.


If it worked in an earlier version (did it? 1.6.0 for example?), it would be either bug #748680 or bug #734098. Can you check by copying over the libgstdirectsoundsink.dll from the 1.6.0 release, or 1.4.0 if that doesn't help yet?

Comment 4 Christoph Reiter (lazka) 2016-11-06 14:12:55 UTC

Reverting https://cgit.freedesktop.org/gstreamer/gst-plugins-good/commit/?id=51f94288ce66a0a6a28e45801f4e17eb74044298 on top of 1.8.3 fixes the problem here.

Comment 5 Nicolas Dufresne (ndufresne) 2016-11-06 14:23:40 UTC

Wow, both the code before and after seems odd. I see that DirectSound do have a mechanism to notify playback position reliably.

https://msdn.microsoft.com/en-us/library/microsoft.directx_sdk.idirectsoundnotify8.idirectsoundnotify8.setnotificationpositions(v=vs.85).aspx

We could use this function, or even better, implement a clock on top of that (async timers could be scheduled this way).

Comment 6 Sebastian Dröge (slomo) 2016-11-07 09:41:31 UTC

(In reply to Nicolas Dufresne (stormer) from comment #5)
> Wow, both the code before and after seems odd.

They do, just look at the whole of directsoundsink :) The Windows specific elements need some love in general... and for sound we should also move to wasapi.

> We could use this function, or even better, implement a clock on top of that
> (async timers could be scheduled this way).

The sleep there basically "implements" a GstClock around the system clock as apparently used by DirectSound here... in a not so nice way.


The main question here right now is what is wrong with that specific commit? Why does it cause too short sleeps?

Comment 7 Sebastian Dröge (slomo) 2016-11-08 12:36:05 UTC

Someone who can reproduce the problem would have to look at the values of all the variables in that sleep loop to understand what goes on there.

Comment 8 Sebastian Dröge (slomo) 2017-04-09 07:49:46 UTC

Is anybody looking into this? We can't fix this for 1.12 unless someone who observes the problem spends some time on finding the root cause.

Comment 9 Christoph Reiter (lazka) 2017-04-09 07:54:21 UTC

I don't. But the commit is reverted in msys2 for 5 months now: https://github.com/Alexpux/MINGW-packages/pull/1902 It would be good if it was reverted upstream as well.

Comment 10 Sebastian Dröge (slomo) 2017-04-09 08:05:59 UTC

Not really, we need to understand what the problem here is. The original commit fixes a real problem, and it seems that nobody is able to reproduce this new bug here (or otherwise someone would've been able to provide the information I asked for in comment 7).

Comment 11 Dustin Spicuzza 2017-05-01 01:23:40 UTC

I'm going to look at this briefly, maybe it's related to 781998.

Comment 12 Dustin Spicuzza 2017-05-01 02:05:55 UTC

Ok, I got the build stuff working, I changed the log message to be outside of the sleep time check, and I get LOTS of these:

0:00:01.451663549  3220    21e9b20 DEBUG        directsoundsink gstdirectsoundsink.c:647:gst_directsound_sink_write:<directsoundsink0> gst_directsound_sink_write: length:1764, FreeBufSiz: 1764, sleepTime: 0, bps: 4, rate: 44100
0:00:01.451835359  3220    21e9b20 DEBUG        directsoundsink gstdirectsoundsink.c:647:gst_directsound_sink_write:<directsoundsink0> gst_directsound_sink_write: length:1764, FreeBufSiz: 1764, sleepTime: 0, bps: 4, rate: 44100

I can definitely confirm that the CPU goes to 100%, and that the sink isn't usable in that state (my VM is very unhappy when I played the audio file).

As a sidenote, when it's eating all the CPU, then the audio becomes really difficult to get it to stutter (which was the issue reported in 781998).

Comment 13 Dustin Spicuzza 2017-05-01 02:12:22 UTC

Oh, and in case it's useful, more information during initialization:

0:00:00.113513868  3220    21e9490 INFO         directsoundsink gstdirectsoundsink.c:542:gst_directsound_sink_prepare:<directsoundsink0> GstAudioRingBufferSpec->channels: 2, GstAudioRingBufferSpec->rate: 44100, GstAudioRingBufferSpec->bytes_per_sample: 4
WAVEFORMATEX.nSamplesPerSec: 44100, WAVEFORMATEX.wBitsPerSample: 16, WAVEFORMATEX.nBlockAlign: 4, WAVEFORMATEX.nAvgBytesPerSec: 176400
Size of dsound circular buffer=>35280

Comment 14 Dustin Spicuzza 2017-05-01 04:51:56 UTC

I think I've come to the conclusion that the sink doesn't actually need the sleep at all; the sleep is masking the the fact that there's some incorrect information being computed somewhere. It seems like the play cursor is being used incorrectly, see this: https://msdn.microsoft.com/en-us/library/windows/desktop/ee418744(v=vs.85).aspx

Comment 15 Dustin Spicuzza 2017-05-01 08:20:19 UTC

... and now I've caught up to the rest of you, and I see why the sleep is necessary. :) I had thought that the gst ring buffer would only feed data as fast as it was needed, but I see that isn't the case at all.

As I mentioned in 781998.. setting GST_DEBUG="directsoundsink:5,audiobasesink:5" outputs messages that are basically, queue a few samples, and then call sink_write A LOT of times. If there's no sleep, then 100% CPU happens -- and I think the current version in master does zero sleep a lot.

I did my own little rewrite of the write function, and ended up with mostly the same result (but hey, I learned a lot). One thing I noticed is that we don't need the loop inside the write function; just return the number of bytes written (or 0) and the audiosink caller will just keep looping until the buffer is completely written. Makes the code easier to understand too.

I think the suggestion to use SetNotificationPositions is the sane way to deal with it. Allocate segment size number of notifications, all using the same event, and then wait for that event if the buffer is full.

A simpler solution is anytime a sleep is required, make sure it's always at least some N (maybe 5ms?), otherwise 100% CPU happens.

Comment 16 Nicolas Dufresne (ndufresne) 2017-05-01 15:21:26 UTC

Instead of a dummy sleep, I think scheduling writes on the clock, base on the remaining time for data to be consumed would be better.

Comment 17 Dustin Spicuzza 2017-05-01 15:33:25 UTC

Is there a good example for how one would do that?

Comment 18 Sebastian Dröge (slomo) 2017-05-01 22:36:59 UTC

(In reply to Nicolas Dufresne (stormer) from comment #16)
> Instead of a dummy sleep, I think scheduling writes on the clock, base on
> the remaining time for data to be consumed would be better.

That's exactly what the sleep() there tries to do. Doing the same thing with a GstClock is not magically solving the problem :)

(In reply to Dustin Spicuzza from comment #17)
> Is there a good example for how one would do that?

gst_clock_id_wait() on a gst_clock_new_single_shot_id() should do the job here. Different than sleep() this also allows cancelling of the waiting from another thread, which is generally a good thing to have.


I think what we have to understand here is why we often sleep 0ms, why the calculations believe that immediately data should be available while in reality some waiting has to happen first (but how much?).

Comment 19 Dustin Spicuzza 2017-05-03 06:36:05 UTC

Well, in the case of the log message I posted above, length is 1764 and dwFreeBufferSize is 1764. The first part of sleep calculation is (length - dwFreeBufferSize), which of course is zero.

If you change the `if (length >= dwFreeBufferSize)` to length > dwFreeBufferSize, the CPU usage goes down to normal. However, I think I would still recommend setting a minimum wait time (5ms?).

Comment 20 Sebastian Dröge (slomo) 2017-05-03 07:51:46 UTC

If you wait a minimum of 5ms that would a) make latency-time <= 5ms impossible and b) makes things go wrong it we must only wait less than 5ms (due to being scheduled late for example)


Can you provide a patch for the above?

Comment 21 Nirbheek Chauhan 2017-05-04 13:26:07 UTC

Created attachment 351048 [details] [review]
Fix high CPU usage and sleep for at least 10ms while looping

Trying to optimize the time we sleep for is futile in the case of directsoundsink because the sink position is updated too infrequently (usually only once per segment), so if we sleep for 5ms, we will actually end up looping twice as much as if we sleep for 10ms.

This is because the ringbuffer provided by directsound (for both source and sink) is emulated in software. See https://bugzilla.gnome.org/show_bug.cgi?id=781249 for the source case.

Comment 22 Nirbheek Chauhan 2017-05-04 13:26:37 UTC

Created attachment 351049 [details] [review]
Clean-up debug log messages

Comment 23 Nirbheek Chauhan 2017-05-04 13:31:17 UTC

(In reply to Nicolas Dufresne (stormer) from comment #5)
> Wow, both the code before and after seems odd. I see that DirectSound do
> have a mechanism to notify playback position reliably.
> 
> https://msdn.microsoft.com/en-us/library/microsoft.directx_sdk.
> idirectsoundnotify8.idirectsoundnotify8.setnotificationpositions(v=vs.85).
> aspx
> 
> We could use this function, or even better, implement a clock on top of that
> (async timers could be scheduled this way).

This API is totally useless because you can't schedule notifications during playback (they must be scheduled in stopped state), and while looping in the buffer, the events that you scheduled along the way will only be triggered exactly once.

The only real solution is to use (fix) our WASAPI elements and get rid of our directsound elements once they're in a good state.

Comment 24 Nirbheek Chauhan 2017-05-04 13:41:28 UTC

Created attachment 351051 [details] [review]
Fix high CPU usage and sleep for at least 10ms while looping

Accidentally used 1ms instead of 10ms for the minimum sleep duration.

Comment 25 Dustin Spicuzza 2017-05-04 17:33:12 UTC

I'm surprised that you can't reset the event for the notifcation API and get retriggered that way.

I will try this patch tonight, it looks pretty similar to what I have.

Comment 26 Nirbheek Chauhan 2017-05-04 17:49:33 UTC

Yes, ResetEvent() changes nothing, and letting Windows automatically reset the event on trigger also doesn't reset the event. I also can't find any code anywhere that uses looped capture with events for notifications. Audacity and VLC both poll for playback and capture.

On the other hand our WASAPI elements don't work for me at all, so someone would need to spend time on those.

Comment 27 Dustin Spicuzza 2017-05-04 17:57:31 UTC

The Chrome WASAPI stuff looks pretty good (and presumably is very battle tested), so if one were going to go that route that's where I would look for example code.

Comment 28 Dustin Spicuzza 2017-05-05 02:05:11 UTC

That patch that you posted does solve the high CPU problem.

Comment 29 Dustin Spicuzza 2017-05-06 06:10:24 UTC

Review of attachment 351051 [details] [review]:

This fixes the problem, please push it, thanks!

Comment 30 Dustin Spicuzza 2017-05-06 06:11:08 UTC

Review of attachment 351049 [details] [review]:

This seems good.

Comment 31 Dustin Spicuzza 2017-05-08 00:59:00 UTC

Created attachment 351318 [details] [review]
directsoundsink: use gst_clock_new_single_shot_id instead of Sleep

Addition to the other patch, gets rid of the Sleep call and uses a gst clock wait instead. Seems to work slightly better.

Comment 32 Sebastian Dröge (slomo) 2017-05-08 14:05:54 UTC

commit 2d44b53ff664658ba9a4578b5b3231365c7c8e71
Author: Nirbheek Chauhan <nirbheek@centricular.com>
Date:   Wed May 3 23:23:10 2017 +0530

    directsoundsink: Fix corner case causing large CPU usage
    
    We were unnecessarily looping/goto-ing repeatedly when we had exactly
    the amount of data as the free space, and also when the free space was
    too small. This, as it turns out, is a very common scenario with
    Directsound on Windows.
    
    Fixes https://bugzilla.gnome.org/show_bug.cgi?id=773681
    
    We have to do polling here because the event notification API that
    Directsound exposes cannot be used with live playback since all events
    must be registered in advance with the capture buffer, you cannot
    add/remove them once playback has begun. Directsoundsrc had the same
    problem.
    
    See also: https://bugzilla.gnome.org/show_bug.cgi?id=781249

commit 0700b1f0b0dd6b986f232c993b20929cee69ed6a
Author: Nirbheek Chauhan <nirbheek@centricular.com>
Date:   Wed May 3 23:31:00 2017 +0530

    directsoundsink: Clean up some debug logging
    
    Don't need to print the function name, gstreamer does it for you.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=773681

Comment 33 Sebastian Dröge (slomo) 2017-05-08 14:07:58 UTC

Review of attachment 351318 [details] [review]:

Generally a good change, thanks! This will make the waiting more granular, we can wait sub-milliseconds with this.

::: sys/directsound/gstdirectsoundsink.c
@@ +670,3 @@
+
+      /* don't bother with the return value as we'll detect reset separately */
+      gst_clock_id_wait (clock_id, NULL);

You probably want to check the return value. This returns UNSCHEDULED if unscheduled, making the additional reset_while_sleeping boolean useless probably

@@ +673,3 @@
+
+      GST_DSOUND_LOCK (dsoundsink);
+      dsoundsink->clock_id = NULL;

Shouldn't this also get a gst_clock_id_unref()?

@@ +725,3 @@
 
+    hRes = IDirectSoundBuffer_Unlock (dsoundsink->pDSBSecondary, pLockedBuffer1,
+        dwSizeBuffer1, pLockedBuffer2, dwSizeBuffer2);

Why do you move this around?

Comment 34 Tim-Philipp Müller 2017-05-08 14:21:34 UTC

Would it make sense to create a single shot id entry once and then just use _reinit() for scheduling things to avoid constant malloc/free churn?

Comment 35 Dustin Spicuzza 2017-05-08 15:22:27 UTC

Created attachment 351355 [details] [review]
Updates prior patch to use gst_clock_single_shot_id_reinit

Added requested changes.

Comment 36 Dustin Spicuzza 2017-05-08 15:25:50 UTC

Tim, I added that, thanks.

Sebastian, the return value from gst_clock_id_wait isn't useful in this case because a reset could occur between the time that we come back from gst_clock_id_wait and obtaining the lock again. Safer to only inspect the variable while holding the lock instead.

Oh, and I moved the unlock so that we would release it slightly sooner.

Comment 37 Sebastian Dröge (slomo) 2017-05-08 15:33:56 UTC

commit a303a7971c29db5599b34e82ef96565a1f58abee
Author: Dustin Spicuzza <dustin@virtualroadside.com>
Date:   Mon May 8 15:22:00 2017 +0000

    directsoundsink: Use GstClock API instead of Sleep() for waiting
    
    It's more accurate and allows cancellation.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=773681