GNOME Bugzilla – Bug 773681
directsoundsink: High cpu usage on windows x86
Last modified: 2017-05-08 15:34:02 UTC
I'm using gstreamer 1.8 x86 on windows 10 and it uses 100% cpu of one core. I tested 1.9.90 too. The application is basically a playbin so I tried this command and got the same result: gst-launch-1.0.exe playbin uri="http://192.168.1.121:8000" In the mailing list I was asked to provide a profiling result. this the two first lines of the test showing that the cpu is used in libgstdirectsoundsink: Function Name Total CPU (%) Self CPU (%) Total CPU (ms) Self CPU (ms) Module - libgstdirectsoundsink.dll!0x00000067541cfe 61.45 % 0.22 % 3953 14 libgstdirectsoundsink.dll [...] + libgstdirectsoundsink.dll!0x00000067541d15 22.31 % 0.19 % 1435 12 libgstdirectsoundsink.dll
That must be in gst_directsound_sink_write() then. Can you check if something weird happens in that function, like it getting into this loop many times in a row? The same was also already reported recently on the mailing list by someone else.
What should I do? The truth I'm a .net programmer and all I have done in gstreamer is a playbin that plays, stops and changes volume, as a library and program.
You could check what happens in that function there, or wait until someone who has the problem does the same. It doesn't seem to happen everywhere unfortunately and it's not clear to me what happens here. If it worked in an earlier version (did it? 1.6.0 for example?), it would be either bug #748680 or bug #734098. Can you check by copying over the libgstdirectsoundsink.dll from the 1.6.0 release, or 1.4.0 if that doesn't help yet?
Reverting https://cgit.freedesktop.org/gstreamer/gst-plugins-good/commit/?id=51f94288ce66a0a6a28e45801f4e17eb74044298 on top of 1.8.3 fixes the problem here.
Wow, both the code before and after seems odd. I see that DirectSound do have a mechanism to notify playback position reliably. https://msdn.microsoft.com/en-us/library/microsoft.directx_sdk.idirectsoundnotify8.idirectsoundnotify8.setnotificationpositions(v=vs.85).aspx We could use this function, or even better, implement a clock on top of that (async timers could be scheduled this way).
(In reply to Nicolas Dufresne (stormer) from comment #5) > Wow, both the code before and after seems odd. They do, just look at the whole of directsoundsink :) The Windows specific elements need some love in general... and for sound we should also move to wasapi. > We could use this function, or even better, implement a clock on top of that > (async timers could be scheduled this way). The sleep there basically "implements" a GstClock around the system clock as apparently used by DirectSound here... in a not so nice way. The main question here right now is what is wrong with that specific commit? Why does it cause too short sleeps?
Someone who can reproduce the problem would have to look at the values of all the variables in that sleep loop to understand what goes on there.
Is anybody looking into this? We can't fix this for 1.12 unless someone who observes the problem spends some time on finding the root cause.
I don't. But the commit is reverted in msys2 for 5 months now: https://github.com/Alexpux/MINGW-packages/pull/1902 It would be good if it was reverted upstream as well.
Not really, we need to understand what the problem here is. The original commit fixes a real problem, and it seems that nobody is able to reproduce this new bug here (or otherwise someone would've been able to provide the information I asked for in comment 7).
I'm going to look at this briefly, maybe it's related to 781998.
Ok, I got the build stuff working, I changed the log message to be outside of the sleep time check, and I get LOTS of these: 0:00:01.451663549 3220 21e9b20 DEBUG directsoundsink gstdirectsoundsink.c:647:gst_directsound_sink_write:<directsoundsink0> gst_directsound_sink_write: length:1764, FreeBufSiz: 1764, sleepTime: 0, bps: 4, rate: 44100 0:00:01.451835359 3220 21e9b20 DEBUG directsoundsink gstdirectsoundsink.c:647:gst_directsound_sink_write:<directsoundsink0> gst_directsound_sink_write: length:1764, FreeBufSiz: 1764, sleepTime: 0, bps: 4, rate: 44100 I can definitely confirm that the CPU goes to 100%, and that the sink isn't usable in that state (my VM is very unhappy when I played the audio file). As a sidenote, when it's eating all the CPU, then the audio becomes really difficult to get it to stutter (which was the issue reported in 781998).
Oh, and in case it's useful, more information during initialization: 0:00:00.113513868 3220 21e9490 INFO directsoundsink gstdirectsoundsink.c:542:gst_directsound_sink_prepare:<directsoundsink0> GstAudioRingBufferSpec->channels: 2, GstAudioRingBufferSpec->rate: 44100, GstAudioRingBufferSpec->bytes_per_sample: 4 WAVEFORMATEX.nSamplesPerSec: 44100, WAVEFORMATEX.wBitsPerSample: 16, WAVEFORMATEX.nBlockAlign: 4, WAVEFORMATEX.nAvgBytesPerSec: 176400 Size of dsound circular buffer=>35280
I think I've come to the conclusion that the sink doesn't actually need the sleep at all; the sleep is masking the the fact that there's some incorrect information being computed somewhere. It seems like the play cursor is being used incorrectly, see this: https://msdn.microsoft.com/en-us/library/windows/desktop/ee418744(v=vs.85).aspx
... and now I've caught up to the rest of you, and I see why the sleep is necessary. :) I had thought that the gst ring buffer would only feed data as fast as it was needed, but I see that isn't the case at all. As I mentioned in 781998.. setting GST_DEBUG="directsoundsink:5,audiobasesink:5" outputs messages that are basically, queue a few samples, and then call sink_write A LOT of times. If there's no sleep, then 100% CPU happens -- and I think the current version in master does zero sleep a lot. I did my own little rewrite of the write function, and ended up with mostly the same result (but hey, I learned a lot). One thing I noticed is that we don't need the loop inside the write function; just return the number of bytes written (or 0) and the audiosink caller will just keep looping until the buffer is completely written. Makes the code easier to understand too. I think the suggestion to use SetNotificationPositions is the sane way to deal with it. Allocate segment size number of notifications, all using the same event, and then wait for that event if the buffer is full. A simpler solution is anytime a sleep is required, make sure it's always at least some N (maybe 5ms?), otherwise 100% CPU happens.
Instead of a dummy sleep, I think scheduling writes on the clock, base on the remaining time for data to be consumed would be better.
Is there a good example for how one would do that?
(In reply to Nicolas Dufresne (stormer) from comment #16) > Instead of a dummy sleep, I think scheduling writes on the clock, base on > the remaining time for data to be consumed would be better. That's exactly what the sleep() there tries to do. Doing the same thing with a GstClock is not magically solving the problem :) (In reply to Dustin Spicuzza from comment #17) > Is there a good example for how one would do that? gst_clock_id_wait() on a gst_clock_new_single_shot_id() should do the job here. Different than sleep() this also allows cancelling of the waiting from another thread, which is generally a good thing to have. I think what we have to understand here is why we often sleep 0ms, why the calculations believe that immediately data should be available while in reality some waiting has to happen first (but how much?).
Well, in the case of the log message I posted above, length is 1764 and dwFreeBufferSize is 1764. The first part of sleep calculation is (length - dwFreeBufferSize), which of course is zero. If you change the `if (length >= dwFreeBufferSize)` to length > dwFreeBufferSize, the CPU usage goes down to normal. However, I think I would still recommend setting a minimum wait time (5ms?).
If you wait a minimum of 5ms that would a) make latency-time <= 5ms impossible and b) makes things go wrong it we must only wait less than 5ms (due to being scheduled late for example) Can you provide a patch for the above?
Created attachment 351048 [details] [review] Fix high CPU usage and sleep for at least 10ms while looping Trying to optimize the time we sleep for is futile in the case of directsoundsink because the sink position is updated too infrequently (usually only once per segment), so if we sleep for 5ms, we will actually end up looping twice as much as if we sleep for 10ms. This is because the ringbuffer provided by directsound (for both source and sink) is emulated in software. See https://bugzilla.gnome.org/show_bug.cgi?id=781249 for the source case.
Created attachment 351049 [details] [review] Clean-up debug log messages
(In reply to Nicolas Dufresne (stormer) from comment #5) > Wow, both the code before and after seems odd. I see that DirectSound do > have a mechanism to notify playback position reliably. > > https://msdn.microsoft.com/en-us/library/microsoft.directx_sdk. > idirectsoundnotify8.idirectsoundnotify8.setnotificationpositions(v=vs.85). > aspx > > We could use this function, or even better, implement a clock on top of that > (async timers could be scheduled this way). This API is totally useless because you can't schedule notifications during playback (they must be scheduled in stopped state), and while looping in the buffer, the events that you scheduled along the way will only be triggered exactly once. The only real solution is to use (fix) our WASAPI elements and get rid of our directsound elements once they're in a good state.
Created attachment 351051 [details] [review] Fix high CPU usage and sleep for at least 10ms while looping Accidentally used 1ms instead of 10ms for the minimum sleep duration.
I'm surprised that you can't reset the event for the notifcation API and get retriggered that way. I will try this patch tonight, it looks pretty similar to what I have.
Yes, ResetEvent() changes nothing, and letting Windows automatically reset the event on trigger also doesn't reset the event. I also can't find any code anywhere that uses looped capture with events for notifications. Audacity and VLC both poll for playback and capture. On the other hand our WASAPI elements don't work for me at all, so someone would need to spend time on those.
The Chrome WASAPI stuff looks pretty good (and presumably is very battle tested), so if one were going to go that route that's where I would look for example code.
That patch that you posted does solve the high CPU problem.
Review of attachment 351051 [details] [review]: This fixes the problem, please push it, thanks!
Review of attachment 351049 [details] [review]: This seems good.
Created attachment 351318 [details] [review] directsoundsink: use gst_clock_new_single_shot_id instead of Sleep Addition to the other patch, gets rid of the Sleep call and uses a gst clock wait instead. Seems to work slightly better.
commit 2d44b53ff664658ba9a4578b5b3231365c7c8e71 Author: Nirbheek Chauhan <nirbheek@centricular.com> Date: Wed May 3 23:23:10 2017 +0530 directsoundsink: Fix corner case causing large CPU usage We were unnecessarily looping/goto-ing repeatedly when we had exactly the amount of data as the free space, and also when the free space was too small. This, as it turns out, is a very common scenario with Directsound on Windows. Fixes https://bugzilla.gnome.org/show_bug.cgi?id=773681 We have to do polling here because the event notification API that Directsound exposes cannot be used with live playback since all events must be registered in advance with the capture buffer, you cannot add/remove them once playback has begun. Directsoundsrc had the same problem. See also: https://bugzilla.gnome.org/show_bug.cgi?id=781249 commit 0700b1f0b0dd6b986f232c993b20929cee69ed6a Author: Nirbheek Chauhan <nirbheek@centricular.com> Date: Wed May 3 23:31:00 2017 +0530 directsoundsink: Clean up some debug logging Don't need to print the function name, gstreamer does it for you. https://bugzilla.gnome.org/show_bug.cgi?id=773681
Review of attachment 351318 [details] [review]: Generally a good change, thanks! This will make the waiting more granular, we can wait sub-milliseconds with this. ::: sys/directsound/gstdirectsoundsink.c @@ +670,3 @@ + + /* don't bother with the return value as we'll detect reset separately */ + gst_clock_id_wait (clock_id, NULL); You probably want to check the return value. This returns UNSCHEDULED if unscheduled, making the additional reset_while_sleeping boolean useless probably @@ +673,3 @@ + + GST_DSOUND_LOCK (dsoundsink); + dsoundsink->clock_id = NULL; Shouldn't this also get a gst_clock_id_unref()? @@ +725,3 @@ + hRes = IDirectSoundBuffer_Unlock (dsoundsink->pDSBSecondary, pLockedBuffer1, + dwSizeBuffer1, pLockedBuffer2, dwSizeBuffer2); Why do you move this around?
Would it make sense to create a single shot id entry once and then just use _reinit() for scheduling things to avoid constant malloc/free churn?
Created attachment 351355 [details] [review] Updates prior patch to use gst_clock_single_shot_id_reinit Added requested changes.
Tim, I added that, thanks. Sebastian, the return value from gst_clock_id_wait isn't useful in this case because a reset could occur between the time that we come back from gst_clock_id_wait and obtaining the lock again. Safer to only inspect the variable while holding the lock instead. Oh, and I moved the unlock so that we would release it slightly sooner.
commit a303a7971c29db5599b34e82ef96565a1f58abee Author: Dustin Spicuzza <dustin@virtualroadside.com> Date: Mon May 8 15:22:00 2017 +0000 directsoundsink: Use GstClock API instead of Sleep() for waiting It's more accurate and allows cancellation. https://bugzilla.gnome.org/show_bug.cgi?id=773681