After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 764415 - Very High CPU usage in g_poll() Windows implementation
Very High CPU usage in g_poll() Windows implementation
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: win32
2.46.x
Other Windows
: Normal major
: ---
Assigned To: gtk-win32 maintainers
gtk-win32 maintainers
Depends on:
Blocks:
 
 
Reported: 2016-03-31 14:07 UTC by giacomopoz
Modified: 2016-09-17 09:39 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
W32: Do not ignore short waits in g_poll (1.31 KB, patch)
2016-04-28 14:26 UTC, LRN
committed Details | Review

Description giacomopoz 2016-03-31 14:07:23 UTC
As described and fixed in https://lists.gnu.org/archive/html/qemu-devel/2014-05/msg03572.html , current g_poll() Windows implementation doesn't properly handle small timeouts before calling MsgWaitForMultipleObjectsEx() leading to a whole CPU used just by g_poll().

This is 100% reproducible in HexChat application on Windows 8+ x64 (and x86), please check profiling data at https://github.com/hexchat/hexchat/issues/830#issuecomment-52079358 .

I'd suggest to contact a Qemu developer if needed to fix this issue that has been wasting CPU for years on Windows platform

Related issues ignored for years and still open:
- https://bugzilla.gnome.org/show_bug.cgi?id=675695
- https://bugzilla.gnome.org/show_bug.cgi?id=728486
Comment 1 LRN 2016-03-31 14:47:05 UTC
Could you please re-iterate the problem, as it applies to glib git master (or a recent release)? I've read, albeit briefly, the Qemu mailing list post linked above, and it's not easy to make sense of it, because it's a patched revert to a commit from somewhere that broke HDD I/O, and one of these (either a commit or its unpatched revert or something?) at some point caused high CPU load on gpolls with short timeout.

Is the problem in glib polling pollfds before going into a waiting state, when timeout it short (that is the most coherent description of some problem i've read there)?
Comment 2 giacomopoz 2016-03-31 18:03:52 UTC
http://git.qemu.org/?p=qemu.git;a=blob;f=util/oslib-win32.c;h=c926db4a5c44b4df5d8a6e99d155ce41bc547c65;hb=master#l513

"Modification for QEMU: replaced timeout >= 10 by timeout > 0."

- if (retval == 0 && (timeout == INFINITE || timeout > 10)) {

+ if (retval == 0 && (timeout == INFINITE || timeout > 0)) {
Comment 3 LRN 2016-04-01 01:14:21 UTC
That code was added back in 2008 by tml in commit 605682521b501912a8a0c2e66a24bca171ae0df1.

It never explains why the check for timeout > 10 is there. Or, rather, the comment does try to explain, but i don't buy that explanation. 10ms might seem like a short time, but *not* waiting even that long increases the chance that you're going to busyloop.
Comment 4 giacomopoz 2016-04-01 09:49:40 UTC
Maybe the idea behind was to always wait at least 10 ms even if a smaller timeout (but > 0) was specified ? To make sure it won't try to wait for 2 ms.

Some kind of std::max(10, timeout) for the > 0 case might be even better than QEMU change. I don't know the value of INFINITE and if it should be kept out of any std::max() call.
Comment 5 LRN 2016-04-01 10:58:21 UTC
What's wrong with waiting for 2ms?

I'm tempted to ask tml, but in my experience he doesn't remember why exactly did he do certain things 8 years ago.
Comment 6 Ignacio Casal Quinteiro (nacho) 2016-04-01 11:08:27 UTC
To be honest we should understand the problem and just try to fix it. I added Tor in CC but as you say this was 8 years ago, he will definitely not remember about this.
Comment 7 LRN 2016-04-05 07:30:43 UTC
No one seems to be interested in this, so here's my proposal:
Wait until glib 2.50 is branched, then, on my authority, push "timeout > 0" into master and see if anyone complains.
Comment 8 giacomopoz 2016-04-28 13:28:25 UTC
Could any glib dev confirm in the meantime the issue and that the Qemu patch fixes it ?
Comment 9 Ignacio Casal Quinteiro (nacho) 2016-04-28 13:30:19 UTC
So glib was branched yesterday. I think we should push the timeout > 0 to master now. LRN do you take care of it? Also can we try to make a small test case for this?
Comment 10 LRN 2016-04-28 14:26:01 UTC
Created attachment 326946 [details] [review]
W32: Do not ignore short waits in g_poll

This changes the timeout check as described above.

I'm a bit concerned with the fact that the code calls poll_rest() twice
(once with 0 timeout, once with non-0 timeout). Whether this is truly
useful or not depends on minute details of the behaviour of the W32 API
implementation, which i am not aware of, so i'll keep it as-is.
Comment 11 LRN 2016-04-28 14:26:53 UTC
Attachment 326946 [details] pushed as 210a979 - W32: Do not ignore short waits in g_poll
Comment 12 Jeremy Tan 2016-09-17 09:39:14 UTC
Ooh, I only noticed this now. I had created a separate bug because I thought the issue was in GDK (and I found a patch that worked): https://bugzilla.gnome.org/show_bug.cgi?id=771568