GNOME Bugzilla – Bug 764415
Very High CPU usage in g_poll() Windows implementation
Last modified: 2016-09-17 09:39:14 UTC
As described and fixed in https://lists.gnu.org/archive/html/qemu-devel/2014-05/msg03572.html , current g_poll() Windows implementation doesn't properly handle small timeouts before calling MsgWaitForMultipleObjectsEx() leading to a whole CPU used just by g_poll(). This is 100% reproducible in HexChat application on Windows 8+ x64 (and x86), please check profiling data at https://github.com/hexchat/hexchat/issues/830#issuecomment-52079358 . I'd suggest to contact a Qemu developer if needed to fix this issue that has been wasting CPU for years on Windows platform Related issues ignored for years and still open: - https://bugzilla.gnome.org/show_bug.cgi?id=675695 - https://bugzilla.gnome.org/show_bug.cgi?id=728486
Could you please re-iterate the problem, as it applies to glib git master (or a recent release)? I've read, albeit briefly, the Qemu mailing list post linked above, and it's not easy to make sense of it, because it's a patched revert to a commit from somewhere that broke HDD I/O, and one of these (either a commit or its unpatched revert or something?) at some point caused high CPU load on gpolls with short timeout. Is the problem in glib polling pollfds before going into a waiting state, when timeout it short (that is the most coherent description of some problem i've read there)?
http://git.qemu.org/?p=qemu.git;a=blob;f=util/oslib-win32.c;h=c926db4a5c44b4df5d8a6e99d155ce41bc547c65;hb=master#l513 "Modification for QEMU: replaced timeout >= 10 by timeout > 0." - if (retval == 0 && (timeout == INFINITE || timeout > 10)) { + if (retval == 0 && (timeout == INFINITE || timeout > 0)) {
That code was added back in 2008 by tml in commit 605682521b501912a8a0c2e66a24bca171ae0df1. It never explains why the check for timeout > 10 is there. Or, rather, the comment does try to explain, but i don't buy that explanation. 10ms might seem like a short time, but *not* waiting even that long increases the chance that you're going to busyloop.
Maybe the idea behind was to always wait at least 10 ms even if a smaller timeout (but > 0) was specified ? To make sure it won't try to wait for 2 ms. Some kind of std::max(10, timeout) for the > 0 case might be even better than QEMU change. I don't know the value of INFINITE and if it should be kept out of any std::max() call.
What's wrong with waiting for 2ms? I'm tempted to ask tml, but in my experience he doesn't remember why exactly did he do certain things 8 years ago.
To be honest we should understand the problem and just try to fix it. I added Tor in CC but as you say this was 8 years ago, he will definitely not remember about this.
No one seems to be interested in this, so here's my proposal: Wait until glib 2.50 is branched, then, on my authority, push "timeout > 0" into master and see if anyone complains.
Could any glib dev confirm in the meantime the issue and that the Qemu patch fixes it ?
So glib was branched yesterday. I think we should push the timeout > 0 to master now. LRN do you take care of it? Also can we try to make a small test case for this?
Created attachment 326946 [details] [review] W32: Do not ignore short waits in g_poll This changes the timeout check as described above. I'm a bit concerned with the fact that the code calls poll_rest() twice (once with 0 timeout, once with non-0 timeout). Whether this is truly useful or not depends on minute details of the behaviour of the W32 API implementation, which i am not aware of, so i'll keep it as-is.
Attachment 326946 [details] pushed as 210a979 - W32: Do not ignore short waits in g_poll
Ooh, I only noticed this now. I had created a separate bug because I thought the issue was in GDK (and I found a patch that worked): https://bugzilla.gnome.org/show_bug.cgi?id=771568