Bug 764415 – Very High CPU usage in g_poll() Windows implementation

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 764415 - Very High CPU usage in g_poll() Windows implementation


Summary:	Very High CPU usage in g_poll() Windows implementation


Status:	RESOLVED FIXED

Product:	glib
Classification:	Platform
Component:	win32
Version:	2.46.x
Hardware:	Other Windows

Importance:	Normal major
Target Milestone:	---
Assigned To:	gtk-win32 maintainers
QA Contact:	gtk-win32 maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2016-03-31 14:07 UTC by giacomopoz
Modified:	2016-09-17 09:39 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
W32: Do not ignore short waits in g_poll (1.31 KB, patch) 2016-04-28 14:26 UTC, LRN	committed	Details \| Review

Description giacomopoz 2016-03-31 14:07:23 UTC

As described and fixed in https://lists.gnu.org/archive/html/qemu-devel/2014-05/msg03572.html , current g_poll() Windows implementation doesn't properly handle small timeouts before calling MsgWaitForMultipleObjectsEx() leading to a whole CPU used just by g_poll().

This is 100% reproducible in HexChat application on Windows 8+ x64 (and x86), please check profiling data at https://github.com/hexchat/hexchat/issues/830#issuecomment-52079358 .

I'd suggest to contact a Qemu developer if needed to fix this issue that has been wasting CPU for years on Windows platform

Related issues ignored for years and still open:
- https://bugzilla.gnome.org/show_bug.cgi?id=675695
- https://bugzilla.gnome.org/show_bug.cgi?id=728486

Comment 1 LRN 2016-03-31 14:47:05 UTC

Could you please re-iterate the problem, as it applies to glib git master (or a recent release)? I've read, albeit briefly, the Qemu mailing list post linked above, and it's not easy to make sense of it, because it's a patched revert to a commit from somewhere that broke HDD I/O, and one of these (either a commit or its unpatched revert or something?) at some point caused high CPU load on gpolls with short timeout.

Is the problem in glib polling pollfds before going into a waiting state, when timeout it short (that is the most coherent description of some problem i've read there)?

Comment 2 giacomopoz 2016-03-31 18:03:52 UTC

http://git.qemu.org/?p=qemu.git;a=blob;f=util/oslib-win32.c;h=c926db4a5c44b4df5d8a6e99d155ce41bc547c65;hb=master#l513

"Modification for QEMU: replaced timeout >= 10 by timeout > 0."

- if (retval == 0 && (timeout == INFINITE || timeout > 10)) {

+ if (retval == 0 && (timeout == INFINITE || timeout > 0)) {

Comment 3 LRN 2016-04-01 01:14:21 UTC

That code was added back in 2008 by tml in commit 605682521b501912a8a0c2e66a24bca171ae0df1.

It never explains why the check for timeout > 10 is there. Or, rather, the comment does try to explain, but i don't buy that explanation. 10ms might seem like a short time, but *not* waiting even that long increases the chance that you're going to busyloop.

Comment 4 giacomopoz 2016-04-01 09:49:40 UTC

Maybe the idea behind was to always wait at least 10 ms even if a smaller timeout (but > 0) was specified ? To make sure it won't try to wait for 2 ms.

Some kind of std::max(10, timeout) for the > 0 case might be even better than QEMU change. I don't know the value of INFINITE and if it should be kept out of any std::max() call.

Comment 5 LRN 2016-04-01 10:58:21 UTC

What's wrong with waiting for 2ms?

I'm tempted to ask tml, but in my experience he doesn't remember why exactly did he do certain things 8 years ago.

Comment 6 Ignacio Casal Quinteiro (nacho) 2016-04-01 11:08:27 UTC

To be honest we should understand the problem and just try to fix it. I added Tor in CC but as you say this was 8 years ago, he will definitely not remember about this.

Comment 7 LRN 2016-04-05 07:30:43 UTC

No one seems to be interested in this, so here's my proposal:
Wait until glib 2.50 is branched, then, on my authority, push "timeout > 0" into master and see if anyone complains.

Comment 8 giacomopoz 2016-04-28 13:28:25 UTC

Could any glib dev confirm in the meantime the issue and that the Qemu patch fixes it ?

Comment 9 Ignacio Casal Quinteiro (nacho) 2016-04-28 13:30:19 UTC

So glib was branched yesterday. I think we should push the timeout > 0 to master now. LRN do you take care of it? Also can we try to make a small test case for this?

Comment 10 LRN 2016-04-28 14:26:01 UTC

Created attachment 326946 [details] [review]
W32: Do not ignore short waits in g_poll

This changes the timeout check as described above.

I'm a bit concerned with the fact that the code calls poll_rest() twice
(once with 0 timeout, once with non-0 timeout). Whether this is truly
useful or not depends on minute details of the behaviour of the W32 API
implementation, which i am not aware of, so i'll keep it as-is.

Comment 11 LRN 2016-04-28 14:26:53 UTC

Attachment 326946 [details] pushed as 210a979 - W32: Do not ignore short waits in g_poll

Comment 12 Jeremy Tan 2016-09-17 09:39:14 UTC

Ooh, I only noticed this now. I had created a separate bug because I thought the issue was in GDK (and I found a patch that worked): https://bugzilla.gnome.org/show_bug.cgi?id=771568