GNOME Bugzilla – Bug 771568
Holding down a modifier key results in 100% cpu usage
Last modified: 2018-02-17 18:49:08 UTC
Press and hold any modifier key - e.g. Ctrl, Shift or Alt. Watch CPU usage in task manager increase rapidly. I can confirm this bug is present both in GTK 3.20.9 and 2.24.28. E.g. program - Glade, or even gtk3-demo.
It would also be great if https://bugzilla.gnome.org/show_bug.cgi?id=602773 could be resolved
I can confirm this bug is also present in GIMP 2.8.18. I could also reproduce the issue using GDK only, which suggests it's a bug in GDK.
I also see https://github.com/hexchat/hexchat/issues/830 So I'm not sure if it's a bug in GDK or GLib.
I've done some more debugging and I'm still not sure if it's a glib bug or gdk bug. When a modifier key is pressed and held, MsgWaitForMultipleObjects (here: https://github.com/GNOME/glib/blob/master/glib/gpoll.c#L151) returns 1 i.e. message ready. But when you press and hold a normal key, it usually returns 258, e.g. WAIT_TIMEOUT. Pressing and holding a normal key usually results in it exiting here: https://github.com/GNOME/glib/blob/master/glib/gpoll.c#L200 But when pressing and holding a modifier key, it goes into a recursive call here: https://github.com/GNOME/glib/blob/master/glib/gpoll.c#L216 which then itself times out and returns zero. This then repeats over. Another nasty bug seems to be that if you press and release a normal key, and if no other events reach the window (leave the mouse pointer outside the window), the same thing will happen as if you held a modifier key - CPU usage will skyrocket until it receives another event like a mouse movement. Unfortunately this one's a bit harder to observe - for some reason, having the task manager open causes CPU usage to drop shortly afterwards (you can still see a spike in CPU usage before it drops off). However, if you instead use process explorer, you can see the elevated CPU usage.
Created attachment 335757 [details] [review] Patch On a hunch from https://blogs.msdn.microsoft.com/oldnewthing/20050217-00/?p=36423, I figured that _gdk_win32_display_queue_events should be processing all available events whenever it's called. The supplied patch seems to fix both the bugs outlined in the previous message.
Review of attachment 335757 [details] [review]: I will let LRN to double check this but the patch looks fine to me
I've done some further testing, and although it does seem to be the right fix (or at least somewhere in the line, PeekMessage with PM_REMOVE needs to get called if MsgWaitForMultipleObjectsEx signals that there are events waiting), it seems to negatively affect event compression. In other words, if you perform compute intensive actions in response to mouse move events, then those mouse events get backed up causing apparent lag. I'll see if I can think of a relevant example.
Try a different patch. Instead of removing the events check, put this call: > PeekMessageW (&msg, NULL, 0, 0, PM_NOREMOVE | PM_NOYIELD); at the very beginning of _gdk_win32_display_queue_events() It seems to be fixing things for me (as far as CPU usage goes), but i haven't looked into event compression at all.
I've done as suggested, but with that alone, I still get high CPU usage. e.g.: void _gdk_win32_display_queue_events (GdkDisplay *display) { MSG msg; PeekMessageW (&msg, NULL, 0, 0, PM_NOREMOVE | PM_NOYIELD); if (modal_win32_dialog != NULL) return; while (!_gdk_event_queue_find_first (display) && PeekMessageW (&msg, NULL, 0, 0, PM_REMOVE)) { TranslateMessage (&msg); DispatchMessageW (&msg); } }
I think I know why motion compression is broken - the key down events for when I'm holding shift get interspersed between the motion events, but the code in gdkevents.c expects them to be all contiguous for it to perform compression.
Oh - I should also mention, on X11, this (motion compression) isn't an issue, because when you press a modifier key, you only get one KEY_DOWN event and one KEY_UP event, whereas on Windows, you get autorepeated KEY_DOWN events for as long as you hold down a modifier.
Created attachment 335760 [details] [review] Patch to make motion compression work This patch works in tandem with my other patch. It allows motion compression to work if a modifier key has been held down (e.g. shift click/dragging) by not emitting autorepeated keydown events if a modifier key has been held down.
Does that mean that all GTK applications that rely on ctrl/alt/shift autorepeating events will stop working? As for attachment 335757 [details] [review], the reason why this loop is skipped when GDK events are queued is to give GDK events a priority over Windows messages and to prevent a flood of Windows messages from drowning GDK. Thus removing the event queue check does not seem like a good solution. Right now my reasoning about this bug is the following: 1) Something happens 2) MsgWaitForMultipleObjectsEx() returns WAIT_OBJECT_0 + nhandles, indicating that messages are available 3) glib calls ..._check() 4) ..._check() looks at events and also calls GetQueueStatus() to determine whether there's anything to do 5) glib calls ..._dispatch() 6) ..._dispatch() checks for events and, if they are available, immediately goes on to process them. If they are not available, it goes into PeekMessage+TranslateMessage+DispatchMessage loop. Let's say that events are available. 7) Events are handled and we go back to glib main loop 8) MsgWaitForMultipleObjectsEx() returns WAIT_OBJECT_0 + nhandles, indicating that messages are available, because PeekMessage() was not called 9) glib calls ..._check() 10)..._check() looks at events (none at the moment) and calls GetQueueStatus(), which says there's no messages 11) No events, no messages, so we go back to glib main loop 12) goto (8) The reason why this might be happening could be this (quoting MSDN): > WAIT_OBJECT_0 + nCount > New input of the type specified in the dwWakeMask parameter is available in > the thread's input queue. Functions such as PeekMessage, GetMessage, > GetQueueStatus, and WaitMessage mark messages in the queue as old messages. > Therefore, after you call one of these functions, a subsequent call to > MsgWaitForMultipleObjectsEx will not return until new input of the specified > type arrives. > This value is also returned upon the occurrence of a system event that > requires the thread's action, such as foreground activation. Therefore, > MsgWaitForMultipleObjectsEx can return even though no appropriate input is > available and even if dwWakeMask is set to 0. If this occurs, call GetMessage > or PeekMessage to process the system event before trying the call to > MsgWaitForMultipleObjectsEx again. My earlier suggestion tried to fix this by ensuring that at least one call to PeekMessage() happens in every ..._dispatch(). If that didn't work, then the above does not reflect the current state of affairs, but i don't know where my reasoning went wrong.
> Does that mean that all GTK applications that rely on ctrl/alt/shift autorepeating events will stop working? They shouldn't be depending on this behaviour anyway - X11 doesn't send autorepeats for these keys. > 4) ..._check() looks at events and also calls GetQueueStatus() to determine whether there's anything to do In my profiling in Visual Studio (I converted DWARF debugging info to pdb using cv2pdb), I found that actually, when MsgWaitForMultipleObjectsEx gets called, there was a path that it called _gdk_win32_window_procedure, which suggests that the message handler is getting invoked anyway. This is where the message gets added to the queue and so _gdk_win32_display_has_pending returns true because the queue is not empty. The gist I got from https://blogs.msdn.microsoft.com/oldnewthing/20050217-00/?p=36423 was that as look as PeekMessage returns TRUE, you shouldn't call MsgWaitForMultipleObjectsEx --------------- For lack of anything better to test the event compression behaviour, I present a debug copy of what I'm working on: https://sourceforge.net/projects/fontforgebuilds/files/patched-builds/FontForge-mingw-w64-i686-86c4cd-r1-771568.7z As it stands, it's unpatched, but there's a patched libgdk-3-0.dll in the base directory. If you cd into ReleasePackage/bin, you can run fontforge.exe directly. Sorry for the noise it generates; you can export GGDK_QUIET=1 to turn off debug messages. Once running, clicking on 'new' then double click on any one of the slots. This brings up an glyph editing window (similar to inkscape). Ctrl+click on the hand tool to assign it as the Ctrl+click action. Then ctrl+click drag the canvas. On unpatched GDK, it drags fairly smoothly,but CPU usage is very high. If you patch it but don't get rid of autorepeated events, continued dragging and then stopping dragging will result in it continuing on for a little while. With the patched dll it's both not laggy and CPU usage remains low (or more reasonable).
Review of attachment 335757 [details] [review]: I give up. I can't find a logical explanation of why MsgWaitForMultipleObjectsEx() would return immediately when called with (supposedly) non-empty message queue, and why would GetQueueStatus() claim that the queue is empty at the same time. But this patch works, so let's push it and see how things fall afterwards.
Review of attachment 335760 [details] [review]: It bugs me a bit that we're basically hardcoding modifier keys, but i currently see no alternatives.
> GDK events are queued is to give GDK events a priority over Windows messages > and to prevent a flood of Windows messages from drowning GDK. I'll just start with saying that after some further thought, the patch should be a non-issue. If the message handler behaves properly, it should be appending messages to the event queue. So even if there were GDK specific events, they would be processed first. Furthermore, if you have Windows messages backing up, you're going to have more problems than just GDK specific events being displaced. --------- However: > I can't find a logical explanation of why MsgWaitForMultipleObjectsEx() would > return immediately when called with (supposedly) non-empty message queue, and > why would GetQueueStatus() claim that the queue is empty at the same time. I gave this some more thought, in conjunction with this: > This value is also returned upon the occurrence of a system event that > requires the thread's action, such as foreground activation. Therefore, > MsgWaitForMultipleObjectsEx can return even though no appropriate input is > available and even if dwWakeMask is set to 0. If this occurs, call GetMessage > or PeekMessage to process the system event before trying the call to > MsgWaitForMultipleObjectsEx again. GetQueueStatus is correct - there is no present message *for this thread* - but as indicated by MsgWaitForMultipleObjectsEx, there is (I guess) a *system event* that needs to be processed by PeekMessage. So counter-intuitively, the following check also resolves the high CPU usage while limiting the number of win32 messages processed: > while ((!_gdk_event_queue_find_first (display) || !GetQueueStatus(QS_ALLINPUT)) && > PeekMessageW (&msg, NULL, 0, 0, PM_REMOVE)) This still needs to be used in conjunction with the other patch to ensure that motion compression works.
Ah, except if we somehow end up in the situation that GetQueueStatus returns true, it's now marked any new messages as read. So MsgWaitForMultipleObjectsEx could potentially deadlock. Ugh. Unless you set MWMO_INPUTAVAILABLE on MsgWaitForMultipleObjectsEx? Actually, is it always called with a finite timeout? Considering all that starts to get messy. I still think my original patch will be fine.
Was this patch not backported to Gtk2?
Does GTK2 need this patch? I.e. is it affected by this bug?
(In reply to LRN from comment #20) > Does GTK2 need this patch? I.e. is it affected by this bug? Here is a related bug report: https://github.com/hexchat/hexchat/issues/830 The author of the patch commented there mentioning it was related.
Oh and comment #1 even says it applies to 2.24.28.
@LRN Just a reminder about backporting this to Gtk2.
Backported both patches to the branch gtk-2-24 commit bfdac2f70e005b2504cc3f4ebbdab328974d005a commit 61162225f712df648f38fd12bc0817cfa9f79a64