GNOME Bugzilla – Bug 425899
Wild clock shifts cause mouse clicks to stay as "move" (implement _NET_WM_MOVERESIZE_CANCEL?)
Last modified: 2010-05-19 13:32:09 UTC
Using: RHEL 5 Client (Xorg 7.1, GNOME 2.16) + metacity 2.16.5 from GNOME.org. Behavior: Clicking on a titlebar makes the move cursor appear and stay on (even if it's a single click, not a drag), but the window does not move (even if it's a legitimate drag instead of a single click). This may be related to a badly configured NTP, which can cause repeated clock swings of up to 800+ ms once a minute. If this happens in proximity to (or possibly during) a window being moved via DnD, Metacity seems to lose its ability to properly recognize what the user is trying to do from then on. I haven't done the best-practice thing and tested it in 2.18, nor have I tried to reproduce it with a script that shifts the clock around.
This issue occurred again after NTP was fixed, and there was this message in the xsession-errors (possibly unrelated): "Window manager warning: Received a _NET_WM_MOVERESIZE message for 0x3400003 (Configurat); these messages lack timestamps and therefore suck." The only applications running at the time were the GNOME panel and a custom Java application running under JPackage RPMS of JRE 1.5.0_11 on RHEL5.
Does this Java application draw its own window frame/decorations/whatever-you-want-to-call-it? (You can usually tell because the window's frame doesn't match the metacity theme on other windows, such as you see with XMMS or Xine.) _NET_WM_MOVERESIZE is used by clients that try to draw their own decorations (i.e. crappy, sucky apps) but want the window manager to still control moving their windows around when the user clicks on those decorations. That warning message was added by me, because I was trying to fix race conditions with timestamps and simply couldn't do so in this case. But timestamps aren't the only race condition possible with those messages. Check out this other comment in the code that I think was written by Havoc, who from context was thinking about something other than timestamps: /* The race conditions in this _NET_WM_MOVERESIZE thing * are mind-boggling */ So, if you could verify whether this is happening when clicking on such window frames (or whether you can duplicate it when clicking on a normal window, such as gedit or gnome-terminal), that would help. The only other similar bug report we have in here that I can think of is bug 304430, which we may recently have fixed. It'd be nice if you could try cvs head (should build under GNOME 2.16.5 just fine if you have e.g. gtk+ development libraries from RH installed).
This may be related: quoting wm-spec 1.4draft2 http://standards.freedesktop.org/wm-spec/latest/ar01s04.html#id2526932 #define _NET_WM_MOVERESIZE_CANCEL 11 /* cancel operation */ The Client MUST release all grabs prior to sending such message (except for the _NET_WM_MOVERESIZE_CANCEL message). The Window Manager can use the button field to determine the events on which it terminates the operation initiated by the _NET_WM_MOVERESIZE message. Since there is a race condition between a client sending the _NET_WM_MOVERESIZE message and the user releasing the button, Window Managers are advised to offer some other means to terminate the operation, e.g. by pressing the ESC key. The special value _NET_WM_MOVERESIZE_CANCEL also allows clients to cancel the operation by sending such message if they detect the release themselves (clients should send it if they get the button release after sending the move resize message, indicating that the WM did not get a grab in time to get the release).
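For illustration, here is a minimal sketch of the client side of the cancel mechanism quoted above. It only builds the five `data.l[]` values of the client message as laid out in the wm-spec (x_root, y_root, direction, button, source indication); a real client would place them in an Xlib ClientMessage event and send it to the root window. The helper names `fill_moveresize_payload` and `client_should_cancel` are hypothetical and do not come from any toolkit.

```c
/* Direction values from the wm-spec; 11 is the cancel value quoted above. */
#define _NET_WM_MOVERESIZE_MOVE    8
#define _NET_WM_MOVERESIZE_CANCEL 11

/* Hypothetical helper: fill the five data.l[] slots of a
 * _NET_WM_MOVERESIZE client message in the order the wm-spec gives. */
void
fill_moveresize_payload (long data[5], long x_root, long y_root,
                         long direction, long button, long source)
{
  data[0] = x_root;     /* pointer position, root coordinates */
  data[1] = y_root;
  data[2] = direction;  /* one of the _NET_WM_MOVERESIZE_* values */
  data[3] = button;     /* button that was pressed */
  data[4] = source;     /* source indication, 1 = normal application */
}

/* Hypothetical policy: if the client itself sees the button release
 * after it asked the WM to start the operation, the WM did not get its
 * grab in time, so the client should send the cancel message. */
int
client_should_cancel (int moveresize_sent, int saw_button_release)
{
  return moveresize_sent && saw_button_release;
}
```

A client would call `fill_moveresize_payload` once with `_NET_WM_MOVERESIZE_MOVE` when the user presses on its pseudo-frame, and again with `_NET_WM_MOVERESIZE_CANCEL` if `client_should_cancel` returns nonzero.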
Also potentially related: http://bugs.kde.org/show_bug.cgi?id=101468 I tried to reproduce the bug by jittering the clock back and forth and wasn't able to do so; it may be more related to high load than to the clock jitter.
Sorry to spam, but here's more: _NET_WM_MOVERESIZE_CANCEL comments by Havoc Pennington, http://osdir.com/ml/gnome.wm-spec/2005-12/msg00003.html
metacity doesn't have any mention of _NET_WM_MOVERESIZE_CANCEL; it's not defined in src/window.c up to 2.19.2, where the rest of 'em are defined:
#define _NET_WM_MOVERESIZE_SIZE_TOPLEFT 0
#define _NET_WM_MOVERESIZE_SIZE_TOP 1
#define _NET_WM_MOVERESIZE_SIZE_TOPRIGHT 2
#define _NET_WM_MOVERESIZE_SIZE_RIGHT 3
#define _NET_WM_MOVERESIZE_SIZE_BOTTOMRIGHT 4
#define _NET_WM_MOVERESIZE_SIZE_BOTTOM 5
#define _NET_WM_MOVERESIZE_SIZE_BOTTOMLEFT 6
#define _NET_WM_MOVERESIZE_SIZE_LEFT 7
#define _NET_WM_MOVERESIZE_MOVE 8
#define _NET_WM_MOVERESIZE_SIZE_KEYBOARD 9
#define _NET_WM_MOVERESIZE_MOVE_KEYBOARD 10
Oh wow, I somehow missed or forgot those wm-spec-list discussions about _NET_WM_MOVERESIZE_CANCEL. If fixing this just means implementing _NET_WM_MOVERESIZE_CANCEL, it shouldn't be too hard at all. Of course, we're assuming that it is a _NET_WM_MOVERESIZE related problem. While that certainly sounds likely right now, it'd be nice to verify. James?
The custom Java application that we run on these workstations does not draw its own decorations, but it does create multiple windows using JFC/Swing and then resizes them at startup.
That sounds different. _NET_WM_MOVERESIZE is used by an app to ask the window manager to start a _user-involved_ moving/resizing action (usually because the user clicked on the application's pseudo-frame); the action won't end until the user releases the mouse button (or a _NET_WM_MOVERESIZE_CANCEL is sent, but that isn't supported in metacity yet). If an app is going to resize a bunch of windows at startup, it'd be more likely to use ConfigureRequest events or _NET_MOVERESIZE_WINDOW (both of which specify to move/resize a window to a given final configuration without user involvement). So, just so I understand, are you two coworkers who are both working on this same problem, or are you two just independent users who have run into issues that look the same? (If the latter, perhaps there are two different problems involved...)
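To make the race concrete, here is a toy model of the window-manager side. It is a sketch under stated assumptions, not metacity's code: `GrabState` and both handler functions are hypothetical names, and the real grab logic is considerably more involved. It shows why a user-involved operation can get stuck if the button release arrives before the WM processes the _NET_WM_MOVERESIZE message, and how a cancel message would end it.

```c
#include <stdbool.h>

#define _NET_WM_MOVERESIZE_MOVE    8
#define _NET_WM_MOVERESIZE_CANCEL 11

/* Hypothetical, simplified model of a WM's move/resize grab. */
typedef struct
{
  bool in_progress;  /* is a user-involved move/resize running? */
  int  button;       /* button whose release ends the operation */
} GrabState;

/* Handle a _NET_WM_MOVERESIZE client message. */
void
grab_handle_moveresize (GrabState *g, long direction, int button)
{
  if (direction == _NET_WM_MOVERESIZE_CANCEL)
    {
      g->in_progress = false;  /* client saw the release itself */
      return;
    }
  g->in_progress = true;
  g->button = button;
}

/* Handle a button release seen by the WM. A release that arrives
 * before the operation has started is simply ignored, which is the
 * race: the operation then starts with nothing left to end it. */
void
grab_handle_button_release (GrabState *g, int button)
{
  if (g->in_progress && g->button == button)
    g->in_progress = false;
}
```

In this model, a release followed by a move request leaves `in_progress` set until a cancel (or some other escape such as the ESC key mentioned in the spec) arrives.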
We are working on the same problem.
Okay, can you run metacity --replace in a terminal and then try to duplicate? If you see the warning Window manager warning: Received a _NET_WM_MOVERESIZE message for 0x<some-number> (<some-window-name>); these messages lack timestamps and therefore suck. at the same time you trigger the bug, then we can conclude it's _NET_WM_MOVERESIZE related. Otherwise, we'll need to look elsewhere.
I was able to reproduce the problem, or at least a portion of it, on our dual-core 64-bit RHEL5 workstation with upgraded metacity 2.16.5; however, I do not see any _NET_WM_MOVERESIZE warnings in .xsession-errors. I ran 2 instances of "cat /dev/urandom >/dev/null" as well as a shell script to jump the time back and forth (see attached jitter.sh) and left the workstation running overnight. This morning, no window had focus and no mouse events were having any effect. There were a lot of warnings regarding inaccurate timestamps, as expected, such as:
Window manager warning: last_user_time (3950369695) is greater than comparison timestamp (3950358644). This most likely represents a buggy client sending inaccurate timestamps in messages such as _NET_ACTIVE_WINDOW. Trying to work around...
Window manager warning: 0x321bdff (username00) appears to be one of the offending windows with a timestamp of 3950369695. Working around...
The keyboard still worked, as I have the <Alt>X keyboard shortcut mapped to open gnome-terminal. The new window came up with focus, but the cursor was hollowed out and no mouse events or key presses were working. I could not switch focus with the mouse; however, I did switch focus with the keyboard (<Alt>Tab) and everything went back to normal.
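As background for the timestamp warnings above: X server timestamps are 32-bit millisecond counters that wrap around, so "is A earlier than B" cannot be a plain `<`. The following is a minimal sketch of a wraparound-aware comparison. The function name `xtime_is_before` is hypothetical and this is not taken from metacity's source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if timestamp a is earlier than timestamp b, treating
 * both as 32-bit counters that wrap. a counts as earlier when the
 * forward distance from a to b is nonzero and less than half the
 * counter range. */
bool
xtime_is_before (uint32_t a, uint32_t b)
{
  uint32_t diff = b - a;  /* unsigned arithmetic wraps modulo 2^32 */
  return diff != 0 && diff < UINT32_C (0x80000000);
}
```

With the values from the warning, the comparison timestamp 3950358644 is 11051 ms before last_user_time 3950369695, so `xtime_is_before (3950358644u, 3950369695u)` is true and the reverse call is false.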
Created attachment 86424 [details] script to jump the time back and forth
I should point out that this may not be exactly the same problem as initially reported, but perhaps it's related.
Note: the cursor from comment 11 was the character-position block drawn by g-t, not the mouse cursor.
The original problem happened twice this morning. Since the application running on this workstation is critical and we have to give the user control of it ASAP, we generally cannot sit around and get a lot of information out of the situation. I did collect a brief strace of metacity after it occurred, and can email it on request. Any suggestions on what information to collect when this occurs? There is a lot of pressure to switch the window manager, so we may not be able to help troubleshoot this in the future.
Created attachment 86952 [details] [review] Add support for _NET_WM_MOVERESIZE_CANCEL While I'm not sure if this is _NET_WM_MOVERESIZE_CANCEL related or not, it could be and it was pretty easy to add support for this, so I cooked up a patch. Don't have a good program around to test with. I guess I could cook one up, but I'm starting to run short on time...
The issue is still occurring on 2.18.2 with the patch from attachment 86952 [details] [review]. We experienced the issue again with meta_topic debugging enabled; I'll mail the end of the log privately after the window titles have been renamed.
Patch looks sane; I'll write a veracity test for it later, but for now it's small enough I'm just putting it into trunk. http://svn.gnome.org/viewvc/metacity?rev=4088&view=rev James: is this still a problem for you?
We switched x86_64 to i386 shortly after the last message, and the issue has only repeated itself once more since then (have not updated since original report). So the bug is still in there, but masked nearly perfectly when running i386.
James: sorry to reply after so long, but did the problem re-occur on i386 before you used a version of Metacity with the above patch included? I'm closing this as FIXED for now, but if it did re-occur, please re-open the bug.