GNOME Bugzilla – Bug 167358
display->last_focus_time is not the same as max(window->net_wm_user_time)
Last modified: 2005-02-20 17:20:40 UTC
Please describe the problem: Selecting a window from the menu of a grouped task on the tasklist doesn't bring the window to front. Steps to reproduce: 1. start glade-2, open multiple windows 2. move all windows behind other windows 3. click on the grouped task in the tasklist and select one of the window Actual results: the tasklist button gets bold as if the window had gotten focus, but lost it due to some other event. Expected results: The window should pop to front Does this happen every time? yes. Other information:
This happens for me as well, it just happened with some grouped Firefox windows.
I can't duplicate. What version of metacity and libwnck (i.e. what day did you pull them from CVS HEAD)?
It's working for me too.
I build all of gnome from cvs yesterday. These are the ChangeLog heads: metacity: 2005-02-12 Elijah Newren <newren@gmail.com> Raise the ancestor of a window instead of the window itself. Fixes #166894. * src/window.c: (find_root_ancestor): new function, (meta_window_raise): get the ancestor of the given window and raise it if possible instead of the window libwnk: 2005-02-12 Elijah Newren <newren@gmail.com> Check for whether a window or one of its transients has the demands attention hint set. Fixes #166713. Also, i use sloppy focus, that might be involved.
I can't duplicate using sloppy or mouse focus modes, but there aren't any new patches since the ones you point out... Could you apply the patch at http://www.gnome.org/~newren/temp/debug-focus.patch, rebuild metacity, run 'metacity --replace' in a terminal, click on all windows on your workspace at least once, then do whatever you need to trigger this bug, and then post all the terminal output here? It may also be helpful if you could screenshot how your windows are setup. Finally, a few more questions: after you click on the grouped window icon, is there a mouse under your window? Are you quickly moving the mouse to where a window is?
Created attachment 37526 [details] Window layout before
Created attachment 37527 [details] Window layout after selecting the "Glade: <untitled>" window in the group menu
Created attachment 37528 [details] Metacity debug log
With the window layout visible in the before screenshot i launched metacity --replace, clicked on all windows, then selected the "Glade: <untitled>" window in the glade group in the tasklist. The only thing that happened was the group going bold (see after screenshot). Then i killed metacity, the log from it is attached. When I click on the window in the group menu, there is a window right under the menu. If there isn't one, this doesn't happen. And, in fact, that window becomes focused when the menu disappears.
> When I click on the window in the group menu, there is a window right under the > menu. If there isn't one, this doesn't happen Ah, yes, I can duplicate. And the log shows that it was due to my braindeaded-ness in treating display->last_focus_time like max(window->net_wm_user_time), which came from one of the patches in bug 166395... Also, the fact that you've had your xserver running for over 30 days shows that I had a bug in my debugging code--I should have used %lu in one place where I used %d. Thanks for the screenshots, debugging log, and extra info. :)
Elijah: After your commit yesterday, I'm getting this warning when I compile metacity from HEAD... display.c: In function `event_callback': display.c:1568: warning: long unsigned int format, guint32 arg (arg 4) (Using G_GUINT32_FORMAT in place of lu gets rid of it... )
Richard: Thanks, I'll fix that in just a minute... Okay, so working on this bug lead to uncovering some other issues and/or reminded me of corner-case bugs that I had come across recently but hadn't filed a report for. In particular, we currently have the following problems: 1) We unconditionally trust timestamps sent for messages such as _NET_ACTIVE_WINDOW and such. This can result in window->net_wm_user_time and display_last_focus_time being set to some point days in the future--resulting in all new focus requests (e.g. user clicking on a window) being ignored until the xserver time caught up to the buggy time we had previously been passed. I haven't seen this in the wild, but while trying to write the patch for bug 161361 I accidentally triggered this with one of my preliminary libwnck patches). We need a way of sanity checking timestamps so that we can correct/workaround this issue. 2) All of our code to update user_time was buggy because it shouldn't be done if we've already processed a request for another update with a later timestamp. I haven't seen this in the wild yet, but it could bite us later and would be a subtle and painful bug to track down. 3) The XSERVER_TIME_IS_BEFORE macro did not correctly handle timestamps of 0 if the timestamps were more than halfway towards the wraparound point (i.e. if the xserver had been up for over 25 days or so). I don't know if this bug would ever affect us since it's really rare, but it would be a pain to try to debug and replicate unless the user provided really good duplication steps. And some other issues that aren't technically bugs but which would really make things cleaner and/or more helpful: 4) Verbose logs from users wouldn't be very useful because they didn't mention the relevant windows and the timestamps involved. 5) Updating user_time wasn't done consistently/correctly so it makes sense to abstract it into a separate function (especially with the extra checking code needed to fix issue(2) above) 6) Both meta_display_set_input_focus_window() and meta_display_focus_the_no_focus_window() were using nearly identical code for checking timestamps. Since those sections needed to be changed and extended due to both this bug report and (4), it makes sense to put it in a separate function. I'll attach a patch in a minute that solves the bug reported here and all these issues. If you really want me to break into separate patches I can do that, but many of these issues are fairly intertwined so the patches would not be independent of each other for the most part...
Created attachment 37688 [details] [review] Fix lots of Metacity timestamp issues
Comment on attachment 37688 [details] [review] Fix lots of Metacity timestamp issues Why do you need the "fudge factor" in sanity_check_timestamps, I think if a client gets event A with timestamp 1 and sends _NET_ACTIVE_WINDOW with timestamp 1, then if metacity gets event B after the _NET_ACTIVE_WINDOW, event B is guaranteed to have a timestamp greater than event A. This is because any event with timestamp before event A would have been sent to metacity prior to the _NET_ACTIVE_WINDOW. Therefore a _NET_ACTIVE_WINDOW with a timestamp greater than event B's timestamp was broken. Does it not work? I think sanity_check_timestamps() would be clearer if the timestamp param were called known_good_timestamp (assuming I understand the code properly) timestamp_too_old is confusing, since it has this side effect of changing the timestamp sometimes and I don't quite understand that case. The comment above last_focus_time I think would be more useful if it said whether it's the last timestamp metacity passed to SetInputFocus or the last timestamp from a focus event. Obviously I don't feel I fully understand this stuff ;-)
Why do we get to assume there's a strictly chronological sequence of events reported to Metacity from the xserver? Given the asynchronousness of how the xserver reports events, I was assuming the following could happen: 1) User clicks on some window with timestamp A 2) Shortly later, somehow a _NET_ACTIVE_WINDOW message is sent with timestamp B, where B>A 3) Metacity receives the _NET_ACTIVE_WINDOW message and processes it 4) Metacity receives the ButtonPress event and processes it. If this can happen, then the ButtonPress event does not mean that the _NET_ACTIVE_WINDOW message had an invalid timestamp. The fudge factor is an attempt to account for cases like this. I can easily rename to known_good_timestamp; that sounds like a good idea. There are two cases where timestamp_too_old can change the timestamp passed. One is if CurrentTime is passed, in which case we need to get a real timestamp. I think that case makes sense to you because we've been doing that for a while. The other case comes from this bug report, and is due to the fact that focusing should depend on timestamps from _NET_WM_USER_TIME, but windows can be focused without the user interacting with the window (in sloppy and mouse focus modes only). To be precise, let me provide an example with full details about what would go on if this timestamp weren't modified for this case: 1) User clicks on some button in the tasklist 2) User moves their mouse around while waiting, crossing some windows in their path 3) Metacity catches the EnterNotify events from moving the mouse around and thus focuses the relevant window. 4) Metacity recieves the _NET_ACTIVE_WINDOW message from step (1). 5) Metacity notices that the new _NET_ACTIVE_WINDOW message has a timestamp greater than that of any other window's _NET_WM_USER_TIME 6) Therefore, Metacity decides that the window corresponding to the _NET_ACTIVE_WINDOW request should be allowed to take focus. 7) Metacity tries to focus the window with the timestamp from the _NET_ACTIVE_WINDOW message. 8) The xserver ignores the XSetInputFocus request because the timestamp is less than the last focus time. This is the core thing that this bug is about (except that, because of the special setup, Alex actually didn't move the mouse at all after clicking). The fix is that if we decide that a window should be focused, we have to make the timestamp be at least as great as display->last_focus_time. We don't get timestamps from FocusIn/FocusOut events. ;-) Since that's easy to forget, you're probably right that it makes sense to add a clarifying comment that it is the last timestamp Metacity passed to XSetInputFocus. Does that clear up the parts that you don't understand? If not, what's still unclear?
How did I botch the summary so bad? My summary is totally false; let me fix that...
X server events are always ascending in time AFAIK (except for wraparound). I haven't tracked down where the spec says this but I think it's right. This should not be able to happen, I'll try to explain why: 1) User clicks on some window with timestamp A 2) Shortly later, somehow a _NET_ACTIVE_WINDOW message is sent with timestamp B, where B>A -> Where did timestamp B come from? It had to come from an event received by the application that owns the window. If B>A, this event was *after* the button press. So _NET_ACTIVE_WINDOW gets sent after the app receives the button press, unless the app processes its event queue out of order. Even if the app rearranges its event queue, we know that _NET_ACTIVE_WINDOW is sent after the button press event was queued. 3) Metacity receives the _NET_ACTIVE_WINDOW message and processes it 4) Metacity receives the ButtonPress event and processes it. -> The ButtonPress will have been placed in metacity's event queue at the same time it was placed in the app's event queue. Thus, since we've shown that _NET_ACTIVE_WINDOW was sent after the button press was queued, _NET_ACTIVE_WINDOW must be in metacity's event queue *after* the ButtonPress. This seems rational to me? If you don't see a problem with it I'd try writing code that makes this assumption and let's see if anything breaks it. Thanks for the explanation on this bug, I get it now I think. Possible issue: one purpose of the timestamp in _NET_ACTIVE_WINDOW is to be sure the right _NET_ACTIVE_WINDOW "wins" if there are two. By promoting one of the _NET_ACTIVE_WINDOW to a newer time, won't we break this? i.e. we really only want to override intervening focuses that came from enter/leave events, maybe. Do we want to override *any* intervening focus no matter what the origin? Another way to phrase this, say: - taskbar button is clicked, sending _NET_ACTIVE_WINDOW - random app sends _NET_ACTIVE_WINDOW for random window - metacity will now focus an undefined window (there's a race) Before it was defined that the one with the later timestamp would win. This specific example seems far-fetched and irrelevant. But maybe there are real-life examples?
Sounds fine to me; making that assumption merely means nuking the fudge factor. I'm pretty sure there would be real-life examples that match the scenario you propose, but it doesn't cause any problems. Behavior is still defined such that the one with the later timestamp will win. The timestamp that timestamp_too_old() modifies is a local variable of either meta_display_set_input_focus_window() or meta_display_focus_the_no_focus_window(). In other words, window->net_wm_user_time is not modified for any window and neither is display->last_user_time or even display->last_focus_time; only the timestamp passed to XSetInputFocus() is modified. Timestamps from _NET_ACTIVE_WINDOW are only compared against net_wm_user_times of other windows, meaning that there is no such race condition. I'll attach a new patch in a few minutes incorporating all the changes you've suggested so far...
Created attachment 37695 [details] [review] Incorporate suggested changes This patch is different from the previous one due to the following changes: - rename argument of sanity_check_timestamps to known_good_timestamp - assume chronological ordering of reported events (i.e. remove the fudge factor from sanity_check_timestamps) - clarify use of display->last_focus_window - add a clarifying comment in display.h to meta_display_set_input_focus_window
Comment on attachment 37695 [details] [review] Incorporate suggested changes There are a couple things that look like they won't compile but with those fixed go ahead and commit. Sorry for making you do the long explanation but I wanted to be sure I understood. Thanks for the help.
Oops, yeah, I didn't quite fully remove the fudge factor.. :-/ No worries; I was almost certain I'd have to give a long explanation considering how many things this bug encompassed: 2005-02-20 Elijah Newren <newren@gmail.com> Big patch to cover about 6 different issues in order to correct rare problems with timestamps (make sure window selected in tasklist actually gets focus, sanity check timestamps to avoid rogue apps hosing the system, correct the updating of net_wm_user_time, correctly handle timestamps of 0 when comparing xserver timestamps for those who have had their systems up for over 25 days or so, add some debugging information to verbose logs, some code cleanups). Fixes all issues listed in #167358. * src/display.h: (struct _MetaDisplay): clarify comment on last_focus_time, introduce a new variable--last_user_time, (XSERVER_TIME_IS_BEFORE macro): put this functionality into a separate macro and then introduce a new macro with this name that uses the old one but adds additional special-case checks for timestamps that are 0, (comment to meta_display_set_input_focus_window): add information about how last_user_time should be used in this function * src/display.c (santiy_check_timestamps): new function, (meta_display_open): intialize display->last_user_time, (meta_display_get_current_time_roundtrip): use the timestamp, which is known to be good, in order to sanity_check_timestamps, (event_callback): use the new meta_window_ste_user_time() function in order to correct problems, use the timestamp of KeyPress and ButtonPress events, which are known to be good, in order to sanity_check_timestamps, (timestamp_too_old): new function for common behavior of meta_display_focus_the_no_focus_window and meta_display_set_input_focus_window, with added checking for display->last_user_time in addition to display->last_focus_time, (meta_display_set_input_focus_window): replace some of the code with a call to timestamp_too_old(), (meta_display_focus_the_no_focus_window): replace some of th ecode with a call to timestamp_too_old() * src/window.h: (meta_window_set_user_time): new function to abstract the many things that need to be done when updating the net_wm_user_time of any window * src/window.c: (meta_window_activate): add debugging spew, make sure the comparison is made with last_user_time NOT last_focus_time, use meta_window_set_user_time() function in order to correct problems, (meta_window_client_message): add a newline to a debugging message to make them easier to read, (meta_window_set_user_time): new function * src/window-props.c (reload_net_wm_user_time): use the new meta_window_ste_user_time() function in order to correct problems Committed now, though.