After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 167358 - display->last_focus_time is not the same as max(window->net_wm_user_time)
display->last_focus_time is not the same as max(window->net_wm_user_time)
Status: RESOLVED FIXED
Product: metacity
Classification: Other
Component: general
trunk
Other All
: High major
: 2.10.x
Assigned To: Metacity maintainers list
Metacity maintainers list
Depends on:
Blocks: 155450
 
 
Reported: 2005-02-14 15:21 UTC by Alexander Larsson
Modified: 2005-02-20 17:20 UTC
See Also:
GNOME target: ---
GNOME version: 2.9/2.10


Attachments
Window layout before (175.93 KB, image/png)
2005-02-16 08:30 UTC, Alexander Larsson
  Details
Window layout after selecting the "Glade: <untitled>" window in the group menu (176.13 KB, image/png)
2005-02-16 08:32 UTC, Alexander Larsson
  Details
Metacity debug log (7.72 KB, text/plain)
2005-02-16 08:32 UTC, Alexander Larsson
  Details
Fix lots of Metacity timestamp issues (14.12 KB, patch)
2005-02-19 20:39 UTC, Elijah Newren
none Details | Review
Incorporate suggested changes (15.01 KB, patch)
2005-02-20 04:31 UTC, Elijah Newren
accepted-commit_now Details | Review

Description Alexander Larsson 2005-02-14 15:21:21 UTC
Please describe the problem:
Selecting a window from the menu of a grouped task on the tasklist doesn't bring
the window to front.

Steps to reproduce:
1. start glade-2, open multiple windows
2. move all windows behind other windows
3. click on the grouped task in the tasklist and select one of the window


Actual results:
the tasklist button gets bold as if the window had gotten focus, but lost it due
to some other event.

Expected results:
The window should pop to front

Does this happen every time?
yes.

Other information:
Comment 1 Richard Hult 2005-02-14 15:30:21 UTC
This happens for me as well, it just happened with some grouped Firefox windows.
Comment 2 Elijah Newren 2005-02-14 18:30:12 UTC
I can't duplicate.  What version of metacity and libwnck (i.e. what day did you
pull them from CVS HEAD)?
Comment 3 Vincent Untz 2005-02-14 18:35:18 UTC
It's working for me too.
Comment 4 Alexander Larsson 2005-02-15 07:53:38 UTC
I build all of gnome from cvs yesterday.
These are the ChangeLog heads:

metacity:
2005-02-12  Elijah Newren  <newren@gmail.com>

        Raise the ancestor of a window instead of the window itself.
        Fixes #166894.

        * src/window.c: (find_root_ancestor): new function,
        (meta_window_raise): get the ancestor of the given window and
        raise it if possible instead of the window

libwnk:
2005-02-12  Elijah Newren  <newren@gmail.com>

        Check for whether a window or one of its transients has the
        demands attention hint set.  Fixes #166713.

Also, i use sloppy focus, that might be involved.
Comment 5 Elijah Newren 2005-02-15 17:44:22 UTC
I can't duplicate using sloppy or mouse focus modes, but there aren't any new
patches since the ones you point out...

Could you apply the patch at
http://www.gnome.org/~newren/temp/debug-focus.patch, rebuild metacity, run
'metacity --replace' in a terminal, click on all windows on your workspace at
least once, then do whatever you need to trigger this bug, and then post all the
terminal output here?  It may also be helpful if you could screenshot how your
windows are setup.  Finally, a few more questions: after you click on the
grouped window icon, is there a mouse under your window?  Are you quickly moving
the mouse to where a window is?
Comment 6 Alexander Larsson 2005-02-16 08:30:56 UTC
Created attachment 37526 [details]
Window layout before
Comment 7 Alexander Larsson 2005-02-16 08:32:06 UTC
Created attachment 37527 [details]
Window layout after selecting the "Glade: <untitled>" window in the group menu
Comment 8 Alexander Larsson 2005-02-16 08:32:33 UTC
Created attachment 37528 [details]
Metacity debug log
Comment 9 Alexander Larsson 2005-02-16 08:37:53 UTC
With the window layout visible in the before screenshot i launched metacity
--replace, clicked on all windows, then selected the "Glade: <untitled>" window
in the glade group in the tasklist. The only thing that happened was the group
going bold (see after screenshot). Then i killed metacity, the log from it is
attached.

When I click on the window in the group menu, there is a window right under the
menu. If there isn't one, this doesn't happen. And, in fact, that window becomes
focused when the menu disappears.
Comment 10 Elijah Newren 2005-02-16 08:56:15 UTC
> When I click on the window in the group menu, there is a window right under the
> menu. If there isn't one, this doesn't happen

Ah, yes, I can duplicate.  And the log shows that it was due to my
braindeaded-ness in treating display->last_focus_time like
max(window->net_wm_user_time), which came from one of the patches in bug
166395...  Also, the fact that you've had your xserver running for over 30 days
shows that I had a bug in my debugging code--I should have used %lu in one place
where I used %d.

Thanks for the screenshots, debugging log, and extra info.  :)
Comment 11 Richard Hoelscher 2005-02-17 22:41:20 UTC
Elijah:

After your commit yesterday, I'm getting this warning when I compile metacity
from HEAD...

display.c: In function `event_callback':
display.c:1568: warning: long unsigned int format, guint32 arg (arg 4)

(Using G_GUINT32_FORMAT in place of lu gets rid of it... )
Comment 12 Elijah Newren 2005-02-19 20:38:50 UTC
Richard: Thanks, I'll fix that in just a minute...

Okay, so working on this bug lead to uncovering some other issues and/or
reminded me of corner-case bugs that I had come across recently but hadn't filed
a report for.  In particular, we currently have the following problems: 
1) We unconditionally trust timestamps sent for messages such as
   _NET_ACTIVE_WINDOW and such.  This can result in window->net_wm_user_time
   and display_last_focus_time being set to some point days in the
   future--resulting in all new focus requests (e.g. user clicking on a window)
   being ignored until the xserver time caught up to the buggy time we had
   previously been passed.  I haven't seen this in the wild, but while trying
   to write the patch for bug 161361 I accidentally triggered this with one of
   my preliminary libwnck patches).  We need a way of sanity checking
   timestamps so that we can correct/workaround this issue.
2) All of our code to update user_time was buggy because it shouldn't be done
   if we've already processed a request for another update with a later
   timestamp.  I haven't seen this in the wild yet, but it could bite us later
   and would be a subtle and painful bug to track down.
3) The XSERVER_TIME_IS_BEFORE macro did not correctly handle timestamps of 0
   if the timestamps were more than halfway towards the wraparound point (i.e.
   if the xserver had been up for over 25 days or so).  I don't know if this
   bug would ever affect us since it's really rare, but it would be a pain to
   try to debug and replicate unless the user provided really good duplication
   steps.

And some other issues that aren't technically bugs but which would really make
things cleaner and/or more helpful:
4) Verbose logs from users wouldn't be very useful because they didn't mention
   the relevant windows and the timestamps involved.
5) Updating user_time wasn't done consistently/correctly so it makes sense to
   abstract it into a separate function (especially with the extra checking 
   code needed to fix issue(2) above)
6) Both meta_display_set_input_focus_window() and
   meta_display_focus_the_no_focus_window() were using nearly identical code
   for checking timestamps.  Since those sections needed to be changed and
   extended due to both this bug report and (4), it makes sense to put it in a 
   separate function.

I'll attach a patch in a minute that solves the bug reported here and all these
issues.  If you really want me to break into separate patches I can do that, but
many of these issues are fairly intertwined so the patches would not be
independent of each other for the most part...
Comment 13 Elijah Newren 2005-02-19 20:39:42 UTC
Created attachment 37688 [details] [review]
Fix lots of Metacity timestamp issues
Comment 14 Havoc Pennington 2005-02-19 23:38:28 UTC
Comment on attachment 37688 [details] [review]
Fix lots of Metacity timestamp issues

Why do you need the "fudge factor" in sanity_check_timestamps, I think if a
client gets event A with timestamp 1 and sends _NET_ACTIVE_WINDOW with
timestamp 1, then if metacity gets event B after the _NET_ACTIVE_WINDOW, event
B is guaranteed to have a timestamp greater than event A. This is because any
event with timestamp before event A would have been sent to metacity prior to
the _NET_ACTIVE_WINDOW. Therefore a _NET_ACTIVE_WINDOW with a timestamp greater
than event B's timestamp was broken. Does it not work?

I think sanity_check_timestamps()  would be clearer if the timestamp param were
called known_good_timestamp (assuming I understand the code properly)

timestamp_too_old is confusing, since it has this side effect of changing the
timestamp sometimes and I don't quite understand that case.

The comment above last_focus_time I think would be more useful if it said
whether it's the last timestamp metacity passed to SetInputFocus or the last
timestamp from a focus event.

Obviously I don't feel I fully understand this stuff ;-)
Comment 15 Elijah Newren 2005-02-20 02:11:23 UTC
Why do we get to assume there's a strictly chronological sequence of events
reported to Metacity from the xserver?  Given the asynchronousness of how the
xserver reports events, I was assuming the following could happen:
   1) User clicks on some window with timestamp A
   2) Shortly later, somehow a _NET_ACTIVE_WINDOW message is sent with
      timestamp B, where B>A
   3) Metacity receives the _NET_ACTIVE_WINDOW message and processes it
   4) Metacity receives the ButtonPress event and processes it.
If this can happen, then the ButtonPress event does not mean that the
_NET_ACTIVE_WINDOW message had an invalid timestamp.  The fudge factor is an
attempt to account for cases like this.

I can easily rename to known_good_timestamp; that sounds like a good idea.

There are two cases where timestamp_too_old can change the timestamp passed. 
One is if CurrentTime is passed, in which case we need to get a real timestamp.
 I think that case makes sense to you because we've been doing that for a while.
 The other case comes from this bug report, and is due to the fact that focusing
should depend on timestamps from _NET_WM_USER_TIME, but windows can be focused
without the user interacting with the window (in sloppy and mouse focus modes
only).  To be precise, let me provide an example with full details about what
would go on if this timestamp weren't modified for this case:
  1) User clicks on some button in the tasklist
  2) User moves their mouse around while waiting, crossing some windows in
     their path
  3) Metacity catches the EnterNotify events from moving the mouse around and
     thus focuses the relevant window.
  4) Metacity recieves the _NET_ACTIVE_WINDOW message from step (1).
  5) Metacity notices that the new _NET_ACTIVE_WINDOW message has a timestamp
     greater than that of any other window's _NET_WM_USER_TIME
  6) Therefore, Metacity decides that the window corresponding to the
     _NET_ACTIVE_WINDOW request should be allowed to take focus.
  7) Metacity tries to focus the window with the timestamp from the
     _NET_ACTIVE_WINDOW message.
  8) The xserver ignores the XSetInputFocus request because the timestamp
     is less than the last focus time.
This is the core thing that this bug is about (except that, because of the
special setup, Alex actually didn't move the mouse at all after clicking).  The
fix is that if we decide that a window should be focused, we have to make the
timestamp be at least as great as display->last_focus_time.

We don't get timestamps from FocusIn/FocusOut events.  ;-)  Since that's easy to
forget, you're probably right that it makes sense to add a clarifying comment
that it is the last timestamp Metacity passed to XSetInputFocus.

Does that clear up the parts that you don't understand?  If not, what's still
unclear?
Comment 16 Elijah Newren 2005-02-20 02:12:28 UTC
How did I botch the summary so bad?  My summary is totally false; let me fix that...
Comment 17 Havoc Pennington 2005-02-20 03:40:37 UTC
X server events are always ascending in time AFAIK (except for wraparound). I
haven't tracked down where the spec says this but I think it's right.

This should not be able to happen, I'll try to explain why:

   1) User clicks on some window with timestamp A
   2) Shortly later, somehow a _NET_ACTIVE_WINDOW message is sent with
      timestamp B, where B>A

-> Where did timestamp B come from? It had to come from an event received by the
application that owns the window. If B>A, this event was *after* the button
press. So _NET_ACTIVE_WINDOW gets sent after the app receives the button press,
unless the app processes its event queue out of order. Even if the app
rearranges its event queue, we know that _NET_ACTIVE_WINDOW is sent after the
button press event was queued.

   3) Metacity receives the _NET_ACTIVE_WINDOW message and processes it
   4) Metacity receives the ButtonPress event and processes it.

-> The ButtonPress will have been placed in metacity's event queue at the same
time it was placed in the app's event queue. Thus, since we've shown that
_NET_ACTIVE_WINDOW was sent after the button press was queued,
_NET_ACTIVE_WINDOW must be in metacity's event queue *after* the ButtonPress.

This seems rational to me? If you don't see a problem with it I'd try writing
code that makes this assumption and let's see if anything breaks it.

Thanks for the explanation on this bug, I get it now I think. Possible issue:
one purpose of the timestamp in _NET_ACTIVE_WINDOW is to be sure the right
_NET_ACTIVE_WINDOW "wins" if there are two. By promoting one of the
_NET_ACTIVE_WINDOW to a newer time, won't we break this? i.e. we really only
want to override intervening focuses that came from enter/leave events, maybe.
Do we want to override *any* intervening focus no matter what the origin?

Another way to phrase this, say:
 - taskbar button is clicked, sending _NET_ACTIVE_WINDOW
 - random app sends _NET_ACTIVE_WINDOW for random window
 - metacity will now focus an undefined window (there's a race)

Before it was defined that the one with the later timestamp would win.

This specific example seems far-fetched and irrelevant. But maybe there are
real-life examples?
Comment 18 Elijah Newren 2005-02-20 04:13:20 UTC
Sounds fine to me; making that assumption merely means nuking the fudge factor.

I'm pretty sure there would be real-life examples that match the scenario you
propose, but it doesn't cause any problems.  Behavior is still defined such that
the one with the later timestamp will win. The timestamp that
timestamp_too_old() modifies is a local variable of either
meta_display_set_input_focus_window() or
meta_display_focus_the_no_focus_window().  In other words,
window->net_wm_user_time is not modified for any window and neither is
display->last_user_time or even display->last_focus_time; only the timestamp
passed to XSetInputFocus() is modified.  Timestamps from _NET_ACTIVE_WINDOW are
only compared against net_wm_user_times of other windows, meaning that there is
no such race condition.

I'll attach a new patch in a few minutes incorporating all the changes you've
suggested so far...
Comment 19 Elijah Newren 2005-02-20 04:31:21 UTC
Created attachment 37695 [details] [review]
Incorporate suggested changes

This patch is different from the previous one due to the following changes:
  - rename argument of sanity_check_timestamps to known_good_timestamp
  - assume chronological ordering of reported events (i.e. remove the fudge
factor
    from sanity_check_timestamps)
  - clarify use of display->last_focus_window
  - add a clarifying comment in display.h to
meta_display_set_input_focus_window
Comment 20 Havoc Pennington 2005-02-20 15:28:59 UTC
Comment on attachment 37695 [details] [review]
Incorporate suggested changes

There are a couple things that look like they won't compile but with those
fixed go ahead and commit.

Sorry for making you do the long explanation but I wanted to be sure I
understood. Thanks for the help.
Comment 21 Elijah Newren 2005-02-20 17:20:40 UTC
Oops, yeah, I didn't quite fully remove the fudge factor..  :-/

No worries; I was almost certain I'd have to give a long explanation considering
how many things this bug encompassed:

2005-02-20  Elijah Newren  <newren@gmail.com>

	Big patch to cover about 6 different issues in order to correct
	rare problems with timestamps (make sure window selected in
	tasklist actually gets focus, sanity check timestamps to avoid
	rogue apps hosing the system, correct the updating of
	net_wm_user_time, correctly handle timestamps of 0 when comparing
	xserver timestamps for those who have had their systems up for
	over 25 days or so, add some debugging information to verbose
	logs, some code cleanups).  Fixes all issues listed in #167358.

	* src/display.h: (struct _MetaDisplay): clarify comment on
	last_focus_time, introduce a new variable--last_user_time,
	(XSERVER_TIME_IS_BEFORE macro): put this functionality into a
	separate macro and then introduce a new macro with this name that
	uses the old one but adds additional special-case checks for
	timestamps that are 0, (comment to
	meta_display_set_input_focus_window): add information about how
	last_user_time should be used in this function
	
	* src/display.c (santiy_check_timestamps): new function,
	(meta_display_open): intialize display->last_user_time,
	(meta_display_get_current_time_roundtrip): use the timestamp,
	which is known to be good, in order to sanity_check_timestamps,
	(event_callback): use the new meta_window_ste_user_time() function
	in order to correct problems, use the timestamp of KeyPress and
	ButtonPress events, which are known to be good, in order to
	sanity_check_timestamps, (timestamp_too_old): new function for
	common behavior of meta_display_focus_the_no_focus_window and
	meta_display_set_input_focus_window, with added checking for
	display->last_user_time in addition to display->last_focus_time,
	(meta_display_set_input_focus_window): replace some of the code
	with a call to timestamp_too_old(),
	(meta_display_focus_the_no_focus_window): replace some of th ecode
	with a call to timestamp_too_old()

	* src/window.h: (meta_window_set_user_time): new function to
	abstract the many things that need to be done when updating the
	net_wm_user_time of any window

	* src/window.c: (meta_window_activate): add debugging spew, make
	sure the comparison is made with last_user_time NOT
	last_focus_time, use meta_window_set_user_time() function in order
	to correct problems, (meta_window_client_message): add a newline
	to a debugging message to make them easier to read,
	(meta_window_set_user_time): new function

	* src/window-props.c (reload_net_wm_user_time): use the new
	meta_window_ste_user_time() function in order to correct problems

Committed now, though.