After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 152000 - A variety of focus race conditions in Metacity
A variety of focus race conditions in Metacity
Status: RESOLVED FIXED
Product: metacity
Classification: Other
Component: general
2.8.x
Other Linux
: High normal
: ---
Assigned To: Metacity maintainers list
Metacity maintainers list
Depends on:
Blocks:
 
 
Reported: 2004-09-06 17:25 UTC by Elijah Newren
Modified: 2004-12-22 21:47 UTC
See Also:
GNOME target: 2.8.x
GNOME version: 2.7/2.8


Attachments
Revert the patch in bug 131582 (1.49 KB, patch)
2004-09-06 17:25 UTC, Elijah Newren
committed Details | Review
Allow EnterNotify events to do their job (2.60 KB, patch)
2004-09-15 22:17 UTC, Elijah Newren
none Details | Review
The results of my extensive testing (4.67 KB, text/plain)
2004-09-18 19:46 UTC, Elijah Newren
  Details
Patch to fix race conditions (or at least make them harder to trigger) (20.27 KB, patch)
2004-09-28 18:42 UTC, Elijah Newren
none Details | Review
Updated patch (19.78 KB, patch)
2004-10-02 18:16 UTC, Elijah Newren
needs-work Details | Review
Patch with changes suggested by Havoc (20.13 KB, patch)
2004-10-03 18:24 UTC, Elijah Newren
accepted-commit_now Details | Review

Description Elijah Newren 2004-09-06 17:25:16 UTC
There's a race condition affecting focus consistency for both sloppy and mouse
focus.  If you have three windows on your desktop (let's call them A, B, and C),
with A focused and on top of B (or at least with the upper right corner over B)
and C visible elsewhere, then clicking the close button for window A and rapidly
moving the mouse to window C can trigger this problem.  Despite the mouse being
in window C, you can end up with window B having focus in sloppy focus, or end
up with no window having focus in mouse focus.

The cause of this race condition is the patch in bug 131582 that was applied. 
That patch was needed at that time, because we had a dual policy of "(A) sloppy
and mouse focus means a window gets focus when the mouse enters them.  (B) when
a window is closed or minimized, we should focus the mru window"  Since (A) and
(B) were sometimes contradicting policies, we had races (which could be
triggered even without moving the mouse, let alone rapidly).  Because of bug
135810, we no longer have contradicting policies, and thus the patch in bug
131582 is not needed.  Reverting it solves this race condition with window close
and rapid mouse movement, and doesn't regress the previous bugs.

I'll attach the patch in a moment.
Comment 1 Elijah Newren 2004-09-06 17:25:42 UTC
Created attachment 31334 [details] [review]
Revert the patch in bug 131582
Comment 2 Havoc Pennington 2004-09-06 18:25:17 UTC
I'll take your word for it on this, the patch looks sensible to me since it
removes code ;-)
Comment 3 Elijah Newren 2004-09-15 16:30:00 UTC
I committed this, but further testing shows that this only fixes part of the issue.

Before, if we moved the mouse fast enough, the EnterNotify for window C would be
ignored and thus window C wouldn't be focused at all.  Now, if we move the mouse
fast enough, window C will be focused but due to the asynchronousness of things
it may be focused before window B is.  So we can still end up with the wrong
window having focus.

I played around a bit and logging event->xany.serial for all these different
events gives us the proper ordering; we could probably use that to make sure
that focus is correct (only focus a window if the serial of the event causing it
to be focused is greater than the serial causing the previous window to have
been focused)
Comment 4 Elijah Newren 2004-09-15 22:12:01 UTC
Ignore the crap above about serials.  It's really invasive, since it's just
trying to do the same thing as timestamps which means a lot of extra argument
passing.  It's really stupid, because XSetInputFocus supports timestamps but not
serials.  So:

One way to fix this would be to make sure we always provide a valid timestamp to
meta_window_focus (because then XSetInputFocus knows how to do the right thing).
 In particular, this means making meta_workspace_focus_default_window take a
timestamp.  That looks hard.

The easy way to fix this is to just allow EnterNotify events to do the focusing
like we claim we're doing.  In other words, if the mouse is over a window don't
manually focus it (since we know that an EnterNotify event will occur that will
cause it to get focus).  focuses from EnterNotify events do have valid
timestamps, so this works great.  I'll attach the patch in a minute.
Comment 5 Elijah Newren 2004-09-15 22:17:34 UTC
Created attachment 31596 [details] [review]
Allow EnterNotify events to do their job

Note that we do have to do one additional thing in this patch because of how
events can be ordered.	By not focusing a window in focus_default_window, we
might get a notification that no window has focus before the EnterNotify will
happen.  In this case, we want to ignore the workaround in display.c of
focusing the no_focus_window because we know the EnterNotify will happen and
focus a window for us.
Comment 6 Elijah Newren 2004-09-15 22:32:07 UTC
Um...we ignore EnterNotify events on upon workspace switching, so this makes
things look a little weird.  I think this patch is part of the solution, but
that we need the rest of the solution before acting.

I know, I need to be smacked upside the head.  ;-)  I'll test my next solution
much more thoroughly before posting.
Comment 7 Elijah Newren 2004-09-16 20:10:54 UTC
Joy oh joy!  There's multiple focus race conditions!

The race condition I described above is much easier to trigger on one of my
machines than another.  In fact, applying the attachment 31334 [details] [review] patch makes the
race condition mentioned here impossible for me to trigger on my main machine at
home--which is why I thought the patch was sufficient to fix the problem (I had
tested the patch, but only on that machine).

Anyway, as part of my more thorough testing, I have discovered that focus races
exist under at least 3 additional similiar conditions.  With the attachment
31334 [details] [review] patch, some are harder or impossible for me to trigger, but some still
persist.  And I can trigger at least one of the focus races using click-to-focus
(which, of course, the patches I have generated don't help with), so this isn't
specific to sloppy and mouse focus.

Anyway, I haven't finished all my testing to determine all the cases where I can
trigger races.  I'm going to do some more testing and coding and I'll report
back later...  Hopefully this heavily repetitive testing (many cases are hard to
trigger and require lots of attempts) doesn't give me carpal tunnel syndrome in
the meantime.  :-)
Comment 8 Elijah Newren 2004-09-18 19:46:22 UTC
Created attachment 31672 [details]
The results of my extensive testing

I thought it might be useful to provide a write-up of all the testing I did and
all the race conditions I have uncovered.  So this is a short (in comparison to
how much work went into generating it) text file about that.  Now that I've
identified all the ones I'm able to personally trigger, I can get back to
making a patch and seeing which of these I can prevent with it...
Comment 9 Elijah Newren 2004-09-28 18:27:31 UTC
Okay, so I have some patches that I've been running with for a few days, just to
make sure that there's no surprises.  I've also done further stress testing.  I
believe that the patch removes some race conditions entirely, but that some of
the race conditions remain for all focus modes.  However, even for those that I
suspect remain, they are /much/ harder to trigger (and in many such cases, I
have been unable to trigger at all)

There are some interesting things about this bug (bugs?) too.  It is virtually
impossible to trigger any of these race conditions when the system is under load
(i.e., if I'm running some big computation when trying to do the testing, or am
compiling, or if I'm running Metacity under valgrind).  Further, the machine
that I'm most easily able to trigger the race conditions under is a dual
processor system.  (Note that my single processor machine at home is a little
faster than the processors in my SMP system at school, but it's still easier to
trigger this on my school machine).  So it appears that a fast unloaded, system
coupled with SMP make these race conditions easier to detect.

The race conditions that I have been able to trigger even with the patch I'll
attach in a minute are cases 6, 8, and 9 for click-to-focus.  And, as I said
before, these three cases are harder for me to trigger than before (and they
were already fairly difficult cases--definitely more so than most the other
cases on the list).  I have been unable to trigger any of the race conditions
for sloppy or mouse focus methods (whereas without the patch, the race
conditions are definitely easiest to trigger under those focus methods).
Comment 10 Elijah Newren 2004-09-28 18:42:05 UTC
Created attachment 32034 [details] [review]
Patch to fix race conditions (or at least make them harder to trigger)

This patch basically implements what I said would work in comment 4--add a
timestamp to focus_default_window so that XSetInputFocus calls have correct
timestamps and can thus do their work correctly.  However, unlike I originally
envisioned (and I started writing a patch for), I didn't keep propogating
timestamp passing to more and more functions throughout the code--doing that
was kind of ugly, and besides, in most cases we just didn't have a timestamp
anyway.

For the calls to focus_default_window that wouldn't have a readily available
timestamp (and for which meta_display_get_current_time only returns
CurrentTime), I added a call to a new function:
meta_display_thou_shalt_return_a_timestamp.  This new function first tries
meta_display_get_current_time, and if that doesn't provide a timestamp then it
gets one manually from the X server.  I would really rather not have to use
this new function, but many of the race conditions remain and are reasonably
easy to trigger without it.  We could add a warning to that function so that
whenever it is called, people get a reminder that it'd be nicer if we could
find another way to get a real timestamp for the various cases.  *shrug*

Finally, I also remove the place where we currently use the metacity sentinel
property.  It was causing part of the race condition problems for sloppy and
mouse focus, and removing it does not cause any ill effects (since the bug it
was designed to fix was fixed in another way--see the comment I added yesterday
to bug 110970)
Comment 11 Rob Adams 2004-10-02 15:58:52 UTC
you've got a if (timestamp == CurrentTime && FALSE) in there; probably want to
figure out what that's supposed to be.

I think we can come up with a better name for
meta_display_thou_shalt_return_a_timestamp.  Perhaps
meta_display_retrieve_timestamp or _get_current_time_roundtrip to make clear
that its a server roundtrip and should be avoided like the plague.

Does this patch make the focus sentinel unnecessary?  I notice that you removed
the code in idle_calc_showing that sets the focus sentinel, and also the code in
the event handler that checks the sentinel.  The original reason for it was race
conditions on workspace switching, where we'd get an EnterNotify event on a
window on the new workspace after we'd already tried to set our own default
focus window.  I don't see how having more timestamps would remove the need for
that, since the timestamp on the EnterNotify will come after the timestamp of
the event which triggered the workspace switch.

But regardless if we're removing the code that uses the sentinel we should
remove all of it and not leave little bits of it anywhere.
Comment 12 Elijah Newren 2004-10-02 17:12:09 UTC
Oops, yeah, I forgot about that.  Currently, timestamp will never equal
CurrentTime in that function, and thus the "&& FALSE" is useless (for now).  I
had tested the if portion of the code in attachment 31596 [details] [review], and in preliminary
patches I wanted to force testing of the else portion of the loop even in the
cases where I didn't yet have a valid timestamp.  The "&& FALSE" was meant as a
temporary measure to allow me to do that--and I forgot to remove it.  The reason
the if portion of the patch is there is in case later patches add calls to
focus_default_window but a timestamp isn't provided.  We really shouldn't allow
that to happen (and thus the warning emitted when it does), but if it does then
that if section of the loop is there to provide the best possible behavior given
the circumstances.

Yeah, I figured it needed a better name, but I couldn't think of a better one. 
  _get_current_time_roundtrip sounds good to me--I'll make the change.

I addressed the focus sentinel stuff in comment 10.  My patch doesn't make it
unnecessary--it was already unnecessary (and introduced race conditions), as I
explained in my comment to bug 110970.  That comment in bug 110970 also explains
why I disagree with removing all the code regarding the use of the sentinel
(namely, that I think it's needed infrastructure for handling bug 151996).
Comment 13 Rob Adams 2004-10-02 17:24:32 UTC
If you need the code for bug 151996, should we leave in the call to
focus_sentinel_clear in display.c:event_callback?
Comment 14 Elijah Newren 2004-10-02 17:31:45 UTC
Oh, right.  Yeah, I'll add that line back in.
Comment 15 Elijah Newren 2004-10-02 18:16:09 UTC
Created attachment 32171 [details] [review]
Updated patch

Okay, this gets rid of the &&FALSE, renames to get_current_time_roundtrip, and
returns the meta_display_focus_sentinel_clear call.

Thinking about this further, there's a way we could avoid the roundtrip.  If we
added a field of most_recent_time to the MetaDisplay struct and add 

  if (display->current_time != CurrentTime && 
      display->current_time >  display->most_recent_time)
    {
      display->most_recent_time = display->current_time;
    }

right after the setting of display->current_time, then we could probably use
the value of display->most_recent_time instead of querying the server in
get_current_time_roundtrip.  This idea had occurred to me before, but I
rejected it for some reason.  I don't remember the exact reason, but I believe
it was probably unfounded.  Would that be worth testing?
Comment 16 Havoc Pennington 2004-10-03 16:58:19 UTC
I think Owen can probably explain why "most recent timestamp" isn't as good as
the round trip better than I can, but it amounts to "it doesn't fix race
conditions" iirc.
Comment 17 Havoc Pennington 2004-10-03 17:04:20 UTC
Comment on attachment 32171 [details] [review]
Updated patch

If the function is user_lower_and_unfocus() doesn't that mean it's in response
to user action and we should have a key/button event timestamp? In fact this is
called from the button press handler in frames.c, so why not just pass in a
timestamp.

I like adding the _roundtrip() function anyhow, just so we have it.

meta_screen_get_mouse_window() is a round trip, are you calling that in
response to every event? That's going to be bad, and also a race condition I
think.

adding timestamp arg to workspace_activate() which seems like the bulk of the
patch seems very sensible.

Thanks as always
Comment 18 Rob Adams 2004-10-03 17:06:44 UTC
Also, try to keep your code below 80 columns wherever possible.  Function calls
can be changed to put one argument in each line if they have wide or a large
number of arguments.
Comment 19 Elijah Newren 2004-10-03 18:13:59 UTC
Rob:  I tried to do that, but in each case where I let lines extend beyond 80
columns, putting arguments 1-per-line would not fix the problem.  That is,
unless I violate a different problem such as making all arguments be left
aligned, or having all arguments appear just to the right column-wise of where
the function name is.  If there's something better I could do, could you
demonstrate with a concrete example?  I'm trying to follow the style I've found
elsewhere in the code and I'm willing to do better if I can see how to do so.  I
guess I could put them one-per-line anyway, since that would at least make it
closer to 80 columns.  Is that what you mean?
Comment 20 Elijah Newren 2004-10-03 18:24:23 UTC
Created attachment 32185 [details] [review]
Patch with changes suggested by Havoc

I went ahead and made user_lower_and_unfocus take a timestamp.

Good point on the meta_screen_get_mouse_window stuff.  It didn't call that on
every event, but rather only on every FocusIn/FocusOut event received _for the
root window_.  I could move the code inside the if section of code below so
that it would be called less frequently, but I think you're right that calling
this function may be a race condition.	Further, it's only purpose was to avoid
a race condition which would probably be quite difficult to trigger (I hadn't
tried but from my other experience I expect that it would be very rare if I
could trigger at all) so it might be introducing a bigger bug than it solves. 
Besides, hopefully we can make that section of code unnecessary anyway
eventually with an X extension (bug 125492).  So, in summary, I just pulled it
out as you suggest.

Okay to apply?
Comment 21 Havoc Pennington 2004-10-04 20:03:43 UTC
Comment on attachment 32185 [details] [review]
Patch with changes suggested by Havoc

It looks OK to me. 

On the 80-column issue generally I'd rather keep the other formatting rules
than the 80-column rule, when they conflict.
Comment 22 Elijah Newren 2004-10-04 20:33:47 UTC
committed.