After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 307875 - App-modal dialogs need to be supported so they don't pop behind and become lost (+ some other stuff)
App-modal dialogs need to be supported so they don't pop behind and become lo...
Status: RESOLVED DUPLICATE of bug 126489
Product: metacity
Classification: Other
Component: general
2.10.x
Other Linux
: High critical
: ---
Assigned To: Metacity maintainers list
Metacity maintainers list
Depends on:
Blocks: 164841
 
 
Reported: 2005-06-16 04:42 UTC by Billy Biggs
Modified: 2005-07-25 04:57 UTC
See Also:
GNOME target: ---
GNOME version: 2.9/2.10


Attachments
focustest.tar.gz (724.67 KB, application/x-compressed-tar)
2005-06-16 04:43 UTC, Billy Biggs
  Details
multiplemodal.c (1.41 KB, text/plain)
2005-06-16 23:36 UTC, Billy Biggs
  Details
Fix meta_window_raise for the case of multiple transients of a single window (1.95 KB, patch)
2005-07-11 19:19 UTC, Elijah Newren
committed Details | Review
Avoid obscuring additional modal dialogs (6.59 KB, patch)
2005-07-17 00:52 UTC, Elijah Newren
committed Details | Review

Description Billy Biggs 2005-06-16 04:42:56 UTC
metacity 2.10.1-2 (Debian unstable) and metacity 2.10.0-1 (Fedora Core 4)

If an application opens two modal dialogs in sequence, D1 and then D2, sometimes
when D2 opens it appears below D1 in the stacking order (although it does gain
focus).  This behaviour is new in metacity 2.10 and is not reproducable with
earlier versions.  Both D1 and D2 are marked as transients for the main window,
and both have the dialog type hint set.

The problem can be reproduced in eclipse by selecting a file in the resource
navigator, hitting Ctrl+C to copy, and Ctrl+V to paste.  Two dialogs appear, the
first is a progress dialog, the second is a name conflict dialog requesting user
input.  It always opens below the progress dialog, obscured from view, and
cannot be raised above it.

I can reproduce this in a small SWT application by simply creating two dialogs
as children of a main window.  The application is attached, to run it using gij,
use:   gij-4.0 --cp . -Djava.library.path=. TwoModalDialogs

With the small example, it is clear that this is a race.  The problem does not
happen all the time.  However, the race does not depend on when the second
dialog is opened.  Even if I do this:

  open 1st dialog
  flush event loop
  sleep for 1 second
  flush event loop
  open 2nd dialog

It can still happen that sometimes the second dialog appears below the first.  I
have so far been unable to construct a small GTK+ example, nor have I been able
to conclusively isolate what triggers it in the SWT code.
Comment 1 Billy Biggs 2005-06-16 04:43:37 UTC
Created attachment 47846 [details]
focustest.tar.gz
Comment 2 Billy Biggs 2005-06-16 20:27:24 UTC
Based on some more experiments, my best guess for what is going on is that if a
window gets a _NET_ACTIVE_WINDOW request with a timestamp newer than the last
user interaction during open, it appears below the currently active window in
the stacking order.
Comment 3 Billy Biggs 2005-06-16 23:36:15 UTC
Created attachment 47881 [details]
multiplemodal.c

Here is a small GTK+ application which consistently reproduces the problem for
me.
Comment 4 Havoc Pennington 2005-06-17 17:22:00 UTC
I think we have some existing discussion on this topic, either on a bug or
wm-spec-list. I don't remember the details.
Comment 5 Billy Biggs 2005-06-24 01:41:22 UTC
I searched around on both but found nothing relevant.  There's clearly a bug
here if the calls to focus the windows modify the stacking order.
Comment 6 Elijah Newren 2005-07-11 18:01:34 UTC
Okay, I found some time to finally look at this bug.  Found the issue.  Well,
two issues.  One in Metacity, but you should also note that your testcase itself
is buggy.  The metacity issue here is bug 166894; in order to keep windows
stacked together, we raise a window's ancestor instead of raising the window
itself.  But I didn't think about multiple children.  I think we can fix this by
simply raising both the ancestor and the child.

About the bug in the testcase:  Sending a _NET_ACTIVE_WINDOW message with a
timestamp of 0 is inherently broken; we support it as best we can for legacy
applications but it will cause bugs.  But do you even need to send a
_NET_ACTIVE_WINDOW message?  Newly mapped windows don't need activation messages
as they are activated by default.  (If you are doing it because windows aren't
being properly focused due to gtk+ using incorrect timestamps, you should
instead use the gtk+ functions to correctly set _NET_WM_USER_TIME before
mapping).  Your original comment suggests that the situation for your real code
is for newly mapped windows as well (at least, as far as I could tell), but if
it's for some other situation were you really do want to send activation
messages then you need to get a valid timestamp. 
gdk_x11_display_get_user_time() will most likely be of assistance here (it
returns the most recent user interaction with any windows of your application;
if that's the user interaction that caused the activation message to be sent,
then it's what you need; if you have one of the other really rare cases, you'll
need to get a timestamp from the relevant user interaction event).


WARNING: REALLY VERBOSE DETAILS FOLLOW ABOUT TIMESTAMP 0 BEING BAD WITH
_NET_ACTIVE_WINDOW

In case you're curious, here's a detailed history of _NET_ACTIVE_WINDOW messages
with timestamp of 0 and why it's buggy (lots of detail here so that I can just
refer to this comment later when this same issue comes up):

_NET_ACTIVE_WINDOW was originally added to the EWMH spec without a timestamp
parameter (oops!).  The EWMH declares that all unused fields in a client message
should be 0.  Thus, if Metacity gets a _NET_ACTIVE_WINDOW message with a
timestamp of 0, it can't tell whether it came from a legacy client using the old
version of the spec or is a client using the new version of the spec (with an
invalid timestamp).  For backwards compatibility, it must treat this as "please
raise and focus me".  But now there's a dilemma: Metacity has no clue what
timestamp caused the window to be activated.  That's bad.  It means that it's
impossible for Metacity to provide correct behavior in all circumstances and it
just has to take a wild guess.  This is a problem that is definitely accentuated
by focus stealing prevention (since it relies more heavily on having correct
timestamps), but it is not new to 2.10.x; see bug 152000 for all kinds of race
conditions caused by missing timestamps.  IF we didn't have to worry about
legacy apps, we could solve this problem by either ignoring such requests
("obviously this window requested to be activated at some time in the ancient
past and we missed the request until now; no point in activating since the user
has done lots of stuff in the mean time") or treat them as requests to
lower/defocus the window.

Here's a step-by-step explanation of events that occur in this case, which sort
of illustrate why it's problematic:

1) User types in the main window of multiplemodal testcase program at xserver
   timestamp A.
2) Gtk+ receives keystroke event and sets _NET_WM_USER_TIME of the main window
   to A.
2) Program launches new modal dialog (gtk+ sets _NET_WM_USER_TIME of new
   dialog to A, as it should)
3) Metacity gets new map request, checks the timestamp (A), compares it to the
   current window with focus (the main window, which also has timestamp A),
   and thus places the new window on top and with focus
4) Program sends a _NET_ACTIVE_WINDOW message for this modal dialog with
   timestamp 0.
5) Metacity receives the activate request, notices that the timestamp is 0, 
   pings the xserver for a timestamp which will return B (where B is later
   than A), and sets the _NET_WM_USER_TIME of the modal dialog to B.
6) Program launches second modal dialog (gtk+ sets _NET_WM_USER_TIME of second
   modal dialog to A, as it should)
7) Metacity gets new map request, checks the timestamp (A) compares it to the
   current window with focus (the first modal dialog, which has timestamp B),
   notices that "the user has interacted with modal-dialog-1 after having
   launched modal-dialog-2" and thus does not focus the window and stacks it
   below modal dialog 1 so that the user can "continue" interacting with that
   window.
8) Program sends a _NET_ACTIVE_WINDOW message for modal dialog 2 with
   timestamp 0.
9) Metacity receives the activate request, notices that the timestamp is 0, 
   pings the xserver for a timestamp which will return C (where C is later
   than both B and A), raises the second modal dialog (well, it's ancestor
   which doesn't quite work here) and focuses it and sets its
   _NET_WM_USER_TIME to C.

Note that although you sent a _NET_ACTIVE_WINDOW message for the second dialog,
others might not and they'd get the wrong focus/stacking.  Also, in your case,
if the user manages to type in another application while waiting for the window
to appear (which is feasible if the app is running from a remote machine), then
your activate request would erroneously treat the newly created window as having
been interacted with more recently than the application the user is actually
typing in, causing your window to rudely pop up on top and steal focus, likely
sending keystrokes to the wrong place.

Also, you are probably noticing races because although the application can send
xevents and messages in a certain order, Metacity receives them asynchronously.
 Without a timestamp to differentiate, Metacity doesn't know which activate
message was actually first.  (Yes, there is a related problem that is inherently
problematic--namely, if a single user action launches two windows then their
_NET_WM_USER_TIME values will be the same and if the mapping request events are
generated out of order Metacity won't be able to tell.  I don't know a good way
to fix this for Metacity.  But your example makes it more of an issue since you
have two map requests and two activate window messages, allowing for more things
to go out of order)
Comment 7 Billy Biggs 2005-07-11 18:44:31 UTC
Thanks for looking into this.

I'm still confused -- why does the stacking order depend on any timestamps?  If
an application window creates two dialogs, I think the stacking order should
depend entirely on the order of the map requests, regardless of any user
interaction or focus messages.  In what case would the map requests be out of order?
Comment 8 Elijah Newren 2005-07-11 19:19:33 UTC
Created attachment 48977 [details] [review]
Fix meta_window_raise for the case of multiple transients of a single window
Comment 9 Elijah Newren 2005-07-11 21:05:05 UTC
Billy: 
If you want the focusing & stacking to rely on map requests, why are you sending
_NET_ACTIVE_WINDOW messages?  Because you send those messages, it is those
messages that are determining the focusing & stacking order instead of stuff
related to mapping or launch times determining the order.

I believe events such as map request and client messages sent by a single client
can execute in a different order due to the asynchronousness of the X
connection, but my knowledge on what X does and doesn't guarantee with respect
to order are fuzzy so let me avoid that and just address the larger issue: 
You're thinking in terms of a single application whereas metacity has to deal
with several apps (and often can't even tell which windows are a part of which app).

An example:
User launches OOo writer.  User waits.  and waits.  User opens terminal. 
Metacity gets map request for terminal and maps it (with focus and on top, of
course).  User starts typing in the terminal.  Metacity finally gets map request
for OOo writer window.  Metacity maps it.  Metacity does NOT raise or focus
it[1] because it was launched long before the last user interaction and to do so
would rudely interrupt the user typing in their terminal.

Another example:
Two apps send _NET_ACTIVE_WINDOW message requests, both with a timestamp of 0. 
When Metacity receives these events, how does it know which one sent it second?
 Or more importantly, how does it know which one corresponds to a later user
interaction time than the other?  This is an inherent race condition.  (See bug
152000 for all kinds of similar examples that used to happen, e.g. when closing
one window and then quickly clicking in a different one that wouldn't normally
be the next to receive focus)

Does that clear things up?  Also, I'm still curious about my previous
question--namely, why do you need to send a _NET_ACTIVE_WINDOW message?

[1] I just tried this.  Apparently this _NET_ACTIVE_WINDOW message with a 0
timestamp is a more widespread problem because OOo sends such a message
immediately after the window is mapped.  That totally breaks focus stealing
prevention and causes the OOo writer window to RUDELY interrupt what I was in
the middle of doing and steal focus.  That really, really sucks.
Comment 10 Billy Biggs 2005-07-11 21:35:08 UTC
I believe that if there are N windows, all of which are transients of the same
window, their stacking order should depend on the order in which the map
requests are received by the WM.

This is important when an application opens multiple modal dialogs.  You never,
ever want a dialog that is blocking an application to be obscured (and you can
never change the stacking order once it appears).  I do not see how the map
requests from a single application could ever get out of order.

I agree that in the case of multiple applications, things are different, but for
dialogs I believe the rules are clear.

SWT uses gdk_window_focus() (_NET_ACTIVE_WINDOW) in a few cases, most notably in
the implementation of setFocus() on a widget (if one our windows has focus, and
the app has requested that focus go to a widget in a different window, we call
gdk_window_focus()).  It's called when dialogs are opened as a side-effect of
this: the window opens and we request a control in that window get focus, and we
haven't got the event from the WM to know that focus is in that window yet.
Comment 11 Elijah Newren 2005-07-12 05:06:24 UTC
Okay, so after talking with Billy in IRC, I think I understand the issue a
little bit better.  There's also a third bug in addition to the two I discussed
at length above.  The testcase does not exhibit this third bug because of two
important differences between it and his actual problem: (1) in the testcase the
second window is not modal to the first one, (2) there's an actual pause between
the appearance of the first and second modal dialogs (the pause is accidental
and is due to the application just having lots of work to do or something).

So, I'll try to describe the third bug in detail.  Three windows are involved
(A, B, and C) and two timestamps (X and Y).  Here's how the bug is triggered:
  - User action occurs in Window A at timestamp X.
  - dialog B (which is modal to A) is launched as a result of user's
    interaction with Window A (i.e. launchtime of B is X)
  - metacity compares B's launchtime (X) to user_time of A (the
    currently focused window), which is X, so dialog B takes focus.
  - User clicks on dialog B (updating B's timestamp to Y, where Y > X)
  - dialog C (which is modal to both A and B but metacity can only detect that
    it's modal to A) is launched also as a result of user's interaction with
    window A (i.e. launchtime is also X)
  - metacity compares C's launchtime (X) to user_time of B (which is
    Y).  X < Y and C is not detected as being a transient of B, so C does
    not get focus and is stacked behind B.
  - dialog C is smaller in size than dialog B, thus the user can only see
    window A and dialog B.  User's interactions with either will be
    no-ops because C is modal to both.  User thinks application is
    frozen.  User eventually manually kills the app, unless they fortuitously
    move dialog B out of the way of C first and thus discover that C exists.

The big problem here is that metacity simply can't detect that C is modal to
both A and B because Eclipse sets C's WM_TRANSIENT_FOR property to A.  I believe
that's a violation of the EWMH spec[1], though Metacity doesn't support that
part of the spec anyway.  I believe that marking C as a transient for B may
workaround this specific case (though I've never tested
grandparent->parent->child chaining of modality), but I don't know if that's
exactly right and it's skirting the bigger problem that metacity (and gtk+ for
that matter) don't really support app/group-modal windows.  So, we need to add
such support.

Note that if Metacity could detect that C was modal to B, then the changes in
bug 164716 and bug 151996 would kick in and fix this issue.  Bug 126489 is
already filed about that, though it's looking at it from a slightly different
(and less urgent) angle.  Note that bug 126489 links to useful info, in
particular bug bug 109638 and bug 124525.

[1] "_NET_WM_STATE_MODAL indicates that this is a modal dialog box.  If the
WM_TRANSIENT_FOR hint is set to another toplevel window, the dialog is modal for
that window; if WM_TRANSIENT_FOR is not set or set to the root window the dialog
is modal for its window group."  So Eclipse isn't correctly hinting that the
final window is app/group-modal, probably due to the fact that gtk+ doesn't
provide support for it (bug 124525) but also because we don't have any checks
for it (this bug or bug 126489, I guess).
Comment 12 Elijah Newren 2005-07-17 00:47:57 UTC
Okay, so what I said in comment 11 is still valid and we do need to fix the
buggy handling of modal dialogs in metacity and gtk+ and eclipse.  However,
given that gtk+ itself almost enforces buggy behavior (it relies on the app to
set the transient_for hint, only provides a way to hint about parent-modal, and
only provides a way to do group-modal[1], we really need a workaround.

Luckily, I figured out how to provide a workaround that isn't too difficult.  In
bug 166524 we decided to try to avoid having denied-focus windows be obscured by
the focus window.  But we only did that for new windows that went through the
normal algorithm.  We can use the same basic idea with a few restrictions to
avoid dialogs like these being lost.  I'll attach a patch in a minute that does
this.


[1] Okay, technically gtk+ does provide a way of putting windows into new groups
so app authors could make it so that group-modal and parent-modal happen to be
the same in some circumstances (no window can be a member of more than one
group), but that's a royal pain even when it would work.
Comment 13 Elijah Newren 2005-07-17 00:52:10 UTC
Created attachment 49307 [details] [review]
Avoid obscuring additional modal dialogs

I'm a little unsure about my requirement of modality.  It may make sense to
remove that and avoid the obscuring for all transients.

This patch actually fixes a secondary issue while I was at it--I accidentally
assumed workarea.x == 0 && 0 == workarea.y in place.c:find_most_freespace,
which would probably break the function on xinerama.

(Note that the first patch in this bug is still relevant and fixes a separate
issue)
Comment 14 Havoc Pennington 2005-07-24 05:00:35 UTC
Comment on attachment 49307 [details] [review]
Avoid obscuring additional modal dialogs

Now that one is scary...

thanks
Comment 15 Elijah Newren 2005-07-25 04:57:59 UTC
Okay, I committed both patches.  This should provide a fairly good workaround
for this problem.  It'd still be nice to have it fixed completely, but we need
gtk+ to get fixed as well as metacity and the metacity portion of the real fix
is already filed as bug 126489 so I'm going to mark as a duplicate.

2005-07-24  Elijah Newren  <newren@gmail.com>

	* src/place.c (find_most_freespace): try to place windows denied
	focus near the focus window and fix a xinerama bug with the
	placement, (avoid_being_obscured_as_a_second_modal_dialog): avoid
	modal dialogs being obscured in somewhat pathologically strange
	circumstances that Eclipse seems to be good at triggering,
	(meta_window_place): have dialog windows make use of
	avoid_being_obscured_as_a_second_modal_dialog().  Fixes one of the
	issues found in #307875.

2005-07-24  Elijah Newren  <newren@gmail.com>

	* src/window.c (meta_window_raise): raise the window as well as
	its ancestor; fixes a stacking bug with an ancestor that has more
	than one child window.  Fixes one of the issues in #307875.

*** This bug has been marked as a duplicate of 126489 ***