GNOME Bugzilla – Bug 307875
App-modal dialogs need to be supported so they don't pop behind and become lost (+ some other stuff)
Last modified: 2005-07-25 04:57:59 UTC
metacity 2.10.1-2 (Debian unstable) and metacity 2.10.0-1 (Fedora Core 4) If an application opens two modal dialogs in sequence, D1 and then D2, sometimes when D2 opens it appears below D1 in the stacking order (although it does gain focus). This behaviour is new in metacity 2.10 and is not reproducable with earlier versions. Both D1 and D2 are marked as transients for the main window, and both have the dialog type hint set. The problem can be reproduced in eclipse by selecting a file in the resource navigator, hitting Ctrl+C to copy, and Ctrl+V to paste. Two dialogs appear, the first is a progress dialog, the second is a name conflict dialog requesting user input. It always opens below the progress dialog, obscured from view, and cannot be raised above it. I can reproduce this in a small SWT application by simply creating two dialogs as children of a main window. The application is attached, to run it using gij, use: gij-4.0 --cp . -Djava.library.path=. TwoModalDialogs With the small example, it is clear that this is a race. The problem does not happen all the time. However, the race does not depend on when the second dialog is opened. Even if I do this: open 1st dialog flush event loop sleep for 1 second flush event loop open 2nd dialog It can still happen that sometimes the second dialog appears below the first. I have so far been unable to construct a small GTK+ example, nor have I been able to conclusively isolate what triggers it in the SWT code.
Created attachment 47846 [details] focustest.tar.gz
Based on some more experiments, my best guess for what is going on is that if a window gets a _NET_ACTIVE_WINDOW request with a timestamp newer than the last user interaction during open, it appears below the currently active window in the stacking order.
Created attachment 47881 [details] multiplemodal.c Here is a small GTK+ application which consistently reproduces the problem for me.
I think we have some existing discussion on this topic, either on a bug or wm-spec-list. I don't remember the details.
I searched around on both but found nothing relevant. There's clearly a bug here if the calls to focus the windows modify the stacking order.
Okay, I found some time to finally look at this bug. Found the issue. Well, two issues. One in Metacity, but you should also note that your testcase itself is buggy. The metacity issue here is bug 166894; in order to keep windows stacked together, we raise a window's ancestor instead of raising the window itself. But I didn't think about multiple children. I think we can fix this by simply raising both the ancestor and the child. About the bug in the testcase: Sending a _NET_ACTIVE_WINDOW message with a timestamp of 0 is inherently broken; we support it as best we can for legacy applications but it will cause bugs. But do you even need to send a _NET_ACTIVE_WINDOW message? Newly mapped windows don't need activation messages as they are activated by default. (If you are doing it because windows aren't being properly focused due to gtk+ using incorrect timestamps, you should instead use the gtk+ functions to correctly set _NET_WM_USER_TIME before mapping). Your original comment suggests that the situation for your real code is for newly mapped windows as well (at least, as far as I could tell), but if it's for some other situation were you really do want to send activation messages then you need to get a valid timestamp. gdk_x11_display_get_user_time() will most likely be of assistance here (it returns the most recent user interaction with any windows of your application; if that's the user interaction that caused the activation message to be sent, then it's what you need; if you have one of the other really rare cases, you'll need to get a timestamp from the relevant user interaction event). WARNING: REALLY VERBOSE DETAILS FOLLOW ABOUT TIMESTAMP 0 BEING BAD WITH _NET_ACTIVE_WINDOW In case you're curious, here's a detailed history of _NET_ACTIVE_WINDOW messages with timestamp of 0 and why it's buggy (lots of detail here so that I can just refer to this comment later when this same issue comes up): _NET_ACTIVE_WINDOW was originally added to the EWMH spec without a timestamp parameter (oops!). The EWMH declares that all unused fields in a client message should be 0. Thus, if Metacity gets a _NET_ACTIVE_WINDOW message with a timestamp of 0, it can't tell whether it came from a legacy client using the old version of the spec or is a client using the new version of the spec (with an invalid timestamp). For backwards compatibility, it must treat this as "please raise and focus me". But now there's a dilemma: Metacity has no clue what timestamp caused the window to be activated. That's bad. It means that it's impossible for Metacity to provide correct behavior in all circumstances and it just has to take a wild guess. This is a problem that is definitely accentuated by focus stealing prevention (since it relies more heavily on having correct timestamps), but it is not new to 2.10.x; see bug 152000 for all kinds of race conditions caused by missing timestamps. IF we didn't have to worry about legacy apps, we could solve this problem by either ignoring such requests ("obviously this window requested to be activated at some time in the ancient past and we missed the request until now; no point in activating since the user has done lots of stuff in the mean time") or treat them as requests to lower/defocus the window. Here's a step-by-step explanation of events that occur in this case, which sort of illustrate why it's problematic: 1) User types in the main window of multiplemodal testcase program at xserver timestamp A. 2) Gtk+ receives keystroke event and sets _NET_WM_USER_TIME of the main window to A. 2) Program launches new modal dialog (gtk+ sets _NET_WM_USER_TIME of new dialog to A, as it should) 3) Metacity gets new map request, checks the timestamp (A), compares it to the current window with focus (the main window, which also has timestamp A), and thus places the new window on top and with focus 4) Program sends a _NET_ACTIVE_WINDOW message for this modal dialog with timestamp 0. 5) Metacity receives the activate request, notices that the timestamp is 0, pings the xserver for a timestamp which will return B (where B is later than A), and sets the _NET_WM_USER_TIME of the modal dialog to B. 6) Program launches second modal dialog (gtk+ sets _NET_WM_USER_TIME of second modal dialog to A, as it should) 7) Metacity gets new map request, checks the timestamp (A) compares it to the current window with focus (the first modal dialog, which has timestamp B), notices that "the user has interacted with modal-dialog-1 after having launched modal-dialog-2" and thus does not focus the window and stacks it below modal dialog 1 so that the user can "continue" interacting with that window. 8) Program sends a _NET_ACTIVE_WINDOW message for modal dialog 2 with timestamp 0. 9) Metacity receives the activate request, notices that the timestamp is 0, pings the xserver for a timestamp which will return C (where C is later than both B and A), raises the second modal dialog (well, it's ancestor which doesn't quite work here) and focuses it and sets its _NET_WM_USER_TIME to C. Note that although you sent a _NET_ACTIVE_WINDOW message for the second dialog, others might not and they'd get the wrong focus/stacking. Also, in your case, if the user manages to type in another application while waiting for the window to appear (which is feasible if the app is running from a remote machine), then your activate request would erroneously treat the newly created window as having been interacted with more recently than the application the user is actually typing in, causing your window to rudely pop up on top and steal focus, likely sending keystrokes to the wrong place. Also, you are probably noticing races because although the application can send xevents and messages in a certain order, Metacity receives them asynchronously. Without a timestamp to differentiate, Metacity doesn't know which activate message was actually first. (Yes, there is a related problem that is inherently problematic--namely, if a single user action launches two windows then their _NET_WM_USER_TIME values will be the same and if the mapping request events are generated out of order Metacity won't be able to tell. I don't know a good way to fix this for Metacity. But your example makes it more of an issue since you have two map requests and two activate window messages, allowing for more things to go out of order)
Thanks for looking into this. I'm still confused -- why does the stacking order depend on any timestamps? If an application window creates two dialogs, I think the stacking order should depend entirely on the order of the map requests, regardless of any user interaction or focus messages. In what case would the map requests be out of order?
Created attachment 48977 [details] [review] Fix meta_window_raise for the case of multiple transients of a single window
Billy: If you want the focusing & stacking to rely on map requests, why are you sending _NET_ACTIVE_WINDOW messages? Because you send those messages, it is those messages that are determining the focusing & stacking order instead of stuff related to mapping or launch times determining the order. I believe events such as map request and client messages sent by a single client can execute in a different order due to the asynchronousness of the X connection, but my knowledge on what X does and doesn't guarantee with respect to order are fuzzy so let me avoid that and just address the larger issue: You're thinking in terms of a single application whereas metacity has to deal with several apps (and often can't even tell which windows are a part of which app). An example: User launches OOo writer. User waits. and waits. User opens terminal. Metacity gets map request for terminal and maps it (with focus and on top, of course). User starts typing in the terminal. Metacity finally gets map request for OOo writer window. Metacity maps it. Metacity does NOT raise or focus it[1] because it was launched long before the last user interaction and to do so would rudely interrupt the user typing in their terminal. Another example: Two apps send _NET_ACTIVE_WINDOW message requests, both with a timestamp of 0. When Metacity receives these events, how does it know which one sent it second? Or more importantly, how does it know which one corresponds to a later user interaction time than the other? This is an inherent race condition. (See bug 152000 for all kinds of similar examples that used to happen, e.g. when closing one window and then quickly clicking in a different one that wouldn't normally be the next to receive focus) Does that clear things up? Also, I'm still curious about my previous question--namely, why do you need to send a _NET_ACTIVE_WINDOW message? [1] I just tried this. Apparently this _NET_ACTIVE_WINDOW message with a 0 timestamp is a more widespread problem because OOo sends such a message immediately after the window is mapped. That totally breaks focus stealing prevention and causes the OOo writer window to RUDELY interrupt what I was in the middle of doing and steal focus. That really, really sucks.
I believe that if there are N windows, all of which are transients of the same window, their stacking order should depend on the order in which the map requests are received by the WM. This is important when an application opens multiple modal dialogs. You never, ever want a dialog that is blocking an application to be obscured (and you can never change the stacking order once it appears). I do not see how the map requests from a single application could ever get out of order. I agree that in the case of multiple applications, things are different, but for dialogs I believe the rules are clear. SWT uses gdk_window_focus() (_NET_ACTIVE_WINDOW) in a few cases, most notably in the implementation of setFocus() on a widget (if one our windows has focus, and the app has requested that focus go to a widget in a different window, we call gdk_window_focus()). It's called when dialogs are opened as a side-effect of this: the window opens and we request a control in that window get focus, and we haven't got the event from the WM to know that focus is in that window yet.
Okay, so after talking with Billy in IRC, I think I understand the issue a little bit better. There's also a third bug in addition to the two I discussed at length above. The testcase does not exhibit this third bug because of two important differences between it and his actual problem: (1) in the testcase the second window is not modal to the first one, (2) there's an actual pause between the appearance of the first and second modal dialogs (the pause is accidental and is due to the application just having lots of work to do or something). So, I'll try to describe the third bug in detail. Three windows are involved (A, B, and C) and two timestamps (X and Y). Here's how the bug is triggered: - User action occurs in Window A at timestamp X. - dialog B (which is modal to A) is launched as a result of user's interaction with Window A (i.e. launchtime of B is X) - metacity compares B's launchtime (X) to user_time of A (the currently focused window), which is X, so dialog B takes focus. - User clicks on dialog B (updating B's timestamp to Y, where Y > X) - dialog C (which is modal to both A and B but metacity can only detect that it's modal to A) is launched also as a result of user's interaction with window A (i.e. launchtime is also X) - metacity compares C's launchtime (X) to user_time of B (which is Y). X < Y and C is not detected as being a transient of B, so C does not get focus and is stacked behind B. - dialog C is smaller in size than dialog B, thus the user can only see window A and dialog B. User's interactions with either will be no-ops because C is modal to both. User thinks application is frozen. User eventually manually kills the app, unless they fortuitously move dialog B out of the way of C first and thus discover that C exists. The big problem here is that metacity simply can't detect that C is modal to both A and B because Eclipse sets C's WM_TRANSIENT_FOR property to A. I believe that's a violation of the EWMH spec[1], though Metacity doesn't support that part of the spec anyway. I believe that marking C as a transient for B may workaround this specific case (though I've never tested grandparent->parent->child chaining of modality), but I don't know if that's exactly right and it's skirting the bigger problem that metacity (and gtk+ for that matter) don't really support app/group-modal windows. So, we need to add such support. Note that if Metacity could detect that C was modal to B, then the changes in bug 164716 and bug 151996 would kick in and fix this issue. Bug 126489 is already filed about that, though it's looking at it from a slightly different (and less urgent) angle. Note that bug 126489 links to useful info, in particular bug bug 109638 and bug 124525. [1] "_NET_WM_STATE_MODAL indicates that this is a modal dialog box. If the WM_TRANSIENT_FOR hint is set to another toplevel window, the dialog is modal for that window; if WM_TRANSIENT_FOR is not set or set to the root window the dialog is modal for its window group." So Eclipse isn't correctly hinting that the final window is app/group-modal, probably due to the fact that gtk+ doesn't provide support for it (bug 124525) but also because we don't have any checks for it (this bug or bug 126489, I guess).
Okay, so what I said in comment 11 is still valid and we do need to fix the buggy handling of modal dialogs in metacity and gtk+ and eclipse. However, given that gtk+ itself almost enforces buggy behavior (it relies on the app to set the transient_for hint, only provides a way to hint about parent-modal, and only provides a way to do group-modal[1], we really need a workaround. Luckily, I figured out how to provide a workaround that isn't too difficult. In bug 166524 we decided to try to avoid having denied-focus windows be obscured by the focus window. But we only did that for new windows that went through the normal algorithm. We can use the same basic idea with a few restrictions to avoid dialogs like these being lost. I'll attach a patch in a minute that does this. [1] Okay, technically gtk+ does provide a way of putting windows into new groups so app authors could make it so that group-modal and parent-modal happen to be the same in some circumstances (no window can be a member of more than one group), but that's a royal pain even when it would work.
Created attachment 49307 [details] [review] Avoid obscuring additional modal dialogs I'm a little unsure about my requirement of modality. It may make sense to remove that and avoid the obscuring for all transients. This patch actually fixes a secondary issue while I was at it--I accidentally assumed workarea.x == 0 && 0 == workarea.y in place.c:find_most_freespace, which would probably break the function on xinerama. (Note that the first patch in this bug is still relevant and fixes a separate issue)
Comment on attachment 49307 [details] [review] Avoid obscuring additional modal dialogs Now that one is scary... thanks
Okay, I committed both patches. This should provide a fairly good workaround for this problem. It'd still be nice to have it fixed completely, but we need gtk+ to get fixed as well as metacity and the metacity portion of the real fix is already filed as bug 126489 so I'm going to mark as a duplicate. 2005-07-24 Elijah Newren <newren@gmail.com> * src/place.c (find_most_freespace): try to place windows denied focus near the focus window and fix a xinerama bug with the placement, (avoid_being_obscured_as_a_second_modal_dialog): avoid modal dialogs being obscured in somewhat pathologically strange circumstances that Eclipse seems to be good at triggering, (meta_window_place): have dialog windows make use of avoid_being_obscured_as_a_second_modal_dialog(). Fixes one of the issues found in #307875. 2005-07-24 Elijah Newren <newren@gmail.com> * src/window.c (meta_window_raise): raise the window as well as its ancestor; fixes a stacking bug with an ancestor that has more than one child window. Fixes one of the issues in #307875. *** This bug has been marked as a duplicate of 126489 ***