GNOME Bugzilla – Bug 392889
Metacity freezes using emacs with accessibility enabled
Last modified: 2020-11-06 20:07:50 UTC
Please describe the problem:
('frame' in Emacs means 'window' in WM terms.) In Emacs, the function 'raise-frame', which is supposed to bring the window to the front, fails in metacity but succeeds in many other WMs. The Emacs developers have created workarounds in the past, but now it fails again in metacity 2.16, and they are about to give up on this. Could you look at this issue? The discussion starts here, and the suspicious code is posted:
http://permalink.gmane.org/gmane.emacs.pretest.bugs/16379
http://permalink.gmane.org/gmane.emacs.pretest.bugs/16385

Steps to reproduce:
1. Start Emacs
2. M-x server-start
3. In a shell, run: emacsclient -e "(raise-frame)"

Actual results: The Emacs window keeps flashing in the taskbar.
Expected results: The Emacs window should be raised.
Does this happen every time? Yes
Other information: This was tested with Emacs 22.0.92. raise-frame in Emacs 21 might fail too.
I don't know if it's the problem, but the timestamp sent by that Emacs code is gibberish, which could break something even if it isn't the issue here. (Assuming I understand the Emacs code.) I don't believe raise-frame is intended to unconditionally work in metacity, btw. This is legitimate window manager behavior and no spec requires that the WM unconditionally honors a raise request.
Emacs first sent timestamp 0, but metacity complained about that, so we tried something else. The timestamp is not clearly defined in the extended window manager hints specification. Note that Emacs in the current CVS (which will be released soon) does not send _NET_ACTIVE_WINDOW (the code is commented out), as some Metacity version hung when we did that. So currently there is only a call to XRaiseWindow/XFlush. Is there a way to make Metacity raise windows, or shall Emacs just document the fact that Metacity does not honor XRaiseWindow?
(In reply to comment #1) > I don't believe raise-frame is intended to unconditionally work in metacity, > btw. This is legitimate window manager behavior and no spec requires that the > WM unconditionally honors a raise request. Are you sure this is a deliberate decision by Metacity rather than a bug? I understand that a window manager is free to react any way it wants to, but if an application calls XRaiseWindow and the window manager does not raise the corresponding window, it should have a good reason for it. So could you explain why Metacity refuses to raise the window in our case? After all, to the OP, this behavior looked like a bug.
IIRC the current draft spec says the timestamp should be the USER_TIME hint set on the window; but if unspecified in EWMH, what's intended is always the timestamp of the button or key event the app is currently processing (i.e. the event that caused the behavior you are asking for). If you don't have an event, the right way to get the timestamp is to set a property to get a PropertyNotify event and use the timestamp from the property notify event. There are little utility functions to do that floating around in various codebases including gtk and metacity. Emacs is probably already setting the correct timestamp when doing things like selection/clipboard handling, so perhaps some code can be copied from there. The purpose of this is to avoid race conditions by always ordering effects in the same order as their causes, i.e. if the user clicks in two places to raise two windows, the window clicked second should always "win" (a common case of this is launching an app, then clicking back on another app). Or when setting the clipboard contents, the most recently copied needs to win, which isn't guaranteed without the timestamps. Re: why metacity ignores some window raises; I am not up on the current code, but one reason it might happen is "focus stealing prevention" which is essentially that if you click in a way that cause a raise, then focus another app before the raise occurs, then the raise will not occur. Again a common case is to launch a new app, then click back on an old app. Often metacity will set a hint causing the taskbar button to flash, instead of making the window active, if it believes the user is busy right now. I think I saw on another bug that in mouse focus modes, raises may be disabled completely, but I don't know the details of that or why. Elijah will know. 
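The ordering rule described above can be illustrated with a toy model. This is only a sketch, in Python, of the idea that a request carrying an older timestamp loses to a newer user interaction; the class ToyWindowManager, its fields, and the return strings are hypothetical and are not Metacity's code. Timestamp wraparound is ignored here for brevity.

```python
from dataclasses import dataclass


@dataclass
class ToyWindowManager:
    """Toy model of timestamp-ordered activation (not Metacity's real code)."""

    active_window: str = ""
    last_user_time: int = 0  # server time (ms) of the newest honored interaction

    def request_activate(self, window: str, timestamp: int) -> str:
        # A request caused by an event older than the user's latest
        # interaction lost the race: flag it instead of stealing focus.
        if timestamp < self.last_user_time:
            return "demands-attention"
        self.active_window = window
        self.last_user_time = timestamp
        return "activated"


wm = ToyWindowManager()
# The user launches an app at t=1000, then clicks back on the editor at t=1500.
# The editor's request arrives first; the slow app asks for activation later.
print(wm.request_activate("editor", 1500))   # activated
print(wm.request_activate("new-app", 1000))  # demands-attention
print(wm.active_window)                      # editor
```

Because the decision depends only on the timestamps of the causing events, the outcome is the same whichever request happens to reach the window manager first.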
btw the panel pager and many apps use _NET_ACTIVE_WINDOW, so it is widely used; I would be surprised if it hangs in any widely-deployed version of metacity, unless the hang depends on some other less-common aspect of the situation. This would be good to debug. Finally, XRaiseWindow() and _NET_ACTIVE_WINDOW are different things with different goals; _NET_ACTIVE_WINDOW is saying "this window wants to be in the foreground and have focus" and is a semi-semantic hint; XRaiseWindow() is harder for metacity to interpret (should the window also be focused? what is the intent?). In click to focus mode for example, the user-expected invariant is that raised to top and focused almost always correspond with one another. In any case, it would be a good idea to define which of these the Emacs function is intended to correspond to - is the Emacs function defined to be XRaiseWindow() or defined to be "make this the active window"
About timestamps. raise-frame is a Lisp function, so it may not be associated with an X event at all. But we can fix that if we need to. Emacs just wants to do XRaiseWindow in this case, i.e. focus goes wherever the window manager puts it. There is an obsolete function, focus-frame, to set focus. But this currently doesn't do anything, because focus is controlled by the window manager and apps shouldn't try to adjust focus.
If a WM allows any XRaiseWindow() unmodified, it can result in broken invariants; for example, in the normal, default click to focus mode, it is an invariant that the on top window is also focused (setting aside special windows such as the panel). So it should not be allowed to raise a window without making it the active window. If raise-frame is defined as simply XRaiseWindow(), most likely Emacs should have some higher-level wrapper functions that Lisp code should use depending on the intent. For example, GTK+ has a gtk_window_present() that's intended for use when a window should be brought to the user's attention, for example because the user re-selected the Preferences menu item and the Preferences dialog is already open. Then you would gtk_window_present() the Preferences window. gtk_window_present() uses _NET_ACTIVE_WINDOW and if the user has typed in another window since the timestamp in the _NET_ACTIVE_WINDOW, the window won't be focused but its taskbar icon may blink or equivalent. However, a different situation would be writing some sort of special application, say some kind of dock or sidebar; in that case a programmer might need to use XRaiseWindow() specifically to achieve a desired effect, and semantically the XRaiseWindow() might not be intended as a "set the active window" request. I am having trouble even thinking of an example here, in general I would say XRaiseWindow() on toplevel managed windows should be rarely useful. The recommended thing is to be sure the Emacs Lisp API allows programs to indicate their intent - is it an "activate window due to some user interaction" request or a "call this Xlib function" request. If it is a _NET_ACTIVE_WINDOW / activate window request, the timestamp is essential to avoid bugs if there is an associated event. If there really is no event (for example, the window is activated by a timer function), then the desirable behavior is usually to only flash the taskbar icon and not in fact raise the window. 
Anyway, for more details you probably want Elijah and not me, since I'm too lazy to go unpack what the current code does ;-) Incidentally you should be aware of how to check for EWMH hint support if you haven't noticed that part of the spec; GTK+ has gdk_net_wm_supports() which implements this. You can check gdk_net_wm_supports("_NET_ACTIVE_WINDOW") which should allow you to decide whether to use _NET_ACTIVE_WINDOW or fall back to XRaiseWindow() for ancient WMs. btw, If there are versions of metacity which hang when using the hints correctly, my advice to you would be to just use the hints correctly and let the major distributions push a metacity update. If everyone tries to work around everyone else it tends to go in circles, while if everyone just follows the spec things reach a steady state where it all works.
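For a toolkit-independent version of the support check mentioned above, the decision can be sketched as follows. This is a minimal Python illustration, not Emacs or GTK+ code: it assumes the caller has already read the atom names listed in the root window's _NET_SUPPORTED property, and the function names are hypothetical. The payload layout (source indication, timestamp, requestor's currently active window) follows the _NET_ACTIVE_WINDOW client message as described in the EWMH spec.

```python
from typing import Iterable, List

NET_ACTIVE_WINDOW = "_NET_ACTIVE_WINDOW"
SOURCE_APPLICATION = 1  # EWMH source indication: request from a normal application


def choose_raise_method(net_supported: Iterable[str]) -> str:
    """Pick the request to send, given the atom names listed in _NET_SUPPORTED."""
    if NET_ACTIVE_WINDOW in set(net_supported):
        return NET_ACTIVE_WINDOW
    return "XRaiseWindow"  # fallback for window managers without EWMH support


def active_window_payload(timestamp: int, current_active: int = 0) -> List[int]:
    """The five 32-bit data words of a _NET_ACTIVE_WINDOW client message."""
    return [SOURCE_APPLICATION, timestamp, current_active, 0, 0]


print(choose_raise_method(["_NET_WM_STATE", "_NET_ACTIVE_WINDOW"]))
print(choose_raise_method([]))
print(active_window_payload(timestamp=123456))
```

The timestamp argument is meant to be the time of the event that caused the request, as discussed earlier in the thread, not an arbitrary value.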
(In reply to comment #6) > If a WM allows any XRaiseWindow() unmodified, it can result in broken > invariants; Obviously. This is not what is being asked. > such as the panel). So it should not be allowed to raise a window without > making it the active window. There are many options: for example, it could raise the window somewhat but still keep it lower than the current active window. > If raise-frame is defined as simply XRaiseWindow(), most likely Emacs should > have some higher-level wrapper functions that Lisp code should use depending on > the intent. AFAIK the intent is pretty clear: make the window more visible by bringing it further up in the stack of windows. > I am having trouble even thinking of an example here, in > general I would say XRaiseWindow() on toplevel managed windows should be rarely > useful. Lack of imagination. Think of applications with multiple toplevel managed windows, where a command in one window has as its main effect a change in another window. > The recommended thing is to be sure the Emacs Lisp API allows programs to > indicate their intent - is it an "activate window due to some user interaction" > request or a "call this Xlib function" request. XRaiseWindow does not have anything to do with focus, only with the stacking order. It only means "make the window less-occluded". Emacs indeed has a problem here because it doesn't have a working "focus-frame" function for those cases where you want the focus to jump from one window to another. I guess we could implement it with _NET_ACTIVE_WINDOW. It seems the main problem is that Metacity may behave poorly with apps that are not specifically re-written for EWMH. But the "misbehavior" is apparently intentional.
Warning: *Really* verbose post follows. (In reply to comment #2) > Note that Emacs in the current CVS (which will be released soon) does not > send _NET_ACTVATE_WINDOW (the code is commented out) as some Metacity > version hanged when we did that. By "hang" do you mean completely unresponsive, or refuses to transfer focus? (If the users clicked on the X in the upper right corner of the frame of a window, would it close? If some application tried to open a new window, would it show?) I'm guessing the latter, in particular due to the fact that you used something random for the timestamp. We didn't initially think to have Metacity check for apps sending random values for the timestamp and try to workaround such bugs; in the case of _NET_ACTIVE_WINDOW this could result in metacity thinking some window had most recently been activated up to about 25 days in the future. This resulted in subsequent activation requests (_NET_ACTIVE_WINDOW messages or clicking in windows) being treated as "too old" and being ignored. No such problems should be observed in any metacity version if you set the timestamp properly. (If it really was a hard hang you observed by using _NET_ACTIVE_WINDOW, I'd love to know as I had never heard of any such problems) (In reply to comment #3) > I understand that a window manager is free to react any way it wants to, but > if an application calls XRaiseWindow and the window manager does not raise > the corresponding window, it should have a good reason for it. So could you > explain why Metacity refuses to raise the window in our case? After all, to > the OP, this behavior looked like a bug. Very Basic Explanation of why (more detailed explanation comes much later): We'd love to ignore XRaiseWindow requests unconditionally; it's just bad API for modern window managers since it simply does not convey enough information (see Havoc's response plus extra details below). That probably isn't realistic right now (maybe in 10 years?) 
so metacity honors XRaiseWindow if the window being raised is the currently active window (or in the same "application" as defined by group leader hints or the inactive window somehow has a newer user_time hint than the active one). If the window attempting to be raised isn't the active window, why would we raise it? That probably corresponds to either (1) a request getting to us after the user has already switched to another application or (2) braindead apps like old versions of mozilla or eclipse that allow their windows to randomly be raised when the user didn't request it and thus interrupting what the user is busy doing. What to do about it: If you have one of the exceptions that don't fall into the above list and for which something really should be done, there is almost certainly a better "API" for it elsewhere. For example, if you have one application trying to activate a different one (e.g. the user clicking a url in an email client and the email client wanting to activate the web browser) then XRaiseWindow() is insufficient anyway and the app should be using something like _NET_ACTIVE_WINDOW instead. If there isn't a better "API" elsewhere, we should create it. (In reply to comment #4) > IIRC the current draft spec says the timestamp should be the USER_TIME hint > set on the window; but if unspecified in EWMH, what's intended is always the > timestamp of the button or key event the app is currently processing (i.e. > the event that caused the behavior you are asking for). Not quite; here's how I'd put it: The timestamp passed should be the timestamp of the event that caused the requested behavior, or as a fallback, the last time the user interacted (clicked/typed) with the application. Nasty technical details explaining this wording: Matthias added something to the spec so the user time hint isn't actually set on the toplevel window but rather a separate unshown xwindow, so USER_TIME may not be set on the window but some other window. 
Further, the user time may not actually be updated for either of these windows if the application is using some ancient or one-off or not-up-to-date toolkit, so you wouldn't want either of these anyway. The current event being processed is not necessarily the event that caused the behavior being asked for. If these two are different, you always want the timestamp of the event that caused the behavior to be requested rather than the current event being processed. 99.9% of the time, the timestamp of the event that caused the behavior to be requested is the last x timestamp that the user interacted with the application. > I think I saw on another bug that in mouse focus modes, raises may be disabled > completely, but I don't know the details of that or why. Elijah will know. I really need to update the description of !raise_on_click, yes. All app authors need to know is that they can and should ignore this setting, and tell users its their fault if they don't like the behavior of that setting. (I'm not kidding about that.) Nasty details about how I'd update the wording: The main thing to add is that app authors should ignore the raise_on_click setting (assuming it is always true) and tell users they misconfigured their window manager if they modify it and don't like any resulting behavior. !raise_on_click means windows are only raised if (1) the window was just barely launched, (2) the user manually raises the window directly (clicks on frame, alt-tabs to it, or alt-clicks on window), or (3) a pager (tasklist applet, window selector applet, etc.) requests the window to be activated. In particular, !raise_on_click means to specifically ignore any programmatic requests from normal applications to raise windows under _any_ conditions or as side-effects of _any_ other actions. If users don't like that behavior, they shouldn't use that pref and should use the default raise_on_click behavior instead. (In reply to comment #5) > About timestamps. 
The raise-frame is a Lisp function so it may not be > associated with an X event at all. But we can fix that if we need it. > > Emacs just want to do XRaiseFrame in this case, i.e. focus goes wherever the > window manager puts it. Havoc already responded to this in terms of focus, but there are lots of extra cases too. So, Havoc's response combined with the following is the more detailed version of why we *can't* make XRaiseWindow() work right in all cases: If the window is minimized, should it be unminimized? Should the request be ignored? Should the stacking be changed such that when it is unminimized it has been raised relative to other windows (assuming there's a way to unminimize without simultaneously raising, of course)? If the desktop is in show-desktop mode, should we exit? Should the request be ignored? Should the request be noted and the window's position modified relative to the others once the show-desktop mode is left? If the window is on another workspace and we get a raise request, should we switch the window to the current workspace? Raise it relative to the other windows on its own workspace but leave it there? Ignore the request? If there is a different fullscreen window at the time, should raising the window bring it above the fullscreen window? We currently only do so if the focus switches from the fullscreen window to another window (partially dropping the fullscreen window out of fullscreen mode). Is that what is really wanted? There may be other cases I'm not thinking of. In addition, future WMs may have other states/abilities adding other difficulties to this question. XRaiseWindow is inherently problematic because of all these issues. We have no clue about what is proper for each case and receive no hints from applications on the matter. We already know that one size does not fit all; some apps use the hint to mean "activate this window", while others use it to mean "this window needs attention from the user." 
In fact, I currently see no reason apps (or pagers) should use stacking requests. It makes sense for me for apps to try to "activate" a window, which is what _NET_ACTIVE_WINDOW exists for. They could try to request attention through _NET_WM_DEMANDS_ATTENTION. If there are other _good_ use cases, there should either be a EWMH hint or we should use one. Hope that helps.
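The "about 25 days in the future" figure mentioned in this comment follows from X server times being 32-bit millisecond counters that wrap around. The Python sketch below works through that arithmetic with a wrap-safe comparison of the usual kind; the function name and the sample values are illustrative assumptions, not Metacity's actual code.

```python
TIME_MODULUS = 1 << 32             # X server times are 32-bit millisecond counters
HALF_RANGE = TIME_MODULUS // 2
MS_PER_DAY = 24 * 60 * 60 * 1000


def time_is_before(a: int, b: int) -> bool:
    """True if a is earlier than b, treating the counter as circular."""
    diff = (b - a) % TIME_MODULUS
    return 0 < diff < HALF_RANGE


# Any value less than half the range ahead of "now" looks like the future.
print(round(HALF_RANGE / MS_PER_DAY, 2))    # 24.86 (days)

now = 5_000_000
bogus = now + 20 * MS_PER_DAY               # random timestamp sent by an app
real_click = now + 60_000                   # genuine user click a minute later

# If the WM recorded `bogus` as the latest activation time, the genuine
# click compares as older and would be treated as "too old".
print(time_is_before(real_click, bogus))    # True
```

This is why a single garbage timestamp can cause later, correctly stamped activation requests to be ignored until the server time catches up.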
> It seems the main problem is that Metacity may behave poorly with apps that > are not specifically re-written for EWMH. I would put it differently; metacity tries to put all actions/behaviors into a coherent overall scheme for "how the desktop works" - for pre-EWMH apps, sometimes it does not have as much information as it does for EWMH apps, but it still makes a best effort to figure out the best thing to do. This is the technical definition of a WM after all, it's the single X client that registers to intercept and reinterpret X requests for child windows of the root window in order to impose a window management policy. Metacity (like several other WMs and the EWMH) tends toward a "coherent overall scheme" that is somewhat more Windows-like than traditional-fvwm/twm-like. But that is why the WM is pluggable, so people can choose the WM they like. I would consider e.g. the "can't raise without focusing" invariant to be a legitimate behavior that metacity users might prefer. But other users of other WMs might not. There's no reason for Emacs to worry about this policy one way or the other; the important thing in Emacs (or any app) is to convey as much information as possible. For example, if the window should be "presented" use _NET_ACTIVE_WINDOW, and try to use accurate timestamps whenever possible. It is of course possible to write a WM that only draws decorations, and does not ever second-guess application requests. I don't know of such a WM though; perhaps twm comes close, but even twm and other old WMs will veto apps from time to time. In any case, again you really don't have to worry about it as an app author. If metacity hangs, or does something annoying, as long as you've provided accurate information according to the specs, you can simply refer users to the WM developers for bug reports.
In comment 6:
> You can check gdk_net_wm_supports("_NET_ACTIVE_WINDOW") which should allow you to decide whether to use _NET_ACTIVE_WINDOW or fall back to XRaiseWindow() for ancient WMs.

As Emacs supports multiple toolkits, we cannot use gdk_net_wm_supports except in the Gtk+ port. But we can implement the same logic.

In comment 8:
> By "hang" do you mean completely unresponsive, or refuses to transfer focus? (If the users clicked on the X in the upper right corner of the frame of a window, would it close? If some application tried to open a new window, would it show?) I'm guessing the latter, in particular due to the fact that you used something random for the timestamp.

The original report is quoted here:
> Essentially, metacity grabs the mouse then stops, waiting for something; the only way to get my desktop back is to restart metacity from somewhere else.

Later we changed how Emacs sends _NET_ACTIVE_WINDOW a bit, and a different behaviour was seen:
> - With CVS grabbed in the morning (US/Mountain) of November 30: locks up maybe one time in five - but still definitely locks up. It is, however, easier to unwedge: for whatever reason, switching to another virtual console and back shakes things loose.

The version was metacity-2.17.2-1.fc7. At this point we decided to remove _NET_ACTIVE_WINDOW. But from your suggestions I think we will re-enable it with the correct timestamp (last X interaction).
Metacity is consistently locking up for me with CVS emacs. I can reproduce the hang when I either try to resize the window with the mouse, or press Alt+Tab to switch to another window. I can reproduce this very easily with CVS emacs and metacity 2.18.5.
I think it's pretty likely that Rodney's issue is different. I'm experiencing the same and can reproduce, but it doesn't seem to match the prior description. However, the debug spew shows current metacity locking with current emacs at the gtk_widget_show_all() call inside meta_ui_tab_popup_set_showing() (inside metacity/src/tabpopup.c). So, this is some kind of ugly gtk-interaction issue, which sucks, since I'm not so good at debugging the in-process gtk-metacity interaction issues. :-(
Elijah, post a backtrace?
Created attachment 94736 [details] Back trace (2 of them, both 'bt' and 'bt full') Seems to go from gtk_widget_show_all() into the accessibility stack and then into ORBit...and then back to g_main_context_iterate?
The g_main_context_iterate looks like the famous corba reentrancy that causes so many bugs. I don't have an explanation for exactly why it hangs, but anything metacity or gtk does in the main loop could have just happened in the main loop there before the hang. You could probably break on the a11y call in gdb and then step through execution and see what happens, unless it's a time-sensitive race. I think it's a bug in the a11y stack if they make a blocking corba call that runs the main loop when widgets are shown. There's no way apps can be robust against that. I don't know if this is new or not, but especially if it's new it should be reverted immediately. If it isn't new it should still be fixed, along with all similar cases. Running a recursive main loop is equivalent to starting a second thread but not using any mutexes.
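The reentrancy hazard described in this comment can be shown with a toy event loop. This Python sketch only illustrates the general pattern (a handler that makes a "blocking" call which itself dispatches other handlers); all names are hypothetical, and it does not reproduce the actual metacity, GTK+, or ORBit code paths.

```python
from collections import deque
from typing import Callable, Deque, List

queue: Deque[Callable[[], None]] = deque()
log: List[str] = []
state = {"popup_half_built": False}


def iterate_once() -> None:
    """Dispatch one pending callback, like a single main-loop iteration."""
    if queue:
        queue.popleft()()


def blocking_remote_call() -> None:
    # Stand-in for a synchronous call that spins the main loop while
    # waiting for a reply (the pattern criticized above).
    iterate_once()


def show_popup() -> None:
    state["popup_half_built"] = True    # invariant temporarily broken
    blocking_remote_call()              # other handlers may run right here
    state["popup_half_built"] = False   # invariant restored
    log.append("popup shown")


def handle_key_event() -> None:
    # Written on the assumption that it never runs in the middle of show_popup().
    if state["popup_half_built"]:
        log.append("key handler saw half-built popup")
    else:
        log.append("key handler saw consistent state")


queue.extend([show_popup, handle_key_event])
while queue:
    iterate_once()
print(log)  # ['key handler saw half-built popup', 'popup shown']
```

The key handler runs inside show_popup(), before the popup's state is consistent, which is the single-threaded analogue of an unprotected data race.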
The best I can find is bug 329454, but I have no clue if it's really related. Could well be. Adding Bill and Michael to the cc list, since it's corba/accessibility stuff where metacity is hanging. Bill, Michael: Please see comment 11 for steps to get the hang (though Ctrl+Alt+F1 followed by Ctrl+Alt+F7 seems to be golden at unlocking the hang for me), the stacktrace in comment 14 of where the hang occurs, and Havoc's comments immediately above.
OK. I can verify that turning off a11y (which is enabled by default on Ubuntu), makes emacs behave properly again. I'm not sure why only emacs has this problem though.
I think this shows up only with Emacs because Emacs has a quite unusual main loop. That is, it is not the straight gtk main loop, but rather a mixture of gtk, X, lisp and signals. We may miss something gtk does in its main loop because of this. We have had problems with timers in both the Gtk and Xt versions of Emacs, for example.
But why should emacs having an unusual main loop cause metacity to hang? Metacity doesn't use emacs' main loop; it has its own. Perhaps the a11y layer is somehow trying to communicate with emacs and blocking on a response? I have no idea. I'm hoping Bill or Michael do.
interesting; the trace in 14 looks fine - metacity is sending an outgoing event. The question is: who is blocking when they receive that & why ? If you can repeat this easily; it'd be great to have a stack trace set from: metacity, at-spi-registryd, emacs to see where the deadlock lies; thanks.
*nudge* Is there any progress on this? This seems to have broken a recent update of fedora 8. Anything I can do to help?
I see the hang here nicely; but ... when I last attached gdb to emacs I got a nonsense / no stack trace: making it rather hard to debug ;-) but sure it's easy enough to reproduce - just try using emacs under gnome with a11y enabled for a few seconds.
Datapoint: hitting caps lock when it's hung gets me back control of the apps. I'll try to get backtraces of the relevant programs, but given that I have to "unbreak" the hang using the above method I think the emacs backtrace will be at the wrong place anyway?
Created attachment 119192 [details] backtrace from metacity
Created attachment 119193 [details] backtrace from at-spi-registryd
Created attachment 119194 [details] backtrace from emacs while hanging
So, we have metacity and at-spi-registryd in poll() and emacs in __newselect_nocancel() at the time of the hang...
As comment #23 says, the backtrace from Emacs does not say much. It is the normal backtrace for Emacs when just being idle and waiting for input.
Created attachment 132112 [details] Small Tcl/Tk Script to demonstrate the ignored-XRaiseWindow()-problem

On my personal system I switched from metacity to sawfish long ago because metacity seems to ignore the raise-window requests of applications. But since metacity is the default window manager of GNOME, customers bother me from time to time about the behaviour described in http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=340725, which refers upstream to this bug report.

I now attach a tiny Tcl/Tk script (taken from the aforementioned http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=340725 and only slightly modified) which demonstrates the effect: run it and move the main window over the second one. Clicking the button in the first window would normally be expected to raise the second window. Most other window managers honor this raise-window request. I tried metacity version 1:2.24.0-0ubuntu1 on my Ubuntu notebook and it did not.

I would like to know whether there is a way to configure metacity so that it handles XRaiseWindow() as I, and obviously many other third-party application developers, expect. Otherwise I will have to continue recommending that customers switch to an alternative window manager or even to the KDE desktop ;-).
The raise-window stuff is probably better explained and tracked on bug 567528. This bug should be about the emacs hang, it appears.
Bug submitted in Ubuntu bug tracker: https://bugs.launchpad.net/fedora/+source/emacs22/+bug/287577
Hi, has there been any progress on this issue? I am an Orca and Emacspeak user, and when Emacs starts up Orca stops responding normally. From reading this bug, I understand the issue is with emacs. This makes it almost impossible to use emacspeak and orca together. Regards, Bart
I'm also experiencing this issue. When Emacs has focus and I try to switch to a different application (Alt+Tab) or to a different desktop (Ctrl+Alt+Arrow), the window manager doesn't respond. So I used to switch to a different virtual console (Ctrl+Alt+F?) and switch back. That was until I came across this ticket and found that a11y and Emacs don't play well together. Now I have turned a11y "off" and I'm no longer experiencing this "freeze" with Emacs.
bugzilla.gnome.org is being replaced by gitlab.gnome.org. We are closing all old bug reports in Bugzilla which have not seen updates for many years. If you can still reproduce this issue in a currently supported version of GNOME (currently that would be 3.38), then please feel free to report it at https://gitlab.gnome.org/GNOME/metacity/-/issues/ Thank you for reporting this issue and we are sorry it could not be fixed.