GNOME Bugzilla – Bug 721345
Reduce server grabs in window creation codepaths
Last modified: 2014-01-07 14:00:15 UTC
As discussed here: https://mail.gnome.org/archives/gnome-shell-list/2013-December/msg00020.html we would like to reduce the number of places that Mutter does server grabs. They have the potential to cause deadlocks, particularly when mutter emits signals while holding a server grab. We've seen these deadlocks in practice with the Mali binary OpenGLESv2 drivers :( Here are some patches to do that. Now I only see two server grabs happening during normal usage: request_xserver_input_focus_change (unavoidable, but very limited scope, so no problems here) and idle_calc_showing (to be addressed later, since we also caught a deadlock in this path).
Created attachment 265154 [details] [review] Remove meta_window_new_with_attrs The compositor code used to handle X windows that didn't have a corresponding MetaWindow (see commit d538690b), which is why the attribute query is separated. As that doesn't happen any more, we can clean up. No functional changes. Suggested by Owen Taylor.
Created attachment 265155 [details] [review] screen: use stack tracker for initial window query In meta_screen_manage_all_windows() we can use our own stack tracker to get the list of windows - no need to query X again.
Created attachment 265156 [details] [review] Add missing X error traps I performed a relatively thorough audit of the code, looking for Xlib calls related to windows, which might go away at unfortunate times. I found two places which don't seem to have error trap protection, fixed here. I also found some cases where the X error state was going unchecked, before going on to modify mutter's internal state. Fix those too.
Created attachment 265157 [details] [review] Discourage server grabs Server grabs are not as evil as you might expect, but there is agreement in that their usage should be limited. Server grabs can cause things to go rather wrong when mutter emits a signal while it has grabbed the server. If the receiver of that signal waits for a synchronous action performed by another client, then you have a deadlock. This happens with Mali binary GLESv2 drivers :(
Created attachment 265158 [details] [review] Reduce server grabs during window creation Remove some obvious server grabs from the window creation codepath, also ones that are taken at startup. During startup, there is no need to grab: we install the event handlers before querying for the already-existing windows, so there is no danger that we will 'lose' some window. We might try to create a window twice (if it comes back in the original query and then we get an event for it) but the code is already protected against such conditions. When windows are created later, we also do not need grabs, we just need appropriate error checking as the window may be destroyed at any time (or it may have already been destroyed). The stack tracker is unaffected here - as it listens to CreateNotify and DestroyNotify events and responds directly, the internal stack representation will always be consistent even if the window goes away while we are processing MapRequest or similar.
Created attachment 265159 [details] [review] frame: remove unnecessary server grab meta_window_ensure_frame() creates its own grab and has a comment claiming that it must be called under a grab too. But the reasoning given in the comment does not seem relevant here. We only frame non-override-redirect windows, so we are creating the frame in response to MapRequest. There is no way that the child could receive a MapNotify at this point, since that only happens much later, once we go through the CALC_SHOWING queue and call XMapWindow() from meta_window_show(). Remove the unnecessary grab.
Created attachment 265162 [details] torture test I torture tested these patches with this test app, which creates windows and destroys them a short random time later.
Review of attachment 265156 [details] [review]: Thanks for the patch, and more importantly for the audit that produced the patch! The stack tracker is designed to be robust against predicted operations "not working out" in practice, so I don't think the changes to make window stacking and reparenting operations synchronous so we can tell if they succeeded or failed make sense. There's a distinct performance penalty from going from: meta_error_trap_push() to: meta_error_trap_push_with_return() Since instead of simply recording that we're going to ignore errors for a range of serials, we have to actually XSync() and see if an error occurred. Other traps you added here seem to be for operations on windows Mutter creates. ::: src/core/display.c @@ +1130,3 @@ + meta_error_trap_push (display); + XDestroyWindow (display->xdisplay, display->leader_window); + meta_error_trap_pop (display); We created this window, we don't need to trap destroying it. ::: src/core/frame.c @@ -148,3 @@ window->rect.x, window->rect.y); - /* FIXME handle this error */ You didn't actually handle what's being talked about - the FIXME isn't about the stack tracker record (which was added 10+ years after the FIXME) it's rather than if the XReparentWindow fails (in particular for a reason other than the window being destroyed) then we'll show the user an empty frame and it will look weird. The main reason I can think of for this is if a window a) uses a non-default visual and depth b) specifies a ParentRelative background. Of course, if an app is doing that - an app bug, then I'm not sure that refusing to manage it is better than showing a broken window, so the FIXME probably could be removed on that grounds, and the grounds that if it hasn't been fixed in 13 years, it doesn't matter. If you want to submit a separate patch that removes the FIXME with a summary of the above as a commit message, that would be OK. @@ +432,2 @@ if (cursor == META_CURSOR_DEFAULT) XUndefineCursor (frame->window->display->xdisplay, frame->xwindow); frame->xwindow is our window, we don't need to trap operations on it.
(In reply to comment #8) > Review of attachment 265156 [details] [review]: > > Thanks for the patch, and more importantly for the audit that produced the > patch! > > The stack tracker is designed to be robust against predicted operations "not > working out" in practice, so I don't think the changes to make window stacking > and reparenting operations synchronous so we can tell if they succeeded or > failed make sense. There's a distinct performance penalty from going from: > > meta_error_trap_push() > > to: > > meta_error_trap_push_with_return() > > Since instead of simply recording that we're going to ignore errors for a range > of serials, we have to actually XSync() and see if an error occurred. Yeah, I understand the performance penalty - but I had missed that design point of the stack tracker. If I revert these changes, my torture test triggers cases when the predicted operations don't work out, and it does complain loudly. See the g_warning() calls in meta_stack_op_apply(). Could those warnings be demoted to debug messages? Or should I just ignore them?
Review of attachment 265157 [details] [review]: Seems OK.
Review of attachment 265154 [details] [review]: Nice cleanup. I've been wanting to do this for a while.
Review of attachment 265159 [details] [review]: Yeah, this does seems suspect. I'm tempted to poke through the FVWM source code to figure out exactly what it was doing, but I think this is correct.
Review of attachment 265155 [details] [review]: ::: src/core/screen.c @@ +899,1 @@ + /* Copy the stack as it will be modified as part of the loop */ Why will it get modified as part of the loop? @@ +899,2 @@ + /* Copy the stack as it will be modified as part of the loop */ + stack = g_array_sized_new (FALSE, FALSE, sizeof (Window), n_children); Seems a bit silly to use a GArray for this when it's static size. I know this is how copy_stack does it, but that's because it needs a dynamic array. Can we just use g_memdup here?
Review of attachment 265158 [details] [review]: ::: src/core/window.c @@ +800,3 @@ + * the window could have already gone away, or could go away at any point, + * so we must be careful with X error handling. + */ This comment should probably go above the error_trap_push above. @@ +804,1 @@ + if (!XGetWindowAttributes (display->xdisplay, xwindow, &attrs)) Hm. The code before looked for an error trap even if the function returned TRUE. Looking through the libX11 code, it does seem completely unnecessary. Can you split this fix into its own commit, since it's sort of unrelated? @@ +805,3 @@ + { + meta_verbose ("Not managing due to failed attribute query\n"); + meta_error_trap_pop (display); This is, again, an unrelated fixup, but it would be nice to have a "goto error;" path that pops the global trap and returns NULL;, so we don't forget it in one of the paths. @@ +890,3 @@ * with Mutter we want to be able to create manageable windows from within * the process (such as a dummy desktop window), so we do not want this + * call failing to prevent the window from being managed. This entire comment seems superfluous, because we have a global error trap to ignore errors up until this point.
Daniel, would you mind cherry-picking those pushed commits to the wayland and gnome-3-10 branches?
Comment on attachment 265156 [details] [review] Add missing X error traps Marking this one as obsolete. Owen's comments above show that all the added traps are unnecessary, and the stack tracker should be left to handle situations which don't go according to plan.
Created attachment 265472 [details] [review] screen: use stack tracker for initial window query In meta_screen_manage_all_windows() we can use our own stack tracker to get the list of windows - no need to query X again. A copy is needed because the stack gets modified as part of the loop. Specifically, meta_stack_tracker_get_stack() at this time returns the predicted stack, and meta_window_new() performs a few operations (e.g. framing) which cause immediate changes to the predicted stack.
Created attachment 265473 [details] [review] meta_window_new: clean up error handling The return code of XGetWindowAttributes() indicates whether an error was encountered or not. There is no need to specifically check the error trap. The trap around XAddToSaveSet() was superfluous. We have a global error trap to ignore any errors here, and there is no need to XSync() as GDK will later ignore the error asynchronously if one is raised. Also move common error exit path to an error label.
Created attachment 265474 [details] [review] Reduce server grabs during window creation Remove some obvious server grabs from the window creation codepath, also ones that are taken at startup. During startup, there is no need to grab: we install the event handlers before querying for the already-existing windows, so there is no danger that we will 'lose' some window. We might try to create a window twice (if it comes back in the original query and then we get an event for it) but the code is already protected against such conditions. When windows are created later, we also do not need grabs, we just need appropriate error checking as the window may be destroyed at any time (or it may have already been destroyed). The stack tracker is unaffected here - as it listens to CreateNotify and DestroyNotify events and responds directly, the internal stack representation will always be consistent even if the window goes away while we are processing MapRequest or similar.
Review of attachment 265473 [details] [review]: Looks good. ::: src/core/window.c @@ +803,3 @@ meta_verbose ("Failed to get attributes for window 0x%lx\n", xwindow); + goto error; Indentation here is still off. Fix that when you commit.
Review of attachment 265472 [details] [review]: Looks. ::: src/core/screen.c @@ +902,3 @@ for (i = 0; i < n_children; ++i) { + meta_window_new (screen->display, children [i], TRUE, Nit: no space between variable and [ children[i]
Review of attachment 265474 [details] [review]: Looks good.
All now pushed to master, thanks for the reviews. The torture test will trigger the stack tracker to complain sometimes, when the events on the predicted stack do not come true. I guess that is OK, no harm is done. (In reply to comment #15) > Daniel, would you mind cherry-picking those pushed commits to the wayland and > gnome-3-10 branches? Just to confirm, which patches should also go on those other branches?
(In reply to comment #23) > Just to confirm, which patches should also go on those other branches? All of them. If you're having issues with merge conflicts, let me know and I'll try to fix them up.
Bit confused looking at those branches - there is stuff in gnome-3-10 that is not in master, like meta_window_new_for_wayland. Does this mean that gnome-3-10 was branched off the wayland branch?
(In reply to comment #25) > Bit confused looking at those branches - there is stuff in gnome-3-10 that is > not in master, like meta_window_new_for_wayland. Does this mean that gnome-3-10 > was branched off the wayland branch? Where? I don't see that. gnome-3-10 was branched from master.
Oops, not sure what happened there. I pushed to gnome-3-10. I'll look at the wayland conflicts tomorrow.
Done. Was nothing complicated in the end but you might want to quickly check the wayland branch.