Bug 721345 – Reduce server grabs in window creation codepaths

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 721345 - Reduce server grabs in window creation codepaths


Summary:	Reduce server grabs in window creation codepaths


Status:	RESOLVED FIXED

Product:	mutter
Classification:	Core
Component:	general
Version:	git master
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	mutter-maint
QA Contact:	mutter-maint

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2014-01-02 16:13 UTC by Daniel Drake
Modified:	2014-01-07 14:00 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Remove meta_window_new_with_attrs (15.05 KB, patch) 2014-01-02 16:18 UTC, Daniel Drake	committed	Details \| Review
screen: use stack tracker for initial window query (1.85 KB, patch) 2014-01-02 16:18 UTC, Daniel Drake	reviewed	Details \| Review
Add missing X error traps (8.35 KB, patch) 2014-01-02 16:18 UTC, Daniel Drake	rejected	Details \| Review
Discourage server grabs (1.68 KB, patch) 2014-01-02 16:18 UTC, Daniel Drake	committed	Details \| Review
Reduce server grabs during window creation (6.64 KB, patch) 2014-01-02 16:18 UTC, Daniel Drake	reviewed	Details \| Review
frame: remove unnecessary server grab (2.04 KB, patch) 2014-01-02 16:19 UTC, Daniel Drake	committed	Details \| Review
torture test (988 bytes, text/plain) 2014-01-02 16:40 UTC, Daniel Drake		Details
screen: use stack tracker for initial window query (2.03 KB, patch) 2014-01-06 20:05 UTC, Daniel Drake	committed	Details \| Review
meta_window_new: clean up error handling (4.62 KB, patch) 2014-01-06 20:06 UTC, Daniel Drake	committed	Details \| Review
Reduce server grabs during window creation (3.61 KB, patch) 2014-01-06 20:06 UTC, Daniel Drake	committed	Details \| Review

Description Daniel Drake 2014-01-02 16:13:57 UTC

As discussed here: https://mail.gnome.org/archives/gnome-shell-list/2013-December/msg00020.html
we would like to reduce the number of places that Mutter does server grabs. They have the potential to cause deadlocks, particularly when mutter emits signals while holding a server grab. We've seen these deadlocks in practice with the Mali binary OpenGLESv2 drivers :(

Here are some patches to do that.

Now I only see two server grabs happening during normal usage: request_xserver_input_focus_change (unavoidable, but very limited scope, so no problems here) and idle_calc_showing (to be addressed later, since we also caught a deadlock in this path).

Comment 1 Daniel Drake 2014-01-02 16:18:39 UTC

Created attachment 265154 [details] [review]
Remove meta_window_new_with_attrs

The compositor code used to handle X windows that didn't have a
corresponding MetaWindow (see commit d538690b), which is why the
attribute query is separated.

As that doesn't happen any more, we can clean up. No functional changes.

Suggested by Owen Taylor.

Comment 2 Daniel Drake 2014-01-02 16:18:43 UTC

Created attachment 265155 [details] [review]
screen: use stack tracker for initial window query

In meta_screen_manage_all_windows() we can use our own stack
tracker to get the list of windows - no need to query X again.

Comment 3 Daniel Drake 2014-01-02 16:18:47 UTC

Created attachment 265156 [details] [review]
Add missing X error traps

I performed a relatively thorough audit of the code, looking for Xlib
calls related to windows, which might go away at unfortunate times.
I found two places which don't seem to have error trap protection, fixed
here.

I also found some cases where the X error state was going unchecked,
before going on to modify mutter's internal state. Fix those too.

Comment 4 Daniel Drake 2014-01-02 16:18:51 UTC

Created attachment 265157 [details] [review]
Discourage server grabs

Server grabs are not as evil as you might expect, but there is agreement
in that their usage should be limited.

Server grabs can cause things to go rather wrong when mutter emits
a signal while it has grabbed the server. If the receiver of that signal
waits for a synchronous action performed by another client, then you
have a deadlock. This happens with Mali binary GLESv2 drivers :(

Comment 5 Daniel Drake 2014-01-02 16:18:57 UTC

Created attachment 265158 [details] [review]
Reduce server grabs during window creation

Remove some obvious server grabs from the window creation codepath,
also ones that are taken at startup.

During startup, there is no need to grab: we install the event handlers
before querying for the already-existing windows, so there is no danger
that we will 'lose' some window. We might try to create a window twice
(if it comes back in the original query and then we get an event for it)
but the code is already protected against such conditions.

When windows are created later, we also do not need grabs, we just need
appropriate error checking as the window may be destroyed at any time
(or it may have already been destroyed).

The stack tracker is unaffected here - as it listens to CreateNotify and
DestroyNotify events and responds directly, the internal stack
representation will always be consistent even if the window goes away while
we are processing MapRequest or similar.

Comment 6 Daniel Drake 2014-01-02 16:19:01 UTC

Created attachment 265159 [details] [review]
frame: remove unnecessary server grab

meta_window_ensure_frame() creates its own grab and has a comment
claiming that it must be called under a grab too.

But the reasoning given in the comment does not seem relevant here.
We only frame non-override-redirect windows, so we are creating
the frame in response to MapRequest. There is no way that the child
could receive a MapNotify at this point, since that only happens
much later, once we go through the CALC_SHOWING queue and call
XMapWindow() from meta_window_show().

Remove the unnecessary grab.

Comment 7 Daniel Drake 2014-01-02 16:40:28 UTC

Created attachment 265162 [details]
torture test

I torture tested these patches with this test app, which creates windows and destroys them a short random time later.

Comment 8 Owen Taylor 2014-01-02 17:43:33 UTC

Review of attachment 265156 [details] [review]:

Thanks for the patch, and more importantly for the audit that produced the patch!

The stack tracker is designed to be robust against predicted operations "not working out" in practice, so I don't think the changes to make window stacking and reparenting operations synchronous so we can tell if they succeeded or failed make sense. There's a distinct performance penalty from going from:

 meta_error_trap_push()

to:

 meta_error_trap_push_with_return()

Since instead of simply recording that we're going to ignore errors for a range of serials, we have to actually XSync() and see if an error occurred.

Other traps you added here seem to be for operations on windows Mutter creates.

::: src/core/display.c
@@ +1130,3 @@
+      meta_error_trap_push (display);
+      XDestroyWindow (display->xdisplay, display->leader_window);
+      meta_error_trap_pop (display);

We created this window, we don't need to trap destroying it.

::: src/core/frame.c
@@ -148,3 @@
                    window->rect.x,
                    window->rect.y);
-  /* FIXME handle this error */

You didn't actually handle what's being talked about - the FIXME isn't about the stack tracker record (which was added 10+ years after the FIXME) it's rather than if the XReparentWindow fails (in particular for a reason other than the window being destroyed) then we'll show the user an empty frame and it will look weird. The main reason I can think of for this is if a window a) uses a non-default visual and depth b) specifies a ParentRelative background.

Of course, if an app is doing that - an app bug, then I'm not sure that refusing to manage it is better than showing a broken window, so the FIXME probably could be removed on that grounds, and the grounds that if it hasn't been fixed in 13 years, it doesn't matter. If you want to submit a separate patch that removes the FIXME with a summary of the above as a commit message, that would be OK.

@@ +432,2 @@
   if (cursor == META_CURSOR_DEFAULT)
     XUndefineCursor (frame->window->display->xdisplay, frame->xwindow);

frame->xwindow is our window, we don't need to trap operations on it.

Comment 9 Daniel Drake 2014-01-02 22:15:01 UTC

(In reply to comment #8)
> Review of attachment 265156 [details] [review]:
> 
> Thanks for the patch, and more importantly for the audit that produced the
> patch!
> 
> The stack tracker is designed to be robust against predicted operations "not
> working out" in practice, so I don't think the changes to make window stacking
> and reparenting operations synchronous so we can tell if they succeeded or
> failed make sense. There's a distinct performance penalty from going from:
> 
>  meta_error_trap_push()
> 
> to:
> 
>  meta_error_trap_push_with_return()
> 
> Since instead of simply recording that we're going to ignore errors for a range
> of serials, we have to actually XSync() and see if an error occurred.

Yeah, I understand the performance penalty - but I had missed that design point of the stack tracker.

If I revert these changes, my torture test triggers cases when the predicted operations don't work out, and it does complain loudly. See the g_warning() calls in meta_stack_op_apply().

Could those warnings be demoted to debug messages? Or should I just ignore them?

Comment 10 Jasper St. Pierre (not reading bugmail) 2014-01-04 05:00:38 UTC

Review of attachment 265157 [details] [review]:

Seems OK.

Comment 11 Jasper St. Pierre (not reading bugmail) 2014-01-04 05:03:03 UTC

Review of attachment 265154 [details] [review]:

Nice cleanup. I've been wanting to do this for a while.

Comment 12 Jasper St. Pierre (not reading bugmail) 2014-01-04 05:04:17 UTC

Review of attachment 265159 [details] [review]:

Yeah, this does seems suspect. I'm tempted to poke through the FVWM source code to figure out exactly what it was doing, but I think this is correct.

Comment 13 Jasper St. Pierre (not reading bugmail) 2014-01-04 05:08:29 UTC

Review of attachment 265155 [details] [review]:

::: src/core/screen.c
@@ +899,1 @@
+  /* Copy the stack as it will be modified as part of the loop */

Why will it get modified as part of the loop?

@@ +899,2 @@
+  /* Copy the stack as it will be modified as part of the loop */
+  stack = g_array_sized_new (FALSE, FALSE, sizeof (Window), n_children);

Seems a bit silly to use a GArray for this when it's static size. I know this is how copy_stack does it, but that's because it needs a dynamic array. Can we just use g_memdup here?

Comment 14 Jasper St. Pierre (not reading bugmail) 2014-01-04 05:16:37 UTC

Review of attachment 265158 [details] [review]:

::: src/core/window.c
@@ +800,3 @@
+   * the window could have already gone away, or could go away at any point,
+   * so we must be careful with X error handling.
+   */

This comment should probably go above the error_trap_push above.

@@ +804,1 @@
+  if (!XGetWindowAttributes (display->xdisplay, xwindow, &attrs))

Hm. The code before looked for an error trap even if the function returned TRUE. Looking through the libX11 code, it does seem completely unnecessary. Can you split this fix into its own commit, since it's sort of unrelated?

@@ +805,3 @@
+    {
+      meta_verbose ("Not managing due to failed attribute query\n");
+      meta_error_trap_pop (display);

This is, again, an unrelated fixup, but it would be nice to have a "goto error;" path that pops the global trap and returns NULL;, so we don't forget it in one of the paths.

@@ +890,3 @@
    * with Mutter we want to be able to create manageable windows from within
    * the process (such as a dummy desktop window), so we do not want this
+   * call failing to prevent the window from being managed.

This entire comment seems superfluous, because we have a global error trap to ignore errors up until this point.

Comment 15 Jasper St. Pierre (not reading bugmail) 2014-01-06 17:23:06 UTC

Daniel, would you mind cherry-picking those pushed commits to the wayland and gnome-3-10 branches?

Comment 16 Daniel Drake 2014-01-06 18:52:22 UTC

Comment on attachment 265156 [details] [review]
Add missing X error traps

Marking this one as obsolete. Owen's comments above show that all the added traps are unnecessary, and the stack tracker should be left to handle situations which don't go according to plan.

Comment 17 Daniel Drake 2014-01-06 20:05:08 UTC

Created attachment 265472 [details] [review]
screen: use stack tracker for initial window query

In meta_screen_manage_all_windows() we can use our own stack
tracker to get the list of windows - no need to query X again.

A copy is needed because the stack gets modified as part of the loop.
Specifically, meta_stack_tracker_get_stack() at this time returns the
predicted stack, and meta_window_new() performs a few operations
(e.g. framing) which cause immediate changes to the predicted stack.

Comment 18 Daniel Drake 2014-01-06 20:06:04 UTC

Created attachment 265473 [details] [review]
meta_window_new: clean up error handling

The return code of XGetWindowAttributes() indicates whether an error
was encountered or not. There is no need to specifically check the error
trap.

The trap around XAddToSaveSet() was superfluous. We have a global error
trap to ignore any errors here, and there is no need to XSync() as GDK
will later ignore the error asynchronously if one is raised.

Also move common error exit path to an error label.

Comment 19 Daniel Drake 2014-01-06 20:06:44 UTC

Created attachment 265474 [details] [review]
Reduce server grabs during window creation

Remove some obvious server grabs from the window creation codepath,
also ones that are taken at startup.

During startup, there is no need to grab: we install the event handlers
before querying for the already-existing windows, so there is no danger
that we will 'lose' some window. We might try to create a window twice
(if it comes back in the original query and then we get an event for it)
but the code is already protected against such conditions.

When windows are created later, we also do not need grabs, we just need
appropriate error checking as the window may be destroyed at any time
(or it may have already been destroyed).

The stack tracker is unaffected here - as it listens to CreateNotify and
DestroyNotify events and responds directly, the internal stack
representation will always be consistent even if the window goes away while
we are processing MapRequest or similar.

Comment 20 Jasper St. Pierre (not reading bugmail) 2014-01-06 20:14:30 UTC

Review of attachment 265473 [details] [review]:

Looks good.

::: src/core/window.c
@@ +803,3 @@
          meta_verbose ("Failed to get attributes for window 0x%lx\n",
                         xwindow);
+         goto error;

Indentation here is still off. Fix that when you commit.

Comment 21 Jasper St. Pierre (not reading bugmail) 2014-01-06 20:16:25 UTC

Review of attachment 265472 [details] [review]:

Looks.

::: src/core/screen.c
@@ +902,3 @@
   for (i = 0; i < n_children; ++i)
     {
+      meta_window_new (screen->display, children [i], TRUE,

Nit: no space between variable and [

children[i]

Comment 22 Jasper St. Pierre (not reading bugmail) 2014-01-06 20:26:53 UTC

Review of attachment 265474 [details] [review]:

Looks good.

Comment 23 Daniel Drake 2014-01-06 20:30:48 UTC

All now pushed to master, thanks for the reviews.
The torture test will trigger the stack tracker to complain sometimes, when the events on the predicted stack do not come true. I guess that is OK, no harm is done.

(In reply to comment #15)
> Daniel, would you mind cherry-picking those pushed commits to the wayland and
> gnome-3-10 branches?

Just to confirm, which patches should also go on those other branches?

Comment 24 Jasper St. Pierre (not reading bugmail) 2014-01-06 20:34:07 UTC

(In reply to comment #23)
> Just to confirm, which patches should also go on those other branches?

All of them. If you're having issues with merge conflicts, let me know and I'll try to fix them up.

Comment 25 Daniel Drake 2014-01-06 21:32:49 UTC

Bit confused looking at those branches - there is stuff in gnome-3-10 that is not in master, like meta_window_new_for_wayland. Does this mean that gnome-3-10 was branched off the wayland branch?

Comment 26 Jasper St. Pierre (not reading bugmail) 2014-01-06 21:58:00 UTC

(In reply to comment #25)
> Bit confused looking at those branches - there is stuff in gnome-3-10 that is
> not in master, like meta_window_new_for_wayland. Does this mean that gnome-3-10
> was branched off the wayland branch?

Where? I don't see that. gnome-3-10 was branched from master.

Comment 27 Daniel Drake 2014-01-06 22:19:47 UTC

Oops, not sure what happened there.

I pushed to gnome-3-10. I'll look at the wayland conflicts tomorrow.

Comment 28 Daniel Drake 2014-01-07 14:00:15 UTC

Done. Was nothing complicated in the end but you might want to quickly check the wayland branch.