GNOME Bugzilla – Bug 587344
Reduce overpaint in the window group
Last modified: 2009-07-31 14:14:42 UTC
The memory bandwidth taken up by overpaint is a real problem for day-to-day usability of Mutter; once you have a bunch of windows open (maybe all maximized), redraws start getting slow and basic operations like scrolling a web browser chug. I see this more than most people on a 64-megabyte R300 where window textures get pushed out to system RAM and accessed over the AGP bus, but it is also going to be a problem for lower-end unified memory chipsets that simply don't have much memory bandwidth or fill rate.

The attached patch is a very good start at fixing this. On the system described above, with 4-5 windows open, I saw frame rates go from ~14 FPS to ~45 FPS when the scene was forced to redraw constantly.

As described in the comments, the patch is defeated by ARGB windows; if GNOME moves to ARGB themes with toplevel transparency, we'll have to figure out some way of handling that. Other than that, the main potential improvement is to avoid redrawing the scene at all when a fully obscured window changes. To do that, you'd have to track obscured windows persistently (not just during repaint) and then clutter_actor_hide() any fully obscured windows. Tracking obscured state persistently requires watching stacking order, and there may not be any good way to do that within Clutter.
Created attachment 137576 [details] [review] Reduce overpaint in the window group

When we are painting a stack of 5-10 maximized windows, the standard bottom-to-top method of drawing every actor results in a tremendous amount of overdraw and can easily max out the available memory bandwidth on a low-end graphics chipset. It's even worse if window textures are being accessed over the AGP bus. When we have opaque windows, we can go ahead and compute visibility ourselves (in classic X-server fashion) and use that information to restrict drawing obscured actors.

* Add MutterWindowGroup - a ClutterGroup subclass with logic for figuring out obscured regions.
* Add mutter_window_get_obscured_region() to get the region obscured by that window.
* Add mutter_shaped_texture_set_clip_region() to hint a clip region to the painting code; this is set based on the computed visible region of MutterWindowGroup.
* Add tidy_texture_frame_set_needs_paint() to hint that the paint can be skipped entirely; this is used when we detect that the window shadow is entirely obscured.
I've gone ahead and pushed this since:

A) It's been working without any apparent problems for me for the last week and a half.
B) It really makes things much more pleasant to use on a system with limited video RAM.

We can iterate as necessary later. (Feel free to put review comments here and reopen.)
Commit 83f8bfd2caa5b6001749b081372d97c21791fc8f breaks the painting of clones of the MutterShapedTexture. Even if the window is occluded, we still want to be able to paint clones. Unfortunately, there currently isn't any way in the paint function to differentiate between painting ourselves and painting on behalf of a clone, so I opened http://bugzilla.openedhand.com/show_bug.cgi?id=1685 with a request for an extra flag. Also, I am not entirely sure we should be doing this; it feels too much like going back to an XRender compositor, and this is the sort of thing we should not need to do with OpenGL. I understand the bandwidth problem, though; perhaps we could have a configure option to enable/disable this?
I don't really understand the problem - clones work fine in gnome-shell, and the mechanism was designed to work fine for this. The way it works is that it's all triggered out of paint() of the window group: it computes the unobscured regions, sets them on the actors, paints, then unsets the unobscured regions. Painting of cloned windows should not go through the paint() of the window group, so everything works as normal.

You cannot assume that OpenGL painting is infinitely fast. Consider a very modest case where you have 5 1024x768 windows on top of each other: to paint that at 60fps is about 3GB/sec of memory bandwidth (1GB/sec to fetch the textures, 1GB/sec to read the color buffer, 1GB/sec to write back), which by itself is going to be well beyond the limits of an i915, or more than can be handled for windows over AGP, which is typically 1GB/sec. It's not ridiculous to have 10 stacked 1920x1200 windows, which would be 16GB/sec. So, occlusion is needed.

A more typical OpenGL approach would be to do a depth-only pass:

- Draw the window shapes to the depth buffer as a series of rectangles.
- Then draw the windows as per normal, with an appropriate depth test, so any obscured window is not drawn.

(This could also potentially be used to avoid the need for the multitexturing to get window shapes.) However, there are some difficulties with that:

* Turning on the depth buffer and changing the depth tests might have "interesting" interactions with other parts of the UI or require COGL extensions.
* Windows would have to be drawn without any transformation but with different depths. This would require switching to an orthographic projection, or maybe doing something with glDepthRange. [I think using glDepthRange between window drawing would cause inefficiency on current drivers.]

So, this approach was much more restricted in scope and easier to get going.
(In reply to comment #4) > I don't really understand the problem - clones work fine in gnome-shell, and > the way it works was designed to work fine for this. Some clones work fine in Moblin too; I need to get to the bottom of the problems with 587251 before I can investigate this further.
OK, I worked out what is going on. The Moblin workspace switching effect is a composite actor that is inserted at the top of the window_group; consequently, the clones it contains are painted inside the MutterWindowGroup paint() function, and hence the occlusion mechanism kicks in.

I can fix this by moving the effect into the overlay layer, but it made me think that when we add documentation for the mutter_get_window_group_for_screen() function, we should note there that plugins should avoid inserting their own actors into it and/or be aware of the special nature of this container. I think this bug can be changed back to FIXED.