Bug 792633 - gnome-shell is wasting CPU repainting an unchanging panel
Status: RESOLVED FIXED
Product: gnome-shell
Classification: Core
Component: general
Version: 3.26.x
OS: Other Linux
Importance: Normal normal
Assigned To: gnome-shell-maint
Reported: 2018-01-18 06:52 UTC by Daniel van Vugt
Modified: 2018-08-28 07:15 UTC


Attachments:
fix-792633-v1.patch (1.25 KB, patch), 2018-01-18 06:59 UTC, Daniel van Vugt (rejected)
Google CPU Profiler output PDF - before (14.95 KB, application/pdf), 2018-01-22 03:32 UTC, Daniel van Vugt
Google CPU Profiler output PDF - after (14.78 KB, application/pdf), 2018-01-22 03:32 UTC, Daniel van Vugt

Description Daniel van Vugt 2018-01-18 06:52:15 UTC
gnome-shell is wasting CPU repainting an unchanging panel. Every time an app window changes all visible actors including the panel get repainted. But the panel mostly doesn't change, so should be cached via offscreen-redirect instead.
Comment 1 Daniel van Vugt 2018-01-18 06:59:17 UTC
Created attachment 366991
fix-792633-v1.patch

Here's a fix, which noticeably reduces gnome-shell's CPU usage.
Comment 2 Ray Strode [halfline] 2018-01-19 19:14:49 UTC
Review of attachment 366991:

so this is clearly a workaround if anything, not a fix!
Comment 3 Ray Strode [halfline] 2018-01-19 19:24:28 UTC
so we definitely had this sort of problem in the past (see bug 792633)

there was also a mesa software rendering bug for a while that forced full screen updates unnecessarily.  That eventually got fixed I think, but we had this commit for a while in cogl to work around it:

https://git.gnome.org/browse/cogl/commit/?id=11f2f6ebb42398978ec8dd92b3c332ae8140a728

what video driver are you using?

does CLUTTER_PAINT=redraws in the environment show full screen redraws ?  are you using a stock gnome-shell theme?
Comment 4 Daniel van Vugt 2018-01-22 01:59:13 UTC
No, it's a proper fix, not a workaround.

Caching complicated GUI elements on the GPU between frames as textures is the correct way to make rendering the desktop more efficient. I have done this in the JavaScript layer because a targeted fix for the most relevant widget is all you really want. Applying the same fix generally within St or Clutter is less ideal, because it wastes GPU memory on caching things at multiple levels, and can also accidentally cache things you don't want (like ShellGeneric, which would include app windows and get invalidated on each frame).

So this is a proper carefully considered fix, and not a workaround.
Comment 5 Daniel van Vugt 2018-01-22 02:02:03 UTC
This work was directed by profiles I have collected over the past week. I can provide the full details if you want but suffice to say the above patch eliminates between a tenth and a quarter of gnome-shell's CPU usage when redrawing (e.g. some app is busy).
Comment 6 Daniel van Vugt 2018-01-22 03:32:23 UTC
Created attachment 367194 [details]
Google CPU Profiler output PDF - before
Comment 7 Daniel van Vugt 2018-01-22 03:32:40 UTC
Created attachment 367195 [details]
Google CPU Profiler output PDF - after
Comment 8 Daniel van Vugt 2018-01-22 03:42:32 UTC
I am using Intel graphics, and the above profile output is from the stock Gnome Shell session (Ubuntu dock removed). I have intentionally avoided Ubuntu customisations for the purpose of this bug and am handling inefficiencies in those in a separate bug. This bug is intentionally and specifically about problems common with upstream Gnome Shell. The profiling and the fix were both done against builds of: https://git.gnome.org/browse/gnome-shell/log/?h=gnome-3-26
Comment 9 Ray Strode [halfline] 2018-01-22 11:48:01 UTC
but why is the panel getting repainted when an unrelated window changes? only the part of the window that changes should get repainted, right?
Comment 10 Daniel van Vugt 2018-01-23 01:47:17 UTC
Ray, you seem to be describing ideal efficient behaviour, which is not the current clutter behaviour.

Yes, ideally things that aren't changing shouldn't be regenerated or recalculated between frames. In the old world of display servers those old pixels would be cached on the screen itself -- kept in the framebuffer and only the changing ones get written. In the modern world we use compositing and accelerate rendering using GPUs, and we also use double-buffering which prevents the efficient retention of unchanged pixels (it's still ideal to use double buffering though).

It's still too CPU-intensive to regenerate a deep tree of complex widgets 60 times per second. You have to be clever and save some work from previous frames. That's what clutter's offscreen redirect feature does -- it turns on caching so that an actor that hasn't changed next frame doesn't need to be expensively regenerated in order to be redisplayed. It's saved on-GPU in an FBO texture.

The common-sense "only repaint things that change" approach you're expecting is turned off by default in clutter, although the official docs wrongly say otherwise: https://developer.gnome.org/clutter/stable/ClutterActor.html#ClutterOffscreenRedirect

In reality clutter uses zero caching by default, which is part of the reason why gnome-shell is so CPU-intensive: https://git.gnome.org/browse/mutter/tree/clutter/clutter/clutter-actor.c#n6784
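The caching idea described above can be sketched in plain JavaScript. This is a made-up illustration of the principle (the `CachedActor` class and its names are invented for this sketch), not Clutter's actual API or FBO code:

```javascript
// Hypothetical sketch of offscreen-redirect-style caching: an actor whose
// painted output is kept in an offscreen "texture" and only regenerated when
// the actor is explicitly invalidated.
class CachedActor {
    constructor(paintFn) {
        this.paintFn = paintFn;   // the expensive per-frame work
        this.cache = null;        // stands in for the on-GPU FBO texture
        this.expensivePaints = 0; // counts how often we really repainted
    }
    invalidate() { this.cache = null; } // actor changed; drop the cache
    paint() {
        if (this.cache === null) {
            this.cache = this.paintFn(); // expensive: regenerate the widget
            this.expensivePaints++;
        }
        return this.cache; // cheap: redisplay the cached texture
    }
}

const cachedPanel = new CachedActor(() => 'panel pixels');
for (let frame = 0; frame < 60; frame++)
    cachedPanel.paint();  // an app redraws 60x/s, but the panel never changes
console.log(cachedPanel.expensivePaints); // painted once, redisplayed 59 times
```

Without the cache, `paintFn` would run on all 60 frames; with it, the expensive regeneration happens only after an invalidation.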
Comment 11 Daniel van Vugt 2018-01-23 01:51:11 UTC
It's worth noting you shouldn't just turn on caching for all widgets, because some of those widgets are large and encompass the whole screen. Those widgets' caches would be invalidated on every frame, so caching them would be detrimental to performance. Ideal performance is achieved by caching only the largest parts of the screen that are unchanged by app window redraws: panels and docks.
Comment 12 Ray Strode [halfline] 2018-01-23 02:05:13 UTC
you're not correct, modulo some recent regression perhaps.  is this with llvmpipe? intel? nvidia? did you try what i asked in comment 3?
Comment 13 Daniel van Vugt 2018-01-23 02:41:22 UTC
What part of what I said is not correct?

The link you refer to points to a commit from 2012 that doesn't sound relevant. That's the "old world" way of doing things (particularly for Xorg, not Wayland/EGLnative) I mentioned above.
Comment 14 Daniel van Vugt 2018-01-23 02:46:57 UTC
As mentioned in comment #8 I am using Intel graphics. But that's probably not relevant.

This is a basic CPU bottleneck that I've taken the time to profile, understand and fix. What's not to believe?
Comment 15 Ray Strode [halfline] 2018-01-23 02:54:35 UTC
i'm saying if the panel is redrawing when the terminal cursor blinks (or whatever) then the fix isn't to make the panel redraw faster. it's to fix whatever recent problem got introduced that's leading to the panel getting redrawn erroneously.
Comment 16 Jonas Ådahl 2018-01-23 02:58:31 UTC
gnome-shell has worked as in the "new world" way since GNOME 3.0, where everything is always composited (except for the fullscreen unredirect thing used by games). What is being questioned is what causes the panel to be redrawn when there should be no damage reported on that region.

Damage tracking is still a thing; we should use the buffer age EGL extension on both X11 and Wayland (which is available on intel), and thus we should be able to reuse the old buffer and avoid redrawing the panel (as in, clutter should not attempt to clutter_actor_paint() the panel at all) if it has not changed and no damage overlapping it was queued.

It seems that, for some reason, damage overlapping the panel is queued causing it to be painted, but this should in theory not happen.

The code that deals with buffer age damage region tracking lives in clutter-stage-cogl.c. What *should* happen is: if some part not overlapping the panel changed, damage is queued, triggering a redraw. clutter-stage-cogl.c will look at the draw buffer's age and accumulate the damage region so that we also draw the things that are not up to date in the old buffer. Then this accumulated region is used to tell clutter to draw *part* of the stage. In the normal case, when frame after frame only e.g. a window's content is updated, we should be able to more or less always reuse the buffer without having to redraw anything more than the newly reported damage region.
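The accumulation step described above can be sketched in plain JavaScript. This is an invented illustration of the buffer-age bookkeeping, not the clutter-stage-cogl.c code; the function names and the flat rect lists are assumptions made for the sketch:

```javascript
// Sketch of buffer-age damage accumulation: to bring a back buffer that is
// `age` frames old up to date, union the damage from the last `age` frames.
const damageHistory = []; // newest first; each entry is a list of rects

function frameDamaged(rects) {
    damageHistory.unshift(rects); // record what this frame changed
}

function damageForBufferAge(age) {
    if (age === 0 || age > damageHistory.length)
        return 'full-redraw';     // buffer contents unknown: repaint it all
    // union of the regions damaged since this buffer was last drawn
    return damageHistory.slice(0, age).flat();
}

// A window updates a small region each frame; the panel rect never appears.
frameDamaged([{x: 0, y: 27, w: 640, h: 480}]);
frameDamaged([{x: 0, y: 27, w: 640, h: 480}]);
// With double buffering, the buffer we get back is typically 2 frames old.
console.log(damageForBufferAge(2).length); // 2 rects, none touching the panel
```

With this bookkeeping, a panel that never appears in the damage history is never part of the region handed back to the paint pass.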
Comment 17 Ray Strode [halfline] 2018-01-23 03:01:04 UTC
maybe related to the changes in bug 747163 ?
Comment 18 Jonas Ådahl 2018-01-23 03:02:55 UTC
One thing to note is that, for vertically tiled gnome-terminals, gnome-terminal will IIRC still draw shadow around the window. This shadow will overlap with the panel, so if gnome-terminal damages the whole surface area (including the shadow region) even if it only blinks the cursor, it would trigger redraw of the panel.
Comment 19 Ray Strode [halfline] 2018-01-23 03:09:18 UTC
daniel are you only seeing this problem for windows that overlap the panel?
Comment 20 Daniel van Vugt 2018-01-23 03:14:53 UTC
I'm testing with gears spinning fullscreen, so it's 60 times per second with no visible overlap.

Where exactly are you imagining the unchanging panel image is coming from 60 times per second?...

It's not being saved in the front buffer, because of double buffering...

It's not being saved in a texture/FBO because:
https://git.gnome.org/browse/mutter/tree/clutter/clutter/clutter-actor.c#n6784

It's not being saved anywhere. It's getting repainted 60 times per second in the least efficient way possible (not reusing textures from the previous frame). That's exactly what clutter's offscreen redirect feature fixes.

Jonas is suggesting we use EGL extensions to make the shell effectively single buffered. I understand that might be an option, but:
  (a) if it's enabled then it's not working as you say; and
  (b) you shouldn't do that at all (which we learned the hard way in Compiz): because
    (i) Not all drivers can flip a subset of the screen with reliable vsync to avoid tearing; and
    (ii) Single/double buffering hurts frame rates as you can't pipeline the GPU effectively (using triple buffering); and
    (iii) Tracking damage is too CPU-intensive; and
    (iv) Saving previous work offscreen is easy without framebuffer damage tracking -- just use a texture. As does clutter's offscreen redirect feature. Just turn it on.

Seriously. I'm not making this up. I learned why gnome-shell was inefficient the hard way, by looking at CPU profiles of it for hours on end. Then I prototyped some fixes. Then I verified the fixes work. It makes no sense why you wouldn't want an optimization that is both theoretically correct and practically proven.
Comment 21 Daniel van Vugt 2018-01-23 03:21:47 UTC
Correction: I'm testing with gears spinning maximized (because legacy X fullscreening without hints doesn't work). There may be some overlap under the panel but I can't tell with the solid opacity.
Comment 22 Jonas Ådahl 2018-01-23 03:26:56 UTC
(In reply to Daniel van Vugt from comment #20)
> I'm testing with gears spinning fullscreen, so it's 60 times per second with
> no visible overlap.
> 
> Where exactly are you imagining the unchanging panel image is coming from 60
> times per second?...

That is what we are pondering, and what should be researched.

> 
> It's not being saved in the front buffer because; double buffering...
> 
> It's not being saved in a texture/FBO because:
> https://git.gnome.org/browse/mutter/tree/clutter/clutter/clutter-actor.
> c#n6784
> 
> It not being saved anywhere. It's getting repainted 60 times per second in
> the least efficient way possible (not reusing textures from the previous
> frame). That's exactly what clutter's offscreen redirect feature fixes.
> 
> Jonas is suggesting we use EGL extensions to make the shell effectively
> single buffered.

I'm suggesting no such thing; I just explained what we do, and that the way we do things already involves 1) double buffering and 2) avoiding unnecessary redraws.

> I understand that might be an option, but:
>   (a) if it's enabled then it's not working as you say; and

We already use buffer-age when available.

>   (b) you shouldn't do that at all (which we learned the hard way in
> Compiz): because
>     (i) Not all drivers can flip a subset of the screen with reliable vsync
> to avoid tearing; and

We use eglSwapBuffersRegion() when available. Without it we can still use buffer regions, as that's another thing.

>     (ii) Single/double buffering hurts frame rates as you can't pipeline the

We always double-buffer.

> GPU effectively (using triple buffering); and
>     (iii) Tracking damage is too CPU-intensive; and

We already track damage, both damage regions in the clutter stage and damage in previously swapped buffers (to be used when comparing buffer age). Why would it be CPU-intensive?

>     (iv) Saving previous work offscreen is easy without framebuffer damage
> tracking -- just use a texture. As does clutter's offscreen redirect
> feature. Just turn it on.
> 
> Seriously. I'm not making this up. I learned by gnome-shell was inefficient
> the hard way by looking at CPU profiles of it for hours on end. Then
> prototyped some fixes. Then verified the fixes work. It makes no sense why
> you wouldn't want an optimization that is both theoretically correct and
> practically proven.

You are missing the point. If the code that exists and is used has not regressed, and nothing is explicitly reporting damage that overlaps with the panel, the optimization should not be needed and would instead just increase memory usage.
Comment 23 Daniel van Vugt 2018-01-23 03:38:21 UTC
OK, when I context switch back to this work I will start profiling using small (definitely not overlapping) app windows...

Still, as with everything computational you need to trade time vs space. Gnome Shell is presently too CPU-intensive for many machines so I'm trying to make it more efficient. And I don't think the argument that keeping a texture of just the panel rectangle uses too much graphics memory is a very good one.

The panel on a FHD screen is 1920 wide by 27 pixels high. So that's 1920 x 27 x 4 bpp = 207360 bytes of video memory. To save a noticeable amount of CPU that's a small price to pay.
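The arithmetic quoted above checks out; as a plain-JavaScript sanity check (the variable names are just for illustration):

```javascript
// Video memory needed to cache the panel as an RGBA texture:
// full screen width, 27px panel height, 4 bytes per pixel.
const width = 1920, height = 27, bytesPerPixel = 4;
const bytes = width * height * bytesPerPixel;
console.log(bytes); // 207360 bytes, roughly 202 KiB
```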
Comment 24 Florian Müllner 2018-01-23 09:17:27 UTC
(In reply to Daniel van Vugt from comment #23)
> The panel on a FHD screen is 1920 wide by 27 pixels high. So that's 1920 x
> 27 x 4 bpp = 207360 bytes of video memory. To save a noticeable amount of
> CPU that's a small price to pay.

Maybe, but it's hard to tell while there is a bug that causes unnecessary redraws. Until that is fixed, making redraws more efficient is just papering over the real bug.

If it turns out that *after* the bug has been fixed the optimization still provides a reasonable trade-off, then we should apply it.

But as it stands, it is a workaround for the bug you have identified, not a fix.
Comment 25 Daniel van Vugt 2018-01-23 10:02:05 UTC
Update: I have now confirmed my test case was just accidentally pathological, and caused constant panel repaints that most test cases would not.

The problem specifically is that old-school (GL)X apps don't support window management hints, so they try to go fullscreen by just placing and resizing themselves to fit the monitor. This means they cover the whole screen area but the shell ensures the panel is still on top. So they /look/ maximized.

Changing to the same window properly maximized using the window menu avoids the overlap and avoids the constant panel repaints.

So yes, the bug exists, but not often enough to be particularly important. You may wish to say WONTFIX if you feel the memory usage is too large. I'm working on many other things at the moment, so the outcome of this particular bug is not something that bothers me now.
Comment 26 Ray Strode [halfline] 2018-01-23 14:57:07 UTC
So there's still something fishy going on.  Even if the application requests a full monitor usize/uposition, the window manager doesn't need to honor it. I would expect mutter's constraints code to position the window within the workarea, (so y=24 or whatever the panel height is).

Or is glxgears override redirect when it's started fullscreen? If not, maybe there's an off-by-one in the code?
Comment 27 Daniel van Vugt 2018-08-28 07:12:03 UTC
Fix committed to gnome-shell master, scheduled for release in version 3.29.92:
https://gitlab.gnome.org/GNOME/gnome-shell/commit/f77b3da74fccfd

To answer Ray's question, I suspect what we were missing back in January is that when some actors are offscreened, their FBOs sometimes get expanded to a few (up to 3) pixels bigger than the actor. This is a "feature" of _clutter_paint_volume_get_stage_paint_box apparently:

https://gitlab.gnome.org/GNOME/mutter/blob/master/clutter/clutter/clutter-paint-volume.c#L1169

So offscreened actors not even touching may be calculated as having paint volumes that slightly overlap with some transparent pixels. So the panel's bounding box starts at something like (-2,-2) or (-1,-1) - I forget now. By the same mechanism the panel will be calculated as overlapping any maximized window by 1 or 2 pixels.
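The overlap effect described above can be illustrated in plain JavaScript. This is a made-up sketch (not the actual _clutter_paint_volume_get_stage_paint_box code; the box helpers are invented), showing how a couple of pixels of expansion makes a panel that exactly abuts a maximized window appear to overlap it:

```javascript
// Grow a bounding box by `px` pixels on every side, as the paint-volume
// expansion effectively does for offscreened actors.
function expandBox(box, px) {
    return {x1: box.x1 - px, y1: box.y1 - px, x2: box.x2 + px, y2: box.y2 + px};
}
// Strict-inequality overlap test: boxes that merely touch do not overlap.
function overlaps(a, b) {
    return a.x1 < b.x2 && b.x1 < a.x2 && a.y1 < b.y2 && b.y1 < a.y2;
}

const panelBox = {x1: 0, y1: 0, x2: 1920, y2: 27};
const windowBox = {x1: 0, y1: 27, x2: 1920, y2: 1080}; // starts just below
console.log(overlaps(panelBox, windowBox));              // false: they only touch
console.log(overlaps(expandBox(panelBox, 2), windowBox)); // true: 2px expansion
```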

That all said, repainting the panel now mostly has negligible overhead since it's a pre-rendered texture already on the GPU, after commit f77b3da74fccfd.