Bug 95802 - metacity performance
Status: RESOLVED FIXED
Product: metacity
Classification: Other
Component: general
Version: unspecified
Hardware: Other All
Importance: Normal normal
Target Milestone: ---
Assigned To: Metacity maintainers list
Metacity maintainers list
Duplicates: 96042
Depends on:
Blocks:
 
 
Reported: 2002-10-15 10:17 UTC by Brian Cameron
Modified: 2004-12-22 21:47 UTC
See Also:
GNOME target: ---
GNOME version: 2.0


Attachments
profile by function (249.56 KB, patch)
2002-10-30 21:43 UTC, Havoc Pennington

Description Brian Cameron 2002-10-15 10:17:54 UTC
I performed a test where I started up a desktop, launched
8 gnome-calculators in 3 different workspaces, and then
switched between the 3 workspaces 10 times (a total of 30
switches).

Performance analysis with the Forte Analysis programs
highlighted that metacity spent 10.58 user seconds
and 5.610 system seconds in this test.

8.260 user seconds are spent in meta_frames_expose_event.
Most of this time is spent in a static function called by
pixops_scale.

meta_frames_expose_event         8.260 user, 0.820 system
meta_frames_paint_to_drawable    8.260 user, 0.820 system
meta_theme_draw_frame            7.950 user, 0.820 system
meta_frame_style_draw            7.940 user, 0.820 system
meta_draw_op_list_draw           7.890 user, 0.820 system
meta_draw_op_draw_with_env       7.870 user, 0.820 system
draw_op_as_pixbuf                5.410 user, 0.190 system
scale_and_alpha_pixbuf           5.400 user, 0.190 system
gdk_pixbuf_scale_simple          5.490 user, 0.190 system
pixops_scale                     5.340 user, 0.190 system
static function                  5.340 user, 0.190 system

  [ note gdk_pixbuf_scale_simple is also called by
    scaled_from_pixdata which accounts for 0.100 user time ]

Not sure if this code has opportunities for tuning, but this
is where metacity is spending most of its time when switching
desktops.
Comment 1 Havoc Pennington 2002-10-15 13:59:36 UTC
This is going to be theme-dependent; for a theme with no 
pixmaps, it wouldn't show up.
Comment 2 Brian Cameron 2002-10-16 08:15:52 UTC
Understandable.  Still, it would be nice if metacity were a bit more
snappy at dealing with themes with pixmaps.  Why is it necessary to
rescale the images more than once?  For example, the pixmap used
in the titlebar is the same scale for all windows in all workspaces.
Couldn't it be scaled one time, and then cache the scaled image for
reuse?  Obviously it would need to be rescaled if a user preference
were changed that affected the pixmap sizes.  So it might be also
necessary to cache such preferences and rescale if any has changed.
Or am I completely misunderstanding the situation?  Or are themes
with pixmaps going to be such a rarity that optimizing their
behavior in metacity is not really useful?
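
A minimal sketch of the caching idea raised above, assuming one cache entry
per theme image that remembers the last size it was scaled to. ScaledCache,
scale_nearest, cache_get_scaled and cache_invalidate are hypothetical names,
not metacity code, and the nearest-neighbor scaler only stands in for
gdk_pixbuf_scale_simple:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical single-entry cache for one theme image: it remembers the
 * last scaled result and the size it was scaled to. */
typedef struct {
    int width, height;        /* size of the cached scaled image */
    unsigned char *pixels;    /* cached RGBA data, or NULL if empty */
    int scale_calls;          /* how many times real scaling ran */
} ScaledCache;

/* Stand-in for the expensive scaler: nearest-neighbor scaling of a
 * tightly packed RGBA buffer. Caller owns the returned buffer. */
unsigned char *scale_nearest(const unsigned char *src, int sw, int sh,
                             int dw, int dh)
{
    unsigned char *dst = malloc((size_t)dw * dh * 4);
    assert(dst != NULL);
    for (int y = 0; y < dh; y++) {
        int sy = y * sh / dh;
        for (int x = 0; x < dw; x++) {
            int sx = x * sw / dw;
            memcpy(dst + ((size_t)y * dw + x) * 4,
                   src + ((size_t)sy * sw + sx) * 4, 4);
        }
    }
    return dst;
}

/* Return the image scaled to dw x dh, rescaling only when the requested
 * size differs from the cached one. The cache owns the returned buffer. */
const unsigned char *cache_get_scaled(ScaledCache *c,
                                      const unsigned char *src,
                                      int sw, int sh, int dw, int dh)
{
    assert(sw > 0 && sh > 0 && dw > 0 && dh > 0);
    if (c->pixels == NULL || c->width != dw || c->height != dh) {
        free(c->pixels);
        c->pixels = scale_nearest(src, sw, sh, dw, dh);
        c->width = dw;
        c->height = dh;
        c->scale_calls++;
    }
    return c->pixels;
}

/* Drop the cached image, e.g. when a theme or font preference changes. */
void cache_invalidate(ScaledCache *c)
{
    free(c->pixels);
    c->pixels = NULL;
}
```

Repeated requests for the same size return the cached buffer without
rescaling; any preference change that affects pixmap sizes would have to
call cache_invalidate. The memory cost of keeping one such buffer per image
per window is not addressed by this sketch.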

Comment 3 Brian Cameron 2002-10-16 10:58:26 UTC
I've verified with the Forte analyzer that switching my theme
to Atlanta removes pixops_scale from being a significant contributor
to the time spent in metacity.
Comment 4 Brian Cameron 2002-10-16 12:06:01 UTC
I ran a metacity test where I compared continually resizing a
window for 3 minutes with Crux vs. with Atlanta themes.  

Overall time with Crux:    16.270 user, 8.550 system
  (of which 0.480 user and 6.570 system was spent in _poll)
Overall time with Atlanta:  5.880 user, 8.620 system
  (of which 0.440 user and 6.960 system was spent in _poll)

Using Crux, 8.390 user and 0.320 system was spent in the same
stack trace as with switching workspaces.

I didn't give similar overall time specs with my previous switching
workspace test, so this information follows for comparison:

In the switching workspace test the times were:
Overall time with Crux:    10.580 user, 5.610 system
  (of which 0.150 user and 3.930 system was spent in _poll)
Overall time with Atlanta:  6.270 user, 7.550 system
  (of which 0.280 user and 6.120 system was spent in _poll)
Comment 5 Havoc Pennington 2002-10-16 14:45:51 UTC
It'd be nice to optimize, sure. However a pixmap theme is always 
going to have more overhead than a regular theme (that's why 
metacity supports non-pixmap themes).

It doesn't do caching because 1) it would use a ton of RAM and 
2) it would not help for resizing, because the size keeps changing.
Maybe the workspace switch is more important, dunno. Also, it's 
possible that the MMX stuff on x86 reduces the impact of the scaling.

Note that metacity-theme-viewer will time the theme being viewed for
you, to give an idea how fast/slow it is.
Comment 6 Brian Cameron 2002-10-16 15:49:17 UTC
Thanks for your comments.  Aren't there still opportunities for 
improving the speed?  

The code is spending a lot of time resizing the icons while the
user is in the process of resizing a window.  I am not sure this
is really necessary.  Couldn't the resizing be delayed until the
user is actually finished with the resize operation?   Or at least
make it possible to tune the code to do the resizing less often
during a resize?

I still suspect that caching the resized icons would be useful.
Not just for switching workspaces and de-iconifying windows, but
also for some window resize operations.

In other words, it is not necessary to resize the x-axis icons if
only the y-axis is being resized, and vice versa.  Some icons
(like corners and the titlebar buttons) should never need to be
resized.  So using a cached version of icons that don't resize
seems appropriate.  Or perhaps the code is already smart enough
to only resize those icons that need to be resized?

Are there other opportunities to avoid resizing by just being smarter
about identifying the need to do so?  Also, do you suspect that
there may be opportunities to make gdk_pixbuf_scale_simple() more
efficient?

I realize that this isn't so much of a performance issue running
on Intel (perhaps because of MMX).  However, the delays during
window resizing and workspace switching are very noticeable on
Solaris and are often complained about.  Therefore, we would be
delighted to address any issues that would improve performance
in these areas.

I'm mostly looking, at this point, for advice about the best ways
to approach tuning this behavior.  I am hoping that we don't just
have to grin and bear these performance problems.  :)
Comment 7 Havoc Pennington 2002-10-17 20:33:30 UTC
> Couldn't the resizing be delayed until the
> user is actually finished with the resize operation?   

That would look pretty ugly.

> Or at least
> make it possible to tune the code to do the resizing less often
> during a resize?

It's already rate-limited. You could tune the rate to which it's
limited, though using less CPU just for the sake of less CPU seems
pointless; you really want to update as fast/often as you can
when resizing, I would think.


I personally doubt there are many opportunities to benefit from 
caching during resize; most resizes are in both dimensions. 
A cache may help more during workspace switching, but 
to help you'd need to cache all pixbufs for all windows, 
and you will just trade speed complaints for memory complaints.
Maybe the answer to speed complaints is to use a faster theme 
by default?

scale_simple can probably be sped up a bit, but it's going to 
be a matter of fooling around with inner loops and assembly and 
stuff like that, most likely.

It's probably also possible to tweak Crux itself to use fewer or
smaller pixmaps, if you were so inclined, and maybe willing to 
change the design somewhat if required.

I dunno, you'd have to experiment.
Comment 8 Havoc Pennington 2002-10-18 22:52:10 UTC
Suggestion in http://bugzilla.gnome.org/show_bug.cgi?id=95520
may reduce expose events when switching spaces and 
improve things a bit. It would only matter if your 
gnome-calculators were overlapping each other though.
Comment 9 Brian Cameron 2002-10-23 13:48:23 UTC
Looking more closely at the scaling code I notice that resizing is
happening in other situations where caching could speed things up.
  
1. When a window is moved or resized over another window, the
   window in back constantly resizes its borders to meet the
   window in front.  So if window A moves across the left-side
   border of window B, the top and  bottom of that left-hand border
   are resized to meet the edges of window A.  Obviously this resize
   happens over & over again while window A is being moved.  
   Caching would completely avoid the need to rescale here.
2. When a window gains or loses focus.  I suspect that this is
   because different graphics are used to indicate the "selected"
   window.  Caching both the "selected" and "unselected" graphics
   would eliminate the need for resizing all the icons.

[ and the following two points which have been discussed already ]

3. When a window is resized.  Caching would benefit some situations,
   specifically when the window is resized only in the x or y
   dimension.
4. Some icons only need to be rescaled when the font used on the
   titlebar changes (e.g. buttons).  Caching would avoid rescaling
   them when the font has not changed.

It seems the main reason for not caching these images is because it
would use so much memory.  However, don't the scaled borders that
are displayed on the screen take up memory anyway?  Some benefit
(to 1, 3, and 4) would be gained from simply checking to see if the
current image simply does not need to be refreshed, and avoiding
the rescale completely.

Are the scaled images so large that they would create a memory usage
problem if they were cached?  While I realize that to support such
caching, every window on the screen would need to cache its own
borders, these scaled images are typically quite small aren't they?
Also, only those windows that are not iconified would really get the
most benefit from such caching...so it would perhaps be acceptable
to throw out the cache when a window becomes iconified.  This might
help keep the caching from eating up too much memory.

Just tossing out some ideas based upon my better understanding of
exactly when resizing is happening.

Your suggestion of tuning the pixbuf scaling that is done when a
window is resized might be useful to us.  These operations are
sluggish on Solaris, so I don't think it is a matter of saving CPU
for just saving CPU.  It is an attempt to address the sluggishness,
especially on lower-end Sun hardware (Ultra 10/60/etc). 

I am currently working to implement pixbuf scaling code that uses
the VIS architecture on Sparc chips (similar to MMX on Intel).  This
is coming along nicely and seems to speed up the crux theme by
twofold (according to the metacity-theme-viewer timings).  While this
is an important improvement, it doesn't completely address the 
sluggishness problem.  So we may have to also do some other tuning
to get the behavior to an acceptable level.
Comment 10 Havoc Pennington 2002-10-23 22:21:29 UTC
> Are the scaled images so large that they would create a memory usage
> problem if they were cached?  While I realize that to support such
> caching, every window on the screen would need to cache its own
> borders, these scaled images are typically quite small aren't they?

Here is the math as I see it, assume a 400x400 window, with 40-pixel 
titlebar and 10-pixel sides, and a theme which has one pixbuf scaled 
to cover each of the 4 edges. 

400*400 - 360*380 = 23200 pixels
23200 pixels * 4 bytes per pixel = 92800 bytes

So 90K for a 400x400 window. A larger window more, a smaller window less.

Some themes might use less than that, if they say only use pixmaps for 
a part of the frame area; some might use more, if for example they 
overlap pixmaps.

Some cheesy math, 100 windows = 9 megs of cache, so 
10 windows is 1 meg of cache.

Whether that's OK I don't know.
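
The arithmetic above can be checked with a small helper. This is only a
sketch: frame_cache_bytes is a hypothetical name, and it follows the same
simplification as the figures above (client area = outer width minus both
sides, outer height minus the titlebar, with no separate bottom border):

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed to cache the frame pixels of one window, assuming the
 * frame is the outer rectangle minus the client area and every frame
 * pixel is stored at bytes_per_pixel. */
size_t frame_cache_bytes(int outer_w, int outer_h,
                         int titlebar_h, int side_w,
                         int bytes_per_pixel)
{
    assert(outer_w > 2 * side_w && outer_h > titlebar_h);
    int client_w = outer_w - 2 * side_w;   /* left and right sides */
    int client_h = outer_h - titlebar_h;   /* titlebar only, as above */
    size_t frame_pixels = (size_t)outer_w * outer_h
                        - (size_t)client_w * client_h;
    return frame_pixels * (size_t)bytes_per_pixel;
}
```

For the 400x400 example, frame_cache_bytes(400, 400, 40, 10, 4) gives
92800 bytes, so 100 such windows come to 9,280,000 bytes, which is the
"9 megs" estimate.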
Comment 11 Havoc Pennington 2002-10-23 22:46:09 UTC
Owen points out another possible optimization that will probably fix
Crux. The code for this can be copied from the pixbuf engine in
gtk-engines.

The idea is to optimize scaling pixbufs that are just a bunch of 
vertical or horizontal stripes.

When we load each pixbuf, we analyze whether all the pixels in each 
row are the same. If so, we scale by taking the first pixel of each 
row and copying it over and over, instead of using gdk_pixbuf_scale.
Similarly, if all the pixels in each column are the same, we scale 
by memcpy()'ing the first row to each subsequent row in the new
image. 

The code in pixbuf-engines assumes we're scaling in only a single
direction. To handle Crux we should handle scaling in both directions,
as follows. If a pixbuf is made up of horizontal stripes, we first 
scale a single-pixel column vertically using gdk_pixbuf_scale, then 
we copy the pixels from that into the destination image's rows.
If a pixbuf is made up of vertical stripes, we scale the first row
horizontally using gdk_pixbuf_scale(), then memcpy the result 
to each row of the destination image.

Crux has quite a few stripey images, so this should speed it up 
substantially.

At that point if we still need the cache, we can cache only
non-stripey images.
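
A rough sketch of the stripe detection and stripe scaling described above,
written against plain, tightly packed RGBA buffers rather than GdkPixbuf.
All function names here are hypothetical, and a nearest-neighbor pick
stands in for the gdk_pixbuf_scale step on the single row or column:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { BPP = 4 };  /* bytes per RGBA pixel */

/* True if every pixel in each row equals the first pixel of that row,
 * i.e. the image is a stack of horizontal stripes. */
int rows_are_uniform(const unsigned char *px, int w, int h)
{
    for (int y = 0; y < h; y++) {
        const unsigned char *row = px + (size_t)y * w * BPP;
        for (int x = 1; x < w; x++)
            if (memcmp(row, row + (size_t)x * BPP, BPP) != 0)
                return 0;
    }
    return 1;
}

/* True if every row equals the first row, i.e. the image is a set of
 * vertical stripes. */
int columns_are_uniform(const unsigned char *px, int w, int h)
{
    for (int y = 1; y < h; y++)
        if (memcmp(px, px + (size_t)y * w * BPP, (size_t)w * BPP) != 0)
            return 0;
    return 1;
}

/* Scale horizontal stripes to dw x dh: pick the source row for each
 * destination row, then repeat its first pixel across the row. */
unsigned char *scale_horizontal_stripes(const unsigned char *px,
                                        int w, int h, int dw, int dh)
{
    assert(w > 0 && h > 0 && dw > 0 && dh > 0);
    unsigned char *dst = malloc((size_t)dw * dh * BPP);
    assert(dst != NULL);
    for (int y = 0; y < dh; y++) {
        const unsigned char *src_pixel = px + (size_t)(y * h / dh) * w * BPP;
        unsigned char *row = dst + (size_t)y * dw * BPP;
        for (int x = 0; x < dw; x++)
            memcpy(row + (size_t)x * BPP, src_pixel, BPP);
    }
    return dst;
}

/* Scale vertical stripes to dw x dh: scale the first row horizontally,
 * then memcpy that row to every subsequent destination row. */
unsigned char *scale_vertical_stripes(const unsigned char *px,
                                      int w, int h, int dw, int dh)
{
    assert(w > 0 && h > 0 && dw > 0 && dh > 0);
    unsigned char *dst = malloc((size_t)dw * dh * BPP);
    assert(dst != NULL);
    for (int x = 0; x < dw; x++)
        memcpy(dst + (size_t)x * BPP, px + (size_t)(x * w / dw) * BPP, BPP);
    for (int y = 1; y < dh; y++)
        memcpy(dst + (size_t)y * dw * BPP, dst, (size_t)dw * BPP);
    return dst;
}
```

The uniformity checks would run once when the image is loaded; the result
can then be stored as a flag so that each later draw only pays for the
cheap replication path.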
Comment 12 Brian Cameron 2002-10-30 11:40:30 UTC
Thanks for your & Owen's comments.  We're looking into making a
Metacity-specific scaling function to speed things up.  I've been
thinking about this and have a few more thoughts to share:

1. Note that moving/resizing as wireframes would eliminate all of
   the time spent in resizing border pixmaps during move/resize
   or when a window is moved over another window.  Perhaps a good
   reason to support wireframe move/resize.

2. Would it be possible to cache the height/width of a given pixmap
   and not redraw that border element to the frame buffer if the
   size hasn't changed?  This would speed the situations where
   windows are de-iconified, workspaces are switched, and when a 
   window is resized in just 1 dimension.  It would also benefit a
   2-dimension resize for some of the pixmaps (like corners and
   other elements that do not actually change size).  I suspect
   there might be a bit of work involved with this since some 
   icons are on top of other icons (like buttons) and if the 
   underlying graphic changes, then the icon would need to be 
   redrawn even if it doesn't change size.  Still, I suspect this
   would eliminate quite a lot of rescaling when not needed.
   Caching just the width/height (as integers) for each border
   image would be much less heavy on memory than caching the 
   actual scaled images.
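
Point 2 could be sketched roughly as follows, assuming a small record per
border element. ElementState and element_needs_redraw are hypothetical
names; the dirty flag models the case where an underlying graphic changed.
The sketch only tracks sizes and does not show how to know that the
previously drawn pixels are still valid after an expose:

```c
#include <assert.h>

/* Hypothetical per-element record of the size that was last drawn. */
typedef struct {
    int last_w, last_h;   /* -1 until the element has been drawn once */
    int dirty;            /* set when something underneath changed */
} ElementState;

/* Return 1 if the element must be rescaled and redrawn at w x h,
 * 0 if the previous result can be kept. Updates the record. */
int element_needs_redraw(ElementState *e, int w, int h)
{
    assert(w >= 0 && h >= 0);
    if (!e->dirty && e->last_w == w && e->last_h == h)
        return 0;
    e->last_w = w;
    e->last_h = h;
    e->dirty = 0;
    return 1;
}
```

Two integers and a flag per element is far less memory than a cached
scaled image, which is the trade-off suggested above.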
Comment 13 Heath Harrelson 2002-10-30 13:30:01 UTC
*** Bug 96042 has been marked as a duplicate of this bug. ***
Comment 14 Havoc Pennington 2002-10-30 13:41:43 UTC
#2 is quite hard to implement I think. I can't think of how offhand.

Let's do the optimizations with no extra memory or UI cost, and then 
if there's still a problem bad enough to worry about, we can evaluate
the cost/benefit of optimizations that have negative cost.

Right now all our time is in pixops_scale, and the stripey-pixbuf
optimization should almost eliminate that time for any theme that 
looks good when scaled. (Themes that are scaling non-stripey pixbufs
look really bad anyway - indeed Crux does, if it has to scale its 
nonstripey bits. We should be able to avoid scaling those bits at 
least at the default font size.)
Comment 15 Heath Harrelson 2002-10-30 15:45:18 UTC
Batch adding GNOME2 keyword to Metacity bugs.  Sorry for the spam.
Comment 16 Havoc Pennington 2002-10-30 21:41:16 UTC
Here's some more profiling data from oprofile on Linux, which is a
whole system profiler. Profile is of a screen with nautilus, panel, 
and several terminals; one terminal is resized quickly for a long
time, causing exposes for the other three terminals and their window
frames, and for nautilus, and updating the pager applet etc.

CPU used per process, the second column is percentage, 
first column is absolute sample count.

2271       1.3186 0.0000 /usr/bin/nautilus
3323       1.9294 0.0000 /usr/bin/gnome-panel
25022     14.5283 0.0000 /usr/bin/gnome-terminal
31812     18.4708 0.0000 /usr/X11R6/bin/XFree86
40654     23.6046 0.0000 /usr/bin/metacity
66949     38.8721 0.0000 /boot/vmlinux-2.4.18-14smp

I interpret this to mean that we're spending 40% time with the kernel
doing context switches and pushing network data between the panel,
nautilus, X server, and 4 terminals.

I'll attach the per-function profile, which basically agrees with 
yours (time spent scaling pixbufs as far as metacity is concerned).
Comment 17 Havoc Pennington 2002-10-30 21:43:05 UTC
Created attachment 11922
profile by function
Comment 18 Havoc Pennington 2002-10-30 21:59:20 UTC
The profile is essentially unchanged if I remove the three terminals
other than the one being resized, or if I profile workspace switching
instead of resizing. 

The conclusion I draw from that is that the slow part is redrawing, 
not resizing in particular. I guess this was obvious.
Comment 19 Havoc Pennington 2002-10-31 00:45:50 UTC
Turns out the kernel isn't context switching or network, it's just:

c0133c40 10952 6.61133 __constant_c_and_count_memset
/boot/vmlinux-2.4.18-14smp

(this is 7% in brk() or mmap(), i.e. memory allocation from 
userspace malloc() usage)

c01070d0 39900 24.0862 default_idle /boot/vmlinux-2.4.18-14smp

(this is doing nothing, just sitting around waiting for an interrupt,
I suppose that may mean waiting for IO to complete or something)
Comment 20 Brian Cameron 2002-11-06 12:52:54 UTC
The patch to optimize stripey pixbufs that went in on 11/04 has
made a huge difference, bringing down the time reported by
metacity-theme-viewer from around 2.5 seconds to 1.10 seconds
(or 1.05 with mediaLib).

I ran the Forte Performance Analyzer tool against
metacity-theme-viewer to see where time is now being spent.  The
tool lets you add up the performance of multiple runs, so I ran
metacity-theme-viewer 10 times.  So these timings are adding up
10 separate runs:

20.100 user time, 45.540 wait time, 2.580 system time
(12.510 seconds of the wait time was in _poll).

11% of user time (2.220 user seconds and 0.010 system seconds)
was spent in:

gdk_pixbuf_render_to_drawable_alpha
_gdk_draw_pixbuf
gdk_drawable_real_draw_pixbuf
composite_0888

2.3% of user time (0.480 user/0.520 Wall/0.040 system) was spent
here:

gdk_pixbuf_render_to_drawable_alpha
_gdk_draw_pixbuf
gdk_drawable_real_draw_pixbuf
gdk_draw_rgb_image_dithalign
gdk_draw_rgb_image_core
gdk_rgb_convert_0888

12.4% of user time and 58% of system time (2.500 user seconds, 
12.140 wait seconds and 1.520 system seconds) were spent in:

gdk_pixbuf_render_to_drawable_alpha
_gdk_draw_pixbuf
gdk_drawable_real_draw_pixbuf
_gdk_drawable_copy_to_image
_gdk_x11_copy_to_image
XGetSubImage

XGetSubImage spent its time in XGetImage (1.430 user/1.680 Wall)
_XSetImage (1.030 user/10.420 Wall) and a little bit in
_XDestroyImage  (0.020 user/0.020 Wall).  _XSetImage spent roughly
equal time in _XPutPixel32 and _XGetPixel32.  Here's the detail
on XGetImage:

XGetImage spent its time in these functions:
  _XReply (0.510 user/9.510 Wall)          
  malloc (0.140 user/0.150 Wall)
  XCreateImage (0.120 user/0.130 Wall)

_XReply spent 0.270 user/6.440 Wall in _XFlushInt
              0.270 user/6.900 Wall in _XRead.


To further speed up metacity, these seem to be the areas that would
most benefit.  I would appreciate any suggestions regarding where
time would be best spent, or ideas of how to best approach these
areas.  

Perhaps getting the draw functions to make more use of MMX on Intel
and mediaLib on Sparc might make a difference?
Comment 21 Havoc Pennington 2002-11-06 15:28:17 UTC
It looks like the main problem now is latency for GetImage requests. 
This is sucking down pixels for alpha compositing, due to lack of 
the Xrender extension. So the client is spending a lot of time 
waiting for replies from the server with the GetImage reply data.
The main client-side CPU usage now is apparently actually doing the
compositing, and converting from pixbuf format to display format, 
those things are going to be hard to speed up but don't seem like 
a huge problem anyhow, they are much smaller than the latency problem.

The first thing I would do is look at how many of the images in Crux
actually have an alpha channel, and how many are fully opaque. If 
fully opaque, I would be darn sure gdk_pixbuf_get_has_alpha() == FALSE
at the time that we draw the final scaled pixbuf for those images. If
the pixbuf has no alpha channel, GDK should not be doing the GetImage
stuff (though possibly it still is, if so it should be fixed).

For things that have alpha, if the alpha is 1-bit alpha, we can
potentially record that fact alongside the vertical_striped etc.
flags, and for one-bit alpha pixbufs use render_pixmap_and_mask to 
draw the pixbuf with a bitmask, instead of alpha compositing it. 
That might help.

If all the images currently are using full alpha compositing, then
probably we have to tweak the theme to reduce the alpha usage.
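
One way to survey the theme images as suggested above is to classify each
one by the alpha values it actually contains. This is a sketch under
assumptions: AlphaKind and classify_alpha are hypothetical names, and the
buffer is taken to be tightly packed 8-bit RGBA (a real GdkPixbuf has a
rowstride that would need to be honored):

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    ALPHA_OPAQUE,    /* every pixel has alpha 255 */
    ALPHA_ONE_BIT,   /* only alpha 0 and 255 occur: a bitmask suffices */
    ALPHA_FULL       /* intermediate alpha: needs real compositing */
} AlphaKind;

/* Scan a tightly packed RGBA buffer and report what kind of alpha it
 * uses. Stops early once an intermediate alpha value is found. */
AlphaKind classify_alpha(const unsigned char *rgba, int w, int h)
{
    assert(w > 0 && h > 0);
    AlphaKind kind = ALPHA_OPAQUE;
    size_t n = (size_t)w * h;
    for (size_t i = 0; i < n; i++) {
        unsigned char a = rgba[i * 4 + 3];
        if (a == 255)
            continue;
        if (a == 0)
            kind = ALPHA_ONE_BIT;
        else
            return ALPHA_FULL;
    }
    return kind;
}
```

The result could be recorded at load time next to the striped flags:
ALPHA_OPAQUE images can be drawn without alpha, ALPHA_ONE_BIT images are
candidates for the pixmap-and-mask path, and only ALPHA_FULL images need
the compositing that triggers the GetImage round trips.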






Comment 22 Brian Cameron 2002-11-15 15:27:05 UTC
Okay, the stripey patch and making Crux a bit more flat have
improved the performance of pixmaps substantially on Solaris.
Together these improvements account for a 300% improvement
based on the statistics returned from metacity-theme-viewer.

Therefore, I think we can close this bug as being properly
addressed.