Bug 169544 - reduce memory consumption in GdkRGB
Status: RESOLVED WONTFIX
Product: gtk+
Classification: Platform
Component: Backend: X11
Version: unspecified
Hardware: Other All
Importance: Normal normal
Target Milestone: Medium fix
Assigned To: gtk-bugs
gtk-bugs
Small Patch
Depends on:
Blocks:
 
 
Reported: 2005-03-07 23:52 UTC by Matthias Clasen
Modified: 2007-06-15 08:58 UTC
See Also:
GNOME target: ---
GNOME version: 2.9/2.10


Attachments
patch by Matthias (2.88 KB, patch)
2006-09-29 12:13 UTC, Tim Janik
patch by Matthias (3.79 KB, patch)
2006-09-29 12:14 UTC, Tim Janik

Description Matthias Clasen 2005-03-07 23:52:51 UTC
Please describe the problem:
GdkRGB always allocates ~400K of memory to use for shared memory image transport
between client and X server. This memory is shared with the X server (if it is
local), but each client has its own copy. The only case where performance will
really suffer drastically without the shared memory is if the X server doesn't
have the RENDER extension, since GTK has to pull a lot of image data from the
server side for client-side compositing. 

Here is the plan:
- Only allocate shared memory if it is needed (e.g. not if the X server
  is remote)
- Make it configurable per screen whether to use shared memory for image
  transport
- Default to not using shared memory unless the X server is local and doesn't
  have the RENDER extension
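
The plan above boils down to a small decision function. The following is only a sketch of that logic: `ShmPolicy`, `ScreenInfo` and `screen_wants_shm` are hypothetical names that do not exist in GDK, and the `has_shm` check (whether the server offers MIT-SHM at all) is an added assumption that the plan does not spell out.

```c
#include <stdbool.h>

/* Hypothetical per-screen setting; none of these names exist in GDK. */
typedef enum {
    SHM_POLICY_AUTO,   /* decide from the server's properties */
    SHM_POLICY_ALWAYS, /* configuration forces shared memory on */
    SHM_POLICY_NEVER   /* configuration forces shared memory off */
} ShmPolicy;

typedef struct {
    bool      server_is_local; /* client and X server run on one machine */
    bool      has_shm;         /* assumption: server offers MIT-SHM */
    bool      has_render;      /* server offers the RENDER extension */
    ShmPolicy policy;          /* per-screen configuration */
} ScreenInfo;

/* Returns true if the shared memory area should be allocated for this screen. */
bool screen_wants_shm(const ScreenInfo *s)
{
    /* Shared memory cannot be used with a remote server or without MIT-SHM. */
    if (!s->server_is_local || !s->has_shm)
        return false;

    switch (s->policy) {
    case SHM_POLICY_ALWAYS: return true;
    case SHM_POLICY_NEVER:  return false;
    case SHM_POLICY_AUTO:   break;
    }

    /* Default: only allocate when client-side compositing is needed. */
    return !s->has_render;
}
```

With this shape, a local server that has RENDER gets no shared memory area by default, which is the case where the ~400K would be saved.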


Comment 1 Tim Janik 2006-09-11 12:41:27 UTC
wouldn't it also make sense to release those cached segments after they have been unused for a while? say, after 60 seconds or so. that way, programs which need to be drawn very seldom (e.g. panel applets) don't constantly waste space.
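
A rough sketch of the idle-release idea, assuming a single cached segment with a last-used timestamp. `CachedSegment`, `segment_acquire` and `segment_release_if_idle` are hypothetical names, and plain `malloc`/`free` stand in for the real shared memory setup and teardown, which is not shown here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

#define SEGMENT_IDLE_SECONDS 60 /* the "60 seconds or so" suggested above */

/* Hypothetical cache entry; malloc stands in for a shared memory segment. */
typedef struct {
    void  *data;
    size_t size;
    time_t last_used;
} CachedSegment;

/* Returns a buffer of at least `size` bytes, reusing the cached one if it fits. */
void *segment_acquire(CachedSegment *seg, size_t size, time_t now)
{
    if (seg->data == NULL || seg->size < size) {
        free(seg->data); /* free(NULL) is a no-op */
        seg->data = malloc(size);
        seg->size = seg->data ? size : 0;
    }
    seg->last_used = now;
    return seg->data;
}

/* Frees the cached buffer once it has been idle long enough.
 * Returns true if something was released. */
bool segment_release_if_idle(CachedSegment *seg, time_t now)
{
    if (seg->data == NULL ||
        difftime(now, seg->last_used) < SEGMENT_IDLE_SECONDS)
        return false;

    free(seg->data);
    seg->data = NULL;
    seg->size = 0;
    return true;
}
```

The release check would have to be driven periodically, for example from a GLib timeout source such as `g_timeout_add`; that wiring is left out of the sketch.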
Comment 2 Matthias Clasen 2006-09-16 04:59:32 UTC
Looking at Owen's findings in 

http://lists.freedesktop.org/archives/xorg/2006-September/017897.html

I'd rather get rid of the caching altogether. I've posted a patch to
do that here:

http://mail.gnome.org/archives/performance-list/2006-July/msg00012.html
Comment 3 Tim Janik 2006-09-29 12:13:14 UTC
Created attachment 73625
patch by Matthias

Matthias described it as:
>These are about the shared memory area which gdk allocates
>for image transport to the X server.
>The first patch just turns off the shared memory, but still allocates
>the same amount of scratch GdkImages.
Comment 4 Tim Janik 2006-09-29 12:14:38 UTC
Created attachment 73626
patch by Matthias

Matthias described it as:
> These are about the shared memory area which gdk allocates
> for image transport to the X server.
> The second patch does away with the scratch images altogether
> and just allocates and frees a suitable GdkImage whenever one
> is needed.
Comment 5 Tim Janik 2006-09-29 12:33:11 UTC
(In reply to comment #2):
> I'd rather get rid of the caching altogether. I've posted a patch to
> do that here:
> 
> http://mail.gnome.org/archives/performance-list/2006-July/msg00012.html

i've re-diffed and attached the patches individually because they were hard to read without diff -up and because getting them from the email webarchive needs transliteration of xml escapes.

also, i've run gtkperf and testrgb on:
  Athlon 1833.218MHz 512KB,
  XFree86 Version 4.3.0.1 (Debian 4.3.0.dfsg.1-14sarge1 20050901212727)
  OS Kernel: Linux version 2.6.12.4
  Device "Matrox Graphics, Inc. MGA G550 AGP"
  MGA(0): Direct rendering disabled
  RandR enabled

with these results:
  GtkPerf, stock: Total time: 17.81
  GtkPerf, diff1: Total time: 18.20
  GtkPerf, diff2: Total time: 18.22
that is, some additional time gets consumed by the image creation, but that is really spread out over the individual tests and doesn't present a significant difference for the user.

blitting looks different though, numbers are in megapixels/s:
  testrgb, stock: Color=33.00 Greyscale=31.84 Alpha=01.89
  testrgb, diff1: Color=21.77 Greyscale=21.46 Alpha=00.54
  testrgb, diff2: Color=21.93 Greyscale=21.63 Alpha=00.54
that is, blitting gets faster by a rough third.

to summarize, i think we should apply diff2. this'll get us significant memory and speed savings for blitting. for regular drawing it introduces a slight but i think unnoticeable penalty.
Comment 6 Tim Janik 2006-09-29 12:37:56 UTC
(In reply to comment #5)
> to summarize, i think we should apply diff2. this'll get us significant memory
> and speed savings for blitting. for regular drawing it introduces a slight but
> i think unnoticeable penalty.

sorry, i screwed up the interpretation here.
the stock testrgb run is *faster* (for megapixels/s, greater numbers are *better* ;). here are the complete results:

stock:
  Chose visual type=4 depth=24, image bpp=32, lsb first
  Color test time elapsed: 0.39s, 128.9 fps, 33.00 megapixels/s
  Grayscale test time elapsed: 0.40s, 124.4 fps, 31.84 megapixels/s
  Alpha test time elapsed: 6.78s, 7.4 fps, 1.89 megapixels/s
  Alpha test (to pixmap) time elapsed: 6.77s, 7.4 fps, 1.89 megapixels/s

diff1:
  Chose visual type=4 depth=24, image bpp=32, lsb first
  Color test time elapsed: 0.59s, 85.0 fps, 21.77 megapixels/s
  Grayscale test time elapsed: 0.60s, 83.8 fps, 21.46 megapixels/s
  Alpha test time elapsed: 23.84s, 2.1 fps, 0.54 megapixels/s
  Alpha test (to pixmap) time elapsed: 23.84s, 2.1 fps, 0.54 megapixels/s

diff2:
  Chose visual type=4 depth=24, image bpp=32, lsb first
  Color test time elapsed: 0.58s, 85.7 fps, 21.93 megapixels/s
  Grayscale test time elapsed: 0.59s, 84.5 fps, 21.63 megapixels/s
  Alpha test time elapsed: 23.89s, 2.1 fps, 0.54 megapixels/s
  Alpha test (to pixmap) time elapsed: 23.91s, 2.1 fps, 0.54 megapixels/s

outlook not so rosy anymore for applying diff1/diff2 ;)
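
To put numbers on the corrected reading, the throughput figures can be compared as a relative change against stock. The helper below is a trivial sketch (the name `percent_change` is made up for illustration). Applied to the megapixels/s values above, diff1 comes out roughly 34% slower than stock for Color (33.00 to 21.77), roughly 33% slower for Grayscale (31.84 to 21.46), and roughly 71% slower for Alpha (1.89 to 0.54).

```c
/* Relative change from a baseline throughput to a candidate throughput,
 * in percent. Negative values mean the candidate is slower. */
double percent_change(double baseline, double candidate)
{
    return (candidate - baseline) / baseline * 100.0;
}
```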
Comment 7 Matthias Clasen 2006-09-29 14:08:30 UTC
We should probably look at where the time is actually spent. 
From my look at gdkdrawable-x11.c:draw_images, it appears we are

1) allocating a depth 32 pixmap in the server
2) allocating a depth 32 image in the client
3) converting whatever image data we have into the depth 32 image
4) calling gdk_draw_image to transfer the data from the image to the pixmap
5) calling XRenderComposite with Over to draw the pixmap to the final destination
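
For reference, here is a per-pixel sketch of what steps 3 and 5 amount to when done in software. It assumes the depth 32 format is ARGB32 with premultiplied alpha, as RENDER uses; the function names are hypothetical and this is not the code from gdkdrawable-x11.c.

```c
#include <stdint.h>

/* Multiply two 8-bit values treated as fractions of 255, with rounding. */
static uint8_t mul_div_255(uint8_t a, uint8_t b)
{
    unsigned t = (unsigned)a * b + 128;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* Step 3 (sketch): convert one non-premultiplied RGBA sample into a
 * premultiplied ARGB32 pixel. */
uint32_t premultiply_argb32(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return ((uint32_t)a << 24)
         | ((uint32_t)mul_div_255(r, a) << 16)
         | ((uint32_t)mul_div_255(g, a) << 8)
         |  (uint32_t)mul_div_255(b, a);
}

/* Step 5 (sketch): the Over operator on premultiplied pixels,
 * per channel: result = src + dst * (1 - src_alpha).
 * Assumes src is validly premultiplied (no channel exceeds its alpha),
 * so the per-channel sum cannot exceed 255. */
uint32_t over_argb32(uint32_t src, uint32_t dst)
{
    uint8_t inv_alpha = (uint8_t)(255 - (src >> 24));
    uint32_t result = 0;

    for (int shift = 0; shift < 32; shift += 8) {
        uint8_t s = (uint8_t)(src >> shift);
        uint8_t d = (uint8_t)(dst >> shift);
        result |= (uint32_t)(s + mul_div_255(d, inv_alpha)) << shift;
    }
    return result;
}
```

When the server has to run this kind of loop itself, it reads the destination for every pixel it writes, so where the source and destination pixels live matters a great deal for speed.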
Comment 8 Owen Taylor 2006-09-29 14:15:11 UTC
I suspect the really big difference for the alpha version (the only
one where the speed difference *really* matters, likely) is that the 
SHM pixmap usage is keeping the source image in system memory.

If you can try repeating the test with the XaaNoOffscreenPixmaps server
option set, that would be interesting. The timings you have there look
to me as if, with the old code, you were doing:

 - Software compositing, source in system memory, destination in video
   memory. (Bad)

And with diff1/diff2 you are doing:

 - Software compositing, source in video memory, destination in video
   memory. (Really, really bad)

Comment 9 Matthias Clasen 2006-12-20 21:36:05 UTC
Tim, did you ever repeat your measurements with XaaNoOffscreenPixmaps?
Comment 10 Tim Janik 2007-05-24 14:11:06 UTC
benchmark on the same machine with option "XaaNoOffscreenPixmaps" "true" in the device section:

stock:
Chose visual type=4 depth=24, image bpp=32, lsb first
Color test time elapsed: 0.38s, 130.8 fps, 33.49 megapixels/s
Grayscale test time elapsed: 0.38s, 130.2 fps, 33.34 megapixels/s
Alpha test time elapsed: 6.73s, 7.4 fps, 1.90 megapixels/s
Alpha test (to pixmap) time elapsed: 1.56s, 32.0 fps, 8.19 megapixels/s

diff1:
Chose visual type=4 depth=24, image bpp=32, lsb first
Color test time elapsed: 0.52s, 96.1 fps, 24.59 megapixels/s
Grayscale test time elapsed: 0.53s, 94.6 fps, 24.22 megapixels/s
Alpha test time elapsed: 7.03s, 7.1 fps, 1.82 megapixels/s
Alpha test (to pixmap) time elapsed: 1.83s, 27.3 fps, 7.00 megapixels/s

diff2:
Chose visual type=4 depth=24, image bpp=32, lsb first
Color test time elapsed: 0.56s, 90.0 fps, 23.03 megapixels/s
Grayscale test time elapsed: 0.56s, 89.3 fps, 22.87 megapixels/s
Alpha test time elapsed: 7.16s, 7.0 fps, 1.79 megapixels/s
Alpha test (to pixmap) time elapsed: 1.95s, 25.6 fps, 6.56 megapixels/s

looks like the diffs are still slower than stock. however the "Alpha test (to pixmap)" throughput has increased in all 3 cases.
Comment 11 Tim Janik 2007-06-04 12:17:17 UTC
due to the last profiling findings i posted, i'd like to close this report unless anyone speaks up with issues still outstanding and worth discussing here.