GNOME Bugzilla – Bug 169544
reduce memory consumption in GdkRGB
Last modified: 2007-06-15 08:58:57 UTC
Please describe the problem:
GdkRGB always allocates ~400K of memory to use for shared memory image transport between the client and the X server. This memory is shared with the X server (if it is local), but each client has its own copy. The only case where performance will really suffer drastically without shared memory is if the X server doesn't have the RENDER extension, since GTK then has to pull a lot of image data from the server side for client-side compositing.

Here is the plan:
- Only allocate shared memory if it is needed (e.g. not if the X server is remote)
- Make it per-screen configurable whether to use shared memory for image transport or not
- Default to not using shared memory unless the X server is local and doesn't have the RENDER extension

Steps to reproduce:

Actual results:

Expected results:

Does this happen every time?

Other information:
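The "is the X server local" test in the plan above could be approximated from the display name alone. This is only an illustrative sketch (the helper name is hypothetical, not GDK code); a real implementation would additionally call XShmQueryExtension() to confirm the server actually supports MIT-SHM:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Rough heuristic: local displays are typically named ":0", ":0.0",
 * "unix:0", or "localhost:0".  Hypothetical helper for illustration;
 * real code would also verify MIT-SHM via XShmQueryExtension(). */
static bool
display_looks_local (const char *display_name)
{
  if (display_name == NULL)
    return false;
  if (display_name[0] == ':')
    return true;
  if (strncmp (display_name, "unix:", 5) == 0)
    return true;
  if (strncmp (display_name, "localhost:", 10) == 0)
    return true;
  return false;
}
```

Checking the name is only a first filter: even for a local display, the MIT-SHM extension may be unavailable, so the shared segment allocation would still need a runtime fallback.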
wouldn't it also make sense to release those cached segments after being unused for a while? say after 60 seconds or so. that way, programs which need to be drawn very seldom don't constantly waste space (e.g. panel applets).
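The idle-timeout idea could look roughly like the following (hypothetical names and a deliberately simplified structure; a GTK implementation would hang this off a g_timeout_add() callback rather than being polled, and would call shmdt()/XShmDetach() to actually free the segment):

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical cached SHM segment; the real GdkRGB bookkeeping is
 * more involved.  This only illustrates the idle-timeout idea. */
typedef struct {
  void  *data;       /* segment memory, NULL when released  */
  time_t last_used;  /* timestamp of the last draw using it */
} CachedSegment;

#define IDLE_TIMEOUT_SECS 60

/* Release the segment if it has sat unused for the timeout period.
 * Returns true when the segment is released (or already was). */
static bool
maybe_release_segment (CachedSegment *seg, time_t now)
{
  if (seg->data == NULL)
    return true;                      /* already released    */
  if (now - seg->last_used < IDLE_TIMEOUT_SECS)
    return false;                     /* still warm, keep it */
  /* real code: shmdt () / XShmDetach () here */
  seg->data = NULL;
  return true;
}
```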
Looking at Owen's findings in http://lists.freedesktop.org/archives/xorg/2006-September/017897.html I'd rather get rid of the caching altogether. I've posted a patch to do that here: http://mail.gnome.org/archives/performance-list/2006-July/msg00012.html
Created attachment 73625 [details] [review]
patch by Matthias

Matthias described it as:
> These are about the shared memory area which gdk allocates
> for image transport to the X server.
> The first patch just turns off the shared memory, but still allocates
> the same amount of scratch GdkImages.
Created attachment 73626 [details] [review]
patch by Matthias

Matthias described it as:
> These are about the shared memory area which gdk allocates
> for image transport to the X server.
> The second patch does away with the scratch images altogether
> and just allocates and frees a suitable GdkImage whenever one
> is needed.
(In reply to comment #2)
> I'd rather get rid of the caching altogether. I've posted a patch to
> do that here:
>
> http://mail.gnome.org/archives/performance-list/2006-July/msg00012.html

i've re-diffed and attached the patches individually because they were hard to read without diff -up, and because getting them from the email web archive needs transliteration of XML escapes.

also, i've run gtkperf and testrgb on:
  Athlon 1833.218MHz, 512KB cache
  XFree86 Version 4.3.0.1 (Debian 4.3.0.dfsg.1-14sarge1 20050901212727)
  OS Kernel: Linux version 2.6.12.4
  Device "Matrox Graphics, Inc. MGA G550 AGP"
  MGA(0): Direct rendering disabled
  RandR enabled

with these results:
  GtkPerf, stock: Total time: 17.81
  GtkPerf, diff1: Total time: 18.20
  GtkPerf, diff2: Total time: 18.22

that is, some additional time gets consumed by the image creation, but that is really spread out over the individual tests and doesn't present a significant difference for the user. blitting looks different though; numbers are in megapixels/s:

  testrgb, stock: Color=33.00 Greyscale=31.84 Alpha=01.89
  testrgb, diff1: Color=21.77 Greyscale=21.46 Alpha=00.54
  testrgb, diff2: Color=21.93 Greyscale=21.63 Alpha=00.54

that is, blitting gets faster by a rough third.

to summarize, i think we should apply diff2. this'll get us significant memory and speed savings for blitting. for regular drawing it introduces a slight but i think unnoticeable penalty.
(In reply to comment #5)
> to summarize, i think we should apply diff2. this'll get us significant memory
> and speed savings for blitting. for regular drawing it introduces a slight but
> i think unnoticable penalty.

sorry, i screwed up the interpretation here. the stock testrgb run is *faster* (as for megapixels/s, greater numbers indicate *better* ;), here are the complete results:

stock:
  Chose visual type=4 depth=24, image bpp=32, lsb first
  Color test              time elapsed:  0.39s, 128.9 fps, 33.00 megapixels/s
  Grayscale test          time elapsed:  0.40s, 124.4 fps, 31.84 megapixels/s
  Alpha test              time elapsed:  6.78s,   7.4 fps,  1.89 megapixels/s
  Alpha test (to pixmap)  time elapsed:  6.77s,   7.4 fps,  1.89 megapixels/s

diff1:
  Chose visual type=4 depth=24, image bpp=32, lsb first
  Color test              time elapsed:  0.59s,  85.0 fps, 21.77 megapixels/s
  Grayscale test          time elapsed:  0.60s,  83.8 fps, 21.46 megapixels/s
  Alpha test              time elapsed: 23.84s,   2.1 fps,  0.54 megapixels/s
  Alpha test (to pixmap)  time elapsed: 23.84s,   2.1 fps,  0.54 megapixels/s

diff2:
  Chose visual type=4 depth=24, image bpp=32, lsb first
  Color test              time elapsed:  0.58s,  85.7 fps, 21.93 megapixels/s
  Grayscale test          time elapsed:  0.59s,  84.5 fps, 21.63 megapixels/s
  Alpha test              time elapsed: 23.89s,   2.1 fps,  0.54 megapixels/s
  Alpha test (to pixmap)  time elapsed: 23.91s,   2.1 fps,  0.54 megapixels/s

outlook not so rosy anymore for applying diff1/diff2 ;)
We should probably look at where the time is actually spent. From my look at gdkdrawable-x11.c:draw_images, it appears we are:

1) allocating a depth 32 pixmap in the server
2) allocating a depth 32 image in the client
3) converting whatever image data we have into the depth 32 image
4) calling gdk_draw_image to transfer the data from the image to the pixmap
5) calling XRenderComposite with Over to draw the pixmap to the final destination
I suspect the really big difference for the alpha version (the only one where the speed difference *really* matters, likely) is that the SHM pixmap usage is keeping the source image in system memory. If you can try repeating the test with the XaaNoOffscreenPixmaps server option set, that would be interesting.

The timings you have there look to me like with the old code you were doing:
- Software compositing, source in system memory, destination in video memory. (Bad)

And with diff1/diff2 you are doing:
- Software compositing, source in video memory, destination in video memory. (Really, really, bad)
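For reference on why memory placement matters so much here: "software compositing" means the CPU evaluating the Porter-Duff Over operator per pixel, which requires reading every destination pixel back before writing it. Those reads are exactly what becomes painfully slow when the destination (or source) lives in uncached video memory. A minimal sketch of one premultiplied-alpha channel, illustrative only and not GDK code:

```c
#include <stdint.h>

/* One channel of Porter-Duff Over with premultiplied alpha:
 *   dst' = src + dst * (255 - src_alpha) / 255
 * (the +127 rounds to nearest).  Note the read of `dst`: software
 * compositing must fetch every destination pixel, which is what makes
 * compositing to or from video memory slow.  Illustrative sketch. */
static uint8_t
composite_over_channel (uint8_t src, uint8_t src_alpha, uint8_t dst)
{
  return (uint8_t) (src + (dst * (255 - src_alpha) + 127) / 255);
}
```

With RENDER doing the Over in the server, none of these per-pixel destination reads cross back to the client, which is why keeping compositing server-side (or at least keeping both operands in system memory) matters.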
Tim, did you ever repeat your measurements with XaaNoOffscreenPixmaps?
benchmark on the same machine, with Option "XaaNoOffscreenPixmaps" "true" in the Device section:

stock:
  Chose visual type=4 depth=24, image bpp=32, lsb first
  Color test              time elapsed: 0.38s, 130.8 fps, 33.49 megapixels/s
  Grayscale test          time elapsed: 0.38s, 130.2 fps, 33.34 megapixels/s
  Alpha test              time elapsed: 6.73s,   7.4 fps,  1.90 megapixels/s
  Alpha test (to pixmap)  time elapsed: 1.56s,  32.0 fps,  8.19 megapixels/s

diff1:
  Chose visual type=4 depth=24, image bpp=32, lsb first
  Color test              time elapsed: 0.52s,  96.1 fps, 24.59 megapixels/s
  Grayscale test          time elapsed: 0.53s,  94.6 fps, 24.22 megapixels/s
  Alpha test              time elapsed: 7.03s,   7.1 fps,  1.82 megapixels/s
  Alpha test (to pixmap)  time elapsed: 1.83s,  27.3 fps,  7.00 megapixels/s

diff2:
  Chose visual type=4 depth=24, image bpp=32, lsb first
  Color test              time elapsed: 0.56s,  90.0 fps, 23.03 megapixels/s
  Grayscale test          time elapsed: 0.56s,  89.3 fps, 22.87 megapixels/s
  Alpha test              time elapsed: 7.16s,   7.0 fps,  1.79 megapixels/s
  Alpha test (to pixmap)  time elapsed: 1.95s,  25.6 fps,  6.56 megapixels/s

looks like the diffs are still slower than stock. however, the "Alpha test (to pixmap)" throughput has increased in all 3 cases.
due to the profiling findings i last posted, i'd like to close this report unless anyone speaks up with issues that are still outstanding and worth discussing here.