GNOME Bugzilla – Bug 685513
Memory Leak with usage (not age)
Last modified: 2014-12-01 22:49:58 UTC
There seems to be a memory leak in Gnome-Shell that comes with use (not age).

Hardware: ThinkPad W510 (nVidia Quadro FX 880M, driver 304.51), Core2-Duo-based desktop with nVidia 560Ti (same driver)
Software: Ubuntu Quantal Beta, up-to-the-moment updates, both 64- and 32-bit (one of each).

*Note: I can *not* test with nouveau. Not only is it completely unsuitable for laptops due to terrible power management, but neither computer will boot with it - I get the PFIFO error, even though the -16 kernel was supposed to fix that.

Steps to reproduce:
1) Log into Gnome-Shell and launch System Monitor (gnome-system-monitor). Set it to view "all processes" and sort by the "memory" column. Take note of the amount of RAM taken by the Gnome-Shell and Xorg processes.
2) Open a bunch of programs. I use: Nautilus, Gnome-Terminal, Chrome, Banshee, EasyTag, EasyMP3Gain, Empathy, Eclipse, and Rhythmbox.
3) Close all of the programs except gnome-system-monitor. On both of my systems, the RAM usage of the Gnome-Shell and Xorg processes has increased.
4) Repeat steps 2-3, perhaps several times.

Results: The Gnome-Shell process gets fatter with each repetition -- on the order of ~0.5-1.0 MiB each time on each of my computers. It isn't a lot, but it creates an environment wherein the longer one uses one's G-S session, the less free RAM one has to work with (as neither the G-S nor the Xorg process ever seems to give up any of its RAM, ever). I don't seem to have any stability or performance issues stemming from this.

ALSO: Opening the Activities window raises the RAM usage by a noticeable amount, and sometimes subsequent openings also grow the process's RAM. This eventually ends, once you've clicked on every button and seen every icon you can see, but sometimes fooling around with the desktops can give it a few more KBs.

*I*, of course, think this is a pretty high-priority issue, but as the RAM use and growth isn't all THAT large, that fast, I could understand if it weren't marked critical... but I think any leak is critical, so there's my bias.

Also, just to clarify: if G-S is left alone, the memory use doesn't change, not even after many hours.
If the Shell memory use does not grow indefinitely, then it's not really a memory leak. There's nothing wrong with using RAM as long as it's freed when it's needed, i.e. when the system is under some memory pressure. Is that the case? Does the Shell reach e.g. 500MB of resident memory?
It *does* grow indefinitely, yes. With opening and closing programs repeatedly throughout the day. It does *not* grow simply from poking around in Activities, nor does it grow while idle (so that's good). I've seen it as high as ~600M.
...er, it *does* grow from poking around in Activities, just not indefinitely. That takes program use. Apparently, the drawing and destroying of many windows.
This is a known bug with the nvidia driver.
I had a feeling you'd say that. If I may ask: where is the bug report for it? I looked and haven't found it, and although I admit to being ignorant about the innermost workings of X and Gnome-Shell, I tend to believe that the Xorg proc would bloat if it were a driver issue (instead of the Gnome-Shell process). I'm not saying I don't believe you. I'm saying that if it is a known nVidia problem, then I want to work with them to resolve it.
Jasper, is it bug #642652 or something else?
This is from my own observations and discussions with nvidia engineers. As far as I know, nvidia is aware of the problem, and may resolve to fix it sometime soon.
@Jasper: I opened a case with nVidia myself to see if they knew about any issue like this. They did not, or at very least, claimed not to. In fact, they first tried to dismiss it with "linux memory management is very complex, the customer is seeing the *virtual* memory usage", but have agreed to re-evaluate the situation when I provided data showing that this was not the case. They've also asked me to try 304.60 and 310.14 (beta) to see if it helps. 310 helps, in that the overall mem usage is a bit lower, but the leak persists. This was all on the 18th, and I haven't heard anything since.
And as *soon* as I type that, the nVidia support tech I'm working with responded. He's installed F17 and can see the problem occur as I've described it. He has submitted a bug to engineering for investigation (and possible patch in a later driver), but stresses that this might take some time, and that the problem may very well *not* be driver-based. Either way, he'll update me when (if?) there's another update from engineering, as he's submitted to the bug report. Should I update this bug when I get updates, or is there still work to be done in the Gnome camp to flush this out?
Thanks, that's really helpful! Please post the response from NVIDIA when you get it. It's also possible that the support technician you encountered is not aware of the bug, but I'm quite sure that several engineers are already aware of it. I will double-check, of course.
There is a quick way to see the memory leak: click many times on an element of the top bar (like the calendar or the username menu) and watch the memory consumption of the gnome-shell process increase. It works for the Activities menu too.
I also have this problem using a Mobility Radeon HD 3200 graphics card, not an nvidia card, so I suspect a bug in gnome-shell itself. I am using gnome-shell 3.6.2 with Linux kernel 3.6.6 (Arch distribution) on a Lenovo X100e. Clicking repeatedly on icons in the top bar as described by Anthony Ruhier causes the memory accumulation, and it does not clear when I drop the caches: echo 3 | sudo tee /proc/sys/vm/drop_caches.
I can confirm that the bug also affects Intel cards. I'm running gnome-shell 3.6.2 with an Intel GM45 card, and gnome-shell memory consumption reaches 500 MiB and keeps growing.
I have a Sandy bridge, running on Arch Linux with Gnome shell 3.6.2 and Linux 3.6.6. I have tested on Ubuntu 12.10 with Gnome Shell 3.6 and there is the same memory leak.
*** Bug 665678 has been marked as a duplicate of this bug. ***
I have recently used a computer with an Intel video card and (open source) driver. The defect as I described it and as Anthony described it does, in fact, exist on that as well. On the nVidia front, the engineers there have been in contact with me once or twice since my last input here, asking for additional info and clarifications. The investigation is active, but so far has turned up nothing wrong on the driver side. Also on the nVidia front, the 310.19 driver brings the memory usage more in line with open source drivers -- it uses a similar amount on startup (instead of 2-3x as much), and the leak affects the binary blob at nearly the same amount and rate as it affects the open source drivers. I am not a support technician with the Gnome project, but I was a successful support technician for years before I moved to development... and to my eyes, given all this new data, I think this has every indication of being a Gnome-Shell problem, and not a binary blob driver issue.
Thanks for the information, and yes, it sounds like the rest of the issue is entirely on our side. I'll keep debugging!
Yeah doesn't seem to be driver or nvidia specific. I'm on intel ivybridge and I can reproduce it easily exactly the same way breedraper mentioned. Click any item on the top panel rapidly, the memory will just keep going up and up until you stop clicking.
cloned downstream for Fedora 18 blocker / NTH discussion: https://bugzilla.redhat.com/show_bug.cgi?id=888107
I also see something like a big memory leak in Gnome-Shell. Here its memory usage indefinitely increases with almost every interaction with it and may easily pass 900 MiB. This is while with same computer and the same Linux distro, the total memory consumption of plasma-desktop and kwin together hardly reaches 350 MiB after a day, although all KDE desktop effects are enabled (desktop cube, opacity blurring, etc.).
That does not sound like the same bug. You should file it separately, with much more detail on your configuration.
There currently are issues with SpiderMonkey's GC that upstream is aware of and has no easy fix or workaround for. Periodically activating the GC using the Memory tab in Looking Glass may reduce the memory footprint somewhat. We used to invoke the GC periodically, but it led to deadlocks. It may be worth investigating that again with upstream.
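For illustration, a rough sketch of what "periodically invoking the GC" looks like in gjs terms. This is not the shell's actual implementation, only an illustration of the idea, and the interval below is an assumption picked for the example:

const GLib = imports.gi.GLib;
const System = imports.system;

// Hypothetical interval, chosen only for this sketch.
const GC_INTERVAL_SECONDS = 10;

GLib.timeout_add_seconds(GLib.PRIORITY_LOW, GC_INTERVAL_SECONDS, function() {
    System.gc();                  // force a full collection
    return GLib.SOURCE_CONTINUE;  // keep the timeout running
});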
Invoking "Full GC" through "Looking Glass" does not reduce the memory. There is a leak somewhere in Gnome Shell and the only way to keep it from growing is to reload it from time to time.
I do not have much time these days, and I investigated this a bit with valgrind/massif some time ago, so I'll post here so it's not lost.

I made a quick and dirty extension to show off the memleak quickly: https://github.com/zzrough/gs-extensions-popup-stress-test. Just click on the star icon near the clock and it will open a set of system icon popup menus 100 times. You can adjust the frequency and the number of repetitions, but with the defaults, gnome-shell memory increases by ~+310MB on my box (I've triggered the GC many times after the test just to be on the safe side; this results in no change).

It seems there are two issues:
- cairo is leaking on xlib-shm destruction (fixed upstream in http://cgit.freedesktop.org/cairo/commit/?id=5d21c9e224617110678a5b854a6a41049a02fca2)
- the other issue involves st-drawing-area and its 'repaint' signal, where the cairo context is not destroyed early enough. I'll explain that below.

st_drawing_area_paint does the following if a repaint is needed:
- it creates a surface (surface-ref-count=1)
- it creates a context (surface-ref-count=3, context-ref-count=1)
- it emits the 'repaint' signal, which is handled by the widget
- it disposes the context (surface-ref-count=3, context-ref-count=1)
- it disposes the surface (surface-ref-count=2, context-ref-count=1)

In a repaint callback, for instance in _drawBorder of boxpointer, as soon as area.get_context() is captured in a local variable, the context is ref'ed by _gjs_cairo_context_construct_internal for the get_context call, but is not destroyed when execution exits the scope of the variable. This means neither the temporary context nor the temporary surface from st_drawing_area_paint is disposed (the context is certainly disposed when the GC kicks in, though). If I do not use the context getter, the memory increase is halved (~+150MB on my box), so there are certainly additional retention issues, but this might be a first step.

I asked Jasper and it seems the ref lifecycle is tied to the wrapper object (the boxpointer for instance) and will be retained until the GC kicks in. I did not have the time to investigate the details of gjs to confirm this, or whether it comes from its special treatment of the cairo context. It's quite unexpected to have a ref on the getter but no unref when exiting the local scope.

I did not have the time for a more precise testcase either, and won't have any time available for the next two weeks. If somebody with more gjs/cairo context integration knowledge could comment on the last paragraph, that would be much appreciated.
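To make the repaint scenario above concrete, here is a minimal sketch of the kind of 'repaint' handler being discussed. It is not the actual boxpointer code; the drawing is made up, and only the area.get_context() capture matters:

const St = imports.gi.St;

let area = new St.DrawingArea({ width: 100, height: 100 });
area.connect('repaint', function(area) {
    // get_context() wraps the temporary cairo_t created by
    // st_drawing_area_paint(); the gjs wrapper takes an extra ref here.
    let cr = area.get_context();
    let [width, height] = area.get_surface_size();

    cr.setSourceRGBA(1, 1, 1, 1);
    cr.rectangle(0, 0, width, height);
    cr.fill();

    // Nothing drops that ref when 'cr' goes out of scope, so the context
    // (and the surface it keeps alive) survives until the GC collects the
    // wrapper object.
});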
(In reply to comment #24)
> I asked Jasper and it seems the ref lifecycle is tied to the wrapper object
> (the boxpointer for instance) and will be retained until the GC kicks in. I did
> not have the time to investigate the details of gjs to confirm this, or whether
> it comes from its special treatment of the cairo context. It's quite unexpected
> to have a ref on the getter but no unref when exiting the local scope.

Yes. There are two issues:

* Since SpiderMonkey doesn't know how much memory is being used by the cairo context and surface, it can't make the judgment to immediately GC -- what it sees is a very tiny wrapper object. Firefox wants this as well for accurate memory reporting, so they're looking at adding it to the new C++ API.

* An immediate GC after drawing the drawing area doesn't dispose the context. Due to optimizations, SpiderMonkey can't accurately scan the C stack, so it thinks that the cairo context wrapper object is still in scope. Internal tests where we mangled the C stack using some fancy JavaScript showed that doing this made the context able to be GC'd.

A full GC after a drawing area has been cleared should destroy and free the context, which should free the surface as well. I made sure that the context wrapper was destroyed, but I'll investigate to make sure that the native context object and surface are destroyed as well.
We could add an API to gjs to explicitly dispose wrapper objects: context.$dispose() or System.dispose(context) ?
Created attachment 232970 [details] [review]
cairo-context: Add an explicit CairoContext.$dispose() function

Due to limitations in SpiderMonkey's GC, cairo contexts, and thus, their surfaces, aren't cleaned up after dispose().
Created attachment 232971 [details] [review]
js: Explicitly dispose all cairo contexts

Due to limitations and bugs in SpiderMonkey's GC, wrapper objects for cairo contexts and similar may not get cleaned up immediately after repainting, leading to leaking memory. Explicitly disposing of such objects after they're not needed can clean up large portions of memory for cairo surfaces.
Review of attachment 232970 [details] [review]: Before we do this though - any thoughts on a more generic version? We could pretty easily make an API that allowed "disposing" ("unwrapping", "clearing", any other terminology ideas?) any GObject, GBoxed, as well as the custom wrapper types like cairo.
This is such a really core idea that I wouldn't want a shared version -- disposing some random object without knowing what it is seems like a bad idea. The GObject one is exposed as run_dispose. What would the GBoxed one do? Kill the storage for the boxed internally? I can't imagine that being useful, unless you have a really giant GBoxed, which you really shouldn't do.
(In reply to comment #30)
> This is such a really core idea that I wouldn't want a shared version --
> disposing some random object without knowing what it is seems like a bad idea.
>
> The GObject one is exposed as run_dispose.

That's not quite the same thing - dispose is intended to drop references to other GObjects. But the object itself could have non-trivial, non-GObject data such as a file descriptor, large GBytes instance, etc.

> What would the GBoxed one do? Kill the storage for the boxed internally? I
> can't imagine that being useful, unless you have a really giant GBoxed, which
> you really shouldn't do.

If cairo didn't require manual bindings for unrelated reasons, cairo.context would just be a boxed type. And as above, file descriptors and GBytes, for example. But if you're not seeing it, we can certainly just move forward with the cairo one and revisit this later.
Review of attachment 232970 [details] [review]:

One minor style thing.

::: modules/cairo-context.c
@@ +417,3 @@
+        cairo_destroy(priv->cr);
+        priv->cr = NULL;
+    }

g_clear_pointer(&priv->cr, cairo_destroy);
Review of attachment 232971 [details] [review]: You could go try { ... } finally { } with this too, but this is fine by me.
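A hedged sketch of what the try/finally variant suggested in this review could look like, reusing the hypothetical repaint handler sketched earlier; the patches that were actually pushed simply call cr.$dispose() near the end of each handler instead:

area.connect('repaint', function(area) {
    let cr = area.get_context();
    try {
        let [width, height] = area.get_surface_size();
        cr.setSourceRGBA(1, 1, 1, 1);
        cr.rectangle(0, 0, width, height);
        cr.fill();
    } finally {
        // Explicitly destroy the native cairo_t (releasing its surface ref)
        // instead of waiting for SpiderMonkey to collect the wrapper.
        cr.$dispose();
    }
});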
Comment on attachment 232970 [details] [review]
cairo-context: Add an explicit CairoContext.$dispose() function

Attachment 232970 [details] pushed as 19e899c - cairo-context: Add an explicit CairoContext.$dispose() function
Created attachment 233003 [details] [review]
js: Explicitly dispose all cairo contexts

Due to limitations and bugs in SpiderMonkey's GC, wrapper objects for cairo contexts and similar may not get cleaned up immediately after repainting, leading to leaking memory. Explicitly disposing of such objects after they're not needed can clean up large portions of memory for cairo surfaces.

You spotted a bug!
Review of attachment 233003 [details] [review]: Looks right.
Attachment 233003 [details] pushed as 9548cd8 - js: Explicitly dispose all cairo contexts
Many thanks for this patch, I will try it later. Does this patch fix the bug completely, or does it just free more memory than before?
Pardon my ignorance, but: how would one be able to try out this set of patches? Is there a PPA or test library, or does it need compiling from source? If source, which packages?
I don't know if PPAs or test packages are available. To try out this fix, you would need to compile the "gjs" and "gnome-shell" packages from source.
Thank you Jasper, but when I try to patch the gnome-shell sources I get an error about some files which don't exist in 3.6.2:

error: js/ui/separator.js: No file exists
error: js/ui/switcherPopup.js: No file exists

I tried with "patch -p1 -i patchName.patch" and "git apply fileName.patch".
Indeed, some of these are new files in development versions. separator.js comes from equivalent code in popupMenu.js, which you might be able to patch manually, and switcherPopup.js comes from code in altTab.js and ctrlAltTab.js. Grep through the code for "get_context" and make sure that all the "cr" variables are disposed before the function returns.
I added "cr.$dispose();" after each time "cr" is used, it pass the compilation but it doesn't fix the memory leak for me. I trust you if you say it works for you, so I will wait for gnome 3.8 ;)
(In reply to comment #43)
> I added "cr.$dispose();" after each place "cr" is used; it passes compilation
> but it doesn't fix the memory leak for me.
>
> I trust you if you say it works for you, so I will wait for gnome 3.8 ;)

Unless you also applied the patch to gjs, which I doubt you did, it won't work, and will likely throw exceptions.
In fact I didn't patch anything... I did the modifications on the sources, but then I launched makepkg (on ArchLinux, which builds the sources and generates a package), and I saw that it extracts the sources even if they are already extracted (so the modified sources were replaced by the original ones...). Now it is patched correctly and it works, thank you, but for me the memory leak is only reduced: if you open the date menu many times (for example), you can see that not all the memory consumed is freed, even if it is much lower than before. But I didn't apply all the modifications from your patch, Jasper (for gjs I did, because it is the same for gnome 3.7 and gnome 3.6, but not for gnome-shell), so maybe I missed something that fixes the bug completely. Thank you again!
And I have a question for you, Jasper St. Pierre: in your patch you modify all the gnome-shell menus to dispose the "cr" variables, but what about extensions? Do extension developers have to do the same, or does gnome-shell do it automatically? Thank you again, and sorry for being a newbie...
Extension developers may have to dispose, which is unfortunate, but this is hopefully a temporary hack that will be fixed in future versions of gjs/gnome-shell.
(In reply to comment #25)
> Yes. There are two issues: [...]

Thanks for the details!

(In reply to comment #26)
> We could add an API to gjs to explicitly dispose wrapper objects:
>
> context.$dispose() or System.dispose(context) ?

I thought about explicitly disposing the context, but I imagined you guys would go for a gjs fix; I understand it's complex, though, and this is an acceptable medium-term fix.

(In reply to comment #45)
> Now it is patched correctly and it works, thank you, but for me the memory
> leak is only reduced: if you open the date menu many times (for example), you
> can see that not all the memory consumed is freed, even if it is much lower
> than before.

Indeed, I said that fixing this leak would halve the memory increase and that there were certainly additional retention issues. I'll try to find out more about this bug in the meantime: it would be nice to identify the other causes, even if I hope switching to js-187 will improve the situation. Maybe we should keep this bug open until "most of it" is fixed?
I can't see any leak after applying the patches and the memory seems to be freed correctly now. Thank you.
I have installed an experimental version of Gnome-Shell (3.7.4-git) in Ubuntu 12.10 from the ricotz/staging repository.

- Gnome-Shell initially loads using ~100MiB
- Memory can eventually increase to around ~180-200MiB
- I have seen spikes as high as 258MiB, but within a few minutes they appear to be garbage-collected away

Granted, I only ran it for an hour or two, but I was running the System Monitor extension with a very high refresh rate the entire time, and didn't see the memory increase. Also, I clicked on the calendar for 2-3 minutes solid; when G-S first loads, this does increase the RAM count, but only to a point. I'm going to tentatively agree with the "fixed" status. I'll revisit the shell when it gets released to see how it fares over longer periods of time, but until then: thank you one and all!
Is there some easy way to backport the bugfix to Gnome-Shell 3.6.2 and Cairo 1.12.10?
> Is there some easy way to backport the bugfix to Gnome-Shell 3.6.2 ...

You could recompile gnome-shell-3.6.2 and gjs-1.34.0 after applying the patches manually, i.e. line by line.
(In reply to comment #51)
> Is there some easy way to backport the bugfix to Gnome-Shell 3.6.2 and Cairo
> 1.12.10?

The gjs patches should apply cleanly. For gnome-shell patches, you need to look through the JS files for "let cr" and make sure to add "cr.$dispose();" near the ends of functions. I can try and do this tonight, but I won't be able to test the patches.
(In reply to comment #53)
> The gjs patches should apply cleanly. For gnome-shell patches, you need to look
> through the JS files for "let cr" and make sure to add "cr.$dispose();" near
> the ends of functions. I can try and do this tonight, but I won't be able to
> test the patches.

If you can prepare the patches, I (and others for sure) will review and test them. Thanks for your help.
One last thought about this, mostly for Jasper:

I grant that with these patches and the new SpiderMonkey port from ricotz/staging, the whole leak seems plugged. But... does the G-S environment look far heavier than it should be to anyone else? Clutter looks to be more lightweight than Compiz by a lot, and there are only well-designed and elementary effects here and there... so why is the shell taking up 200MB with no extensions? For comparison, that's about twice what I get with Unity and on par with KDE and all its Plasma desktop stuff. Should I open a new bug on this, or is this within expectations?
(In reply to comment #55)
> One last thought about this, mostly for Jasper:
>
> I grant that with these patches and the new SpiderMonkey port from
> ricotz/staging, the whole leak seems plugged. But... does the G-S environment
> look far heavier than it should be to anyone else? Clutter looks to be more
> lightweight than Compiz by a lot, and there are only well-designed and
> elementary effects here and there... so why is the shell taking up 200MB with
> no extensions?

Unfortunately, I don't have a good answer for this question. It would be nice to have some detailed memory profiling, to know what part is taken up by JavaScript, by mutter, by the shell, etc.
Additionally, note that we've turned our periodic GC off due to deadlocks that seem to have been fixed upstream. If/when we get a new tarball, we will re-enable the periodic GC to ensure that memory performance is much better.
Periodic GC is enabled on the test build from the staging PPA.
@Jasper: I can get that detailed memory profiling (of the ricotz/staging PPA version of G-S 3.7.4, with the patches here and new SpiderMonkey) if you would be so kind as to point me in the direction of docs that can tell me how to give it to you. Also, I kind of purged that PPA already, as it broke the Empathy app, so it might take me a little while to re-install it and profile it, but I can certainly prioritize this.
(In reply to comment #59)
> @Jasper: I can get that detailed memory profiling (of the ricotz/staging PPA
> version of G-S 3.7.4, with the patches here and new SpiderMonkey) if you would
> be so kind as to point me in the direction of docs that can tell me how to give
> it to you.

Unfortunately, we don't have the infrastructure set up for that sort of memory reporting. You can try using massif, but it may not give you the results you want.
massif tends to show all the memory gobbled up by mesa/libGL.
It could certainly be a mesa bug, or a cogl/clutter/gnome-shell bug where some GL resources don't get freed.
(In reply to comment #61)
> massif tends to show all the memory gobbled up by mesa/libGL.

Depending on how you are tracking memory use and what drivers you are using, it's possible that application window pixmaps will show up as memory "used" by Mutter, without actually consuming memory. Also, Mutter will have an overhead of approximately 1/3 of the memory used by your window pixmaps to store scaled-down windows. This could be investigated by examining the memory usage of Mutter as a function of how many windows are open and of what size.
It is true: new windows make G-S's in-use memory count rise, and dismissing them makes the memory use fall. The numbers I'm seeing are with no application windows open at all, though.
Can we get new releases for the affected packages for GNOME 3.6?
Did these changes make it into the Gnome-team PPA for raring? The shell doesn't act like it (read: In raring's Gnome-Shell 3.7.90, the memory leak still seems to be here)
If you have packaging questions for specific distributions (assuming that Raring is a distribution), please ask the packagers of that distribution. Thanks.
Andre, that wasn't a packaging question, that was a "did the code change officially make it into the Gnome 3.7.x branch" question. And to answer my own question: it appears it has not, and if it has, it doesn't work... So I don't think this bug deserves the "resolved/fixed" status.
Three patches went into git master (3.7.x) around 2013-01-08 on the GNOME side of things.
Re-enabling GC on idle fixes this, but that requires the updated spidermonkey, which is coming, albeit very slowly.
Can the updated spidermonkey be used in GNOME 3.8?
This has not been fixed. I'm using GNOME 3.6.3 on Ubuntu 13.04 and this issue still exists
@heroandtn3 The fixes are not in 3.6, only 3.8. And afaik the fixes in 3.8 will only work with some mythical new version of gjs/spidermonkey. And afaik there's also an issue with clutter leaking memory, so the javascript fixes may not even completely fix the issue :/
@bwat47: but the "Resolution" tag has been flagged "FIXED"; I think they should fix this issue in the 3.6.x version.
What should I do to fix this issue? Should I upgrade to 3.8?
It's not really possible to fix this for 3.6 since it requires the new spidermonkey library and no distro carries that yet. At least on Ubuntu Raring, you can install the gnome3 PPA to get 3.8 with the new spidermonkey.
Thanks for the useful information; I will try upgrading to 3.8 and accept the risk.
(In reply to comment #74)
> I think they should fix this issue in the 3.6.x version.

"They" is your distribution, which can backport patches for you. You are free to contact "them".
klapper: well, upstream *does* do point releases of 'old' versions.
*** Bug 694558 has been marked as a duplicate of this bug. ***
I think this still happens in GNOME Shell 3.8.3. If any developer needs any debugging information to investigate, feel free to let me know.
In fact it is "fixed", we just have to wait for the 3.10 version which will use the new spidermonkey. There is a gnome-shell branch on the official git repo which uses this new spidermonkey, but it is not as user-friendly as installing a package (for the moment).
This memory leak is still present in Gnome 3.10.0. Memory usage seems to grow even faster compared to gnome shell 3.8.4 when using it. It can easily go up to 500MB and beyond.
If you feel like your memory usage is abnormal, please give us massif and valgrind logs.
I tried to start gnome-shell 3.10 in valgrind following this wiki page: https://wiki.gnome.org/GnomeShell/Debugging but I get an error telling me that the options "-g" and "--debug-command" don't exist.

I also tried:
G_DEBUG=gc-friendly G_SLICE=always-malloc valgrind --leak-check=full --show-reachable=yes --log-file=gnome-shell-valgrind.log gnome-shell --replace

And I get this error after a few seconds:
zsh: killed G_DEBUG=gc-friendly G_SLICE=always-malloc valgrind --leak-check=full

Do you know how I can run gnome-shell in valgrind on ArchLinux?
I can confirm that bug still exists, and is even worse, on 3.10. In no way does it deserve the "fixed" tag. This should be a blocker.
Anthony, you are missing the --tool option, and it's also important to include the clutter suppressions:

G_SLICE=always-malloc G_DEBUG=gc-friendly valgrind -v --tool=memcheck --leak-check=yes --log-file=valgrind.log --suppressions=clutter-2.0.suppressions gnome-shell --replace

You can get the suppressions file here: https://git.gnome.org/browse/clutter/tree/tests/data/clutter-2.0.suppressions
Valgrind crashes after a few seconds... Gnome-shell is killed and then can't start. Here is the valgrind log: https://gist.github.com/Anthony25/6764188
Are there any more viable ways to debug it? E.g. Looking Glass or some other introspection tools to take a look at the object graph or something. I tried to start gnome-shell under valgrind, but I couldn't see the desktop after 10 minutes of waiting and gave up.
I have run into a situation where nautilus failed to open due to not being able to allocate memory; gnome-shell was sitting at roughly 1.7 GiB (!), with journalctl showing several mentions of gnome-shell being unable to open 'Files' due to the inability to allocate memory. I, too, see the same behavior as listed above: the memory footprint climbs significantly as the number of operations since cold boot climbs.

System: AMD Phenom(tm) II X4 955 Processor, GeForce GTX 460 (using nvidia blob)
OS: Arch Linux 64 bit, kernel 3.11.4
gnome-shell 3.10.0 - 0 extensions enabled

A cold boot starts gnome-shell at roughly 85 MiB. I have attempted darkxst's valgrind syntax and clutter suppression; I will attach log output shortly.
Created attachment 256861 [details] valgrind log output for gnome-shell
As you can see in the log, a good portion of those are from the nvidia driver. These leaks don't appear to happen with other drivers like nouveau or Intel. They could be our bug, but without debug symbol information, they're extremely hard to track down. One example:

==1180==
==1180== 5,242,903 bytes in 1 blocks are possibly lost in loss record 19,838 of 19,839
==1180==    at 0x4C2757B: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1180==    by 0x1532BD98: ??? (in /usr/lib/nvidia/libGL.so.325.15)
==1180==    by 0x1CE4942F: ???
==1180==    by 0x1E6E9CF7: ???
==1180==    by 0xF: ???
==1180==    by 0x18DA9918: ??? (in /usr/lib/libnvidia-glcore.so.325.15)

It seems another portion of them are from closures, which might possibly be our issue, but without debug symbols for mozjs, gjs, clutter, and gnome-shell, we'll never know.
Created attachment 256879 [details]
valgrind log output for gnome-shell (nouveau)

Exact same system as the previously attached log, now running nouveau 1.0.9 instead of the NVIDIA binary blob.
Created attachment 257117 [details]
valgrind log output for gnome-shell debug

And finally, valgrind log output for gnome-shell --replace with debug packages for gnome-shell, mozjs, gjs and clutter. Both while on the NVIDIA binary and nouveau.
I observed a very similar memory increase issue on a system running inside a QEMU VM.

OS: ROSA Fresh R1, GNOME Edition
RAM: 4G
Swap: 10G
The VM emulates the 'standard' video adapter ('-vga=std' option is used)
gnome-shell: 3.8.4

The issue was observed both on 32- and 64-bit x86 systems. I launched and closed many different GUI applications, one at a time. In several hours, the amount of virtual memory acquired by the gnome-shell process reached 3G. From /proc/pid/maps I saw that the size of the heap was rather high, and it kept rising:

[heap] - 1094M
/usr/share/icons/gnome/icon-theme.cache - 82M
/dev/shm/pulse-shm-2700391189 - 64M
/dev/shm/pulse-shm-3828358612 - 64M

The rest of the memory areas are smaller than these. The responsiveness of the system degraded. In one such experiment (when I turned off the swap file), I have even seen the OOM killer at work.

I checked gnome-shell with valgrind on the same usage scenario. The compressed log is 1.7M, which seems to be too large for an attachment, so I have uploaded it here: http://cdn.2safe.com/162555033758/vg-gnome-shell-20131013.7z

The log does not show memory leaks of this magnitude, however. This might mean that the memory is deallocated when gnome-shell exits but is kept until then. Inefficient garbage collection by the JS interpreter?
*** Bug 710693 has been marked as a duplicate of this bug. ***
I've looked at the code and found one interesting fact. If I change the screen resolution (or add a monitor, or change the size of the virtualbox window), then the memory usage of gnome-shell grows quickly, and every time I open the shell its memory usage increases by 6-8 MB. It loads the background image every time I open the shell.

background.js implements a cache of loaded images. Every time I open the shell, that cache is populated, and when I close the shell the cached images are cleared. The interesting thing is that, before I change the screen resolution, the cache is populated twice:

if (params.cancellable && params.cancellable.is_cancelled())
    content = null;
else
    this._images.push(content); // add the image to the cache one more time

and the cache works. I don't understand the code (see commit 3be489c69e4980bca8aa85808ed6628b69551071), but after resizing the screen the additional population doesn't happen and background images are always removed from the cache. This causes the background images to be loaded every time I open the shell.

Because the memory for the images is allocated from C, the JS GC doesn't know the size of the allocated memory. It only knows JS_BYTES and uses that number to decide when to run the GC. I've observed the following: heap usage of gnome-shell: 2 Gb; JS_BYTES: about 28 Mb. As you can see, the GC wasn't started, because JS_BYTES didn't reach the 32 Mb threshold, but the C code allocated 2 Gb of memory just for background images. Running a full GC twice freed a lot of heap memory (2 Gb -> 150 Mb).
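For anyone who wants to repeat the "full GC twice" measurement above, this can be typed into the Looking Glass evaluator (Alt+F2, then "lg"). It assumes the gjs "system" module is importable inside the shell process, which it normally is:

const System = imports.system;

// Run the collector twice: the first pass can finalize wrappers whose
// destruction releases native (C-allocated) memory, and the second pass
// collects whatever the first one exposed.
System.gc();
System.gc();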
@Jasper: This is not an nVidia problem. These leaks appear on all drivers of all varieties, and on systems without any nVidia anything in them, and there are examples of people saying so right here in this thread. I have personally seen this occur on Intel and Nouveau drivers, and it is occurring for me, right now, in a VM right in front of me, using the VMware SVGA II Adapter driver. nVidia themselves, after months of researching this (at my behest and with my full cooperation and debug logs/traces) have come up empty-handed. The bug is closed on their end, and indeed, this leak (or anything like it) does not replicate with the nVidia binary blob on any other 3D accelerated environment. In Gnome 3.10 -- Fedora 20 beta with full garbage collection on according to Looking Glass -- all I have to do to expose this problem is open/close Firefox, or repeatedly click on the calendar. Or run that infamous System Monitor extension. This Gnome-shell session began life at 68M. My uptime is 34 minutes; it is now 229.6Mib. I understand that this is a difficult bug to diagnose, and that these things take time, and I am here to help with whatever bug tracing and reporting and whatever else you might need... but can we please, for the love of all things holy, stop calling this an nVidia issue (and then doing nothing afterward)? That just isn't true, and seeing it repeated release after release is just... not good.
Can people who have a specific workflow that reliably produces what looks like a leak please run that workflow through valgrind? It'll make life much easier for the devs. Thanks.
Created attachment 264026 [details]
valgrind/massif output for running gnome-shell-3.10.2.1-3.fc20.x86_64 and opening/closing the top panel calendar several times

I can reproduce growing memory usage when opening and closing the calendar. Here's a valgrind log of doing that several times, in latest F20.
Created attachment 264027 [details]
valgrind/massif processed info for running gnome-shell-3.10.2.1-3.fc20.x86_64 and opening/closing the top panel calendar several times

here's the result of running ms_print on the valgrind output.
@Adam Williamson: How do you get these logs, please? I have the same kind of leak on my laptop and my desktop computer. On my desktop (with 8GB of RAM), it grows to > 800MB after a few hours (like 2 hours), and on my laptop it's about > 600MB (with 4GB of RAM). I'm running ArchLinux with Gnome 3.10.2, and I have had this leak since Gnome 3.6.
I'm looking into the calendar bug right now. It might pull up something greater. In Adam's logs I can see that some big GBytes/Variant instances are being leaked, along with quite a few Clutter actors...
Created attachment 264029 [details]
valgrind/massif output for running gnome-shell-3.10.2.1-3.fc20.x86_64, waiting a while, then opening/closing overview 10 times, running 'gc' from alt-f2 and stopping

This case also produces an apparent increase in memory usage (opening and closing overview over and over, no extensions enabled).
Created attachment 264030 [details] valgrind/massif processed info for running gnome-shell-3.10.2.1-3.fc20.x86_64, waiting a while, then opening/closing overview 10 times, running 'gc' from alt-f2 and stopping
If I open two windows (for example, terminal and nautilus) and switch between them by clicking on their title bars, then the memory usage of Gnome Shell grows. It seems like there is a leak in the implementation (popupMenu.js) of the application menu in the Gnome Shell dash panel.
"This case also produces an apparent increase in memory usage (opening and closing overview over and over, no extensions enabled)." I can confirm. Are there other/more logs necessary, or does this bug have all the info it needs?
I can confirm this as well. I just spent this weekend trying to track down this memory leak. Mine starts out around 120MB and increases up to 2GB+, plus some swap usage (100-200MB) as well, out of my 4GB. I put my computer into suspend and don't shut down, so the memory leak persists across these events as well. The only thing I could do was restart gnome-shell. I can also confirm that gnome-shell was starting to get choppy; I can only assume that this is due to it starting to read/write to my swap partition. This needs to get fixed. Gnome-shell, after a week of normal desktop usage without restarts, has very poor performance. I had to move to a different DE as I need this computer for consulting work and university classes.

Arch linux x86_64
kernel 3.13.5
gnome-shell 3.10.4-1
nvidia 334.21-2
Still present on Gnome Shell 3.12.1 and gjs 1.40.1 (https://projects.archlinux.org/svntogit/packages.git/tree/trunk?h=packages/gnome-shell, https://projects.archlinux.org/svntogit/packages.git/tree/trunk?h=packages/gjs).
I think it's a problem with the nvidia drivers. On my laptop with an intel card and gnome 3.12.1 (on ArchLinux), the memory consumption is lower, 200MB on average with many extensions. On my desktop with an nvidia graphics card (on ArchLinux too), it grows up to 800MB after a while (like before).
For the 70th time, there is no "it". Guessing wildly, people are probably discussing, like, fifteen different bugs, here. This bug report is now so diffuse and vague that there is really zero chance of the devs paying any more attention to it. If people want to keep kibitzing, I mean, fine, but it's not getting anyone anywhere... I'm going to re-test my scenarios and open separate bugs for each that still exists, *with valgrind logs attached* and details about my GNOME configuration (especially enabled extensions) and hardware... I suggest other people who actually want to see things fixed do the same :)
@Adam Williamson: You *are* guessing wildly, especially considering all the Valgrind logs and very, very specific reproduction steps that have already been given more than once in this very thread. While this *may* have several causes, it only has two easy-to-reproduce formulas. Two. That's all. Either one is sufficient, and both take less than a sentence each to explain. Anyone who has less than five minutes can get it to go by either:

- clicking on the calendar repeatedly, watching mem usage rise, or
- opening and closing the same five programs, noting the memory not being freed up each round.

Neither of those is "diffuse" or "vague". That just isn't confusing in any way. No one continuing to talk about it is "kibitzing". And for the love of pete, *none of this is an nVidia driver problem* -- it happens on all drivers, even VMware tools. The leak is only slightly worse with binary blobs, but exists everywhere. The reason no one is continuing to submit even more Valgrind logs is because they've already done so. The reason no one is opening separate bugs about this is because they always get closed, marked duplicate of *this* one, and then nothing continues to happen. The reason nothing continues to happen is... well, I don't know. I'd only be guessing, but one guess would be the hope that SpiderMonkey 24 would help.
I'm going to close this now. None of the developers read this bug. It's simply white noise. I've looked and found and fixed plenty of memory leaks. I've talked with the SpiderMonkey guys about lowering memory usage. It's a hard problem, and having more people saying "me too" won't magically fix it. Lots of different subcomponents use memory. The NVIDIA driver stores backing window pixmaps in our process because we are the compositing manager. That means that gnome-shell's process address space will grow, normally, but parts of that memory are not system RAM, but VRAM. The JS engine we use isn't as efficient with memory as we'd like it to be. We recently landed a large number of fixes in gjs, the bindings to SpiderMonkey that we use, that in combination with SpiderMonkey 24, make GC more aggressive. If you see very, very specific memory leaks that have clear reproducible instructions, as opposed to "slow memory leaks" which are probably a combination of ineffective GC and memory fragmentation, please file a new bug and I'll take a look at it.
Just... wow. Am I reading this correctly? Are you actually saying that this situation -- a desktop shell that leaks memory when anyone has the audacity to use it -- is an *acceptable* situation to Gnome developers? So acceptable that none of them even read the bug, or, at least, acceptable enough to close the bug (as *INVALID* no less)? If so, then I don't know what else to say. Good luck.
No, Chad Rodrigue, that is not what Jasper is saying. They (the gnome developers) do what they can to reduce the memory consumption, and honestly on Intel GPUs it's quite good compared to the previous versions. Nvidia is doing some crap with the compositing manager, and it's a big problem in gnome-shell because memory management is very difficult to control in JavaScript. The bug report is closed because it's a known bug, and no one follows this thread because we just repeat what they already know.
(In reply to comment #114)
> Just... wow. Am I reading this correctly? Are you actually saying that this
> situation -- a desktop shell that leaks memory when anyone has the audacity to
> use it -- is an *acceptable* situation to Gnome developers?

It's a hard problem. It doesn't leak memory, as far as I know; it's just that the process address space becomes larger and larger over time. This is a combination of graphics drivers using lots of memory, a somewhat ineffective GC, and memory fragmentation issues. System libraries also don't always free resident memory, based on certain patterns and heuristics, if they think we're simply going to allocate again soon anyway.

All of our research and investigation, including all of the massif and valgrind logs you guys have provided (thanks for that, it really helped us out!), has shown that it's a compound of many, many different technical issues. So this bug isn't really going to help fix anything. It's going to rack up more "me too" comments while people complain about high memory usage.

If there's any easily reproducible memory leak (e.g. do this operation and it increases memory every single time), please file a bug, as that's something we can track down and fix independently. I'm not saying those don't exist, I'm just saying that they're not the pattern we're seeing for high memory usage.

> So acceptable that
> none of them even read the bug, or, at least, acceptable enough to close the
> bug (as *INVALID* no less)?

"Me too" comments aren't helpful for fixing bugs. We're well aware of the problem of high memory usage, and we worked on it a bit last cycle. This bug remaining open doesn't help us keep track of anything. My options are "FIXED", "WONTFIX", "DUPLICATE", "NOTABUG", "NOTGNOME", "INCOMPLETE", "INVALID" and "OBSOLETE". I chose not to mark it as "FIXED" in case some people with NVIDIA drivers get their hopes up. "INVALID" seemed like it was the best match.
@Jasper St. Pierre: If this is a combination of different variables (GNOME 3, NVIDIA drivers, and GC issues on the JavaScript end), would it not be more logical to mark this as INCOMPLETE? INVALID to me means that the issue is not an issue anymore, as something was replaced that was causing the original issue. If this is an NVIDIA driver issue combined with ineffective GC in SpiderMonkey, marking it as INCOMPLETE and making a note on the bug that it may have an impact on systems with NVIDIA graphics drivers would seem a little more appropriate than the INVALID status.
"INCOMPLETE" usually means that it was a bug previously marked as NEEDINFO that the original reporter has given up on or hasn't managed to reply in a while, and can be reopened. But OK, sure, I don't really care what status it ends up as.
Jasper: As a developer myself, I get it, I really do. I respect the manner in which you've handled this, and the reasoning behind your decision: it looks like you guys went and built a desktop shell on technologies that you don't control and that, in this particular way, don't work very well. I've been through something similar, and that's pretty rough. But no matter how understandable it is, it is still maddening, for the following reason:

> If there's any easily reproducible memory leak (e.g. do this operation and it
> increases memory every single time), please file a bug

That's *this bug*, though! The same exact instructions outlined in the description will, over a year and a half later, across 4 releases now, cause the problem 100% of the time! The instructions even remain completely unchanged:

BEGIN: Boot a machine with G-S on it, log in, open G-S-M, mark the initial memory use. Open every shortcut in the favorites bar, close everything you opened, mark the new memory use. END.

Every single time I repeat what's between "BEGIN" and "END", on every driver I have (nouveau, intel, sgII, nVidia), on every machine type (real steel or VM), the G-S process gets fatter without exception. How *much* fatter appears to be driver dependent, but fatter nonetheless.

> "INVALID" seemed like it was the best match.

Except it gives the impression that there isn't really a problem when everyone -- you guys especially -- knows that isn't true. That's why you worked on it last cycle. Despite all the good explanations as to why, and the complexity of the problem, and how it's going to look, "WONTFIX" is by far the best match, because... well... it's accurate, and because "probably can't fix" isn't a choice.
I can't reproduce. Or, rather, I can reproduce with your instructions, but I don't have the same analysis. I start up my desktop, I open every application, and memory increases since the new applications take up memory in gnome-shell's memory: things for the title of the window, the window's dimensions, etc. That's normal. Closing all applications doesn't decrease memory, so you might imagine that this is a leak. However, at least for me, reopening them again won't increase further, so it's not that we've leaked a resource, otherwise we'd see another resource leaked for the next app opened. What you're measuring is the RES, or resident memory, of the process, which will increase as more and more memory is allocated, and won't necessarily decrease as all memory is freed. This is normal behavior. Our memory allocators (in all of glib, mozjs, and glibc) won't free pages back to the kernel that often, since that's an expensive process, and it can recycle the pages the next time we ask for memory, which is fairly often. I'm fine with changing the resolution to whatever you think is best. If you think that's "WONTFIX", then that's "WONTFIX". I was afraid that people would think that we weren't interested in fixing it. If we had a more fitting status, I'd use that.
> However, at least for me, reopening them again won't increase further

It is exactly here that we disagree. Every time I measure reopening and re-closing them, the memory usage does indeed increase further -- though it takes more than a first glance to notice. If you're on open-source drivers like intel or nouveau or ati, the amount of added bloat is small -- maybe it's 0.5-1.5M per round of opening/closing. (It is easily more visible on nVidia.) But though it is less -- sometimes so small that G-S-M doesn't report a change -- it IS an increase, and it *will* continue to go up as you repeat rounds of opening/closing things. Further, I have never seen it go down, and never -- not once, not on any release of G-S since 3.6, on any computer arrangement ever -- seen a version of Gnome-Shell that does not do this.

> and it can recycle the pages the next time we ask for memory, which is fairly often

The other odd thing is that G-S never lets this memory go. I had actually filed this very same bug against Unity at the same time I submitted this one, as it appeared to have the same symptom (especially with the nVidia driver). Interesting thing, though: if I were to arbitrarily request more memory than was available, by, say, opening a VM, the G-S systems would begin swapping (and, in some cases, freezing), but Unity would shrink from however many gigs it was taking up to around 60M. This is a trick I have not yet tried against 3.12.1, but it doesn't really look like it's going to be relevant as I do so.
chad: my comment was a bit intemperate, and sorry for that, but if you read through the whole bug, it's clearly fairly focused and productive up till 2013-01 and involves the resolution of a couple of clearly defined issues, but once it got re-animated in 2013-10, it becomes pretty much useless - you're still seeing *some* kind of resource usage issue but it's clearly not still being caused by the same things it was being caused by before, because those really were fixed. A lot of the people who 'confirmed' the bug aren't actually following either of your two scenarios, if you read their comments. It all just lost too much focus to be practical. I still think it might be a good idea for you to open a new bug with very specific descriptions of your reproducer and relevant valgrind logs, as from your comments and Jasper's it doesn't really sound like you're quite on the same wavelength. I just don't think doing it in this bug is going to work out for anyone.
I confirm this VERY HUGE problem: on a low-memory system it makes the system unusable. Every action done with gnome-shell increases its memory usage, as already described by users above; e.g. in one day the gnome-shell process grew from 97MB to 500MB.
Such problem reports are not helpful at all without a valgrind log (explained above) and version information.
Sure. I'm on ArchLinux x86_64 with gnome-shell 3.12.2. I run:

G_SLICE=always-malloc G_DEBUG=gc-friendly valgrind -v --tool=memcheck --leak-check=yes --log-file=valgrind.log --suppressions=clutter-2.0.suppressions gnome-shell --replace

gnome-shell crashes (or rather, the window decorations disappear, the gnome-shell top and bottom bars disappear, and I can move the mouse cursor but any action has no effect) just after I do an action, e.g. click on the Application menu. I attach the log.
Created attachment 282332 [details] valgrind log by mattia.b89
That happens in both gnome-shell and fallback/flashback mode.