GNOME Bugzilla – Bug 595321
memory leaks (ongoing)
Last modified: 2010-01-14 20:17:28 UTC
This bug will have patches to both analyze the shell's memory usage, data I've gathered, and fixes.
== Application Add/Remove == So first up I tried doing "while true; do yum -y install pidgin; yum -y remove pidgin" in a loop, since the application system has nontrivial memory management bits and we do a LOT of stuff when the applications change. A first observation is that the shell's RES grows about 1MB/s under this scenario. My new bindings for JS_GC (bug 595323) don't appear to shrink the RES, but does Spidermonkey release pages back to the OS? Don't know. I haven't tried valgrind yet, but one really interesting data point (using the tool in bug 354457) is: BigBox | created 15809 | destroyed 4758 | max-population 11051 So at some point we had over 11,000 BigBox instances floating around. That sounds like quite a lot! Is that a GC problem? It's hard to know. I need to improve that patch so that we can dump the data at any point rather than at exit. Also, I'd really like a set of stack traces ordered by most frequent allocation point. So, further work to be done on that patch. However for this one I'll turn to valgrind.
Let's start with the "definitely lost". They're small potatoes, but worth fixing. ==11710== 7,872 bytes in 62 blocks are definitely lost in loss record 263 of 312 ==11710== at 0x4A0776F: realloc (vg_replace_malloc.c:429) ==11710== by 0x6BCE673: g_realloc (gmem.c:170) ==11710== by 0x6BEE7CE: g_string_maybe_expand (gstring.c:361) ==11710== by 0x6BEEE35: g_string_insert_len (gstring.c:696) ==11710== by 0x6BEF233: g_string_append_len (gstring.c:844) ==11710== by 0x6BB26EC: g_build_path_va (gfileutils.c:1460) ==11710== by 0x6BB299C: g_build_filename (gfileutils.c:1730) ==11710== by 0x3FC0C12003: gnome_desktop_thumbnail_factory_lookup (in /usr/lib64/libgnome-desktop-2.so.11.2.6) ==11710== by 0xE92D5B5: load_pixbuf_thread (shell-texture-cache.c:427) ==11710== by 0x5E9AFEB: run_in_thread (gsimpleasyncresult.c:676) ==11710== by 0x5E8B904: io_job_thread (gioscheduler.c:182) ==11710== by 0x6BF737C: g_thread_pool_thread_proxy (gthreadpool.c:265) ==11710== ==11710== ==11710== 29,808 bytes in 207 blocks are definitely lost in loss record 286 of 312 ==11710== at 0x4A0763E: malloc (vg_replace_malloc.c:207) ==11710== by 0x6BCE575: g_malloc (gmem.c:131) ==11710== by 0x57BC777: cogl_texture_new_from_data (cogl-texture.c:1330) ==11710== by 0xE92DBFE: pixbuf_to_cogl_handle (shell-texture-cache.c:653) ==11710== by 0xE92E88B: on_pixbuf_loaded (shell-texture-cache.c:721) ==11710== by 0x5E9AD52: g_simple_async_result_complete (gsimpleasyncresult.c:588) ==11710== by 0x5E9AF06: complete_in_idle_cb_for_thread (gsimpleasyncresult.c:650) ==11710== by 0x6BC8CBE: g_idle_dispatch (gmain.c:4065) ==11710== by 0x6BC500C: g_main_dispatch (gmain.c:1960) ==11710== by 0x6BC648C: g_main_context_dispatch (gmain.c:2513) ==11710== by 0x6BC692D: g_main_context_iterate (gmain.c:2591) ==11710== by 0x6BC7080: g_main_loop_run (gmain.c:2799)
Created attachment 143256 [details] [review] [ShellTextureCache] Fix memory leaks Need to free the returned thumbnail path, and in the CachePolicy.NONE case we also need to unref the cogl handle.
Ok so moving on, what we see here are our two main big uses of memory; the JS heap and the GObject instance allocator. What we need to know here is basically, what JS objects are live and more interestingly if they're a proxy object, what GObjects are they pointing to? ==11710== 811,312 bytes in 22,363 blocks are possibly lost in loss record 304 of 312 ==11710== at 0x4A0763E: malloc (vg_replace_malloc.c:207) ==11710== by 0x4A077C7: realloc (vg_replace_malloc.c:429) ==11710== by 0x3FB201B91A: JS_realloc (in /usr/lib64/xulrunner-1.9.1/libmozjs.so) ==11710== by 0x3FB2059CA8: (within /usr/lib64/xulrunner-1.9.1/libmozjs.so) ==11710== by 0x3FB203E666: js_PutCallObject (in /usr/lib64/xulrunner-1.9.1/libmozjs.so) ==11710== by 0x3FB20448C6: (within /usr/lib64/xulrunner-1.9.1/libmozjs.so) ==11710== by 0x3FB20505AA: js_Invoke (in /usr/lib64/xulrunner-1.9.1/libmozjs.so) ==11710== by 0x3FB203D5A6: (within /usr/lib64/xulrunner-1.9.1/libmozjs.so) ==11710== by 0x3FB2049EB3: (within /usr/lib64/xulrunner-1.9.1/libmozjs.so) ==11710== by 0x3FB20505AA: js_Invoke (in /usr/lib64/xulrunner-1.9.1/libmozjs.so) ==11710== by 0x3FB203D7FC: (within /usr/lib64/xulrunner-1.9.1/libmozjs.so) ==11710== by 0x3FB2049EB3: (within /usr/lib64/xulrunner-1.9.1/libmozjs.so) ==11710== ==11710== ==11710== 17,634,208 bytes in 6,306 blocks are possibly lost in loss record 312 of 312 ==11710== at 0x4A05260: memalign (vg_replace_malloc.c:460) ==11710== by 0x4A05317: posix_memalign (vg_replace_malloc.c:569) ==11710== by 0x6BE87F0: allocator_memalign (gslice.c:1136) ==11710== by 0x6BE82FA: allocator_add_slab (gslice.c:1007) ==11710== by 0x6BE8517: slab_allocator_alloc_chunk (gslice.c:1055) ==11710== by 0x6BE760A: magazine_cache_pop_magazine (gslice.c:661) ==11710== by 0x6BE7976: thread_memory_magazine1_reload (gslice.c:736) ==11710== by 0x6BE7C20: g_slice_alloc (gslice.c:813) ==11710== by 0x6BE7CE5: g_slice_alloc0 (gslice.c:833) ==11710== by 0x633E205: g_type_create_instance (gtype.c:1698) ==11710== by 0x6322C63: g_object_constructor (gobject.c:1338) ==11710== by 0x63223B3: g_object_newv (gobject.c:1215)
(In reply to comment #4) > ==11710== 17,634,208 bytes in 6,306 blocks are possibly lost in loss record 312 > of 312 > ==11710== at 0x4A05260: memalign (vg_replace_malloc.c:460) > ==11710== by 0x4A05317: posix_memalign (vg_replace_malloc.c:569) > ==11710== by 0x6BE87F0: allocator_memalign (gslice.c:1136) > ==11710== by 0x6BE82FA: allocator_add_slab (gslice.c:1007) > ==11710== by 0x6BE8517: slab_allocator_alloc_chunk (gslice.c:1055) > ==11710== by 0x6BE760A: magazine_cache_pop_magazine (gslice.c:661) > ==11710== by 0x6BE7976: thread_memory_magazine1_reload (gslice.c:736) > ==11710== by 0x6BE7C20: g_slice_alloc (gslice.c:813) You need to run with G_SLICE=always-malloc
Created attachment 143309 [details] [review] [ShellAppMonitor] Squash small memory leak
Created attachment 143310 [details] [review] [ShellAppSystem] Unref GIcon we use internally
(In reply to comment #5) > (In reply to comment #4) > > ==11710== 17,634,208 bytes in 6,306 blocks are possibly lost in loss record 312 > > of 312 > > ==11710== at 0x4A05260: memalign (vg_replace_malloc.c:460) > > ==11710== by 0x4A05317: posix_memalign (vg_replace_malloc.c:569) > > ==11710== by 0x6BE87F0: allocator_memalign (gslice.c:1136) > > ==11710== by 0x6BE82FA: allocator_add_slab (gslice.c:1007) > > ==11710== by 0x6BE8517: slab_allocator_alloc_chunk (gslice.c:1055) > > ==11710== by 0x6BE760A: magazine_cache_pop_magazine (gslice.c:661) > > ==11710== by 0x6BE7976: thread_memory_magazine1_reload (gslice.c:736) > > ==11710== by 0x6BE7C20: g_slice_alloc (gslice.c:813) > > You need to run with G_SLICE=always-malloc Oops, durrr, yes. Ok so rerunning, the most interesting one is: ==27651== 2,116,627 (771,408 direct, 1,345,219 indirect) bytes in 15,528 blocks are definitely lost in loss record 298 of 307 ==27651== at 0x4A0763E: malloc (vg_replace_malloc.c:207) ==27651== by 0x60DB575: g_malloc (gmem.c:131) ==27651== by 0x60F4CA3: g_slice_alloc (gslice.c:824) ==27651== by 0xD9FF3BA: shell_app_info_new_from_tree_item (shell-app-system.c:131) ==27651== by 0xD9FFACD: gather_entries_recurse (shell-app-system.c:342) ==27651== by 0xDA00133: shell_app_system_get_applications_for_menu (shell-app-system.c:557) ==27651== by 0x561CEBB: ffi_call_unix64 (in /usr/lib64/libffi.so.5.0.6) ==27651== by 0x7FEFFE33F: ??? ==27651== by 0x7FEFFE16F: ??? ==27651== by 0x561CC43: ffi_call (in /usr/lib64/libffi.so.5.0.6) ==27651== by 0x7FEFFE0CF: ??? ==27651== by 0xFFFFFFFF: ???
Comment on attachment 143310 [details] [review] [ShellAppSystem] Unref GIcon we use internally > case SHELL_APP_INFO_TYPE_ENTRY: >- return themed_icon_from_name (gmenu_tree_entry_get_icon ((GMenuTreeEntry*)info->entry)); >+ { >+ return themed_icon_from_name (gmenu_tree_entry_get_icon ((GMenuTreeEntry*)info->entry)); >+ } This seems unnecessary and inconsistent with the rest of the code. >@@ -988,10 +990,12 @@ shell_app_info_create_icon_texture (ShellAppInfo *info, float size) > { > ret = clutter_texture_new (); > g_object_set (ret, "opacity", 0, "width", size, "height", size, NULL); >- return ret; > } >+ else >+ ret = shell_texture_cache_load_gicon (shell_texture_cache_get_default (), icon, (int)size); > >- return shell_texture_cache_load_gicon (shell_texture_cache_get_default (), icon, (int)size); >+ g_object_unref (icon); >+ return ret; > } The line cut off before the start of this chunk is "if (!icon)", so won't removing the "return ret" from that if cause a g_warning when you try to g_object_unref NULL?
Created attachment 143714 [details] [review] [ShellAppSystem] Fix memory leak in getting applications from menu
Next up: ==9726== 19,584 (9,792 direct, 9,792 indirect) bytes in 68 blocks are definitely lost in loss record 268 of 302 ==9726== at 0x4A0763E: malloc (vg_replace_malloc.c:207) ==9726== by 0x60DB575: g_malloc (gmem.c:131) ==9726== by 0x4EE7777: cogl_texture_new_from_data (cogl-texture.c:1330) ==9726== by 0xDA0C5D1: pixbuf_to_cogl_handle (shell-texture-cache.c:656) ==9726== by 0xDA0C767: on_pixbuf_loaded (shell-texture-cache.c:724) ==9726== by 0x53A7D52: g_simple_async_result_complete (gsimpleasyncresult.c:588) ==9726== by 0x53A7F06: complete_in_idle_cb_for_thread (gsimpleasyncresult.c:650) ==9726== by 0x60D5CBE: g_idle_dispatch (gmain.c:4065) ==9726== by 0x60D200C: g_main_dispatch (gmain.c:1960) ==9726== by 0x60D348C: g_main_context_dispatch (gmain.c:2513) ==9726== by 0x60D392D: g_main_context_iterate (gmain.c:2591) ==9726== by 0x60D4080: g_main_loop_run (gmain.c:2799)
Review of attachment 143714 [details] [review]: Looks good
Created attachment 143757 [details] [review] [ShellTextureCache] Don't leak when loading an already-cached item A previous patch fixed a leak when loading items which shouldn't be cached, but we also had a leak if two requests for the same item were outstanding. In that case we load the pixbuf twice, but should discard subsequent loads when we notice we've already cached it.
(In reply to comment #13) > Created an attachment (id=143757) [details] > [ShellTextureCache] Don't leak when loading an already-cached item > > A previous patch fixed a leak when loading items which shouldn't > be cached, but we also had a leak if two requests for the same > item were outstanding. In that case we load the pixbuf twice, > but should discard subsequent loads when we notice we've already > cached it. Is this happening frequently? Is this a problem? It sounds potentially like we might be doing a ton of extra io.
(In reply to comment #14) > (In reply to comment #13) > > Created an attachment (id=143757) [details] [details] > > [ShellTextureCache] Don't leak when loading an already-cached item > > > > A previous patch fixed a leak when loading items which shouldn't > > be cached, but we also had a leak if two requests for the same > > item were outstanding. In that case we load the pixbuf twice, > > but should discard subsequent loads when we notice we've already > > cached it. > > Is this happening frequently? Is this a problem? It sounds potentially like we > might be doing a ton of extra io. Not frequently, I think only a few times. It was a constant 21k leak, not big.
Review of attachment 143757 [details] [review]: Can I suggest: - Moving the unref to after out and guarding with a if (texdata) - Just ref'ing again when inserting and unreffing unconditionally Your patch looks like it would work fine, but I think standard cleanup patterns are better. In terms of whether we need to de-duplicate requests, what I'd suggest is adding some quick print debug logging to see when this is happening, and if it is happening in cases that are non-fringe (say, happens when multiple documents have the same document icon), then file a bug so we remember the issue.
(In reply to comment #16) > Review of attachment 143757 [details] [review]: > > Can I suggest: > > - Moving the unref to after out and guarding with a if (texdata) > - Just ref'ing again when inserting and unreffing unconditionally > > Your patch looks like it would work fine, but I think standard cleanup patterns > are better. Good point, fixed. > > In terms of whether we need to de-duplicate requests, what I'd suggest is > adding some quick print debug logging to see when this is happening, and if it > is happening in cases that are non-fringe (say, happens when multiple documents > have the same document icon), then file a bug so we remember the issue. Uuuugh. Ok no it turns out we are doing it a lot. So um, horrible mess, I'll file a separate bug.
closable?
(In reply to comment #18) > closable? No, some leaks are still present, dunno if we want to add future patches to this bug or a new bug for each leak/fix.
*** Bug 602853 has been marked as a duplicate of this bug. ***
i don't think there's any benefit to keeping this open, since it has been untouched for two months. if we find more memory leaks we can file more bugs. if someone goes on another memory-leak-fixing spree, they can open a bug for that spree.