GNOME Bugzilla – Bug 446277
gnome-panel is leaking memory if orca is running
Last modified: 2019-03-27 20:11:24 UTC
I noticed that gnome-panel needs 820MB of RAM on one machine where I have orca running since some days now. A little investigation showed that I can reproduce this problem on another machine of mine easily. Now gnome-panel is growing about 200KB ram every 10 secunds or so... How to reproduce: 1. Navigate to the panel clock. 2. Enter clock preferences and enable "show seconds" checkbox. 3. Close clock preferences and wait. If you now watch gnome-panel in top, you should be able to see it grow more and more memory consumption steadily.
Please see also bug 435199 - Orca is bloating the swap partition, so the system is no more usable after a short time. If you set orca.settings.debugMemoryUsage=True in your ~/.orca/orca-customizations.py file, Insert+Ctrl+F8 will dump some brief memory statistics to the console. For this bug, the only way I can think that Orca would be causing the problem is if it wasn't properly unref'ing CORBA objects so that gnome-panel could free them. I tried running the test case listed in the opener for this bug, though, and could not reproduce the behavior. :-( Using the new memory debug stuff, I also couldn't see any new objects being tracked by the garbage collector. What is the rate (e.g., how many meg's a minute) at which you are observing memory growth?
(In reply to comment #1) > Please see also bug 435199 - Orca is bloating the swap partition, so the system > is no more usable after a short time. If you set > orca.settings.debugMemoryUsage=True in your ~/.orca/orca-customizations.py > file, Insert+Ctrl+F8 will dump some brief memory statistics to the console. I followed the test case described in this bug and let it run for 15 minutes. During that time, I monitored the number of objects known to Python via Insert+Ctrl+F8 and the stuff from bug 435199. I didn't see the overall number of objects in the Orca Python process increase at all. But, I did observe that the memory usage (via top) of gnome-panel climbed during that time. In particular, RES grew from 83m to 92m. When I killed Orca and let the test case continue to run, gnome-panel no longer continued to grow. So, I'm perplexed. If Orca isn't growing, I'm not sure what it might be hanging onto that's causing gnome-panel to grow. This makes me lean towards calling this a bug on the gnome-panel side, either in gnome-panel itself or the atk/gail/libgail-gnome support.
I did some more debugging, which consisted of the following: 1) Running "top -p `ps -elf | grep gnome-panel | grep -v grep | awk '{ print $4 }'`" in one shell to watch the gnome-panel process 2) Selectively trimming (i.e., just adding 'return' statements to prevent code from being reached) where Orca was processing events and then rerunning Orca to see if gnome-panel grew It turns out that the check for event.source.state.count(atspi.Accessibility.STATE_DEFUNCT) in focus_tracking_presenter.py:_processObjectEvent was a contributor to the problem. By returning immediately after this call, gnome-panel grew. By returning immediately before it, gnome-panel remained the same size. When I then began trimming atspi.py:__get_state, it turns out that just the call to self.accessible.getState() was a contributor. Now...it's off to check out why this is causing a leak. I'm going to start by looking at some AT-SPI bugs: bug 375319, bug 427836, and bug 428007.
Created attachment 89901 [details] Standalone test case Here's a standlone test case that shows the problem. If you comment out line 253 (foo = e.source.getState()), gnome-panel's memory doesn't grow when you run this test case. If you uncomment the line, gnome-panel's memory slowly grows. Still digging and will be trying the latest AT-SPI stuff to see if it makes the problem go away.
Created attachment 89903 [details] Revised test case to make memory grow faster This test case is the same as the first except it makes the gnome-panel memory grow faster by calling getState() 1000 times for each event. It also tries to unref and del the result of the call to see if that solved any problems. It didn't. In addition, the problem continues to exist even with the latest AT-SPI. So, I'm guessing there's a leak in the atk somewhere.
If I make the following changes to the event-listener-test of at-spi, I can also reproduce the memory growth when I run the ensuing event-listener-test: wwalker@wwalker-laptop:~/at-spi/trunk/test$ svn diff Index: event-listener-test.c =================================================================== --- event-listener-test.c (revision 933) +++ event-listener-test.c (working copy) @@ -320,9 +320,12 @@ void report_event (const AccessibleEvent *event, void *user_data) { + int i; static long count = 0; char *s = Accessible_getName (event->source); - fprintf (stderr, "%s %s\n", event->type, s); + fprintf (stderr, "FOO %s %s\n", event->type, s); + for (i = 0; i < 1000; i++) + Accessible_getStateSet (event->source); if (s) SPI_freeString (s); if (count == 0) { g_timer_reset (timer);
When I investigate this more, and change the call from getState() to getRelationSet() in the python test case, I don't see the memory grow. I did that as a means to see if somehow the Python CORBA support was causing issues. So, I'm still leaning towards this being something on the application side. Li - this seems like there might be a leak in atk or gail. Do you have any ideas?
BTW, to give an indication of the severity of the problem: I think any call to getState() will cause the application to leak. Orca uses the state of Accessible's all over the place as a means to help Orca make decisions about things. A quick analysis of the Orca code shows about 137 references to ".state" in the code, each of which results in a call to getState(). When I add some debug to atspi.py:__get_state, I see there are typically about 3 calls to getState per event, and typically double that when Orca is actually trying to figure out what to present to the user. When we consider things such as the wireless network signal strength indicator applet are constantly emitting events, this really is a bad problem, and the most convenient way to recover is by logging out and logging back in. I'm marking this as "Urgent: This bug blocks usability of a large portion of the product, and really should be fixed before the next planned release." and "Critical: Crashes, causes loss of data, or is a severe memory leak."
Probably in at-spi/libspi/stateset.c, I will verify my patch when I am back to office tomorrow.
(In reply to comment #9) > Probably in at-spi/libspi/stateset.c, I will verify my patch when I am back to > office tomorrow. > Yeah Li! Thanks! Do you have a patch you can post for future reference?
I'd like to. But I find the patch only solve part of the problem, trying to find other leaks. And I will paste the final patch.
Sorry for update the bug so lately. I tried several ways but still can not exterminate leak. I ran valgrind with gedit, and found this leak is relative to getState. But I have no idea about how getState generates the leak. Maybe I need a little more time to look into it. ==9845== 354,003 (353,993 direct, 10 indirect) bytes in 8,852 blocks are definitely lost in loss record 234 of 237 ==9845== at 0x4021765: malloc (vg_replace_malloc.c:149) ==9845== by 0x4B27174: g_malloc (gmem.c:131) ==9845== by 0x4F3FAFF: ORBit_alloc_string (in /usr/lib/libORBit-2.so.0.1.0) ==9845== by 0x4F3F7B8: CORBA_string_dup (in /usr/lib/libORBit-2.so.0.1.0) ==9845== by 0x4F3F2FA: CORBA_exception_set (in /usr/lib/libORBit-2.so.0.1.0) ==9845== by 0x4F3F3A2: CORBA_exception_set_system (in /usr/lib/libORBit-2.so.0.1.0) ==9845== by 0x4F4D47D: (within /usr/lib/libORBit-2.so.0.1.0) ==9845== by 0x4F4E7C1: ORBit_handle_request (in /usr/lib/libORBit-2.so.0.1.0) ==9845== by 0x4F379E6: giop_connection_handle_input (in /usr/lib/libORBit-2.so.0.1.0) ==9845== by 0x4F5542C: (within /usr/lib/libORBit-2.so.0.1.0) ==9845== by 0x4F5831D: (within /usr/lib/libORBit-2.so.0.1.0) ==9845== by 0x4B1EFA3: g_main_dispatch (gmain.c:2061)
The full trace: ==17068== 335,643 (335,633 direct, 10 indirect) bytes in 8,393 blocks are definitely lost in loss record 230 of 234 ==17068== at 0x4021765: malloc (vg_replace_malloc.c:149) ==17068== by 0x4B27174: g_malloc (gmem.c:131) ==17068== by 0x4F44F65: ORBit_alloc_string (allocators.c:228) ==17068== by 0x4F4488A: CORBA_string_dup (corba-string.c:22) ==17068== by 0x4F441F0: CORBA_exception_set (corba-env.c:118) ==17068== by 0x4F4418B: CORBA_exception_set_system (corba-env.c:102) ==17068== by 0x4F53FAF: ORBit_POA_handle_request (poa.c:1555) ==17068== by 0x4F5915D: ORBit_handle_request (orbit-adaptor.c:298) ==17068== by 0x4F3B7E1: giop_connection_handle_input (giop-recv-buffer.c:1308) ==17068== by 0x4F61148: link_connection_io_handler (linc-connection.c:1387) ==17068== by 0x4F6389F: link_source_dispatch (linc-source.c:159) ==17068== by 0x4B1EFA3: g_main_dispatch (gmain.c:2061)
Hi Michael, do you have any idea about this?
Ha, I got an explanation. There is a memory leak in registryd, and a simple patch can fix it: Index: libspi/accessible.c =================================================================== --- libspi/accessible.c (revision 933) +++ libspi/accessible.c (working copy) @@ -331,6 +331,10 @@ bonobo_return_val_if_fail (object != NULL, NULL, ev); + old_set = g_object_get_data (object, "atk_state_set"); + if (old_set) + g_object_unref (old_set); + atk_set = atk_object_ref_state_set (object); set = spi_state_set_new (atk_set); @@ -338,6 +342,7 @@ BONOBO_OBJREF(set), ev); + g_object_set_data (object, "atk_state_set", atk_set); return retval; } But after applied the patch, the memory is still growing when I ran the python script. Thus I found the trace I pasted yesterday. It seems like a CORBA momery leak when there is an exception (maybe it is not a leak, I'm not very familiar with that section of code in CORBA). I looked into the script, it tries to unref the state until there is an exception. So I changed the script to: --- bug_446277.py 2007-06-26 16:38:24.000000000 +0800 +++ bug_446277_modified.py 2007-06-26 16:22:39.000000000 +0800 @@ -252,11 +252,8 @@ def notifyEvent(e): for i in range(0, 100): foo = e.source.getState() - while True: - try: - foo.unref() - except: - break + foo.unref() + foo.unref() del foo pass I unref the state twice because I know the ref_count will be 2. After this, the memory just stop growing. So I think my original patch can fix the leak.
(In reply to comment #15) > Ha, I got an explanation. There is a memory leak in registryd, and a simple > patch can fix it: Yeah! Thanks Li! > --- bug_446277.py 2007-06-26 16:38:24.000000000 +0800 > +++ bug_446277_modified.py 2007-06-26 16:22:39.000000000 +0800 > @@ -252,11 +252,8 @@ > def notifyEvent(e): > for i in range(0, 100): > foo = e.source.getState() > - while True: > - try: > - foo.unref() > - except: > - break > + foo.unref() > + foo.unref() > del foo > pass > > I unref the state twice because I know the ref_count will be 2. > After this, the memory just stop growing. So I think my original patch can fix > the leak. With your patch, if you remove the unref()'s altogether, does the leak still occur? I'm trying to figure out when we have to call unref --- I'm assuming we shouldn't have to.
I think the best way is to unref it on the Orca side. Because at-spi doesn't know when Orca doesn't use the state anymore.
Oh sorry, I answer your question in a wrong order :) Yes, we need to unref it to avoid leak. If Orca doesn't want to unref the state, at-spi can unref it when the next time Orca calls getState on the same object. Does this sound OK? The patch will be: Index: libspi/accessible.c =================================================================== --- libspi/accessible.c (revision 933) +++ libspi/accessible.c (working copy) @@ -325,12 +325,21 @@ CORBA_Environment *ev) { AtkObject *object = get_atkobject_from_servant (servant); - AtkStateSet *atk_set; + AtkStateSet *atk_set, *old_set; SpiStateSet *set; - Accessibility_StateSet retval; + Accessibility_StateSet retval, old_retval; bonobo_return_val_if_fail (object != NULL, NULL, ev); + old_set = g_object_get_data (object, "spi_state_set"); + if (old_set) + g_object_unref (old_set); + + old_retval = g_object_get_data (object, "return_state"); + if (old_retval) { + Bonobo_Unknown_unref (old_retval, ev); + Bonobo_Unknown_unref (old_retval, ev); + } atk_set = atk_object_ref_state_set (object); set = spi_state_set_new (atk_set); @@ -338,6 +347,8 @@ BONOBO_OBJREF(set), ev); + g_object_set_data (object, "spi_state_set", atk_set); + g_object_set_data (object, "return_state", retval); return retval; }
I'm not sure of the right thing to do here. Maybe Michael Meeks can help us? Michael?
What about at-spi reduces the ref_count to 1, and Orca unref it once? I think it is reasonable because only Orca itself knows when it doesn't use the state anymore.
So - yes, this is a rather simple resource lifecycle issue. If you call 'getStateSet' then you have to unref the handle later - period :-) It is of course, unfortunate that the StateSet is an object reference instead of a blob of data we can throw over in one chunk, but ... an unfortunate mistake of API design that can't easily be helped now [ burns valuable round-trips too ]
(In reply to comment #21) > So - yes, this is a rather simple resource lifecycle issue. If you call > 'getStateSet' then you have to unref the handle later - period :-) > > It is of course, unfortunate that the StateSet is an object reference instead > of a blob of data we can throw over in one chunk, but ... an unfortunate > mistake of API design that can't easily be helped now [ burns valuable > round-trips too ] > Yikes! I'm going to get getRelationSet has the same problem. Is the unref'ing of this kind of thing something that can be done by PyORBit (e.g., in the destructor for these objects)?
Willie - basically no; the ORB has no knowledge of such reference counts: indeed CORBA has no lifecycle awareness at all. IIWY I would wrap the 'getRelationSet' call - and return an 'OrcaRelationSet' (or whatever) that does unref the object on dispose. Of course - the same problem holds true for ~all handles to remote Accessible objects, so ... this question is a little concerning ;-) - it cuts to the heart of the a11y / lifecycle debate :-) [ in which I contend that using reference counting was a "really bad idea" (TM) ;-].
Created attachment 91486 [details] [review] at-spi patch
(In reply to comment #24) > Created an attachment (id=91486) [edit] > at-spi patch > Thanks! Do we need something similar for impl_accessibility_accessible_get_relation_set as well?
yes - you need to unref each member of the sequence.
The StateSet code for Orca does this: def __get_state(self): """Returns the Accessible StateSeq of this object, which is a sequence of Accessible StateTypes. """ try: stateSet = self.accessible.getState() except: stateSet = None if stateSet: try: state = stateSet._narrow(Accessibility.StateSet).getStates() except: state = [] else: state = [] # [[[WDW - we don't seem to always get appropriate state changed # information, so we will not cache state information.]]] # #if state and settings.cacheValues: # self.state = state return state I'm guessing we could just do a "stateSet.unref()" right before the "return state". Does that seem right? Do we also need to unref the 'state' thing, too? In addition, the RelationSet code for Orca does this: def __get_relations(self): """Returns the Accessible RelationSet of this object as a list. """ relations = [] relationSet = self.accessible.getRelationSet() for relation in relationSet: try: relations.append(relation._narrow(Accessibility.Relation)) except: pass return relations I'm guessing the same thing applies -- we could just do a "relationSet.unref()" right before the "return relations". Does this seem right? Do we also need to unref each element of relations after we've used it?
The stateset unref code looks fine - yes. Unfortunately the relationSet is itself a sequence; you can't unref that. The sequence however contains a set of 'Relation' objects - each of which need to be unreffed when 'relations' is itself freed: so if you can handle the lifecycle of 'relations' and unref each item in that on dispose you'll have it nailed I think. The answers to your questions though all live in at-spi/libspi/accessible.c and friends - and a few moments thought from the IDL :-)
Reassigning this back to Orca for the rest of the work.
Michael Meeks was very generous with his time at GUADEC and we had a good chat about this. The result is a summary of considerations I think we might need to make to reduce/eliminate leaks we might be causing in applications. It all tends to revolve around us not doing unref()'s on objects when we need to. We should only do ref() on a non-primitive type (e.g., a CORBA Accessible) if we got it via an event and we expect it to persist after we handle the event. If we get non-primitive types in other ways (e.g., method calls and field references), the ref() is implicitly done for us. This also includes the result of calls to queryInterface. We always need to unref() non-primitive types that we get via method calls that we make (e.g., obj.getChildAtIndex(i)). We also need to unref() fields that are non-primitive types (e.g., obj.parent) in objects we get via method calls. So...this, includes anything where we might get an Accessible. This includes the following method calls, and I suspect we can tackle most of this issue via our 'makeAccessible' method: add a incrementRefCount=False parameter to this method and only set it to True if we are calling this with an object we get via an Event. The Accessible.__del__ method will take care of the unref() for us. Accessible accessible.parent Accessible accessible.getChildAtIndex Accessible accessible.component.getAccessibleAtPoint Accessible accessible.hyperlink.getObject Accessible accessible.selection.getSelectedChild Accessible accessible.table.caption Accessible accessible.table.summary Accessible accessible.table.getAccessibleAt Accessible accessible.table.getRowHeader Accessible accessible.table.getColumnHeader Application accessible.getApplication We also need to free up non-primitive types we get directly. These include: The objects returned by all calls to queryInterface. We probably can check for these in the accessible's dictionary in the Accessible.__del__ method. The specific fields are: action, component, hyperlink, hypertext, image, selection, document, table, text, value StateSet accessible.getState -> Probably can fix this by calling stateSet.unref() just before returning from atspi.py:__get_state. BoundingBox accessible.component.getExtents -> We might consider copying the BoundingBox fields to a local object and then unref'ing the original. The problem is that this is a call on the component interface and not the accessible itself. We don't have a wrapper for a component. BoundingBox accessible.image.getImageExtents -> We might consider copying the BoundingBox fields to a local object and then unref'ing the original. The problem is that this is a call on the image interface and not the accessible itself. We don't have a wrapper for an image. Desktop registry.getDesktop -> Probably can fix this by creating a __del__ for atspi.py:Registry and doing an unref() of registry.desktop in it. DevEvtCtrlr registry.getDeviceEventController -> This is used in atspi.py:KeystrokeListener. Probably should save a single reference to the result of "d = self.registry.getDeviceEventController" and then do an unref() on it in the __del__ method for atspi.py:KeystrokeListener. RelationSet accessible.getRelationSet -> We don't need to unref() the result of the call (it's an array). Instead, we need to unref each Relation. I'm not sure of the best way to do this. We might consider just providing an atspi.py:deleteRelationSet or something like that that does the unref()'ing for us. DesktopSeq registry.getDesktopList -> Same command as RelationSet (need to unref each Desktop) RangeList accessible.text.getBoundedRanges -> Same comment as RelationSet (need to unref each Range), though we might copy each of the Range fields to a local object and then unref() the original. The problem, however, is similar to that of component.getExtents and image.getImageExtents -- we're working with the text interface and not a wrapped accessible.
We are going to postpone working on this bug until after the pyatspi integration is done. Then, Eitan will look at this for us. :-)
Created attachment 98203 [details] Test script This will probably only work in an English locale. The script takes an accessible object in gedit's "open files" dialog which has a relation and hammers it with getState() and getRelation() 100,000 times each. To use this test script: 1. Run gedit 2. Do ctrl+o to get "Open Files" dialog 3. To see gedit's memory usage run in a terminal: top -p `ps -elf | grep gedit | grep -v grep | awk '{ print $4 }'` 4. Execute given script. 5. Watch gedit's memory usage quadruple.
See bug 491845 for a fix that solves the test case above. Through trial and error I found out that everything needs to be unreffed twice. Any ideas why?
(In reply to comment #33) > See bug 491845 for a fix that solves the test case above. Through trial and > error I found out that everything needs to be unreffed twice. Any ideas why? In comment #15, Li also noticed the need to unref twice. Comment #30 might contain some information about this. In particular: "We should only do ref() on a non-primitive type (e.g., a CORBA Accessible) if we got it via an event and we expect it to persist after we handle the event. If we get non-primitive types in other ways (e.g., method calls and field references), the ref() is implicitly done for us. This also includes the result of calls to queryInterface. We always need to unref() non-primitive types that we get via method calls that we make (e.g., obj.getChildAtIndex(i)). We also need to unref() fields that are non-primitive types (e.g., obj.parent) in objects we get via method calls." So...if this is true, maybe pyatspi is doing one too many refs? However, maybe some debugging using the new ref count feature provided by bug 471391 will shed some insight on the accuracy of when things are already ref'd for us and when they are not.
Yep, pyatspi was doing one too many refs, it was a result of my erroneous patch in bug #490202. So StateSet only needs to be unreffed once now. For some reason Relations still need two unrefs(), anyone know why?
*** Bug 491845 has been marked as a duplicate of this bug. ***
Created attachment 98345 [details] [review] Proposed pyatspi fix Just copied this over from bug #491845.
Created attachment 98346 [details] [review] Proposed pyatspi fix The previous patch's changes were commented out :P Ok, last spam for today. The next post will be interesting.
The leakage existed prior to the pyatpi migration, so I'm removing this as a blocker for pyatspi migration bug 448848. It's still a blocker for performance bug 491756.
(In reply to comment #38) > Created an attachment (id=98346) [edit] > Proposed pyatspi fix This patch does a double unref() of the relation, which seems strange. Eitan and I discussed this today, and we're perplexed about where the extra ref is coming from. Michael and Li, we could use your expertise on this one. Li fixed an AccessibleState reference problem with the patch at http://bugzilla.gnome.org/attachment.cgi?id=91486. I'm wondering if something similar might be needed for AccessibleRelation stuff. Michael and Li, here's a link to accessible.c. The AccessibleState code starts at line 322 and the AccessibleRelation code starts at line 349. Would you be able to take a look and offer us some guidance? http://svn.gnome.org/viewvc/at-spi/trunk/libspi/accessible.c?view=annotate In particular, I'm wondering if anything glaring might be (or not be) in impl_accessibility_accessible_get_relation_set that might the source of the double ref count. Thanks! Will
From private e-mail -- This Member's Excellent Engineering Kickbutt Summary helps a lot. I'm withholding the name in the event this member doesn't want to be identified as the goto guy for speaking into the Bonobo/ORBit Michaelrophone: "The bug looks like it is in at-spi/libspi/accessible.c: retval->_buffer[i] = bonobo_object_dup_ref ( BONOBO_OBJREF ( spi_relation_new (atk_relation_set_get_relation(relation_set, i))), ev); Is broken - we create the relation with 1 ref; so this incs that to two; instead we should use: CORBA_Object_duplicate (BONOBO_OBJREF (spi_relation_new...), NULL); but not object_dup_ref." Thanks, Mysterious Enquiring Enigma of Keen Smarts! :-) Eitan, can you give this a shot with your test environment?
Hi Will, I made the patch. But I found no matter if I apply the patch, when I ran the test script, memory grows when getState, and stop growing when getRelationSet. I am a little confused about the bug, are we leaking memory from getState but not getRelationset? Or am I doing something wrong?
Created attachment 98989 [details] Test script This test script uses a better sample accessible for the test "Character Coding:". The "Location:" label is not always visible, so it has no relations.
Created attachment 99155 [details] [review] Fix This fixes the unref() issue for both StateSet and Relation
It seems as though this bug probably can be closed now. Thoughts?
This is fixed in at-spi head (both at-spi proper, and in pyatspi). We should see this in the next at-spi dev release.