GNOME Bugzilla – Bug 435199
Orca is bloating the swap partition, so the system is no more usable after a short time.
Last modified: 2008-07-22 19:27:38 UTC
Please describe the problem: While working with Orca and different programs a huge amount of data gets swapped. The system becomes more and more sluggish and in the end it is no more usable. I have to reboot my computer, or restart my X-session. Steps to reproduce: 1. Start Orca 2. Start several applications 3. Do a lot of work, writing and readign mail, browsing the web, reconfigure the Orca settings several times (general and application specific). Use programs that need a larger amount of memory, such as Open Office. Actual results: The system slows down, and in the end it does not respond and has to be rebooted. Expected results: The system should simply do the work without slowing down. Does this happen every time? It happens all the time. Other information: It happens since I've upgrated to Ubuntu Feisty. On Edgy there was a certain sluggishness after a few hours of work, but it cannot be compared to the behavior Feisty and Orca show after upgrading. I run a 1 Ghz machine with 240 MB RAM. I use Orca with braille and speech, but without a magnifier. I use Espeak as speech output with the Gnome-speech dirver.
I'm not sure it's just Orca that's bloating the swap partition. I think you'll find that it's other applications like Firefox, OpenOffice, Xorg and the speech synthesis driver: On my current system (Feisty, GNOME 2.18 and latest Orca), using top, I can see I have: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 5305 richb 15 0 714m 188m 30m S 4.6 18.8 6:47.78 firefox-bin 4469 root 15 0 496m 143m 11m S 3.7 14.3 12:34.74 Xorg 4182 richb 15 0 251m 11m 8916 S 3.0 1.1 0:00.20 swift-synthesis 4178 richb 15 0 257m 28m 12m S 9.0 2.9 0:02.77 python ... Having just 240Mb's to run today's modern desktop is going to result in a lot of swapping.
Another thing that I think will help here is when Rich changes the default calc navigation reading mode from row to cell. I think I remember that this was increasing Orca's memory usage quite a bit.
I see Bugzilla is back now. The force-OOo-to-always-start-reading-by-cell change is checked in but has had minimal effect on this problem. This still needs further investigation.
> Having just 240Mb's to run today's modern desktop is going to result > in a lot of swapping. Yes, this is definitely very little memory. I believe it is at or below Ubuntu's recommended minimum, and that is for machines that are not running assistive technologies. In comparing numbers with Rich on the phone, I was not able to reproduce the problems Rich was seeing. But, in looking at the numbers from comment #1, it's easy to see that swapping will definitely occur. However, given that you did not seem to experience this problem with Edgy, but you experience it with Feisty, leads to some interesting thinking. Almost *everything* on the desktop changed between Edgy and Feisty. I'm curious what data has led to the conclusion that this is Orca causing the problem and not something else. If possible, would you be able to do some testing for us? Stay on Feisty, but try reverting to the Orca that you were using on Edgy and see if the problem goes away. With all things being the same, except for the version of Orca in use, we might be able to learn more.
Created attachment 89195 [details] [review] Patch to add a "dump memory stats" keyboard handler. Commentary to follow.
Created attachment 89196 [details] Output from a debug run testing OOo Writer with the "dump memory stats" keyboard handler. I've attached a patch that add in a keyboard handler for Insert-F12, that calls dumpMemoryStats() in default.py. Currently in there, there are print statements for: 1/ Printing the size of the accessible cache. 2/ Printing the number of known scripts. I then ran this (on my Ubuntu Feisty 64bit system) and did the following (see attached results): 1/ Started up Orca, pressed F12. dumpMemoryStats-----------------START--------------------- Size of Accessible._cache: 73 Number of known scripts: 6 dumpMemoryStats-----------------END--------------------- 2/ Started up OOo Writer, pressed F12. dumpMemoryStats-----------------START--------------------- Size of Accessible._cache: 448 Number of known scripts: 10 dumpMemoryStats-----------------END--------------------- 3/ Typed "a" into the OOo Writer window, pressed Alt-f, Up, Return, Tab to quit OOo Writer without saving the changed file, then pressed Insert-F12. dumpMemoryStats-----------------START--------------------- Size of Accessible._cache: 139 Number of known scripts: 6 dumpMemoryStats-----------------END--------------------- 4/ Repeated steps 2 and 3 dumpMemoryStats-----------------START--------------------- Size of Accessible._cache: 146 Number of known scripts: 6 dumpMemoryStats-----------------END--------------------- 5/ Repeated step 2 dumpMemoryStats-----------------START--------------------- Size of Accessible._cache: 460 Number of known scripts: 8 dumpMemoryStats-----------------END--------------------- 6/ Repeated step 3 dumpMemoryStats-----------------START--------------------- Size of Accessible._cache: 147 Number of known scripts: 6 dumpMemoryStats-----------------END--------------------- 7/ Repeated steps 2 and 3 dumpMemoryStats-----------------START--------------------- Size of Accessible._cache: 148 Number of known scripts: 6 dumpMemoryStats-----------------END--------------------- From this, I deduce that we are "leaking" one Accessible object each run, and the scripts are not "leaking". My intent is to do a similar exercise with the latest Firefox 3.0 build on my Ubuntu Feisty 32bit PC, and report the findings. Then I can add more debug in to try to track exactly what's going on.
Created attachment 89201 [details] Output from a debug run testing Firefox 3.0alpha with the "dump memory stats" keyboard handler. Well this is interesting. I tested the latest Firefox 3.0alpha nightly with a version of Orca with the "dump memory stats" keybinding patch (on my Ubuntu Feisty 32bit PC) and got the following: 1/ Started up Orca, pressed F12. dumpMemoryStats-----------------START--------------------- Size of Accessible._cache: 45 Number of known scripts: 4 dumpMemoryStats-----------------END--------------------- 2/ Started up Firefox (previous set to display www.google.com), and pressed F12. dumpMemoryStats-----------------START--------------------- Size of Accessible._cache: 88 Number of known scripts: 8 dumpMemoryStats-----------------END--------------------- 3/ Just typed Alt-f, Up, Return and Return to quit Firefox without doing anything, then pressed Insert-F12. dumpMemoryStats-----------------START--------------------- Size of Accessible._cache: 86 Number of known scripts: 8 dumpMemoryStats-----------------END--------------------- 4/ Repeated step 2 dumpMemoryStats-----------------START--------------------- Size of Accessible._cache: 105 Number of known scripts: 8 dumpMemoryStats-----------------END--------------------- 5/ Repeated step 3 dumpMemoryStats-----------------START--------------------- Size of Accessible._cache: 105 Number of known scripts: 8 dumpMemoryStats-----------------END--------------------- 6/ Repeated step 2 dumpMemoryStats-----------------START--------------------- Size of Accessible._cache: 129 Number of known scripts: 8 dumpMemoryStats-----------------END--------------------- 7/ Repeated step 3 dumpMemoryStats-----------------START--------------------- Size of Accessible._cache: 129 Number of known scripts: 8 dumpMemoryStats-----------------END--------------------- From this, I deduce that we are "leaking" at least 14 Accessible objects each run, and that's without really doing anything. The next step is to add in more debug to dumpMemoryStats() to try to find out exactly what isn't being removed from the _cache each time Firefox terminates.
Created attachment 89344 [details] [review] Revised patch to get more debug info. The revised patch makes the following changes: * In atspi.py, now keep a _cacheToStrings() dictionary as well, that contains <obj>.toString() values for each of the entries in the _cache dictionary (same key for both). * In default.py, extended the debug output in the dumpMemoryStats() routine. It wraps a try/except clause around trying to access each obj that is now in the _cache, but not in the _memoryStatsCache. It uses the values in the _cacheToStrings() dictionary to print out more useful information on each of these new _cache objects. * In focus_tracking_presenter.py, added a _cleanupCache() routine that gets called in _processObjectEvent() if we've just received a "object:children-changed:remove" event for the desktop.
Created attachment 89345 [details] Orca debug output using the revised patch. Here's the Orca debug output generated for a rerun of the previous tests with the new patch in place. (I should also note that the previous increase of 14-15 objects in the _cache was a user error. I was clicking on the desktop background after Firefox had exited and before I typed Insert-F12, and that was generating new _cache object for Nautilus.) From my new observations, there are only 1-2 bogus (DEFUNCT) objects left in the _cache after Firefox exits, and the new _cleanupCache() routine seems to be um, cleaning up the cache nicely. There are a few other interesting things going on here though. 1/ Everytime I startup Firefox, I get a load of COMM_FAILURE's. See lines 94-224 for example. Firefox still comes up fine, and Orca handles it, but why are we getting COMM_FAILURE's? 2/ When I do an Alt-f, Up, Return to quit Firefox, I get a traceback each time. See lines 601-622 for example. Looks like there needs to be a check for a None in the Gecko.py script. 3/ Here are some of objects that COMM_FAILURE's were caught for, in the dumpMemoryStats() routine: * Line 323: app.name='Minefield' name=None role='list' state='ENABLED FOCUSABLE OPAQUE SENSITIVE SHOWING VERTICAL VISIBLE' relations='' * Line 349: app.name='Minefield' name=None role='list' state='ENABLED OPAQUE SELECTABLE SENSITIVE SHOWING VERTICAL VISIBLE' relations='' * Line 460: app.name='Minefield' name=None role='list' state='ENABLED HORIZONTAL OPAQUE SELECTABLE SENSITIVE SHOWING VISIBLE' relations='' * Line 664: app.name=None name=None role='invalid' state='DEFUNCT' relations=' * Line 690: app.name='Minefield' name=None role='tool bar' state='ENABLED HORIZONTAL OPAQUE SENSITIVE SHOWING VISIBLE' relations='' * Line 940: app.name='Minefield' name=None role='list' state='ENABLED OPAQUE SELECTABLE SENSITIVE SHOWING VERTICAL VISIBLE' relations='' * Line 1096: app.name='Minefield' name=None role='list' state='ENABLED HORIZONTAL OPAQUE SELECTABLE SENSITIVE SHOWING VISIBLE' relations='' * Line 1142: app.name='Minefield' name=None role='list' state='ENABLED FOCUSABLE OPAQUE SENSITIVE SHOWING VERTICAL VISIBLE' relations=' * Line 1520: app.name='Minefield' name=None role='list' state='ENABLED OPAQUE SELECTABLE SENSITIVE SHOWING VERTICAL VISIBLE' relations='' * Line 1691: app.name='Minefield' name=None role='list' state='ENABLED HORIZONTAL OPAQUE SELECTABLE SENSITIVE SHOWING VISIBLE' relations='' * Line 1712: app.name='Minefield' name=None role='list' state='ENABLED FOCUSABLE OPAQUE SENSITIVE SHOWING VERTICAL VISIBLE' relations='' 4/ There are lots of other new items found in the _cache that weren't in the previous ._memoryStatsCache, but which don't cause COMM_FAILURES. Look for lines that begin with: "New item found".
Sorry, should have added [will] and [joanie] for their thoughts on the points above.
> 1/ Everytime I startup Firefox, I get a load of COMM_FAILURE's. See > lines 94-224 for example. Firefox still comes up fine, and Orca handles > it, but why are we getting COMM_FAILURE's? From your debug.out and some follow-up experimenting on my own, I'm wondering if it's due to the way Firefox chooses to deactivate windows. When I get the COMM_FAILURE's, it seems that Orca is trying to process an event for an object that was in the deactivated/defunct window. So I used Accerciser to compare deactivate events from a couple of other apps (gnome-terminal and Evolution) with those from the Profile Manager and Minefield. Good deactivates: window:deactivate(0, 0, jd@blockhead: ~) source: [frame | jd@blockhead: ~] application: [application | gnome-terminal] window:deactivate(0, 0, Inbox (32 total, 20 unread) - Evolution) source: [frame | Inbox (32 total, 20 unread) - Evolution] application: [application | evolution-2.10] Bad deactivates? window:deactivate(0, 0, Minefield - Choose User Profile) source: [invalid | ] application: [DEAD] window:deactivate(0, 0, Minefield) source: [invalid | ] application: [application | Minefield] How are we -- are we -- dealing with these "invalid" objects? > 2/ When I do an Alt-f, Up, Return to quit Firefox, I get a traceback each > time. See lines 601-622 for example. Looks like there needs to be a check > for a None in the Gecko.py script. Hopefully taken care of. Thanks!!
This is good work, and I think your "cleanup cache" patch is good -- please check it in. Based upon your experiments, it also seems as though we have come to a conclusion that Orca is doing anything bad to cause the bloating of the swap partition. It may really be what we've thought all along (i.e., the machine in use has way too little RAM). However, I think we should try to do a few more sanity checks to be sure. The Python gc module (http://docs.python.org/lib/module-gc.html) seems to provide some decent information. It seems to have a method for detecting leaks, which might be interesting: import gc gc.set_debug(gc.DEBUG_LEAK) I've not played with this, but the docs lead me to believe messages regarding leaks would be sent to sys.stderr. These messages might yield information for us. In looking at the docs, the note regarding the 'garbage' field is interesting. "garbage A list of objects which the collector found to be unreachable but could not be freed (uncollectable objects). By default, this list contains only objects with __del__() methods.26.1Objects that have __del__() methods and are part of a reference cycle cause the entire reference cycle to be uncollectable, including objects not necessarily in the cycle but reachable only from it. Python doesn't collect such cycles automatically because, in general, it isn't possible for Python to guess a safe order in which to run the __del__() methods. If you know a safe order, you can force the issue by examining the garbage list, and explicitly breaking cycles due to your objects within the list. Note that these objects are kept alive even so by virtue of being in the garbage list, so they should be removed from garbage too. For example, after breaking cycles, do del gc.garbage[:] to empty the list. It's generally better to avoid the issue by not creating cycles containing objects with __del__() methods, and garbage can be examined in that case to verify that no such cycles are being created." atspi.py:Accessible has a __del__ method, so I wonder if it might be causing garbage to gather. If so, we might consider removing the __del__ method, whose main purpose is to unref the CORBA object, and moving the CORBA unref somewhere else (deleteAccessible?)
> Based upon your experiments, it also seems as though we have come to a > conclusion that Orca is doing anything bad to cause the bloating of the swap > partition. s/is/isn't/ -- sorry
"cleanup cache" patch committed. I'll look into the gc stuff. Thanks.
(In reply to comment #11) > Good deactivates: ... > Bad deactivates? ... > How are we -- are we -- dealing with these "invalid" objects? We try to deal with COMM_FAILUREs and DEFUNCT objects n focus_tracking_presenter.py:_processObjectEvent. I don't think we deal with objects of role 'invalid', though. Instead, we probably handle them indirectly via the COMM_FAILURE handling. If we want to handle them directly, we might try looking for the 'invalid' role somewhere around the same place we handle DEFUNCT objects in _processObjectEvent and do the same thing -- delete the Accessible and return.
Created attachment 89499 [details] Compressed Orca debug output with gc debug turned on. I added: import gc gc.set_debug(gc.DEBUG_LEAK) near the beginning of orca.py and reinstalled Orca. I then ran the Firefox tests again. The generated output file is attached above. I compressed it to same some space. If you grab a copy of it and unpack it, you'll see the following interesting things: * A stack trace starting about line 531 when I Alt-f, "Q" to quit Firefox: ...
+ Trace 138834
obj.parent.child(index),
debug.printStack(debug.LEVEL_WARNING)
traceback.print_stack(None, 100, debugFile)
* Another stack trace right after it at line 569: ... Traceback (most recent call last): File "/usr/lib/python2.5/site-packages/orca/Gecko.py", line 5172, in findPreviousCaretInOrder includeNonText) File "/usr/lib/python2.5/site-packages/orca/Gecko.py", line 5139, in findPreviousCaretInOrder elif obj.childCount and obj.child(obj.childCount - 1): AttributeError: 'NoneType' object has no attribute 'childCount' * gc debug output starting at line 657. The uncollectable lines start at line 4880: ... gc: uncollectable <Accessible instance at 0x8af992c> gc: uncollectable <dict 0x8b472d4> gc: uncollectable <Accessible 0x8b464c4> gc: uncollectable <Accessible 0x8ad2a94> gc: uncollectable <Text 0x8ad2e9c> gc: uncollectable <Hyperlink 0x8b46284> gc: uncollectable <Component 0x8b460bc> * Similar gc output at the very end of the Orca run (line 9263): ... gc: uncollectable <Accessible instance at 0x8b718ac> gc: uncollectable <dict 0x8b6f3e4> gc: uncollectable <Accessible 0x8b72404> gc: uncollectable <Accessible 0x8b46e24> gc: uncollectable <Text 0x8b7268c> gc: uncollectable <Hyperlink 0x8b725e4> gc: uncollectable <Component 0x8b726ec>
I tried with a full debug.out. Focus was in the Google entry. After pressing Return on Quit, I got a focus: event for the entry with all of the normal states, as if the entry were still there. During the processing of that event, I suspect its non-existence caught up with itself hence: Child at index 0 is not an Accessible Fixing the subsequent NoneType error is easy. But what (if anything) do we want to do about the fact that we're getting this focus: event?
> Fixing the subsequent NoneType error is easy. But what (if anything) do we > want to do about the fact that we're getting this focus: event? I think we need to open a separate bug for that one and focus on the memory usage problem in this bug. That is, unless you think the two are related.
I've opened bug #444860 on the Orca Firefox traceback(s) problem.
Will has been looking at the gc uncollected items, and has some items. Transferring to him for further analysis.
I'm still experimenting, but what I've noticed is this: 1) Based upon the work Rich did, we do seem to be dropping our references to Accessible's and Scripts as expected. This is good. 2) I've written some experimental code to analyze the Python garbage space for Accessible's and Scripts. What I'm finding is that the Script instances do not seem to be collected, meaning there is probably some cyclical reference somewhere. I'm not sure exactly what is going on, and it's going to take some more digging and looking at referrers and referents on the heap.
> 2) I've written some experimental code to analyze the Python garbage space for > Accessible's and Scripts. What I'm finding is that the Script instances do not > seem to be collected, meaning there is probably some cyclical reference > somewhere. I'm no longer able to reproduce this. :-( I'll keep banging, though.
Created attachment 89738 [details] [review] Patch to add utilities for debugging memory usage This patch adds Orca+Ctrl+F8 and Orca+Shift+Ctrl+F8 to get brief and detailed memory statistics dumped to the console if settings.debugMemoryUsage = True. I'm still trying to track down why the garbage collector doesn't seem to be collecting Script instances. The test case is to start Orca, start gcalctool, quit gcalctool, and then look at the memory usage using the F8 sequences above.
Created attachment 89917 [details] [review] Patch to help break self reference cycle of Accessible's for applications After beating on the only leak I could find and looking at it from many different angles, I think it all boiled down to this: * All Accessible instances refer to their applications (i.e., obj.app) * All applications are Accessible instances * The Accessible class has a __del__ method The result of the first two points is that Accessible instances for applications refer to themselves. This causes a reference cycle. The presence of the __del__ method for Accessible's means the Python garbage collector treats them specially. From http://docs.python.org/lib/module-gc.html: "Objects that have __del__() methods and are part of a reference cycle cause the entire reference cycle to be uncollectable, including objects not necessarily in the cycle but reachable only from it. Python doesn't collect such cycles automatically because, in general, it isn't possible for Python to guess a safe order in which to run the __del__() methods." Thus, Accessible's for applications are deemed uncollectable. I was able to fix this problem via a one line in focus_tracking_presenter.py:_reclaimScripts: just set app.app = None before deleting it. This breaks the cycle. Now, as for the comment that opened this whole thing, I think we've done a pretty good analysis here. From what I can tell, we couldn't find any major leaks in Orca that contribute to the problem, but we found some minor leaks and fixed them. The most severe leak when Orca is in use seems to be described in bug 446277, which I'm not really sure is an Orca bug. I've asked Li Yuan from the AT-SPI infrastructure side to help us out with this since I suspect it is an infrastructure problem. I'm tempted to close this bug as FIXED and keep 446277 open as the one to track. Thoughts?
I'm going to close this one out. I think I've fixed any Orca leaks I've been able to find, and the infrastructure leak is being handled in bug 446277. If new leaks are detected in the future, please open a new bug with specific application details. Thanks everyone! Extra thanks to Rich for his work on this problem, too.