GNOME Bugzilla – Bug 491756
Performance Metabug
Last modified: 2013-01-02 15:53:53 UTC
This bug is used for tracking all performance problems with Orca: speed, cpu, etc.
Created attachment 99159 [details] [review] reduce getApplication() calls

This patch utilizes the host_application field in events to get the source's application. We do this for almost every event, so it could help.
Created attachment 99161 [details] [review] Make a ring buffer composed of past locus of focuses

This patch adds a ring buffer of five elements that keeps a reference to the last five locus of focuses. This helps with pyatspi's caching scheme, which is based on references to local accessible objects. This should help improve the caching of context.
(In reply to comment #1)
> Created an attachment (id=99159) [edit]
> reduce getApplication() calls
>
> This patch utilizes the host_application field in events to get the source's
> application. We do this for almost every event, so it could help.

This looks good to me. The assumption being made is that since people will need the latest pyatspi to work with Orca, we will definitely be getting the 'new' style of AT-SPI events that have the host_application field, right?
> This patch adds a ring buffer of five elements that keeps a reference to the
> last five locus of focuses. This helps with pyatspi's caching scheme which is
> based on references to local accessible objects. This should help improve the
> caching of context.

Ha! This one also squeaks in the "<= 256" patch. I think that one should go in regardless, though it probably should be "<= 255" or "< 256" so we're only creating masks with 8 significant bits.

For the ring buffer, it would be cool if we could add an orca.settings property that we could monkey with to set the size of the list rather than hardcoding it to 4. That way, we might be able to quickly experiment with how different numbers help us and/or whether or not the ring buffer idea is even useful.
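As a side note on the mask boundary: a quick sketch (illustrative, not Orca code) of why 255, not 256, is the largest value that fits in an 8-bit mask:

```python
# A value fits in an 8-bit mask only if masking with 0xFF leaves it
# unchanged; 256 (0b100000000) needs a ninth bit, so "<= 256" is off by one.
def fits_in_8_bits(value):
    return value == (value & 0xFF)

print(fits_in_8_bits(255))  # True:  0b11111111
print(fits_in_8_bits(256))  # False: 0b100000000 has 9 significant bits
```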
> This looks good to me. The assumption being made is that since people will
> need the latest pyatspi to work with Orca, we will definitely be getting the
> 'new' style of AT-SPI events that have the host_application field, right?

No assumptions are really made: it uses host_application if it is not None, and falls back to getApplication() if it is. The host_application field has been in pyatspi Event wrappers for as long as I can remember.
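For illustration, the fallback Eitan describes can be sketched like this (the attribute names mirror pyatspi's Event wrapper, but this simplified function is not the actual patch):

```python
# Prefer the event's host_application field and fall back to the expensive
# getApplication() traversal only when it is None.
def get_application(event):
    app = getattr(event, "host_application", None)
    if app is not None:
        return app
    # Fallback: walk up from the event source (a round-trip over AT-SPI).
    return event.source.getApplication()
```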
> Ha! This one also squeaks in the "<= 256" patch. I think that one should go
> in regardless, though it probably should be "<= 255" or "< 256" so we're only
> creating masks with 8 significant bits.

I'll check that in today.

> For the ring buffer, it would be cool if we could add an orca.settings
> property that we could monkey with to set the size of the list rather than
> hardcoding it to 4. That way, we might be able to quickly experiment with how
> different numbers help us and/or whether or not the ring buffer idea is even
> useful.

I'll do that, and attach a new patch.
Created attachment 99178 [details] [review] Limit amount of script listeners

This patch assigns specific listeners for state-changed events. Also, every listener that was assigned to noOp now only gets assigned if the debug level is high enough.
Created attachment 99179 [details] [review] Make a ring buffer composed of past locus of focuses

This takes out the unrelated mask change above. It also adds a setting, settings.focusHistoryLength, for adjusting the buffer size.
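The idea can be sketched with Python's collections.deque (settings.focusHistoryLength is the real setting name from the patch; the FocusHistory class here is illustrative):

```python
from collections import deque

# Keeping strong references to the last N focused accessibles keeps them
# alive in pyatspi's reference-based cache, so re-querying them is cheap.
class FocusHistory:
    def __init__(self, length=5):
        # deque with maxlen silently evicts the oldest entry when full.
        self._buffer = deque(maxlen=length)

    def push(self, accessible):
        self._buffer.append(accessible)

    def __len__(self):
        return len(self._buffer)

history = FocusHistory(length=5)
for obj in ["a", "b", "c", "d", "e", "f"]:
    history.push(obj)
# Only the five most recent references are retained; "a" has been evicted.
```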
Created attachment 99183 [details] Test results

These are the numbers from Orca running three consecutive tests from gtk-demo: role_menu.py, role_tree_table.py and role_combo_box2.py. I compared pre-pyatspi Orca, Orca from trunk (with the new pyatspi caching changes), and Orca with all of the patches in this bug report. This is the sum of call counts/times:

              calls   time (s)
Pre-pyatspi   24541   3.5692630000
Orca trunk    44009   5.9319880000
Orca patched  18899   2.9330390000
Created attachment 99184 [details] Test results

Disregard the last results. It was my first attempt, so of course I failed. Anyway, here are more "sane" results:

              calls   time (s)
Pre-pyatspi   24541   3.5692630000
Orca trunk    44009   5.9319880000
Orca patched  29473   3.8884460000
Two patches look great. I need to look at the other one, which is probably the most important one, just a little more. The main concerns I have with it are:

1) It modifies behavior based upon debug level. I think it's probably OK to do this, but I wonder if we might end up in situations where we might be confused about getting/handling events one way, but not the other. I wonder if maybe we need a different setting (e.g., settings.optimizeEventListeners or something like that) to do the selective listening?

2) I don't see a way to turn on listening for all object:state-changed events -- we also use the Orca debug logs as a means to help reverse engineer what the application is giving us, so it is nice to get as much as we can when we want it. Is there a way for us to listen for all object:state-changed events based upon a debug level setting?

Anyhow, getting us faster is a good thing. :-) Good stuff.
Created attachment 99806 [details] [review] Limit listeners

The new patch introduces settings.listenAllEvents; if set to True, it will listen to all the useless events, except for "mouse:*" and whatever is in settings.ignoredEventsList. This should address both of the issues you brought up, Will.
(In reply to comment #12)
> Created an attachment (id=99806) [edit]
> Limit listeners
>
> The new patch introduces settings.listenAllEvents, if set to True it will
> listen to all the useless events, except for "mouse:*" and whatever is in
> settings.ignoredEventsList.
>
> This should address both of the issues you brought up, Will.

Looking really good. The one problem area I see is this:

+        if e.type in settings.ignoredEventsList:
+            return
         # We ignore defunct objects.
         #
         if e.type.startswith("object:state-changed:defunct"):

Around this part of the code, there are several blocks that toss out events:

    if e.type.startswith("object:state-changed:defunct"):
        ...
    if e.type.startswith("object:property-change:accessible-parent"):
        ...
    if e.type.startswith("object:children-changed:remove") \
       and (e.source != self.registry.getDesktop(0)):

The general pattern is that we do this if we want to toss an event:

    if settings.debugEventQueue:
        self._enqueueEventCount -= 1
    return

So, unless you want to try to glom all the blocks into one just to be perty, the above change probably could just be:

+        if e.type in settings.ignoredEventsList:
+            if settings.debugEventQueue:
+                self._enqueueEventCount -= 1
+            return

In looking at the way all events are registered (via pyatspi.EVENT_TREE), I'm still curious if we can just prevent some events from being listened for at all in settings.listenAllEvents=True mode if they exist in settings.ignoredEventsList. It *seems* like traversing pyatspi.EVENT_TREE.values() might help us here. The only risk is if pyatspi.EVENT_TREE is incomplete. If that's the case, there might be the potential for us not to get events that we expect to get.

In any case, things look fine as they are and we can get fancier if the performance ends up sucking rocks if listenAllEvents=True.
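The tossing pattern Will describes can be sketched as follows (a simplified, illustrative model of the event queue; the settings names match the patch, but the classes do not correspond to actual Orca code):

```python
# Minimal model of the enqueue path: the debug counter is incremented on
# entry, so every early return that tosses an event must decrement it again.
class Settings:
    ignoredEventsList = ["object:state-changed:stale"]
    debugEventQueue = True

settings = Settings()

class EventManager:
    def __init__(self):
        self._enqueueEventCount = 0
        self.queue = []

    def enqueue(self, event_type):
        if settings.debugEventQueue:
            self._enqueueEventCount += 1
        if event_type in settings.ignoredEventsList:
            # Keep the debug counter balanced when tossing an event,
            # matching the pattern used by the other early returns.
            if settings.debugEventQueue:
                self._enqueueEventCount -= 1
            return
        self.queue.append(event_type)
```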
(In reply to comment #13)
> It *seems* like traversing pyatspi.EVENT_TREE.values() might help us here.
> The only risk is if pyatspi.EVENT_TREE is incomplete. If that's the case,
> there might be the potential for us not to get events that we expect to get.

Exactly. The main problem, in my view, is that we could, say, have a blacklist with "object:bounds-changed" in it. With pyatspi.EVENT_TREE we could register all the leaf event types except for "object:bounds-changed". But what if we have "object:state-changed:stale" in our blacklist? The subtypes of object:state-changed are not in EVENT_TREE, so there is no way of registering other "object:state-changed" events without also registering "object:state-changed:stale".
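Eitan's point can be illustrated with a toy tree (the real pyatspi.EVENT_TREE maps parent event types to lists of child types; this miniature version is made up for the example):

```python
# Toy event tree: "object:state-changed" has one child listed, but its other
# subtypes (e.g. "object:state-changed:stale") are not in the tree at all.
EVENT_TREE = {
    "object": ["object:state-changed", "object:bounds-changed"],
    "object:state-changed": ["object:state-changed:focused"],
}

def leaf_events(tree):
    children = set()
    for kids in tree.values():
        children.update(kids)
    # A leaf is any child type that is not itself a parent in the tree.
    return sorted(c for c in children if c not in tree)

def events_to_register(tree, blacklist):
    return [e for e in leaf_events(tree) if e not in blacklist]

# Blacklisting a leaf like "object:bounds-changed" works fine...
print(events_to_register(EVENT_TREE, {"object:bounds-changed"}))
# ...but "object:state-changed:stale" never appears as a leaf, so it cannot
# be excluded this way -- exactly the limitation described above.
```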
Created attachment 99921 [details] [review] Limit listeners This adds the stuff Will wanted above (the debug things).
Created attachment 99922 [details] [review] Limit listeners Got rid of conflict in gedit.py.
(In reply to comment #16)
> Created an attachment (id=99922) [edit]
> Limit listeners
>
> Got rid of conflict in gedit.py.

Looks good to me. Mike - please test. Eitan, please also run the regression tests and look for unexpected failures. If this is a good fix that boosts performance, we should try to get it in for 2.21.3. Thanks!
Created attachment 100201 [details] [review] Limit listeners

This fixes expanded/collapsed and checked/unchecked events.
After a few hours of testing, this patch seems to be working pretty well. I think for non-Mozilla applications we are now getting near a pre-pyatspi state. One thing to perhaps try is the following:

1. Open up OpenOffice Writer so that you have a blank document showing.
2. Press Ctrl+O for the file open dialog.

This action still seems to me to be just a bit slower. I'll leave my name on this bug and keep testing.
Thanks, Mike, for pointing out that hotspot. The Open File dialog is the same dialog used in gedit, and it seems to be slow there too; could you confirm that? From my experience, most OpenOffice dialogs besides the Open File dialog are pretty quick, for example the spellcheck and printing dialogs. Is the sluggishness you are experiencing only with the Open File dialog, or with all OpenOffice dialogs?
It probably just seems a bit slower because of the overhead of office. It is probably a good idea to check this patch in so the community can test it as well.
Eitan, if you're confident with your testing of this fix, and it pylints OK (try running ./run_pylint.sh from the top level directory -- it will automatically pylint files you've modified/added -- everything should be a 10.00 before you check the files in), I say check it in.
Depending on the number of files in your home directory, the Open dialog will be slower. I have 16 files, and it takes 1.5 seconds for the Open dialog to open. Even pre-pyatspi this dialog felt heavy with Orca, but it was a couple of tenths of a second faster. On my computer I get about 200 events when this dialog appears, as opposed to about 20 when a Print dialog appears. Will, I'll submit this patch now (after pylint).
Committed. The pylint was not perfect; it seems that default.py and OpenOffice.py have issues not related to this patch.
> The pylint was not perfect, it seems that default.py and OpenOffice.py have
> issues not related to this patch.

Very strange. They are 10.00 for me.

richb@richb-laptop:~/gnome/orca/trunk$ ./run_pylint.sh src/orca/default.py
Thank you for your attention to quality
Checking default.py, sending output to default.pylint
Your code has been rated at 10.00
richb@richb-laptop:~/gnome/orca/trunk$ ./run_pylint.sh src/orca/scripts/StarOffice.py
Thank you for your attention to quality
Checking scripts/StarOffice.py, sending output to StarOffice.pylint
Your code has been rated at 10.00
richb@richb-laptop:~/gnome/orca/trunk$
(In reply to comment #25)
> > The pylint was not perfect, it seems that default.py and OpenOffice.py have
> > issues not related to this patch.
>
> Very strange. They are 10.00 for me.

They are good for me, too. Eitan, what are the issues?

PS - I'm psyched to get performance back to the pre-pyatspi days! Yeah! From here on, Mike is going to help us find specific slow areas that we can concentrate on.

For one more performance opportunity before closing out this bug, we might consider changing the debug support to not make AT-SPI calls if we're guaranteed that there will be no related debug output for a given debug level. That should hopefully give us a nice performance boost.
> For one more performance opportunity before closing out this bug, we might
> consider changing the debug support to not make AT-SPI calls if we're
> guaranteed that there will be no related debug output for a given debug
> level. That should hopefully give us a nice performance boost.

Turns out we are already doing this. :-) I'm going to leave this bug open as a means to quickly find all the performance problems, but keep it assigned to orca-maint.
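The pattern being referred to, skipping expensive AT-SPI calls when the output would be discarded anyway, can be sketched like this (names are illustrative, loosely modeled on orca.debug; this is not the actual Orca code):

```python
# Debug levels: a message is shown only when its level meets the threshold.
LEVEL_OFF = 10000
LEVEL_FINE = 600
debugLevel = LEVEL_OFF  # current threshold: debug output disabled

calls_made = []

def expensive_accessible_details(obj):
    # Stands in for AT-SPI round-trips (name, role, state queries).
    calls_made.append(obj)
    return "details of %s" % obj

def println(level, get_message):
    # Accept a callable so the (expensive) message is only built
    # when it will actually be printed.
    if level >= debugLevel:
        print(get_message())

println(LEVEL_FINE, lambda: expensive_accessible_details("button"))
# With debugLevel at LEVEL_OFF, no AT-SPI call is made at all.
```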
Out of curiosity, I ran some tests to compare where we are at the bleeding-edge moment with where we were before the line-related enhancements in Gecko.py. With no further ado:

Revision 3387 (pre-Gecko performance)
vs
Current Orca plus latest patch to Bug 506360 (rev. 4)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1. Reading the Wiki by Line
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

4166839 function calls in 201.149 CPU seconds
 958015 function calls in 139.656 CPU seconds

-----------------------------------------------------------------
getLineContentsAtOffset

ncalls      tottime  percall  cumtime  percall
372         0.956    0.003    50.363   0.135
148         0.007    0.000    4.011    0.027
-----------------------------------------------------------------
goPreviousLine

ncalls      tottime  percall  cumtime  percall
66          0.014    0.000    43.315   0.656
72          0.005    0.000    13.542   0.188
-----------------------------------------------------------------
goNextLine

ncalls      tottime  percall  cumtime  percall
86          0.014    0.000    39.971   0.465
72          0.005    0.000    12.087   0.168
-----------------------------------------------------------------
findPreviousCaretInOrder

ncalls      tottime  percall  cumtime  percall
6469/5001   0.314    0.000    12.537   0.003
588/164     0.174    0.000    3.849    0.023
-----------------------------------------------------------------
findNextCaretInOrder

ncalls      tottime  percall  cumtime  percall
8688/26781  1.128    0.000    37.050   0.001
362/146     0.116    0.000    2.582    0.018
-----------------------------------------------------------------
getExtents

ncalls      tottime  percall  cumtime  percall
32972       11.495   0.000    17.228   0.001
1090        1.214    0.001    2.363    0.002
-----------------------------------------------------------------
updateBraille

ncalls      tottime  percall  cumtime  percall
154         0.111    0.001    22.425   0.146
148         0.162    0.001    2.600    0.018

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2. Tabbing through the Bugzilla Advanced Search Form
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

3675201 function calls in 193.921 CPU seconds
 548400 function calls in 96.273 CPU seconds

-----------------------------------------------------------------
onFocus

ncalls      tottime  percall  cumtime  percall
115         0.006    0.000    75.807   0.659
115         0.006    0.000    10.105   0.088
-----------------------------------------------------------------
locusofFocusChanged

ncalls      tottime  percall  cumtime  percall
58          0.005    0.000    75.714   1.305
59          0.009    0.000    9.397    0.159
-----------------------------------------------------------------
guessTheLabel

ncalls      tottime  percall  cumtime  percall
35          0.007    0.000    25.423   0.726
34          0.006    0.000    2.538    0.075

(In each pair of rows, the first row is revision 3387 and the second is current Orca with the patch.)
Woo woo! You rock! Thanks Joanie!

(In reply to comment #28)
> Out of curiosity, I ran some tests to compare where we are at the
> bleeding-edge moment with where we were before the line-related enhancements
> in Gecko.py.
> [profiling results snipped]
Lovely!

This reminds me that there is a further pyatspi change to make that would speed up event handling by utilizing borrowed references. I'll try to get it in for the next release too.
(In reply to comment #30)
> This reminds me that there is a further pyatspi change to make that would
> speed up event handling by utilizing borrowed references. I'll try to get it
> in for the next release too.

Just a nudge to Eitan to remind him to look into this. :-)
(In reply to comment #31)
> Just a nudge to Eitan to remind him to look into this. :-)

Thanks for that, I needed it. Anyway, a patch for pyatspi is in bug 502924. No fancy pyorbit versions are needed for this one. I can't see a difference for small Orca stuff, but it might make a difference in more intensive use cases.
Created attachment 137225 [details] [review] Patch to cache displayedText and displayedLabel computation

This patch takes advantage of the script.generatorCache dictionary that is cleared before we process each new AT-SPI event. The idea is that several code paths may ask for the same computation while processing a single event. For example, speech and braille may want the same value. With the generatorCache, we can cache values in script.generatorCache and re-use them.
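A minimal sketch of the memoization this patch describes (generatorCache is the real dictionary name from the patch; this Script class and getDisplayedText stand-in are simplified for illustration):

```python
# Per-event memoization: generatorCache is cleared before each AT-SPI event,
# so speech and braille share one computation while the event is processed.
class Script:
    def __init__(self):
        self.generatorCache = {}

computations = []

def getDisplayedText(script, obj):
    cache = script.generatorCache.setdefault("displayedText", {})
    if obj not in cache:
        computations.append(obj)  # stands in for the expensive AT-SPI work
        cache[obj] = "text of %s" % obj
    return cache[obj]

script = Script()
getDisplayedText(script, "label1")  # computed
getDisplayedText(script, "label1")  # served from the cache
script.generatorCache.clear()       # done before processing the next event
getDisplayedText(script, "label1")  # recomputed for the new event
```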
Committed the last patch to master and the 570658 branch. The patch tests well. Pylint is crashing on my machine again, though, so I didn't pylint. :-(

BTW, this cuts the number of times the getDisplayedText and getDisplayedLabel logic is performed by more than half; getDisplayedText is used in other areas of the code to perform other logic. Hopefully, this will result in a perceivable difference for users.
Major performance improvements have been made in Orca, the accessibility libraries, and toolkits implementing accessibility support. Work will continue, of course. But I no longer think we need a Performance "metabug".