GNOME Bugzilla – Bug 321273
Problem where dogtail stops working altogether
Last modified: 2006-06-16 23:50:28 UTC
We've been running into problems where Dogtail stops working after a while, with tracebacks like the following; a restart of the session seems to fix it:

Traceback (most recent call last):
    from dogtail import tree
  File "/home/boston/dmalcolm/coding/dogtail/active-copy/dogtail/dogtail/tree.py", line 1013, in ?
    test = root.children
  File "/home/boston/dmalcolm/coding/dogtail/active-copy/dogtail/dogtail/tree.py", line 336, in __getattr__
    children.append (Node (self.__accessible.getChildAtIndex (i)))
  File "/home/boston/dmalcolm/coding/dogtail/active-copy/dogtail/dogtail/tree.py", line 263, in __init__
    action = self.__accessible.getAction()
AssertionError
Looks like what is happening is that a single dodgy application can currently bring this initialization down:

>>> atspi.registry.getDesktop().getName()
'main'
>>> atspi.registry.getDesktop().getChildCount()
11
>>> atspi.registry.getDesktop().getChildAtIndex(0)
<atspi.Application object at 0xb7f81050>
>>> atspi.registry.getDesktop().getChildAtIndex(0).getName()
'gnome-terminal'
>>> atspi.registry.getDesktop().getChildAtIndex(1).getName()
'metacity'
>>> atspi.registry.getDesktop().getChildAtIndex(2).getName()
Traceback (most recent call last):
AssertionError
>>> atspi.registry.getDesktop().getChildAtIndex(3).getName()
'gnome-panel'
etc.
Could you find out which application is at index 2?
I went through all of them and there were two indexes for which checkSelf failed (2 and 4 below). I couldn't spot any apps that were missing; maybe there's some kind of stale reference?

>>> import atspi
>>> atspi.registry.getDesktop().getChildCount()
11
>>> atspi.registry.getDesktop().getChildAtIndex(0).getName()
'gnome-terminal'
>>> atspi.registry.getDesktop().getChildAtIndex(1).getName()
'metacity'
>>> atspi.registry.getDesktop().getChildAtIndex(2).getName()
Traceback (most recent call last):
AssertionError
>>> atspi.registry.getDesktop().getChildAtIndex(3).getName()
'gnome-panel'
>>> atspi.registry.getDesktop().getChildAtIndex(4).getName()
Traceback (most recent call last):
AssertionError
>>> atspi.registry.getDesktop().getChildAtIndex(5).getName()
'notification-area-applet'
>>> atspi.registry.getDesktop().getChildAtIndex(6).getName()
'nautilus'
>>> atspi.registry.getDesktop().getChildAtIndex(7).getName()
'evolution'
>>> atspi.registry.getDesktop().getChildAtIndex(8).getName()
'epiphany'
>>> atspi.registry.getDesktop().getChildAtIndex(9).getName()
'mixer_applet2'
>>> atspi.registry.getDesktop().getChildAtIndex(10).getName()
'gramps'
>>> atspi.registry.getDesktop().getChildAtIndex(11).getName()
Traceback (most recent call last):
AssertionError
>>> atspi.registry.getDesktop().getChildAtIndex(4).getName()
Traceback (most recent call last):
AssertionError
>>> atspi.registry.getDesktop().getChildAtIndex(4)
>>> atspi.registry.getDesktop().getChildAtIndex(4)
<atspi.Accessible object at 0xb7bebb80>
>>> atspi.registry.getDesktop().getChildAtIndex(2)
<atspi.Accessible object at 0xb7bebb70>

All of the methods on those instances seem to fail with __checkSelf assertions (meaning an underlying NULL pointer at the C level, right?). I guess we can bulletproof things against this until we figure out what the underlying problem is.
This is a pyspi issue.
Created attachment 54662: watch-applications.py

Here's a little daemon script I just hacked up. It uses pyspi to print the indices and names of all the applications it sees every N seconds, where N defaults to 10 but can be specified on the command line. It also prints the date and time. If it encounters an AssertionError, it drops into pdb, the Python debugger, in case you happen to notice soon after it happens and want to investigate. Even if you don't use the debugger, look at the last completed list and see which name corresponds to the index that just failed. Please report your results, everyone :)
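For anyone who wants something along these lines without the attachment, here is a rough sketch. It is not the contents of watch-applications.py, which isn't reproduced here. `snapshot` and `watch` are hypothetical names, and the only assumption about the desktop object is that it has the `getChildCount()` / `getChildAtIndex(i)` / `getName()` calls used in the sessions above. Unlike the attachment, it does not drop into pdb. It records a failing index as `<AssertionError>` and keeps going, so the listing before and after the breakage stays on screen.

```python
import sys
import time


def snapshot(desktop):
    """Return (index, name) pairs for every child of `desktop`.

    `desktop` only needs getChildCount() and getChildAtIndex(i); each child
    needs getName(). A child whose lookup raises AssertionError (the
    __checkSelf failure) is recorded as None instead of aborting the listing.
    """
    entries = []
    for i in range(desktop.getChildCount()):
        try:
            name = desktop.getChildAtIndex(i).getName()
        except AssertionError:
            name = None  # stale/NULL child at this index
        entries.append((i, name))
    return entries


def watch(desktop, interval=10.0, rounds=None, out=sys.stdout):
    """Print a timestamped listing every `interval` seconds.

    rounds=None loops forever; pass a number to stop after that many listings.
    """
    done = 0
    while rounds is None or done < rounds:
        out.write(time.ctime() + "\n")
        for i, name in snapshot(desktop):
            label = name if name is not None else "<AssertionError>"
            out.write("%02d %s\n" % (i, label))
        out.flush()
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)
```

With pyspi available, calling something like `watch(atspi.registry.getDesktop())` should give the periodic listing. The interval could be taken from `sys.argv[1]` to mimic the command-line option.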
Comment on attachment 54662 (watch-applications.py): Fixing name
*** Bug 336136 has been marked as a duplicate of this bug. ***
Here's the output:

$ python watch-applications.py
Fri May 5 23:26:47 2006
00 gnome-session
01 gnome-power-manager
02 metacity
03 gnome-volume-manager
04 gnome-panel
05 nautilus
06 bluez-pin
07 gedit
08 gnome-terminal
09 eggcups
10 nm-applet
11 pam-panel-icon
12 gaim
Warning: AT-SPI error: pre method check: add: Unknown CORBA exception id: 'IDL:omg.org/CORBA/COMM_FAILURE:1.0'
13
> /home/marius/Desktop/watch-applications.py(38)?()
-> print c.getName()
(Pdb)

What other information could I provide? Thanks!
Well, it's only really useful if you know which application was at #13 previously, and you haven't been opening/closing apps between those two times. For example:

...
5 some-app
...
5 <breakage>

then some-app would probably be the one failing.
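The comparison described above can be automated, including the check that no apps were opened or closed in between. This is a sketch; `likely_culprits` and `Listing` are hypothetical names. A listing is taken to be a dict from index to application name, with None where `getName()` raised AssertionError.

```python
from typing import Dict, List, Optional

# index -> application name, or None where getName() raised AssertionError
Listing = Dict[int, Optional[str]]


def likely_culprits(previous: Listing, current: Listing) -> List[str]:
    """Guess which applications broke between two listings.

    Returns [] when the comparison is not trustworthy: the child count changed,
    or an index that still works now shows a different name (apps were opened
    or closed in between, so the indices have shifted).
    """
    if previous.keys() != current.keys():
        return []
    for i, name in current.items():
        if name is not None and previous[i] != name:
            return []
    # Every working index is unchanged, so a newly failing index most likely
    # still belongs to the application that was there before.
    return [previous[i] for i, name in current.items()
            if name is None and previous[i] is not None]
```

This check can't detect one app closing and another opening at the same index. Like the manual version, it only points at a probable culprit.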
Created attachment 67511: Workaround patch to make "NULL" return codes for children from the C code return as None child accessibles to the Python code
Created attachment 67512: Don't add "None" children to the list when computing the children attribute of a Node
I've been using the two patches I attached (the first to pyspi, the second to dogtail) as a workaround for this bug. The first patch makes pyspi survive getting a NULL back from at-spi-registryd for a child of the desktop; it handles this by returning None to dogtail. The second patch then handles the None when computing the "children" attribute of Node by skipping that child, so nothing is added to the list for it. Both patches need to be applied, and together they are a band-aid rather than a fix for the underlying problem. They do work, though: with these patches I've been able to use dogtail for long periods without a restart. Given how annoying this bug has been, I suggest applying them, perhaps with comments.
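Here is a plain-Python sketch of the two-layer workaround. It is not the committed patch code, which isn't reproduced in this thread. `RawGetChild`, `Accessible`, `_check_self`, `get_child_at_index` and `compute_children` are hypothetical names. The C call is simulated by a function that returns None where the C code would return NULL, and real dogtail wraps each child in a Node, which is omitted here.

```python
from typing import Callable, List, Optional

# Hypothetical stand-in for the C-level call: returns an opaque handle,
# or None where the C code returned NULL for that child.
RawGetChild = Callable[[int], Optional[object]]


class Accessible:
    """Hypothetical pyspi-style wrapper around a C handle."""

    def __init__(self, handle: Optional[object]) -> None:
        self._handle = handle

    def _check_self(self) -> None:
        # Analogue of pyspi's __checkSelf: a NULL handle makes every call fail.
        if self._handle is None:
            raise AssertionError("NULL accessible")

    def getName(self) -> str:
        self._check_self()
        return str(self._handle)


def get_child_at_index(raw_get_child: RawGetChild, i: int) -> Optional[Accessible]:
    """pyspi-side workaround: return None for a NULL child instead of wrapping it."""
    handle = raw_get_child(i)
    if handle is None:
        return None
    return Accessible(handle)


def compute_children(raw_get_child: RawGetChild, count: int) -> List[Accessible]:
    """dogtail-side workaround: leave None children out of the list."""
    children = []
    for i in range(count):
        child = get_child_at_index(raw_get_child, i)
        if child is not None:
            children.append(child)
    return children
```

Dropping the child keeps the rest of the tree usable. It also silently hides the broken application, which is why this counts as a band-aid rather than a fix.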
Awesome, Dave! My own session is currently in that screwed-up state, and your patches made it usable again. I'd say we're good for now. I added a brief comment to each patch, tweaked the formatting and applied them. I'll post the updated patches in a second.
Created attachment 67517: Workaround patch to make "NULL" return codes for children from the C code return as None child accessibles to the Python code

Here's the version of the pyspi patch that I just committed.
Created attachment 67518: Don't add "None" children to the list when computing the children attribute of a Node

Here's the version of the dogtail patch that I just committed.