GNOME Bugzilla – Bug 342614
Hang when using Python, ORBit, Bonobo, and AT-SPI
Last modified: 2006-08-18 10:40:49 UTC
I need some help trying to track down some random hangs in Orca. I've been trying to debug these by analyzing gdb traces and such. These have led to little insight, except to say "yep, looks like someone's waiting to get a lock". I was lucky to catch an actual hang and was able to get a stack trace in the debugger. Here's a snippet:
+ Trace 68374
I'm curious if anyone might be able to lend some insight here - our users get very frustrated when the entire desktop hangs and goes silent. :-(
Created attachment 66014 [details] Standalone application that demonstrates the problem
Run this application in an xterm. Start a gnome-terminal. Click on the File menu of the gnome-terminal and you will see the hang.
Willie: You need to call gobject.threads_init() if you're going to use threads in an application which uses gobject or gtk. It should be called before doing any other gobject/gtk calls. PyORBit might not be aware of the so called ThreadState API present in Python 2.3 or newer (pygobject/pygtk uses it), it may need to be updated to take that into account, which shouldn't be too difficult to do.
Johan: Thanks! In the code, there are various calls:
    #gobject.threads_init()
    #gtk.gdk.threads_init()
    #gtk.gdk.threads_enter()
    bonobo.main()
    #gtk.gdk.threads_leave()
I've tried various combinations of {gobject,gtk.gdk}.threads_init() and could not get rid of the hang. :-( I've also tried calling bonobo.activate() followed by gtk.main() instead of bonobo.main(), but that didn't seem to help, either. :-(
What is the pyorbit version? And python version?
Sorry about that:
Python 2.4.3
libbonobo, libbonoboui, ORBit2, and pyorbit all from GNOME CVS HEAD
Running on Ubuntu Dapper Drake.
I haven't yet been able to test. It seems to require accessibility to be active in gnome settings, maybe next login I'll test. But I can make a few comments:
1- In your first stack trace, it looks like orbit2 is locking, during a remote call from pyorbit; take a look at [1], you probably need to initialize the ORB in threaded mode;
2- In the second stack trace it looks like a normal python locking when calling threading.Lock.acquire(); I don't see any such call in your test program, but it's a fact...
3- Johan's comments about "ThreadState API" are not an issue since pyorbit 2.14.
[1] http://mail.gnome.org/archives/orbit-list/2005-December/msg00001.html
Thanks for the feedback on the first stack trace - it is the worst of our problems and I've not been able to come up with a small test case to show it. I've added the "orbit-io-thread" to initialize the ORB, and it seems to have helped.
Now that I have a string I can latch on to and do some searching with, I googled and grep'd. I came across:
http://mail.gnome.org/archives/orbit-list/2005-December/msg00004.html
and this in gnome-python/pyorbit/examples/threaded/thread-hints-server.py:
    thread_policy = rootpoa.create_thread_policy(PortableServer.ORB_CTRL_MODEL);
    global threaded_poa
    threaded_poa = rootpoa.create_POA("ThreadedPOA", rootpoa.the_POAManager, [thread_policy])
    threaded_poa.set_thread_hint(ORBit.THREAD_HINT_PER_REQUEST)
I'm not quite sure what we should do here with Orca. The nature of the work is: we will potentially get lots and lots of objects (mostly coming with each one in a separate event), as well as keystrokes (each coming in an event). We're also dealing with gnome-speech, which will also send us events with speech progress. Will the default values for the POA help us, or should we do something different as in thread-hints-server.py?
Thanks again for your help. I'll keep you informed as we do more thorough testing with "orbit-io-thread" in place.
Will
Created attachment 66212 [details] Smaller standalone app to show a hang
Well...we're still seeing the hangs even after attempting to use "orbit-io-thread" as well as calling gobject.threads_init(). :-( Here's another application that can cause a hang. I'm hard pressed to find something that is attempting to acquire a lock in this example, but here's what I think is going on:
1) In the AT-SPI world, all applications offer themselves up as services and register themselves with the AT-SPI Registry.
2) Assistive technologies, such as the one in this attachment, contact the AT-SPI Registry and express an interest in various event types (e.g., "focus:").
3) When an application has an event of interest, it notifies the AT-SPI Registry, which then notifies the assistive technology.
I'm not sure, but I think the notifications being done in #3 are synchronous: if the assistive technology doesn't return from the notify, then the AT-SPI Registry hangs and so does anything talking to it. So...I'm guessing that somewhere within the bowels of the Python support for Bonobo or ORBit, the thing calling the notifyEvent method of the EventListener is not completing.
Before anything hangs, I also did a gdb attach on the threads7.py app while it was running, and I still noticed this odd thread thing doing a wait:
+ Trace 68443
Thread 1 (Thread -1210726720 (LWP 5428))
I'm not sure what is doing this.
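To illustrate the hypothesis in steps 1) to 3) above, here is a minimal sketch in plain Python, with no AT-SPI, Bonobo, or ORBit involved. Everything in it is hypothetical: `ToyRegistry` only mimics a registry that delivers events synchronously, and `notify_event` stands in for the notifyEvent method of an EventListener. If delivery really is synchronous, one way for a listener to avoid stalling the sender is to return immediately and hand the event to a worker thread through a queue.

```python
import queue
import threading


class ToyRegistry:
    """Hypothetical stand-in for a registry that notifies listeners synchronously."""

    def __init__(self):
        self.listeners = []

    def register(self, listener):
        self.listeners.append(listener)

    def emit(self, event):
        # Synchronous delivery: emit() cannot return until every listener returns.
        # A listener that blocks here would block the caller of emit() as well.
        for listener in self.listeners:
            listener(event)


pending = queue.Queue()   # events waiting to be processed
handled = []              # results produced by the worker thread


def notify_event(event):
    # Hypothetical listener callback: queue the event and return right away.
    pending.put(event)


def worker():
    # Process queued events outside the notification call.
    while True:
        event = pending.get()
        if event is None:            # sentinel: no more events
            break
        handled.append(event.upper())  # placeholder for slow processing


registry = ToyRegistry()
registry.register(notify_event)

thread = threading.Thread(target=worker)
thread.start()

for name in ("focus:", "window:activate"):
    registry.emit(name)   # returns as soon as the event is queued

pending.put(None)         # ask the worker to stop
thread.join()
print(handled)
```

Whether this pattern fits Orca depends on whether the real notifications are in fact synchronous, which this report does not establish.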
Created attachment 66213 [details] Stack trace of threads7.py when hung
Created attachment 66215 [details] Stack trace of AT-SPI Registry when hung
Created attachment 66216 [details] Stack trace of gnome-about (I used this instead of gnome-terminal to cause the hang) when hung
I've reproduced the bug. Here's what I found:
[Switching to thread 1 (Thread -1212843360 (LWP 9773))]#0 0xffffe410 in __kernel_vsyscall ()
+ Trace 68478
So, in your test program threads7.py, which may not be reproducing the real bug, what seems to happen is:
1. The main thread starts.
2. A thread is spawned which runs bonobo.main().
3. (Guessing) bonobo.main() spawns yet another thread, probably because of "orbit-io-thread", to handle CORBA I/O.
4. The main thread ends (because there are no more instructions left to execute).
5. The atexit module is triggered and runs the registered exit funcs.
6. The threading module runs an exit func that tries to join() all threads.
7. The thread that is running bonobo.main() keeps running, since it didn't receive a shutdown request.
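Steps 4 to 7 can be reproduced without Bonobo or ORBit. The following is a minimal sketch in plain Python, written for current Python 3 rather than the Python 2.4.3 used in this report. The names `event_loop` and `stop_requested` are hypothetical stand-ins for bonobo.main() and its shutdown request, not part of any real API. It shows a non-daemon thread running a loop, and the main thread explicitly asking it to stop so that the join at interpreter exit does not wait forever.

```python
import threading
import time

# Hypothetical shutdown flag; stands in for whatever tells the real loop to quit.
stop_requested = threading.Event()


def event_loop():
    # Hypothetical stand-in for a blocking main loop such as bonobo.main().
    while not stop_requested.is_set():
        # Wake up periodically to check whether a shutdown was requested.
        stop_requested.wait(0.01)


# A non-daemon thread: the interpreter joins it at exit.
worker = threading.Thread(target=event_loop, name="loop-thread")
worker.start()

time.sleep(0.05)  # the main thread does its own work here

# Without this call, the main thread would reach the end of the script and
# then block in the threading module's exit handling, waiting for the worker.
stop_requested.set()
worker.join(timeout=2.0)

print("worker alive:", worker.is_alive())
```

If the shutdown request is left out, the process stays alive with the main thread waiting in a join, which matches the stack trace of thread 1 described above and is normal behaviour rather than a deadlock.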
Thanks for the analysis of this. Looks like it was a red herring, though my understanding is that non-daemon threads should just continue to run without issue. In any case, I'm learning more about the GIL and the clunky threading model of Python. I'm going to dig deeper into what assumptions we've made in the Orca code about Python providing a reliable threading model. It may be that we need to move to a single threaded model somehow.
Well, in my tests, I didn't see anything 'unreliable' about python threading. All threads are running normally, no deadlocks, except the main thread, which is waiting for the other threads to end. I'm not sure what the real problem is, but I think it is not related to the Python GIL. It could be more related to ORBit2 internals... In any case, you should know that pyorbit 2.14 also supports asynchronous calls, at least for the client side[1], although I'm not aware of an async interface for servers.
[1] example: http://cvs.gnome.org/viewcvs/gnome-python/pyorbit/examples/echo/echo-client-async.py?view=markup
I'm going to close this out as not a bug. We've been able to work around these things in Orca, and it may have been more of an Orca issue than anything. Thanks!