GNOME Bugzilla – Bug 660319
ThreadExecutor interruption tests fail sporadically
Last modified: 2011-11-01 02:59:47 UTC
[ Originally opened by Kai Willadsen as http://www.reinteract.org/trac/ticket/63 ]

On my machine (a standard up-to-date install of Fedora 10) the ThreadExecutor tests fail approximately 4 times out of 10 with:

Traceback (most recent call last):
("z = 1", Statement.COMPILE_SUCCESS, None)
raise AssertionError("Interrupting ThreadExecutor failed")
Occasionally the following line is appended:

User defined signal 1

If the tests pass, they pass within just over one second. When the tests fail, they take different lengths of time to fail, typically between 2 and 8 seconds.

12/28/08 10:18:52 changed by otaylor
====================================

I'm quite mystified by this ... the test is absolutely reliable on this Fedora 10 system.

I guess it's some sort of timing issue. This is a Pentium M 1.7GHz. When I get back after the holidays I'll do some tests on my other system, which is a faster dual-core x86_64 machine. Would you characterize your machine as especially fast or slow?

From reading the back trace, it looks like the failure is occurring in the 'for x in xrange(0,100000000):' test, which is even more surprising to me, since I would expect that to be quite reliable and the syscall test to be potentially less reliable. (I don't see any reason for unreliability when rereading the relevant bits of the Python codebase.)

(in reply to: ↑ 1 ) 12/28/08 11:39:33 changed by kaiw
=====================================================

Replying to otaylor:

> I'm quite mystified by this ... the test is absolutely reliable on this Fedora 10 system.

As it turns out, I had pygtk2 from updates-testing installed, but downgrading made no difference.

> I guess it's some sort of timing issue. This is a Pentium M 1.7GHz. When I get back after the holidays I'll do some tests on my other system, which is a faster dual-core x86_64 machine. Would you characterize your machine as especially fast or slow?

It's a first-gen Macbook (dual-core 32 bit intel 1.83GHz) so not particularly slow, but not speedy by today's standards

> From reading the back trace, it looks like the failure is occurring in the 'for x in xrange(0,100000000):' test, which is even more surprising to me, since I would expect that to be quite reliable and the syscall test to be potentially less reliable.
Commenting out the xrange test, the syscall test passes very reliably for me, without a single interruption failure. I did once get it to fail with:

Traceback (most recent call last):
  File "lib/reinteract/thread_executor.py", line 326, in <module>
    ("z = 1", Statement.COMPILE_SUCCESS, None)
  File "lib/reinteract/thread_executor.py", line 288, in test_execute
    assert_equals(s._got_state, s._expected_state)
  File "/home/kaiw/Hacking/reinteract/reinteract/lib/reinteract/test_utils.py", line 11, in assert_equals
    raise AssertionError("Got %r, expected %r" % (result, expected))
AssertionError: Got 4, expected 6

but that looks like a completely different problem.

03/27/09 20:47:45 changed by lamby
==================================

If it helps, I am reliably seeing (what appears to be) the same failure with the xrange(0,100000000) test. I am running Debian's Python 2.5.4 on a Q6600 quad-core machine.

The test actually fails with the pthread_kill call causing the process to segfault. However, a quick glance at the code suggests you are guarding this call in the usual way, so I'm not sure what's going on.

I can also reproduce the problem inside the normal workbook by running some long xrange and trying to interrupt it; naturally, the SIGSEGV takes out the entire program.
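For readers following the discussion of the xrange test, here is a minimal sketch of one common way to interrupt a thread that is stuck in a pure-Python loop on CPython: asking the interpreter to raise an exception in that thread via `PyThreadState_SetAsyncExc`. This is not taken from reinteract's thread_executor.py; the names `Interrupted`, `busy_loop`, and `interrupt` are hypothetical, and the sketch is written for Python 3 rather than the Python 2 used in this report. It does not cover the blocking-syscall case, which is where a signal sent with pthread_kill would come in.

```python
import ctypes
import threading
import time


class Interrupted(Exception):
    """Hypothetical exception raised inside the worker when interrupted."""


def busy_loop(result):
    # Pure-Python loop, similar in spirit to the long xrange loop in the test.
    try:
        x = 0
        while True:
            x += 1
    except Interrupted:
        result.append("interrupted")


def interrupt(thread):
    # Ask CPython to raise Interrupted in the target thread. The exception
    # is delivered only once that thread is executing Python bytecode again,
    # so it cannot break into a blocking system call on its own.
    modified = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_ulong(thread.ident), ctypes.py_object(Interrupted))
    # The call returns the number of thread states it modified.
    return modified == 1


result = []
worker = threading.Thread(target=busy_loop, args=(result,), daemon=True)
worker.start()
time.sleep(0.1)              # give the worker time to enter its loop
delivered = interrupt(worker)
worker.join(timeout=5)
```

Because delivery depends on the worker getting back to the bytecode loop, anything that keeps it inside a single long-running operation delays the interrupt, which is consistent with the timing-dependent failures described above.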
When I run 'python thread_executor.py', execution seems to hang. Whenever I kill it, I seem to be in the same place:

^CTraceback (most recent call last):
    loop.run()
KeyboardInterrupt
(This is on Ubuntu 10.04 32 bit.) This seems to be in the same (sort of) test that was producing the problem before.
Found two problems:

commit f13ffab18cd314a2f8901d137f5f94c25fd9f6b5
Author: Owen W. Taylor <otaylor@fishsoup.net>
Date:   Sun Oct 30 18:18:15 2011 -0400

    thread_executor: avoid creating long integers

    Repeated multiplication by 2 triggers Python long integers, and eventually that will result in the execution thread holding the GIL for excessive amounts of time. This is I think what Kai was hitting.

commit a725b22ac46ce48e77b96649fe8f93cb772ea63f
Author: Owen W. Taylor <otaylor@fishsoup.net>
Date:   Sun Oct 30 18:17:13 2011 -0400

    thread_executor: avoid locking up due to a pygobject bug

    If we create more than one glib.MainLoop(), we will lock up on signal due to a pygobject bug, so create one glib.MainLoop() and use it for all the tests. This resulted in the test never working for recent pygobject.

Things seem to be working well for me with the test with these two fixes - will close this bug now.
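To illustrate the first commit message, the following sketch shows how repeated multiplication by 2 makes a Python integer grow by one bit per step, so every later arithmetic operation has more digits to process. The actual change made in thread_executor.py is not shown in this report; `doubling_bits`, `bounded_bits`, and the 16-bit mask are hypothetical and only demonstrate one possible way to keep a loop variable small. The code is Python 3, where all integers are arbitrary precision; in the Python 2 of this report the switch to long happens once the value exceeds sys.maxint.

```python
def doubling_bits(steps):
    # Repeated multiplication by 2: the value gains one bit per iteration
    # and quickly leaves the machine-word range.
    x = 1
    for _ in range(steps):
        x *= 2
    return x.bit_length()


def bounded_bits(steps, mask=0xFFFF):
    # Hypothetical alternative: masking after each step keeps the value
    # within 16 bits no matter how long the loop runs.
    x = 1
    for _ in range(steps):
        x = (x * 2 + 1) & mask
    return x.bit_length()


unbounded = doubling_bits(1000)   # 2**1000 needs 1001 bits
bounded = bounded_bits(1000)      # never exceeds 16 bits
```

The longer each individual operation takes, the longer the thread can go without reaching a point where another thread, or a pending interrupt, gets a chance to run.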
I can confirm that this fixes the issues on both Ubuntu 10.04 32 bit (hanging) and 64 bit (AssertionError).