Bug 660319 – ThreadExecutor interruption tests fail sporadically



Summary:	ThreadExecutor interruption tests fail sporadically


Status:	RESOLVED FIXED

Product:	reinteract
Classification:	Other
Component:	general
Version:	unspecified
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	reinteract-maint
QA Contact:	reinteract-maint

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2011-09-28 03:33 UTC by Owen Taylor
Modified:	2011-11-01 02:59 UTC

See Also:
GNOME target:	---
GNOME version:	---

Description Owen Taylor 2011-09-28 03:33:19 UTC

[ Originally opened by Kai Willadsen as http://www.reinteract.org/trac/ticket/63 ]

On my machine (a standard up-to-date install of Fedora 10) the ThreadExecutor tests fail approximately 4 times out of 10 with:

Traceback (most recent call last):
  File "lib/reinteract/thread_executor.py", line 315, in <module>
    ("z = 1", Statement.COMPILE_SUCCESS, None)
  File "lib/reinteract/thread_executor.py", line 283, in test_execute
    raise AssertionError("Interrupting ThreadExecutor failed")
AssertionError: Interrupting ThreadExecutor failed


Occasionally the following line is appended:

User defined signal 1

If the tests pass, they pass within just over one second. When the tests fail, they take different lengths of time to fail, typically between 2 and 8 seconds. 

12/28/08 10:18:52 changed by otaylor
====================================

I'm quite mystified by this ... the test is absolutely reliable on this Fedora 10 system. I guess it's some sort of timing issue. This is a Pentium M 1.7GHz. When I get back after the holidays I'll do some tests on my other system, which is a faster dual-core x86_64 machine. Would you characterize your machine as especially fast or slow?

From reading the back trace, it looks like the failure is occurring in the 'for x in xrange(0,100000000):' test, which is even more surprising to me, since I would expect that to be quite reliable and the syscall test to be potentially less reliable.

(I don't see any reason for unreliability when rereading the relevant bits of the Python codebase.)
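
For readers unfamiliar with this kind of test, here is a minimal sketch of interrupting a thread that is spinning in a pure-Python loop. It is not reinteract's code. Later comments in this bug indicate that ThreadExecutor delivers a signal with pthread_kill, whereas this sketch swaps in CPython's `PyThreadState_SetAsyncExc` through ctypes. The names `busy_loop`, `interrupt_thread`, `started` and `result` are hypothetical, and the code assumes CPython 3 (hence `range` rather than `xrange`).

```python
import ctypes
import threading

result = {}
started = threading.Event()

def busy_loop():
    # Pure-Python loop analogous to the xrange test; it makes no blocking syscalls.
    try:
        started.set()
        for x in range(0, 100000000):
            pass
        result["state"] = "completed"
    except KeyboardInterrupt:
        result["state"] = "interrupted"

def interrupt_thread(thread):
    # Ask CPython to raise KeyboardInterrupt in the target thread.
    # The exception is raised the next time that thread executes bytecode.
    modified = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_ulong(thread.ident), ctypes.py_object(KeyboardInterrupt))
    return modified == 1

worker = threading.Thread(target=busy_loop)
worker.start()
started.wait()                      # make sure the worker is inside the try block
delivered = interrupt_thread(worker)
worker.join(timeout=30)
print(delivered, result.get("state"))
```

The interruption can only take effect when the worker thread gets to run bytecode, which is why a loop body that holds the GIL for a long time in a single operation can make a test like this time out.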

(in reply to: ↑ 1 ) 12/28/08 11:39:33 changed by kaiw
=====================================================

Replying to otaylor:

    I'm quite mystified by this ... the test is absolutely reliable on this Fedora 10 system.

As it turns out, I had pygtk2 from updates-testing installed, but downgrading made no difference.

    I guess it's some sort of timing issue. This is a Pentium M 1.7GHz. When I get back after the holidays I'll do some tests on my other system, which is a faster dual-core x86_64 machine. Would you characterize your machine as especially fast or slow?

It's a first-gen MacBook (dual-core 32-bit Intel 1.83GHz), so not particularly slow, but not speedy by today's standards.

    From reading the back trace, it looks like the failure is occurring in the 'for x in xrange(0,100000000):' test, which is even more surprising to me, since I would expect that to be quite reliable and the syscall test to be potentially less reliable.

Commenting out the xrange test, the syscall test passes very reliably for me, without a single interruption failure. I did once get it to fail with:

Traceback (most recent call last):
  File "lib/reinteract/thread_executor.py", line 326, in <module>
    ("z = 1", Statement.COMPILE_SUCCESS, None)
  File "lib/reinteract/thread_executor.py", line 288, in test_execute
    assert_equals(s._got_state, s._expected_state)
  File "/home/kaiw/Hacking/reinteract/reinteract/lib/reinteract/test_utils.py", line 11, in assert_equals
    raise AssertionError("Got %r, expected %r" % (result, expected))
AssertionError: Got 4, expected 6

but that looks like a completely different problem.

03/27/09 20:47:45 changed by lamby
==================================

If it helps, I am reliably seeing (what appears to be) the same failure with the xrange(0,100000000) test. I am running Debian's Python 2.5.4 on a Q6600 quad-core machine.

The test actually fails with the pthread_kill call causing the process to segfault. However, a quick glance at the code suggests you are guarding this call in the usual way, so I'm not sure what's going on.

I can also reproduce the problem inside the normal workbook by running some long xrange and trying to interrupt it; naturally, the SIGSEGV takes out the entire program.

Comment 1 Robert Schroll 2011-09-28 05:14:25 UTC

When I run 'python thread_executor.py', execution seems to hang. Whenever I kill it, I seem to be in the same place:

^CTraceback (most recent call last):
  File "thread_executor.py", line 342, in <module>
    ("z = 1", Statement.COMPILE_SUCCESS, None)
  File "thread_executor.py", line 304, in test_execute
    loop.run()
KeyboardInterrupt


(This is on Ubuntu 10.04 32 bit.)  This seems to be in the same (sort of) test that was producing the problem before.

Comment 2 Owen Taylor 2011-10-30 22:23:17 UTC

Found two problems:

commit f13ffab18cd314a2f8901d137f5f94c25fd9f6b5
Author: Owen W. Taylor <otaylor@fishsoup.net>
Date:   Sun Oct 30 18:18:15 2011 -0400

    thread_executor: avoid creating long integers
    
    Repeated multiplication by 2 triggers Python long integers, and
    eventually that will result in the execution thread holding the
    GIL for excessive amounts of time.

This is I think what Kai was hitting.
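
To illustrate the first fix, here is a small sketch of how repeated doubling makes an integer grow without bound. It is not the test code from reinteract, and the function names and the `limit` parameter are hypothetical. It is written for Python 3, where every integer is arbitrary-precision. In Python 2 the same loop promotes an int to a long once the value no longer fits in a machine word. Each multiplication is a single interpreter operation, so as the operand grows, the thread performing it spends longer between the points at which it can be switched out or interrupted.

```python
def doubled_bits(iterations):
    # Repeated multiplication by 2: the value gains one bit per step,
    # so every later multiplication has to process a larger number.
    x = 1
    for _ in range(iterations):
        x = x * 2
    return x.bit_length()

def bounded_bits(iterations, limit=1 << 30):
    # One possible way to keep the loop busy without growing the value:
    # wrap around at a fixed bound. This is an assumed approach for
    # illustration, not necessarily what the commit does.
    x = 1
    for _ in range(iterations):
        x = (x * 2) % limit
        if x == 0:
            x = 1
    return x.bit_length()

print(doubled_bits(10000))
print(bounded_bits(10000))
```

After 10000 doublings the unbounded value is 10001 bits wide, while the bounded variant never exceeds 30 bits.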

commit a725b22ac46ce48e77b96649fe8f93cb772ea63f
Author: Owen W. Taylor <otaylor@fishsoup.net>
Date:   Sun Oct 30 18:17:13 2011 -0400

    thread_executor: avoid locking up due to a pygobject bug
    
    If we create more than one glib.MainLoop(), we will lock up on
    signal due to a pygobject bug, so create one glib.MainLoop() and
    use it for all the tests.

This resulted in the test never working for recent pygobject.

Things seem to be working well for me with the test with these two fixes - will close this bug now.

Comment 3 Robert Schroll 2011-11-01 02:59:47 UTC

I can confirm that this fixes the issues on both Ubuntu 10.04 32 bit (hanging) and 64 bit (AssertionError).