GNOME Bugzilla – Bug 660581
test suite failure on sparc: timeout-test
Last modified: 2013-08-30 13:22:51 UTC
Version: 2.36.0 Building the package with the test-suite enabled results in a failure on sparc [1][2]. The problem is easily reproducible on a sparc porter box, so if you need further information, please let me know. The failing test is timeout-test. Running it with -d, I get $ ./timeout-test -d http async fast to idle fast to plain fast to timeout slow to timeout fast to idle fast to plain sync fast to plain fast to timeout slow to timeout fast to plain https async fast to idle fast to plain fast to timeout FAILED: 6 SSL handshake failed (expected 200 OK) slow to timeout FAILED: 6 SSL handshake failed (expected 7 Connection terminated unexpectedly) fast to idle fast to plain ERROR: plain_session closed connection sync fast to plain fast to timeout FAILED: 6 SSL handshake failed (expected 200 OK) slow to timeout FAILED: 6 SSL handshake failed (expected 7 Connection terminated unexpectedly) fast to plain timeout-test: 5 error(s). I tried a git bisect, and according to that the first faulty commit is 58e3eeea1510c1d6900b0ba3a56f8360042fe99d. Reverting that on top of 2.36.0, does indeed fix timeout-test but in reverse breaks ssl-test: $ ./ssl-test -d strict/nonstrict async with CA list FAILED: 6 SSL handshake failed (expected 200 OK) async without CA list async non-strict with CA list async non-strict without CA list sync with CA list FAILED: 6 SSL handshake failed (expected 200 OK) sync without CA list sync non-strict with CA list sync non-strict without CA list SoupMessage properties async Wrong cert flags (got 3, wanted 1) sync Wrong cert flags (got 3, wanted 1) ssl-test: 4 error(s). [1] https://buildd.debian.org/status/package.php?p=libsoup2.4&suite=sid [2] https://buildd.debian.org/status/fetch.php?pkg=libsoup2.4&arch=sparc&ver=2.36.0-1&stamp=1317316945
As you mentioned on irc, that libsoup 2.36 should be built against glib 2.30 and glib-networking 2.30, I installed those libraries, but the error remained exactly the same.
Created attachment 198428 [details] [review] print flags this may not actually print anything helpful though... attaching the output of libtool --mode=execute strace -o strace.out -s 256 ./timeout-test -d might also be useful
With the patch applied, I get: http async fast to idle fast to plain fast to timeout slow to timeout fast to idle fast to plain sync fast to plain fast to timeout slow to timeout fast to plain https async fast to idle fast to plain fast to timeout FAILED: 6 SSL handshake failed (expected 200 OK) No https status? slow to timeout FAILED: 6 SSL handshake failed (expected 7 Connection terminated unexpectedly) No https status? fast to idle fast to plain ERROR: plain_session closed connection sync fast to plain fast to timeout FAILED: 6 SSL handshake failed (expected 200 OK) No https status? slow to timeout FAILED: 6 SSL handshake failed (expected 7 Connection terminated unexpectedly) No https status? fast to plain timeout-test: 5 error(s).
Created attachment 199114 [details] strace log
(In reply to comment #3) > fast to timeout > FAILED: 6 SSL handshake failed (expected 200 OK) > No https status? turns out soup_message_get_https_status() is broken and only returns the status when the handshake was successful. sigh. (In reply to comment #4) > Created an attachment (id=199114) [details] > strace log curiously, the strace output starts going totally off the rails shortly after the first error. I wonder if there's some sort of memory corruption that only triggers on sparc? futex((nil), FUTEX_WAKE_PRIVATE, 1) = -1 EBADF (Bad file descriptor) socketcall(0x45, 0xffbbce6c) = -1 ECHILD (No child processes) sendto(9, "\0\5\270\340\367", 5, 0, {...}, 4294967295) = 1 clock_gettime(CLOCK_REALTIME, 0xffbbd2d0) = -1 ERRNO_174040 (Unknown error 174040) poll([?] 0, 3, 332) = 1 clock_gettime(CLOCK_REALTIME, {604132, 548955583}) = 1 clock_gettime(CLOCK_REALTIME, 0xffbbd2d0) = -1 ERRNO_174040 (Unknown error 174040) poll([?] 0, 3, 0) = 1 clock_gettime(CLOCK_REALTIME, 0xffbbd2d0) = -1 ERRNO_174040 (Unknown error 174040) poll([?] 0, 3, 0) = 1 clock_gettime(CLOCK_REALTIME, 0xffbbd2d0) = -1 ERRNO_174040 (Unknown error 174040) poll([?] 0, 3, 0) = 1 clock_gettime(CLOCK_REALTIME, 0xffbbd2d0) = -1 ERRNO_174040 (Unknown error 174040) poll([?] 0, 3, 0) = -1 EBADF (Bad file descriptor) close(0) = 1 clock_gettime(CLOCK_REALTIME, {604132, 550412003}) = 1 clock_gettime(CLOCK_REALTIME, 0xffbbd2d0) = -1 ERRNO_174040 (Unknown error 174040) poll([?] 0, 1, 0) = -1 EAGAIN (Resource temporarily unavailable) close(0) = -1 EACCES (Permission denied) close(0) = -1 EPERM (Operation not permitted) write(85, " FAILED: 6 SSL handshake failed (expected 7 Connection terminated unexpectedly)\n", 85) = -1 EPERM (Operation not permitted)
We're having a similar problem in Ubuntu 12.04 "Alpha" with the timeout test failing on our armel builder when building 2.37.4. Building 2.36.0 worked just fine though and the armhf build doesn't have a problem either. https://launchpadlibrarian.net/90470651/buildlog_ubuntu-precise-armel.libsoup2.4_2.37.4-0ubuntu1_FAILEDTOBUILD.txt.gz Let me know if I should open a separate bug for this issue. I don't actually have an armel box of my own to play with though, but I could maybe find a Canonical employee who does.
(In reply to comment #6) > https://launchpadlibrarian.net/90470651/buildlog_ubuntu-precise-armel.libsoup2.4_2.37.4-0ubuntu1_FAILEDTOBUILD.txt.gz 404 (and the server mistakenly returns "Content-Encoding: gzip" with the 404 error even though the response body is not actually gzipped, causing Firefox to show an error rather than the 404 page...) > Let me know if I should open a separate bug for this issue. I don't actually > have an armel box of my own to play with though, but I could maybe find a > Canonical employee who does. qemu has arm support. You could try setting up a build environment under that, and see if you still get the error, and if so then I can try it here...
Dan, how about this link then: https://launchpad.net/ubuntu/+source/libsoup2.4/2.37.5-0ubuntu1/+build/3192369/+files/buildlog_ubuntu-precise-armel.libsoup2.4_2.37.5-0ubuntu1_FAILEDTOBUILD.txt.gz
Jeremy: 404.
It was there for a little while. I guess old build logs get deleted pretty quickly. But it didn't provide much useful info anyway. I've been trying to get ubuntu-precise-armel running under qemu (on F16), but haven't gotten anything usable yet... (And I can't even run debian sparc because qemu-system-sparc requires bios images that don't exist in fedora any more.)
No valgrind for sparc and in armel (arm v5/6) it throws Illegal instruction.
It's also failing on m68k: http://buildd.debian-ports.org/status/fetch.php?pkg=libsoup2.4&arch=m68k&ver=2.42.2-4&stamp=1370576510
I just tried reproducing this on a debian armel vm, and the test passed. So, either someone else needs to debug this, or someone needs to give a vm where it fails, or access to a machine where it fails.
I should have noted that I ran make check in armel but the test is not failing there (I'm not sure it ever failed in Debian). I wanted to valgrind it anyway to see if there were any invalid read or writes or any other problems but valgrind isn't working on armv5/6. As for sparc, it's still failing but there's no valgrind port to sparc. (In reply to comment #13) > I just tried reproducing this on a debian armel vm, and the test passed. So, > either someone else needs to debug this, or someone needs to give a vm where it > fails, or access to a machine where it fails. I should be able to give you access to a sparc machine where it fails. I'll send you a mail about this.
Thanks to Emilio for getting me access to the machines. After way too much time spent debugging, I finally realized that this is just a trivial bug in the test program: on really slow machines (or maybe "machines with software floating point"?), doing an SSL handshake can take longer than 1 second, and so the 1-second timeout on the connection gets triggered even on the "fast" request. Fixed in git by making it use a 3-second timeout on slow machines instead. (With "slow machines" determined by doing a test https request ahead of time and seeing how long it takes.)