After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 660581 - test suite failure on sparc: timeout-test
test suite failure on sparc: timeout-test
Status: RESOLVED FIXED
Product: libsoup
Classification: Core
Component: Misc
2.36.x
Other Linux
: Normal major
: ---
Assigned To: libsoup-maint@gnome.bugs
libsoup-maint@gnome.bugs
Depends on:
Blocks:
 
 
Reported: 2011-09-30 17:17 UTC by Michael Biebl
Modified: 2013-08-30 13:22 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
print flags (1.18 KB, patch)
2011-10-06 12:52 UTC, Dan Winship
none Details | Review
strace log (272.95 KB, application/octet-stream)
2011-10-16 11:07 UTC, Michael Biebl
  Details

Description Michael Biebl 2011-09-30 17:17:42 UTC
Version: 2.36.0

Building the package with the test-suite enabled results in a failure on sparc [1][2].

The problem is easily reproducible on a sparc porter box, so if you need further
information, please let me know.
The failing test is timeout-test. Running it with -d, I get

$ ./timeout-test -d
http
  async
    fast to idle
    fast to plain
    fast to timeout
    slow to timeout
    fast to idle
    fast to plain
  sync
    fast to plain
    fast to timeout
    slow to timeout
    fast to plain

https
  async
    fast to idle
    fast to plain
    fast to timeout
      FAILED: 6 SSL handshake failed (expected 200 OK)
    slow to timeout
      FAILED: 6 SSL handshake failed (expected 7 Connection terminated unexpectedly)
    fast to idle
    fast to plain
      ERROR: plain_session closed connection
  sync
    fast to plain
    fast to timeout
      FAILED: 6 SSL handshake failed (expected 200 OK)
    slow to timeout
      FAILED: 6 SSL handshake failed (expected 7 Connection terminated unexpectedly)
    fast to plain

timeout-test: 5 error(s).



I tried a git bisect, and according to that the first faulty commit is
58e3eeea1510c1d6900b0ba3a56f8360042fe99d. Reverting that on top of 2.36.0, does
indeed fix timeout-test but in reverse breaks ssl-test:

$ ./ssl-test -d
strict/nonstrict
  async with CA list
      FAILED: 6 SSL handshake failed (expected 200 OK)
  async without CA list
  async non-strict with CA list
  async non-strict without CA list
  sync with CA list
      FAILED: 6 SSL handshake failed (expected 200 OK)
  sync without CA list
  sync non-strict with CA list
  sync non-strict without CA list

SoupMessage properties
  async
    Wrong cert flags (got 3, wanted 1)
  sync
    Wrong cert flags (got 3, wanted 1)

ssl-test: 4 error(s).


[1] https://buildd.debian.org/status/package.php?p=libsoup2.4&suite=sid
[2]
https://buildd.debian.org/status/fetch.php?pkg=libsoup2.4&arch=sparc&ver=2.36.0-1&stamp=1317316945
Comment 1 Michael Biebl 2011-10-05 23:10:09 UTC
As you mentioned on irc, that libsoup 2.36 should be built against glib 2.30 and glib-networking 2.30, I installed those libraries, but the error remained exactly the same.
Comment 2 Dan Winship 2011-10-06 12:52:34 UTC
Created attachment 198428 [details] [review]
print flags

this may not actually print anything helpful though...

attaching the output of

    libtool --mode=execute strace -o strace.out -s 256 ./timeout-test -d

might also be useful
Comment 3 Michael Biebl 2011-10-16 11:06:38 UTC
With the patch applied, I get:

http
  async
    fast to idle
    fast to plain
    fast to timeout
    slow to timeout
    fast to idle
    fast to plain
  sync
    fast to plain
    fast to timeout
    slow to timeout
    fast to plain

https
  async
    fast to idle
    fast to plain
    fast to timeout
      FAILED: 6 SSL handshake failed (expected 200 OK)
      No https status?
    slow to timeout
      FAILED: 6 SSL handshake failed (expected 7 Connection terminated unexpectedly)
      No https status?
    fast to idle
    fast to plain
      ERROR: plain_session closed connection
  sync
    fast to plain
    fast to timeout
      FAILED: 6 SSL handshake failed (expected 200 OK)
      No https status?
    slow to timeout
      FAILED: 6 SSL handshake failed (expected 7 Connection terminated unexpectedly)
      No https status?
    fast to plain

timeout-test: 5 error(s).
Comment 4 Michael Biebl 2011-10-16 11:07:20 UTC
Created attachment 199114 [details]
strace log
Comment 5 Dan Winship 2011-10-18 20:00:59 UTC
(In reply to comment #3)
>     fast to timeout
>       FAILED: 6 SSL handshake failed (expected 200 OK)
>       No https status?

turns out soup_message_get_https_status() is broken and only returns the status when the handshake was successful. sigh.

(In reply to comment #4)
> Created an attachment (id=199114) [details]
> strace log

curiously, the strace output starts going totally off the rails shortly after the first error. I wonder if there's some sort of memory corruption that only triggers on sparc?

futex((nil), FUTEX_WAKE_PRIVATE, 1)     = -1 EBADF (Bad file descriptor)
socketcall(0x45, 0xffbbce6c)            = -1 ECHILD (No child processes)
sendto(9, "\0\5\270\340\367", 5, 0, {...}, 4294967295) = 1
clock_gettime(CLOCK_REALTIME, 0xffbbd2d0) = -1 ERRNO_174040 (Unknown error 174040)
poll([?] 0, 3, 332)                     = 1
clock_gettime(CLOCK_REALTIME, {604132, 548955583}) = 1
clock_gettime(CLOCK_REALTIME, 0xffbbd2d0) = -1 ERRNO_174040 (Unknown error 174040)
poll([?] 0, 3, 0)                       = 1
clock_gettime(CLOCK_REALTIME, 0xffbbd2d0) = -1 ERRNO_174040 (Unknown error 174040)
poll([?] 0, 3, 0)                       = 1
clock_gettime(CLOCK_REALTIME, 0xffbbd2d0) = -1 ERRNO_174040 (Unknown error 174040)
poll([?] 0, 3, 0)                       = 1
clock_gettime(CLOCK_REALTIME, 0xffbbd2d0) = -1 ERRNO_174040 (Unknown error 174040)
poll([?] 0, 3, 0)                       = -1 EBADF (Bad file descriptor)
close(0)                                = 1
clock_gettime(CLOCK_REALTIME, {604132, 550412003}) = 1
clock_gettime(CLOCK_REALTIME, 0xffbbd2d0) = -1 ERRNO_174040 (Unknown error 174040)
poll([?] 0, 1, 0)                       = -1 EAGAIN (Resource temporarily unavailable)
close(0)                                = -1 EACCES (Permission denied)
close(0)                                = -1 EPERM (Operation not permitted)
write(85, "      FAILED: 6 SSL handshake failed (expected 7 Connection terminated unexpectedly)\n", 85) = -1 EPERM (Operation not permitted)
Comment 6 Jeremy Bicha 2012-01-31 03:47:42 UTC
We're having a similar problem in Ubuntu 12.04 "Alpha" with the timeout test failing on our armel builder when building 2.37.4. Building 2.36.0 worked just fine though and the armhf build doesn't have a problem either.

https://launchpadlibrarian.net/90470651/buildlog_ubuntu-precise-armel.libsoup2.4_2.37.4-0ubuntu1_FAILEDTOBUILD.txt.gz

Let me know if I should open a separate bug for this issue. I don't actually have an armel box of my own to play with though, but I could maybe find a Canonical employee who does.
Comment 7 Dan Winship 2012-01-31 13:44:29 UTC
(In reply to comment #6)
> https://launchpadlibrarian.net/90470651/buildlog_ubuntu-precise-armel.libsoup2.4_2.37.4-0ubuntu1_FAILEDTOBUILD.txt.gz

404 (and the server mistakenly returns "Content-Encoding: gzip" with the 404 error even though the response body is not actually gzipped, causing Firefox to show an error rather than the 404 page...)

> Let me know if I should open a separate bug for this issue. I don't actually
> have an armel box of my own to play with though, but I could maybe find a
> Canonical employee who does.

qemu has arm support. You could try setting up a build environment under that, and see if you still get the error, and if so then I can try it here...
Comment 9 Jordi Mallach 2012-02-23 23:28:11 UTC
Jeremy: 404.
Comment 10 Dan Winship 2012-02-24 00:02:26 UTC
It was there for a little while. I guess old build logs get deleted pretty quickly. But it didn't provide much useful info anyway.

I've been trying to get ubuntu-precise-armel running under qemu (on F16), but haven't gotten anything usable yet... (And I can't even run debian sparc because qemu-system-sparc requires bios images that don't exist in fedora any more.)
Comment 11 Emilio Pozuelo Monfort 2013-06-07 20:14:11 UTC
No valgrind for sparc and in armel (arm v5/6) it throws Illegal instruction.
Comment 12 Emilio Pozuelo Monfort 2013-06-07 21:10:38 UTC
It's also failing on m68k:

http://buildd.debian-ports.org/status/fetch.php?pkg=libsoup2.4&arch=m68k&ver=2.42.2-4&stamp=1370576510
Comment 13 Dan Winship 2013-06-09 18:58:33 UTC
I just tried reproducing this on a debian armel vm, and the test passed. So, either someone else needs to debug this, or someone needs to give a vm where it fails, or access to a machine where it fails.
Comment 14 Emilio Pozuelo Monfort 2013-06-09 20:04:02 UTC
I should have noted that I ran make check in armel but the test is not failing there (I'm not sure it ever failed in Debian). I wanted to valgrind it anyway to see if there were any invalid read or writes or any other problems but valgrind isn't working on armv5/6.

As for sparc, it's still failing but there's no valgrind port to sparc.

(In reply to comment #13)
> I just tried reproducing this on a debian armel vm, and the test passed. So,
> either someone else needs to debug this, or someone needs to give a vm where it
> fails, or access to a machine where it fails.

I should be able to give you access to a sparc machine where it fails. I'll send you a mail about this.
Comment 15 Dan Winship 2013-08-30 13:22:51 UTC
Thanks to Emilio for getting me access to the machines.

After way too much time spent debugging, I finally realized that this is just a trivial bug in the test program: on really slow machines (or maybe "machines with software floating point"?), doing an SSL handshake can take longer than 1 second, and so the 1-second timeout on the connection gets triggered even on the "fast" request.

Fixed in git by making it use a 3-second timeout on slow machines instead. (With "slow machines" determined by doing a test https request ahead of time and seeing how long it takes.)