GNOME Bugzilla – Bug 631525
libsoup 2.32.0 sometimes fails to load pages
Last modified: 2010-11-28 15:02:35 UTC
libsoup 2.32.0 with libwebkit 1.2.4 causes some pages not to load properly, rendering only a blank page; hitting Enter in the address bar or refreshing the page works. I'm not behind a proxy, and this is not HTTP vs. HTTPS related: it happens on both. The problem usually appears after PHP search scripts (e.g., forums) or on bug trackers. I've downgraded only libsoup from 2.32 to 2.32.2 and the problem was gone.
Sorry, I meant that I downgraded from 2.32 to 2.30.2.
We need either a somewhat-reliable test case, more detailed information about what's going wrong (e.g., tcpdumps maybe), or a bisection down to a specific commit.
Bisect came down to this:

dc6395ccdb50e930bf71cd789bcb06e9b25aec44 is the first bad commit
commit dc6395ccdb50e930bf71cd789bcb06e9b25aec44
Author: Dan Winship <danw@gnome.org>
Date:   Sat May 29 16:37:34 2010 +0200

    Add SoupMessageQueueItemState, remove SoupMessageIOStatus

    SoupMessageIOStatus was always really more about the session than
    the message. (SoupServer I/O didn't use it at all.) Replace it with a
    new SoupMessageQueueItemState, on the queue item rather than the
    message.

I can't give you a reliable test case because the problem is intermittent and I can't reproduce it 100% of the time (sometimes it works, sometimes it doesn't).

For example, pressing "View new posts" calls this:
http://www.ubuntu-rs.org/forum/search.php?action=getnew
Then it waits a little while and tries to open this:
http://www.ubuntu-rs.org/forum/search.php?action=results&sid=1148e250d65d704164ca04582eb08859
and that leaves me with a blank page. If I hit Enter in the address bar, the page loads.

Or this on the Arch Forums:
https://bbs.archlinux.org/search.php?action=show_new
Caveat: I did not try to downgrade or anything like it, so it's possible that this is a different problem.

Here is what I wrote in the Arch Linux thread at https://bbs.archlinux.org/viewtopic.php?pid=838666#p838666 (slightly edited):

Since I upgraded to libsoup 2.32.0, I also get blank pages that I need to refresh manually with midori. I did catch one instance with tcpdump, and it turns out that what happened, in this specific case, seems to be this:

1/ libsoup sends a GET request through a connection that has seen no traffic for 4 minutes or so (but is still theoretically alive, i.e. no "Connection: close" from either end or any other indication that the connection is dead).
2/ the remote end does not like this and answers with a TCP reset (it SHOULD NOT, I guess, but there are reasons/excuses for this behaviour).
3/ then libsoup gives up instead of doing what it MUST[1], that is: automatically try to open a new connection to send the failed request.

In that case the remote end was 66.249.92.104, which is www.google.com as resolved by the local ISP. And, indeed, I see a lot of blank pages with my Google searches lately. I still have the tcpdump capture, by the way, but I would have to check/clean it a bit before putting it in a public place.

[1] Actually, I am not absolutely certain whose responsibility it is to retry the request on a new connection (libsoup/webkit/midori?). But I am pretty sure it's not that of the human operator.
Created attachment 174212
Patch that could fix the bug

As discussed on IRC, this bug is probably caused by the fact that messages in STARTING state that get a socket EOF (most likely because the server closes a reused active connection) are no longer restarted now that the asynchronous session is implemented as a state machine. This patch restores the previous behaviour, i.e., those messages are restarted.
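The restart rule described in this comment can be sketched roughly as follows. This is a minimal illustration in Python, not libsoup's code: `ItemState`, `FakeConnection`, `run_item` and `open_new` are hypothetical names invented for the example, and the only thing modelled is the decision "EOF while still STARTING on a reused connection means restart on a new connection".

```python
from enum import Enum, auto


class ItemState(Enum):
    """Hypothetical queue-item states; not libsoup's actual enum."""
    STARTING = auto()   # request handed to a connection, no response yet
    FINISHED = auto()   # response fully received


class FakeConnection:
    """Stand-in for a socket. `stale` means the server already closed it."""

    def __init__(self, reused, stale):
        self.reused = reused
        self.stale = stale

    def send(self, request):
        if self.stale:
            raise EOFError("server closed the connection")
        return "200 OK: " + request


def run_item(request, conn, open_new):
    """Send `request`; restart on a new connection if a reused one hits EOF."""
    state = ItemState.STARTING
    while True:
        try:
            response = conn.send(request)
        except EOFError:
            # Restart only if nothing was received yet and the connection
            # was reused. EOF on a brand-new connection is a real error.
            if state is ItemState.STARTING and conn.reused:
                conn = open_new()   # fresh connection, so reused is False
                continue
            raise
        state = ItemState.FINISHED
        return response, state
```

Because the replacement connection is not marked as reused, a second EOF is reported to the caller instead of looping forever.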
I applied Sergio's patch on top of the 2.32.0 tarball (building with the Arch Linux PKGBUILD). It seems to fix the problem I had with my Google searches often returning a blank page in midori, as well as a similar problem with Google Reader (though it didn't manifest as blank pages there). tcpdump confirms that when reusing an old connection and getting a TCP reset, libsoup will now reissue the request on a new connection.
Thanks for figuring out the source of the problem, Sergio. The fix was even simpler than what you had.
Created attachment 174215
soup-message-io: fix retry-after-unexpected-connection-close

When sending a request on a previously-used connection, we have to deal with the possibility of the server deciding to time out the connection right as we start sending data (which sounds like a crazy race condition, but is in fact pretty much standard behavior). This got broken in the connection/session reorg earlier in the year. Fix it.

Also, add a test to misc-test for this.

Based on patch from Sergio Villar.
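The situation this patch handles can be reproduced on loopback with plain sockets. The sketch below is an assumption-laden Python illustration, not the test added to misc-test: the server answers one request per connection and then closes it without sending "Connection: close", and the client (`get_with_retry`, a hypothetical helper) resends the request once on a fresh connection when the reused one turns out to be dead. The client reads a response of known length only to keep the example short.

```python
import socket
import threading

REQUEST = b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n"
RESPONSE = b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok"


def serve(listener, n_connections):
    """Answer one request per connection, then close it without warning."""
    for _ in range(n_connections):
        conn, _addr = listener.accept()
        with conn:                   # leaving the block closes the socket
            conn.recv(4096)          # read the (small) request
            conn.sendall(RESPONSE)   # note: no "Connection: close" header


def exchange(sock):
    """Send one request and read one full response, or raise on EOF/reset."""
    sock.sendall(REQUEST)
    data = b""
    while len(data) < len(RESPONSE):
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionResetError("connection closed before response")
        data += chunk
    return data


def get_with_retry(addr, sock):
    """Try the reused socket first; on failure resend once on a fresh one."""
    try:
        return exchange(sock), sock, False
    except ConnectionError:          # covers reset, broken pipe and our EOF
        sock.close()
        fresh = socket.create_connection(addr, timeout=5)
        return exchange(fresh), fresh, True


def demo():
    listener = socket.create_server(("127.0.0.1", 0))
    addr = listener.getsockname()
    server = threading.Thread(target=serve, args=(listener, 2), daemon=True)
    server.start()
    sock = socket.create_connection(addr, timeout=5)
    first = exchange(sock)                               # fresh connection
    second, sock, retried = get_with_retry(addr, sock)   # server closed it
    sock.close()
    server.join(timeout=5)
    listener.close()
    return first, second, retried
```

Calling `demo()` returns both responses plus a flag saying whether the second request had to be retried. Retrying only once, and only for a reused connection, keeps a genuinely unreachable server from causing an endless loop.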
Just tested the patch, works great, thanks.
(In reply to comment #7)
> Thanks for figuring out the source of the problem, Sergio. The fix
> was even simpler than what you had.

Oh yeah, all the conditions of that if seemed to be saying "the problem is here, the problem is here" :)
However, misc-test sometimes (or often) fails now. Run the test several times and it will fail at some point.

tests]$ ./misc-test -d
Host handling
Callback unref handling
SoupMessage reuse
  First message
  Redirect message
  Auth message
  Last message
OPTIONS *
  Testing with no handler
  Testing with handler
Abort with pending connection
Invalid Content-Length framing tests
  Content-Length larger than message body length
  Server claims 'Connection: close' but doesn't
Automatic Accept-Language processing
  LANGUAGE=C
  LANGUAGE=fr_FR
  LANGUAGE=fr_FR:de:en_US
Unexpected timing out of persistent connections
  Async session
    First message
    Second message
    Message was not retried after disconnect
  Sync session
    First message
    Second message
    Message was not retried after disconnect
misc-test: 2 error(s).
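An intermittent failure like this one is easier to quantify by running the test in a loop and counting the failing runs. The helper below is a small Python sketch; `count_failures` is a hypothetical name, and it assumes the test binary exits with a non-zero status when it reports errors.

```python
import subprocess


def count_failures(cmd, runs):
    """Run `cmd` `runs` times and return how many runs exited non-zero."""
    failures = 0
    for _ in range(runs):
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,   # discard the test's own output
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failures += 1
    return failures
```

From the tests directory, `count_failures(["./misc-test"], 100)` would then give the number of failing runs out of 100.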
Hm... weird. That particular set of errors just indicates a bug in misc-test itself though; for some reason it's not closing the connection between the two messages, and so it's not managing to test the retry-after-connection-is-closed fix. So libsoup itself is still (presumably) fixed, there's just a bug in my test case for it. I'll try to figure out what's going on. (It works correctly for me.)

Running:

  libtool --mode=execute strace -f -s 2048 -o strace.txt ./misc-test -d -d

and then attaching the "strace.txt" file might help me figure it out.
Created attachment 174753
"libtool --mode=execute strace -f -s 2048 -o strace.txt ./misc-test -d -d" output
Created attachment 174775
misc-test: fix up the new persistent connection timeout test

Can you try this patch?
Created attachment 174778
"libtool --mode=execute strace -f -s 2048 -o strace.txt ./misc-test -d -d" output

Patched misc-test also sometimes fails.
there's no strace output in that attachment
Created attachment 174779
"libtool --mode=execute strace -f -s 2048 -o strace.txt ./misc-test -d -d" output

Sorry, see new attachment
Created attachment 174799
misc-test: fix up the new persistent connection timeout test

Take 3
Yes, misc-test passed 100 times.
Comment on attachment 174799
misc-test: fix up the new persistent connection timeout test

woo. thanks

Attachment 174799 pushed as 81d7447 - misc-test: fix up the new persistent connection timeout test