GNOME Bugzilla – Bug 73179
Connection Timeout Incorrectly Handled
Last modified: 2004-12-22 21:47:04 UTC
A timeout on any one of the established connections seems to stall or break all of the other established connections. I normally have 4 connections, and a server timeout on any of them causes all of them to time out or block fatally. This causes the network connection to chunk badly, in turn exacerbating the problem.
I think I have the same problem. It causes another one: despite having Pan configured to use 3 connections (1 reserved for interactive use), I get ** WARNING **: Legitimerung fehlgeschlagen: 502 Authentication failed: max sessions per user (3) exceeded from my news server (the German prefix means "authentication failed"). Apparently, Pan uses more than 3 connections at a time (maybe it doesn't always close connections properly before opening a new one?). I see connections in state FIN_WAIT2 in "netstat" output. Tested with 0.11.2.91 on RH7.2. News provider: news.clara.net. Please change severity to "major". Let me know if you need logs or something!
Confirmed, I'm seeing this too. Looks like Pan is somehow dropping a connection before it's fully closed.
Actually what I'm seeing is sockets in a SOCK_WAIT connection. This means that the server has closed the socket, but Pan hasn't closed yet...? To duplicate: rev up Pan to four connections doing a quick task like reading an image. After that, just do one image at a time so that one connection gets exercised but three sit idle. After a few minutes, when the idle-disconnect code comes along, the sockets seem to be closed, but netstat shows that one or two are in SOCK_WAIT. Possibly-related-but-maybe-not dept: the SO_KEEPALIVE that we do on each socket seems to be misguided. SO_KEEPALIVE sends pinglets down the wire every once in a while, but according to RFC 1122 section 4.2.3.6, the interval between them must default to no less than two hours, which is far past our three-minute timeout. We may want to remove that code from Pan. Agree/disagree?
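For reference, a minimal sketch of the two mechanisms being compared here, assuming plain POSIX sockets rather than Pan's own socket wrapper. The helper names are hypothetical and not taken from Pan's source; the 180-second value used in the usage note is just the three-minute timeout mentioned above.

```c
#include <assert.h>
#include <sys/socket.h>
#include <time.h>

/* Hypothetical helper: turn SO_KEEPALIVE on (nonzero) or off (0) for fd.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
int set_keepalive(int fd, int enabled)
{
    int value = enabled ? 1 : 0;
    assert(fd >= 0);
    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &value, sizeof value);
}

/* Hypothetical helper: returns 1 if SO_KEEPALIVE is on, 0 if off, -1 on error. */
int get_keepalive(int fd)
{
    int value = 0;
    socklen_t len = sizeof value;
    assert(fd >= 0);
    if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &value, &len) != 0)
        return -1;
    return value != 0;
}

/* Application-level idle check: returns 1 once the connection has been
 * idle for at least timeout_secs, else 0. This, not SO_KEEPALIVE, is what
 * reacts within minutes. */
int idle_expired(time_t last_activity, time_t now, int timeout_secs)
{
    return difftime(now, last_activity) >= (double)timeout_secs;
}
```

With a three-minute timeout, idle_expired(last, now, 180) fires long before a default keep-alive probe would be sent, so calling set_keepalive(fd, 0), or simply never enabling the option, should not change the idle-disconnect behaviour.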
Regarding SO_KEEPALIVE: agreed. Regarding FIN_WAIT2 / TIMEOUT: I see a couple of issues here. Firstly, queue.c::socket_cleanup() does the following in case of an error: pan_socket_putline (socket, "QUIT\r\n"); pan_object_unref (PAN_OBJECT(socket)); So, we issue the 'quit' command and immediately close the socket, without waiting for the reply. I guess the news server could wait on this (now invalid) socket for a while, trying to send the reply. This could explain the 'max. sessions' errors. Unfortunately, my ISP upgraded their news server, and I can't reproduce this problem (so, unfortunate for this bug, good for me :-)). Secondly, I see a possible design problem: to close a session, we issue a 'QUIT' through nntp_disconnect() and then close the socket. However, the server will also close the socket upon receiving the quit. So, both ends will try to close at the same time. Not sure if that's a good idea (my TCP bible's at work, so I can't validate this right now). I've managed to get rid of the FIN_WAITs by introducing a 50 msec sleep between the nntp_disconnect() and the closing of the socket, though.
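As an alternative to a fixed sleep, the "wait for the reply before closing" idea could look roughly like the sketch below. This is not Pan's code and not what was committed: quit_and_close() is a hypothetical helper on a raw POSIX descriptor, it assumes the whole reply arrives in a single recv(), and it does not guard against SIGPIPE if the peer has already gone away.

```c
#include <assert.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send QUIT, wait up to timeout_ms for the server's
 * reply (or EOF), then close fd.
 * Returns the number of reply bytes stored in reply (0 on EOF or timeout),
 * or -1 if QUIT could not be sent. fd is closed in every case. */
int quit_and_close(int fd, int timeout_ms, char *reply, size_t reply_size)
{
    static const char quit[] = "QUIT\r\n";
    const size_t quit_len = strlen(quit);
    struct pollfd pfd;
    ssize_t got = 0;

    assert(fd >= 0 && reply != NULL && reply_size > 1);
    reply[0] = '\0';

    if (send(fd, quit, quit_len, 0) != (ssize_t)quit_len) {
        close(fd);
        return -1;
    }

    /* Bounded wait: do not hang forever on a server that never answers. */
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    if (poll(&pfd, 1, timeout_ms) > 0) {
        got = recv(fd, reply, reply_size - 1, 0);
        if (got < 0)
            got = 0;            /* treat a read error like "no reply" */
        reply[got] = '\0';
    }

    close(fd);
    return (int)got;
}
```

The point of the design is that the socket is only closed once the server has had a chance to answer (normally with a 205 response) or the timeout has passed, instead of racing the server's own close.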
Committed on the pan-0-11-fix branch. Sven: do you use CVS? If so, could you update to the latest versions on this branch and see if these changes improve the situation? If you're still having problems, send in a run log. You can find instructions for doing this at http://pan.rebelbase.com/bugreport.html.
John, Sven, are you still seeing this behavior in CVS?
Sorry, don't have a build environment at the moment. Can you point me to some devel RPMs for RH7.2 so I can build CVS again?
What do you need/have? You should be able to build the pan-0-11-fix branch on a RH7.2 system without installing too many packages (I do). Are you referring to gtk2? You only need that to build the HEAD branch. The pan-0-11-fix branch is still using gtk+-1.2. You can check out a copy on that branch with the following command: 'cvs co -r pan-0-11-fix pan'.
From: Sven Neuhaus Date: 16 Apr 2002 13:04:40 +0200 Hi, The bugzilla for gnome is hosed this morning. I built the pan-0-11-fix branch from CVS today and it seems to have fixed the problem. Thanks! -Sven
OK, bad news: the error occurred again a bit later. However, this time "netstat" showed only 3 connections, and all 3 in state "established". So, I'm not sure whether it is a Pan problem or whether the news server is just overly sensitive.
Any suggestions how we can nail this down properly? Is there a debug log where pan writes the timestamp of every connection opened and closed?
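One way to get that kind of trace, sketched here as an assumption rather than as an existing Pan feature: format a timestamped line for every connection event and write it wherever the client's debug output goes. The function name, the connection numbering and the line layout are all made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: format "YYYY-MM-DD HH:MM:SS UTC conn <id> <event>"
 * into buf. Returns the line length as reported by snprintf, or -1 if the
 * timestamp could not be produced.
 * Note: gmtime() is not thread-safe; a threaded client would want gmtime_r(). */
int format_conn_event(char *buf, size_t size, time_t when,
                      int conn_id, const char *event)
{
    const struct tm *tm_utc;
    char stamp[32];

    assert(buf != NULL && event != NULL);

    tm_utc = gmtime(&when);
    if (tm_utc == NULL)
        return -1;
    if (strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M:%S", tm_utc) == 0)
        return -1;

    return snprintf(buf, size, "%s UTC conn %d %s", stamp, conn_id, event);
}
```

Calling this with events such as "opened", "QUIT sent" and "closed", passing time(NULL) as the timestamp, would make it possible to line the client's view up against the server's "max sessions" errors and the netstat output.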