GNOME Bugzilla – Bug 747342
Short hardwired socket timeout prevents large emails from being sent through slow ssh tunnels
Last modified: 2015-05-26 07:22:51 UTC
SSH tunnels are sometimes useful for sending email, for example when the desired outgoing server is not directly accessible, e.g. ssh -L 25:outgoingserver:25 remote_host_near_outgoingserver. Then evolution is configured to send with 'localhost' as the email server and the mail is actually sent through the tunnel.

When the SSH tunnel is fairly slow and you are sending a large email, the mail send operation fails with a "socket i/o timed out" dialog box after about a minute and a half.

The problem is that the SSH tunnel has a large buffer that accepts the entire email very quickly. So evolution thinks it has sent the entire email almost instantaneously, and then starts waiting for an acknowledgement from the remote server. That acknowledgement will not come until the data has actually made its way through the tunnel. What happens in the case of a large mail, is that the wait for an acknowledgement times out, causing evolution to abandon attempting to send the mail.

With typical upload speeds on residential ADSL in the US still less than 1 Mbps (in some cases much less than 1 Mbps), a 10 MB email could easily take 100 seconds or more to upload.

network_service_connect_sync() in camel/camel-network-service.c defaults the socket timeout to 90 seconds, and connect_to_server() in camel/providers/smtp/camel-smtp-transport.c does not change that default. The result is that the 10 MB email sent through the tunnel fails to send because the timeout is too short.

Increasing the default timeout in camel/camel-network-service.c does seem to work around the problem.
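The upload-time estimate above can be checked with a few lines of arithmetic. This is only a rough sketch: the function name and the sample link speeds are illustrative assumptions, and it ignores SMTP and encoding overhead, which would make the real transfer longer.

```python
SOCKET_TIMEOUT_S = 90  # default socket timeout mentioned in the report


def upload_seconds(size_megabytes: float, link_megabits_per_s: float) -> float:
    """Time to push a payload through a link, ignoring protocol overhead."""
    size_megabits = size_megabytes * 8  # 1 byte = 8 bits
    return size_megabits / link_megabits_per_s


# Hypothetical uplink speeds, chosen only to illustrate the comparison.
for mbps in (1.0, 0.75, 0.5):
    seconds = upload_seconds(10, mbps)
    verdict = "exceeds" if seconds > SOCKET_TIMEOUT_S else "fits within"
    print(f"10 MB at {mbps} Mbps: {seconds:.0f} s ({verdict} the 90 s timeout)")
```

Even at a full 1 Mbps the raw transfer is close to the timeout, so any slower uplink pushes the acknowledgement past 90 seconds.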
Thanks for the bug report. I wouldn't tweak the timeout, because the code might not be able to tell the difference between a really stuck connection and this tunneling. I mean that the real issue here is this:

(In reply to Steve Holland from comment #0)
> The problem is that the SSH tunnel has a large buffer that accepts the
> entire email very quickly. So evolution thinks it has sent the entire email
> almost instantaneously, and then starts waiting for an acknowledgement from
> the remote server. That acknowledgement will not come until the data has
> actually made its way through the tunnel. What happens in the case of a
> large mail, is that the wait for an acknowledgement times out, causing
> evolution to abandon attempting to send the mail.

Thus, from my point of view, the real fix should be to change the tunnel to accept/buffer data in smaller chunks and to make sure it keeps the connection alive. I didn't find any direct options for this. The ssh option 'ServerAliveInterval' looks like one of the closest, together with 'TCPKeepAlive', but if I read the documentation properly, these options influence the connection between ssh and the destination server, not the source client (in this case evolution).
I just found this OpenSSH FAQ entry: http://www.openssh.com/faq.html#2.12 which mentions ClientAliveInterval. It doesn't show up in my `man ssh_config` for some reason (OpenSSH_6.6.1p1, OpenSSL 1.0.1j-fips 15 Oct 2014). In any case, this is the right way to fix the issue.
Adjusting ClientAliveInterval does not solve the problem. ClientAliveInterval keeps the SSH connection open (that wasn't the problem to begin with) but does not directly affect the tunnel within the SSH connection.

The problem is that evolution sends all of the data instantly into the buffer, then waits on reading the response, and the delay before the response exceeds the fixed timeout built into evolution.

Agreed that the SSH tunnel shouldn't have such a large buffer, but it does. I think this is to speed data along under very fast connections.

One possible workaround would be to temporarily bump up the buffer size when transmitting a large file.

I've been using Evolution with the default socket timeout changed from 90s to 600s for a few weeks now with no ill effects. This is on a laptop with numerous disconnect/reconnect cycles.
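The failure mode described here (the write completes at once, then the read for the acknowledgement hits a fixed timeout) can be reproduced on loopback. The following is a minimal sketch and not Evolution's or Camel's code: the helper names, the "250 OK" reply, and the sub-second delays are assumptions standing in for the tunnel draining slowly and for the 90 s timeout.

```python
import socket
import threading
import time


def slow_ack_server(listener: socket.socket, delay_s: float) -> None:
    """Accept one connection, read the data, and acknowledge only after a delay."""
    conn, _ = listener.accept()
    with conn:
        conn.recv(65536)
        time.sleep(delay_s)  # stands in for data slowly draining through the tunnel
        try:
            conn.sendall(b"250 OK\r\n")
        except OSError:
            pass  # the client already gave up and closed the connection


def send_and_wait(port: int, payload: bytes, timeout_s: float) -> str:
    """Send the payload, then wait for the acknowledgement under a fixed timeout."""
    with socket.create_connection(("127.0.0.1", port), timeout=timeout_s) as sock:
        sock.sendall(payload)  # returns as soon as local buffers accept the data
        try:
            return sock.recv(1024).decode().strip()
        except socket.timeout:
            return "timed out"


def run(delay_s: float, timeout_s: float) -> str:
    """Start a one-shot server with the given ack delay and try to send to it."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))  # let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(
        target=slow_ack_server, args=(listener, delay_s), daemon=True
    )
    server.start()
    result = send_and_wait(port, b"x" * 1024, timeout_s)
    server.join()
    listener.close()
    return result


if __name__ == "__main__":
    print("ack slower than timeout:", run(delay_s=0.5, timeout_s=0.1))
    print("ack faster than timeout:", run(delay_s=0.05, timeout_s=2.0))
```

In both runs the send itself succeeds immediately; only the ratio between the acknowledgement delay and the read timeout decides whether the client reports a failure.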
(In reply to Steve Holland from comment #3)
> Adjusting ClientAliveInterval does not solve the problem.

It is meant, as I understood it, to keep the connection alive between the client and the ssh agent. The ssh agent is meant to provide some activity on the connection to assure the client that the connection is still working. If it doesn't work for you, then it might be a bug in ssh.

> One possible workaround would be to temporarily bump up the buffer size when
> transmitting a large file.
>
> I've been using Evolution with the default socket timeout changed from 90s
> to 600s for a few weeks now with no ill effects. This is on a laptop with
> numerous disconnect/reconnect cycles.

I would not do that, because:
a) how does eds know you are using an ssh tunnel?
b) how does eds differentiate between a truly lost connection and the ssh tunnel buffering issue?

Waiting 10 minutes for a timeout is unacceptable to many users. The 1.5 minute wait is also sometimes too long. I still believe that the right fix is to patch ssh, not evolution-data-server.
I made some changes related to this in bug #749292; it turned out that the timeout can also happen without the SSH tunnel. I'm marking this as a duplicate of it, because it's more general.

*** This bug has been marked as a duplicate of bug 749292 ***