Bug 747342 – Short hardwired socket timeout prevents large emails from being sent through slow ssh tunnels

Status:	RESOLVED DUPLICATE of bug 749292

Product:	evolution-data-server
Classification:	Platform
Component:	Mailer
Version:	3.12.x (obsolete)
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	evolution-mail-maintainers
QA Contact:	Evolution QA team

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2015-04-04 16:39 UTC by Steve Holland
Modified:	2015-05-26 07:22 UTC

See Also:
GNOME target:	---
GNOME version:	---

Description Steve Holland 2015-04-04 16:39:06 UTC

SSH tunnels are sometimes useful for sending email, for example when the desired outgoing server is not directly accessible: ssh -L 25:outgoingserver:25 remote_host_near_outgoingserver. Evolution is then configured to send with 'localhost' as the email server, and the mail is actually sent through the tunnel.

When the SSH tunnel is fairly slow and you are sending a large email, the mail send operation fails with a "socket i/o timed out" dialog box after about a minute and a half.

The problem is that the SSH tunnel has a large buffer that accepts the entire email very quickly. So evolution thinks it has sent the entire email almost instantaneously, and then starts waiting for an acknowledgement from the remote server. That acknowledgement will not come until the data has actually made its way through the tunnel. What happens in the case of a large mail, is that the wait for an acknowledgement times out, causing evolution to abandon attempting to send the mail.
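The behaviour described above can be reproduced in miniature with plain sockets. The following is only a sketch, not Evolution or Camel code: `slow_tunnel` and `send_message` are hypothetical names, the "250 OK" reply merely stands in for the server's acknowledgement, and the sub-second delays replace the real 90-second timeout so the demo finishes quickly.

```python
import socket
import threading
import time


def slow_tunnel(listener, expected, drain_delay):
    """Stand-in for the tunnel: swallow the data at once, acknowledge late."""
    conn, _ = listener.accept()
    with conn:
        received = 0
        while received < expected:
            chunk = conn.recv(65536)
            if not chunk:
                return
            received += len(chunk)
        # Simulate the time the buffered data needs to cross the slow link.
        time.sleep(drain_delay)
        try:
            conn.sendall(b"250 OK\r\n")
        except OSError:
            pass  # the client already gave up and closed the socket


def send_message(payload, reply_timeout, drain_delay):
    """Send payload, then wait at most reply_timeout seconds for the reply."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]
    worker = threading.Thread(
        target=slow_tunnel, args=(listener, len(payload), drain_delay), daemon=True
    )
    worker.start()
    client = socket.create_connection(("127.0.0.1", port))
    try:
        client.sendall(payload)            # returns quickly: the data is only buffered
        client.settimeout(reply_timeout)   # fixed wait for the acknowledgement
        try:
            return client.recv(1024).decode("ascii").strip()
        except socket.timeout:
            return "timed out"
    finally:
        client.close()
        worker.join()
        listener.close()


if __name__ == "__main__":
    body = b"x" * 100_000
    print(send_message(body, reply_timeout=0.2, drain_delay=0.6))
    print(send_message(body, reply_timeout=2.0, drain_delay=0.2))
```

The write completes as soon as the data is buffered, so the only thing that decides success or failure is whether the reply timeout outlasts the time the data needs to drain through the link.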

With typical upload speeds on residential ADSL in the US still less than 1 Mbit/s (in some cases much less than 1 Mbit/s), a 10 MB email could easily take 100 seconds or more to upload.
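As a rough check of that estimate, here is the arithmetic as a small sketch. It reads the quoted upload rate as megabits per second (an assumption, since that is how ADSL rates are usually quoted), takes 1 MB as 8 megabits, and ignores protocol and encoding overhead. The function name and the sample rates are illustrative only.

```python
SOCKET_TIMEOUT_SECONDS = 90  # default timeout mentioned in this report


def upload_seconds(size_megabytes, link_megabits_per_second):
    """Ideal transfer time, ignoring protocol and encoding overhead."""
    size_megabits = size_megabytes * 8
    return size_megabits / link_megabits_per_second


if __name__ == "__main__":
    for rate in (1.0, 0.8, 0.5):  # assumed upload rates in Mbit/s
        seconds = upload_seconds(10, rate)
        verdict = "exceeds" if seconds > SOCKET_TIMEOUT_SECONDS else "fits within"
        print(f"{rate} Mbit/s: {seconds:.0f} s, {verdict} the {SOCKET_TIMEOUT_SECONDS} s timeout")
```

Under these assumptions a 10 MB message needs 80 seconds at exactly 1 Mbit/s and 100 seconds at 0.8 Mbit/s, so any link slower than about 0.9 Mbit/s pushes the upload past the 90-second timeout.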

network_service_connect_sync() in camel/camel-network-service.c defaults the socket timeout to 90 seconds, and connect_to_server() in camel/providers/smtp/camel-smtp-transport.c does not change that default.

The result is that the 10 MB email sent through the tunnel fails to send because the timeout is too short.

Increasing the default timeout in camel/camel-network-service.c does seem to work around the problem.

Comment 1 Milan Crha 2015-04-07 10:13:00 UTC

Thanks for the bug report. I wouldn't tweak the timeout, because the code cannot tell the difference between a really stuck connection and this tunneling. I mean that the real issue here is this:

(In reply to Steve Holland from comment #0)
> The problem is that the SSH tunnel has a large buffer that accepts the
> entire email very quickly. So evolution thinks it has sent the entire email
> almost instantaneously, and then starts waiting for an acknowledgement from
> the remote server. That acknowledgement will not come until the data has
> actually made its way through the tunnel. What happens in the case of a
> large mail, is that the wait for an acknowledgement times out, causing
> evolution to abandon attempting to send the mail. 

Thus, from my point of view, the real fix should be to change the tunnel to accept/buffer data in smaller chunks and make sure to keep the connection alive. I didn't find any direct options for this. The ssh option 'ServerAliveInterval' looks like one of the closest, together with 'TCPKeepAlive', but if I read the documentation properly, then these options influence the connection between ssh and the destination server, not the source client (in this case evolution).

Comment 2 Milan Crha 2015-04-07 10:44:23 UTC

Just found out this openssh FAQ:
  http://www.openssh.com/faq.html#2.12

which mentions ClientAliveInterval. It doesn't show in my `man ssh_config` for some reason (OpenSSH_6.6.1p1, OpenSSL 1.0.1j-fips 15 Oct 2014). In any case, this is the right way to fix the issue.

Comment 3 Steve Holland 2015-04-25 19:59:41 UTC

Adjusting ClientAliveInterval does not solve the problem. 

ClientAliveInterval keeps the SSH connection open (that wasn't the problem to begin with) but does not directly affect the tunnel within the SSH connection. 

The problem is that evolution sends all of the data instantly into the buffer, then waits on reading the response. And the delay before the response exceeds the fixed timeout built into evolution. 

Agreed that the SSH tunnel shouldn't have such a large buffer, but it does. I think this is to speed data along under very fast connections. 

One possible workaround would be to temporarily bump up the buffer size when transmitting a large file. 

I've been using Evolution with the default socket timeout changed from 90s to 600s for a few weeks now with no ill effects. This is on a laptop with numerous disconnect/reconnect cycles.

Comment 4 Milan Crha 2015-04-27 10:31:51 UTC

(In reply to Steve Holland from comment #3)
> Adjusting ClientAliveInterval does not solve the problem. 

It is meant, as I understood it, to keep the connection alive between the client and the ssh agent. The ssh agent is meant to provide some activity on the connection to assure the client that the connection is still working. If it doesn't work for you, then it might be a bug in ssh.

> One possible workaround would be to temporarily bump up the buffer size when
> transmitting a large file. 
> 
> I've been using Evolution with the default socket timeout changed from 90s
> to 600s for a few weeks now with no ill effects. This is on a laptop with
> numerous disconnect/reconnect cycles.

I would not do that, because:
a) how does eds know you are using an ssh tunnel?
b) how does eds differentiate between a truly lost connection and the ssh
   tunnel buffering issue? Waiting 10 minutes for a timeout is unacceptable
   to many users. The 1.5 minute wait is also sometimes too long.

I still believe that the right fix is to patch ssh, not evolution-data-server.

Comment 5 Milan Crha 2015-05-26 07:22:51 UTC

I made some changes related to this in bug #749292; it turned out that the timeout can also happen without the SSH tunnel. I'm marking this as a duplicate of it, because it's more general.

*** This bug has been marked as a duplicate of bug 749292 ***