Bug 588187 - Large FTP transfers hang gvfsd-ftp after the first completed file
Status: RESOLVED OBSOLETE
Product: gvfs
Classification: Core
Component: ftp backend
Version: 1.3.x
Hardware: Other All
Importance: Normal minor
Target Milestone: ---
Assigned To: gvfs-maint
gvfs-maint
Depends on:
Blocks:
 
 
Reported: 2009-07-09 19:02 UTC by Calorì Alessandro
Modified: 2018-09-21 16:50 UTC
See Also:
GNOME target: ---
GNOME version: 2.27/2.28


Attachments
gvfsd log (340 bytes, text/plain)
2009-07-09 19:04 UTC, Calorì Alessandro
Complete gvfsd 1.2.3 log (4.28 KB, text/plain)
2009-07-14 14:59 UTC, Calorì Alessandro
Complete gvfsd 1.3.2 (211.05 KB, text/plain)
2009-07-16 16:46 UTC, Calorì Alessandro
Bug 588187 – Large FTP transfers hang gvfsd-ftp (2.09 KB, patch)
2009-07-31 14:17 UTC, Benjamin Otte (Company)
committed

Description Calorì Alessandro 2009-07-09 19:02:09 UTC
Please describe the problem:
When I transfer large files (>= 1GB) from my server gvfsd tells me it gets an "invalid reply" from the server.

At the end of the transfer the file isn't corrupted (I checked with an MD5 hash), so it's not a big issue, but it's quite annoying when I transfer multiple large files (it stops transferring).

Steps to reproduce:
1. Connect to an FTP server (with login, I didn't try anonymous transfers)
2. Select a large file to download
3. Wait...

Actual results:
The file is downloaded correctly but gvfs stops the following transfers (I have to cancel and redo the transfer).

Expected results:
gvfs should transfer all the files with no errors regardless of their size.

Does this happen every time?
Yes.

Other information:
I guessed it might be a problem with the distro I was using locally, because FileZilla and other clients don't show any error, so I moved from Gentoo (stable, GNOME 2.24) to Arch Linux (stable, GNOME 2.26), but the problem persists.

I tried changing the FTP server from proftpd to pure-ftpd but I keep having the same issue.
Comment 1 Calorì Alessandro 2009-07-09 19:04:02 UTC
Created attachment 138136 [details]
gvfsd log

This is the gvfsd log's tail.
Comment 2 Andreas Henriksson 2009-07-11 15:02:47 UTC
Seems like you're triggering the "broken EPSV, fall back on PASV" workaround... This code has been changed during gvfs 1.3.x ... It would be great if you could possibly test 1.3.x or even git master.... Please also provide the full gvfsd log so we can determine how many gvfsd-ftp processes are running in parallel.

Comment 3 Calorì Alessandro 2009-07-11 16:04:31 UTC
Ok, first of all: what is the "broken EPSV, fall back on PASV" workaround? Can I work around this issue with the version I have right now (1.2.3)? I could also use a backport patch and test it.

I work on this system so I prefer staying "stable" as much as possible. On the other hand I have a system I would like to use for testing GNOME (I would like to help GNOME developers and join the bug squad): which distro do you recommend that ships testing versions of GNOME? Do I need LFS?

Then, how can I provide full gvfsd log? You mean the whole gvfsd's console log?

Thank you for your patience.
Comment 4 Andreas Henriksson 2009-07-13 02:16:08 UTC
(In reply to comment #3)
> Ok, first of all: what is the "broken EPSV, fall back on PASV" workaround? Can
> I work around this issue with the version I have right now (1.2.3)? I could also
> use a backport patch and test it.

If you fix the server it won't be used. Usually it's because the server is behind some kind of NAT device that doesn't understand EPSV and so doesn't open/redirect the port used for EPSV. So even when the server _says_ "epsv is ok" we need to actually check whether the connection succeeds, and if it doesn't, which looks to be the case for you, we use PASV instead (which many NAT devices do understand).
It's still a bug in gvfs though, but as I said the code has been changed and it would be nice to know if the workaround is related at all to the problem or if we're chasing a bug which possibly doesn't exist anymore.
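The fallback described above can be illustrated with a minimal Python sketch. It is not the gvfs code (which is written in C): `send_command` and `connect` are hypothetical callables standing in for the control connection and the socket layer, and the reply formats follow RFC 2428 (EPSV) and RFC 959 (PASV).

```python
import re


def parse_epsv(reply):
    """Return the port from e.g. '229 Entering Extended Passive Mode (|||6446|)'."""
    # The delimiter is usually '|', but RFC 2428 allows any character.
    m = re.match(r"229 .*\((.)\1\1(\d+)\1\)", reply)
    if m is None:
        raise ValueError("not a valid EPSV reply: %r" % reply)
    return int(m.group(2))


def parse_pasv(reply):
    """Return (host, port) from e.g. '227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)'."""
    m = re.match(r"227 .*?(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)", reply)
    if m is None:
        raise ValueError("not a valid PASV reply: %r" % reply)
    h1, h2, h3, h4, p1, p2 = m.groups()
    return ".".join((h1, h2, h3, h4)), int(p1) * 256 + int(p2)


def open_data_connection(host, send_command, connect):
    """Try EPSV first; fall back on PASV if EPSV is refused or unreachable.

    ASSUMPTION: send_command(cmd) returns the reply line as a string and
    connect(host, port) returns a connected object or raises OSError.
    Both are hypothetical helpers, not gvfs APIs.
    """
    reply = send_command("EPSV")
    if reply.startswith("229"):
        try:
            return connect(host, parse_epsv(reply))
        except OSError:
            pass  # the server said EPSV is fine, but the port is not reachable
    pasv_host, pasv_port = parse_pasv(send_command("PASV"))
    return connect(pasv_host, pasv_port)
```

The point of the sketch is that a "229" reply alone is not trusted: the data connection must actually succeed before EPSV is considered usable.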

> 
> I work on this system so I prefer staying "stable" as much as possible. On the
> other hand I have a system I would like to use for testing GNOME (I would like
> to help GNOME developers and join the bug squad): what distro do you recommend
> which uses testing version of GNOME? Do I need LFS?

Ubuntu (karmic) or Fedora (maybe rawhide is needed) should both have enough bleeding edge software. I don't have any particular recommendation. I think the gnome developer recommendation is to use "jhbuild" for building everything from scratch. I don't do that though...

> 
> Then, how can I provide full gvfsd log? You mean the whole gvfsd's console log?

Yes, the snippet you posted was only part of it. Hard to determine exactly what's going on without the full log from the entire FTP session. Please unmount and then start collecting the output and reconnect... It's easier to understand what's going on with the full log (i.e. how many simultaneous connections have been opened. The development version has improved logging for this too.)

> 
> Thank you for your patience.
> 
No problem, thanks for following up on your report. Pretty hard to fix issues when you can't reproduce them yourself. Followup from the people who can is very much needed.
Comment 5 Calorì Alessandro 2009-07-13 07:24:03 UTC
(In reply to comment #4)
> If you fix the server it won't be used. Usually it's because the server is
> behind some kind of NAT device that doesn't understand EPSV and opens/redirects
> the port used for EPSV. So even when the server _says_ "epsv is ok" we need to
> actually check if the connection succeeds and if it doesn't, which looks to be
> the case for you, we use PASV instead (which many NAT devices do understand).
> It's still a bug in gvfs though, but as I said the code has been changed and it
> would be nice to know if the workaround is related at all to the problem or if
> we're chasing a bug which possibly doesn't exist anymore.

Ok, I'm going to attach the full GVFS log.

However I did other tests and the EPSV command does work. I own a public server without any NAT; the client instead is behind a NAT (my ISP has a private MAN network).

> Ubuntu (karmic) or Fedora (maybe rawhide is needed) should both have enough
> bleeding edge software. I don't have any particular recommendation. I think the
> gnome developer recommendation is to use "jhbuild" for building everything from
> scratch. I don't do that though...

I use Arch Linux, which is quite bleeding edge. The packaging is rather poor (no debug symbols and no versioning), but it's quite stable even though it provides the latest stable versions.

I'm going to learn how to use jhbuild and try to make it work on a basic system.

> No problem, thanks for following up on your report. Pretty hard to fix issues
> when you can't reproduce them yourself. Followup from the people who can is
> very much needed.

If you want I can provide you with a test account on my server and copy some distro images into the home dir.
Comment 6 Calorì Alessandro 2009-07-14 14:59:58 UTC
Created attachment 138399 [details]
Complete gvfsd 1.2.3 log
Comment 7 Calorì Alessandro 2009-07-15 18:34:53 UTC
Tested with gvfs 1.3.2. With the development version the download just hangs after the end of the first file with no error messages. Unfortunately I wasn't able to get a log because "gvfsd -r" doesn't print anything on the console now. You said that "the development version has improved logging" but how can I enable it?
Comment 8 Andreas Henriksson 2009-07-16 14:52:32 UTC
(In reply to comment #7)
> how can I enable it?

The verbose mode is disabled by default nowadays.... try:
GVFS_DEBUG=1 gvfsd -r




Comment 9 Calorì Alessandro 2009-07-16 16:46:02 UTC
Created attachment 138543 [details]
Complete gvfsd 1.3.2

As I said before, it simply gets stuck after the end of the first download.
Comment 10 Calorì Alessandro 2009-07-16 16:49:37 UTC
In the server's syslog I only found:

pure-ftpd: (alessandro@blackstone) [NOTICE] /home/alessandro//foresight-2.1.1-x86_64-dvd1.iso downloaded  (1547386880 bytes, 280.68KB/sec)
pure-ftpd: (alessandro@blackstone) [INFO] Timeout - try typing a little faster next time
Comment 11 Benjamin Otte (Company) 2009-07-17 20:26:30 UTC
What's the exact error message, just "Invalid reply"?
Comment 12 Calorì Alessandro 2009-07-17 20:45:11 UTC
Yes, with GNOME 2.26 gvfsd finishes the first download then it tells me that an error occurred; the details just contain "Invalid reply". On the other hand, with GNOME 2.27 gvfsd just hangs with no error messages.
Comment 13 Benjamin Otte (Company) 2009-07-19 14:25:25 UTC
Could you attach a debug log and a "thr a a bt" from gdb of the 2.27 gvfsd-ftp when the deadlock happens?
I don't have any clue as to why this would deadlock from just reading the code.

I'm still looking for pure-ftpd installations on the web so I can test this myself, but I'm not finding any. They're all using vsftpd it seems...
Comment 14 Calorì Alessandro 2009-07-19 15:33:20 UTC
(In reply to comment #13)
> Could you attach a debug log and a "thr a a bt" from gdb of the 2.27 gvfsd-ftp
> when the deadlock happens?

Isn't the one I provided a debug log? By the way, can you explain exactly how I can provide the other "thing" you need?

> I'm still looking for pure-ftpd installations on the web so I can test this
> myself, but I'm not finding any. They're all using vsftpd it seems...

As I said before, I tried both proftpd and pure-ftpd but not vsftpd. I'm going to try this one too. I can also provide you with a pure-ftpd account on my server if you need to debug the ftp backend.
Comment 15 Calorì Alessandro 2009-07-19 17:09:54 UTC
Tried vsftpd also... same issue. It's definitely something wrong with the ftp backend because I can download with no problems at all with other clients (e.g. FileZilla). If you need the account let me know.
Comment 16 Andreas Henriksson 2009-07-19 21:47:53 UTC
(In reply to comment #15)
> Tried vsftpd also... same issue. It's definitely something wrong with the ftp
> backend because I can download with no problems at all with other clients (i.e.
> FileZilla). 

I've tried vsftpd and pure-ftpd locally, in different configurations, but can't reproduce your problem. I've discussed it with Benjamin Otte on IRC and I think we're both pretty much clueless about what the issue might be. Any extra info you can provide is very valuable.
There are two things which could give more hints. The first is a gdb backtrace from when gvfsd-ftp "freezes". You can launch gvfsd-ftp directly if you give "host=foo user=bar password=quux" as arguments. Please reproduce under gdb, press control-C and then type the command Benjamin Otte provided before.
(gdb gvfsd-ftp ... set args host=foo user=bar password=quux .... run .... <reproduce problem>... control-c ... thread apply all bt ... )
Another useful thing would be to verify that the server actually sends the "226" transfer completed message on the control channel. This could be verified with wireshark. The client doesn't seem to receive it in the debug log you sent, at least (but it does in my tests)....
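To make the "226" check concrete, here is a small Python sketch of how a client reads replies from the control channel, including multi-line replies such as the FEAT answer. It follows the reply syntax of RFC 959 and is only an illustration, not the gvfs parser; the function name `read_reply` is made up for this example.

```python
def read_reply(lines):
    """Consume one FTP reply from an iterator of control-channel lines.

    Returns (code, text_lines). A multi-line reply starts with 'NNN-' and
    ends with a line starting with the same code followed by a space.
    """
    first = next(lines)
    code = first[:3]
    if not code.isdigit():
        raise ValueError("malformed reply line: %r" % first)
    text = [first[4:]]
    if first[3:4] == "-":
        for line in lines:
            if line.startswith(code + " "):
                text.append(line[4:])
                break
            text.append(line)
        else:
            raise ValueError("multi-line reply was never terminated")
    return int(code), text


def is_transfer_complete(code):
    # 226 is the "transfer completed" reply discussed in this report.
    return code == 226
```

A client that never sees a line starting with "226 " after the data connection closes is in exactly the state described in this report: the file is on disk, but the transfer is not acknowledged.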


> If you need the account let me know.

Please send account details to the email addresses listed under CC of this bug report (me and otte).

Comment 17 Calorì Alessandro 2009-07-20 12:27:32 UTC
So, I re-enabled proftpd since the problem doesn't seem to be server-related, and I disabled 'EPSV' support on the server (only PASV). Unfortunately the problem persists, so my ISP's NAT should be excluded from the list of culprits (FileZilla only uses PASV and it works).

I also tried to run gvfsd-ftp from the command line: it connected to the server, then it shut down. This is the log:

Added new job source 0x10ef040 (GVfsBackendFtp)
Queued new job 0x10f0000 (GVfsJobMount)
<- 0 --  220 ProFTPD 1.3.2 Server [::ffff:94.23.49.188]
-- 0 ->  FEAT
<- 0 --  211-Funzionalità:
 MDTM
 MFMT
 LANG it-IT
 UTF8
 AUTH TLS
 MFF modify;UNIX.group;UNIX.mode;
 MLST modify*;perm*;size*;type*;unique*;UNIX.group*;UNIX.mode*;UNIX.owner*;
 PBSZ
 PROT
 REST STREAM
 MODE Z
 SIZE
<- 0 --  211 Fine
send_reply, failed: 1
Mount failed: Password dialog cancelled

What does it mean? I provided the host, the port, the username and the password as parameters...
Comment 18 Benjamin Otte (Company) 2009-07-20 16:06:50 UTC
It means that passwords given on the command line are ignored by gvfsd-ftp. So it tries to ask for them and fails.

I've downloaded files from your server twice and it worked fine every time. I've found (and fixed) different issues, but could not make out any problem with your ftp server.
FWIW, the last download took 29m57.065s so it would have definitely cancelled, but it did not.

I'm still testing various things, but so far nothing suspicious is happening.
Comment 19 Calorì Alessandro 2009-07-20 19:36:55 UTC
(In reply to comment #18)
> It means that passwords given on the command line are ignored by gvfsd-ftp. So
> it tries to ask for them and fails.

How can I make it work?

> I've downloaded files from your server twice and it worked fine every time.
> I've found (and fixed) different issues, but could not make out any problem
> with your ftp server.

I don't have any problem with other clients either, only with gvfs. Tomorrow I'm gonna try to dump the stream with wireshark, hoping it can provide useful information about this issue.
Comment 20 Calorì Alessandro 2009-07-21 21:19:38 UTC
I tried to upload a big file with gvfsd, I aborted the process after a while (~10 minutes) and then I tried to unmount the connection. The result:

DBus error org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.

Hope it helps out...
Comment 21 Calorì Alessandro 2009-07-30 07:31:12 UTC
Is there any news about this bug?
Comment 22 Benjamin Otte (Company) 2009-07-30 08:10:24 UTC
No, unfortunately there isn't, at least not from my side. I downloaded stuff fine from your ftp server, so it doesn't seem to happen here. I looked at your wireshark dump and that looks like it just stops sending stuff for no reason I can make out.
I'm all out of ideas for what could cause this. Maybe someone else knows...
Comment 23 Calorì Alessandro 2009-07-30 08:52:37 UTC
Is there anything else I can provide? I would like to test gvfsd-ftp directly in debug mode but as I stated in comment #17 it doesn't work as expected.
Comment 24 Benjamin Otte (Company) 2009-07-30 09:01:41 UTC
I do debug mode like this:
- install debug gvfs to /usr
- run GVFS_DEBUG=1 /usr/libexec/gvfsd -r
- run nautilus or gvfs-mount ftp://...
- optionally: gdb --pid `pidof gvfsd-ftp`
- start tests here
Comment 25 Calorì Alessandro 2009-07-30 14:31:36 UTC
Ok guys, I found the source of the evil: THE PORT!

As explained here [1], running an FTP server on a port other than 21 can cause problems on the client side if the client is firewalled or NAT-ed. It seems that my ISP's routers don't know how to handle FTP connections on non-standard ports, and this makes the control connection time out even though it's still a valid connection.

Now we should understand why other clients don't stop between file downloads, and fix this in gvfs' ftp backend. Maybe gvfsd's ftp backend needs a "keep-alive" function to keep the connection from dropping.

[1] http://www.ncftp.com/ncftpd/doc/misc/ftp_and_firewalls.html
Comment 26 Benjamin Otte (Company) 2009-07-30 15:49:29 UTC
Interesting.
This also gave me an idea: Could you do a wireshark log (only the data connection should be enough) with a different ftp client that works and upload it somewhere?
That way it should be easy to figure out if they do something different.
Comment 27 Calorì Alessandro 2009-07-30 18:17:42 UTC
Ok, I was able to get the same error with another client (NcFTP):

ncftp / > get foresight-2.1.1-x86_64-dvd1.iso foresight-2.1.1-x86-dvd1.iso  
foresight-2.1.1-x86_64-dvd1.iso:  ETA:   0:00    1,44/  1,44 GB  443,17 kB/s
Could not read reply from control connection -- timed out.
foresight-2.1.1-x86_64-dvd1.iso:                         1,44 GB  426,27 kB/s  
get foresight-2.1.1-x86_64-dvd1.iso: could not retrieve remote file.
get foresight-2.1.1-x86-dvd1.iso: remote host closed control connection.

It's definitely a timeout problem. Now I'm trying with Filezilla.
Comment 28 Calorì Alessandro 2009-07-30 20:13:14 UTC
The only difference between the gvfsd dump and FileZilla's is that FileZilla detects the control connection as "timed out" (FTP reply code 421) and opens a new one when it finishes the first download (right before the second data connection is opened).

The problem seems to be the following:
1) gvfsd opens a control connection;
2) gvfsd opens a data connection for the first file to transfer;
3) the control connection drops (connection timeout);
4) gvfsd finishes the first file and tries to write to the control connection socket (shouldn't it throw a "broken pipe" error?);
5) gvfsd waits indefinitely for the server response on a broken pipe.
Comment 29 Benjamin Otte (Company) 2009-07-30 21:36:51 UTC
Nope, that's not what happens. What happens is:
4) gvfsd waits for the "file transmitted" reply indefinitely.
Actually, it doesn't wait indefinitely, but only until the tcp connection times out (usually 10 minutes I think), but I think nobody wants to wait that long...

But then, even if gvfs caught the timeout signal properly, that still doesn't help, as it'd still generate a timeout error and not a success.
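As an illustration of the point above, the following Python sketch waits for the final reply with an explicit timeout instead of relying on the TCP stack to give up. It is a hypothetical helper, not gvfs code, and as noted it only turns a long hang into a quicker error; it cannot turn the missing reply into a success.

```python
import socket


def wait_for_final_reply(control, timeout_s):
    """Read one reply line from the control socket, giving up after timeout_s.

    Returns ("reply", line), ("timeout", None) or ("closed", None).
    ASSUMPTION: the reply fits in a single line; multi-line replies are
    ignored to keep the sketch short.
    """
    control.settimeout(timeout_s)
    buf = b""
    try:
        while b"\r\n" not in buf:
            chunk = control.recv(4096)
            if not chunk:
                return ("closed", None)  # peer shut the connection down
            buf += chunk
    except socket.timeout:
        return ("timeout", None)  # nothing arrived within timeout_s
    line = buf.split(b"\r\n", 1)[0]
    return ("reply", line.decode("ascii", "replace"))
```

With a NAT device that silently drops the idle control connection, the "timeout" branch is the one a client would hit after the data transfer finishes.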

We could of course add a workaround if it'd turn out that this is a common error on ftp servers, but I think it's a problem with your firewall, and should be fixed there.
Comment 30 Calorì Alessandro 2009-07-30 23:28:20 UTC
As I said I'm NAT-ed behind the ISP router, not mine, so I can actually do nothing about it.

The simplest workaround should just be a NOOP message from gvfsd to the FTP server every (say) five minutes just to keep the control connection alive while the data are being downloaded.
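The NOOP idea can be sketched in Python as a small scheduler that the transfer loop polls between data chunks. This is only an illustration of the proposal, not something gvfs implements; the class name, the `send_command` callable and the injectable clock are all assumptions made for the example.

```python
import time


class NoopKeepalive:
    """Decide when a NOOP is due on an otherwise idle control connection."""

    def __init__(self, interval_s=300.0, now=time.monotonic):
        # interval_s defaults to the "every five minutes" from the proposal.
        self._interval_s = interval_s
        self._now = now
        self._last_sent = now()

    def poll(self, send_command):
        """Call between data chunks; sends NOOP if the interval has elapsed.

        ASSUMPTION: send_command is a hypothetical callable that writes one
        command to the control connection. Returns True if NOOP was sent.
        """
        if self._now() - self._last_sent >= self._interval_s:
            send_command("NOOP")
            self._last_sent = self._now()
            return True
        return False
```

Note that the sketch only sends the command. A real client would also have to read the NOOP replies and keep them apart from the final transfer reply, which is where races can creep in.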
Comment 31 Benjamin Otte (Company) 2009-07-31 14:08:43 UTC
Could you still upload a log of a working FTP client, so I can see what they do?
I assume you have the "KeepAlive system" enabled there? (see http://filezilla.sourceforge.net/documentation/connection.htm )

I'm very unhappy about sending data down the control connection while an operation is in process, as that is prone to races, and I don't want to open another can of worms.

I think this problem would best be solved in lower layers than gvfs - like by instructing the kernel to always send keepalive packets. Unfortunately, that involves changing the Linux kernel source code. (yay!)
I guess what we could do is enable keepalive in the ftp client, and then you'd need to configure the keepalive kernel parameters to send updates after less than 10 minutes of inactivity (just before the router resets the connection). See http://tldp.org/HOWTO/html_single/TCP-Keepalive-HOWTO/ for how to do that.
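For illustration, enabling keepalive on a socket looks roughly like this in Python. This is a sketch of the general technique, not the gvfs change itself: `SO_KEEPALIVE` is the standard socket option, while the optional per-socket overrides (`TCP_KEEPIDLE`, `TCP_KEEPINTVL`, `TCP_KEEPCNT`) are a Linux-specific alternative to the system-wide kernel parameters and are an addition of this example.

```python
import socket


def enable_keepalive(sock, idle_s=None, interval_s=None, probes=None):
    """Turn on TCP keepalive for sock, optionally overriding kernel defaults."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket overrides below are Linux-specific; on platforms that
    # lack them the system-wide settings stay in effect.
    if idle_s is not None and hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if interval_s is not None and hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if probes is not None and hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
```

With only `SO_KEEPALIVE` set, the kernel uses its system-wide keepalive timings, which is why the kernel parameters still need tuning in that case.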
Comment 32 Benjamin Otte (Company) 2009-07-31 14:17:02 UTC
Created attachment 139627 [details] [review]
Bug 588187 – Large FTP transfers hang gvfsd-ftp

Enable keepalive on the command connection. See the comment in the
source code or the bug report for a detailed explanation.
Comment 33 Benjamin Otte (Company) 2009-07-31 14:20:19 UTC
Could you try applying this patch and configuring according to the howto I posted above and run the test again?

If it doesn't work, you should probably check with wireshark to see if a keepalive packet is sent down the control connection.

Once you report success, I'll apply the patch to git master.
Comment 34 Calorì Alessandro 2009-07-31 17:01:23 UTC
As explained on the FileZilla support forum [1], that client has a different design: it uses a primary control connection for browsing and a secondary one for file transfers. At the end of each download FileZilla checks whether the secondary connection is still available and reuses it; otherwise it reconnects to the server, so the user doesn't have to reconnect manually. In my FileZilla configuration the keepalive setting is not enabled, yet it works. I'm testing your patch right now and I'll make a complete wireshark dump ASAP.

[1] http://forum.filezilla-project.org/viewtopic.php?f=2&t=4834
Comment 35 Calorì Alessandro 2009-07-31 18:08:04 UTC
Ok, I used these settings system-wide:

net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_keepalive_intvl = 60
net.ipv4.tcp_keepalive_probes = 20
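As a quick sanity check of these values, the Python sketch below works out when the kernel sends the first probe and when it gives up on a peer that never answers. The function name is made up for this example, and the 600-second figure is the roughly ten-minute router timeout guessed earlier in this report, not a measured value.

```python
def keepalive_timeline(time_s, intvl_s, probes):
    """Return (first_probe_s, declared_dead_s) for an idle connection.

    The first probe goes out after time_s seconds of idleness; if none of
    the probes is answered, the connection is dropped intvl_s * probes
    seconds later.
    """
    return time_s, time_s + intvl_s * probes


# Values from the sysctl settings above.
first_probe, declared_dead = keepalive_timeline(300, 60, 20)

# ASSUMPTION: the NAT device forgets idle connections after about 600 s.
assumed_nat_timeout_s = 600
probe_arrives_in_time = first_probe < assumed_nat_timeout_s
```

With these settings the first probe is sent after 5 minutes, inside the assumed 10-minute window, and an unresponsive peer is given up on after 25 minutes.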

Well, it doesn't work yet but at least now gvfsd detects that the connection
isn't working. The last part of the gvfsd debug log contains the following
messages:

<- 0 --  421 Timeout di sessione (3600 secondi): chiusura della connessione di
controllo
send_reply(0x209fb20), failed=1 (Host closed connection)
backend_dbus_handler org.gtk.vfs.Mount:QueryInfo
Queued new job 0x209fa80 (GVfsJobQueryInfo)
send_reply(0x209fa80), failed=0 ()

I'm preparing the Filezilla dump.
Comment 36 Benjamin Otte (Company) 2009-07-31 18:52:16 UTC
I guess we could detect a closing of the control connection and ignore it when receiving the final reply. I'm not entirely convinced it's a good idea, but if FileZilla does that, too, it might be fine. 
We'd not detect broken transfers if the network connection went down though, because that'd time out the control connection and the data connection and then we'd ignore that and send an "ok". Hrm...
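The trade-off can be made explicit with a small Python sketch. It is purely hypothetical and not something proposed in this report: it assumes the client learned the file size beforehand (for example via the SIZE command the server advertises) and uses that to tell a finished download from a broken one when the final reply is missing.

```python
def transfer_outcome(final_code, bytes_received, expected_size):
    """Classify a download as "ok", "unknown" or "failed".

    final_code is the reply code seen on the control connection after the
    data connection closed, or None if the control connection was already
    gone. expected_size is None when the size is not known.
    ASSUMPTION: comparing against a previously obtained size is an idea
    added for this example, not gvfs behaviour.
    """
    if final_code == 226:
        return "ok"
    if final_code is None or final_code == 421:
        # Control connection timed out or closed: the data may still be fine.
        if expected_size is not None and bytes_received == expected_size:
            return "ok"
        return "unknown"
    return "failed"
```

Without a known size the sketch deliberately returns "unknown" rather than "ok", which mirrors the concern that a dropped network would otherwise be reported as a successful transfer.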
Comment 37 GNOME Infrastructure Team 2018-09-21 16:50:29 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/gvfs/issues/102.