GNOME Bugzilla – Bug 588187
Large FTP transfers hang gvfsd-ftp after the first completed file
Last modified: 2018-09-21 16:50:29 UTC
Please describe the problem:
When I transfer large files (>= 1GB) from my server, gvfsd tells me it gets an "invalid reply" from the server. At the end of the transfer the file isn't corrupted (I checked with an MD5 hash), so it's not a big issue, but it's quite annoying when I transfer multiple large files (it stops transferring).

Steps to reproduce:
1. Connect to an FTP server (with login; I didn't try anonymous transfers)
2. Select a large file to download
3. Wait...

Actual results:
The file is downloaded correctly but gvfs stops the following transfers (I have to cancel and redo the transfer).

Expected results:
gvfs should transfer all the files with no errors regardless of their size.

Does this happen every time?
Yes.

Other information:
I guessed it might be a problem with the distro I was using locally, because FileZilla and other clients don't show any error, so I moved from Gentoo (stable, GNOME 2.24) to Arch Linux (stable, GNOME 2.26), but the problem persists. I tried changing the FTP server from proftpd to pure-ftpd but I keep having the same issue.
Created attachment 138136 [details] gvfsd log This is the gvfsd log's tail.
Seems like you're triggering the "broken EPSV, fall back on PASV" workaround... This code has been changed during gvfs 1.3.x ... It would be great if you could test 1.3.x or even git master.... Please also provide the full gvfsd log so we can determine how many gvfsd-ftp processes are running in parallel.
Ok, first of all: what is the "broken EPSV, fall back on PASV" workaround? Can I work around this issue with the version I have right now (1.2.3)? I could also use a backport patch and test it.

I work on this system so I prefer staying "stable" as much as possible. On the other hand, I have a system I would like to use for testing GNOME (I would like to help GNOME developers and join the bug squad): which distro do you recommend that ships the testing version of GNOME? Do I need LFS?

Then, how can I provide the full gvfsd log? Do you mean the whole gvfsd console log?

Thank you for your patience.
(In reply to comment #3)
> Ok, first of all: what is the "broken EPSV, fall back on PASV" workaround? Can
> I work around this issue with the version I have right now (1.2.3)? I could
> also use a backport patch and test it.

If you fix the server it won't be used. Usually it's because the server is behind some kind of NAT device that doesn't understand EPSV and therefore doesn't open/redirect the port used for EPSV. So even when the server _says_ "epsv is ok" we need to actually check whether the connection succeeds, and if it doesn't, which looks to be the case for you, we use PASV instead (which many NAT devices do understand). It's still a bug in gvfs though, but as I said the code has been changed and it would be nice to know if the workaround is related at all to the problem or if we're chasing a bug which possibly doesn't exist anymore.

> I work on this system so I prefer staying "stable" as much as possible. On the
> other hand, I have a system I would like to use for testing GNOME (I would like
> to help GNOME developers and join the bug squad): which distro do you recommend
> that ships the testing version of GNOME? Do I need LFS?

Ubuntu (karmic) or Fedora (maybe rawhide is needed) should both have enough bleeding edge software. I don't have any particular recommendation. I think the gnome developer recommendation is to use "jhbuild" for building everything from scratch. I don't do that though...

> Then, how can I provide the full gvfsd log? Do you mean the whole gvfsd
> console log?

Yes, the snippet you posted was only part of it. It's hard to determine exactly what's going on without the full log from the entire FTP session. Please unmount, then start collecting the output and reconnect... It's easier to understand what's going on with the full log (i.e. how many simultaneous connections have been opened). The development version has improved logging for this too.

> Thank you for your patience.

No problem, thanks for following up on your report.
Pretty hard to fix issues when you can't reproduce them yourself. Followup from the people who can is very much needed.
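The EPSV-then-PASV fallback described in this comment can be sketched roughly as follows. This is a minimal Python illustration, not the gvfs code (which is written in C): `open_data_endpoint`, `send_command` and `can_connect` are hypothetical names introduced here, and only the 229 and 227 reply formats are taken from RFC 2428 and RFC 959.

```python
import re
from typing import Callable, Optional, Tuple

# RFC 2428 reply: "229 Entering Extended Passive Mode (|||6446|)"
_EPSV_RE = re.compile(r"\((.)\1\1(\d{1,5})\1\)")
# RFC 959 reply: "227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)"
_PASV_RE = re.compile(
    r"(\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3})"
)


def parse_epsv(reply: str) -> Optional[int]:
    """Return the data port from a 229 reply, or None if it is not one."""
    if not reply.startswith("229"):
        return None
    match = _EPSV_RE.search(reply)
    return int(match.group(2)) if match else None


def parse_pasv(reply: str) -> Optional[Tuple[str, int]]:
    """Return (host, port) from a 227 reply, or None if it is not one."""
    if not reply.startswith("227"):
        return None
    match = _PASV_RE.search(reply[3:])  # skip the reply code itself
    if match is None:
        return None
    h1, h2, h3, h4, p1, p2 = match.groups()
    return f"{h1}.{h2}.{h3}.{h4}", int(p1) * 256 + int(p2)


def open_data_endpoint(
    control_host: str,
    send_command: Callable[[str], str],      # hypothetical: sends a command, returns the reply
    can_connect: Callable[[str, int], bool],  # hypothetical: tries to open the data connection
) -> Optional[Tuple[str, int]]:
    """Try EPSV first; if the advertised port is unreachable, fall back to PASV."""
    port = parse_epsv(send_command("EPSV"))
    if port is not None and can_connect(control_host, port):
        return control_host, port
    # The server accepted EPSV but the connection failed (for example a NAT
    # device that does not track EPSV), or EPSV is unsupported: use PASV.
    return parse_pasv(send_command("PASV"))
```

The point of the sketch is that a positive EPSV reply alone is not trusted: the endpoint is only used once a connection to it has actually succeeded.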
(In reply to comment #4)
> If you fix the server it won't be used. Usually it's because the server is
> behind some kind of NAT device that doesn't understand EPSV and opens/redirects
> the port used for EPSV. So even when the server _says_ "epsv is ok" we need to
> actually check if the connection succeeds and if it doesn't, which looks to be
> the case for you, we use PASV instead (which many NAT devices do understand).
> It's still a bug in gvfs though, but as I said the code has been changed and it
> would be nice to know if the workaround is related at all to the problem or if
> we're chasing a bug which possibly doesn't exist anymore.

Ok, I'm going to attach the full GVFS log. However, I did other tests and the EPSV command does work. I own a public server without any NAT; the client, on the other hand, is behind a NAT (my ISP has a private MAN network).

> Ubuntu (karmic) or Fedora (maybe rawhide is needed) should both have enough
> bleeding edge software. I don't have any particular recommendation. I think the
> gnome developer recommendation is to use "jhbuild" for building everything from
> scratch. I don't do that though...

I use Arch Linux, which is fairly bleeding edge and stable, but it only provides the latest stable version and the packaging is quite bad (no debug symbols and no versioning). I'm going to learn how to use jhbuild and try to make it work on a basic system.

> No problem, thanks for following up on your report. Pretty hard to fix issues
> when you can't reproduce them yourself. Followup from the people who can is
> very much needed.

If you want, I can provide you with a test account on my server and copy some distro images into the home dir.
Created attachment 138399 [details] Complete gvfsd 1.2.3 log
Tested with gvfs 1.3.2. With the development version the download just hangs after the end of the first file with no error messages. Unfortunately I wasn't able to get a log because "gvfsd -r" doesn't print anything on the console now. You said that "the development version has improved logging" but how can I enable it?
(In reply to comment #7) > how can I enable it? The verbose mode is disabled by default nowadays.... try: GVFS_DEBUG=1 gvfsd -r
Created attachment 138543 [details] Complete gvfsd 1.3.2 As I said before, it simply gets stuck after the end of the first download.
In the server's syslog I only found:
pure-ftpd: (alessandro@blackstone) [NOTICE] /home/alessandro//foresight-2.1.1-x86_64-dvd1.iso downloaded (1547386880 bytes, 280.68KB/sec)
pure-ftpd: (alessandro@blackstone) [INFO] Timeout - try typing a little faster next time
What's the exact error message, just "Invalid reply"?
Yes, with GNOME 2.26 gvfsd finishes the first download then it tells me that an error occurred; the details just contain "Invalid reply". On the other hand, with GNOME 2.27 gvfsd just hangs with no error messages.
Could you attach a debug log and a "thr a a bt" from gdb of the 2.27 gvfsd-ftp when the deadlock happens? I don't have a clue as to why this would deadlock from just reading the code. I'm still looking for pure-ftpd installations on the web so I can test this myself, but I'm not finding any. They're all using vsftpd it seems...
(In reply to comment #13)
> Could you attach a debug log and a "thr a a bt" from gdb of the 2.27 gvfsd-ftp
> when the deadlock happens?

Isn't the one I provided a debug log? By the way, can you explain exactly how I can provide the other "thing" you need?

> I'm still looking for pure-ftpd installations on the web so I can test this
> myself, but I'm not finding any. They're all using vsftpd it seems...

As I said before, I tried both proftpd and pure-ftpd but not vsftpd. I'm going to try that one too. I can also provide you with a pure-ftpd account on my server if you need to debug the ftp backend.
Tried vsftpd also... same issue. It's definitely something wrong with the ftp backend because I can download with no problems at all with other clients (e.g. FileZilla). If you need the account let me know.
(In reply to comment #15)
> Tried vsftpd also... same issue. It's definitely something wrong with the ftp
> backend because I can download with no problems at all with other clients (i.e.
> FileZilla).

I've tried vsftpd and pure-ftpd locally... different configurations, but I can't reproduce your problem. I've discussed it with Benjamin Otte on IRC and I think we're both pretty much clueless about what the issue might be. Any extra info you can provide is very valuable.

There are two things which could give more hints. The first is a gdb backtrace from when gvfsd-ftp "freezes". You can launch gvfsd-ftp directly if you give "host=foo user=bar password=quux" as arguments. Please reproduce under gdb, press control-C and then type the command Benjamin Otte provided before. (gdb gvfsd-ftp ... set args host=foo user=bar password=quux .... run .... <reproduce problem>... control-c ... thread apply all bt ... )

Another useful thing would be to verify that the server actually sends the "226" transfer completed message on the control channel. This could be verified with wireshark. The client doesn't seem to receive it, at least in the debug log you sent (but it does in my tests)....

> If you need the account let me know.

Please send account details to the email addresses listed under CC of this bug report (me and otte).
So, I re-enabled proftpd since the problem doesn't seem to be server-related, and I disabled 'EPSV' support on the server (only PASV). Unfortunately the problem persists, so my ISP's NAT should be excluded from the list of culprits (FileZilla only uses PASV and it works). I also tried to run gvfsd-ftp from the command line: it connected to the server and then shut down. This is the log:
Added new job source 0x10ef040 (GVfsBackendFtp)
Queued new job 0x10f0000 (GVfsJobMount)
<- 0 -- 220 ProFTPD 1.3.2 Server [::ffff:94.23.49.188]
-- 0 -> FEAT
<- 0 -- 211-Funzionalità: MDTM MFMT LANG it-IT UTF8 AUTH TLS MFF modify;UNIX.group;UNIX.mode; MLST modify*;perm*;size*;type*;unique*;UNIX.group*;UNIX.mode*;UNIX.owner*; PBSZ PROT REST STREAM MODE Z SIZE
<- 0 -- 211 Fine
send_reply, failed: 1
Mount failed: Password dialog cancelled
(The server replies are in Italian: "Funzionalità" means "Features" and "Fine" means "End".)
What does it mean? I provided the host, the port, the username and the password as parameters...
It means that passwords given on the command line are ignored by gvfsd-ftp. So it tries to ask for them and fails. I've downloaded files from your server twice and it worked fine every time. I've found (and fixed) different issues, but could not make out any problem with your ftp server. FWIW, the last download took 29m57.065s, so it should definitely have hit the problem if it were going to, but it did not. I'm still testing various things, but so far nothing suspicious is happening.
(In reply to comment #18)
> It means that passwords given on the command line are ignored by gvfsd-ftp. So
> it tries to ask for them and fails.

How can I make it work?

> I've downloaded files from your server twice and it worked fine every time.
> I've found (and fixed) different issues, but could not make out any problem
> with your ftp server.

I don't have any problem with other clients either, only with gvfs. Tomorrow I'm going to try to dump the stream with wireshark, hoping it can provide useful information about this issue.
I tried to upload a big file with gvfsd, I aborted the process after a while (~10 minutes) and then I tried to unmount the connection. The result: DBus error org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken. Hope it helps out...
Is there any news about this bug?
No, unfortunately there isn't, at least not from my side. I downloaded stuff fine from your ftp server, so it doesn't seem to happen here. I looked at your wireshark dump and it looks like the server just stops sending stuff for no reason I can make out. I'm all out of ideas for what could cause this. Maybe someone else knows...
Is there anything else I can provide? I would like to test gvfsd-ftp directly in debug mode but as I stated in comment #17 it doesn't work as expected.
I do debug mode like this:
- install debug gvfs to /usr
- run GVFS_DEBUG=1 /usr/libexec/gvfsd -r
- run nautilus or gvfs-mount ftp://...
- optionally: gdb --pid `pidof gvfsd-ftp`
- start tests here
Ok guys, I found the source of the evil: THE PORT! As explained here [1], running an FTP server on a port other than 21 can cause problems on the client side if the client is firewalled or NAT-ed. It seems that my ISP's routers don't know how to handle FTP connections on non-standard ports, and this causes the connection to time out even if it's still a valid connection. Now we should understand why other clients don't stop between file downloads and fix this in gvfs' ftp backend. Maybe gvfsd's ftp backend needs a "keep-alive" function to keep the connection from dropping. [1] http://www.ncftp.com/ncftpd/doc/misc/ftp_and_firewalls.html
Interesting. This also gave me an idea: could you do a wireshark log (only the data connection should be enough) with a different ftp client that works and upload it somewhere? That way it should be easy to figure out if they do something different.
Ok, I was able to get the same error with another client (NcFTP):
ncftp / > get foresight-2.1.1-x86_64-dvd1.iso foresight-2.1.1-x86-dvd1.iso
foresight-2.1.1-x86_64-dvd1.iso: ETA: 0:00 1,44/ 1,44 GB 443,17 kB/s
Could not read reply from control connection -- timed out.
foresight-2.1.1-x86_64-dvd1.iso: 1,44 GB 426,27 kB/s
get foresight-2.1.1-x86_64-dvd1.iso: could not retrieve remote file.
get foresight-2.1.1-x86-dvd1.iso: remote host closed control connection.
It's definitely a timeout problem. Now I'm trying with Filezilla.
The only difference between the gvfsd dump and the Filezilla one is that Filezilla detects the control connection as "timed out" (FTP error code 421) and opens a new one when it finishes the first download (right before the second data connection is opened). The problem seems to be the following:
1) gvfsd opens a control connection;
2) gvfsd opens a data connection for the first file to transfer;
3) the control connection drops (connection timeout);
4) gvfsd finishes the first file and tries to write to the control connection socket (shouldn't it throw a "broken pipe" error?);
5) gvfsd waits indefinitely for the server response on a broken pipe.
Nope, that's not what happens. What happens is: 4) gvfsd waits for the "file transmitted" reply indefinitely. Actually, it doesn't wait indefinitely, but only until the tcp connection times out (usually 10 minutes I think), but I think nobody wants to wait that long... But then, even if gvfs caught the timeout signal properly, that still doesn't help, as it'd still generate a timeout error and not a success. We could of course add a workaround if it'd turn out that this is a common error on ftp servers, but I think it's a problem with your firewall, and should be fixed there.
As I said I'm NAT-ed behind the ISP router, not mine, so I can actually do nothing about it. The simplest workaround should just be a NOOP message from gvfsd to the FTP server every (say) five minutes just to keep the control connection alive while the data are being downloaded.
Could you still upload a log of a working FTP client, so I can see what they do? I assume you have the "KeepAlive system" enabled there? (see http://filezilla.sourceforge.net/documentation/connection.htm ) I'm very unhappy about sending data down the control connection while an operation is in progress, as that is prone to races, and I don't want to open another can of worms. I think this problem would best be solved in lower layers than gvfs, like by instructing the kernel to always send keepalive packets. Luckily, that doesn't involve changing the Linux kernel source code. (yay!) I guess what we could do is enable keepalive in the ftp client, and then you'd need to configure the keepalive kernel parameters to send updates after less than 10 minutes of inactivity (just before the router resets the connection). See http://tldp.org/HOWTO/html_single/TCP-Keepalive-HOWTO/ for how to do that.
Created attachment 139627 [details] [review] Bug 588187 – Large FTP transfers hang gvfsd-ftp Enable keepalive on the command connection. See the comment in the source code or the bug report for a detailed explanation.
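For readers who want to see what "enable keepalive on the command connection" amounts to at the socket level, here is a minimal Python sketch. It is not the attached patch (gvfs is written in C); `enable_keepalive` is a hypothetical helper, and the optional per-socket overrides are an addition that only takes effect on platforms where Python exposes `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` (Linux does). Without overrides, the system-wide settings from the howto apply.

```python
import socket
from typing import Optional


def enable_keepalive(
    sock: socket.socket,
    idle: Optional[int] = None,      # seconds of silence before the first probe
    interval: Optional[int] = None,  # seconds between unanswered probes
    count: Optional[int] = None,     # unanswered probes before giving up
) -> None:
    """Ask the kernel to send TCP keepalive probes on an idle connection."""
    # The portable part: switch keepalive on for this socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Optional per-socket timing overrides, applied only where available.
    overrides = (
        ("TCP_KEEPIDLE", idle),
        ("TCP_KEEPINTVL", interval),
        ("TCP_KEEPCNT", count),
    )
    for name, value in overrides:
        if value is not None and hasattr(socket, name):
            sock.setsockopt(socket.IPPROTO_TCP, getattr(socket, name), value)


if __name__ == "__main__":
    # Stand-in for the FTP control connection; no network traffic is sent.
    control = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(control)
    print("keepalive enabled:",
          control.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    control.close()
```

Keepalive probes are sent by the kernel below the FTP protocol, so nothing is written to the control channel itself while a transfer is in progress.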
Could you try applying this patch, configuring the kernel according to the howto I posted above, and running the test again? If it doesn't work, you should probably check with wireshark to see if a keepalive packet is sent down the control connection. Once you report success, I'll apply the patch to git master.
As explained on the Filezilla support forum [1], that client has a different design: it uses a primary control connection for browsing and a secondary one for file transfers. At the end of each download Filezilla checks whether the secondary connection is still available and reuses it; otherwise it reconnects to the server, so the user doesn't have to reconnect manually. In my Filezilla configuration I don't have the keepalive setting enabled, but it does work. I'm testing your patch right now and I'll make a complete wireshark dump ASAP. [1] http://forum.filezilla-project.org/viewtopic.php?f=2&t=4834
Ok, I used these settings system-wide:
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_keepalive_intvl = 60
net.ipv4.tcp_keepalive_probes = 20
Well, it doesn't work yet, but at least now gvfsd detects that the connection isn't working. The last part of the gvfsd debug log contains the following messages:
<- 0 -- 421 Timeout di sessione (3600 secondi): chiusura della connessione di controllo
send_reply(0x209fb20), failed=1 (Host closed connection)
backend_dbus_handler org.gtk.vfs.Mount:QueryInfo
Queued new job 0x209fa80 (GVfsJobQueryInfo)
send_reply(0x209fa80), failed=0 ()
(The 421 message is Italian for "Session timeout (3600 seconds): closing the control connection".)
I'm preparing the Filezilla dump.
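As a quick check of what those three values mean in practice, the arithmetic can be written out as below. This is a small Python sketch using the usual Linux semantics of the parameters; `keepalive_timeline` is a hypothetical helper name.

```python
from typing import Tuple


def keepalive_timeline(idle_s: int, interval_s: int, probes: int) -> Tuple[int, int]:
    """Return (seconds of idle time before the first probe,
    seconds of idle time before an unresponsive peer is declared dead)."""
    first_probe = idle_s
    declared_dead = idle_s + interval_s * probes
    return first_probe, declared_dead


if __name__ == "__main__":
    # Values from the comment above:
    # tcp_keepalive_time=300, tcp_keepalive_intvl=60, tcp_keepalive_probes=20
    first, dead = keepalive_timeline(300, 60, 20)
    print(f"first probe after {first} s, peer declared dead after {dead} s")
```

With these settings the first probe goes out after 5 minutes of silence, which is below the roughly 10 minute router timeout guessed earlier in this thread, and a peer that never answers is given up on after 300 + 60 * 20 = 1500 seconds (25 minutes).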
I guess we could detect a closing of the control connection and ignore it when receiving the final reply. I'm not entirely convinced it's a good idea, but if Filezilla does that, too, it might be fine. We'd not detect broken transfers if the network connection went down though, because that'd timeout the control connection and the data connection and then we'd ignore that and send an "ok". Hrm...
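The idea above, together with the concern about masking broken transfers, can be sketched as a small decision function. This is only an illustration in Python, not gvfs code: `transfer_succeeded` is a hypothetical name, and comparing the received byte count against an expected size (for example from the SIZE command, which the server's FEAT reply lists) is an assumption added here as one possible safeguard.

```python
from typing import Optional


def transfer_succeeded(
    final_reply: Optional[str],   # last control-channel line, or None if the connection closed
    bytes_received: int,          # bytes read from the data connection
    expected_size: Optional[int], # size reported by the server beforehand, if known
) -> bool:
    """Decide whether a download should be reported as successful."""
    code = final_reply[:3] if final_reply else None

    # Normal case: the server confirmed the transfer on the control channel.
    if code == "226":
        return True

    # Control connection lost (closed, or "421" timeout notice) before the
    # confirmation arrived. Only treat this as success when the data
    # connection demonstrably delivered the whole file.
    control_lost = code is None or code == "421"
    if control_lost and expected_size is not None:
        return bytes_received == expected_size

    # Any other reply, or no way to verify completeness: report an error.
    return False
```

Under this sketch a network outage that cuts both connections mid-file would still be reported as an error, because the byte count would fall short of the expected size.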
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/gvfs/issues/102.