GNOME Bugzilla – Bug 574968
gvfs ftp backend appears to not wait for ftpd return code on STOR
Last modified: 2018-09-21 16:39:13 UTC
Please describe the problem: It appears that the gvfs fuse ftp backend caching is broken in some unknown (to me) way. If you try to create a file it tells you it failed and then doesn't show it in the directory until you issue a 'ls' for the entire directory. This causes applications to think they have failed to write. In openoffice.org's case it writes a lock file first and thinks it has failed so does not write the document itself and just gives an error complaining that it could not save to the ftp location. Steps to reproduce: $ touch foo; ls foo; ls | grep foo touch: cannot touch `foo': No such file or directory ls: cannot access foo: No such file or directory foo Actual results: touch claims to have failed, ls foo shows nothing, but ls | grep foo then shows the file Expected results: touch should work, ls foo should work Does this happen every time? yes Other information: gvfs 1.1.8 in Ubuntu Jaunty (9.04)
Created attachment 130468 [details] log of ftp transaction I did the following during this log: mounted the ftp location cd into the .gvfs/path/ then ran: "rm foo; touch foo; ls foo; ls | grep foo" The output was: rm: cannot remove `foo': No such file or directory touch: cannot touch `foo': No such file or directory ls: cannot access foo: No such file or directory foo Note: I did the rm just to make sure the file wasn't already there...
Not reproducible on vsftpd, Chris gave me an account on his pure-ftpd server for testing where I could see the problem. Conclusion: The server is lying to us. It's visible in the previously attached debug trace, STOR /foo -> ok -> LIST -> list doesn't include foo. Maybe gvfs can work around this somehow, but ultimately it's likely a pure-ftpd bug.
Andreas, This still doesn't explain the issue between "ls -l foo" and "ls -l | grep foo" does the "ls -l | grep foo" update the cache where "ls -l foo" does not? I have tried before issueing "ls -l foo" repeatedly and it always said the file did not exist, but as soon as I do a "ls -l | grep foo" it shows up, even immediately after touching the file. eg for i in `seq 1 20` ; do (rm foo; touch foo; ls -l foo) ; done failed every time, but for i in `seq 1 20` ; do (rm foo; touch foo; ls -l | grep foo) ; done passed every time. If "ls -l foo" doesn't update the cache then perhaps it is a timing issue of some sort? I don't know what is going on the backend of my webhost ftp server but it might be scanning the uploaded file before displaying it or something similarly odd.
Andreas just said this on IRC: 13:10 < fatal_> seems like the server stores new files as .pureftpd-upload.<random> -> returns success when transfer completes (which makes gvfs request a list) -> then renames the .pureftpd-upload.<random> file to it's proper name. 13:12 < fatal_> the upload operation isn't "atomic" which means the client can't be too fast requesting a file list after upload. 13:12 < fatal_> pure-ftpd should delay sending the success reply until after it has renamed the file IMHO.
This bug appears to be a bug in pureftpd not in gvfs. Closing invalid.
Actually I just looked at the log I attached here again... This is the same issue as bug 577160 (I think) in that the ftp backend is checking on the status of the file before the ftpd server returns that the file is properly uploaded.
Notice this part of the log, it did not "226 File successfully transferred" until after the ftp backend already did a NOOP, a CWD, and a LIST: <-- 229 Extended Passive mode OK (|||33536|) --> STOR /foo <-- 150 Accepted data connection --> NOOP <-- 200 Zzz... --> CWD / <-- 250 OK. Current directory is / --> EPSV <-- 229 Extended Passive mode OK (|||40264|) --> LIST -a <-- 150 Accepted data connection --- drwx--x--x 15 cheney cheney 4096 Mar 11 12:46 . --- drwx--x--x 15 cheney cheney 4096 Mar 11 12:46 .. --- -rw-r--r-- 1 cheney cheney 24 Mar 22 2007 .bash_logout --- -rw-r--r-- 1 cheney cheney 191 Mar 22 2007 .bash_profile --- -rw-r--r-- 1 cheney cheney 124 Mar 22 2007 .bashrc --- -rw------- 1 cheney cheney 22 Aug 27 2008 .contactemail --- drwxr-xr-x 5 cheney cheney 4096 Aug 30 2008 .cpanel --- drwx------ 2 cheney cheney 4096 Aug 9 2007 .cpanel-datastore --- -rw------- 1 cheney cheney 459733 Apr 9 2007 .cpanel-ducache --- -rw-r--r-- 1 cheney cheney 383 Mar 22 2007 .emacs --- drwxr-xr-x 5 cheney cheney 4096 Mar 26 2007 .fantasticodata --- -rw------- 1 cheney cheney 17 Mar 1 22:04 .ftpquota --- -rw-r--r-- 1 cheney cheney 26 Aug 11 2008 .htpasswd.mbw080908 --- -rw-r--r-- 1 cheney cheney 25 Aug 12 2008 .htpasswd.mbw081008 --- drwxr-xr-x 2 cheney cheney 4096 Mar 27 2007 .htpasswds --- -rw------- 1 cheney cheney 14 Mar 1 23:54 .lastlogin --- -rw-r--r-- 1 cheney cheney 0 Mar 11 12:46 .pureftpd-upload.49b7eacc.15.4f11.31aa1ec4 --- drwx------ 2 cheney cheney 4096 Mar 11 12:16 .spamassassin --- -rw-r--r-- 1 cheney cheney 0 Mar 22 2007 .spamassassinenable --- drwx------ 2 cheney cheney 4096 May 2 2008 .sqmaildata --- drwx------ 3 cheney cheney 4096 Apr 4 2007 .trash --- lrwxrwxrwx 1 cheney cheney 32 Sep 4 2007 access-logs -> /usr/local/apache/domlogs/cheney --- drwxr-xr-x 3 cheney cheney 4096 Mar 27 2007 data --- drwxr-x--- 9 cheney 12 4096 Jul 4 2008 etc --- -rw-rw-rw- 1 cheney cheney 0 Mar 28 2007 gallery2.logfile --- drwxrwx--- 18 cheney cheney 4096 Mar 1 23:54 mail --- drwxr-xr-x 3 cheney cheney 4096 Mar 22 2007 public_ftp --- drwxr-x--- 18 cheney 99 4096 Dec 10 12:11 public_html --- drwx------ 7 cheney cheney 4096 Mar 1 23:51 tmp --- lrwxrwxrwx 1 cheney cheney 11 Mar 22 2007 www -> public_html <-- 226-ASCII <-- 226-Options: -a -l <-- 226 30 matches total <-- 226 File successfully transferred
According to the ftp spec I am looking at you aren't supposed to do anything else after receiving a 1xx (eg 150) code until you get another code back... That would include the NOOP code that is being sent currently.
*** Bug 577160 has been marked as a duplicate of this bug. ***
Someone probably needs to audit the ftp backend to see where else it goes wrong with respect to following the specification.
See page 37 of rfc 959 " 1yz Positive Preliminary reply The requested action is being initiated; expect another reply before proceeding with a new command. (The user-process sending another command before the completion reply would be in violation of protocol; but server-FTP processes should queue any commands that arrive while a preceding command is in progress.) This type of reply can be used to indicate that the command was accepted and the user-process may now pay attention to the data connections, for implementations where simultaneous monitoring is difficult. The server-FTP process may send at most, one 1yz reply per command. " http://www.ietf.org/rfc/rfc959.txt So it seems that ftpd should not respond to further requests but that a user process issuing further requests is a violation of the protocol to begin with.
This is happening becaue the ftp backend is multithreaded and we do not list the thread id in the debug output. Every thread uses a separate connection to the ftp server. The storing thread does not do anything until it has gotten the 226 response, but another thread is doing the querying. So everything should be fine here. If you would limit the ftp server to oly one connection (or edited gvfs' Makefile.am to only allow 1 thread in the ftp daemon) you'd see that any operation on the ftp mount will return G_IO_ERROR_WOULD_BLOCK while the write still goes on. It's also bad that the debug output doesn't list job start/finish anymore since we got the --debug flag, otherwise we could follow requests coming in and going out more easily.
There has been some discussions on IRC and I'll try to summarize, the two separate issues we think are happening: * STOR vs creat() semantics The gvfsd-ftp backend clears its list cache directly after sending STOR, but some servers (read: pure-ftpd) stores the new file under a temporary filename until the entire transfer is finished. This could be solved by having the do_close_write() function clear the cache. This one is a minor issue, triggered by the issue below. * Multiple threads issue. As described by Company in comment #12; something is making gvfsd-ftp open up a (separate!) second connection, where the list happens (before the STOR is complete). They should happen after each other on the same connection, not in parallell. No solid ideas who or why yet... some ideas includes pointing fingers at the fuse daemon, but noone knows for sure. Would be great if the fuse daemon could be excluded from this equation by using the gvfs-save and gvfs-ls commands (instead of ~/.gvfs/).
I wrote a test program that appears to show the same problem. It does the following: open() write() fsync() fstat() close() It completes the entire sequence long before the file is uploaded, pretty much instantaneously even when the ftpd is rate limited to take ~ 60s for the upload to complete, and without any errors thrown. So it appears that anything that wants to check on the status of the file it wrote will end up breaking. This is because the fstat() call completes long before the write really finishes, since the calls are threaded and don't block as they probably should for posix compliance. I will test the gvfs-save/gvfs-ls commands now to see if they have similar issues.
when using the following: cat foo.txt | gvfs-save ftp://somepath ; gvfs-ls ftp://somepath It blocks gvfs-ls until gvfs-save actually has completely finished uploading the file. You can of course run gvfs-ls in another terminal and view the fact the file isn't done uploading.
Andreas Henriksson, I seem to not have mentioned that in the dupe bug report that I tested also with vsftpd so this is definitely not just a bug with pureftpd. The testing I have been doing over the past two days have been with using a vsftpd server. It seems the problem is primarily due to the multi-threading issue of executing fstat() (eg LIST in ftp backend case) immediately after the write() (eg STOR in ftp backend case) call, before the write() is even actually done on the backend side, which causes a program checking the results from fstat() to think that the write has failed when in fact it just hasn't finished yet.
Created attachment 131645 [details] test writing program This program along with using vsftpd with "local_max_rate=1024" should make it very easy to reproduce the problem. As it takes ~ 60s to write the file when that option is set.
As I noted on IRC, I think the close() and/or fsync() codepaths should also be checking that the file uploaded successfully as well before returning to the user app. Otherwise the user app has no way of knowing if the write completed successfully.
Andreas mentioned that close() should be doing this already but that fuse is breaking it somehow.
Created attachment 131862 [details] [review] gvfs-fuse-support-flush.patch Patch to support the FUSE flush() method.
This seems to have done something useful as the test program took longer to complete but there is also some other problem here because afterwards ls does not work any longer. ccheney@x200:~/.gvfs/ftp as ccheney on 127.0.0.1/home/ccheney/Desktop$ ~/test-write ccheney@x200:~/.gvfs/ftp as ccheney on 127.0.0.1/home/ccheney/Desktop$ ls ls: reading directory .: Input/output error
I also am not sure if this would solve the problem unless fstat() reports the information from cache instead of the actual server. It seems to only take a long time before close() returns now, so afaict it would seem that fstat() probably still reports what the ftp server thinks of the situation. And there is still the issue of the mount is broken after the file is done writing (eg ls fails). ccheney@x200:~/.gvfs/ftp as ccheney on 127.0.0.1/home/ccheney/Desktop$ ~/test-write calloc'ing memory open'ing file write'ing file fsync'ing file fstat'ing file close'ing file (takes a long time here)
If the mount breaks, that means gvfs-fuse-daemon crashed. To debug this, - Make sure you have debugging info installed/compiled in. - Killall gvfs-fuse-daemon. - gdb /usr/lib/gvfs/gvfs-fuse-daemon - (gdb) run -d ~/.gvfs - Provoke the crash in another terminal. - You should get the (gdb) prompt. - (gdb) t a a bt - Attach the output to this bug. It'd be nice to have the spew leading up to the crash too.
Also, I've got an additional patch that should make it wait on fsync() properly. You may want to try that after getting the crash trace (I haven't been able to make it crash here).
Created attachment 131891 [details] [review] gvfs-fuse-support-fsync.patch
Created attachment 131892 [details] [review] gvfs-fuse-sync-data.patch This diff from SVN trunk combines the previous two patches and adds extra logic that likely fixes your crash too.
Thanks, I'll try to find time tomorrow (Friday) to test out the new patch.
This still seems to give input/output error, and appears to be part of gvfs 1.2.1, as I compared the patch and it seemed to be already applied... " ccheney@x200:~/.gvfs/ftp as ccheney on 127.0.0.1/home/ccheney/Desktop$ ~/test-write calloc'ing memory open'ing file write'ing file fsync'ing file fstat'ing file close'ing file ccheney@x200:~/.gvfs/ftp as ccheney on 127.0.0.1/home/ccheney/Desktop$ ls -atrl ls: reading directory .: Input/output error total 0 "
Oh but I did notice it seemed to pause at fsync this time, it also seemed to pause at fstat but that might have been due to the io error, but I am uncertain.
When attempting to get a backtrace via gdb it doesn't seem to crash any longer, maybe some sort of race/thread issue?
I spoke too soon, I still got an io error but no crash when I changed up to .../home/ccheney from .../home/ccheney/Desktop. The log in gdb shows this, but no crash, note this is not the entire log, just the part that looked relevant. unique: 167, opcode: READDIR (28), nodeid: 29, insize: 80 unique: 167, error: 0 (Success), outsize: 16 unique: 168, opcode: RELEASEDIR (29), nodeid: 29, insize: 64 unique: 168, error: 0 (Success), outsize: 16 unique: 169, opcode: GETATTR (3), nodeid: 1, insize: 56 unique: 169, error: 0 (Success), outsize: 112 unique: 170, opcode: LOOKUP (1), nodeid: 1, insize: 68 LOOKUP /ftp as ccheney on 127.0.0.1 NODEID: 2 unique: 170, error: 0 (Success), outsize: 136 unique: 171, opcode: LOOKUP (1), nodeid: 2, insize: 45 LOOKUP /ftp as ccheney on 127.0.0.1/home NODEID: 7 unique: 171, error: 0 (Success), outsize: 136 unique: 172, opcode: LOOKUP (1), nodeid: 7, insize: 48 LOOKUP /ftp as ccheney on 127.0.0.1/home/ccheney NODEID: 28 unique: 172, error: 0 (Success), outsize: 136 unique: 173, opcode: LOOKUP (1), nodeid: 28, insize: 48 LOOKUP /ftp as ccheney on 127.0.0.1/home/ccheney/Desktop NODEID: 29 unique: 173, error: 0 (Success), outsize: 136 unique: 174, opcode: ACCESS (34), nodeid: 28, insize: 48 ACCESS /ftp as ccheney on 127.0.0.1/home/ccheney 01 unique: 174, error: 0 (Success), outsize: 16 unique: 175, opcode: OPENDIR (27), nodeid: 28, insize: 48 unique: 175, error: 0 (Success), outsize: 32 unique: 176, opcode: GETATTR (3), nodeid: 28, insize: 56 unique: 176, error: 0 (Success), outsize: 112 unique: 177, opcode: READDIR (28), nodeid: 28, insize: 80 unique: 177, error: -5 (Input/output error), outsize: 16 unique: 178, opcode: RELEASEDIR (29), nodeid: 28, insize: 64 unique: 178, error: 0 (Success), outsize: 16
Pinging about this bug again...
*** Bug 585992 has been marked as a duplicate of this bug. ***
Hello, this bug exists with Pure-FTPd FTP server with MySQL user authentication Version 1.0.21-11.4. I hope this helps. Best regards from germany, Peter Majmesku
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/gvfs/issues/82.