Bug 574968 – gvfs ftp backend appears to not wait for ftpd return code on STOR

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 574968 - gvfs ftp backend appears to not wait for ftpd return code on STOR


Summary:	gvfs ftp backend appears to not wait for ftpd return code on STOR


Status:	RESOLVED OBSOLETE

Product:	gvfs
Classification:	Core
Component:	fuse
Version:	1.2.x
Hardware:	Other All

Importance:	Normal critical
Target Milestone:	---
Assigned To:	Hans Petter Jansson
QA Contact:	gvfs-maint

URL:
Whiteboard:

Duplicates:	577160 585992 (view as bug list)
Depends on:
Blocks:

Reported:	2009-03-11 16:40 UTC by Chris
Modified:	2018-09-21 16:39 UTC

See Also:
GNOME target:	---
GNOME version:	2.25/2.26

Attachments
log of ftp transaction (9.70 KB, text/plain) 2009-03-11 16:51 UTC, Chris		Details
test writing program (825 bytes, text/plain) 2009-03-29 20:23 UTC, Chris		Details
gvfs-fuse-support-flush.patch (619 bytes, patch) 2009-04-01 19:33 UTC, Hans Petter Jansson	committed	Details \| Review
gvfs-fuse-support-fsync.patch (988 bytes, patch) 2009-04-02 04:42 UTC, Hans Petter Jansson	committed	Details \| Review
gvfs-fuse-sync-data.patch (2.82 KB, patch) 2009-04-02 04:59 UTC, Hans Petter Jansson	committed	Details \| Review

Description Chris 2009-03-11 16:40:30 UTC

Please describe the problem:
It appears that the gvfs fuse ftp backend caching is broken in some unknown (to me) way. If you try to create a file it tells you it failed and then doesn't show it in the directory until you issue a 'ls' for the entire directory.

This causes applications to think they have failed to write. In openoffice.org's case it writes a lock file first and thinks it has failed so does not write the document itself and just gives an error complaining that it could not save to the ftp location.

Steps to reproduce:
$ touch foo; ls foo; ls | grep foo
touch: cannot touch `foo': No such file or directory
ls: cannot access foo: No such file or directory
foo


Actual results:
touch claims to have failed, ls foo shows nothing, but ls | grep foo then shows the file

Expected results:
touch should work, ls foo should work

Does this happen every time?
yes

Other information:
gvfs 1.1.8 in Ubuntu Jaunty (9.04)

Comment 1 Chris 2009-03-11 16:51:15 UTC

Created attachment 130468 [details]
log of ftp transaction

I did the following during this log:

mounted the ftp location
cd into the .gvfs/path/
then ran: "rm foo; touch foo; ls foo; ls | grep foo"
The output was:
rm: cannot remove `foo': No such file or directory
touch: cannot touch `foo': No such file or directory
ls: cannot access foo: No such file or directory
foo

Note: I did the rm just to make sure the file wasn't already there...

Comment 2 Andreas Henriksson 2009-03-13 18:01:20 UTC

Not reproducible on vsftpd, Chris gave me an account on his pure-ftpd server for testing where I could see the problem.

Conclusion: The server is lying to us. It's visible in the previously attached debug trace, STOR /foo -> ok -> LIST -> list doesn't include foo.

Maybe gvfs can work around this somehow, but ultimately it's likely a pure-ftpd bug.

Comment 3 Chris 2009-03-13 18:14:08 UTC

Andreas,

This still doesn't explain the issue between "ls -l foo" and "ls -l | grep foo" does the "ls -l | grep foo" update the cache where "ls -l foo" does not? I have tried before issueing "ls -l foo" repeatedly and it always said the file did not exist, but as soon as I do a "ls -l | grep foo" it shows up, even immediately after touching the file.

eg

for i in `seq 1 20` ; do (rm foo; touch foo; ls -l foo) ; done

failed every time, but

for i in `seq 1 20` ; do (rm foo; touch foo; ls -l | grep foo) ; done

passed every time.

If "ls -l foo" doesn't update the cache then perhaps it is a timing issue of some sort? I don't know what is going on the backend of my webhost ftp server but it might be scanning the uploaded file before displaying it or something similarly odd.

Comment 4 Chris 2009-03-13 18:14:51 UTC

Andreas just said this on IRC:

13:10 < fatal_> seems like the server stores new files as 
                .pureftpd-upload.<random> -> returns success when transfer 
                completes (which makes gvfs request a list) -> then renames the 
                .pureftpd-upload.<random> file to it's proper name.
13:12 < fatal_> the upload operation isn't "atomic" which means the client 
                can't be too fast requesting a file list after upload.
13:12 < fatal_> pure-ftpd should delay sending the success reply until after it 
                has renamed the file IMHO.

Comment 5 Chris 2009-03-29 04:31:31 UTC

This bug appears to be a bug in pureftpd not in gvfs.

Closing invalid.

Comment 6 Chris 2009-03-29 04:33:36 UTC

Actually I just looked at the log I attached here again... This is the same issue as bug 577160 (I think) in that the ftp backend is checking on the status of the file before the ftpd server returns that the file is properly uploaded.

Comment 7 Chris 2009-03-29 04:36:00 UTC

Notice this part of the log, it did not "226 File successfully transferred" until after the ftp backend already did a NOOP, a CWD, and a LIST:

<-- 229 Extended Passive mode OK (|||33536|)
--> STOR /foo
<-- 150 Accepted data connection
--> NOOP
<-- 200 Zzz...
--> CWD /
<-- 250 OK. Current directory is /
--> EPSV
<-- 229 Extended Passive mode OK (|||40264|)
--> LIST -a
<-- 150 Accepted data connection
--- drwx--x--x   15 cheney   cheney       4096 Mar 11 12:46 .
--- drwx--x--x   15 cheney   cheney       4096 Mar 11 12:46 ..
--- -rw-r--r--    1 cheney   cheney         24 Mar 22  2007 .bash_logout
--- -rw-r--r--    1 cheney   cheney        191 Mar 22  2007 .bash_profile
--- -rw-r--r--    1 cheney   cheney        124 Mar 22  2007 .bashrc
--- -rw-------    1 cheney   cheney         22 Aug 27  2008 .contactemail
--- drwxr-xr-x    5 cheney   cheney       4096 Aug 30  2008 .cpanel
--- drwx------    2 cheney   cheney       4096 Aug  9  2007 .cpanel-datastore
--- -rw-------    1 cheney   cheney     459733 Apr  9  2007 .cpanel-ducache
--- -rw-r--r--    1 cheney   cheney        383 Mar 22  2007 .emacs
--- drwxr-xr-x    5 cheney   cheney       4096 Mar 26  2007 .fantasticodata
--- -rw-------    1 cheney   cheney         17 Mar  1 22:04 .ftpquota
--- -rw-r--r--    1 cheney   cheney         26 Aug 11  2008 .htpasswd.mbw080908
--- -rw-r--r--    1 cheney   cheney         25 Aug 12  2008 .htpasswd.mbw081008
--- drwxr-xr-x    2 cheney   cheney       4096 Mar 27  2007 .htpasswds
--- -rw-------    1 cheney   cheney         14 Mar  1 23:54 .lastlogin
--- -rw-r--r--    1 cheney   cheney          0 Mar 11 12:46 .pureftpd-upload.49b7eacc.15.4f11.31aa1ec4
--- drwx------    2 cheney   cheney       4096 Mar 11 12:16 .spamassassin
--- -rw-r--r--    1 cheney   cheney          0 Mar 22  2007 .spamassassinenable
--- drwx------    2 cheney   cheney       4096 May  2  2008 .sqmaildata
--- drwx------    3 cheney   cheney       4096 Apr  4  2007 .trash
--- lrwxrwxrwx    1 cheney   cheney         32 Sep  4  2007 access-logs -> /usr/local/apache/domlogs/cheney
--- drwxr-xr-x    3 cheney   cheney       4096 Mar 27  2007 data
--- drwxr-x---    9 cheney   12           4096 Jul  4  2008 etc
--- -rw-rw-rw-    1 cheney   cheney          0 Mar 28  2007 gallery2.logfile
--- drwxrwx---   18 cheney   cheney       4096 Mar  1 23:54 mail
--- drwxr-xr-x    3 cheney   cheney       4096 Mar 22  2007 public_ftp
--- drwxr-x---   18 cheney   99           4096 Dec 10 12:11 public_html
--- drwx------    7 cheney   cheney       4096 Mar  1 23:51 tmp
--- lrwxrwxrwx    1 cheney   cheney         11 Mar 22  2007 www -> public_html
<-- 226-ASCII
<-- 226-Options: -a -l 
<-- 226 30 matches total
<-- 226 File successfully transferred

Comment 8 Chris 2009-03-29 04:41:03 UTC

According to the ftp spec I am looking at you aren't supposed to do anything else after receiving a 1xx (eg 150) code until you get another code back... That would include the NOOP code that is being sent currently.

Comment 9 Chris 2009-03-29 04:44:25 UTC

*** Bug 577160 has been marked as a duplicate of this bug. ***

Comment 10 Chris 2009-03-29 04:45:46 UTC

Someone probably needs to audit the ftp backend to see where else it goes wrong with respect to following the specification.

Comment 11 Chris 2009-03-29 04:49:52 UTC

See page 37 of rfc 959

"  1yz   Positive Preliminary reply

               The requested action is being initiated; expect another
               reply before proceeding with a new command.  (The
               user-process sending another command before the
               completion reply would be in violation of protocol; but
               server-FTP processes should queue any commands that
               arrive while a preceding command is in progress.)  This
               type of reply can be used to indicate that the command
               was accepted and the user-process may now pay attention
               to the data connections, for implementations where
               simultaneous monitoring is difficult.  The server-FTP
               process may send at most, one 1yz reply per command.
"

http://www.ietf.org/rfc/rfc959.txt

So it seems that ftpd should not respond to further requests but that a user process issuing further requests is a violation of the protocol to begin with.

Comment 12 Benjamin Otte (Company) 2009-03-29 11:15:36 UTC

This is happening becaue the ftp backend is multithreaded and we do not list the thread id in the debug output. Every thread uses a separate connection to the ftp server.
The storing thread does not do anything until it has gotten the 226 response, but another thread is doing the querying. So everything should be fine here.
If you would limit the ftp server to oly one connection (or edited gvfs' Makefile.am to only allow 1 thread in the ftp daemon) you'd see that any operation on the ftp mount will return G_IO_ERROR_WOULD_BLOCK while the write still goes on.

It's also bad that the debug output doesn't list job start/finish anymore since we got the --debug flag, otherwise we could follow requests coming in and going out more easily.

Comment 13 Andreas Henriksson 2009-03-29 18:07:38 UTC

There has been some discussions on IRC and I'll try to summarize, the two separate issues we think are happening:

* STOR vs creat() semantics
The gvfsd-ftp backend clears its list cache directly after sending STOR, but some servers (read: pure-ftpd) stores the new file under a temporary filename until the entire transfer is finished.
This could be solved by having the do_close_write() function clear the cache.
This one is a minor issue, triggered by the issue below.

* Multiple threads issue.
As described by Company in comment #12; something is making gvfsd-ftp open up a (separate!) second connection, where the list happens (before the STOR is complete). They should happen after each other on the same connection, not in parallell.
No solid ideas who or why yet... some ideas includes pointing fingers at the fuse daemon, but noone knows for sure. Would be great if the fuse daemon could be excluded from this equation by using the gvfs-save and gvfs-ls commands (instead of ~/.gvfs/).

Comment 14 Chris 2009-03-29 19:35:19 UTC

I wrote a test program that appears to show the same problem.

It does the following:

open()
write()
fsync()
fstat()
close()

It completes the entire sequence long before the file is uploaded, pretty much instantaneously even when the ftpd is rate limited to take ~ 60s for the upload to complete, and without any errors thrown. So it appears that anything that wants to check on the status of the file it wrote will end up breaking. This is because the fstat() call completes long before the write really finishes, since the calls are threaded and don't block as they probably should for posix compliance.

I will test the gvfs-save/gvfs-ls commands now to see if they have similar issues.

Comment 15 Chris 2009-03-29 19:49:23 UTC

when using the following:

cat foo.txt | gvfs-save ftp://somepath ; gvfs-ls ftp://somepath

It blocks gvfs-ls until gvfs-save actually has completely finished uploading the file. You can of course run gvfs-ls in another terminal and view the fact the file isn't done uploading.

Comment 16 Chris 2009-03-29 20:13:49 UTC

Andreas Henriksson,

I seem to not have mentioned that in the dupe bug report that I tested also with vsftpd so this is definitely not just a bug with pureftpd. The testing I have been doing over the past two days have been with using a vsftpd server. It seems the problem is primarily due to the multi-threading issue of executing fstat() (eg LIST in ftp backend case) immediately after the write() (eg STOR in ftp backend case) call, before the write() is even actually done on the backend side, which causes a program checking the results from fstat() to think that the write has failed when in fact it just hasn't finished yet.

Comment 17 Chris 2009-03-29 20:23:17 UTC

Created attachment 131645 [details]
test writing program

This program along with using vsftpd with "local_max_rate=1024" should make it very easy to reproduce the problem. As it takes ~ 60s to write the file when that option is set.

Comment 18 Chris 2009-03-29 20:48:55 UTC

As I noted on IRC, I think the close() and/or fsync() codepaths should also be checking that the file uploaded successfully as well before returning to the user app. Otherwise the user app has no way of knowing if the write completed successfully.

Comment 19 Chris 2009-03-29 20:52:58 UTC

Andreas mentioned that close() should be doing this already but that fuse is breaking it somehow.

Comment 20 Hans Petter Jansson 2009-04-01 19:33:48 UTC

Created attachment 131862 [details] [review]
gvfs-fuse-support-flush.patch

Patch to support the FUSE flush() method.

Comment 21 Chris 2009-04-01 21:18:04 UTC

This seems to have done something useful as the test program took longer to
complete but there is also some other problem here because afterwards ls does
not work any longer.

ccheney@x200:~/.gvfs/ftp as ccheney on 127.0.0.1/home/ccheney/Desktop$
~/test-write 
ccheney@x200:~/.gvfs/ftp as ccheney on 127.0.0.1/home/ccheney/Desktop$ ls
ls: reading directory .: Input/output error

Comment 22 Chris 2009-04-01 21:24:03 UTC

I also am not sure if this would solve the problem unless fstat() reports the information from cache instead of the actual server. It seems to only take a long time before close() returns now, so afaict it would seem that fstat() probably still reports what the ftp server thinks of the situation. And there is still the issue of the mount is broken after the file is done writing (eg ls fails).

ccheney@x200:~/.gvfs/ftp as ccheney on 127.0.0.1/home/ccheney/Desktop$ ~/test-write 
calloc'ing memory
open'ing file
write'ing file
fsync'ing file
fstat'ing file
close'ing file
(takes a long time here)

Comment 23 Hans Petter Jansson 2009-04-02 03:44:19 UTC

If the mount breaks, that means gvfs-fuse-daemon crashed. To debug this,

- Make sure you have debugging info installed/compiled in.
- Killall gvfs-fuse-daemon.
- gdb /usr/lib/gvfs/gvfs-fuse-daemon 
- (gdb) run -d ~/.gvfs
- Provoke the crash in another terminal.
- You should get the (gdb) prompt.
- (gdb) t a a bt
- Attach the output to this bug. It'd be nice to have the spew leading up to the crash too.

Comment 24 Hans Petter Jansson 2009-04-02 04:41:47 UTC

Also, I've got an additional patch that should make it wait on fsync() properly. You may want to try that after getting the crash trace (I haven't been able to make it crash here).

Comment 25 Hans Petter Jansson 2009-04-02 04:42:47 UTC

Created attachment 131891 [details] [review]
gvfs-fuse-support-fsync.patch

Comment 26 Hans Petter Jansson 2009-04-02 04:59:21 UTC

Created attachment 131892 [details] [review]
gvfs-fuse-sync-data.patch

This diff from SVN trunk combines the previous two patches and adds extra logic that likely fixes your crash too.

Comment 27 Chris 2009-04-03 04:50:47 UTC

Thanks, I'll try to find time tomorrow (Friday) to test out the new patch.

Comment 28 Chris 2009-04-07 14:38:36 UTC

This still seems to give input/output error, and appears to be part of gvfs 1.2.1, as I compared the patch and it seemed to be already applied...

"
ccheney@x200:~/.gvfs/ftp as ccheney on 127.0.0.1/home/ccheney/Desktop$ ~/test-write 
calloc'ing memory
open'ing file
write'ing file
fsync'ing file
fstat'ing file
close'ing file
ccheney@x200:~/.gvfs/ftp as ccheney on 127.0.0.1/home/ccheney/Desktop$ ls -atrl
ls: reading directory .: Input/output error
total 0
"

Comment 29 Chris 2009-04-07 14:39:21 UTC

Oh but I did notice it seemed to pause at fsync this time, it also seemed to pause at fstat but that might have been due to the io error, but I am uncertain.

Comment 30 Chris 2009-04-07 14:43:31 UTC

When attempting to get a backtrace via gdb it doesn't seem to crash any longer, maybe some sort of race/thread issue?

Comment 31 Chris 2009-04-07 14:45:54 UTC

I spoke too soon, I still got an io error but no crash when I changed up to .../home/ccheney from .../home/ccheney/Desktop. The log in gdb shows this, but no crash, note this is not the entire log, just the part that looked relevant.

unique: 167, opcode: READDIR (28), nodeid: 29, insize: 80
   unique: 167, error: 0 (Success), outsize: 16
unique: 168, opcode: RELEASEDIR (29), nodeid: 29, insize: 64
   unique: 168, error: 0 (Success), outsize: 16
unique: 169, opcode: GETATTR (3), nodeid: 1, insize: 56
   unique: 169, error: 0 (Success), outsize: 112
unique: 170, opcode: LOOKUP (1), nodeid: 1, insize: 68
LOOKUP /ftp as ccheney on 127.0.0.1
   NODEID: 2
   unique: 170, error: 0 (Success), outsize: 136
unique: 171, opcode: LOOKUP (1), nodeid: 2, insize: 45
LOOKUP /ftp as ccheney on 127.0.0.1/home
   NODEID: 7
   unique: 171, error: 0 (Success), outsize: 136
unique: 172, opcode: LOOKUP (1), nodeid: 7, insize: 48
LOOKUP /ftp as ccheney on 127.0.0.1/home/ccheney
   NODEID: 28
   unique: 172, error: 0 (Success), outsize: 136
unique: 173, opcode: LOOKUP (1), nodeid: 28, insize: 48
LOOKUP /ftp as ccheney on 127.0.0.1/home/ccheney/Desktop
   NODEID: 29
   unique: 173, error: 0 (Success), outsize: 136
unique: 174, opcode: ACCESS (34), nodeid: 28, insize: 48
ACCESS /ftp as ccheney on 127.0.0.1/home/ccheney 01
   unique: 174, error: 0 (Success), outsize: 16
unique: 175, opcode: OPENDIR (27), nodeid: 28, insize: 48
   unique: 175, error: 0 (Success), outsize: 32
unique: 176, opcode: GETATTR (3), nodeid: 28, insize: 56
   unique: 176, error: 0 (Success), outsize: 112
unique: 177, opcode: READDIR (28), nodeid: 28, insize: 80
   unique: 177, error: -5 (Input/output error), outsize: 16
unique: 178, opcode: RELEASEDIR (29), nodeid: 28, insize: 64
   unique: 178, error: 0 (Success), outsize: 16

Comment 32 Chris 2009-04-13 16:39:49 UTC

Pinging about this bug again...

Comment 33 Enrico Tröger 2009-06-17 14:19:27 UTC

*** Bug 585992 has been marked as a duplicate of this bug. ***

Comment 34 Peter Majmesku 2009-06-18 12:24:36 UTC

Hello,

this bug exists with Pure-FTPd FTP server with MySQL user authentication Version 1.0.21-11.4. I hope this helps.

Best regards from germany,
Peter Majmesku

Comment 35 GNOME Infrastructure Team 2018-09-21 16:39:13 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/gvfs/issues/82.