GNOME Bugzilla – Bug 144726
silent abort when copying to vfat filesystem
Last modified: 2007-01-18 18:39:15 UTC
I have a set of directories that contain digital photo images, which I tried to copy to a different partition using Nautilus. The copy seemed to work, but was way too quick, so I looked in the directory, and it turns out that only about 5% of the files actually copied across. The copying aborted partway through, without giving any warning or error, and for no visible reason. Attached are two file lists (generated with 'find'): one for the original directory and one for the target directory.
Created attachment 28891 [details] Original dir hierarchy
Created attachment 28892 [details] Hierarchy after copy operation
I marked this top priority, because of the potential for file loss. I was making backups, and was about to delete the originals when this happened.
Could you try copying them manually in a terminal to see if that is successful? Also, it appears you are copying to a vfat filesystem, could you check for symlinks or something similar that may have caused the copy to fail? Also, could you clarify the version? It's marked as 2.6.x, but the Gnome Version listed is 2.7/2.8. This looks like it might be a duplicate of bug 138491.
"cp -R" copies the whole hierarchy fine, so it can't be a symlink problem, since -R doesn't follow links. I listed it as 2.7/2.8 since it probably should be fixed by the release of 2.8 :-) It is nautilus-2.6.0-4, the FC2 default RPM. I am copying to a vfat filesystem. I can confirm that the problem does not exist when copying to an ext3 filesystem. I can also copy the individual files around the point where it stops (IMG_0041.JPG/img_0399.jpg) without a problem, so it seems to be a cumulative problem caused by copying many files in a directory. A race condition? Memory leak? File handle leak? A problem with Nautilus not expecting the case of files to change (vfat is all lower case)? It could certainly be a GNOME-VFS problem -- it seems that a GNOME-VFS xfer is used to actually do the transfer. It dies silently when the progress indicator is at about 10%. How can I see what it is doing when it dies? I tried attaching gdb, but no signal is emitted when the copy dies. Bug 138491 looks like a different problem to me, at least if I'm interpreting it properly. I think he's saying that only the dir/file that you actually drag is copied during multi-selection drags.
The bug also survives a reboot.
The version numbers are used to mark what versions the bug exists in, not when it should be fixed (the target milestones are for that, but are meant for the maintainer to set). The reason we have both a 'Version' field and a 'GNOME Version' is because not all products have version numbers that match the GNOME ones, and we want to make it easy for the release team to 'query all bugs that still affect version 2.7'. Anyway, I'm setting GNOME version to 2.5/2.6 since this was nautilus 2.6.x. I'm also shortening the summary slightly.
You could try using gnomevfs-copy from the command line and see if that works or not.
Did you get a chance to try gnomevfs-copy?
I have a problem with a vfat filesystem too. My fstab entry is: /dev/hdb5 /media/DiskD vfat umask=0 0 0. Nautilus opens the location and shows the directories, but can't open any of them. It also can't create a new directory or copy files. In a terminal, mkdir and cp both work fine. I checked the partition with Windows; no errors.
Same here. I recently moved 250,000 dirs/files from an ext3 to a vfat partition and had to copy them in small chunks, or else it would silently abort (copying bit by bit worked). I too nearly lost data, and this is very likely to happen to anyone not paying attention to the copy dialog! I see the bug has been around for nearly 9 months without a resolution. Do you need someone to do some testing? If so, please specify and I will happily narrow the problem down for you. I must use vfat partitions and would like to continue to use Nautilus to copy data to and from them.
M.Allchin: If you could run some tests, that would be great -- I just haven't had time to really investigate this deeply. I haven't tried gnomevfs-copy for example. I have given up on vfat/nautilus for now, so it's not even in my workflow anymore.
Testing with gnomevfs-copy, to see whether the problem also occurs there, would be very much appreciated.
Another nice thing would be if someone could come up with a minimal test case for this. For instance, if you can repeat this for a folder, can you make a copy of that folder and remove all the files that weren't copied except the first one that didn't get copied? Does it still reproduce then? If it does, does removing another file make it possible to copy all the files? (I.e. are we limited by the number of files, or is it something specific to the file that made it break?) I wasn't able to reproduce this bug by just copying lots of files to vfat, so it seems there is something special about the files it's failing on.
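The reduction suggested above can be automated. This is a minimal Python sketch, assuming a hypothetical `reproduces(files)` harness that copies the given files to the vfat target and reports whether the copy aborted early. It also assumes the failure is deterministic for a given set of files, which reporters in this thread were not sure of:

```python
from typing import Callable, List


def minimize_failing_set(files: List[str],
                         reproduces: Callable[[List[str]], bool]) -> List[str]:
    """Greedily drop files while the failure still reproduces.

    `reproduces` is a hypothetical test harness: it copies the given files
    to the vfat target and returns True if the copy aborts early.
    """
    if not reproduces(files):
        raise ValueError("the full set does not reproduce the failure")
    current = list(files)
    i = 0
    while i < len(current):
        candidate = current[:i] + current[i + 1:]
        if reproduces(candidate):
            # This file is not needed to trigger the failure; drop it.
            current = candidate
        else:
            # Removing this file makes the failure go away; keep it.
            i += 1
    return current
```

The result is minimal in the sense that removing any single remaining file stops the failure, which answers the "limited by number of files, or something specific to a file?" question.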
Investigating this bug is on my list of things to do in the near future, I'm just moving across the other side of the country right now, so I'm super busy :-) The other guys who have experienced this may be able to provide info more quickly. However, to answer your question, Alexander: as far as I recall, the copy stopped at different points each time, so it was non-deterministic. This usually indicates either a race-condition or the use of uninitialized memory. Given that you can't duplicate it on your own system, I would suggest the former, unless there is nothing in the copy operation that is multithreaded.
gnome-vfs operations are threaded, such that each asynchronous operation is handled by its own thread. However, a whole gnome_vfs_xfer operation is handled by one such thread, so I don't think threads are the problem.
Luke?
Is this bug still not fixed in the 2.12-release?
In the 2.10 release (Debian etch), the bug is still there. Is there any chance it gets solved soon? I almost lost half my music collection because of this bug!
gnomevfs-copy works fine BTW, only the graphical drag-drop copy is affected.
Mathijs: Odd. Are you definitely sure that *only* the drag-drop copy is affected, and that neither gnomevfs-copy nor the usual copy/paste procedure causes this issue? DnD copying and copy/paste copying both seem to involve fm_directory_view_move_copy_items, so there shouldn't be a difference. Updating version, milestoning to 2.14.
Oh, sorry, I meant:
graphical: both DnD & copy-paste DON'T work
terminal: both cp -R & gnomevfs-copy work
Does the bug occur randomly or for particular directories? I'll come up with some file transfer debugging code, which should help us find out what's going on.
For me it was random (it didn't always stop on the same file), although the copy always had to be going for a while before it failed. I just re-tested (sorry to be unresponsive on this bug until now), and I can't duplicate it currently, although I don't have the same large directory of digital photos on here anymore that it used to fail with.
Mathijs: is this problem only occurring for you when copying to a vfat partition? I never had this problem ext3->ext3, only ext3->vfat.
I experienced it on multiple occasions: I first discovered it while copying from a vfat hard disk partition to a vfat USB device (Debian etch, GNOME 2.10). But then (after I had copied the whole thing with cp -R), on another computer (Ubuntu 5.10, GNOME 2.12), the same thing happened when I tried to copy from the vfat USB device to the ext3 hard disk. So the failing cases were:
vfat (hd) -> vfat (usb)
vfat (usb) -> ext3 (hd)
ext3 (hd) -> vfat (usb) (copying back failed too)
So could it be the USB device? Even then, it shouldn't happen that Nautilus can't copy something that cp and gnomevfs-copy can.
Same here, it only happens when copying big directories. The weirdest thing is that I had a directory "music" containing 10 subdirectories just named 1 through 10, each with hundreds of files. Copying the whole music directory didn't fail, but copying a single underlying directory did. So could it be something in the time calculation code involving the number of direct children of the copied directory?
I observed this problem copying from an ext3 partition on my laptop to a vfat partition on the same physical disk, so no, this is not a USB problem. That is weird that copying single directories failed while the whole thing did not. This is probably a race condition, although that's a weird inside-out manifestation of such a problem (usually race conditions become more likely the longer you run a system, all other things being equal)
Actually, just a thought Mathijs, your problem may or may not be related, I had a similar problem with an external USB drive on early 2.6 kernels -- the USB subsystem didn't throttle IO properly and as a result had a tendency to overrun and fail in the middle of a copy: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=130326 However having said that your symptoms do indeed sound the same as what I experienced when I originally reported this bug, and as I stated above I was going from one partition to another on the same drive, no USB involved. AFAICR, the USB throttling bug made the drive unusable until it was unmounted and remounted after a problem occurred. It doesn't sound like that was a problem for you, but I just wanted to throw out that possibility.
Created attachment 57122 [details] [review] Proposed patch The issue seems to be that Nautilus does all its error handling inside the async callback, which is not reliably called. http://blogs.gnome.org/edit/cneumair/drafts/2006/01/10 has some details.
Sorry, http://blogs.gnome.org/view/cneumair/2006/01/10 is the correct URI.
That blog page is wrong, so the patch is not right.
How does one replicate this bug? Can a simple test case be constructed that exposes the bug? Does it happen reliably if we sprinkle a few usleep()s in the VFS code?
I don't know how to duplicate it reliably -- the problem doesn't always occur on the same file. I just tried though and I can duplicate the problem still, which I didn't think I could do... Just go "Show Hidden Files" then "Select All" in your home directory, and drag everything to a vfat partition. If you have enough files, it should die at some point. (Took about 40 seconds on a very slow laptop, at about 2 files/sec.)
OK, I finally got my act together and ran tests of gnomevfs-copy. Sorry for the delay. I was wrong about Nautilus not stopping on the same file every time. At least for the test case I described in the last comment, it always stops on the same file. Unfortunately I don't know which file it fails on, because I can't tell what order Nautilus is copying the files in. (It's not the order returned by 'find' or by 'ls -R'.) Two runs copied 1105 and 1108 of 11828 files respectively, then failed. The difference of 3 is accounted for by the fact that three files were thumbnailed after copying. The third test was to use gnomevfs-copy. It copied 3831 files in total, then failed with the error message: Failed to copy to <dir> Reason: Invalid URI. Given that it did not stop at the same place, it may be failing for a different reason. I also noticed another difference between the copy methods: Nautilus stops 3 times and says it can't copy a file (presumably because there are no read permissions on the file). I hit "Skip" each time. gnomevfs-copy doesn't ask any questions like this. I assume it is skipping the files by default? Anyway, is it possible that the user interaction is what eventually triggers the race condition?
Weird, I investigated the files for which Nautilus asks for user interaction while copying, and it doesn't make sense to me. I get:

Error: "Operation not permitted" while copying "/home/luke/...o/biblio.dbf"

$ find ~ -name biblio.dbf -exec ls -l \{\} \;
-rw-r--r-- 1 luke luke 58661 Nov 9 2004 /home/luke/.rhopenoffice1.1/user/database/biblio/biblio.dbf
-rw-r--r-- 1 luke luke 113095 Apr 12 2005 /home/luke/.openoffice.org2.0/user/database/biblio/biblio.dbf

I don't see any reason that there should be an "operation not permitted" on these files... The perms are normal, and they are normal files (not links etc.).
OK, I watched the copy operation closely until it failed, and was able to see the following file flash up for a split second before the window closed silently: /home/luke/.openoffice.org2.0/user/config/classico.sog I will attach the file. It doesn't look like it's anything special. It is definitely in the source and not in the destination, so it did not copy over.
Created attachment 57172 [details] The file the copy fails on, using Nautilus copy
Ah, and gnomevfs-copy fails on the following:
~/.mozilla/firefox/wx7qym8c.default/lock
It's some weird soft link to an IP/port; it shows up as broken (red bg) in ls:

-rw-rw-r-- 1 luke luke 9417 Jan 11 13:33 formhistory.dat
-rw-rw-r-- 1 luke luke 330068 Jan 12 08:11 history.dat
-rw-r--r-- 1 luke luke 719 Nov 28 2004 install.log
-rw------- 1 luke luke 16384 Jan 11 13:33 key3.db
-rw-r--r-- 1 luke luke 9596 Jan 12 08:03 localstore.rdf
lrwxrwxrwx 1 luke luke 15 Jan 11 15:23 lock -> 127.0.0.1:+3917
-rw-r--r-- 1 luke luke 7887 Nov 29 13:04 mimeTypes.rdf
-rw-rw-r-- 1 luke luke 0 Jan 11 15:23 .parentlock
-rwxr-xr-x 1 luke luke 5369 Jan 11 13:33 prefs.js
-rw-r--r-- 1 luke luke 752 Nov 27 2004 search.rdf

$ gnomevfs-copy ~/.mozilla/firefox/wx7qym8c.default/lock /tmp
Failed to copy /home/luke/.mozilla/firefox/wx7qym8c.default/lock to /tmp
Reason: Invalid URI
I now have a minimum test case for the Nautilus failure. (I think there are two separate failures in this bug, one in Nautilus and one in gnomevfs-copy.) The attachment contains a directory, "config". Nautilus fails after copying "config/modern_rus.sog", which successfully makes it to the destination; it then fails on "config/classico.sog", which is not copied. Interestingly, there is also a "config/Classico.sog", which was copied successfully. So it appears that having two files in the same directory whose names differ only in case is tripping up Nautilus when copying to vfat.
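The collision described above can be checked for before copying. A rough Python sketch; `case_collisions` is a hypothetical helper, and it uses Python's `str.casefold()` as an approximation of the filesystem's case mapping (FAT's own case table may differ for non-ASCII names). It works on the names of a single directory:

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def case_collisions(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group names (from one directory) that differ only by letter case.

    On a case-insensitive filesystem such as vfat, every group with more
    than one member maps to a single destination file.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    # Keep only the groups that would collide on the target.
    return {key: members for key, members in groups.items() if len(members) > 1}
```

For example, feeding it `os.listdir()` of the attached "config" directory should report the `classico.sog` / `Classico.sog` pair.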
Created attachment 57174 [details] The config directory that causes the Nautilus failure
I also investigated the gnomevfs-copy failure, and it is definitely the "lock" softlink/socket that is causing the problem. Nautilus actually gives the user the option of skipping the file, although it reports the wrong filename (it says "Invalid URI" for the file before the failure, ".parentlock", although it is definitely "lock" that is at fault, as can be demonstrated by moving "lock" out of the directory and trying again.) This is another bug (that the wrong filename is reported)... I'm filing it here rather than elsewhere at this point to keep everything together, let me know if/when you want me to file another bug report. Also Nautilus does not die suddenly in this Invalid URI case, which was the problem in the original bug report. (gnomevfs-copy does die, albeit not silently, but perhaps that is its behavior when any error occurs?)
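Since the gnomevfs-copy failure is tied to the `lock` symlink, a quick scan of the source tree can flag such entries before a copy to vfat, which cannot store symlinks at all. A minimal Python sketch; `find_symlinks` is a hypothetical helper, not part of GNOME-VFS:

```python
import os
from typing import List, Tuple


def find_symlinks(root: str) -> List[Tuple[str, str, bool]]:
    """Return (path, link target, target exists) for every symlink under root.

    os.walk does not follow directory symlinks by default, but it still
    lists them in `dirnames`, so both lists are checked.
    """
    found: List[Tuple[str, str, bool]] = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                # os.path.exists follows the link, so a dangling link
                # such as "lock -> 127.0.0.1:+3917" reports False.
                found.append((path, os.readlink(path), os.path.exists(path)))
    return found
```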
Sorry for the comment spam, but hopefully some of this is useful! I can confirm that an even more minimalistic test case for the Nautilus problem is to just create two files with the same name but different case, and copy them along with other files. Nautilus copies in a weird order (sometimes it appears to be the same as the inode order returned by 'find', sometimes different), so you might have to select quite a few files to ensure that some of the selected files will be copied after the ones you created. Another strange thing is that if you create two files with the same name but different case that don't have extensions (e.g. "temp" and "Temp"), Nautilus happily copies both over *without* failing. Both appear in the vfat window, then if you hit refresh one disappears. For some reason it is only files that differ in case and have extensions that fail. I don't have time to investigate right now, but the difference in behavior between 8.3-format DOS filenames and long filenames should probably also be investigated, as it is possible there is a difference there too.
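For the 8.3 question, a rough classifier can split a selection into names that fit the classic DOS 8.3 form and names that need a long filename. This Python sketch rests on simplifying assumptions: 1 to 8 base characters, an optional 1 to 3 character extension, a conservative ASCII character set (space excluded), and letter case ignored. It does not model how the Linux vfat driver decides whether a mixed-case name needs a long-name entry; `fits_8_3` is a hypothetical helper:

```python
import re

# Conservative short-name character set (assumption): letters, digits and
# ! # $ % & ' ( ) @ ^ _ ` { } ~ -
_SFN = re.compile(r"[A-Z0-9!#$%&'()@^_`{}~-]{1,8}(?:\.[A-Z0-9!#$%&'()@^_`{}~-]{1,3})?")


def fits_8_3(name: str) -> bool:
    """True if `name`, uppercased, fits the classic 8.3 pattern."""
    return _SFN.fullmatch(name.upper()) is not None
```

Note that by this check `classico.sog` fits 8.3 while `modern_rus.sog` does not, so 8.3 compliance alone does not separate the failing file from the ones that copied.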
Thanks for all your insightful comments, Luke!

> For some reason it is only files that have different cases and have extensions that fail.
(...)
> I don't have time to investigate right now but probably the difference in behavior between
> 8.3-format DOS filenames and long filenames should also be investigated, as it is possible there is a difference there too.

Exactly! :) We don't deal with GNOME_VFS_ERROR_NAME_TOO_LONG in handle_transfer_duplicate from nautilus-file-operations.c, like we do in new_file_transfer_callback and new_folder_transfer_callback (those are used for the "New File"/"New Folder" feature). However, I can't find any code that generates GNOME_VFS_ERROR_NAME_TOO_LONG in GnomeVFS ATM.
If you have some time, you can also grab my patch to gnomevfs-copy [1] which will print very verbose output if invoked with the "-vv" option. [1] http://mail.gnome.org/archives/gnome-vfs-list/2006-January/msg00015.html
Hey, but my problem was copying from vfat to vfat, so there can't be any files with the same name with only the case differing! So there must be another problem too.
Did this get looked at for GNOME 2.14? Is it hard to fix? It's a bad one because it can cause data loss. gnome-vfs2 probably just needs to send Nautilus a warning, and it should open a dialog asking if the user wants to overwrite the first file with the second file that has different case. It would seem that a lot of code could be reused for this (the code that asks the user if they want to overwrite a file if it already exists).
Adding GNOME Target
I'm not sure if bug 347457 is a dupe of this or if it's just a "simplest" test case.
Not sure if bug 342437 is somehow related to this bug, or whether it is a completely unrelated bug.
Re. comment #49: no, I think that bug is to do with hitting the 32-bit unsigned integer limit.
Christian Kellner, Alex Larsson: Do you think it is a good idea to (ab)use GNOME_VFS_XFER_PROGRESS_STATUS_DUPLICATE together with GNOME_VFS_ERROR_INVALID_FILENAME to let the application provide a new, DOS-compliant filename, as we already do with GNOME_VFS_ERROR_NAME_TOO_LONG for long filenames?
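For the DOS-compliant filename idea, here is a minimal Python sketch of only the replacement-name step. It assumes the characters Windows reserves in long filenames (`" * / : < > ? \ |` and control characters below 0x20) and that trailing dots and spaces should be dropped. `dos_compliant_name` is hypothetical and is not how GnomeVFS or Nautilus actually build names; it also does nothing about case collisions:

```python
# Characters reserved in FAT long filenames (assumed set, see lead-in).
_FAT_INVALID = set('"*/:<>?\\|')


def dos_compliant_name(name: str, replacement: str = "_") -> str:
    """Return a variant of `name` that avoids reserved FAT characters."""
    cleaned = "".join(
        replacement if ch in _FAT_INVALID or ord(ch) < 0x20 else ch
        for ch in name
    )
    # Trailing dots and spaces are not kept by Windows-style naming rules.
    cleaned = cleaned.rstrip(" .")
    # Never return an empty name.
    return cleaned or replacement
```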
I confirm this, on Ubuntu 6.06 with all updates, and Nautilus 2.14.3-0ubuntu1. When I try to copy/move files that include e.g. "file.txt" and "FILE.TXT" to a VFAT filesystem the operation is aborted silently and I am none the wiser. I have even deleted files accidentally that I thought had been copied over :/
Should the target field be updated? The problem will likely only become more of an issue with increasing use of FAT-formatted flash devices...
GNOME 2.16.0 still has this issue. Ubuntu bug about it: https://launchpad.net/products/nautilus/+bug/52348. Updating settings and target, since 2.16 is the new stable release.
So, I experimented a bit with "file.txt" and "FILE.TXT", and I got this:

[alex@greebo fat_test]$ ls -l
total 0
?--------- ? ? ? ? ? file.txt
[alex@greebo fat_test]$ rm file.txt
rm: cannot remove `file.txt': No such file or directory
[alex@greebo fat_test]$ touch file.txt
touch: cannot touch `file.txt': File exists

WTH?
*** Bug 347457 has been marked as a duplicate of this bug. ***
Fixed in CVS:

2006-11-06 Alexander Larsson <alexl@redhat.com>

	* libgnomevfs/gnome-vfs-xfer.c: (copy_items):
	Don't always cancel on EFILEEXISTS unless we're trying to
	generate unique filenames. This fixes a silent abort when
	copying "file.txt" and "FILE.TXT" to a case insensitive
	filesystem (like FAT). (#144726)
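The changed policy can be modelled in toy form. This Python sketch is hypothetical throughout: `copy_items` here only borrows the name of the patched function, the " (2)" naming scheme is invented, and the in-memory target stands in for a case-insensitive FAT directory. It shows the intended behavior: a name clash is retried with a numbered name in unique-name mode, and otherwise goes to the caller's conflict handler instead of silently ending the whole transfer:

```python
from enum import Enum
from typing import Callable, Dict, List


class Action(Enum):
    SKIP = "skip"
    OVERWRITE = "overwrite"
    ABORT = "abort"


class FileExistsOnTarget(Exception):
    """Raised when the target already holds a case-insensitively equal name."""


class CaseInsensitiveTarget:
    """Toy stand-in for a FAT directory: names compare case-insensitively."""

    def __init__(self) -> None:
        self.entries: Dict[str, str] = {}  # casefolded name -> stored spelling

    def create(self, name: str, overwrite: bool = False) -> None:
        key = name.casefold()
        if key in self.entries and not overwrite:
            raise FileExistsOnTarget(name)
        # On overwrite, the toy keeps the spelling that was stored first.
        self.entries.setdefault(key, name)


def numbered(name: str, n: int) -> str:
    """Hypothetical unique-name scheme: 'file.txt' -> 'file (2).txt'."""
    stem, dot, ext = name.rpartition(".")
    if not dot or not stem:
        return f"{name} ({n})"
    return f"{stem} ({n}).{ext}"


def copy_items(names: List[str], target: CaseInsensitiveTarget,
               on_conflict: Callable[[str], Action],
               unique_names: bool = False) -> List[str]:
    """Copy names into target; return the names actually written."""
    copied: List[str] = []
    for name in names:
        try:
            target.create(name)
        except FileExistsOnTarget:
            if unique_names:
                # Unique-name mode: keep trying numbered names until one is free.
                n = 2
                while True:
                    candidate = numbered(name, n)
                    try:
                        target.create(candidate)
                        break
                    except FileExistsOnTarget:
                        n += 1
                name = candidate
            else:
                # Report the clash instead of cancelling the whole transfer.
                action = on_conflict(name)
                if action is Action.ABORT:
                    break  # an explicit, user-chosen stop, not a silent one
                if action is Action.SKIP:
                    continue
                target.create(name, overwrite=True)
        copied.append(name)
    return copied
```

With a handler that always skips, `copy_items(["a.jpg", "file.txt", "FILE.TXT", "z.jpg"], CaseInsensitiveTarget(), lambda n: Action.SKIP)` returns `["a.jpg", "file.txt", "z.jpg"]`: the clash is surfaced to the handler and the files after it are still copied.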
Thanks Alexander!