GNOME Bugzilla – Bug 329098
Volume monitor emits "volume-unmounted" too early for USB devices
Last modified: 2008-09-06 19:08:31 UTC
USB mass storage devices can be mounted async, so some data may not be entirely written when the unmount request happens. Unfortunately, the "volume-unmounted" signal is emitted before the data is completely written, so we can't decide how long a progress dialog (proposed in bug 313639) should be shown. That leaves it up to the user to decide whether to trust the computer that all data has been written.
Presumably gnome-vfs-daemon can issue a sync(2) call in a separate thread before emitting the "volume-unmounted" signal? Just put up a dialog if that thread is taking its time.

Btw, if you happen to exercise influence over kernel people you could also ask them for a new syscall that flushes only the buffers for a specific device and gives you progress indication... I wouldn't hold my breath for that one though :-) Another option is to fix the umount(2) syscall so it blocks until caches are flushed, but I wouldn't count on getting that either. However, if we get this, things like gnome-umount will be able to display the progress dialog.

Note also that upcoming releases of vfat support in Linux will sport the "flush" mount option that tries to flush as quickly as possible, though this is not nearly enough...
You know what I don't get? The mount program itself blocks, right? But from the source I don't see it doing anything different from calling umount(2). Also, we call the program in the daemon with g_spawn_sync before emitting the signal. If the command-line program blocks, we would block. Where is my thinko here?
> The mount program itself blocks, right?

Incorrect on at least Linux 2.6
I can verify that with Crispin's "Any Drive" USB stick (manufacturer: http://www.airlinktek.com, product code "BEKRUSB"): when running pumount shortly after a data transfer was initiated, the command seems to terminate quickly, indicating success, but afterwards an LED on the stick shows that data is still being written to the stick.

> > The mount program itself blocks, right?
> Incorrect on at least Linux 2.6

How does this relate to the "-l" (lazy) flag of (p)umount? According to man pumount and man pmount it does the following:

    -l, --lazy
        Lazy unmount. Detach the filesystem from the filesystem hierarchy now, and cleanup all references to the filesystem as soon as it is not busy anymore. (Requires kernel 2.4.11 or later.)

Which, to my interpretation, is done ATM even without the "-l" flag.
https://lists.ubuntu.com/archives/ubuntu-devel/2006-February/015130.html proposes to invoke sync before unmounting.
In response to comment 4

> How does this relate to the "-l" (lazy) flag of (p)umount?

Lazy unmounting simply allows you to unmount even when processes have open files. The problem at hand is that even when all processes have closed their files, the block cache in the kernel still has a bunch of outstanding writes.

In response to comment 5

> https://lists.ubuntu.com/archives/ubuntu-devel/2006-February/015130.html
> proposes to invoke sync before unmounting.

That's exactly what I proposed in comment 1. For most desktop workloads this is probably OK, but it could take much longer if you have a lot of other IO. I guess we should just try this and see how it works.
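The "sync, then unmount" proposal can be sketched roughly as follows. This is only an illustration in Python, not gnome-vfs code: the function name and the injectable `run` and `flush` parameters are assumptions made so the ordering can be exercised without a real device. It also times the global flush, since that is the part that may take long when there is a lot of other IO.

```python
import os
import subprocess
import time


def sync_then_umount(mount_point, run=subprocess.run, flush=os.sync):
    """Flush all dirty buffers, then invoke umount on mount_point.

    Returns (seconds spent flushing, exit status of umount).
    `run` and `flush` are hypothetical hooks; by default they map to
    subprocess.run and os.sync (a wrapper around sync(2)).
    """
    started = time.monotonic()
    flush()                                  # global flush, not per device
    flush_seconds = time.monotonic() - started

    result = run(["umount", mount_point])    # needs the usual privileges
    return flush_seconds, result.returncode
```

Calling `sync_then_umount("/media/usbdisk")` on a real system would only succeed if the caller is allowed to unmount that path; the mount point is just an example.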
> > https://lists.ubuntu.com/archives/ubuntu-devel/2006-February/015130.html
> > proposes to invoke sync before unmounting.
> That's exactly what I proposed in comment 1. For most desktop workloads this is
> probably OK but it could take much longer time if you have a lot of other IO. I
> guess we should just try this and see how it works.

Maybe we should rather use the POSIX fsync() on an FD belonging to a particular volume/drive on unmount instead of invoking the global buffer-flushing sync()?
CCing RML, he knows the kernel very well. A little test application revealed that on kernel 2.6.12 calling fsync or fdatasync on the mount point surrounded by open and close with the matter having O_CTTY | O_DSYNC flags didn't actually ensure that the buffer is completely flushed. Maybe we have to call fsync_bdev, or do we have to flush the device node directly?
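For reference, a test along the lines described above might look like the sketch below. It is not the original test application: the function name is made up, and it opens the path with plain O_RDONLY rather than the flags mentioned in the comment. A successful return only means the kernel accepted the fsync() call; as reported above, that did not guarantee a complete flush of the volume on 2.6.12.

```python
import os


def fsync_path(path):
    """Open path read-only, call fsync() on the descriptor, close it.

    Returns None on success, or the errno of the call that failed.
    """
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as err:
        return err.errno
    try:
        os.fsync(fd)                 # asks the kernel to flush this fd only
    except OSError as err:
        return err.errno
    finally:
        os.close(fd)
    return None
```

One would call it as `fsync_path("/media/usbdisk")` (an example mount point) right before unmounting and then check whether the device is still busy afterwards.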
s^matter^former^. Sorry for the spam.
I forwarded the issue to the Linux kernel mailing list: http://lkml.org/lkml/2006/2/10/231/index.html
It turns out that this should be fixable without modifying the Linux kernel, although to my understanding it doesn't respect the POSIX specs wrt fsync() behavior as of writing. The problem seems to be that while a blocking eject operation is actually started for USB sticks and similar devices, libhal emits the LibHalDevicePropertyModified with "removed" set to TRUE. David, this is your area of expertise, so I'll let you track down the issue here :P
It looks like "eject" first unmounts the volume, causing the server to emit the "volume-unmounted" signal, and then tries to eject it. The problem now is that we have to delay the invocation of _gnome_vfs_volume_monitor_unmounted on the client part until the eject command has run, so that the signal emission request is not propagated to the server during eject, but afterwards.

The ejection itself is controlled in a separate thread, which causes some architectural headache, i.e. we'll have to cook a mutex-protected hash table which contains activation URIs, which are then checked on _gnome_vfs_volume_monitor_unmounted requests against the URI of the volume to determine whether an actual emission will happen. It would really be much simpler if we could just fsync() the device before running "eject".
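The mutex-protected table of activation URIs could take roughly the following shape. This is a sketch in Python rather than the C that gnome-vfs is written in, and the class, its method names and the `emit_unmounted` callback are all hypothetical. The idea is that an "unmounted" request arriving while an eject is in flight is remembered and only emitted once the eject thread reports completion.

```python
import threading


class PendingEjects:
    """Defer "unmounted" notifications for volumes that are being ejected."""

    def __init__(self, emit_unmounted):
        self._emit = emit_unmounted   # hypothetical signal-emission callback
        self._lock = threading.Lock()
        self._pending = {}            # activation URI -> unmount seen during eject?

    def eject_started(self, uri):
        with self._lock:
            self._pending[uri] = False

    def unmounted(self, uri):
        """Called where the unmounted request would normally be emitted."""
        with self._lock:
            if uri in self._pending:
                self._pending[uri] = True   # remember it, emit after eject
                return
        self._emit(uri)

    def eject_finished(self, uri):
        with self._lock:
            deferred = self._pending.pop(uri, False)
        if deferred:
            self._emit(uri)
```

The callback is invoked outside the lock so that a handler which calls back into the table cannot deadlock.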
After many fsync() failures I've now tried using

    ioctl (fd, BLKFLSBUF, 0);

to flush the buffer. According to fcntl.h this just requires read-only permissions, but the Linux source block/ioctl.c contains code like:

    case BLKFLSBUF:
        if (!capable(CAP_SYS_ADMIN))
            return -EACCES;

preventing the code from running properly except in the sudo case. I've really come to a point where I'm totally clueless about the historic semantics of Linux/UNIX permissions. [1] shows an LVM-related patch from 2001 modifying

     case BLKFLSBUF:
         /* flush buffer cache */
    -    if (!capable(CAP_SYS_ADMIN)) return -EACCES;
    +    if (!(file->f_mode & FMODE_WRITE) && !capable(CAP_SYS_ADMIN))
    +        return -EACCES;

in another piece of code.

eject uses various special-case commands for the actual ejection, including CDROMEJECT, SG_IO, FDEJECT and MTIOCTOP, based on the device to be ejected.

[1] https://www.redhat.com/archives/linux-lvm/2000-December/msg00171.html
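The BLKFLSBUF attempt can be reproduced with a few lines of Python. This is a sketch, not the code that was actually tried: the function name is made up, and the request number is written out by hand as `_IO(0x12, 97)` because Python's fcntl module does not export it. Given the capable(CAP_SYS_ADMIN) check quoted above, an unprivileged caller should expect a permission error rather than a flush.

```python
import errno
import fcntl
import os

BLKFLSBUF = 0x1261  # _IO(0x12, 97), as defined in <linux/fs.h>


def flush_block_device(device):
    """Try ioctl(fd, BLKFLSBUF, 0) on a block device node.

    Returns "ok", or the symbolic errno name of whichever call failed.
    """
    try:
        fd = os.open(device, os.O_RDONLY)
    except OSError as err:
        return errno.errorcode.get(err.errno, str(err.errno))
    try:
        fcntl.ioctl(fd, BLKFLSBUF, 0)
    except OSError as err:
        return errno.errorcode.get(err.errno, str(err.errno))
    finally:
        os.close(fd)
    return "ok"
```

For example, `flush_block_device("/dev/sda")` (device name chosen for illustration) returns a string that can be logged directly when the flush is refused.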
I really don't understand this bug report. I do a daily system backup on a USB2 drive. gvm mounts it like the following: /dev/sda1 on /media/usbdisk type reiserfs (rw,nosuid,nodev). When the backup is finished, I unmount my drive with the drivemount applet. When I do this, I can see that my external hard drive is busy. Unmounting often takes many seconds. When it gets unmounted, multiload-applet's memory graph shows a big gap in buffer/cache, meaning that the cache/buffer used by the volume has been discarded. Just after that I unplug and power down my USB drive. Never got a file corrupted. I get the same behaviour with USB keys (and more generally with every removable writable medium).
(In reply to comment #14)
> I really don't understand this bug report.
> [..]
> Never got a file corrupted.

My USB key doesn't have lights showing activity... how do I know when all the data's been flushed? Who's going to explain to my mother that she should wait for the gap in buffer/cache? I think this bug deserves the severity of "major" which is currently assigned to it.
Sorry, I should have added that the drive icon disappears only when everything has been synced. I don't have to watch the LED. I just meant that it seems that the sync-umount sequence is already there.
Benoit, do you mount them with the "async" option?
gvm mounts it as I showed you. Default is async, no doubt about it. It's simple to experiment. Plug in your USB drive, read a file on it, unmount. Plug in your USB drive, write a big file to it and then unmount. Notice that this second unmount took much more time.

I have the bad feeling that some people think umount doesn't sync before actually unmounting the filesystem... Maybe there's a problem if the signal is emitted when the umount process starts, but if it's emitted after it has ended, we're safe AFAIK. You can't cleanly unmount a dirty filesystem. Is this bug report about the kernel prematurely reporting the FS as unmounted?

We have all once worked with floppies. Read and write operations were somehow fast, and the floppy drive was seldom spinning during a work session. But umount took several seconds (maybe 1 minute) because everything was actually written to the floppy before umount returned. Same goes for USB/FireWire/etc. disks, I guess.

Maybe I'm stupid and I don't understand what's going on here. But Christian hasn't explained to us how he knows that data still has to be written to his USB disk when the drive is not mounted anymore.
(In reply to comment #16)
> Sorry, I should have added that the drive icon disappears only when everything
> has been synced. I don't have to watch the LED. I just meant that it seems that
> the sync-umount sequence is already there.

Sorry for the misunderstanding... to make it clear, I didn't experience the problem myself (bought my USB key on Friday, barely tested it). Glad we agree that if it exists, it must be fixed.
In response to comment 11:

> It turns out that this should be fixable without modifying the Linux kernel,
> although to my understanding it doesn't respect the POSIX specs wrt fsync()
> behavior as of writing.

Careful here, there's a distinction between eject and unmounting. The former AFAIK is device dependent (look through the source for eject(1), ugh) and for example on my Apple iPod nano it removes all the block child devices but lets the main block device remain.

> The problem seems to be that while a blocking eject
> operation is actually started for USB sticks and similar devices, libhal emits
> the LibHalDevicePropertyModified with "removed" set to TRUE.

I don't think we actually eject anything; we unmount the thing, and it just so happens that the semantics of unmount don't guarantee that the underlying block caches are flushed.

I wonder: what's actually wrong with, just before unmounting, async invoking sync(1) from gnome-vfs-daemon and waiting for this to terminate? Then put up a dialog with a spinner if it takes its time to terminate.
fwiw, fsync() is to be used on a file that you just wrote data to, you can't just open a filedes to /dev/<blockdevice> and call fsync() on it and expect it to work. Same for fdatasync(). As DavidZ mentioned, there's a new mount option for vfat called "flush" which I've just added support for in g-v-m. As an aside on the whole fsync() idea, if all GNOME applications used fsync() on files they write to before close()ing them, this wouldn't be as big of an issue. Remember: fflush() just flushes to kernel buffers, not necessarily to the storage device like fsync() does.
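The difference between the two flush levels can be shown with a short sketch. It is written in Python, where `file.flush()` plays the role of fflush() and `os.fsync()` wraps fsync(2); the function name is made up for the illustration.

```python
import os


def write_durably(path, data):
    """Write bytes and push them to the storage device before closing."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()               # like fflush(): userspace buffer -> kernel
        os.fsync(f.fileno())    # like fsync(): kernel cache -> device
    return len(data)
```

An application that saves files this way leaves far less dirty data for the unmount to write out, at the cost of a slower save.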
This issue is also being tracked here: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=194296#c12

I've fixed this with HAL 0.5.8.1 and gnome-mount 0.5 (both to be released tomorrow).

First, it turns out umount(8) sorta does the right thing, e.g. it only returns after caches are flushed.

Second, HAL was modified to not change the state of a volume to be unmounted before umount(8) returns (and this can take 30 seconds on really slow USB1.1 devices). Btw, this is somewhat wrong, as the file system is actually detached almost immediately when umount(8) is invoked, hence HAL receives an event on its poll(2) on /proc/mounts and sends out the mount state change on D-BUS. But in another way it's right. Anyway, the net result of the HAL change is that the icon on the desktop remains while caches are flushed.

Third, gnome-mount will put up this notification http://people.freedesktop.org/~david/gm-nag4.png with no timeout if Unmount() doesn't return within 750ms. If the dialog was put up, then we switch to this notification http://people.freedesktop.org/~david/gm-nag5.png for a default timeout (5 or 10 secs I think). We never put up dialogs for read-only mounted file systems or volumes from optical drives.

So as this is fixed in gnome-mount, which is the default mount helper for gnome-vfs, one could argue this bug can be closed once gnome-vfs starts depending on HAL 0.5.8.1.
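The notification timing described above can be approximated as follows. This is not gnome-mount's implementation, just a sketch of the behaviour in Python: `unmount` and `notify` are hypothetical callables, the notice names "writing" and "safe" are invented, and only the 750 ms threshold and the read-only exception come from the comment.

```python
import threading


def unmount_with_notice(unmount, notify, threshold=0.75, read_only=False):
    """Run unmount() and show notices only if it turns out to be slow.

    Returns the list of notices that were shown, in order.
    """
    done = threading.Event()
    notices = []

    def worker():
        try:
            unmount()
        finally:
            done.set()

    threading.Thread(target=worker, daemon=True).start()

    # Fast unmounts and read-only volumes never get a notice.
    if done.wait(timeout=threshold) or read_only:
        done.wait()
        return notices

    notices.append("writing")   # first notice stays up with no timeout
    notify("writing")
    done.wait()
    notices.append("safe")      # replaced once the unmount has returned
    notify("safe")
    return notices
```

In a real desktop the second notice would be dismissed after its own timeout; that part is left to the `notify` callback here.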
gnome-vfs has been deprecated and superseded by gio/gvfs since GNOME 2.22, hence mass-closing many of the gnome-vfs requests/bug reports. This means that gnome-vfs is NOT actively maintained anymore, however patches are still welcome.

If your reported issue is still valid for gio/gvfs, please feel free to file a bug report against glib/gio or gvfs.

@Bugzilla mail recipients: query for gnome-vfs-mass-close to get rid of these notification emails altogether.

General further information: http://en.wikipedia.org/wiki/GVFS
Reasons behind this decision are listed at http://www.mail-archive.com/gnome-vfs-list@gnome.org/msg00899.html