GNOME Bugzilla – Bug 762941
Operations sometimes failing with: No such file or directory
Last modified: 2016-08-18 10:42:25 UTC
I have seen a number of occurrences of file system specific commands failing with the error "No such file or directory" when accessing the /dev/PTN device file. I have seen it with the recent distros Fedora 23 and CentOS 7 inside VirtualBox VMs.

Here is an example of a check operation failing:

    Check and repair file system (ext4) on /dev/sdc2
      calibrate /dev/sdc2
        path: /dev/sdc2 (partition)
        start: 4196352
        end: 6293503
        size: 2097152 (1.00 GiB)
      check file system on /dev/sdc2 for errors and (if possible) fix them
        e2fsck -f -y -v -C 0 /dev/sdc2
          e2fsck 1.42.9 (28-Dec-2013)
          e2fsck: No such file or directory while trying to open /dev/sdc2
          Possibly non-existent device?
This bug report in Red Hat Bugzilla reports the same thing, but from ntfsresize as part of an NTFS resize operation.

1296479 - NTFS Resize failes from LiveCD
https://bugzilla.redhat.com/show_bug.cgi?id=1296479

    Shrink /dev/sda4 from 908.31 GiB to 810.66 GiB 00:00:01 ( ERROR )
      calibrate /dev/sda4 00:00:00 ( SUCCESS )
        path: /dev/sda4 (partition)
        start: 1615872
        end: 1906485504
        size: 1904869633 (908.31 GiB)
      check file system on /dev/sda4 for errors and (if possible) fix them 00:00:01 ( ERROR )
        ntfsresize -i -f -v /dev/sda4
          ntfsresize v2015.3.14 (libntfs-3g)
          ERROR(2): Failed to check '/dev/sda4' mount state: No such file or directory
          Probably /etc/mtab is missing. It's too risky to continue. You might try an another Linux distro.
This message from a parted commit made after the parted 3.2 release is informative.

http://git.savannah.gnu.org/cgit/parted.git/commit/?id=db1728e6402a27fe64e8e132f810c22160ab8bcc

    commit db1728e6402a27fe64e8e132f810c22160ab8bcc
    Author: Brian C. Lane <bcl@redhat.com>
    Date: Fri Aug 7 11:43:17 2015 -0700

        tests: Use wait_for_dev_to_ functions

        Recent changes to udev have made some long-standing problems appear
        more frequently. udev executes various actions when changes are made
        to devices. Sometimes this can result in device nodes not appearing
        immediately. Other times it can result in EBUSY being returned.

        This patch only addresses devices that are slow to appear/disappear.
        It is best to use the wait_for_dev_to_appear_ and
        wait_for_dev_to_disappear_ functions than to test for existance. These
        will loop and wait for up to 2 seconds for it to appear.
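The wait-and-retry idea in that commit message can be sketched in C as below. This is only an illustration, not parted's shell helper: wait_for_path_to_appear() is a made-up name, and the 50 ms polling step is an arbitrary assumption (the commit only says the helpers wait for up to 2 seconds).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical helper (not part of parted or GParted): poll until `path`
 * exists or roughly `timeout_ms` milliseconds have passed.
 * Returns 0 if the path exists, -1 on timeout or any error other than
 * ENOENT. */
int wait_for_path_to_appear(const char *path, int timeout_ms)
{
    const struct timespec step = { 0, 50 * 1000 * 1000 };  /* 50 ms, assumed */
    struct stat sb;
    int waited_ms = 0;

    for (;;) {
        if (stat(path, &sb) == 0)
            return 0;               /* node is present */
        if (errno != ENOENT || waited_ms >= timeout_ms)
            return -1;              /* hard error, or gave up waiting */
        nanosleep(&step, NULL);
        waited_ms += 50;
    }
}
```

A caller would use something like wait_for_path_to_appear("/dev/sdc2", 2000) before running a file system tool. As the later comments show, a node that has appeared can still vanish again, so polling alone may not be enough.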
I find it easiest to reproduce on my work laptop with a CentOS 7 VirtualBox VM. Just queue up a sequence of repeating check and move operations on a partition. Here's the above case, where it failed on the first check operation.

    Check and repair file system (ext4) on /dev/sdc2
      calibrate /dev/sdc2
        path: /dev/sdc2 (partition)
        start: 4196352
        end: 6293503
        size: 2097152 (1.00 GiB)
      check file system on /dev/sdc2 for errors and (if possible) fix them
        e2fsck -f -y -v -C 0 /dev/sdc2
          e2fsck 1.42.9 (28-Dec-2013)
          e2fsck: No such file or directory while trying to open /dev/sdc2
          Possibly non-existent device?

This is using libparted 3.1 on a CentOS 7 VirtualBox VM, with debugging added to GParted. With "udevadm monitor" and GParted both writing to the same terminal, I captured this sequence:

    # udevadm monitor &
    # .../gpartedbin
    ...
    87.334903 +18.508899 calibrate_partition() calling get_device("/dev/sdc", lp_device) ...
    87.338625 +0.003722 calibrate_partition() get_device() returned
    87.338650 +0.000025 calibrate_partition() calling get_disk(lp_device, lp_disk) ...
    87.364316 +0.025666 calibrate_partition() get_disk() returned
    87.364637 +0.000322 destroy_device_and_disk() calling ped_disk_destroy(lp_disk) ...
    87.364659 +0.000022 destroy_device_and_disk() ped_disk_destroy() returned
    87.364667 +0.000008 destroy_device_and_disk() calling ped_device_destroy(lp_disk) ...
    87.364679 +0.000012 destroy_device_and_disk() ped_device_destroy() returned
    87.364843 +0.000164 execute_command() e2fsck -f -y -v -C 0 /dev/sdc2
    KERNEL[33964.473301] remove /devices/pci0000:00/0000:00:0d.0/ata5/host4/target4:0:0/4:0:0:0/block/sdc/sdc1 (block)
    KERNEL[33964.473531] remove /devices/pci0000:00/0000:00:0d.0/ata5/host4/target4:0:0/4:0:0:0/block/sdc/sdc2 (block)
    KERNEL[33964.473683] remove /devices/pci0000:00/0000:00:0d.0/ata5/host4/target4:0:0/4:0:0:0/block/sdc/sdc3 (block)
    KERNEL[33964.477476] change /devices/pci0000:00/0000:00:0d.0/ata5/host4/target4:0:0/4:0:0:0/block/sdc (block)
    KERNEL[33964.477886] add /devices/pci0000:00/0000:00:0d.0/ata5/host4/target4:0:0/4:0:0:0/block/sdc/sdc1 (block)
    KERNEL[33964.478122] add /devices/pci0000:00/0000:00:0d.0/ata5/host4/target4:0:0/4:0:0:0/block/sdc/sdc2 (block)
    KERNEL[33964.482547] add /devices/pci0000:00/0000:00:0d.0/ata5/host4/target4:0:0/4:0:0:0/block/sdc/sdc3 (block)
    87.379364 +0.014521 execute_command() exit status 8
    UDEV [33964.492595] remove /devices/pci0000:00/0000:00:0d.0/ata5/host4/target4:0:0/4:0:0:0/block/sdc/sdc3 (block)
    UDEV [33964.494841] remove /devices/pci0000:00/0000:00:0d.0/ata5/host4/target4:0:0/4:0:0:0/block/sdc/sdc2 (block)
    UDEV [33964.517200] remove /devices/pci0000:00/0000:00:0d.0/ata5/host4/target4:0:0/4:0:0:0/block/sdc/sdc1 (block)
    UDEV [33964.571198] change /devices/pci0000:00/0000:00:0d.0/ata5/host4/target4:0:0/4:0:0:0/block/sdc (block)
    UDEV [33964.598390] add /devices/pci0000:00/0000:00:0d.0/ata5/host4/target4:0:0/4:0:0:0/block/sdc/sdc3 (block)
    UDEV [33964.600225] add /devices/pci0000:00/0000:00:0d.0/ata5/host4/target4:0:0/4:0:0:0/block/sdc/sdc1 (block)
    UDEV [33964.601571] add /devices/pci0000:00/0000:00:0d.0/ata5/host4/target4:0:0/4:0:0:0/block/sdc/sdc2 (block)

So udev is removing and adding devices just from GParted calling either of these libparted calls:
 * ped_device_get() - get a device by name
 * ped_disk_new() - read partition table from device
and it is still removing and adding /dev entries while e2fsck runs and after it has finished.
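To line up the KERNEL and UDEV lines in a capture like the one above, a small parser is enough. This is only a sketch: struct monitor_event, parse_monitor_line() and event_is_for() are made-up names, and the line layout is assumed to be exactly what the capture shows (source, [timestamp], action, devpath).

```c
#include <stdio.h>
#include <string.h>

/* One parsed line of "udevadm monitor" output (illustrative struct). */
struct monitor_event {
    char   source[8];     /* "KERNEL" or "UDEV" */
    double timestamp;     /* seconds, as printed between [ and ] */
    char   action[16];    /* "add", "remove", "change", ... */
    char   devpath[256];  /* /devices/... */
};

/* Returns 1 if `line` has the layout of a monitor event line and fills in
 * `ev`; returns 0 for anything else (e.g. GParted's own debug lines). */
int parse_monitor_line(const char *line, struct monitor_event *ev)
{
    return sscanf(line, " %7[A-Z] [%lf] %15s %255s",
                  ev->source, &ev->timestamp, ev->action, ev->devpath) == 4;
}

/* Returns 1 if the last component of the event's devpath is `name`,
 * for example "sdc2". */
int event_is_for(const struct monitor_event *ev, const char *name)
{
    const char *slash = strrchr(ev->devpath, '/');
    return slash != NULL && strcmp(slash + 1, name) == 0;
}
```

Reading the capture this way, the KERNEL remove of sdc2 is stamped 33964.473531 and the UDEV add of sdc2 is stamped 33964.601571, a span of about 128 ms. e2fsck started and exited within 14.5 ms (87.364843 to 87.379364), inside that span.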
Hi Mike,

I just saw a commit related to this issue on the parted-devel mailing list.

[parted-devel] [PATCH] tests: Add udevadm settle to wait_for_ loop (#1260664)
https://lists.alioth.debian.org/pipermail/parted-devel/2016-March/004806.html

Email message:

    Sometimes the device will vanish after the wait_for_dev_to_appear exits.
    Add udevadm settle in an attempt to make sure the udev system is done
    flapping around and the device will stay in place.

    Related: rhbz#1260664

Curtis
Hi Curtis,

That's another fix to the parted test suite to handle the fact that udev takes time and is asynchronous: the kernel call returns once libparted has informed the kernel of the partition changes it has just written to disk, without waiting for udev to process the resulting events.

I'm hoping this parted patch (post 3.2) will fix the case of GParted's calibrate step, which needs only read-only access to partition information but still leads to udev deleting and re-adding the disk and partition device entries.

http://git.savannah.gnu.org/cgit/parted.git/commit/?id=44d5ae0115c4ecfe3158748309e9912c5aede92d

    commit 44d5ae0115c4ecfe3158748309e9912c5aede92d
    Author: Brian C. Lane <bcl@redhat.com>
    Date: Thu Aug 6 07:17:14 2015 -0700

        libparted: Use read only when probing devices on linux (#1245144)

        When a device is opened for RW closing it can trigger other actions,
        like udev scanning it for partition changes. Use read only for the
        init_* methods and RW for actual changes to the device.
        ...
        All of these changes are self-contained with no external API changes.
        The only visible change in behavior is that when a new PedDevice is
        created the device is opened in RO_MODE instead of RW_MODE.

        Resolves: rhbz#1245144

However that won't fix the issue for users running GParted on their desktop distros with libparted 3.2 or earlier with recent enough udev to encounter this problem. That suggests adding a call to udevsettle at the end of calibrate so that it doesn't return before udev has re-created everything.

Thanks,
Mike
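As a rough sketch of what waiting for udev at the end of calibrate could look like in C: run "udevadm settle" as a child process and wait for it to finish. This is not the GParted patch. run_and_wait() and settle_udev() are hypothetical names, and the 1 second timeout is an assumption chosen for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] with its arguments and wait for it to finish.
 * Returns the child's exit status, 127 if the command could not be
 * executed, or -1 if fork()/waitpid() failed or the child was killed. */
int run_and_wait(char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                 /* only reached if execvp() failed */
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/* Hypothetical helper: block until udev's event queue is empty or the
 * timeout expires.  The timeout value is an assumption. */
int settle_udev(void)
{
    char *const cmd[] = { "udevadm", "settle", "--timeout=1", NULL };
    return run_and_wait(cmd);
}
```

Calling settle_udev() after the libparted device and disk objects have been destroyed, and before running e2fsck or ntfsresize, would give udev the chance to finish re-creating the /dev entries. The cost is a short delay on every calibrate step.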
(In reply to Mike Fleetwood from comment #5)
> ...<snip>
> However that won't fix the issue for users running GParted on their
> desktop distros with libparted 3.2 or earlier with recent enough udev to
> encounter this problem. That suggests adding a call to udevsettle at
> the end of calibrate so that it doesn't return before udev has re-created
> everything.

Agreed. If adding a call to udevsettle at the end of calibrate will resolve the issue then I am all for it.

Curtis
Hi Curtis,

For the last couple of days, on both Fedora 23 and CentOS 7 VMs, I have been completely unable to reproduce this bug. It went from often failing on a single check command, with stringing together a dozen operations being impossible, to not being reproducible at all! I can't write a fix if I can't find a way to reproduce this.

Mike
That certainly makes it a challenge to confirm whether a fix works. I wonder what changed?

If the issue is that there was an insufficient pause after the calibrate step, then I'd be okay if we added the udevsettle after calibrate. At worst this would slow down GParted ever so slightly. At best it would resolve the issue.

Curtis
After downgrading systemd (and udev) to the previous package (systemd-219-19.el7_2.4.x86_64.rpm -> systemd-219-19.el7.x86_64.rpm) it turned out that a simple reboot of the VM guest allowed me to reproduce this again in my CentOS 7 guest on my work laptop.

Reduced the code to this libparted test case:

    /* gcc -o c-test-0018 c-test-0018.c -lparted */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <parted/parted.h>

    int main(void)
    {
        /* Open the whole disk and read its partition table, then let go. */
        PedDevice * lp_device = ped_device_get("/dev/sdc");
        PedDisk * lp_disk = ped_disk_new(lp_device);
        ped_disk_destroy(lp_disk);
        ped_device_destroy(lp_device);

        /* Immediately run a file system tool against one of the partitions. */
        char * const cmd[] = {"e2fsck", "-f", "-y", "-v", "-C", "0", "/dev/sdc3", NULL};
        execvp("e2fsck", cmd);
        return EXIT_SUCCESS;
    }

After debugging and tracing what libparted is doing I reduced the code to this Unix-only test:

    /* gcc -o c-test-0019 c-test-0019.c */
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        /* A read-write open and close of the whole disk device. */
        int fd = open("/dev/sdc", O_RDWR);
        close(fd);

        char * const cmd[] = {"e2fsck", "-f", "-y", "-v", "-C", "0", "/dev/sdc3", NULL};
        execvp("e2fsck", cmd);
        return EXIT_SUCCESS;
    }

Both of these test cases (the libparted one when using CentOS 7's libparted 3.1) cause udev to remove and re-add device entries for all the partitions on the drive, as reported by "udevadm monitor" and matching the example in comment #3 above. Also, e2fsck sometimes fails like this when it happens to run just when the sdc3 device entry is missing:

    e2fsck 1.42.9 (28-Dec-2013)
    e2fsck: No such file or directory while trying to open /dev/sdc3
    Possibly non-existent device?

Now back on track with trying to fully understand what is happening and why, so I can implement the right fix for GParted.

Mike
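One likely mechanism behind the read-write open/close test (offered as background, not something established in this report): udev(7) documents a "watch" option under which udev watches a device node with inotify and synthesizes a change uevent when the node is closed after being opened for writing. The sketch below only demonstrates the underlying inotify behaviour on an ordinary file. close_write_events() is a hypothetical helper, and whether a given distro's udev rules enable the watch for a given disk is an assumption.

```c
#include <fcntl.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Hypothetical helper: watch `path` for IN_CLOSE_WRITE, do one
 * open()/close() with `open_flags`, and return how many IN_CLOSE_WRITE
 * events were queued as a result.  Returns -1 if the watch cannot be
 * set up. */
int close_write_events(const char *path, int open_flags)
{
    char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
    int count = 0;
    int ifd = inotify_init1(IN_NONBLOCK);

    if (ifd < 0)
        return -1;
    if (inotify_add_watch(ifd, path, IN_CLOSE_WRITE) < 0) {
        close(ifd);
        return -1;
    }

    int fd = open(path, open_flags);
    if (fd >= 0)
        close(fd);

    /* Non-blocking read: fails with EAGAIN when no event is queued, in
     * which case len is -1 and the loop below does not run. */
    ssize_t len = read(ifd, buf, sizeof buf);
    for (ssize_t off = 0; off < len; ) {
        const struct inotify_event *ev =
            (const struct inotify_event *)(buf + off);
        if (ev->mask & IN_CLOSE_WRITE)
            count++;
        off += sizeof *ev + ev->len;
    }
    close(ifd);
    return count;
}
```

On a scratch file, an O_RDONLY open/close reports no event and an O_RDWR open/close reports one, even though nothing is written. That fits the parted commit quoted in comment #5, which switches device probing to read-only.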
Posted this question to the systemd-devel email list, hoping to get insight into when udev changed and why, so that I might implement a better workaround, limit it to only when needed and document it in detail.

Subject: udev removing and re-adding partition /dev entries after read-write open close of whole disk device
https://lists.freedesktop.org/archives/systemd-devel/2016-March/036046.html
Created attachment 326290
Wait for udev to recreate /dev/PTN entries when calibrating (v1)

Hi Curtis,

Finally here's the patchset to fix this. Passed testing on CentOS 5, 6, 7 and Fedora 24 (alpha). Fixes the issue on CentOS 7 and continues to work correctly on the other distros.

After this patch is done I think it's time for another GParted release.

Thanks,
Mike
Hi Mike,

Thank you for the patch to address this timing of device entry creation issue.

On the topic of another release, I agree it's time. I was holding off in case you were able to create and confirm a fix for this issue. I will see about starting the release process soon, and also ask if Steven can include a patch or two for libparted.

I have successfully run an operation or two in GParted on the following distros:

    debian 7
    debian 8
    fedora 23
    kubuntu 12.04
    openSUSE 13.2
    ubuntu 14.04
    ubuntu 15.10

Since I did not find any regression issues, patch set (v1) from comment #11 has been committed to the git repository.

The relevant git commits can be viewed at the following links:

Wait for udev to recreate /dev/PTN entries when calibrating (#762941)
https://git.gnome.org/browse/gparted/commit/?id=fd9013d5f6971e9282f019903d6e148e367718bf

Add symbolic constants SETTLE_DEVICE_*_MAX_WAIT_SECONDS
https://git.gnome.org/browse/gparted/commit/?id=94979a3805b36e630a7f0e58343d282f3360fd2a

Curtis
This enhancement was included in the GParted 0.26.0 release on April 26, 2016.
*** Bug 767153 has been marked as a duplicate of this bug. ***
*** Bug 770051 has been marked as a duplicate of this bug. ***