Bug 762941 – Operations sometimes failing with: No such file or directory


Summary:	Operations sometimes failing with: No such file or directory


Status:	RESOLVED FIXED

Product:	gparted
Classification:	Other
Component:	application
Version:	GIT HEAD
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	Mike Fleetwood
QA Contact:	gparted maintainers alias

URL:
Whiteboard:

Duplicates:	767153 770051 (view as bug list)
Depends on:
Blocks:

Reported:	2016-03-01 18:56 UTC by Mike Fleetwood
Modified:	2016-08-18 10:42 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Wait for udev to recreate /dev/PTN entries when calibrating (v1) (9.66 KB, patch) 2016-04-18 18:33 UTC, Mike Fleetwood

Description Mike Fleetwood 2016-03-01 18:56:11 UTC

I have seen a number of occurrences of file system specific commands
failing with error "No such file or directory" when accessing the
/dev/PTN device file.  Seen it with recent distros Fedora 23 and
CentOS 7 inside VirtualBox VMs.


Here is an example of a check operation failing:

  Check and repair file system (ext4) on dev/sdc2
    calibrate /dev/sdc2
      path: /dev/sdc2 (partition)
      start: 4196352
      end: 6293503
      size: 2097152 (1.00 GiB)
    check file system on /dev/sdc2 for errors and (if possible) fix them
      e2fsck -f -y -v -C 0 /dev/sdc2
        e2fsck 1.42.9 (28-Dec-2013)
        e2fsck: No such file or directory while trying to open /dev/sdc2
        Possibly non-existent device?

Comment 1 Mike Fleetwood 2016-03-01 18:59:03 UTC

This bug report in Red Hat bugzilla is reporting the same thing but
from ntfsresize as part of an NTFS resize operation.

  1296479 - NTFS Resize failes from LiveCD
  https://bugzilla.redhat.com/show_bug.cgi?id=1296479

  Shrink /dev/sda4 from 908.31 GiB to 810.66 GiB  00:00:01    ( ERROR )

    calibrate /dev/sda4  00:00:00    ( SUCCESS )

      path: /dev/sda4 (partition)
      start: 1615872
      end: 1906485504
      size: 1904869633 (908.31 GiB)

    check file system on /dev/sda4 for errors and (if possible) fix them  00:00:01    ( ERROR )

      ntfsresize -i -f -v /dev/sda4

        ntfsresize v2015.3.14 (libntfs-3g)
        ERROR(2): Failed to check '/dev/sda4' mount state: No such file or directory
        Probably /etc/mtab is missing. It's too risky to continue. You might try
        an another Linux distro.

Comment 2 Mike Fleetwood 2016-03-01 19:00:07 UTC

This commit message from parted, made after the parted 3.2 release, is
informative.

http://git.savannah.gnu.org/cgit/parted.git/commit/?id=db1728e6402a27fe64e8e132f810c22160ab8bcc
  commit db1728e6402a27fe64e8e132f810c22160ab8bcc
  Author: Brian C. Lane <bcl@redhat.com>
  Date:   Fri Aug 7 11:43:17 2015 -0700

      tests: Use wait_for_dev_to_ functions

      Recent changes to udev have made some long-standing problems appear more
      frequently. udev executes various actions when changes are made to
      devices. Sometimes this can result in device nodes not appearing
      immediately. Other times it can result in EBUSY being returned. This
      patch only addresses devices that are slow to appear/disappear.

      It is best to use the wait_for_dev_to_appear_ and
      wait_for_dev_to_disappear_ functions than to test for existance. These
      will loop and wait for up to 2 seconds for it to appear.
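
The polling approach described in the commit message above can be
sketched in C as follows.  This is only an illustration: parted's
wait_for_dev_to_appear_ is a helper in its test suite, and the function
name wait_for_path_to_appear, the 10 ms poll interval and the use of
stat() are assumptions made here, not parted's code.

```c
#include <sys/stat.h>
#include <time.h>

/* Hypothetical helper: poll until 'path' exists or 'timeout_ms' has
 * elapsed.  Returns 1 if the path exists, 0 on timeout. */
int wait_for_path_to_appear(const char *path, int timeout_ms)
{
	const struct timespec step = { 0, 10 * 1000 * 1000 };  /* 10 ms */
	struct stat sb;
	int waited_ms = 0;

	for (;;) {
		if (stat(path, &sb) == 0)
			return 1;              /* device node is present */
		if (waited_ms >= timeout_ms)
			return 0;              /* gave up waiting */
		nanosleep(&step, NULL);
		waited_ms += 10;
	}
}
```

A caller following the 2 second limit mentioned in the commit message
would use something like wait_for_path_to_appear("/dev/sdc2", 2000).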

Comment 3 Mike Fleetwood 2016-03-01 19:02:57 UTC

I find it easiest to reproduce on my work laptop in a CentOS 7
VirtualBox VM.  Just queue up a sequence of repeating check and move
operations on a partition.

Here's the above case where it failed on the first check operation.

  Check and repair file system (ext4) on dev/sdc2
    calibrate /dev/sdc2
      path: /dev/sdc2 (partition)
      start: 4196352
      end: 6293503
      size: 2097152 (1.00 GiB)
    check file system on /dev/sdc2 for errors and (if possible) fix them
      e2fsck -f -y -v -C 0 /dev/sdc2
        e2fsck 1.42.9 (28-Dec-2013)
        e2fsck: No such file or directory while trying to open /dev/sdc2
        Possibly non-existent device?

This is using libparted 3.1 in a CentOS 7 VirtualBox VM with debugging
added to GParted.  With "udevadm monitor" and GParted both writing
to the same terminal, I captured this sequence:

# udevadm monitor &
# .../gpartedbin
...
 87.334903 +18.508899 calibrate_partition()          calling get_device("/dev/sdc", lp_device) ...
 87.338625 +0.003722 calibrate_partition()          get_device() returned
 87.338650 +0.000025 calibrate_partition()          calling get_disk(lp_device, lp_disk) ...
 87.364316 +0.025666 calibrate_partition()          get_disk() returned
 87.364637 +0.000322 destroy_device_and_disk()      calling ped_disk_destroy(lp_disk) ...
 87.364659 +0.000022 destroy_device_and_disk()      ped_disk_destroy() returned
 87.364667 +0.000008 destroy_device_and_disk()      calling ped_device_destroy(lp_device) ...
 87.364679 +0.000012 destroy_device_and_disk()      ped_device_destroy() returned
 87.364843 +0.000164 execute_command()              e2fsck -f -y -v -C 0 /dev/sdc2
KERNEL[33964.473301] remove   /devices/pci0000:00/0000:00:0d.0/ata5/host4/target4:0:0/4:0:0:0/block/sdc/sdc1 (block)
KERNEL[33964.473531] remove   /devices/pci0000:00/0000:00:0d.0/ata5/host4/target4:0:0/4:0:0:0/block/sdc/sdc2 (block)
KERNEL[33964.473683] remove   /devices/pci0000:00/0000:00:0d.0/ata5/host4/target4:0:0/4:0:0:0/block/sdc/sdc3 (block)
KERNEL[33964.477476] change   /devices/pci0000:00/0000:00:0d.0/ata5/host4/target4:0:0/4:0:0:0/block/sdc (block)
KERNEL[33964.477886] add      /devices/pci0000:00/0000:00:0d.0/ata5/host4/target4:0:0/4:0:0:0/block/sdc/sdc1 (block)
KERNEL[33964.478122] add      /devices/pci0000:00/0000:00:0d.0/ata5/host4/target4:0:0/4:0:0:0/block/sdc/sdc2 (block)
KERNEL[33964.482547] add      /devices/pci0000:00/0000:00:0d.0/ata5/host4/target4:0:0/4:0:0:0/block/sdc/sdc3 (block)
 87.379364 +0.014521 execute_command()              exit status 8
UDEV  [33964.492595] remove   /devices/pci0000:00/0000:00:0d.0/ata5/host4/target4:0:0/4:0:0:0/block/sdc/sdc3 (block)
UDEV  [33964.494841] remove   /devices/pci0000:00/0000:00:0d.0/ata5/host4/target4:0:0/4:0:0:0/block/sdc/sdc2 (block)
UDEV  [33964.517200] remove   /devices/pci0000:00/0000:00:0d.0/ata5/host4/target4:0:0/4:0:0:0/block/sdc/sdc1 (block)
UDEV  [33964.571198] change   /devices/pci0000:00/0000:00:0d.0/ata5/host4/target4:0:0/4:0:0:0/block/sdc (block)
UDEV  [33964.598390] add      /devices/pci0000:00/0000:00:0d.0/ata5/host4/target4:0:0/4:0:0:0/block/sdc/sdc3 (block)
UDEV  [33964.600225] add      /devices/pci0000:00/0000:00:0d.0/ata5/host4/target4:0:0/4:0:0:0/block/sdc/sdc1 (block)
UDEV  [33964.601571] add      /devices/pci0000:00/0000:00:0d.0/ata5/host4/target4:0:0/4:0:0:0/block/sdc/sdc2 (block)

So udev removes and re-adds the partition devices merely because GParted
made one of these libparted calls:
*  ped_device_get() - get a device by name
*  ped_disk_new()   - read partition table from device
and it is still removing and re-adding /dev entries while e2fsck runs
and after it has finished.

Comment 4 Curtis Gedak 2016-03-02 01:56:17 UTC

Hi Mike,

I just saw a commit related to this issue in the parted-devel mailing list.

[parted-devel] [PATCH] tests: Add udevadm settle to wait_for_ loop (#1260664)
https://lists.alioth.debian.org/pipermail/parted-devel/2016-March/004806.html

Email message:

Sometimes the device will vanish after the wait_for_dev_to_appear exits.
Add udevadm settle in an attempt to make sure the udev system is done
flapping around and the device will stay in place.

Related: rhbz#1260664

Curtis

Comment 5 Mike Fleetwood 2016-03-06 12:27:07 UTC

Hi Curtis,

That's another fix to the parted test suite.  It handles the fact that
udev takes time and runs asynchronously: the kernel call returns as soon
as libparted has informed the kernel of the partition changes it has
just written to disk, while udev is still processing them.

I'm hoping this parted patch (post 3.2) will fix the issue for GParted,
where calibrate needs only read-only access to partition information yet
causes udev to delete and re-add the disk and partition device
entries.

http://git.savannah.gnu.org/cgit/parted.git/commit/?id=44d5ae0115c4ecfe3158748309e9912c5aede92d
  commit 44d5ae0115c4ecfe3158748309e9912c5aede92d
  Author: Brian C. Lane <bcl@redhat.com>
  Date:   Thu Aug 6 07:17:14 2015 -0700

      libparted: Use read only when probing devices on linux (#1245144)

    When a device is opened for RW closing it can trigger other actions,
    like udev scanning it for partition changes. Use read only for the
    init_* methods and RW for actual changes to the device.
    ...
    All of these changes are self-contained with no external API changes.
    The only visible change in behavior is that when a new PedDevice is
    created the device is opened in RO_MODE instead of RW_MODE.

    Resolves: rhbz#1245144

However that won't fix the issue for users running GParted on desktop
distros with libparted 3.2 or earlier and a udev recent enough to
encounter this problem.  That suggests adding a call to udevsettle at
the end of calibrate so that it doesn't return before udev has
re-created everything.

Thanks,
Mike
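
A minimal sketch of the idea of settling udev at the end of calibrate,
assuming the settle command is available as "udevadm settle" (the newer
spelling of the udevsettle command named above).  The helper names
run_and_wait and settle_udev and the choice of timeout are hypothetical;
this is not the code from the eventual GParted patch.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with its arguments and wait for it.
 * Returns the exit status, 127 if the command could not be started,
 * or -1 on other errors. */
int run_and_wait(char *const argv[])
{
	pid_t pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		execvp(argv[0], argv);
		_exit(127);                /* only reached if execvp() failed */
	}

	int status;
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Hypothetical helper: block until udev has finished processing queued
 * events, or until timeout_seconds has passed. */
int settle_udev(int timeout_seconds)
{
	char timeout_arg[32];
	snprintf(timeout_arg, sizeof timeout_arg, "--timeout=%d", timeout_seconds);

	char *const argv[] = { "udevadm", "settle", timeout_arg, NULL };
	return run_and_wait(argv);
}
```

Calling settle_udev() after the libparted device and disk objects have
been destroyed, and before running the file system command, is where
such a wait would sit in the sequence shown in comment 3.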

Comment 6 Curtis Gedak 2016-03-07 17:00:16 UTC

(In reply to Mike Fleetwood from comment #5)
> ...<snip>
> However that won't fix the issue for users running GParted on their
> desktop distros with libparted 3.2 and less with recent enough udev to
> encounter this problem.  That suggests adding a call to udevsettle at
> the end of calibrate so that doesn't return before udev has re-created
> everything.

Agreed.

If adding a call to udevsettle at the end of calibrate will resolve the issue then I am all for it.

Curtis

Comment 7 Mike Fleetwood 2016-03-08 21:25:01 UTC

Hi Curtis,

For the last couple of days on both Fedora 23 and CentOS 7 VMs I have
been completely unable to reproduce this bug.  It went from often
failing on a single check command, with stringing together a dozen
operations being impossible, to not being reproducible at all!

I can't write a fix if I can't find a way to reproduce this.

Mike

Comment 8 Curtis Gedak 2016-03-08 23:26:24 UTC

That certainly makes it a challenge to confirm if a fix works.  I wonder what changed?

If the issue is that there was an insufficient pause after the calibrate step, then I'd be okay if we added the udevsettle after calibrate.

At worst this would slow down GParted ever so slightly.  At best it would resolve the issue.

Curtis

Comment 9 Mike Fleetwood 2016-03-14 21:02:35 UTC

After downgrading systemd (and udev) to the previous package
(systemd-219-19.el7_2.4.x86_64.rpm -> systemd-219-19.el7.x86_64.rpm) it
turned out that a simple reboot of the VM guest allowed me to reproduce
this again in my CentOS 7 guest on my work laptop.


Reduced the code to this libparted test case:

/* gcc -o c-test-0018 c-test-0018.c -lparted */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <parted/parted.h>

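int main(void)
{
	/* Probe the whole disk with the same libparted calls as calibrate. */
	PedDevice * lp_device = ped_device_get("/dev/sdc");
	if (lp_device == NULL)
		return EXIT_FAILURE;
	PedDisk * lp_disk = ped_disk_new(lp_device);
	if (lp_disk != NULL)
		ped_disk_destroy(lp_disk);
	ped_device_destroy(lp_device);

	/* Immediately run e2fsck on a partition of that disk. */
	char * const cmd[] = {"e2fsck", "-f", "-y", "-v", "-C", "0", "/dev/sdc3", NULL};
	execvp("e2fsck", cmd);
	/* execvp() only returns if it failed to start e2fsck. */
	perror("execvp");
	return EXIT_FAILURE;
}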


After debugging and tracing what libparted is doing I reduced the code
to this Unix-only test:

/* gcc -o c-test-0019 c-test-0019.c */

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

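int main(void)
{
	/* Open the whole disk read-write and close it again. */
	int fd = open("/dev/sdc", O_RDWR);
	if (fd < 0) {
		perror("open /dev/sdc");
		return EXIT_FAILURE;
	}
	close(fd);

	/* Immediately run e2fsck on a partition of that disk. */
	char * const cmd[] = {"e2fsck", "-f", "-y", "-v", "-C", "0", "/dev/sdc3", NULL};
	execvp("e2fsck", cmd);
	/* execvp() only returns if it failed to start e2fsck. */
	perror("execvp");
	return EXIT_FAILURE;
}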


Both of these test cases (the libparted test when using CentOS 7's
libparted 3.1) cause udev to remove and re-add device entries for all
the partitions on the drive, as reported by "udevadm monitor" and
matching the example in comment #3 above.  e2fsck also sometimes fails
like this, when it happens to run just while the sdc3 device entry is
missing:
    e2fsck 1.42.9 (28-Dec-2013)
    e2fsck: No such file or directory while trying to open /dev/sdc3
    Possibly non-existent device?


Now I'm back on track with trying to fully understand what is happening
and why, so I can implement the right fix for GParted.

Mike
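
Why closing a read-write open matters, while a read-only open (as in the
parted commit quoted in comment 5) does not, can be observed on an
ordinary file.  The sketch below assumes that udev watches block devices
through inotify for IN_CLOSE_WRITE, which this report does not itself
state.  The function name close_raises_close_write is hypothetical, and
the path passed in is meant to be a scratch file, not a disk.

```c
#include <fcntl.h>
#include <poll.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Hypothetical helper: open 'path' with 'flags', close it again, and
 * report whether that raised an IN_CLOSE_WRITE inotify event.
 * Returns 1 if an event was seen, 0 if not, -1 on error. */
int close_raises_close_write(const char *path, int flags)
{
	int result = -1;
	int ifd = inotify_init1(IN_NONBLOCK);
	if (ifd < 0)
		return -1;

	if (inotify_add_watch(ifd, path, IN_CLOSE_WRITE) >= 0) {
		int fd = open(path, flags);
		if (fd >= 0) {
			close(fd);
			/* Wait up to 100 ms for an event to become readable. */
			struct pollfd pfd = { .fd = ifd, .events = POLLIN };
			int n = poll(&pfd, 1, 100);
			result = (n < 0) ? -1 : (n > 0);
		}
	}
	close(ifd);
	return result;
}
```

On a scratch file, passing O_RDWR is expected to report an event and
O_RDONLY is expected to report none, which mirrors the difference
between test case c-test-0019 and a read-only probe.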

Comment 10 Mike Fleetwood 2016-03-20 19:48:05 UTC

Posted this question to the systemd-devel email list, hoping to get
insight into when udev changed and why, so that I might implement a
better workaround, limit it to only when needed and document it in
detail.

Subject: udev removing and re-adding partition /dev entries after read-write open close of whole disk device
https://lists.freedesktop.org/archives/systemd-devel/2016-March/036046.html

Comment 11 Mike Fleetwood 2016-04-18 18:33:34 UTC

Created attachment 326290
Wait for udev to recreate /dev/PTN entries when calibrating (v1)

Hi Curtis,

Finally here's the patchset to fix this.

Passed testing on CentOS 5, 6, 7 and Fedora 24 (alpha).
Fixes the issue on CentOS 7 and continues to work correctly on the other
distros.

After this patch is done I think it's time for another GParted release.

Thanks,
Mike

Comment 12 Curtis Gedak 2016-04-18 20:28:30 UTC

Hi Mike,

Thank you for the patch to address this device entry creation
timing issue.

  On the topic of another release, I agree it's time.  I was holding
  off in case you were able to create and confirm a fix for this
  issue.  I will see about starting the release process soon, and also
  ask if Steven can include a patch or two for libparted.

I have successfully run an operation or two in GParted on the
following distros:

    debian  7
    debian  8
    fedora 23
   kubuntu 12.04
  openSUSE 13.2
    ubuntu 14.04
    ubuntu 15.10

Since I did not find any regression issues, patch set (v1) from
comment #11 has been committed to the git repository.

The relevant git commits can be viewed at the following links:

Wait for udev to recreate /dev/PTN entries when calibrating (#762941)
https://git.gnome.org/browse/gparted/commit/?id=fd9013d5f6971e9282f019903d6e148e367718bf

Add symbolic constants SETTLE_DEVICE_*_MAX_WAIT_SECONDS
https://git.gnome.org/browse/gparted/commit/?id=94979a3805b36e630a7f0e58343d282f3360fd2a

Curtis

Comment 13 Curtis Gedak 2016-04-26 15:57:22 UTC

This enhancement was included in the GParted 0.26.0 release on April 26, 2016.

Comment 14 Curtis Gedak 2016-06-02 15:15:21 UTC

*** Bug 767153 has been marked as a duplicate of this bug. ***

Comment 15 Mike Fleetwood 2016-08-18 10:42:25 UTC

*** Bug 770051 has been marked as a duplicate of this bug. ***