GNOME Bugzilla – Bug 791875
Rollback specific failed partition change steps
Last modified: 2018-03-19 17:09:00 UTC
As reported in bug 790418 comment 18 GParted/libparted can still encounter errors informing the kernel of the the new partition layout. In such a case the partition has been successfully written to the disk but just informing the kernel failed. This is a problem because when a partition is being moved in advance of a file system move step, failure to inform the kernel leaves the partition boundaries not matching the on disk boundaries of the file system. For a move to the left this leaves the partition reported as unknown, apparently loosing the users data. Example test case: Move a 512 MiB partition 100 MiB to the left. Fail after the first partition change leaves the partition as unknown, loosing the file system no longer at the start of the partition. Example operation results: Move /dev/sdb1 to the left (ERROR) * calibrate /dev/sdb1 (SUCCESS) * check file system on /dev/sdb1 for errors and (if possib...(SUCCESS) * grow partition from 512.00 MiB to 612.00 MiB (ERROR) old start: 1048576 old end: 2097151 old size: 1048576 (512.00 MiB) requested start: 843776 requested end: 2097151 requested size: 1253376 (612.00 MiB) * libparted messages (ERROR) Other cases where the partition boundaries are being updated after a file system move or shrink step then the partition should be updated to match the location of the file system itself. In this case no rollback is wanted, and if the failure was only to inform the kernel then in fact the partition has actually been updated on disk after all. Therefore each partition resize/move step needs to be examined and a decision made on a case by case basis whether rollback should happen or not. Patchset is being worked on for this.
Created attachment 366115 [details] [review] Rollback specific failed partition change steps (v1) Hi Curtis, Here's the patchset for this. It chooses which partition changes are best rolled back on failure and attempts rollback on failure. Naturally rolling back a change to a partition is also a change to the partition. If the first partition change failed informing the kernel just because of a timing issue with udev, i.e. what was happening in bug 790418, then rollback is a good strategy. If the first partition change failed because of a write error then it seems likely that the rollback will fail too. But it won't be any worse that what the code does now with no rollback for any partition change. I tested all the resize / move partition cases. Found one issue, which necessitated patch "Fix rollback when growing a partition by more than twice fails". Testing or any unusual cases you can think of would be useful. Thanks, Mike
Created attachment 366116 [details] [review] Debug patches to optionally fail committing to disk These are the debugging patches I have been using to simulate failure to commit the partition change to the disk. Obviously not for applying upstream. The first simulates complete failure to write to the drive, the second simulates failure to inform the kernel. Mike
Hi Mike, Wow. This looks like it was a lot of work to determine which situations to rollback and then to implement and test the code. Thank you for tackling this improvement. Patch set v1 in comment #1 looks mostly good to me. The only issues I observed were some minor spelling mistakes / typos. Following are changes I suggest to the commit messages (I can make these before applying the commits to master). ------- [P 3/6] Implement rollback of failed partition resize/move steps (#791875) PROBLEM: Extra mismatched double quotes in commit comment. Even after implementing a fix for bug 790418 ""Unable to inform the ^ kernel of the change" message may lead to corrupted partition table" ^ GParted/libparted can still encounter errors informing the kernel of the new partition layout. This has been seen with GParted on CentOS 7 with libparted 3.1. PROBLEM: Spelling mistake "loosing" s/b "losing" partition reported as unknown, apparently loosing the user's data. ^^^^^^^ PROBLEM: Spelling mistake "a error" s/b "an error" It doesn't matter why updating the partition failed, even if it was because of a error writing to the disk. Rollback of the change to the ^^^ ------- [P 4/6] Enable failed partition change rollback for selected steps (#791875) PROBLEM: Spelling mistake "mach" s/b "match" Keep new smaller partition boundaries to mach shrunk file system. ^^^^ ------- [P 5/6] Fix rollback when growing a partition by more than twice fails (#791875) PROBLEM: Spelling mistake "change" s/b "changed" sector in the middle to identify the partition to be change. However ^^^^^^ PROBLEM: Spelling mistake "though" s/b "thought" reported error "Can't have overlapping partitions" because it though ^^^^^^ ------- For testing I checked for regressions in behaviour. More specifically I tested to ensure that the following operations still complete successfully as expected: | Normal HDD | DMRAID (Fake RAID) Operation (ext4) | /dev/sdc | /dev/mapper/isw_jedagafcd_Vol0 ------------------+------------+------------------------------- Create | Success | Success Grow | Success | Success Shrink | Success | Success Move right | Success | Success Move left | Success | Success Grow+Move right | Success | Success Grow+Move left | Success | Success Shrink+Move right | Success | Success Shrink+Move left | Success | Success Delete | Success | Success I also tested some move/grow/shrink operations with FAT16, which uses the internal GParted copy operation, and these completed successfully. If there are no concerns raised, then I will commit patch set v1 in the next few days. Curtis
Created attachment 366153 [details] [review] Rollback specific failed partition change steps (v2) Hi Curtis, Here's patchset v2. Corrects the spelling mistakes. I would normally just let you do this, but the quotes around the bug title in P 3/6 was me being less than obvious. The bug title already had some words quoted and I added a second set around the whole title. Just dropped the inner set set of quotes. The hardest part was testing. All those cases and with the debugging patches added keeping track to correctly select success or failure for all the synthetic exceptions. Thanks, Mike
Thank you Mike for improving the handling of partition change rollback when a partition change fails. Patch set v2 from comment #4 has been committed to the git repository. The relevant git commits can be viewed at the following links: Extract code into resize_move_partition_implement() (#791875) https://git.gnome.org/browse/gparted/commit/?id=890d5a93a72dff8b0a0b5c586345f1699534cdb1 Extract common code into update_dmraid_entry() (#791875) https://git.gnome.org/browse/gparted/commit/?id=93ccc79e3a65f887b6225b368ad55b3fc27eccf3 Implement rollback of failed partition resize/move steps (#791875) https://git.gnome.org/browse/gparted/commit/?id=0f16703bbbae9c7d8d54d46af4a7b3a7e725f17f Enable failed partition change rollback for selected steps (#791875) https://git.gnome.org/browse/gparted/commit/?id=0b5bf83b22a4184afeb7c6b3e08d281211ae6c8b Fix rollback when growing a partition by more than twice fails (#791875) https://git.gnome.org/browse/gparted/commit/?id=f54dd1070720e3db8ad90e259e7163b78b05d48e Rename function and reword text for rollback of failed file system move https://git.gnome.org/browse/gparted/commit/?id=b04dbbc357d2844cd28de9eb6ab63e209ee79e74
This enhancement was included in the GParted 0.31.0 release on March 19, 2018.