GNOME Bugzilla – Bug 754913
Race condition in deploy results in unbootable system
Last modified: 2015-09-23 23:18:01 UTC
I ran into a race condition in the field that resulted in an unbootable system, despite OSTree's goal of preventing exactly this. I analyzed the file system state on the device in question and came up with this scenario for what happened when deploying:
1. Bootlinks are swapped and synced
2. Bootloader symlink is swapped, but NOT written to disk
3. Old bootlinks are cleaned up by ostree_sysroot_cleanup(), and this deletion is written to disk
4. System crashes
After step (3), the bootloader symlink on disk is still pointing to the old bootlink, but that bootlink has been removed from the disk. On the next boot, the bootloader will look for the nonexistent bootlink and fail.
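The ordering problem in the scenario above can be sketched in a few lines. This is only an illustration in Python, not the attached patch and not OSTree's actual code: the helper names (`fsync_dir`, `swap_symlink`, `deploy`) and the `loader` link name are hypothetical, and the old bootlink is modeled as a single directory entry. The point it shows is that the bootloader symlink swap is flushed to disk before anything it used to point at is deleted.

```python
import os

def fsync_dir(path):
    """Flush a directory's entries (renames, unlinks) to stable storage."""
    fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)

def swap_symlink(link_path, new_target):
    """Atomically repoint link_path at new_target, then make the swap durable."""
    parent = os.path.dirname(os.path.abspath(link_path))
    tmp = link_path + ".tmp"  # hypothetical temporary name
    if os.path.lexists(tmp):
        os.unlink(tmp)
    os.symlink(new_target, tmp)
    os.rename(tmp, link_path)  # atomic replace of the old symlink
    fsync_dir(parent)          # the flush that step (2) was missing

def deploy(boot_dir, new_bootlink, old_bootlink):
    """Swap the bootloader symlink first, and clean up only after it is on disk."""
    swap_symlink(os.path.join(boot_dir, "loader"), new_bootlink)
    old_path = os.path.join(boot_dir, old_bootlink)
    if os.path.lexists(old_path):
        os.unlink(old_path)    # safe now: nothing on disk still refers to it
    fsync_dir(boot_dir)
```

With this ordering, a crash at any point leaves the on-disk symlink pointing at an entry that still exists: either the old bootlink (not yet deleted) or the new one.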
Created attachment 311193: Patch to fix deployment race
Review of attachment 311193: Thanks for the excellent bug report, and sorry about that =/ Our testing story for this needs to improve, and I might be able to look at that soon. Something like having libostree support waiting at each step in the critical sections, and powering off qemu at each one. What version of OSTree was this machine running? And which filesystem is used for / ? I believe this would be fixed by: https://git.gnome.org/browse/ostree/commit/?id=76a976817fb22c419557c6e010638b7ed2c2549b But I'll investigate some more.
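The "power off at each step" testing idea can be illustrated with a toy model. The sketch below does not use libostree or qemu; it is a simplified simulation under stated assumptions: the disk is a dict, each deploy step carries a flag saying whether it reaches disk before a crash, and a non-durable step is assumed lost in the worst case. All names (`boots`, `crash_after_each_step`, `boot.0`, `boot.1`) are hypothetical.

```python
def boots(disk):
    """The bootloader follows its symlink; boot succeeds if the target exists."""
    return disk["loader"] in disk["bootlinks"]

def crash_after_each_step(initial, steps):
    """Cut power after every step and list the steps that leave the disk unbootable.

    Each step is (name, mutate, durable). Worst case is assumed: a step that
    was not made durable is lost entirely when the power is cut.
    """
    failures = []
    for cut in range(len(steps)):
        disk = {"loader": initial["loader"], "bootlinks": set(initial["bootlinks"])}
        for _name, mutate, durable in steps[: cut + 1]:
            if durable:
                mutate(disk)
        if not boots(disk):
            failures.append(steps[cut][0])
    return failures

def make_steps(symlink_swap_durable):
    """The deploy sequence from the report, with step 2's durability as a knob."""
    return [
        ("swap bootlinks", lambda d: d["bootlinks"].add("boot.1"), True),
        ("swap bootloader symlink", lambda d: d.update(loader="boot.1"), symlink_swap_durable),
        ("cleanup old bootlinks", lambda d: d["bootlinks"].discard("boot.0"), True),
    ]

initial = {"loader": "boot.0", "bootlinks": {"boot.0"}}
buggy = crash_after_each_step(initial, make_steps(symlink_swap_durable=False))
fixed = crash_after_each_step(initial, make_steps(symlink_swap_durable=True))
print("buggy ordering fails after:", buggy)
print("fixed ordering fails after:", fixed)
```

In this model the reported sequence is unbootable only when the crash follows the cleanup step, which matches the scenario in the report; making the symlink swap durable removes that failure.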
It would have been ostree 2013.6, so it was indeed likely fixed through that patch there.
Oh, and / and /boot are ext4 in our products.
Yep, it was observed in ostree 2013.6, and I agree that the patch Colin linked to should have fixed it.
Since this should be fixed I'll close the bug.