Bug 754913 - Race condition in deploy results in unbootable system
Status: RESOLVED INVALID
Product: ostree
Classification: Infrastructure
Component: general
Version: unspecified
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: OSTree maintainer(s)
QA Contact: OSTree maintainer(s)
Depends on:
Blocks:
 
 
Reported: 2015-09-11 23:56 UTC by John Hiesey
Modified: 2015-09-23 23:18 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Patch to fix deployment race (1.89 KB, patch)
2015-09-11 23:57 UTC, John Hiesey
reviewed

Description John Hiesey 2015-09-11 23:56:53 UTC
I ran into a race condition in the field that resulted in an unbootable system despite OSTree's goal of preventing this from happening.

I analyzed the file system state on the device in question and came up with this scenario for what happened when deploying:

1. Bootlinks are swapped and synced
2. Bootloader symlink is swapped, but NOT written to disk
3. Old bootlinks are cleaned up by ostree_sysroot_cleanup(), and this
   deletion is written to disk
4. System crashes

After step (3), the bootloader symlink on disk is still pointing to the
old bootlink, but that bootlink has been removed from the disk. On the
next boot, the bootloader will look for the nonexistent bootlink and
fail.
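
The ordering problem above can be sketched as follows. This is a minimal illustration, not the attached patch and not OSTree's actual code: the function names `swap_symlink_durably` and `deploy`, and the file names `loader`, `bootlink.0` and `bootlink.1`, are hypothetical. The point it shows is that the symlink swap must be flushed to disk (by fsyncing the containing directory) before anything the old symlink pointed to is deleted.

```python
import os


def swap_symlink_durably(link_path: str, new_target: str) -> None:
    """Repoint link_path at new_target atomically, then flush the change."""
    parent = os.path.dirname(os.path.abspath(link_path))
    tmp_link = link_path + ".tmp"

    # Remove a stale temporary link left by an interrupted earlier run.
    if os.path.lexists(tmp_link):
        os.unlink(tmp_link)

    # Create the new link under a temporary name, then rename it over the
    # old one. rename() replaces the destination atomically.
    os.symlink(new_target, tmp_link)
    os.replace(tmp_link, link_path)

    # The rename only becomes durable once the directory itself is synced.
    dir_fd = os.open(parent, os.O_RDONLY | os.O_DIRECTORY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


def deploy(boot_dir: str, new_bootlink: str, old_bootlink: str) -> None:
    """Hypothetical deploy step: swap the loader link, then clean up."""
    # Step 2 of the scenario, with the missing sync added.
    swap_symlink_durably(os.path.join(boot_dir, "loader"), new_bootlink)

    # Step 3: cleanup is only safe now that the new link is on disk.
    old_path = os.path.join(boot_dir, old_bootlink)
    if os.path.lexists(old_path):
        os.unlink(old_path)
```

If the system crashes at any point in this sketch, the on-disk `loader` link points either to the old bootlink (which has not been deleted yet) or to the new one (which already exists).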
Comment 1 John Hiesey 2015-09-11 23:57:57 UTC
Created attachment 311193
Patch to fix deployment race
Comment 2 Colin Walters 2015-09-14 12:44:12 UTC
Review of attachment 311193:

Thanks for the excellent bug report, and sorry about that =/  Our testing
story for this needs to improve, and I might be able to look at that soon.
Something like having libostree support pausing at each step in the critical
sections, and powering off qemu at each one.

What version of OSTree was this machine running?  And which filesystem for / ?  I believe this would be fixed by:

https://git.gnome.org/browse/ostree/commit/?id=76a976817fb22c419557c6e010638b7ed2c2549b

But I'll investigate some more.
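
The crash-at-every-step testing idea can be illustrated with a toy model. This is only a sketch under stated assumptions: it does not use libostree or qemu, the names `crash_points_that_break_boot`, `RACY_ORDER` and `SYNCED_ORDER` are hypothetical, and an unsynced write is pessimistically treated as never reaching the disk. Each step is a `(key, value, synced)` tuple, and the model checks after every step whether a crash at that point would leave the loader link pointing at a missing bootlink.

```python
def crash_points_that_break_boot(steps):
    """Return the 1-based step numbers after which a crash is unbootable."""
    # State that would survive a power loss before any step has run.
    durable = {"bootlink.0": True, "bootlink.1": False, "loader": "bootlink.0"}
    broken = []
    for number, (key, value, synced) in enumerate(steps, start=1):
        if synced:
            # Only synced writes are assumed to reach the disk.
            durable[key] = value
        # Simulate a crash right after this step: the system boots only if
        # the bootlink that the durable loader link names still exists.
        if not durable[durable["loader"]]:
            broken.append(number)
    return broken


# The scenario from the description: the loader swap is not synced.
RACY_ORDER = [
    ("bootlink.1", True, True),         # 1. new bootlink swapped and synced
    ("loader", "bootlink.1", False),    # 2. loader swapped, NOT synced
    ("bootlink.0", False, True),        # 3. old bootlink deleted and synced
]

# Same steps, but the loader swap is synced before cleanup.
SYNCED_ORDER = [
    ("bootlink.1", True, True),
    ("loader", "bootlink.1", True),
    ("bootlink.0", False, True),
]

print(crash_points_that_break_boot(RACY_ORDER))    # [3]
print(crash_points_that_break_boot(SYNCED_ORDER))  # []
```

A real harness would replace the dictionary with an actual disk image and the loop with a VM that is powered off at each pause point, but the invariant being checked would be the same.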
Comment 3 Jasper St. Pierre (not reading bugmail) 2015-09-14 16:53:52 UTC
It would have been ostree 2013.6, so it was indeed likely fixed by that patch.
Comment 4 Jasper St. Pierre (not reading bugmail) 2015-09-14 20:29:22 UTC
Oh, and / and /boot are ext4 in our products.
Comment 5 John Hiesey 2015-09-23 23:16:05 UTC
Yep, it was observed in ostree 2013.6, and I agree that the patch Colin linked to should have fixed it.
Comment 6 John Hiesey 2015-09-23 23:18:01 UTC
Since this should be fixed, I'll close the bug.