GNOME Bugzilla – Bug 708351
corrupted /boot as of recent sysroot refactoring
Last modified: 2013-09-20 15:29:45 UTC
I was running ostree admin deploy ... this morning, when the VM turned black for some reason, so I killed it, thinking that it was safe because all upgrades are atomic in ostree. It rebooted fine into the working system, but attempting to deploy again was failing (error when getting information on /ostree/boot.0/gnome-ostree/.../0: No such file or directory). After some investigation, I found that the boot loader entry had been created in /boot/loader, and so was /boot/ostree/gnome-ostree-.../, but /ostree/boot.0/gnome-ostree/... wasn't there. /ostree/boot.1 did not exist. PS: /boot/loader only contains ostree entries (regular files, atomic rename possible) or syslinux.cfg (again, regular file, atomic rename), so why do you need the swapped symlink? Similarly for /ostree/boot.[01], why is it needed? Wouldn't you achieve the same by having the bootloader entries link directly in the /ostree/deploy/osname directory?
I think this is a regression from the recent sysroot work I did, one of: https://git.gnome.org/browse/ostree/commit/?id=95f07d486adcb6bb12d8da0016066d8d8db205d6 https://git.gnome.org/browse/ostree/commit/?id=8f1ea1b50aad70f671b8a2902f4b68771cce8682 or similar. I saw this once myself too =/ But I didn't have time to debug at that moment, so I just rmeoved the offending .conf file and things seemed fine after that. But now that you've hit it, the question is: Is this a transient issue that happens once? Or what triggers it? I've done several upgrades since then myself and things appear to be fine. As far as why the extra levels of indirection in /ostree/boot: it's to avoid touching /boot unless the kernel changes. Think of scenarios where /boot is on NVRAM; we don't want to be modifying it constantly just to swap checksums.
(In reply to comment #1) > As far as why the extra levels of indirection in /ostree/boot: it's to avoid > touching /boot unless the kernel changes. Think of scenarios where /boot is on > NVRAM; we don't want to be modifying it constantly just to swap checksums. That makes sense, but for some reason the checksums in /boot are constantly changing when I redeploy, even if I redploy stuff that did not rebuild the yocto base. I guess the reason might be the initramfs, which is regenerated at the end of each tree compose, and probably changes (timestamps? build-ids of non yocto stuff?) Also, it doesn't answer my question on /boot/loader.
(In reply to comment #2) > That makes sense, but for some reason the checksums in /boot are constantly > changing when I redeploy, even if I redploy stuff that did not rebuild the > yocto base. Ok so this is fairly subtle, but: if you change the total number of deployments, then we have to update the bootloader config. Now OSTree is very flexible, but in the default configuration if you just use "ostree admin upgrade" you will have 2 deployments. Doing an upgrade will garbage collect the older one and deploy the new one, so if the initramfs hasn't changed we shouldn't touch the bootloader config in that case. > I guess the reason might be the initramfs, which is regenerated at the end of > each tree compose, and probably changes (timestamps? build-ids of non yocto > stuff?) See https://bugzilla.gnome.org/show_bug.cgi?id=696283 But as I mentioned in the bug only things which have "initramfs-depends: true" should trigger it to be rebuilt. I did notice something was off with the caching a while ago but doing "rm cache/raw/initramfs -rf" fixed it. > Also, it doesn't answer my question on /boot/loader. I want to be able to swap the whole bootloader config at once...it's just an easy way to do that.
Ok, I just hit this again, so it's not fixed for sure.
Created attachment 255358 [details] postmorterm boot directory
Created attachment 255359 [details] postmortem ostree boot
Created attachment 255360 [details] transcript
Ok I may have found it, going to sleep on this diff: diff --git a/src/libostree/ostree-sysroot-deploy.c b/src/libostree/ostree-sysroot-deploy.c index e0232dd..1ea984c 100644 --- a/src/libostree/ostree-sysroot-deploy.c +++ b/src/libostree/ostree-sysroot-deploy.c @@ -985,7 +985,7 @@ bootcsum_counts_for_deployment_list (GPtrArray *deployments) if (!g_hash_table_lookup_extended (ret, bootcsum, &orig_key, &countp)) { - g_hash_table_insert (ret, (char*)bootcsum, GUINT_TO_POINTER (0)); + g_hash_table_insert (ret, (char*)bootcsum, GUINT_TO_POINTER (1)); } else { Basically if we're doing an upgrade and we have 1 deployment kernel shared between both lists and one different (which is in fact the expected normal case), g_hash_table_lookup returns 0 for both because I'm stupid and put that in where it should be 1.
Created attachment 255386 [details] [review] deploy: Correctly swap bootloader version with new boot checksums If we had two deployments with different boot checksums, and were trying to remove the one that was the same and add a new one (the normal case), we'd end up assuming due to comparison with 0 that we only needed to do the fast subbootversion swap. Fix this by actually putting 1 where we really mean 1. And update the tests to verify the fix; I have double-verified by undoing the fix, and noting that the test fails.
Created attachment 255412 [details] [review] deploy: Correctly swap bootloader version with new boot checksums Simplify the code.
Review of attachment 255412 [details] [review]: Looks correct to me.
Attachment 255412 [details] pushed as 298625d - deploy: Correctly swap bootloader version with new boot checksums