Bug 708351 - corrupted /boot as of recent sysroot refactoring
Status: RESOLVED FIXED
Product: ostree
Classification: Infrastructure
Component: general
Version: unspecified
Hardware/OS: Other Linux
Importance: High critical
Target Milestone: ---
Assigned To: OSTree maintainer(s)
Depends on:
Blocks:
 
 
Reported: 2013-09-19 09:00 UTC by Giovanni Campagna
Modified: 2013-09-20 15:29 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
postmortem boot directory (746 bytes, application/x-compressed-tar)
2013-09-20 02:03 UTC, Colin Walters
  Details
postmortem ostree boot (669 bytes, application/x-compressed-tar)
2013-09-20 02:03 UTC, Colin Walters
  Details
transcript (1.88 KB, text/plain)
2013-09-20 02:04 UTC, Colin Walters
  Details
deploy: Correctly swap bootloader version with new boot checksums (3.74 KB, patch)
2013-09-20 12:17 UTC, Colin Walters
none Details | Review
deploy: Correctly swap bootloader version with new boot checksums (4.21 KB, patch)
2013-09-20 15:23 UTC, Colin Walters
committed Details | Review

Description Giovanni Campagna 2013-09-19 09:00:42 UTC
I was running ostree admin deploy ... this morning when the VM screen went black for some reason, so I killed it, thinking it was safe because all upgrades in ostree are atomic.

It rebooted fine into the working system, but attempting to deploy again failed (error when getting information on /ostree/boot.0/gnome-ostree/.../0: No such file or directory).
After some investigation, I found that the boot loader entry had been created in /boot/loader, and so had /boot/ostree/gnome-ostree-.../, but /ostree/boot.0/gnome-ostree/... wasn't there.
/ostree/boot.1 did not exist.

PS: /boot/loader only contains ostree entries (regular files, atomic rename possible) or syslinux.cfg (again, regular file, atomic rename), so why do you need the swapped symlink?
Similarly for /ostree/boot.[01], why is it needed? Wouldn't you achieve the same by having the bootloader entries link directly in the /ostree/deploy/osname directory?
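For context, "regular file, atomic rename" refers to the usual write-temp-then-rename pattern; the following is a minimal illustrative sketch, not OSTree code, and the helper name is made up:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Sketch: write the new bootloader entry to a temporary file, then
 * rename() it over the old one; readers see either the old file or the
 * new file, never a partial write. */
static int
replace_config_atomically (const char *path, const char *contents)
{
  char tmp[4096];
  snprintf (tmp, sizeof tmp, "%s.tmp", path);

  int fd = open (tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0)
    return -1;
  if (write (fd, contents, strlen (contents)) < 0 || fsync (fd) < 0)
    {
      close (fd);
      unlink (tmp);
      return -1;
    }
  close (fd);

  return rename (tmp, path);  /* the atomic step */
}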
Comment 1 Colin Walters 2013-09-19 13:02:20 UTC
I think this is a regression from the recent sysroot work I did, one of:

https://git.gnome.org/browse/ostree/commit/?id=95f07d486adcb6bb12d8da0016066d8d8db205d6
https://git.gnome.org/browse/ostree/commit/?id=8f1ea1b50aad70f671b8a2902f4b68771cce8682

or similar.

I saw this once myself too =/  But I didn't have time to debug at that moment, so I just removed the offending .conf file and things seemed fine after that.

But now that you've hit it, the question is: Is this a transient issue that happens once?  Or what triggers it?

I've done several upgrades since then myself and things appear to be fine.

As far as why the extra levels of indirection in /ostree/boot: it's to avoid touching /boot unless the kernel changes.  Think of scenarios where /boot is on NVRAM; we don't want to be modifying it constantly just to swap checksums.
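To make that intent concrete, here is a rough sketch of the decision (hypothetical helper name, not the libostree API): the entries in /boot only need to be rewritten when the new deployment's boot (kernel+initramfs) checksum isn't already present among the current deployments.

#include <glib.h>
#include <string.h>

/* Hypothetical helper assumed to return a deployment's kernel+initramfs
 * checksum; not an actual libostree function. */
const char *deployment_get_bootcsum (gpointer deployment);

static gboolean
boot_update_needed (GPtrArray *current_deployments, const char *new_bootcsum)
{
  for (guint i = 0; i < current_deployments->len; i++)
    {
      if (strcmp (deployment_get_bootcsum (current_deployments->pdata[i]),
                  new_bootcsum) == 0)
        return FALSE;  /* same kernel: only the /ostree/boot.N indirection changes */
    }
  return TRUE;         /* new kernel: entries in /boot itself must be rewritten */
}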
Comment 2 Giovanni Campagna 2013-09-19 13:11:47 UTC
(In reply to comment #1)
> As far as why the extra levels of indirection in /ostree/boot: it's to avoid
> touching /boot unless the kernel changes.  Think of scenarios where /boot is on
> NVRAM; we don't want to be modifying it constantly just to swap checksums.

That makes sense, but for some reason the checksums in /boot are constantly changing when I redeploy, even if I redeploy stuff that did not rebuild the yocto base.
I guess the reason might be the initramfs, which is regenerated at the end of each tree compose, and probably changes (timestamps? build-ids of non yocto stuff?)

Also, it doesn't answer my question on /boot/loader.
Comment 3 Colin Walters 2013-09-19 14:33:56 UTC
(In reply to comment #2)

> That makes sense, but for some reason the checksums in /boot are constantly
> changing when I redeploy, even if I redploy stuff that did not rebuild the
> yocto base.

Ok so this is fairly subtle, but: if you change the total number of deployments, then we have to update the bootloader config.

Now OSTree is very flexible, but in the default configuration if you just use "ostree admin upgrade" you will have 2 deployments.  Doing an upgrade will garbage collect the older one and deploy the new one, so if the initramfs hasn't changed we shouldn't touch the bootloader config in that case.


> I guess the reason might be the initramfs, which is regenerated at the end of
> each tree compose, and probably changes (timestamps? build-ids of non yocto
> stuff?)

See https://bugzilla.gnome.org/show_bug.cgi?id=696283

But as I mentioned in the bug only things which have "initramfs-depends: true" should trigger it to be rebuilt.  I did notice something was off with the caching a while ago but doing "rm cache/raw/initramfs -rf" fixed it.
 
> Also, it doesn't answer my question on /boot/loader.

I want to be able to swap the whole bootloader config at once...it's just an easy way to do that.
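Roughly, this is the classic "swap a whole directory through a symlink" pattern; a minimal sketch follows (illustrative only, not libostree's actual implementation):

#include <stdio.h>
#include <unistd.h>

/* Sketch: /boot/loader is a symlink pointing at loader.0 or loader.1.
 * To publish a complete new set of bootloader entries at once, create a
 * temporary symlink to the new directory and rename() it over "loader". */
static int
swap_loader_dir (const char *bootdir, int new_version)
{
  char target[64], tmplink[4096], final[4096];

  snprintf (target, sizeof target, "loader.%d", new_version);
  snprintf (tmplink, sizeof tmplink, "%s/loader.tmp", bootdir);
  snprintf (final, sizeof final, "%s/loader", bootdir);

  (void) unlink (tmplink);              /* drop any stale temporary link */
  if (symlink (target, tmplink) < 0)    /* loader.tmp -> loader.<new_version> */
    return -1;
  return rename (tmplink, final);       /* atomically repoint /boot/loader */
}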
Comment 4 Colin Walters 2013-09-20 01:46:33 UTC
Ok, I just hit this again, so it's definitely not fixed.
Comment 5 Colin Walters 2013-09-20 02:03:30 UTC
Created attachment 255358 [details]
postmortem boot directory
Comment 6 Colin Walters 2013-09-20 02:03:50 UTC
Created attachment 255359 [details]
postmortem ostree boot
Comment 7 Colin Walters 2013-09-20 02:04:34 UTC
Created attachment 255360 [details]
transcript
Comment 8 Colin Walters 2013-09-20 02:52:46 UTC
Ok I may have found it, going to sleep on this diff:

diff --git a/src/libostree/ostree-sysroot-deploy.c b/src/libostree/ostree-sysroot-deploy.c
index e0232dd..1ea984c 100644
--- a/src/libostree/ostree-sysroot-deploy.c
+++ b/src/libostree/ostree-sysroot-deploy.c
@@ -985,7 +985,7 @@ bootcsum_counts_for_deployment_list (GPtrArray   *deployments)
 
       if (!g_hash_table_lookup_extended (ret, bootcsum, &orig_key, &countp))
         {
-          g_hash_table_insert (ret, (char*)bootcsum, GUINT_TO_POINTER (0));
+          g_hash_table_insert (ret, (char*)bootcsum, GUINT_TO_POINTER (1));
         }
       else
         {


Basically, if we're doing an upgrade and one deployment kernel is shared between both lists while one is different (which is in fact the expected normal case), g_hash_table_lookup returns 0 for both because I'm stupid and put a 0 in where it should be 1.
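For clarity, a simplified sketch of the counting logic in question (paraphrased, not the exact upstream bootcsum_counts_for_deployment_list):

#include <glib.h>

/* Count how many deployments reference each boot checksum.  The first
 * occurrence must be recorded as 1; with 0 inserted on first sight (the
 * bug), the counts compared later come out as 0 on both sides, so the
 * code wrongly concludes only the fast subbootversion swap is needed. */
static GHashTable *
count_bootcsums (GPtrArray *bootcsums)   /* elements: (const char *) checksums */
{
  GHashTable *counts = g_hash_table_new (g_str_hash, g_str_equal);

  for (guint i = 0; i < bootcsums->len; i++)
    {
      const char *bootcsum = bootcsums->pdata[i];
      gpointer orig_key, countp;

      if (!g_hash_table_lookup_extended (counts, bootcsum, &orig_key, &countp))
        g_hash_table_insert (counts, (char *) bootcsum, GUINT_TO_POINTER (1));
      else
        g_hash_table_replace (counts, (char *) bootcsum,
                              GUINT_TO_POINTER (GPOINTER_TO_UINT (countp) + 1));
    }
  return counts;
}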
Comment 9 Colin Walters 2013-09-20 12:17:01 UTC
Created attachment 255386 [details] [review]
deploy: Correctly swap bootloader version with new boot checksums

If we had two deployments with different boot checksums, and were
trying to remove the one that was the same and add a new one (the
normal case), we'd end up assuming due to comparison with 0 that
we only needed to do the fast subbootversion swap.

Fix this by actually putting 1 where we really mean 1.

And update the tests to verify the fix; I have double-verified by
undoing the fix, and noting that the test fails.
Comment 10 Colin Walters 2013-09-20 15:23:52 UTC
Created attachment 255412 [details] [review]
deploy: Correctly swap bootloader version with new boot checksums

Simplify the code.
Comment 11 Giovanni Campagna 2013-09-20 15:27:31 UTC
Review of attachment 255412 [details] [review]:

Looks correct to me.
Comment 12 Colin Walters 2013-09-20 15:29:42 UTC
Attachment 255412 [details] pushed as 298625d - deploy: Correctly swap bootloader version with new boot checksums