GNOME Bugzilla – Bug 724246
Support FAT/EFI
Last modified: 2016-02-17 19:33:29 UTC
The atomic upgrade support presently requires symbolic links; specifically /boot/loader -> /boot/loader.[01]. To support FAT, we need another scheme. Possibly having a /boot/loader.version file and making the bootloader aware of it? I can't think of a way to do this offhand that wouldn't require bootloader support.
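A minimal sketch of the symlink swap described above, assuming a POSIX filesystem that supports symlinks (FAT does not) and a hypothetical helper name, swap_loader_symlink(); this is an illustration, not OSTree's actual code. The new link is created under a temporary name, and rename(2) then moves it over /boot/loader in a single step.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: atomically repoint BOOTDIR/loader at TARGET
 * (e.g. "loader.0" or "loader.1"). Returns 0 on success, -1 on failure
 * with errno set. Requires a filesystem with symlink support. */
static int swap_loader_symlink(const char *bootdir, const char *target)
{
    char tmp[4096], final_path[4096];
    int dfd, rc;

    assert(bootdir != NULL && target != NULL);
    snprintf(tmp, sizeof tmp, "%s/loader.tmp", bootdir);
    snprintf(final_path, sizeof final_path, "%s/loader", bootdir);

    /* Clear a temporary link left behind by an interrupted earlier run. */
    if (unlink(tmp) < 0 && errno != ENOENT)
        return -1;
    if (symlink(target, tmp) < 0)
        return -1;

    /* rename(2) replaces the old "loader" link in a single step: anyone
     * resolving /boot/loader sees either the old or the new target. */
    if (rename(tmp, final_path) < 0)
        return -1;

    /* fsync the directory so the rename itself survives a crash. */
    dfd = open(bootdir, O_RDONLY | O_DIRECTORY);
    if (dfd < 0)
        return -1;
    rc = fsync(dfd);
    close(dfd);
    return rc;
}
```

On vfat the symlink() call itself fails (typically with EPERM), which is the core of this bug.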
We could support non-atomic upgrades on FAT... but I'm not sure I want to go there. Kay seemed to think that we should not use this swapped-directory pattern, and should just write increasing version numbers. But that would prevent us from atomically adding a new entry and removing an old one at the same time. We could still add each entry atomically, but if interrupted at the wrong time, the old entry would stick around, and we wouldn't know to remove it. OSTree really wants the bootloader entries to be the "single source of truth" about the system state, and a way to atomically swap them is important for implementing that.
<hunger> walters: I would be happy to just not get the bootloader updated but just get all the information I need to do that.
<hunger> walters: ostree admin deploy --no-bootloader does unfortunately not work (opt_no_bootloader seems entirely unused to me).
<walters> it would be pretty easy to just do a non-atomic replace
<walters> you'd have just a very small window of unbootability
<hunger> walters: I am creating images anyway, so I would not mind that at all.
<walters> ok, but are you using online upgrades inside those images after deployment?
<hunger> walters: True, I will most likely do that:-/
<walters> yeah
<walters> i don't have an EFI setup myself handy, although I thought i saw something go by about qemu supporting it
<hunger> walters: Can I force ostree to write into a different directory? Then I can copy things around.
<hunger> walters: Fedora has OVMF ... EFI firmware for qemu. Works great, I use that for my kvm setups.
<hunger> walters: Which of the two entries is the one that I need to boot anyway? I keep checking the ids to find the new one that I want to boot:-/
<hunger> walters: I tried adding support for the gummiboot bootloader... but so far I failed. My C is rusty and I never used glib:-/
<walters> hunger, i am presently discussing this with kay in #systemd
<hunger> #systemd on this server?
<walters> on freenode
<walters> http://pastebin.mozilla.org/4270893
<hunger> walters: I think gummiboot should do this: Add its entries to /boot/loader/entries (and maybe remove its own old entries).
<walters> it's easy to add one entry safely
<walters> just write the new file to .tmp, fsync it, rename it
<hunger> walters: Then it should update /boot/loader/loader.conf and set the entry to boot there (provided that was pointing to an ostree entry).
<walters> mmm...gummiboot implements the BLS purely, we don't need to update loader.conf as i understand it
<walters> it reads the files in entries/ on startup
<hunger> walters: You can have as many entries in /boot/loader/entries as you like. No problem there.
<walters> btw there is also #ostree on freenode now if you prefer
<walters> yes
<hunger> walters: Yes, it reads all entries. But it will boot the one you set in loader.conf by default.
<walters> the example has "default 6a9857a393724b7a981ebb5b8495b9ea-*"
<walters> so if we drop in a new file with a higher version, on start it will boot that
<walters> without touching the loader.conf
<walters> this is a very useful part of the BLS
<walters> it avoids having to touch a global config file just to add a new entry
<hunger> Hmmm... so far I always had to check the hashes of the entries and manually pick the right one to boot.
<walters> i.e. you don't need anything like the horror that is "grub2-mkconfig"
<walters> is the machine-id changing?
<walters> oh i see
<walters> ostree is kind of being a pain here - we're writing the checksum of the kernel into our filename
<hunger> I actually only switched to gummiboot because that is the one I knew to be reading /boot/loader/entries/*, so I assumed you must be using that. Didn't know that was some kind of standard.
<walters> you could probably have in gummiboot: "default ostree-*"
<walters> i'd say it's some text on the internet that wants to be a standard =)
<hunger> walters: Anyway: I think you want to add your ostree entries in /boot/loader/entries.
<walters> we do
<hunger> walters: I think that should actually be pretty safe:-)
<hunger> walters: I do not see why you need the symlink.
<walters> that allows us to do an atomic swap of the entire directory
<hunger> walters: Why is that necessary?
<walters> to do an add+remove at the same time
<walters> or to add multiple entries atomically, or remove multiple entries
<hunger> walters: How about falling back to not do that atomically if the FS does not support it?
<hunger> walters: Currently ostree just exits with an error.
<walters> i wouldn't be completely opposed to that
<hunger> But actually I do not see why you need to do this atomically.
<walters> for the simple cases we don't actually
<walters> it just makes everything easier to reason about if I have this primitive
<hunger> Just remove the entry before you remove a deployment and add it after creating a new deployment.
<hunger> Once you start deleting a deployment the entry should be gone as the deployment is broken. Doing that atomically later is actually wrong IMHO.
<hunger> ... and once you added a deployment and the kernels in /boot you are fine to add an entry (and adding a file atomically is not a problem, even on vfat I think).
<hunger> adding/removing deployments and then updating the boot loader later is actually wrong IMHO.
<hunger> s/later/later for all the additions and removals at once/
<walters> it's safe to add deployments, then update the config
<walters> before and after ostree does anything it performs a GC
<walters> which deletes deployments not referenced by the bootconfig
<walters> so if it's interrupted halfway through making a deployment (hardlinking, merging etc/ or whatever), that directory gets purged, and a new one is made
<walters> so in the add/remove case, if we're interrupted after swapping the config, but before we clean up, that's fine, because the next run will clean up
<walters> i mean again, we could probably get away with doing it non-atomically for most cases
<walters> but the whole design of the code is oriented around the bootconfig being the "single source of truth"
<hunger> walters: I think in all cases, provided you add the entry after adding a deployment and removing the entry before starting to remove it.
<hunger> walters: bootconfig stays the single source of truth.
<walters> yes again except that if we're interrupted between adding and removing, then on the next boot, we have no way to know we wanted to remove that entry
<walters> we could fix this with a journal file
<hunger> walters: I do not see a problem. The next deployment run will clean it up anyway.
<walters> how would it know we wanted it removed without a journal?
<walters> dunno, i guess we could usually get away with removing an entry first, then adding the new one
<hunger> how does it know that it should remove deployments now? I never told it, it still went ahead and removed some deployments.
<walters> well by default we only keep the running + new
<walters> but you can pass --retain to deploy to just append
<walters> like imagine you are writing an automatic bisection tool
<walters> you could create 50 deployments
<walters> automate rebooting, when you find one is good, drop 25 of them, reboot, repeat
<walters> now you could do such a thing also by only having one, and just deploying the new target each time
<hunger> Oh, I thought it would basically keep everything that had a symlink to it in ostree/deploy/OS/... like the current one.
<walters> no
<walters> the bootloader config points to /ostree
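walters' recipe for adding one entry safely (write the new file to .tmp, fsync it, rename it) might look like the following plain-POSIX sketch. The function name and the example entry are hypothetical, and whether rename(2) gives the same crash guarantees on vfat as on ext4 is an assumption this sketch cannot settle.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: make DIR/NAME appear with CONTENTS without ever
 * exposing a half-written file. Returns 0 on success, -1 on failure. */
static int write_entry_atomic(const char *dir, const char *name, const char *contents)
{
    char tmp[4096], final_path[4096];
    size_t len, off = 0;
    int fd, dfd, rc;

    assert(dir != NULL && name != NULL && contents != NULL);
    snprintf(tmp, sizeof tmp, "%s/%s.tmp", dir, name);
    snprintf(final_path, sizeof final_path, "%s/%s", dir, name);

    /* 1. Write the new entry under a temporary name. */
    fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    len = strlen(contents);
    while (off < len) {
        ssize_t n = write(fd, contents + off, len - off);
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0) {
            close(fd);
            unlink(tmp);
            return -1;
        }
        off += (size_t)n;
    }

    /* 2. fsync so the data is on disk before the name becomes visible. */
    rc = fsync(fd);
    if (close(fd) < 0)
        rc = -1;

    /* 3. rename into place; on failure, don't leave the .tmp file around. */
    if (rc < 0 || rename(tmp, final_path) < 0) {
        unlink(tmp);
        return -1;
    }

    /* 4. fsync the directory so the rename is durable too. */
    dfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dfd < 0)
        return -1;
    rc = fsync(dfd);
    close(dfd);
    return rc;
}
```

This only makes a single add atomic; removing the old entry in the same step is the part that needs the directory swap or a journal.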
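To illustrate why a line like "default ostree-*" would let a newly dropped-in entry win without touching loader.conf, here is a sketch that filters entry filenames with fnmatch(3) and keeps the highest version. The version-aware comparison and the function names are assumptions for illustration; the thread does not spell out gummiboot's exact ordering rules.

```c
#include <assert.h>
#include <ctype.h>
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical comparison: runs of digits compare as numbers, so
 * "ostree-10" sorts after "ostree-9". Returns <0, 0 or >0. */
static int version_cmp(const char *a, const char *b)
{
    while (*a && *b) {
        if (isdigit((unsigned char)*a) && isdigit((unsigned char)*b)) {
            size_t la = 0, lb = 0;
            int c;
            while (*a == '0')
                a++;
            while (*b == '0')
                b++;
            while (isdigit((unsigned char)a[la]))
                la++;
            while (isdigit((unsigned char)b[lb]))
                lb++;
            if (la != lb)
                return la < lb ? -1 : 1;   /* more digits = bigger number */
            c = strncmp(a, b, la);
            if (c != 0)
                return c;
            a += la;
            b += lb;
        } else {
            if (*a != *b)
                return (unsigned char)*a < (unsigned char)*b ? -1 : 1;
            a++;
            b++;
        }
    }
    return (unsigned char)*a - (unsigned char)*b;
}

/* Hypothetical helper: the highest-versioned name matching the glob
 * PATTERN (e.g. "ostree-*"), or NULL if nothing matches. */
static const char *pick_default_entry(const char *pattern,
                                      const char *const *names, size_t n)
{
    const char *best = NULL;

    assert(pattern != NULL && (names != NULL || n == 0));
    for (size_t i = 0; i < n; i++) {
        if (fnmatch(pattern, names[i], 0) != 0)
            continue;
        if (best == NULL || version_cmp(names[i], best) > 0)
            best = names[i];
    }
    return best;
}
```

With entries ostree-fedora-9.conf and ostree-fedora-10.conf, the pattern "ostree-*" selects ostree-fedora-10.conf, so adding a higher-versioned file is enough to change the default.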
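The GC walters describes (delete deployments not referenced by the bootconfig) amounts to a mark-and-sweep over the entries' options lines. A sketch, assuming each entry names its deployment through an ostree= kernel argument; the helper names and paths are hypothetical, and in practice the karg value may be an indirection rather than the deployment directory itself, which this sketch does not resolve.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the value of the "ostree=" argument in a BLS
 * "options" line into OUT. Returns 1 if found (and it fits), 0 otherwise. */
static int find_ostree_karg(const char *options, char *out, size_t outsz)
{
    const char *p = options;

    assert(options != NULL && out != NULL && outsz > 0);
    while (*p) {
        const char *start;
        size_t len;

        while (*p == ' ' || *p == '\t')
            p++;
        start = p;
        while (*p && *p != ' ' && *p != '\t')
            p++;
        len = (size_t)(p - start);
        if (len > 7 && strncmp(start, "ostree=", 7) == 0) {
            if (len - 7 >= outsz)
                return 0;
            memcpy(out, start + 7, len - 7);
            out[len - 7] = '\0';
            return 1;
        }
    }
    return 0;
}

/* Sweep: store in DOOMED (capacity n_dep) every deployment path that no
 * entry's ostree= argument refers to, and return how many there are. */
static size_t find_unreferenced(const char *const *deployments, size_t n_dep,
                                const char *const *entry_options, size_t n_ent,
                                const char **doomed)
{
    size_t n_doomed = 0;

    for (size_t i = 0; i < n_dep; i++) {
        int referenced = 0;
        for (size_t j = 0; j < n_ent && !referenced; j++) {
            char target[4096];
            if (find_ostree_karg(entry_options[j], target, sizeof target)
                && strcmp(target, deployments[i]) == 0)
                referenced = 1;
        }
        if (!referenced)
            doomed[n_doomed++] = deployments[i];
    }
    return n_doomed;
}
```

Because the entries are the only input, a deployment whose creation was interrupted (never referenced) is always swept, which is why walters can treat the bootconfig as the single source of truth.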
Another option here is to finish the update inside the initramfs if it's only partially completed. We detect that we have an old kernel+initramfs and then finish swapping the bootconfig. Just need to put enough bits of ostree in the initramfs to accomplish this.
(Then we need to reboot again; don't try kexec, it's going to hit too many kernel driver bugs.)
To expand on comment #3 - the idea here is that we can implement our own journal on top of FAT. This needs consultation with a FAT expert about how best to do it, but I suspect the simplest thing is to have an "ostreejournal.txt" file. Whenever we want to perform operations, we write out what we want to the journal, do it, then remove the journal. Inside the initramfs, we check if there's a journal file. If so, replay it, then delete it, then reboot. The primitives in this journal would be something of the form:
1) Create file with given content
2) Delete file
A likely constraint on #1 is that the filenames would have to be unique.
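A sketch of the initramfs-side replay, under several assumptions: the record format ("C name len" followed by the content bytes, "D name") is invented here, the journal is assumed to become visible only once it is completely written, and the function names are hypothetical. Both primitives are idempotent, so a crash during replay just means replaying again on the next boot.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define JOURNAL_NAME "ostreejournal.txt"

/* Create or overwrite PATH with LEN bytes from BUF and fsync it. */
static int write_file_durably(const char *path, const char *buf, size_t len)
{
    size_t off = 0;
    int rc, fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -1;
    while (off < len) {
        ssize_t n = write(fd, buf + off, len - off);
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0) {
            close(fd);
            return -1;
        }
        off += (size_t)n;
    }
    rc = fsync(fd);
    if (close(fd) < 0)
        rc = -1;
    return rc;
}

/* Hypothetical: replay DIR/ostreejournal.txt if present, then delete it.
 * Record format (an assumption), one record per operation:
 *   "C <name> <len>\n" followed by exactly <len> content bytes
 *   "D <name>\n"
 * Returns 0 on success (including "no journal"), -1 on error. */
static int replay_journal(const char *dir)
{
    char jpath[4096], path[4096], name[256], op;
    int rc = 0, got = EOF;
    FILE *j;

    assert(dir != NULL);
    snprintf(jpath, sizeof jpath, "%s/%s", dir, JOURNAL_NAME);
    j = fopen(jpath, "rb");
    if (j == NULL)
        return errno == ENOENT ? 0 : -1;   /* no journal: nothing to finish */

    while (rc == 0 && (got = fscanf(j, " %c %255s", &op, name)) == 2) {
        snprintf(path, sizeof path, "%s/%s", dir, name);
        if (op == 'D') {
            /* Deleting an already-missing file is fine on replay. */
            if (unlink(path) < 0 && errno != ENOENT)
                rc = -1;
        } else if (op == 'C') {
            size_t len;
            char *buf;
            if (fscanf(j, "%zu", &len) != 1 || fgetc(j) != '\n'
                || (buf = malloc(len ? len : 1)) == NULL) {
                rc = -1;
                break;
            }
            if (fread(buf, 1, len, j) != len
                || write_file_durably(path, buf, len) < 0)
                rc = -1;
            free(buf);
        } else {
            rc = -1;                       /* unknown record: refuse to guess */
        }
    }
    if (rc == 0 && got != EOF)
        rc = -1;                           /* truncated or malformed record */
    fclose(j);

    /* Only a fully replayed journal is removed; on error it stays for the
     * next attempt. */
    if (rc == 0 && unlink(jpath) < 0)
        rc = -1;
    return rc;
}
```

After a successful replay the initramfs would reboot, per the plan above; the replay code runs rarely, which is why it needs dedicated tests.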
I discussed this with Kay and Peter again, and I think the most sensible approach is the plan in comment #3. You take the hit of a double reboot only in the case where you happen to be interrupted during a very small window. The main issue with this approach is that we'll have to be very careful to test the complete-in-initramfs code, as it will be almost completely unused during normal operation.
See also https://bugzilla.redhat.com/show_bug.cgi?id=1101359
Created attachment 288864
Add (non-atomic) support for GRUB2 + UEFI

We need basic support for UEFI; many newer servers don't support BIOS compatibility mode anymore. However, this patch only implements non-atomic updates, because the EFI System Partition is FAT and we can't use OSTree's previous design of atomically swapping /boot/loader there.

The Fedora/RHEL UEFI layout has the kernels on a "real" /boot partition, and /boot/efi/EFI/$vendor just holds the grub2 UEFI binary and grub.cfg. Following this, /boot/loader is still on the OS boot partition, and we still atomically swap it. This potentially paves the way to atomic upgrades in the future.
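The layout above implies a detection step: use the BIOS grub.cfg if present, otherwise look for a grub.cfg under /boot/efi/EFI/$vendor. A plain-POSIX sketch of that order follows; the real ostree-bootloader-grub2.c uses GIO, and the enum, function name and directory scan here are illustrative assumptions. access(F_OK) plays the role of a pure file-existence test.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

enum grub2_layout { GRUB2_NONE = 0, GRUB2_BIOS = 1, GRUB2_EFI = 2 };

/* Hypothetical: decide which grub2 layout SYSROOT uses. The BIOS path is
 * checked first; otherwise look for boot/efi/EFI/<vendor>/grub.cfg and
 * copy <vendor> into VENDOR (of size VENDOR_SZ). */
static enum grub2_layout query_grub2(const char *sysroot, char *vendor, size_t vendor_sz)
{
    char path[4096], efi_dir[4096];
    enum grub2_layout result = GRUB2_NONE;
    struct dirent *de;
    DIR *d;

    assert(sysroot != NULL && vendor != NULL && vendor_sz > 0);
    snprintf(path, sizeof path, "%s/boot/grub2/grub.cfg", sysroot);
    if (access(path, F_OK) == 0)
        return GRUB2_BIOS;

    snprintf(efi_dir, sizeof efi_dir, "%s/boot/efi/EFI", sysroot);
    d = opendir(efi_dir);
    if (d == NULL)
        return GRUB2_NONE;
    while ((de = readdir(d)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        snprintf(path, sizeof path, "%s/%s/grub.cfg", efi_dir, de->d_name);
        if (access(path, F_OK) == 0) {
            snprintf(vendor, vendor_sz, "%s", de->d_name);
            result = GRUB2_EFI;
            break;
        }
    }
    closedir(d);
    return result;
}
```

Checking the BIOS path first means a system that has both (for example via a boot/grub2/grub.cfg symlink) takes the BIOS code path.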
Review of attachment 288864:

Spotted a couple of minor things in _ostree_bootloader_grub2_query():
1) Consider replacing the stat() call with g_file_test(), just to make clear it's nothing more than a file existence test.
2) The new self->config_path_efi_* pointers need to be cleared in finalize(). Also, if the query() method gets called more than once, the previous GFile references will leak. Maybe not an issue in practice, but clearing those pointers before allocating wouldn't hurt.

The rest of the patch looks pretty straightforward.
Created attachment 288984
followup to comments
Let me know if the updated patch on top looks right, I can rebase them together if so.
Attachment 288864 pushed as 0c89abe - Add (non-atomic) support for GRUB2 + UEFI
This came out of an earlier discussion. Ideally we would have shared behavior for GRUB2 and gummiboot.

From: Peter Jones <pjones@redhat.com>

Here's a basic sketch of how to do it. There may be some spots where I don't have the atomic model quite right (or, as likely, the naming conventions for your symlinks), but this should be close enough to get started.

During installation:

1) Create a pile of partitions: /realboot, /realboot/efi0, /realboot/efi1
2) Make /realboot/boot0 and /realboot/boot1 directories, and make /boot always be a symlink to one of them.
3) Make /realboot/efi a symlink that we set up during boot to point to either /boot/efi0 or /boot/efi1.
4) Make /boot/efi a symlink to /realboot/efi.
5) On UEFI, grub.cfg lives in /boot/efi/EFI/redhat/grub.cfg, so in this case there are two of them. Likewise, grub's environment storage is in /boot/efi/EFI/redhat/grubenv (on RHEL 7.1 and later). (Not really a step, just info.)
6) Determine two numbers for Boot#### variables that aren't in use (right now I think anaconda just lets efibootmgr do that; we'll have to do it ahead of time for atomic). These are $M and $N in #7.
7) Store those numbers so grub2-mkconfig can find them:
     echo GRUB_EFI_BOOT_0=$M >> /etc/default/grub
     echo GRUB_EFI_BOOT_1=$N >> /etc/default/grub
   (grub2-mkconfig will have to be modified to export these.)
8) Make grub have efi_get_variable and efi_set_variable commands that work as shown below. (This is pretty easy; the internal calls to do it already exist, so it's just defining a command, parsing command-line options, etc.)
9) Make a boot-failed variable in grub's environment:
     grub2-editenv /boot/efi0/EFI/redhat/grubenv set atomic_boot_failed=0
     grub2-editenv /boot/efi1/EFI/redhat/grubenv set atomic_boot_failed=0
10) Make a shell script that can go in the grub package as /etc/grub.d/01_atomic. This generates and emits config file script sections when grub2-mkconfig is run, i.e. when grub.cfg is created. You basically want each of the grub2 config files to have a section that does something like the following pseudocode:

     search --fs-uuid --set=boot [uuid of /realboot]
     search --fs-uuid --set=bootefi0 ["uuid" of /realboot/efi0]
     search --fs-uuid --set=bootefi1 ["uuid" of /realboot/efi1]
     efi_get_variable --name=BootCurrent --set=BC
     efi_boot_dir=$(readlink ${boot}/efi)
     if [ "${atomic_boot_failed}" == 0 ]; then
         if [ $BC == $GRUB_EFI_BOOT_0 -a $efi_boot_dir == /boot/efi1 ]; then
             # this means something failed
             set atomic_boot_failed=1
             save_env -f ${bootefi0}/EFI/redhat/grubenv atomic_boot_failed
             save_env -f ${bootefi1}/EFI/redhat/grubenv atomic_boot_failed
             efi_set_variable --name=BootNext --value=$GRUB_EFI_BOOT_1
             reboot
         elif [ $BC == $GRUB_EFI_BOOT_1 -a $efi_boot_dir == /boot/efi0 ]; then
             # this means something failed
             set atomic_boot_failed=1
             save_env -f ${bootefi0}/EFI/redhat/grubenv atomic_boot_failed
             save_env -f ${bootefi1}/EFI/redhat/grubenv atomic_boot_failed
             efi_set_variable --name=BootNext --value=$GRUB_EFI_BOOT_0
             reboot
         fi
     fi

    Note that in my pseudocode here, $GRUB_EFI_BOOT_[01] are evaluated during /generation/ of the script, but everything else is done when it is running.
11) Instead of the single efibootmgr call that anaconda is doing right now, make it do two, one for /realboot/efi0 and one for /realboot/efi1:
     efibootmgr -b $M -c -d /dev/sda -p 1 -L RHEL -l '\EFI\redhat\shim.efi'
     efibootmgr -b $N -c -d /dev/sda -p 2 -L RHEL -l '\EFI\redhat\shim.efi'
    and also:
     efibootmgr -o $M,$N
12) When atomic does an update, you now have to do two things:
    a) move the /realboot/efi symlink to point at the right one of /boot/efi0 or /boot/efi1
    b) run "efibootmgr -n $M" if we picked /boot/efi0, or "efibootmgr -n $N" for /boot/efi1
13) Some time very early in bootup, run:
     grub2-editenv /boot/efi/EFI/redhat/grubenv list | grep -q '^atomic_boot_failed=1$'
    and test the result. If it's 1, something went horribly wrong, and atomic should do whatever it does when something has gone horribly wrong, and also reset all of this state:
     ln -sfn $whicheverbootefi01iscorrectnow /realboot/efi
     grub2-editenv /boot/efi0/EFI/redhat/grubenv set atomic_boot_failed=0
     grub2-editenv /boot/efi1/EFI/redhat/grubenv set atomic_boot_failed=0
    If it's 0, atomic should do something like:
     if [ "$(readlink /realboot/efi)" = "/boot/efi0" ]; then
         efibootmgr -o $M,$N
     else
         efibootmgr -o $N,$M
     fi
I have been thinking about the atomic GRUB+EFI use case as well. After reading about all the involved parts, here is my deduced list of steps* (a theoretical list; the next step is to verify it in practice):

1) Partitioning:
 - EFI (FAT) partition contains: /EFI/BOOT/bootx64.efi (the GRUB EFI boot loader binary)
 - ROOTFS (EXT4) partition contains everything else, including loader/grub.cfg
2) Configure GRUB to search for grub.cfg at ROOTFS/loader/grub.cfg.

This approach does not require any changes in how OSTree swaps the boot loader symlinks.

* If there are no barriers to configuring GRUB this way, then the above approach *should* work.
Ok, I have verified that it is possible to be atomic even on EFI-based systems. One limitation of the above scenario is that you would need to update the /EFI/BOOT/bootx64.efi binary manually. Not sure how common it is to update a boot loader binary on a deployed system, but it's not impossible. Nevertheless, this approach seems to work well and completely bypasses the non-atomic EFI+GRUB code path in ostree-bootloader-grub2.c, thanks to:

_ostree_bootloader_grub2_query ()
{
  ...
  if (g_file_query_exists (self->config_path_bios, NULL))
    {
      *out_is_active = TRUE;
      ret = TRUE;
      goto out;
    }
  ...
}

as I have the boot/grub2/grub.cfg (config_path_bios) symlink.