Root on ZFS maintenance

Boot Environment

This section is compatible with the Alpine, Arch, Fedora and RHEL guides. It is not necessary for NixOS and is incompatible with the Ubuntu and Debian guides.

Note: boot environments as described below are intended only for system recovery purposes, that is, you boot into the alternate boot environment once to perform system recovery on the default datasets:

rpool/distro/root
bpool/distro/root

then reboot to those datasets once you have successfully recovered the system.

Switching the default boot environment complicates bootloader recovery and other maintenance operations and is thus currently not supported.

  1. If you want to use the @initial-installation snapshot created during installation, set my_boot_env=initial-installation and skip Steps 3 and 4.

  2. Identify which dataset is currently mounted as root / and boot /boot

    set -x
    boot_dataset=$(df -P /boot | tail -n1 | cut -f1 -d' ' || true )
    root_dataset=$(df -P / | tail -n1 | cut -f1 -d' ' || true )
    
  3. Choose a name for the new boot environment

    my_boot_env=backup
    
  4. Take snapshots of the / and /boot datasets

    zfs snapshot "${boot_dataset}"@"${my_boot_env}"
    zfs snapshot "${root_dataset}"@"${my_boot_env}"
    
  5. Create clones from read-only snapshots

    new_root_dataset="${root_dataset%/*}"/"${my_boot_env}"
    new_boot_dataset="${boot_dataset%/*}"/"${my_boot_env}"
    
    zfs clone -o canmount=noauto \
      -o mountpoint=/ \
      "${root_dataset}"@"${my_boot_env}" \
      "${new_root_dataset}"
    
    zfs clone -o canmount=noauto \
      -o mountpoint=legacy \
      "${boot_dataset}"@"${my_boot_env}" \
      "${new_boot_dataset}"
    
  6. Mount clone and update file system table (fstab)

    MNT=$(mktemp -d)
    mount -t zfs -o zfsutil "${new_root_dataset}" "${MNT}"
    mount -t zfs  "${new_boot_dataset}" "${MNT}"/boot
    
    sed -i s,"${root_dataset}","${new_root_dataset}",g "${MNT}"/etc/fstab
    sed -i s,"${boot_dataset}","${new_boot_dataset}",g "${MNT}"/etc/fstab
    
    if test -f "${MNT}"/boot/grub/grub.cfg; then
      is_grub2=n
      sed -i s,"${boot_dataset#bpool/}","${new_boot_dataset#bpool/}",g "${MNT}"/boot/grub/grub.cfg
    elif test -f "${MNT}"/boot/grub2/grub.cfg; then
      is_grub2=y
      sed -i s,"${boot_dataset#bpool/}","${new_boot_dataset#bpool/}",g "${MNT}"/boot/grub2/grub.cfg
    else
      echo "ERROR: no grub menu found!"
      exit 1
    fi
    

    Do not proceed if no grub menu was found!

  7. Unmount clone

    umount -Rl "${MNT}"
    
  8. Add new boot environment as GRUB menu entry

    echo "# ${new_boot_dataset}" > new_boot_env_entry_"${new_boot_dataset##*/}"
    printf '\n%s' "menuentry 'Boot environment ${new_boot_dataset#bpool/} from ${boot_dataset#bpool/}' "  \
      >> new_boot_env_entry_"${new_boot_dataset##*/}"
    if [ "${is_grub2}" = y ]; then
       # shellcheck disable=SC2016
       printf '{ search --set=drive1 --label bpool; configfile ($drive1)/%s@/grub2/grub.cfg; }' \
       "${new_boot_dataset#bpool/}" >> new_boot_env_entry_"${new_boot_dataset##*/}"
    else
       # shellcheck disable=SC2016
       printf '{ search --set=drive1 --label bpool; configfile ($drive1)/%s@/grub/grub.cfg; }' \
       "${new_boot_dataset#bpool/}" >> new_boot_env_entry_"${new_boot_dataset##*/}"
    fi
    
    find /boot/efis/ -name "grub.cfg" -print0 \
    | xargs -t -0I '{}' sh -vxc "tail -n1 new_boot_env_entry_${new_boot_dataset##*/}  >> '{}'"
    
  9. Do not delete the new_boot_env_entry_"${new_boot_dataset##*/}" file. It is needed later if you want to remove the new boot environment from the GRUB menu.

  10. After rebooting, select the boot environment entry from the GRUB menu to boot from the clone. Press ESC inside the submenu to return to the previous menu.

  11. The steps above can also be used to create a new clone from an existing snapshot.

  12. To delete the boot environment, first store its name in a variable:

    my_boot_env=backup
    
  13. Ensure that the boot environment is not currently in use

    set -x
    boot_dataset=$(df -P /boot | tail -n1 | cut -f1 -d' ' || true )
    root_dataset=$(df -P / | tail -n1 | cut -f1 -d' ' || true )
    new_boot_dataset="${boot_dataset%/*}"/"${my_boot_env}"
    rm_boot_dataset=$(head -n1 new_boot_env_entry_"${new_boot_dataset##*/}" | sed 's|^# *||' || true )
    
    if [ "${boot_dataset}" = "${rm_boot_dataset}" ]; then
      echo "ERROR: the dataset you want to delete is the current root! abort!"
      exit 1
    fi
    
  14. Then check the origin snapshot

    rm_root_dataset=rpool/"${rm_boot_dataset#bpool/}"
    
    rm_boot_dataset_origin=$(zfs get -H origin "${rm_boot_dataset}"|cut -f3 || true )
    rm_root_dataset_origin=$(zfs get -H origin "${rm_root_dataset}"|cut -f3 || true )
    
  15. Finally, destroy the clone (boot environment) and its origin snapshot

    zfs destroy "${rm_root_dataset}"
    zfs destroy "${rm_root_dataset_origin}"
    zfs destroy "${rm_boot_dataset}"
    zfs destroy "${rm_boot_dataset_origin}"
    
  16. Remove GRUB entry

    new_entry_escaped=$(tail -n1 new_boot_env_entry_"${new_boot_dataset##*/}" | sed -e 's/[\/&]/\\&/g' || true )
    find /boot/efis/ -name "grub.cfg" -print0 | xargs -t -0I '{}' sed -i "/${new_entry_escaped}/d" '{}'
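
The dataset names and the GRUB path used in the steps above all come from shell parameter expansion. The following sketch previews those names without touching any pool and adds a small guard for the destroy step. `boot_env_names` and `is_clone_origin` are hypothetical helper names, not part of any ZFS tooling, and the layout is assumed to match the default datasets listed at the top of this section.

```shell
# Hypothetical helper: print the names the steps above would derive for a
# boot environment, without calling zfs. The body runs in a subshell so the
# temporary variable does not leak into the calling shell.
# $1 = current root dataset, $2 = current boot dataset, $3 = boot environment name
boot_env_names() (
  # "%/*" drops the last path component, which is then replaced by the name.
  new_boot="${2%/*}/$3"
  printf 'root clone: %s\n' "${1%/*}/$3"
  printf 'boot clone: %s\n' "${new_boot}"
  # Path prefix GRUB reads from; the pool name is stripped, as in step 8.
  printf 'grub path:  /%s@/\n' "${new_boot#bpool/}"
)

# Hypothetical guard: the origin of a clone has the form pool/dataset@snapshot,
# while "zfs get -H origin" reports "-" for a dataset that is not a clone.
is_clone_origin() {
  case "$1" in
    *@*) return 0 ;;
    *)   return 1 ;;
  esac
}

boot_env_names rpool/distro/root bpool/distro/root backup
```

Before destroying anything in step 15, a check such as `is_clone_origin "${rm_root_dataset_origin}" || exit 1` would stop the procedure if the dataset turned out not to be a clone.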
    

Disk replacement

When a disk fails in a mirrored setup, the disk can be replaced with the following procedure.

  1. Shut down the computer.

  2. Replace the failed disk with another disk. The replacement must be at least as large as the failed disk.

  3. Boot the computer.

    With a failed disk, the system will still boot, albeit several minutes more slowly than normal.

    For NixOS, this is because the initrd and systemd are designed to import a pool in a degraded state only after a 90s timeout.

    The swap partition on the failed disk will also be unavailable.

  4. Install GNU parted with your distribution package manager.

  5. Identify the bad disk and a working old disk

    ZPOOL_VDEV_NAME_PATH=1 zpool status
    
    pool:   bpool
    status: DEGRADED
    action: Replace the device using 'zpool replace'.
    ...
    config: bpool
        mirror-0
        2387489723748                    UNAVAIL    0  0  0   was /dev/disk/by-id/ata-BAD-part2
        /dev/disk/by-id/ata-disk_known_good-part2    ONLINE     0  0  0
    
  6. Store the bad disk and a working old disk in variables, omitting the partition suffix -partN

    disk_to_replace=/dev/disk/by-id/ata-disk_to_replace
    disk_known_good=/dev/disk/by-id/ata-disk_known_good
    
  7. Identify the new disk

    find /dev/disk/by-id/
    
    /dev/disk/by-id/ata-disk_known_good-part1
    /dev/disk/by-id/ata-disk_known_good-part2
    ...
    /dev/disk/by-id/ata-disk_known_good-part5
    /dev/disk/by-id/ata-disk_new       <-- new disk w/o partition table
    
  8. Store the new disk in a variable

    disk_new=/dev/disk/by-id/ata-disk_new
    
  9. Create a partition table on "${disk_new}"; refer to the respective installation pages for details.

  10. Format and mount the EFI system partition; refer to the respective installation pages for details.

  11. Replace failed disk in ZFS pool

    zpool offline bpool "${disk_to_replace}"-part2
    zpool offline rpool "${disk_to_replace}"-part3
    zpool replace bpool "${disk_to_replace}"-part2 "${disk_new}"-part2
    zpool replace rpool "${disk_to_replace}"-part3 "${disk_new}"-part3
    zpool online  bpool "${disk_new}"-part2
    zpool online  rpool "${disk_new}"-part3
    

    Let the new disk resilver. Check status with zpool status.

  12. Reinstall and mirror the bootloader; refer to the respective installation pages for details.

    If you are using NixOS, see below.

  13. For NixOS, replace the bad disk with the new disk in the per-host configuration file.

    sed -i "s|${disk_to_replace##*/}|${disk_new##*/}|" /etc/nixos/hosts/exampleHost/default.nix
    
  14. Commit and apply the changed configuration, reinstall the bootloader, then reboot

    git -C /etc/nixos commit -asm "replace ${disk_to_replace##*/} with ${disk_new##*/}."
    
    nixos-rebuild boot --install-bootloader
    
    reboot
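
Because the `zpool replace` commands in step 11 act on whole partitions, it can help to print them first and compare the device paths against the output of `zpool status`. The following is a minimal sketch, assuming the partition layout used above (`-part2` for bpool, `-part3` for rpool). `preview_replace` is a hypothetical helper name; it only prints commands and does not run them.

```shell
# Hypothetical dry-run helper: print the zpool replace commands from step 11
# so the device paths can be reviewed before anything is changed.
# The body runs in a subshell so loop variables stay local.
# $1 = failed disk, $2 = new disk (both without the -partN suffix)
preview_replace() (
  # Assumption: bpool lives on -part2 and rpool on -part3, as in this guide.
  for pair in bpool:2 rpool:3; do
    pool="${pair%%:*}"   # text before the colon
    part="${pair##*:}"   # text after the colon
    printf 'zpool replace %s %s-part%s %s-part%s\n' \
      "${pool}" "$1" "${part}" "$2" "${part}"
  done
)

preview_replace /dev/disk/by-id/ata-disk_to_replace /dev/disk/by-id/ata-disk_new
```

On OpenZFS releases that ship the `zpool wait` subcommand, `zpool wait -t resilver rpool` should block until resilvering finishes; otherwise, poll `zpool status` as described in step 11.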
    

Bootloader Recovery

This section is compatible with Alpine, Arch, Fedora, RHEL and NixOS root on ZFS guides.

Sometimes the GRUB bootloader might be accidentally overwritten, rendering the system inaccessible. However, as long as the disk partitions where the boot pool and root pool reside remain untouched, the system can still be booted easily.

  1. Download the GRUB rescue image from this repo.

    You can also build the image yourself if you are familiar with the Nix package manager.

  2. Extract either the x86_64-efi or the i386-pc image from the archive.

  3. Write the image to a disk.

  4. Boot the computer from the GRUB rescue disk. Select your distro in the GRUB menu.

  5. Reinstall the bootloader. See the respective installation pages for details.
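
Step 3 does not name a tool for writing the image. One common approach on Linux is `dd`; the sketch below assumes GNU coreutils `dd`. `write_rescue_image`, the image file name and the target path are placeholders, not names taken from the rescue image archive. Writing to the wrong device overwrites its contents, so confirm the target before running it.

```shell
# Hypothetical helper: write a rescue image to a target device.
# $1 = image file, $2 = target (a whole disk, not a partition)
write_rescue_image() {
  if [ ! -r "$1" ]; then
    echo "ERROR: cannot read image: $1" >&2
    return 1
  fi
  if [ ! -w "$2" ]; then
    echo "ERROR: target missing or not writable: $2" >&2
    return 1
  fi
  # conv=fsync makes dd flush the written data to the target before exiting.
  dd if="$1" of="$2" bs=4M conv=fsync
}

# Example call with placeholder names (not run here):
# write_rescue_image grub-rescue.img /dev/disk/by-id/usb-EXAMPLE
```

The two checks only catch a missing image or target; they cannot tell whether the target is the disk you intended.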