Ubuntu 20.04 Root on ZFS

Errata

If you previously installed using this guide, please apply these fixes if applicable:

/boot/grub Not Mounted

Severity: Grave
Fixed: 2020-05-30

For a mirror or raidz topology, /boot/grub is on a separate dataset. This is now bpool/BOOT/ubuntu_UUID/grub, but was previously bpool/grub. Unfortunately, zsys sets canmount=off on bpool/grub, so it is not mounted. As a result, updates to the GRUB configuration will be written to the /boot filesystem and not used by GRUB (because it is still looking in bpool/grub). Check for bpool/grub:

zfs list bpool/grub

If this says “dataset does not exist”, you are good. If it exists, fix it.

Once you start this process, the system will be unbootable until you have completed it. Do not reboot until you have completed all of the steps.

  1. Rename the dataset:

    umount /boot/grub
    # Ignore any error about it not being mounted.
    
    rm -rf /boot/grub
    
    zfs list -r bpool
    # Replace UUID below:
    zfs rename bpool/grub bpool/BOOT/ubuntu_UUID/grub
    zfs inherit com.ubuntu.zsys:bootfs bpool/BOOT/ubuntu_UUID/grub
    zfs set canmount=on bpool/BOOT/ubuntu_UUID/grub
    zfs mount bpool/BOOT/ubuntu_UUID/grub
    
  2. Ensure that zed updated the cache to use bpool/BOOT/ubuntu_UUID/grub:

    grep grub /etc/zfs/zfs-list.cache/bpool
    
  3. Rebuild the initrd and reinstall GRUB:

    update-initramfs -c -k all
    update-grub
    grub-install --target=x86_64-efi --efi-directory=/boot/efi \
        --bootloader-id=ubuntu --recheck --no-floppy
    

    Run this for the additional disk(s), incrementing the “2” to “3” and so on for both /boot/efi2 and ubuntu-2:

    cp -a /boot/efi/EFI /boot/efi2
    grub-install --target=x86_64-efi --efi-directory=/boot/efi2 \
        --bootloader-id=ubuntu-2 --recheck --no-floppy
    

    Check that these have set prefix=($root)'/BOOT/ubuntu_UUID/grub@':

    grep prefix= \
        /boot/efi/EFI/ubuntu/grub.cfg \
        /boot/efi2/EFI/ubuntu-2/grub.cfg
    
  4. If using encryption, patch a dependency loop:

    sudo apt install --yes curl patch
    curl https://launchpadlibrarian.net/478315221/2150-fix-systemd-dependency-loops.patch | \
        sed "s|/etc|/lib|;s|\.in$||" | (cd / ; patch -p1)
    
  5. Disable grub-initrd-fallback.service:

    systemctl mask grub-initrd-fallback.service
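
The prefix check in step 3 can be scripted so that a mismatch is harder to overlook. The following is a minimal sketch, not part of GRUB or zsys: check_grub_prefix is a hypothetical helper name, and it assumes each grub.cfg contains the prefix string exactly as quoted in step 3.

```shell
# Hypothetical helper (an assumption, not a GRUB or zsys tool).
# Usage: check_grub_prefix UUID FILE...
check_grub_prefix() {
    uuid=$1
    shift
    # The string that step 3 says each ESP grub.cfg should contain.
    want="prefix=(\$root)'/BOOT/ubuntu_${uuid}/grub@'"
    rc=0
    for cfg in "$@"; do
        # -F matches a fixed string, so ( ) $ and ' are taken literally.
        if grep -qF -- "$want" "$cfg"; then
            echo "ok: $cfg"
        else
            echo "MISMATCH: $cfg"
            rc=1
        fi
    done
    return "$rc"
}
```

It would be called with the UUID from zfs list -r bpool and the same files passed to grep in step 3, for example check_grub_prefix abc123 /boot/efi/EFI/ubuntu/grub.cfg /boot/efi2/EFI/ubuntu-2/grub.cfg, where abc123 is a placeholder.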
    

AccountsService Not Mounted

Severity: Normal
Fixed: 2020-05-28

The HOWTO previously misspelled AccountsService (where Accounts is plural) as AccountServices (where Services is plural). As a result, AccountsService data is written to the root filesystem rather than to its own dataset. This is only harmful in the event of a rollback of the root filesystem that does not include a rollback of the user data. Check it:

zfs list | grep Account

If the “s” is on “Accounts”, you are good. If it is on “Services”, fix it:

mv /var/lib/AccountsService /var/lib/AccountsService-old
zfs list -r rpool
# Replace the UUID twice below:
zfs rename rpool/ROOT/ubuntu_UUID/var/lib/AccountServices \
           rpool/ROOT/ubuntu_UUID/var/lib/AccountsService
mv /var/lib/AccountsService-old/* /var/lib/AccountsService
rmdir /var/lib/AccountsService-old
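
The visual check above ("is the s on Accounts or on Services?") is easy to get wrong. As a rough sketch, the dataset names can be classified mechanically. check_accounts_dataset is a hypothetical helper name; it only reads names on standard input and never touches the pool.

```shell
# Hypothetical helper (an assumption for illustration).
# Reads dataset names, one per line, and flags the misspelled one.
# Exit status is 1 if a misspelled dataset was seen, 0 otherwise.
check_accounts_dataset() {
    awk '
        /\/AccountServices$/ { print "misspelled: " $1; bad = 1 }
        /\/AccountsService$/ { print "ok: " $1 }
        END { exit bad }
    '
}

# On a real system the input would come from:
#   zfs list -H -o name | check_accounts_dataset
```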

Overview

Ubuntu Installer

The Ubuntu installer has support for root-on-ZFS. Because of bidirectional collaboration, this HOWTO produces results nearly identical to those of the Ubuntu installer.

If you want a single-disk, unencrypted, desktop install, use the installer. It is far easier and faster than doing everything by hand.

If you want a ZFS native encrypted, desktop install, you can trivially edit the installer. The -o recordsize=1M there is unrelated to encryption; omit that unless you understand it. Additionally, once the system is installed, you should switch to encrypted swap:

swapon -v
# Note the device, including the partition.

ls -l /dev/disk/by-id/
# Find the by-id name of the disk.

sudo swapoff -a
sudo vi /etc/fstab
# Remove the swap entry.

sudo apt install --yes cryptsetup curl patch

curl https://launchpadlibrarian.net/478315221/2150-fix-systemd-dependency-loops.patch | \
    sed "s|/etc|/lib|;s|\.in$||" | (cd / ; patch -p1)

# Replace DISK-partN as appropriate from above:
echo swap /dev/disk/by-id/DISK-partN /dev/urandom \
    swap,cipher=aes-xts-plain64:sha256,size=512 | sudo tee -a /etc/crypttab
echo /dev/mapper/swap none swap defaults 0 0 | sudo tee -a /etc/fstab

Hopefully the installer will gain encryption support in the future.

If you want to set up a mirror or raidz topology, use LUKS encryption, and/or install a server (no desktop GUI), use this HOWTO.

Caution

  • This HOWTO uses a whole physical disk.

  • Do not use these instructions for dual-booting.

  • Back up your data. Any existing data will be lost.

System Requirements

Computers that have less than 2 GiB of memory run ZFS slowly. 4 GiB of memory is recommended for normal performance in basic workloads. If you wish to use deduplication, you will need massive amounts of RAM. Enabling deduplication is a permanent change that cannot be easily reverted.

Support

If you need help, reach out to the community using the zfs-discuss mailing list or IRC at #zfsonlinux on freenode. If you have a bug report or feature request related to this HOWTO, please file a new issue and mention @rlaager.

Contributing

  1. Fork and clone: https://github.com/openzfs/openzfs-docs

  2. Install the tools:

    sudo apt install python3-pip
    pip3 install -r requirements.txt
    # Add ~/.local/bin to your $PATH, e.g. by adding this to ~/.bashrc:
    PATH=$HOME/.local/bin:$PATH
    
  3. Make your changes.

  4. Test:

    cd docs
    make html
    sensible-browser _build/html/index.html
    
  5. git commit --signoff to a branch, git push, and create a pull request. Mention @rlaager.

Encryption

This guide supports three different encryption options: unencrypted, ZFS native encryption, and LUKS. With any option, all ZFS features are fully available.

Unencrypted does not encrypt anything, of course. With no encryption happening, this option naturally has the best performance.

ZFS native encryption encrypts the data and most metadata in the root pool. It does not encrypt dataset or snapshot names or properties. The boot pool is not encrypted at all, but it only contains the bootloader, kernel, and initrd. (Unless you put a password in /etc/fstab, the initrd is unlikely to contain sensitive data.) The system cannot boot without the passphrase being entered at the console. Performance is good. As the encryption happens in ZFS, even if multiple disks (mirror or raidz topologies) are used, the data only has to be encrypted once.

LUKS encrypts almost everything. The only unencrypted data is the bootloader, kernel, and initrd. The system cannot boot without the passphrase being entered at the console. Performance is good, but LUKS sits underneath ZFS, so if multiple disks (mirror or raidz topologies) are used, the data has to be encrypted once per disk.

Step 1: Prepare The Install Environment

  1. Boot the Ubuntu Live CD. Select Try Ubuntu. Connect your system to the Internet as appropriate (e.g. join your WiFi network). Open a terminal (press Ctrl-Alt-T).

  2. Set up and update the repositories:

    sudo apt-add-repository universe
    sudo apt update
    
  3. Optional: Install and start the OpenSSH server in the Live CD environment:

    If you have a second system, using SSH to access the target system can be convenient:

    passwd
    # There is no current password.
    sudo apt install --yes openssh-server vim
    

    Installing the full vim package fixes terminal problems that occur when using the vim-tiny package (that ships in the Live CD environment) over SSH.

    Hint: You can find your IP address with ip addr show scope global | grep inet. Then, from your main machine, connect with ssh ubuntu@IP.

  4. Become root:

    sudo -i
    
  5. Install ZFS in the Live CD environment:

    apt install --yes debootstrap gdisk zfs-initramfs
    systemctl stop zed
    

Step 2: Disk Formatting

  1. Set a variable with the disk name:

    DISK=/dev/disk/by-id/scsi-SATA_disk1
    

    Always use the long /dev/disk/by-id/* aliases with ZFS. Using the /dev/sd* device nodes directly can cause sporadic import failures, especially on systems that have more than one storage pool.

    Hints:

    • ls -la /dev/disk/by-id will list the aliases.

    • Are you doing this in a virtual machine? If your virtual disk is missing from /dev/disk/by-id, use /dev/vda if you are using KVM with virtio; otherwise, read the troubleshooting section.

  2. If you are re-using a disk, clear it as necessary:

    If the disk was previously used in an MD array:

    apt install --yes mdadm
    
    # See if one or more MD arrays are active:
    cat /proc/mdstat
    # If so, stop them (replace md0 as required):
    mdadm --stop /dev/md0
    
    # For an array using the whole disk:
    mdadm --zero-superblock --force $DISK
    # For an array using a partition (e.g. a swap partition per this HOWTO):
    mdadm --zero-superblock --force ${DISK}-part2
    

    Clear the partition table:

    sgdisk --zap-all $DISK
    

    If you get a message about the kernel still using the old partition table, reboot and start over (except that you can skip this step).

  3. Create bootloader partition(s):

    sgdisk     -n1:1M:+512M   -t1:EF00 $DISK
    

    Note: This partition is set up for UEFI support. For legacy (BIOS) booting, this will allow you to move the disk(s) to a new system/motherboard in the future without having to rebuild the pool (and restore your data from a backup). Additionally, this is used for /boot/grub in single-disk installs, as discussed below.

    For legacy (BIOS) booting:

    sgdisk -a1 -n5:24K:+1000K -t5:EF02 $DISK
    

    Note: For simplicity and forward compatibility, this HOWTO uses GPT partition labels for both UEFI and legacy (BIOS) booting. The Ubuntu installer uses an MBR label for legacy (BIOS) booting.

  4. Create a partition for swap:

    Previous versions of this HOWTO put swap on a zvol. Ubuntu recommends against this configuration due to deadlocks. There is a bug report upstream.

    Putting swap on a partition gives up the benefit of ZFS checksums (for your swap). That is probably the right trade-off given the reports of ZFS deadlocks with swap. If you are bothered by this, simply do not enable swap.

    Choose one of the following options if you want swap:

    • For a single-disk install:

      sgdisk     -n2:0:+500M    -t2:8200 $DISK
      
    • For a mirror or raidz topology:

      sgdisk     -n2:0:+500M    -t2:FD00 $DISK
      

    Adjust the swap size to your needs. If you wish to enable hibernation (which only works for unencrypted installs), the swap partition must be at least as large as the system’s RAM.

  5. Create a boot pool partition:

    sgdisk     -n3:0:+2G      -t3:BE00 $DISK
    

    The Ubuntu installer uses 5% of the disk space constrained to a minimum of 500 MiB and a maximum of 2 GiB. Making this too small (and 500 MiB might be too small) can result in an inability to upgrade the kernel.

  6. Create a root pool partition:

    Choose one of the following options:

    • Unencrypted or ZFS native encryption:

      sgdisk     -n4:0:0        -t4:BF00 $DISK
      
    • LUKS:

      sgdisk     -n4:0:0        -t4:8309 $DISK
      

    If you are creating a mirror or raidz topology, repeat the partitioning commands for all the disks which will be part of the pool.

  7. Create the boot pool:

    zpool create \
        -o ashift=12 -d \
        -o feature@async_destroy=enabled \
        -o feature@bookmarks=enabled \
        -o feature@embedded_data=enabled \
        -o feature@empty_bpobj=enabled \
        -o feature@enabled_txg=enabled \
        -o feature@extensible_dataset=enabled \
        -o feature@filesystem_limits=enabled \
        -o feature@hole_birth=enabled \
        -o feature@large_blocks=enabled \
        -o feature@lz4_compress=enabled \
        -o feature@spacemap_histogram=enabled \
        -o feature@zpool_checkpoint=enabled \
        -O acltype=posixacl -O canmount=off -O compression=lz4 \
        -O devices=off -O normalization=formD -O relatime=on -O xattr=sa \
        -O mountpoint=/boot -R /mnt \
        bpool ${DISK}-part3
    

    You should not need to customize any of the options for the boot pool.

    GRUB does not support all of the zpool features. See spa_feature_names in grub-core/fs/zfs/zfs.c. This step creates a separate boot pool for /boot with the features limited to only those that GRUB supports, allowing the root pool to use any/all features. Note that GRUB opens the pool read-only, so all read-only compatible features are “supported” by GRUB.

    Hints:

    • If you are creating a mirror topology, create the pool using:

      zpool create \
          ... \
          bpool mirror \
          /dev/disk/by-id/scsi-SATA_disk1-part3 \
          /dev/disk/by-id/scsi-SATA_disk2-part3
      
    • For raidz topologies, replace mirror in the above command with raidz, raidz2, or raidz3 and list the partitions from additional disks.

    • The pool name is arbitrary. If changed, the new name must be used consistently. The bpool convention originated in this HOWTO.

    Feature Notes:

    • The allocation_classes feature should be safe to use. However, unless one is using it (i.e. a special vdev), there is no point to enabling it. It is extremely unlikely that someone would use this feature for a boot pool. If one cares about speeding up the boot pool, it would make more sense to put the whole pool on the faster disk rather than using it as a special vdev.

    • The project_quota feature has been tested and is safe to use. This feature is extremely unlikely to matter for the boot pool.

    • The resilver_defer feature should be safe, but the boot pool is small enough that it is unlikely to be necessary.

    • The spacemap_v2 feature has been tested and is safe to use. The boot pool is small, so this does not matter in practice.

    • As a read-only compatible feature, the userobj_accounting feature should be compatible in theory, but in practice, GRUB can fail with an “invalid dnode type” error. This feature does not matter for /boot anyway.

    • The zpool_checkpoint feature has been tested and is safe to use. The Ubuntu installer does not use it. This HOWTO does, as the feature may be desirable for the boot pool.

  8. Create the root pool:

    Choose one of the following options:

    • Unencrypted:

      zpool create \
          -o ashift=12 \
          -O acltype=posixacl -O canmount=off -O compression=lz4 \
          -O dnodesize=auto -O normalization=formD -O relatime=on \
          -O xattr=sa -O mountpoint=/ -R /mnt \
          rpool ${DISK}-part4
      
    • ZFS native encryption:

      zpool create \
          -o ashift=12 \
          -O encryption=aes-256-gcm \
          -O keylocation=prompt -O keyformat=passphrase \
          -O acltype=posixacl -O canmount=off -O compression=lz4 \
          -O dnodesize=auto -O normalization=formD -O relatime=on \
          -O xattr=sa -O mountpoint=/ -R /mnt \
          rpool ${DISK}-part4
      
    • LUKS:

      cryptsetup luksFormat -c aes-xts-plain64 -s 512 -h sha256 ${DISK}-part4
      cryptsetup luksOpen ${DISK}-part4 luks1
      zpool create \
          -o ashift=12 \
          -O acltype=posixacl -O canmount=off -O compression=lz4 \
          -O dnodesize=auto -O normalization=formD -O relatime=on \
          -O xattr=sa -O mountpoint=/ -R /mnt \
          rpool /dev/mapper/luks1
      

    Hints:

    • If you are creating a mirror topology, create the pool using:

      zpool create \
          ... \
          rpool mirror \
          /dev/disk/by-id/scsi-SATA_disk1-part4 \
          /dev/disk/by-id/scsi-SATA_disk2-part4
      
    • For raidz topologies, replace mirror in the above command with raidz, raidz2, or raidz3 and list the partitions from additional disks.

    • When using LUKS with mirror or raidz topologies, use /dev/mapper/luks1, /dev/mapper/luks2, etc., which you will have to create using cryptsetup.

    • The pool name is arbitrary. If changed, the new name must be used consistently. On systems that can automatically install to ZFS, the root pool is named rpool by default.
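
For mirror and raidz topologies, the list of partition paths grows with each disk and is easy to mistype. As a small sketch, a helper can derive the paths from the disk aliases. part_paths is a hypothetical name introduced here, and the disk names are the placeholders used in the hints above.

```shell
# Hypothetical helper (an assumption for illustration).
# Usage: part_paths SUFFIX DISK...
# Prints DISK-SUFFIX for every disk, one per line.
part_paths() {
    suffix=$1
    shift
    for disk in "$@"; do
        printf '%s-%s\n' "$disk" "$suffix"
    done
}

# Example with the placeholder disk names used in this HOWTO:
part_paths part4 /dev/disk/by-id/scsi-SATA_disk1 /dev/disk/by-id/scsi-SATA_disk2
```

The example prints the two -part4 paths from the rpool mirror hint. Passing part3 instead gives the boot pool partitions. If the output is spliced into a zpool create command with an unquoted $(part_paths ...), that relies on the aliases containing no whitespace.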

Step 3: System Installation

  1. Create filesystem datasets to act as containers:

    zfs create -o canmount=off -o mountpoint=none rpool/ROOT
    zfs create -o canmount=off -o mountpoint=none bpool/BOOT
    
  2. Create filesystem datasets for the root and boot filesystems:

    UUID=$(dd if=/dev/urandom of=/dev/stdout bs=1 count=100 2>/dev/null |
        tr -dc 'a-z0-9' | cut -c-6)
    
    zfs create -o canmount=noauto -o mountpoint=/ \
        -o com.ubuntu.zsys:bootfs=yes \
        -o com.ubuntu.zsys:last-used=$(date +%s) rpool/ROOT/ubuntu_$UUID
    zfs mount rpool/ROOT/ubuntu_$UUID
    
    zfs create -o canmount=noauto -o mountpoint=/boot \
        bpool/BOOT/ubuntu_$UUID
    zfs mount bpool/BOOT/ubuntu_$UUID
    

    With ZFS, it is not normally necessary to use a mount command (either mount or zfs mount). This situation is an exception because of canmount=noauto.

  3. Create datasets:

    zfs create -o com.ubuntu.zsys:bootfs=no \
        rpool/ROOT/ubuntu_$UUID/srv
    zfs create -o com.ubuntu.zsys:bootfs=no -o canmount=off \
        rpool/ROOT/ubuntu_$UUID/usr
    zfs create rpool/ROOT/ubuntu_$UUID/usr/local
    zfs create -o com.ubuntu.zsys:bootfs=no -o canmount=off \
        rpool/ROOT/ubuntu_$UUID/var
    zfs create rpool/ROOT/ubuntu_$UUID/var/games
    zfs create rpool/ROOT/ubuntu_$UUID/var/lib
    zfs create rpool/ROOT/ubuntu_$UUID/var/lib/AccountsService
    zfs create rpool/ROOT/ubuntu_$UUID/var/lib/apt
    zfs create rpool/ROOT/ubuntu_$UUID/var/lib/dpkg
    zfs create rpool/ROOT/ubuntu_$UUID/var/lib/NetworkManager
    zfs create rpool/ROOT/ubuntu_$UUID/var/log
    zfs create rpool/ROOT/ubuntu_$UUID/var/mail
    zfs create rpool/ROOT/ubuntu_$UUID/var/snap
    zfs create rpool/ROOT/ubuntu_$UUID/var/spool
    zfs create rpool/ROOT/ubuntu_$UUID/var/www
    
    zfs create -o canmount=off -o mountpoint=/ \
        rpool/USERDATA
    zfs create -o com.ubuntu.zsys:bootfs-datasets=rpool/ROOT/ubuntu_$UUID \
        -o canmount=on -o mountpoint=/root \
        rpool/USERDATA/root_$UUID
    

    For a mirror or raidz topology, create a dataset for /boot/grub:

    zfs create bpool/BOOT/ubuntu_$UUID/grub
    

    A tmpfs is recommended later, but if you want a separate dataset for /tmp:

    zfs create -o com.ubuntu.zsys:bootfs=no \
        rpool/ROOT/ubuntu_$UUID/tmp
    chmod 1777 /mnt/tmp
    

    The primary goal of this dataset layout is to separate the OS from user data. This allows the root filesystem to be rolled back without rolling back user data.

    If you do nothing extra, /tmp will be stored as part of the root filesystem. Alternatively, you can create a separate dataset for /tmp, as shown above. This keeps the /tmp data out of snapshots of your root filesystem. It also allows you to set a quota on rpool/ROOT/ubuntu_$UUID/tmp, if you want to limit the maximum space used. Otherwise, you can use a tmpfs (RAM filesystem) later.

  4. Install the minimal system:

    debootstrap focal /mnt
    

    The debootstrap command leaves the new system in an unconfigured state. An alternative to using debootstrap is to copy the entirety of a working system into the new ZFS root.
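
The UUID pipeline in step 2 reads 100 random bytes and keeps only those that happen to be in a-z0-9, so on rare occasions fewer than six characters may survive. The following is a hedged alternative sketch, not the HOWTO's method: gen_suffix is a hypothetical function that keeps reading until six characters have been collected.

```shell
# Hypothetical alternative to the dd | tr | cut pipeline (an assumption,
# not what the Ubuntu installer does). Collects filtered random bytes
# until at least six characters are available, then keeps the first six.
gen_suffix() {
    s=''
    while [ "${#s}" -lt 6 ]; do
        s=$s$(head -c 64 /dev/urandom | LC_ALL=C tr -dc 'a-z0-9')
    done
    printf '%s\n' "$s" | cut -c-6
}

UUID=$(gen_suffix)
echo "$UUID"
```

The result is used exactly like the UUID variable in step 2. LC_ALL=C keeps tr working on raw bytes regardless of the session locale.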

Step 4: System Configuration

  1. Configure the hostname:

    Replace HOSTNAME with the desired hostname:

    echo HOSTNAME > /mnt/etc/hostname
    vi /mnt/etc/hosts
    
    Add a line:
    127.0.1.1       HOSTNAME
    or if the system has a real name in DNS:
    127.0.1.1       FQDN HOSTNAME
    

    Hint: Use nano if you find vi confusing.

  2. Configure the network interface:

    Find the interface name:

    ip addr show
    

    Adjust NAME below to match your interface name:

    vi /mnt/etc/netplan/01-netcfg.yaml
    
    network:
      version: 2
      ethernets:
        NAME:
          dhcp4: true
    

    Customize this file if the system is not a DHCP client.

  3. Configure the package sources:

    vi /mnt/etc/apt/sources.list
    
    deb http://archive.ubuntu.com/ubuntu focal main restricted universe multiverse
    deb http://archive.ubuntu.com/ubuntu focal-updates main restricted universe multiverse
    deb http://archive.ubuntu.com/ubuntu focal-backports main restricted universe multiverse
    deb http://security.ubuntu.com/ubuntu focal-security main restricted universe multiverse
    
  4. Bind the virtual filesystems from the LiveCD environment to the new system and chroot into it:

    mount --rbind /dev  /mnt/dev
    mount --rbind /proc /mnt/proc
    mount --rbind /sys  /mnt/sys
    chroot /mnt /usr/bin/env DISK=$DISK UUID=$UUID bash --login
    

    Note: This is using --rbind, not --bind.

  5. Configure a basic system environment:

    apt update
    
    dpkg-reconfigure locales
    

    Even if you prefer a non-English system language, always ensure that en_US.UTF-8 is available. Then set the time zone:

    dpkg-reconfigure tzdata
    

    Install your preferred text editor:

    apt install --yes nano
    apt install --yes vim
    

    Installing the full vim package fixes terminal problems that occur when using the vim-tiny package (that is installed by debootstrap) over SSH.

  6. For LUKS installs only, set up /etc/crypttab:

    apt install --yes cryptsetup
    
    echo luks1 UUID=$(blkid -s UUID -o value ${DISK}-part4) none \
        luks,discard,initramfs > /etc/crypttab
    

    The use of initramfs is a workaround, because cryptsetup does not support ZFS.

    Hint: If you are creating a mirror or raidz topology, repeat the /etc/crypttab entries for luks2, etc. adjusting for each disk.

  7. Create the EFI filesystem:

    Perform these steps for both UEFI and legacy (BIOS) booting:

    apt install --yes dosfstools
    
    mkdosfs -F 32 -s 1 -n EFI ${DISK}-part1
    mkdir /boot/efi
    echo UUID=$(blkid -s UUID -o value ${DISK}-part1) \
        /boot/efi vfat umask=0022,fmask=0022,dmask=0022 0 1 >> /etc/fstab
    mount /boot/efi
    

    For a mirror or raidz topology, repeat these steps for the additional disks, using /boot/efi2, /boot/efi3, etc.

    Note: The -s 1 for mkdosfs is only necessary for drives which present 4 KiB logical sectors (“4Kn” drives) to meet the minimum cluster size (given the partition size of 512 MiB) for FAT32. It also works fine on drives which present 512 B sectors.

  8. Put /boot/grub on the EFI System Partition:

    For a single-disk install only:

    mkdir /boot/efi/grub /boot/grub
    echo /boot/efi/grub /boot/grub none defaults,bind 0 0 >> /etc/fstab
    mount /boot/grub
    

    This allows GRUB to write to /boot/grub (since it is on a FAT-formatted ESP instead of on ZFS), which means that /boot/grub/grubenv and the recordfail feature work as expected: if the boot fails, the normally hidden GRUB menu will be shown on the next boot. For a mirror or raidz topology, we do not want GRUB writing to the EFI System Partition. This is because we duplicate it at install without a mechanism to update the copies when the GRUB configuration changes (e.g. as the kernel is upgraded). Thus, we keep /boot/grub on the boot pool for the mirror or raidz topologies. This preserves correct mirroring/raidz behavior, at the expense of being able to write to /boot/grub/grubenv and thus the recordfail behavior.

  9. Install GRUB/Linux/ZFS in the chroot environment for the new system:

    Choose one of the following options:

    • Install GRUB/Linux/ZFS for legacy (BIOS) booting:

      apt install --yes grub-pc linux-image-generic zfs-initramfs zsys
      

      Select (using the space bar) all of the disks (not partitions) in your pool.

    • Install GRUB/Linux/ZFS for UEFI booting:

      apt install --yes \
          grub-efi-amd64 grub-efi-amd64-signed linux-image-generic \
          shim-signed zfs-initramfs zsys
      

      Note: For a mirror or raidz topology, this step only installs GRUB on the first disk. The other disk(s) will be handled later.

  10. Optional: Remove os-prober:

    dpkg --purge os-prober
    

    This avoids error messages from update-grub. os-prober is only necessary in dual-boot configurations.

  11. Set a root password:

    passwd
    
  12. Configure swap:

    Choose one of the following options if you want swap:

    • For an unencrypted single-disk install:

      mkswap -f ${DISK}-part2
      echo UUID=$(blkid -s UUID -o value ${DISK}-part2) \
          none swap discard 0 0 >> /etc/fstab
      swapon -a
      
    • For an unencrypted mirror or raidz topology:

      apt install --yes mdadm
      # Adjust the level (ZFS raidz = MD raid5, raidz2 = raid6) and
      # raid-devices if necessary and specify the actual devices.
      mdadm --create /dev/md0 --metadata=1.2 --level=mirror \
          --raid-devices=2 ${DISK1}-part2 ${DISK2}-part2
      mkswap -f /dev/md0
      echo UUID=$(blkid -s UUID -o value /dev/md0) \
          none swap discard 0 0 >> /etc/fstab
      swapon -a
      
    • For an encrypted (LUKS or ZFS native encryption) single-disk install:

      apt install --yes cryptsetup
      echo swap ${DISK}-part2 /dev/urandom \
            swap,cipher=aes-xts-plain64:sha256,size=512 >> /etc/crypttab
      echo /dev/mapper/swap none swap defaults 0 0 >> /etc/fstab
      
    • For an encrypted (LUKS or ZFS native encryption) mirror or raidz topology:

      apt install --yes cryptsetup mdadm
      # Adjust the level (ZFS raidz = MD raid5, raidz2 = raid6) and
      # raid-devices if necessary and specify the actual devices.
      mdadm --create /dev/md0 --metadata=1.2 --level=mirror \
          --raid-devices=2 ${DISK1}-part2 ${DISK2}-part2
      echo swap /dev/md0 /dev/urandom \
            swap,cipher=aes-xts-plain64:sha256,size=512 >> /etc/crypttab
      echo /dev/mapper/swap none swap defaults 0 0 >> /etc/fstab
      
  13. Optional (but recommended): Mount a tmpfs to /tmp

    If you chose to create a /tmp dataset above, skip this step, as they are mutually exclusive choices. Otherwise, you can put /tmp on a tmpfs (RAM filesystem) by enabling the tmp.mount unit.

    cp /usr/share/systemd/tmp.mount /etc/systemd/system/
    systemctl enable tmp.mount
    
  14. Set up system groups:

    addgroup --system lpadmin
    addgroup --system lxd
    addgroup --system sambashare
    
  15. Patch a dependency loop:

    For ZFS native encryption or LUKS:

    sudo apt install --yes curl patch
    
    curl https://launchpadlibrarian.net/478315221/2150-fix-systemd-dependency-loops.patch | \
        sed "s|/etc|/lib|;s|\.in$||" | (cd / ; patch -p1)
    

    This patch is from Bug #1875577 Encrypted swap won’t load on 20.04 with zfs root.
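
The sed expression in the patch pipeline of step 15 rewrites the patch before it is applied: it changes the first /etc on each line to /lib and strips a trailing .in. The sketch below shows that transformation on made-up diff header lines. The file name example.service.in is purely illustrative and is not taken from the real patch; rewrite_patch_paths is a hypothetical wrapper name.

```shell
# Hypothetical wrapper around the same sed expression used in step 15.
rewrite_patch_paths() {
    sed "s|/etc|/lib|;s|\.in$||"
}

# Made-up header lines, used only to show the rewrite; the real patch's
# file names may differ.
printf '%s\n' \
    '--- a/etc/systemd/system/example.service.in' \
    '+++ b/etc/systemd/system/example.service.in' |
    rewrite_patch_paths
```

Both sample lines come out pointing at a/lib/systemd/system/example.service and b/lib/systemd/system/example.service. To see what the real patch would touch before changing anything, GNU patch accepts --dry-run, so the final stage of the pipeline could be run as (cd / ; patch -p1 --dry-run) first.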

Step 5: GRUB Installation

  1. Verify that the ZFS boot filesystem is recognized:

    grub-probe /boot
    
  2. Refresh the initrd files:

    update-initramfs -c -k all
    

    Note: When using LUKS, this will print “WARNING could not determine root device from /etc/fstab”. This is because cryptsetup does not support ZFS.

  3. Disable memory zeroing:

    vi /etc/default/grub
    # Add init_on_alloc=0 to: GRUB_CMDLINE_LINUX_DEFAULT
    # Save and quit.
    

    This is to address performance regressions.

  4. Optional (but highly recommended): Make debugging GRUB easier:

    vi /etc/default/grub
    # Comment out: GRUB_TIMEOUT_STYLE=hidden
    # Set: GRUB_TIMEOUT=5
    # Below GRUB_TIMEOUT, add: GRUB_RECORDFAIL_TIMEOUT=5
    # Remove quiet and splash from: GRUB_CMDLINE_LINUX_DEFAULT
    # Uncomment: GRUB_TERMINAL=console
    # Save and quit.
    

    Later, once the system has rebooted twice and you are sure everything is working, you can undo these changes, if desired.

  5. Update the boot configuration:

    update-grub
    

    Note: Ignore errors from os-prober, if present.

  6. Install the boot loader:

    Choose one of the following options:

    • For legacy (BIOS) booting, install GRUB to the MBR:

      grub-install $DISK
      

      Note that you are installing GRUB to the whole disk, not a partition.

      If you are creating a mirror or raidz topology, repeat the grub-install command for each disk in the pool.

    • For UEFI booting, install GRUB to the ESP:

      grub-install --target=x86_64-efi --efi-directory=/boot/efi \
          --bootloader-id=ubuntu --recheck --no-floppy
      

      For a mirror or raidz topology, run this for the additional disk(s), incrementing the “2” to “3” and so on for both /boot/efi2 and ubuntu-2:

      cp -a /boot/efi/EFI /boot/efi2
      grub-install --target=x86_64-efi --efi-directory=/boot/efi2 \
          --bootloader-id=ubuntu-2 --recheck --no-floppy
      
  7. Disable grub-initrd-fallback.service

    For a mirror or raidz topology:

    systemctl mask grub-initrd-fallback.service

    This is the service for /boot/grub/grubenv which does not work on mirrored or raidz topologies. Disabling this keeps it from blocking subsequent mounts of /boot/grub if that mount ever fails.

    Another option would be to set RequiresMountsFor=/boot/grub via a drop-in unit, but that is more work to do here for no reason. Hopefully this bug will be fixed upstream.

  8. Fix filesystem mount ordering:

    We need to activate zfs-mount-generator. This makes systemd aware of the separate mountpoints, which is important for things like /var/log and /var/tmp. In turn, rsyslog.service depends on var-log.mount by way of local-fs.target and services using the PrivateTmp feature of systemd automatically use After=var-tmp.mount.

    mkdir /etc/zfs/zfs-list.cache
    touch /etc/zfs/zfs-list.cache/bpool
    touch /etc/zfs/zfs-list.cache/rpool
    ln -s /usr/lib/zfs-linux/zed.d/history_event-zfs-list-cacher.sh /etc/zfs/zed.d
    zed -F &
    

    Verify that zed updated the cache by making sure these are not empty:

    cat /etc/zfs/zfs-list.cache/bpool
    cat /etc/zfs/zfs-list.cache/rpool
    

    If either is empty, force a cache update and check again:

    zfs set canmount=noauto bpool/BOOT/ubuntu_$UUID
    zfs set canmount=noauto rpool/ROOT/ubuntu_$UUID
    

    Stop zed:

    fg
    Press Ctrl-C.
    

    Fix the paths to eliminate /mnt:

    sed -Ei "s|/mnt/?|/|" /etc/zfs/zfs-list.cache/*
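
Before editing the cache files in place, it may help to preview what the substitution does. The sketch below applies the same expression without -i to simplified sample lines. Real cache lines carry more tab-separated fields than the two shown, abc123 is a placeholder suffix, and preview_cache_fix is a hypothetical name.

```shell
# Hypothetical preview helper: same expression as above, but without -i,
# so it prints the result instead of modifying files.
preview_cache_fix() {
    sed -E "s|/mnt/?|/|" "$@"
}

# Simplified sample lines (dataset name, tab, mountpoint).
printf 'bpool/BOOT/ubuntu_abc123\t/mnt/boot\n' | preview_cache_fix
printf 'rpool/ROOT/ubuntu_abc123\t/mnt\n'      | preview_cache_fix
```

The first line's mountpoint becomes /boot and the second becomes /. Because the expression has no g flag, only the first /mnt on each line is rewritten. On a real system, preview_cache_fix /etc/zfs/zfs-list.cache/bpool shows the would-be result for a whole file.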
    

Step 6: First Boot

  1. Optional: Install SSH:

    apt install --yes openssh-server
    
    vi /etc/ssh/sshd_config
    # Set: PermitRootLogin yes
    
  2. Exit from the chroot environment back to the LiveCD environment:

    exit
    
  3. Run these commands in the LiveCD environment to unmount all filesystems:

    mount | grep -v zfs | tac | awk '/\/mnt/ {print $3}' | \
        xargs -i{} umount -lf {}
    zpool export -a
    
  4. Reboot:

    reboot
    

    Wait for the newly installed system to boot normally. Log in as root.

  5. Create a user account:

    Replace username with your desired username:

    UUID=$(dd if=/dev/urandom of=/dev/stdout bs=1 count=100 2>/dev/null |
        tr -dc 'a-z0-9' | cut -c-6)
    ROOT_DS=$(zfs list -o name | awk '/ROOT\/ubuntu_/{print $1;exit}')
    zfs create -o com.ubuntu.zsys:bootfs-datasets=$ROOT_DS \
        -o canmount=on -o mountpoint=/home/username \
        rpool/USERDATA/username_$UUID
    adduser username
    
    cp -a /etc/skel/. /home/username
    chown -R username:username /home/username
    usermod -a -G adm,cdrom,dip,lpadmin,lxd,plugdev,sambashare,sudo username
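
The ROOT_DS line in step 5 picks the first dataset whose name contains ROOT/ubuntu_. The sketch below runs the same awk program against sample text to show which line it selects. The sample is an assumption about what zfs list -o name might print on a system built with this HOWTO, abc123 is a placeholder, and pick_root_ds is a hypothetical name.

```shell
# Assumed sample of `zfs list -o name` output; abc123 is a placeholder.
sample='NAME
bpool
bpool/BOOT
bpool/BOOT/ubuntu_abc123
rpool
rpool/ROOT
rpool/ROOT/ubuntu_abc123
rpool/ROOT/ubuntu_abc123/srv
rpool/USERDATA'

# Same awk program as in step 5: print the first match, then stop.
pick_root_ds() {
    awk '/ROOT\/ubuntu_/{print $1;exit}'
}

printf '%s\n' "$sample" | pick_root_ds
```

With this sample the selected line is rpool/ROOT/ubuntu_abc123. The bpool/BOOT entry is skipped because BOOT does not contain ROOT. If a system has more than one root dataset, the first match is not necessarily the one that is booted, so it is worth checking the value of ROOT_DS before creating the user dataset.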
    

Step 7: Full Software Installation

  1. Upgrade the minimal system:

    apt dist-upgrade --yes
    
  2. Install a regular set of software:

    Choose one of the following options:

    • Install a command-line environment only:

      apt install --yes ubuntu-standard
      
    • Install a full GUI environment:

      apt install --yes ubuntu-desktop
      vi /etc/gdm3/custom.conf
      # In the [daemon] section, add: InitialSetupEnable=false
      

      Hint: If you are installing a full GUI environment, you will likely want to manage your network with NetworkManager:

      rm /etc/netplan/01-netcfg.yaml
      vi /etc/netplan/01-network-manager-all.yaml
      
      network:
        version: 2
        renderer: NetworkManager
      
  3. Optional: Disable log compression:

    As /var/log is already compressed by ZFS, logrotate’s compression is going to burn CPU and disk I/O for (in most cases) very little gain. Also, if you are making snapshots of /var/log, logrotate’s compression will actually waste space, as the uncompressed data will live on in the snapshot. You can edit the files in /etc/logrotate.d by hand to comment out compress, or use this loop (copy-and-paste highly recommended):

    for file in /etc/logrotate.d/* ; do
        if grep -Eq "(^|[^#y])compress" "$file" ; then
            sed -i -r "s/(^|[^#y])(compress)/\1#\2/" "$file"
        fi
    done
    
  4. Reboot:

    reboot
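
The effect of the log compression loop can be checked on a scratch directory before running it against /etc/logrotate.d. This is a sketch under assumptions: the sample logrotate stanza below is made up for illustration, and the loop is the one from the guide pointed at a temporary directory:

```shell
# Scratch directory standing in for /etc/logrotate.d (sample content only).
dir=$(mktemp -d)
printf '%s\n' \
    '/var/log/example.log {' \
    '    weekly' \
    '    compress' \
    '    delaycompress' \
    '    #compress' \
    '}' > "$dir/example"

# Same loop as in the guide, with the directory swapped for the scratch copy.
for file in "$dir"/* ; do
    if grep -Eq "(^|[^#y])compress" "$file" ; then
        sed -i -r "s/(^|[^#y])(compress)/\1#\2/" "$file"
    fi
done

cat "$dir/example"
```

The active compress line gains a leading #, while delaycompress and the already commented line are left alone. Because the pattern only excludes a preceding # or y, it would also rewrite nocompress to no#compress, so it is worth reviewing the result if any file uses that directive.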
    

Step 8: Final Cleanup

  1. Wait for the system to boot normally. Log in using the account you created. Ensure the system (including networking) works normally.

  2. Optional: Disable the root password:

    sudo usermod -p '*' root
    
  3. Optional (but highly recommended): Disable root SSH logins:

    If you installed SSH earlier, revert the temporary change:

    sudo vi /etc/ssh/sshd_config
    # Remove: PermitRootLogin yes
    
    sudo systemctl restart ssh
    
  4. Optional: Re-enable the graphical boot process:

    If you prefer the graphical boot process, you can re-enable it now. If you are using LUKS, it makes the prompt look nicer.

    sudo vi /etc/default/grub
    # Uncomment: GRUB_TIMEOUT_STYLE=hidden
    # Add quiet and splash to: GRUB_CMDLINE_LINUX_DEFAULT
    # Comment out: GRUB_TERMINAL=console
    # Save and quit.
    
    sudo update-grub
    

    Note: Ignore errors from os-prober, if present.

  5. Optional: For LUKS installs only, back up the LUKS header:

    sudo cryptsetup luksHeaderBackup /dev/disk/by-id/scsi-SATA_disk1-part4 \
        --header-backup-file luks1-header.dat
    

    Store that backup somewhere safe (e.g. cloud storage). It is protected by your LUKS passphrase, but you may wish to use additional encryption.

    Hint: If you created a mirror or raidz topology, repeat this for each LUKS volume (luks2, etc.).
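
For the graphical boot step in this list, the three manual edits to /etc/default/grub could also be scripted. The following is only a sketch: it works on a scratch copy, and the sample file contents (including the init_on_alloc=0 value) are assumptions about what the file looks like after this install, so compare against your real file first:

```shell
# Scratch file standing in for /etc/default/grub (sample content only).
scratch=$(mktemp)
printf '%s\n' \
    'GRUB_DEFAULT=0' \
    '#GRUB_TIMEOUT_STYLE=hidden' \
    'GRUB_TIMEOUT=5' \
    'GRUB_CMDLINE_LINUX_DEFAULT="init_on_alloc=0"' \
    'GRUB_CMDLINE_LINUX=""' \
    'GRUB_TERMINAL=console' > "$scratch"

sed -i -E \
    -e 's/^#[[:space:]]*(GRUB_TIMEOUT_STYLE=hidden)/\1/' \
    -e 's/^(GRUB_CMDLINE_LINUX_DEFAULT=")(.*)"$/\1quiet splash \2"/' \
    -e 's/^(GRUB_TERMINAL=console)/#\1/' \
    "$scratch"

cat "$scratch"
```

The first expression uncomments GRUB_TIMEOUT_STYLE=hidden, the second prepends quiet and splash to the existing kernel options, and the third comments out GRUB_TERMINAL=console. On a real system the target would be /etc/default/grub, followed by sudo update-grub.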

Troubleshooting

Rescuing using a Live CD

Go through Step 1: Prepare The Install Environment.

For LUKS, first unlock the disk(s):

cryptsetup luksOpen /dev/disk/by-id/scsi-SATA_disk1-part4 luks1
# Repeat for additional disks, if this is a mirror or raidz topology.

Mount everything correctly:

zpool export -a
zpool import -N -R /mnt rpool
zpool import -N -R /mnt bpool
zfs load-key -a
# Replace “UUID” as appropriate; use zfs list to find it:
zfs mount rpool/ROOT/ubuntu_UUID
zfs mount bpool/BOOT/ubuntu_UUID
zfs mount -a
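
Instead of reading the UUID off by eye, the dataset names can be picked out of the zfs list output with the same awk pattern the guide uses when creating a user account. This sketch runs against made-up sample output so that it works without a pool present; on a real system the input would come from zfs list -H -o name:

```shell
# Sample dataset listing (hypothetical names, one per line).
sample_list='bpool
bpool/BOOT
bpool/BOOT/ubuntu_abc123
rpool
rpool/ROOT
rpool/ROOT/ubuntu_abc123
rpool/ROOT/ubuntu_abc123/var
rpool/USERDATA'

# Take the first dataset whose name contains ROOT/ubuntu_ or BOOT/ubuntu_.
root_ds=$(printf '%s\n' "$sample_list" | awk '/ROOT\/ubuntu_/{print $1;exit}')
boot_ds=$(printf '%s\n' "$sample_list" | awk '/BOOT\/ubuntu_/{print $1;exit}')

# Print the mount commands instead of running them.
echo "zfs mount $root_ds"
echo "zfs mount $boot_ds"
```

If more than one ubuntu_ dataset exists (for example, after cloning), the first match may not be the one you want, so check the full listing in that case.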

If needed, you can chroot into your installed environment:

mount --rbind /dev  /mnt/dev
mount --rbind /proc /mnt/proc
mount --rbind /sys  /mnt/sys
chroot /mnt /bin/bash --login
mount -a

Do whatever you need to do to fix your system.

When done, clean up:

exit
mount | grep -v zfs | tac | awk '/\/mnt/ {print $3}' | \
    xargs -I{} umount -lf {}
zpool export -a
reboot
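
The cleanup pipeline drops ZFS mounts, reverses the remaining list so that later (typically more deeply nested) mounts come first, and keeps only the mount points on lines that mention /mnt. As a sketch, the filter stage can be previewed on sample mount output (the lines below are made up) without unmounting anything:

```shell
# Sample output of `mount` (hypothetical entries).
sample_mounts='rpool/ROOT/ubuntu_abc123 on /mnt type zfs (rw,relatime,xattr,posixacl)
udev on /mnt/dev type devtmpfs (rw,nosuid,relatime)
proc on /mnt/proc type proc (rw,nosuid,nodev,noexec,relatime)
devpts on /mnt/dev/pts type devpts (rw,nosuid,noexec,relatime)
/dev/sda1 on /mnt/boot/efi type vfat (rw,relatime)'

# Same filter as in the guide, minus the final umount stage.
targets=$(printf '%s\n' "$sample_mounts" |
    grep -v zfs | tac | awk '/\/mnt/ {print $3}')

printf '%s\n' "$targets"
```

On a live system, replacing the sample text with the real mount command shows exactly which paths would be handed to umount -lf, in order.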

Areca

Systems that require the arcsas blob driver should add it to the /etc/initramfs-tools/modules file and run update-initramfs -c -k all.
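
Adding the module name can be done so that repeated runs do not duplicate the line. This is a minimal sketch on a scratch file; add_module is a hypothetical helper, and on a real system the target would be /etc/initramfs-tools/modules, followed by update-initramfs -c -k all:

```shell
# Scratch file standing in for /etc/initramfs-tools/modules.
modules_file=$(mktemp)
printf '# List of modules to include in the initramfs.\n' > "$modules_file"

# Hypothetical helper: append module $1 to file $2 unless a line
# already matches it exactly.
add_module() {
    grep -qxF "$1" "$2" || printf '%s\n' "$1" >> "$2"
}

add_module arcsas "$modules_file"
add_module arcsas "$modules_file"   # second call changes nothing

cat "$modules_file"
```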

Upgrade or downgrade the Areca driver if something like RIP: 0010:[<ffffffff8101b316>]  [<ffffffff8101b316>] native_read_tsc+0x6/0x20 appears anywhere in the kernel log. ZoL is unstable on systems that emit this error message.

MPT2SAS

Most problem reports for this tutorial involve mpt2sas hardware that does slow asynchronous drive initialization, like some IBM M1015 or OEM-branded cards that have been flashed to the reference LSI firmware.

The basic problem is that disks on these controllers are not visible to the Linux kernel until after the regular system is started, and ZoL does not hotplug pool members. See https://github.com/zfsonlinux/zfs/issues/330.

Most LSI cards are perfectly compatible with ZoL. If your card has this glitch, try setting ZFS_INITRD_PRE_MOUNTROOT_SLEEP=X in /etc/default/zfs. The system will wait X seconds for all drives to appear before importing the pool.
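
Setting the delay can be scripted so that an existing assignment is replaced rather than duplicated. The sketch below works on a scratch file. The 15-second value, the set_sleep helper, and the assumption that the file already contains a ZFS_INITRD_PRE_MOUNTROOT_SLEEP='0' line are all illustrative:

```shell
# Scratch file standing in for /etc/default/zfs (sample content only).
defaults=$(mktemp)
printf '%s\n' "ZFS_INITRD_PRE_MOUNTROOT_SLEEP='0'" > "$defaults"

delay=15   # hypothetical; choose a value that suits your controller

# Hypothetical helper: set the sleep to $1 seconds in file $2, replacing
# an existing assignment or appending one if none is present.
set_sleep() {
    if grep -q '^ZFS_INITRD_PRE_MOUNTROOT_SLEEP=' "$2"; then
        sed -i "s/^ZFS_INITRD_PRE_MOUNTROOT_SLEEP=.*/ZFS_INITRD_PRE_MOUNTROOT_SLEEP='$1'/" "$2"
    else
        printf "ZFS_INITRD_PRE_MOUNTROOT_SLEEP='%s'\n" "$1" >> "$2"
    fi
}

set_sleep "$delay" "$defaults"
cat "$defaults"
```

Since the setting takes effect during early boot, you will probably also need to regenerate the initramfs (for example with update-initramfs -c -k all) after changing the real file.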

QEMU/KVM/XEN

Set a unique serial number on each virtual disk using libvirt or qemu (e.g. -drive if=none,id=disk1,file=disk1.qcow2,serial=1234567890).

To be able to use UEFI in guests (instead of only BIOS booting), run this on the host:

sudo apt install ovmf
sudo vi /etc/libvirt/qemu.conf

Uncomment these lines:

nvram = [
   "/usr/share/OVMF/OVMF_CODE.fd:/usr/share/OVMF/OVMF_VARS.fd",
   "/usr/share/OVMF/OVMF_CODE.secboot.fd:/usr/share/OVMF/OVMF_VARS.fd",
   "/usr/share/AAVMF/AAVMF_CODE.fd:/usr/share/AAVMF/AAVMF_VARS.fd",
   "/usr/share/AAVMF/AAVMF32_CODE.fd:/usr/share/AAVMF/AAVMF32_VARS.fd",
   "/usr/share/OVMF/OVMF_CODE.ms.fd:/usr/share/OVMF/OVMF_VARS.ms.fd"
]

Then restart libvirtd:

sudo systemctl restart libvirtd.service

VMware

  • Set disk.EnableUUID = "TRUE" in the vmx file or vSphere configuration. Doing this ensures that /dev/disk aliases are created in the guest.