# RAIDZ

tl;dr: RAIDZ is effective for large block sizes and sequential workloads.

## Introduction

RAIDZ is a variation on RAID-5 that allows for better distribution of parity and eliminates the RAID-5 “write hole” (in which data and parity become inconsistent after a power loss). Data and parity are striped across all disks within a raidz group.

A raidz group can have single, double, or triple parity, meaning that the raidz
group can sustain one, two, or three failures, respectively, without losing any
data. The `raidz1` vdev type specifies a single-parity raidz group; the `raidz2`
vdev type specifies a double-parity raidz group; and the `raidz3` vdev type
specifies a triple-parity raidz group. The `raidz` vdev type is an alias for
`raidz1`.

A raidz group of N disks of size X with P parity disks can hold approximately (N-P)*X bytes and can withstand P devices failing without losing data. The minimum number of devices in a raidz group is one more than the number of parity disks. The recommended number is between 3 and 9 to help increase performance.
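The capacity rule above can be sketched in a few lines of Python. The function name `raidz_usable_bytes` and its parameters are hypothetical, chosen only for illustration; the result is the same approximation as in the text and ignores metadata and allocation overhead.

```python
def raidz_usable_bytes(n_disks: int, disk_bytes: int, parity: int) -> int:
    """Approximate usable capacity of one raidz group: (N - P) * X."""
    if parity not in (1, 2, 3):
        raise ValueError("raidz supports single, double, or triple parity")
    if n_disks < parity + 1:
        # The minimum group size is one more than the number of parity disks.
        raise ValueError("a raidz group needs at least parity + 1 disks")
    return (n_disks - parity) * disk_bytes


TIB = 2**40
# Six 4 TiB disks in a raidz2 group: (6 - 2) * 4 TiB
print(raidz_usable_bytes(6, 4 * TIB, 2) // TIB)  # prints 16
```

The same group can lose any two of its six disks without losing data.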

## Space efficiency

The space actually used by a block in RAIDZ depends on several factors:

- The minimal write size is the disk sector size (which can be set via the `ashift` vdev parameter).
- The stripe width in RAIDZ is dynamic: a stripe holds at least one part of the data block, and at most `disks count` minus `parity number` parts.
- One block of data of size `recordsize` is split into parts of `sector size` each, and the parts are written across the stripes of the RAIDZ vdev.
- Each stripe of data holds a part of the block.
- In addition to the data, one, two, or three blocks of parity must be written per stripe, one per disk; so, for a raidz2 of 5 disks, a full stripe has 3 blocks of data and 2 blocks of parity.

Because of these factors, if `recordsize` is less than or equal to the sector size,
then RAIDZ’s parity overhead is effectively the same as that of a mirror with the same redundancy.
For example, for a raidz1 of 3 disks with `ashift=12` and `recordsize=4K`,
we will allocate on disk:

- one 4K block of data
- one 4K parity block

The usable space ratio will be 50%, the same as with a two-way mirror.

Another example, with `ashift=12` and `recordsize=128K` on a raidz1 of 3 disks:

- the total stripe width is 3
- one stripe can hold up to 2 data parts of 4K each, because 1 of the 3 parts is a parity block
- we will have 128K/8K = 16 stripes, each with 8K of data and 4K of parity
- 16 stripes of 12K each means we write 192K to store 128K

So the usable space ratio in this case is about 66.7%.

The more disks RAIDZ has, the wider the stripe, the greater the space efficiency.
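The arithmetic in the examples above can be reproduced with a short Python sketch. It follows only the simplified model described in this section (sector-sized parts, dynamic stripes, a fixed number of parity sectors per stripe) and does not account for any additional padding or metadata an actual pool may allocate. The function name `raidz_block_allocation` and its parameters are hypothetical.

```python
def raidz_block_allocation(n_disks, parity, ashift, block_bytes):
    """Return (data_bytes, parity_bytes) allocated for one block.

    Simplified model from the text above, not the on-disk allocator.
    """
    sector = 1 << ashift                      # minimal write size
    # A block always occupies at least one whole sector.
    data_sectors = max(1, -(-block_bytes // sector))
    data_per_stripe = n_disks - parity        # max data parts in one stripe
    stripes = -(-data_sectors // data_per_stripe)   # ceiling division
    parity_sectors = stripes * parity         # parity is written per stripe
    return data_sectors * sector, parity_sectors * sector


# raidz1 with ashift=12 and a 128K block, on progressively wider groups
for disks in (3, 5, 9):
    data, par = raidz_block_allocation(disks, 1, 12, 128 * 1024)
    print(disks, data // 1024, par // 1024, round(data / (data + par), 2))
# prints:
# 3 128 64 0.67
# 5 128 32 0.8
# 9 128 16 0.89
```

With a 4K block on the 3-disk raidz1, the same function returns 4K of data and 4K of parity, which is the 50% case described earlier.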

You can find actual parity cost per RAIDZ size here:

(source)

## Performance considerations

### Write

A stripe spans all drives in the array, so writing one block writes a stripe part to each disk. In the worst case, a RAIDZ vdev therefore has the write IOPS of the slowest disk in the array, because the write is complete only when every disk has finished writing its stripe part.