RAIDZ
tl;dr: RAIDZ is effective for large block sizes and sequential workloads.
Introduction
RAIDZ is a variation on RAID-5 that allows for better distribution of parity and eliminates the RAID-5 “write hole” (in which data and parity become inconsistent after a power loss). Data and parity are striped across all disks within a raidz group.
A raidz group can have single, double, or triple parity, meaning that the raidz group can sustain one, two, or three failures, respectively, without losing any data. The raidz1 vdev type specifies a single-parity raidz group; the raidz2 vdev type specifies a double-parity raidz group; and the raidz3 vdev type specifies a triple-parity raidz group. The raidz vdev type is an alias for raidz1.
A raidz group of N disks of size X with P parity disks can hold approximately (N-P)*X bytes and can withstand P devices failing without losing data. The minimum number of devices in a raidz group is one more than the number of parity disks. The recommended number is between 3 and 9 to help increase performance.
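The capacity formula above can be sketched as a short calculation (the function name and example figures are illustrative):

```python
# Approximate usable capacity of a raidz group: (N - P) * X,
# where N = number of disks, P = parity level, X = size of each disk.
def raidz_usable_bytes(n_disks: int, parity: int, disk_bytes: int) -> int:
    # The minimum group size is one more disk than the parity level.
    assert n_disks >= parity + 1, "need at least parity + 1 disks"
    return (n_disks - parity) * disk_bytes

# e.g. a raidz2 group of 6 x 4 TB disks holds roughly 16 TB:
print(raidz_usable_bytes(6, 2, 4_000_000_000_000))  # 16000000000000
```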
Space efficiency
Actual space used for a block in RAIDZ depends on several factors:

- the minimal write size is the disk sector size (which can be set via the ashift vdev parameter)
- stripe width in RAIDZ is dynamic: a stripe holds at least one data part, and at most (disk count minus parity count) data parts
- one block of data of size recordsize is split into equal sector-size parts, which are written across the stripes of the RAIDZ vdev; each stripe holds a part of the block
- in addition to the data, one, two, or three parity sectors must be written per stripe, one per parity disk; so, for a raidz2 of 5 disks, a full stripe holds 3 sectors of data and 2 sectors of parity
Given these factors, if recordsize is less than or equal to the sector size, RAIDZ's parity overhead is effectively equal to that of a mirror with the same redundancy.
For example, for a raidz1 of 3 disks with ashift=12 and recordsize=4K we will allocate on disk:

- one 4K block of data
- one 4K block of parity

and the usable space ratio will be 50%, the same as with a two-way mirror.
Another example, for ashift=12 and recordsize=128K on a raidz1 of 3 disks:

- total stripe width is 3
- one stripe can hold up to 2 data parts of 4K each, because of the 1 parity part
- we will have 128K / 8K = 16 stripes, each with 8K of data and 4K of parity
- 16 stripes of 12K each means we write 192K to store 128K

so the usable space ratio in this case will be 2/3 (about 67%).
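Both examples above can be reproduced with a short sketch (the function name is illustrative, and this simplified model ignores any additional allocation padding ZFS may apply):

```python
import math

def raidz_allocated_bytes(recordsize: int, sector: int,
                          n_disks: int, parity: int) -> int:
    """Approximate on-disk space for one block on a RAIDZ vdev.

    A block of `recordsize` bytes is split into sector-size data parts;
    each stripe holds up to (n_disks - parity) data parts plus `parity`
    parity sectors.
    """
    data_parts = math.ceil(recordsize / sector)
    stripes = math.ceil(data_parts / (n_disks - parity))
    return (data_parts + stripes * parity) * sector

# raidz1 of 3 disks with ashift=12 (4K sectors):
print(raidz_allocated_bytes(4096, 4096, 3, 1))    # 8192   -> 50% usable
print(raidz_allocated_bytes(131072, 4096, 3, 1))  # 196608 -> ~67% usable
```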
The more disks a RAIDZ group has, the wider the stripe and the greater the space efficiency.
Performance considerations
Write
A stripe spans all drives in the array, so a single-block write puts a stripe part on each disk. In the worst case, a RAIDZ vdev therefore has the write IOPS of the slowest disk in the array, because the write of every stripe part must complete on each disk.
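This worst-case bound can be illustrated with a trivial calculation (the per-disk figures are hypothetical):

```python
# Worst case: a one-block write touches every disk in the stripe, so the
# vdev's write IOPS is bounded by its slowest member.
disk_write_iops = [250, 240, 180]  # hypothetical per-disk write IOPS
vdev_write_iops = min(disk_write_iops)
print(vdev_write_iops)  # 180
```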