This page is a draft.
This page contains tips for troubleshooting ZFS on Linux and describes the information developers may need for bug triage.
About Log Files
Log files can be very useful for troubleshooting. In some cases, interesting information is stored in multiple log files that are correlated to system events.
Pro tip: logging infrastructure tools like Elasticsearch, Fluentd, InfluxDB, or Splunk can simplify log analysis and event correlation.
Generic Kernel Log
Typically, Linux kernel log messages are available from
/var/log/syslog, or wherever kernel log messages are sent (e.g. by
Symptom: zpool command appears hung, does not return, and is not killable
Likely cause: kernel thread hung or panic
Important information: if a kernel thread is stuck, then a backtrace of the stuck thread can be in the logs. In some cases, the stuck thread is not logged until the deadman timer expires. See also debug tunables.
ZFS uses an event-based messaging interface for communication of
important events to other consumers running on the system. The ZFS Event
Daemon (zed) is a userland daemon that listens for these events and
processes them. zed is extensible, so you can write shell scripts or
other programs that subscribe to events and take action. For example,
the script usually installed at
a formatted event message to
syslog. See the man page for
for more information.
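Since zed can run programs other than shell scripts, the following is a minimal sketch of a hypothetical zedlet written in Python that logs a one-line summary of each event to syslog. It assumes that zed runs executable files from its zedlet directory and passes event fields as environment variables named `ZEVENT_EID`, `ZEVENT_CLASS`, and `ZEVENT_POOL`; treat those names as assumptions and confirm them against the zed man page for your release.

```python
#!/usr/bin/env python3
"""Hypothetical zedlet: log a one-line summary of each ZFS event to syslog.

ASSUMPTION: zed exports event fields as ZEVENT_* environment variables.
Verify the exact variable names in the zed man page for your release.
"""
import os
import syslog


def format_event(env):
    """Build a one-line summary from the (assumed) ZEVENT_* variables."""
    eid = env.get("ZEVENT_EID", "?")            # event ID, if provided
    cls = env.get("ZEVENT_CLASS", "unknown")    # e.g. an ereport.* class
    pool = env.get("ZEVENT_POOL")               # not every event names a pool
    msg = f"eid={eid} class={cls}"
    if pool:
        msg += f" pool={pool}"
    return msg


def main():
    # "zed-example" is an arbitrary syslog identifier chosen for this sketch.
    syslog.openlog("zed-example")
    syslog.syslog(syslog.LOG_NOTICE, format_event(os.environ))
    syslog.closelog()


if __name__ == "__main__":
    main()
```

Keeping the formatting in a pure function makes the script easy to test without a running zed.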
A history of events is also available via the
zpool events command.
This history begins at ZFS kernel module load and includes events from
any pool. These events are stored in RAM and limited in count to a value
determined by the kernel tunable
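As a rough mental model only (not the kernel implementation), the bounded event history behaves like a ring buffer, assuming the oldest events are discarded first once the limit is reached. In this sketch, `EventHistory` and `max_len` are hypothetical names, and `max_len` merely stands in for the kernel tunable; the value used below is arbitrary.

```python
from collections import deque


class EventHistory:
    """Toy model of an in-RAM event history capped at max_len entries.

    Once the deque is full, appending a new event silently drops the
    oldest one, mirroring the assumed "oldest first" discard behavior.
    """

    def __init__(self, max_len):
        self._events = deque(maxlen=max_len)
        self._next_eid = 1  # hypothetical monotonically increasing event ID

    def post(self, event_class):
        """Record an event and assign it the next ID."""
        self._events.append((self._next_eid, event_class))
        self._next_eid += 1

    def snapshot(self):
        """Return the retained events, oldest first."""
        return list(self._events)


history = EventHistory(max_len=3)
for cls in ["a", "b", "c", "d", "e"]:
    history.post(cls)
print(history.snapshot())  # [(3, 'c'), (4, 'd'), (5, 'e')]
```

The practical consequence is that events from long before a problem occurred may already have been discarded, so it helps to capture the history soon after a fault.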
zed has an internal throttling mechanism to prevent overconsumption
of system resources when processing ZFS events.
More detailed information about events is observable using
zpool events -v. The contents of the verbose events are subject to
change, based on the event and the information available at the time of the event.
Each event has a class identifier used for filtering event types.
Commonly seen events are those related to pool management with class
sysevent.fs.zfs.*, including import, export, configuration updates,
and zpool history updates.
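Class identifiers are dotted names, so filtering by a shell-style glob such as `sysevent.fs.zfs.*` or `ereport.*` is straightforward. The sketch below uses Python's standard `fnmatch` module; `filter_classes` is a hypothetical helper, and the class strings are illustrative samples rather than captured output.

```python
from fnmatch import fnmatchcase


def filter_classes(classes, pattern):
    """Return the event classes that match a shell-style glob.

    fnmatchcase is case-sensitive, and its '*' also matches '.', so
    'ereport.*' matches every class under the ereport namespace.
    """
    return [c for c in classes if fnmatchcase(c, pattern)]


# Illustrative class strings, not real zpool events output.
sample = [
    "sysevent.fs.zfs.pool_import",
    "ereport.fs.zfs.checksum",
    "sysevent.fs.zfs.history_event",
    "ereport.fs.zfs.io",
]

print(filter_classes(sample, "ereport.*"))
# ['ereport.fs.zfs.checksum', 'ereport.fs.zfs.io']
```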
Events related to errors are reported with classes matching
ereport.*, and they can be invaluable for troubleshooting. Some faults
can cause multiple ereports as various layers of the software deal with
the fault. For example, on a simple pool without parity protection, a
faulty disk could cause an ereport.io during a read from the disk that
results in an ereport.fs.zfs.checksum at the pool level. These events are
also reflected by the error counters observed in zpool status. If you see
checksum or read/write errors in zpool status, then there should be one or
more corresponding ereports in the zpool events output.
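To cross-check zpool status error counters against the event history, it can help to tally ereports by type. This is a minimal sketch, assuming you already have a list of event class strings (for example, copied by hand from zpool events output); `tally_ereports` is a hypothetical helper, and grouping by the last dotted component is an illustrative choice, not an official mapping to the zpool status columns.

```python
from collections import Counter


def tally_ereports(classes):
    """Count ereport.* classes by their last dotted component.

    Non-error classes such as sysevent.fs.zfs.* are ignored.
    """
    return Counter(
        c.rsplit(".", 1)[-1] for c in classes if c.startswith("ereport.")
    )


# Illustrative class strings, not real zpool events output.
events = [
    "ereport.fs.zfs.checksum",
    "ereport.fs.zfs.checksum",
    "ereport.fs.zfs.io",
    "sysevent.fs.zfs.config_sync",
]
print(tally_ereports(events))  # Counter({'checksum': 2, 'io': 1})
```

A nonzero checksum count in zpool status with no matching checksum ereports may simply mean the relevant events have already aged out of the bounded in-RAM history.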