ZFS is a project started at Sun Microsystems as part of the Solaris operating system. It is an open-source, next-generation filesystem with volume management capabilities, which allows users to group physical storage into unified pools offering advanced features such as redundancy, automatic data repair, deduplication, compression and many more.

In-depth overview

zpools

At a high level, ZFS works by arranging storage into zpools. Each zpool is made of several vdevs grouped together, which in turn are made by aggregating several physical devices. While the vdevs provide the underlying data redundancy and fault tolerance, the zpool provides the actual storage space. This storage space can consist of:

  • datasets: a set of files and directories mounted inside the zpool itself
  • zvolumes (or zvols): virtual block devices that behave like regular physical storage

Both datasets and zvols are governed by a series of attributes. Those attributes enable features such as data encryption, compression, deduplication and usage quotas. Furthermore, it is possible to take regular snapshots of datasets and zvols. Those snapshots can then be browsed or even rolled back to restore a previous state of the pool.
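
As a quick illustration (the pool and dataset names here are only placeholders), attributes are read and changed with the zfs get and zfs set commands:

  • zfs set compression=lz4 tank/data: enables LZ4 compression on the tank/data dataset
  • zfs get compression tank/data: shows the current value of the compression attribute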

vdevs and RAID-Z

Every zpool is made by striping together one or several vdevs. ZFS offers several options to create vdevs, each with different levels of redundancy. The two simplest options to create a vdev are:

  • stripe: data is striped across every drive of the vdev. Very fast, but doesn’t provide any protection against hardware failure.
  • mirror: a copy of the data is written to every drive of the vdev. Safe against hardware failure, but only provides the capacity of a single drive.

ZFS also provides RAID-like topologies, called RAID-Z, to configure vdevs. Those are:

  • RAID-Z1: single parity is distributed across the vdev, allowing the loss of one physical drive. Total capacity equals the sum of the capacity of all drives minus the capacity of one drive. Similar to standard RAID-5.
  • RAID-Z2: same as RAID-Z1 but with double parity. Total capacity equals the sum of the capacity of all drives minus the capacity of two drives. Can tolerate the loss of up to two drives. Similar to standard RAID-6.
  • RAID-Z3: same as RAID-Z1 but with triple parity. Total capacity equals the sum of the capacity of all drives minus the capacity of three drives. Can tolerate the loss of up to three drives.
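
To make the capacity math concrete (a purely illustrative example): a RAID-Z2 vdev built from six 4 TB drives yields roughly (6 − 2) × 4 TB = 16 TB of usable space, and the vdev survives the simultaneous failure of any two of those drives.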

Setup

Hardware requirements

When it comes to buying or building a system for use with ZFS, it is important to remember that:

  • ZFS needs direct access to the physical drives present in your system, which makes hardware RAID cards unsuitable for use with ZFS. Also make sure that your disk controller is operating in JBOD mode (on that note, some controllers may need to be reflashed with a specific firmware in order to do JBOD).
  • Try to steer away from SATA port multipliers or chipset SATA controllers, as they tend to introduce bottlenecks. SATA/SAS HBAs are ideal for use with ZFS.
  • If you’re using mechanical hard drives for your zpool, check whether your drives use Conventional (CMR) or Shingled (SMR) Magnetic Recording. SMR drives are slower and can introduce major bottlenecks. Always use CMR drives with ZFS.
  • If you’re using solid state drives, make sure that your drives have a DRAM cache. DRAM-less drives are slower to the point that they can even stop responding during intense operations. Also, if you don’t have a UPS, consider using power loss protected SSDs.

Installation

The most commonly used ZFS implementation is OpenZFS, available for both Linux and FreeBSD systems. Due to licensing incompatibilities, Linux doesn’t have native ZFS support (FreeBSD does), meaning that Linux users need to install ZFS manually. On Linux, ZFS is shipped either as a DKMS module or as a kABI-tracking kmod (for RHEL-based distros). For this reason it is important to choose a distro with a stable kernel (like Debian, for example), as ZFS can break on kernel updates.
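
As a rough illustration (the package names assume Debian with the contrib repository enabled; other distros differ, so check your distro’s documentation), a DKMS-based install looks something like:

  • apt install linux-headers-amd64: installs the kernel headers required to build the ZFS module
  • apt install zfs-dkms zfsutils-linux: builds the kernel module via DKMS and installs the zpool and zfs userland tools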

Basic usage

Creating a zpool

To create and manage a ZFS pool, you’ll only need the zfs and zpool tools. The zpool create command creates a new zpool, along with a root dataset of the same name; by default, the pool is mounted at /<pool-name> under the system root. Here are some examples:

  • zpool create zfs_test /dev/sda: creates a zpool composed of a single vdev containing only one physical device.
  • zpool create zfs_test mirror /dev/sda /dev/sdb: creates a zpool composed of a single vdev containing two physical devices in mirror.
  • zpool create zfs_test raidz /dev/sda /dev/sdb /dev/sdc: creates a zpool composed of a single vdev containing three physical devices in RAID-Z1.
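
Since a zpool stripes data across all of its vdevs, several vdev groups can also be passed in a single command. For instance (a sketch, the device names are placeholders):

  • zpool create zfs_test mirror /dev/sda /dev/sdb mirror /dev/sdc /dev/sdd: creates a zpool that stripes data across two mirrored vdevs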

You can use the zpool list command to list the currently known zpools along with some usage statistics. The zpool status command gives you real-time information on the condition of the zpools and the health of their vdevs. If you encounter a hardware failure and the pool has adequate redundancy, it will continue to operate in a degraded state. Once the failing drive(s) have been replaced, ZFS will automatically begin restoring the pool to its original condition. This process is known as resilvering.
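
For example, assuming /dev/sdb has failed and /dev/sde is its replacement (device names are placeholders):

  • zpool replace zfs_test /dev/sdb /dev/sde: swaps the failed drive for the new one and starts a resilver
  • zpool status zfs_test: shows the progress of the resilver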

Creating datasets and zvols

You can create both datasets and zvols using the zfs create command. Here are a couple of examples:

  • zfs create zfs_test/mydata: creates the dataset mydata inside the zfs_test zpool
  • zfs create -V 5gb zfs_test/vol: creates a 5 GB zvol named vol inside the zfs_test zpool

To destroy a dataset/zvol, you can use the zfs destroy command.
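
For example, assuming the mydata dataset created above:

  • zfs destroy zfs_test/mydata: permanently removes the mydata dataset and all of its contents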

⚠️ No confirmation prompt appears with the destroy subcommand. Use it with extreme caution.

Snapshots and scrubbing

The zfs snapshot command takes a snapshot of a dataset or zvol in its current state. Snapshots are named in the dataset-path@snapshot-name format. To take a snapshot, run:

  • zfs snapshot zfs_test/mydata@snap: takes snapshot snap of the mydata dataset

To list all snapshots present on your zpools, run zfs list -t snapshot.
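
Rolling back to a snapshot is done with the zfs rollback command; this discards every change made after the snapshot was taken, so treat it with the same caution as destroy. For example:

  • zfs rollback zfs_test/mydata@snap: restores the mydata dataset to the state captured in the snap snapshot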

Another common ZFS operation is scrubbing, which consists of checking the integrity of all data on the pool and repairing any corrupted data it finds using the pool’s redundancy. To manually scrub a pool, run zpool scrub <pool-name>.

⚠️ This might take some time and is quite I/O intensive. Schedule your scrubs wisely.
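
For instance, a monthly scrub could be scheduled off-hours with a cron entry like the following (a sketch only; it assumes root’s crontab, a pool named zfs_test and zpool installed in /sbin, and note that many distro packages already ship a periodic scrub job):

  • 0 3 1 * * /sbin/zpool scrub zfs_test: starts a scrub at 03:00 on the first day of every month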

Conclusion

This is just a basic overview of ZFS features and capabilities. Please refer to the official documentation for more information on ZFS, and be sure to have a good understanding of ZFS before pouring important data into it. Happy hacking ✌️