by Margaret Bierman with Lenz Grimmer
Published August 2012
In my last article, "How I Got Started with the Btrfs File System for Oracle Linux," I provided an overview of the file system and illustrated how to start using its features. In this article, I continue the exploration and delve into some of the interesting—and sometimes less obvious—features of Btrfs. While Btrfs includes a number of advanced capabilities, this article focuses on those that can be used easily and have immediate benefit to users, such as redundant configurations, data integrity options, compression, snapshots, and performance enhancements.
My research and the examples provided throughout this article are based on the version of Btrfs available in Oracle Linux 6 with the Unbreakable Enterprise Kernel Release 2 (Version 2.6.39).
Before diving into some of the advanced capabilities of Btrfs, let's review the basic creation and deletion mechanisms. The example in Listing 1 creates a file system on the device /dev/sdb and mounts it on /mnt.
    /* Create a Btrfs file system on a device using default options */
    # mkfs.btrfs /dev/sdb
    adding device /dev/sdb id 2
    fs created label (null) on /dev/sdb
        nodesize 4096 leafsize 4096 sectorsize 4096 size 10.00GB
    Btrfs Btrfs v0.19

    /* Mount the newly created file system */
    # mount /dev/sdb /mnt
Listing 1. Creating and Mounting a File System
To copy files to and delete files from the newly created file system, use the cp and rm commands, as shown in Listing 2.
    /* Create a subvolume named MYFILES */
    # cd /mnt
    # btrfs subvolume create MYFILES

    /* Copy files to the new subvolume */
    # cp myfile* /mnt/MYFILES

    /* List the files */
    # ls /mnt/MYFILES
    myfile1 myfile2 myfile3

    /* Delete myfile2 */
    # rm /mnt/MYFILES/myfile2

    /* List the files */
    # ls /mnt/MYFILES
    myfile1 myfile3
Listing 2. Copying and Deleting Files
It is important to note that recursively removing files with the rm -rf command can be very time consuming, particularly if millions of files reside on scores of disk drives. Instead, use the btrfs subvolume delete command, which only needs to "walk" the metadata structures to execute the deletion.
    /* Remove subvolume MYFILES */
    # btrfs subvolume delete MYFILES
Note that the default subvolume cannot be deleted, because that would result in the destruction of the file system. The only clean way to destroy the default subvolume is to rerun the mkfs.btrfs command, which would destroy existing data. As a result, it is important to think ahead when creating the initial Btrfs file system and default subvolume.
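A small wrapper can make the choice between the two deletion methods explicit. The sketch below is a hypothetical helper named remove_tree; it is not part of the Btrfs tools. It assumes GNU stat and relies on the convention that the top-level directory of a Btrfs subvolume has inode number 256. The DRY_RUN variable is also an assumption of this sketch: when set, the helper prints the command it would run instead of running it.

```shell
# Hypothetical helper (not part of btrfs-progs): remove_tree PATH
# Uses "btrfs subvolume delete" when PATH looks like a Btrfs subvolume,
# and falls back to "rm -rf" for ordinary directories.
remove_tree() {
    target=$1
    fstype=$(stat -f -c %T -- "$target")   # file system type, e.g. "btrfs" (GNU stat)
    inode=$(stat -c %i -- "$target")       # inode number of the directory itself

    # Assumption: a subvolume's top-level directory has inode number 256.
    if [ "$fstype" = "btrfs" ] && [ "$inode" -eq 256 ]; then
        cmd="btrfs subvolume delete"
    else
        cmd="rm -rf --"
    fi

    if [ -n "${DRY_RUN:-}" ]; then
        echo "$cmd $target"                # print only; nothing is deleted
    else
        $cmd "$target"
    fi
}

# Example (dry run):
#   DRY_RUN=1 remove_tree /mnt/MYFILES
```

Running the helper with DRY_RUN set first is a cheap way to confirm which path it would take before anything is removed.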
With Btrfs, you no longer need to use mdadm to create mirrored volumes or complex RAID configurations. These capabilities are built into the file system. To start, a Btrfs file system can be created on one or more devices. Additional disk drives can be added at any time to expand capacity, and they do not need to be the same size or have similar geometry. However, performance can be impacted if the drives have radically different performance characteristics.
By default, Btrfs mirrors metadata across two devices and stripes data across all devices underlying the file system. If only one device is in use, metadata is duplicated on that device and is commingled with the data store. Different RAID modes are supported for data and metadata, even on the same disks.
Continuing with the previous example, the command in Listing 3 adds device /dev/sdc to the Btrfs file system. Until new files are stored, or a rebalance is triggered, data is not migrated to the new device.
    /* Add a new device to the existing Btrfs file system */
    # btrfs device add /dev/sdc /mnt/btrfs

    /* Verify the addition of the device to the file system */
    # btrfs filesystem show
    Label: none uuid: b4f5c9a8-d8ec-4a5b-84f0-2b8c8d18b257
        Total devices 2 FS bytes used 200.33MB
        devid 1 size 5.00GB used 5.00GB path /dev/sdb
        devid 2 size 5.00GB used 4.98GB path /dev/sdc
Listing 3. Adding a New Device to the File System
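When scripting checks around device additions, the per-device lines in the btrfs filesystem show output can be reduced to a compact table. The following is a minimal sketch, assuming the output layout shown in Listing 3 (devid, size, used, and path fields in that order); list_devices is a hypothetical helper name, and other versions of the tools may format the output differently.

```shell
# Hypothetical helper: reads "btrfs filesystem show" output on stdin and
# prints one line per device: <path> <size> <used>.
# Assumes lines of the form:
#   devid 1 size 5.00GB used 5.00GB path /dev/sdb
list_devices() {
    awk '$1 == "devid" { print $8, $4, $6 }'
}

# Example:
#   btrfs filesystem show | list_devices
```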
I discovered that Btrfs uses the term RAID differently than drive controller RAID implementations do. While Btrfs and traditional RAID are similar in concept, Btrfs redundancy is implemented at the chunk level rather than at the drive level. For example, traditional hardware and software RAID implementations aggregate two or more disks into a RAID1 or RAID5 logical volume and stripe data across all disks. In Btrfs, portions of disks, called chunks, can be used to create RAID logical volumes.
In Btrfs, chunks are at least 256 MB and can be mirrored or striped across multiple devices. Chunk and device trees link device items to underlying physical chunks as chunk map items. Every chunk and device is assigned a universally unique identifier (UUID). Data and metadata can be stored with different RAID levels to maximize availability. Critical information, such as metadata and device and extent trees, is always mirrored to improve the survivability of data.
You can control metadata and data RAID levels when creating the file system. For example, the commands in Listing 4 can be used to create a RAID1 (mirrored) configuration and a RAID10 (striped and mirrored) configuration. The metadata and data profiles do not need to match. In fact, the access patterns and integrity requirements of metadata and data tend to differ enough that the profiles should be different.
    /* Example that creates a RAID1 mirror for both data and metadata */
    # mkfs.btrfs -m raid1 -d raid1 /dev/sdb /dev/sdc
    WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL
    WARNING! - see http://btrfs.wiki.kernel.org before using
    adding device /dev/sdc id 2
    fs created label (null) on /dev/sdb
        nodesize 4096 leafsize 4096 sectorsize 4096 size 10.00GB
    Btrfs Btrfs v0.19

    /* Example that creates a RAID10 striped mirror */
    # mkfs.btrfs -m raid10 -d raid10 /dev/sdd /dev/sde
    WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL
    WARNING! - see http://btrfs.wiki.kernel.org before using
    adding device /dev/sde id 2
    fs created label (null) on /dev/sdd
        nodesize 4096 leafsize 4096 sectorsize 4096 size 10.00GB
    Btrfs Btrfs v0.19

    /* Example that mixes RAID levels for data and metadata */
    /* Mirror the metadata, and stripe and mirror user data */
    # mkfs.btrfs -m raid1 -d raid10 /dev/sdf /dev/sdg /dev/sdh /dev/sdi
Listing 4. Creating a RAID1 and RAID10 Configuration
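The choice of data profile directly affects how much space is left for user data. As a rough illustration, the sketch below estimates usable data capacity for a few profiles. It assumes equally sized devices and ignores metadata and chunk allocation overhead, so the numbers are an upper bound rather than what the file system will report; usable_capacity is a hypothetical helper name.

```shell
# Hypothetical helper: usable_capacity PROFILE DEVICES SIZE_GB
# Rough upper bound on usable data capacity, assuming equally sized
# devices and ignoring metadata and allocation overhead.
usable_capacity() {
    profile=$1
    devices=$2
    size_gb=$3
    total=$((devices * size_gb))

    case $profile in
        single|raid0)
            # One copy of every block
            echo "$total" ;;
        raid1|raid10)
            # Two copies of every block, so half the raw space
            echo $((total / 2)) ;;
        *)
            echo "unknown profile: $profile" >&2
            return 1 ;;
    esac
}

# Example: four 5 GB devices with RAID10 data
#   usable_capacity raid10 4 5
```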
Btrfs includes a number of built-in data integrity mechanisms:
    /* Initiate a check of the file system */
    # btrfs scrub start /mnt/MYFILES
For encryption, Btrfs can be used on top of the dm_crypt disk encryption subsystem and Linux Unified Key Setup (LUKS) layer, which supports a variety of encryption standards. However, this approach disables some of the capabilities and advantages of using Btrfs on raw block devices, such as automatic solid-state disk support and detection.
Btrfs offers compression functionality designed to optimize storage capacity utilization. Compression is supported on a per-mount basis and can be enabled after the subvolume is created. Only files created after the file system is mounted with the compression option are compressed. Once enabled, Btrfs automatically tries to compress files using Lempel-Ziv-Oberhumer (LZO) or zlib compression. (Other compression algorithms, such as Snappy and LZ4, are in development.) If a file does not compress well, it is marked as not compressible and written to disk uncompressed. In this case, Btrfs does not make additional compression attempts. A compress-force mount option is available in case newly added file content can be compressed.
The following command illustrates how to enable compression for a file system at mount time:
    /* Compression can be set at the file system level by mounting */
    /* the file system with compression enabled */
    # mount -o compress=lzo /dev/sdb /mnt/MYFILES
The subvol mount option can be used to enable compression on a subvolume. The following commands create a subvolume and mount it with compression enabled:
    /* Create a subvolume named mysubvol */
    # btrfs subvolume create /mnt/MYFILES/mysubvol

    /* Mount the subvolume and enable compression */
    # mount -o compress=lzo,subvol=mysubvol /dev/sdb /mnt/MYSUBVOL
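Before enabling compression, it can help to get a feel for how compressible a data set is. The sketch below is one way to estimate that from user space, using gzip as a stand-in for the zlib option. This is an assumption made for illustration: Btrfs compresses data in its own units inside the file system, so the actual savings will differ. The helper name compress_ratio is hypothetical.

```shell
# Hypothetical helper: compress_ratio FILE
# Prints the gzip-compressed size as a percentage of the original size.
# A value near 100 suggests the file is unlikely to benefit from compression.
compress_ratio() {
    file=$1
    orig=$(( $(wc -c < "$file") ))         # original size in bytes
    if [ "$orig" -eq 0 ]; then
        echo 100                            # empty file: nothing to gain
        return 0
    fi
    comp=$(( $(gzip -c -- "$file" | wc -c) ))   # compressed size in bytes
    echo $((comp * 100 / orig))
}

# Example:
#   compress_ratio /mnt/MYFILES/myfile1
```

Files that are already compressed, such as most image and archive formats, typically report values close to 100 and are the kind of content Btrfs marks as not compressible.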
The copy-on-write nature of Btrfs makes it easy for the file system to provide several features that facilitate the replication, migration, backup, and restoration of information.
    /* Copy two files to the MYFILES subvolume */
    # cp myfile* /mnt/MYFILES

    /* List the contents of the source subvolume */
    # ls /mnt/MYFILES
    myfile1 myfile2

    /* Create a snapshot of the MYFILES subvolume and put it in /mnt/SNAPSHOT */
    # btrfs subvolume snapshot /mnt/MYFILES /mnt/SNAPSHOT

    /* List the contents of the snapshot subvolume */
    # ls /mnt/SNAPSHOT
    myfile1 myfile2
Listing 5. Creating a Snapshot of a Subvolume
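Because snapshots are cheap to create, they are often taken on a schedule and named by date. The following sketch wraps the snapshot command from Listing 5 in a hypothetical take_snapshot helper that appends a timestamp to the snapshot name. The directory /mnt/SNAPSHOTS and the DRY_RUN variable are assumptions of this example; the destination must reside on the same Btrfs file system as the source subvolume.

```shell
# Hypothetical helper: snapshot_path DESTDIR SOURCE STAMP
# Builds the snapshot name, e.g. /mnt/SNAPSHOTS/MYFILES-20120801
snapshot_path() {
    echo "$1/$(basename "$2")-$3"
}

# Hypothetical helper: take_snapshot SOURCE DESTDIR [STAMP]
# Creates a snapshot of SOURCE under DESTDIR; STAMP defaults to the
# current date and time. With DRY_RUN set, only prints the command.
take_snapshot() {
    src=$1
    destdir=$2
    stamp=${3:-$(date +%Y%m%d-%H%M%S)}
    dest=$(snapshot_path "$destdir" "$src" "$stamp")

    if [ -n "${DRY_RUN:-}" ]; then
        echo "btrfs subvolume snapshot $src $dest"
    else
        btrfs subvolume snapshot "$src" "$dest"
    fi
}

# Example (dry run):
#   DRY_RUN=1 take_snapshot /mnt/MYFILES /mnt/SNAPSHOTS
```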
Individual files can be cloned with the cp --reflink command. Clones are lightweight copies: only an inode is created, and it shares the same disk blocks as the original file. The following example clones the file myfile1, naming the cloned version myfile3.
    /* Clone the file named myfile1, creating the clone myfile3 */
    # cp --reflink /mnt/MYFILES/myfile1 /mnt/MYFILES/myfile3

    /* List the contents of the source subvolume */
    # ls /mnt/MYFILES
    myfile1 myfile2 myfile3
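Scripts that must also run on file systems without clone support can ask cp to fall back to an ordinary copy. The sketch below assumes GNU cp, whose --reflink=auto mode attempts a clone and silently performs a regular copy if cloning is not possible; clone_file is a hypothetical helper name. Either way, the clone and the original behave as independent files once one of them is modified.

```shell
# Hypothetical helper: clone_file SOURCE DEST
# Tries a lightweight clone and falls back to a regular copy
# when the file system cannot share blocks (GNU cp behavior).
clone_file() {
    cp --reflink=auto -- "$1" "$2"
}

# Example:
#   clone_file /mnt/MYFILES/myfile1 /mnt/MYFILES/myfile3
```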
Btrfs also provides a command (btrfs subvolume find-new) that identifies which files have changed on a given subvolume. I find this feature faster than traversing the entire file system with the find -mtime command to locate changed files. Obviously, you can use commercial backup applications, simply using a snapshot as the source. For other ideas of how to go about backing up Btrfs file systems, see "Do-It-Yourself Backup System Using Rsync and Btrfs" and "Incremental Backups with Btrfs."
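To feed the result into a backup tool, the find-new output usually needs to be reduced to a list of file names. The sketch below assumes that each changed extent is reported on a line of the form "inode N file offset N len N disk start N offset N gen N flags FLAGS name", followed by a final "transid marker" line. This layout is an assumption about the tool's output and may vary between versions; changed_files is a hypothetical helper name.

```shell
# Hypothetical helper: reads "btrfs subvolume find-new" output on stdin
# and prints each changed file name once.
# Assumes extent lines of the form:
#   inode 257 file offset 0 len 4096 disk start 12582912 offset 0 gen 8 flags NONE myfile1
changed_files() {
    sed -n 's/^inode [0-9]* file offset [0-9]* len [0-9]* disk start [0-9]* offset [0-9]* gen [0-9]* flags [A-Z|]* //p' |
        LC_ALL=C sort -u
}

# Example: list files changed since generation 8 (generation number is illustrative)
#   btrfs subvolume find-new /mnt/MYFILES 8 | changed_files
```

A file with several changed extents appears on several input lines, which is why the names are de-duplicated before being printed.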
I find that there are many different backup styles and disciplines that match different customer use cases. When Btrfs snapshots are available, all the methods work better since you do not have to quiesce active operations to the files being backed up.
An advanced use of this facility is the conversion of the root (/) file system to Btrfs. Using file system conversion together with the yum-plugin-fs-snapshot facility permits rollbacks, such as undoing a software installation.
These days, no matter what your type of work, system performance matters. Btrfs provides functionality and device support designed to improve file system performance characteristics.
Flash memory is low-cost, nonvolatile computer memory that can be electrically erased and reprogrammed. Most of us use Flash technology on a regular basis in the form of memory cards we put in our digital cameras and the removable USB drives we use to back up and transport data from one machine to another. In the enterprise, Flash technology is used in solid-state disk drives (SSDs) to increase application performance. Wear-leveling is performed in the hardware to foster data integrity.
Btrfs is SSD-aware and exploits TRIM/Discard to allow the file system to report unused blocks to the storage device for reuse. On SSDs, Btrfs avoids unnecessary seek optimization and aggressively sends writes in clusters, even if they are from unrelated files. This results in larger write operations and faster write throughput, albeit at the expense of more seeks later. This article has some dated, but still very meaningful performance examples.
Over the years, I have noticed that file systems that experience a great deal of churn that fragments available capacity tend to deliver lower performance. Btrfs provides a mount option (-o autodefrag) that enables an auto-defragmentation helper. When a block is copied and written to disk, the auto-defragmentation helper marks that portion of the file for defragmentation and hands it off to another thread, enabling fragmentation to be reduced automatically in the background. This capability can provide significant benefit to small database workloads, browser caches, and similar workloads. The great thing is that defragmentation can take place while the file system is mounted and actively performing operations.
The following command shows how to initiate file system defragmentation for the Btrfs file system.
    /* Initiate a defragmentation operation on the mounted Btrfs file system */
    # btrfs filesystem defrag /mnt
While it is a young file system, Btrfs has matured at a fast pace. Today, it has a wide range of built-in capabilities—redundant configurations, data integrity options, compression, snapshots, and performance enhancements—that elevate it to an enterprise-class file system. If you use Oracle Linux, Btrfs is a natural choice for deploying high-performance, robust platforms.
Lenz Grimmer is a member of the Oracle Linux product management team. He has been involved in Linux and Open Source Software since 1995.
Margaret Bierman is a senior writer and trainer specializing in the research and development of technical marketing collateral for high-tech companies. Prior to writing, she worked as a software engineer on optical storage systems, specializing in the development of cross-platform file systems, hierarchical storage management systems, device drivers, and controller firmware. Margaret was also heavily involved in related standards committees, as well as training ISVs and helping them implement solutions. She received a B.S. in computational mathematics from Rensselaer Polytechnic Institute.