
How I Got Started with the Btrfs File System for Oracle Linux

by Margaret Bierman with Lenz Grimmer

What Margaret Bierman discovered about the Btrfs file system in Oracle Linux, including an introduction to its basic administrative tasks and commands.


Published July 2012

Introduction

This article describes the basic capabilities that I discovered while becoming familiar with the Btrfs file system in Oracle Linux, plus the instructions I used to create a file system, verify its size, create subdirectories, and perform other basic administrative tasks.  A second article describes how I use the advanced capabilities of the Btrfs file system.
 

About the Btrfs File System

The Oracle Linux operating system provides advanced methods for storing and organizing data on disk storage systems, such as the ext3, ext4, and XFS file systems, the Oracle Cluster File System 2 (OCFS2) clustered file system, and the next-generation Btrfs file system. Each has its own characteristics and feature sets, allowing administrators to select the one that best fits data storage needs and requirements.


The Btrfs file system provides the following advanced capabilities:

  • Supports large files and file systems
  • Offers integrated volume management
  • Has built-in RAID functionality
  • Keeps data secure using copy-on-write and checksumming techniques
  • Provides writable snapshots

Our research is based on the version of Btrfs available in Oracle Linux 6 with the Unbreakable Enterprise Kernel Release 2 (Version 2.6.39).

File System Design Goals and History

Created by Chris Mason at Oracle, the initial design for Btrfs has its roots in a presentation by Ohad Rodeh about copy-on-write friendly B-tree implementations at the USENIX FAST '07 conference. Mason based the Btrfs design on his experience developing the ReiserFS file system (extent-based storage, packing of small files) and on the idea of storing data and metadata in B-tree structures. After several months of internal development, Btrfs was presented to the Linux community in June 2007. Since then, Oracle engineers have continued to maintain and advance its development. They work in close collaboration with many contributors from the Linux community, including engineers from Linux distributors, such as Red Hat and SUSE, and other companies, such as Dreamhost, Fujitsu, HP, IBM, and Intel. Today, Btrfs is included in the mainline Linux kernel and is gaining popularity through several Linux distributions, including Oracle Linux.

Getting to Know Btrfs

When researching Btrfs, I discovered it has a wealth of functionality built into it.

  • Scalability and volume management. First and foremost, Btrfs is a scalable, 64-bit file system that can span large volumes to provide files and file systems as large as 16 exabytes. Included is functionality to manage multiple underlying storage devices. This functionality is similar to that traditionally provided by logical volume management tools. For example, Btrfs allows a file system to span multiple devices and present a single logical address space. Unlike most file systems, Btrfs even makes it easy to shrink the size of a single logical volume. In addition, devices can be added or removed while file systems remain online. When a device is removed, the extents stored on it are redistributed to other devices in the file system. Because these features are built into the file system, we think the file system has better insight into the underlying storage and can optimize access patterns and data distribution.
  • Write methodology and access. Btrfs utilizes a B-tree structure to store data types and point to information stored on disk. Unlike other file systems, it does not journal transactions. As a result, writes are performed once, removing the limitations that result from journal size and reducing wear caused by repetitive writing of the same section of hard disk or solid-state disk (SSD). A copy-on-write technique ensures blocks and extents are not overwritten in place. They always are copied to a new location first. Extended attributes and POSIX-compliant Access Control Lists (ACLs) limit the access and manipulation of file system contents by users and applications.
  • Tunables. Btrfs exposes only a few tunables, which guards against misuse. One interesting option is the -o autodefrag mount option, which enables auto-defragmentation. Another is the ability to disable copy-on-write for data via the nodatacow mount option, which can help to minimize fragmentation, particularly for files with sequential access requirements, such as databases and streaming media. In this mode, file blocks are overwritten in place, similar to traditional file systems.
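
The volume management operations and the copy-on-write tunable described in the list above can be sketched as follows. This is a minimal sketch rather than part of the original walkthrough: the device /dev/sdd and the mount point /mnt are assumptions, and run is a hypothetical dry-run wrapper that only prints each command unless DRY_RUN=0 is set. Check the subcommand and option names against the btrfs-progs version on your system before running anything for real.

```shell
# Hypothetical dry-run wrapper: prints the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Grow, shrink, and reduce a mounted file system while it stays online.
# /dev/sdd and /mnt are assumed names, not taken from the article.
volume_demo() {
    run btrfs device add /dev/sdd /mnt       # add a device to the mounted file system
    run btrfs filesystem resize -1g /mnt     # shrink the file system by 1 GB
    run btrfs device delete /dev/sdd /mnt    # remove it; extents move to the other devices
}

# Mount with copy-on-write disabled for data (mount option: nodatacow).
nocow_mount() {   # $1 = device, $2 = mount point
    run mount -o nodatacow "$1" "$2"
}

volume_demo
nocow_mount /dev/sdb /mnt
```

With the default dry-run setting, the script only lists the commands it would issue, which makes it safe to review before setting DRY_RUN=0 as root.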

Data Integrity

Btrfs uses a number of built-in features to ensure data integrity.

  • Redundant configurations. Btrfs supports device mirroring and RAID configurations to improve data survivability and ease data reconstruction. By default, Btrfs mirrors metadata across two devices and stripes data across all devices underlying the file system. Even on a single device, metadata is duplicated and maintained in two locations for redundancy.
  • Checksums. Btrfs generates checksums for data and metadata blocks to preserve the integrity of data against corruption. Checksums are verified each time a data block is read from disk. If the file system detects a checksum mismatch while reading a block, it first tries to obtain (or create) a good copy of this block from another device—if mirroring or RAID techniques are in use. If a good copy is found, it is returned instead and the bad block is corrected.
  • Fault isolation and checksum algorithms. Btrfs provides fault isolation by storing metadata separately from user data, and it provides additional protection through CRCs. The CRCs are stored in a B-tree that is separate from the data to provide fault isolation.
  • Rebuild times. As aptly noted by Mason, Btrfs rebuilds involve only the blocks actively used by the file system. As drive capacities increase, this is a considerable advantage over traditional file system and RAID protection mechanisms. In traditional approaches, the time to rebuild high-capacity drives can be measured in days, during which time there is no protection.
  • Encryption. Btrfs does not provide built-in encryption functionality yet. An encrypted Btrfs file system can be created on top of the dm_crypt disk encryption subsystem and Linux Unified Key Setup (LUKS) layer, which support a variety of encryption standards. However, this approach disables some of the capabilities and advantages of using Btrfs on raw block devices, such as automatic solid-state disk support and detection.
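
As a rough illustration of the integrity features listed above, the following sketch shows how a fully mirrored file system, a checksum scrub, and a LUKS-backed file system might be set up. It is not taken from the original walkthrough. The device names, the mount point, and the mapping name cryptvol are assumptions. The run helper is a hypothetical dry-run wrapper. The scrub subcommand may not be available in every btrfs-progs release, so treat it as an assumption for the version discussed here.

```shell
# Hypothetical dry-run wrapper: prints the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Create a two-device file system with both metadata and data mirrored.
mirrored_mkfs() {   # $1, $2 = member devices
    run mkfs.btrfs -m raid1 -d raid1 "$1" "$2"
}

# Ask the file system to read its blocks and verify their checksums.
start_scrub() {     # $1 = mount point
    run btrfs scrub start "$1"
}

# Layer Btrfs on top of a LUKS volume. "cryptvol" is a made-up mapping name.
encrypted_mkfs() {  # $1 = raw block device
    run cryptsetup luksFormat "$1"
    run cryptsetup luksOpen "$1" cryptvol
    run mkfs.btrfs /dev/mapper/cryptvol
}

mirrored_mkfs /dev/sdb /dev/sdc
start_scrub /mnt
encrypted_mkfs /dev/sdd
```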

Space Savings

Btrfs supports compression on a per-mount basis. It can be enabled at any time after the subvolume is created. Once enabled, Btrfs automatically tries to compress files using LZO or zlib compression. (Other compression algorithms, such as Snappy and LZ4, are in development.) If a file does not compress well, it is marked as not compressible and written to disk uncompressed. In this case, Btrfs does not make additional compression attempts. A compress-force mount option is available that tries to compress new writes anyway, in case newly added file content can be compressed.
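
A small sketch of how the compression mount options might be assembled is shown below. The helper name compress_opt is hypothetical, and /dev/sdb and /mnt are assumed names. The script only prints the resulting mount commands rather than running them.

```shell
# Build a Btrfs compression mount option.
#   $1 = algorithm (lzo or zlib)
#   $2 = "force" to select compress-force instead of compress
compress_opt() {
    case "$1" in
        lzo|zlib) ;;
        *) echo "unsupported algorithm: $1" >&2; return 1 ;;
    esac
    if [ "${2:-}" = "force" ]; then
        echo "compress-force=$1"
    else
        echo "compress=$1"
    fi
}

# Print (do not run) the resulting mount commands.
echo "mount -o $(compress_opt lzo) /dev/sdb /mnt"
echo "mount -o $(compress_opt zlib force) /dev/sdb /mnt"
```

Because the option is applied at mount time, only data written after mounting with the option is affected by it.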

Performance Enhancements

Btrfs provides functionality and device support designed to improve file system performance characteristics.

  • Solid-state disk support. Flash memory, such as the memory cards we put in our digital cameras and the removable USB drives we use to back up and transport data from one machine to another, is low-cost, nonvolatile computer memory that can be electrically erased and reprogrammed. In the enterprise, Flash technology is used in solid-state disk drives (SSDs) to increase application performance. Btrfs is SSD-aware, avoids unnecessary seek optimization, and aggressively sends writes in clusters, even if they are from unrelated files. This results in larger write operations and faster write throughput.
  • Online defragmentation. Over the years, we have noticed that file systems that experience heavy churn, which fragments available capacity, tend to deliver lower performance. Btrfs provides a mount option (-o autodefrag) that enables an auto-defragmentation helper. When a block is copied and written to disk, the auto-defragmentation helper marks that portion of the file for defragmentation and hands it off to another thread, enabling fragmentation to be reduced automatically in the background. This capability can provide significant benefit to small database workloads, browser caches, and similar workloads.
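
The following sketch shows one way the SSD and defragmentation options might be combined. It assumes the Linux sysfs "rotational" flag as the SSD indicator. The helper name mount_opts_for, the device name sdb, and the file path are all hypothetical. Since Btrfs detects SSDs on raw block devices by itself, passing the ssd option explicitly is shown here only to make the behavior visible.

```shell
# Choose mount options from a device's "rotational" flag
# (0 = non-rotational, such as an SSD; anything else = spinning disk).
mount_opts_for() {   # $1 = contents of the rotational flag
    if [ "$1" = "0" ]; then
        echo "ssd,autodefrag"
    else
        echo "autodefrag"
    fi
}

# The kernel exposes the flag in sysfs; fall back to 1 if it cannot be read.
dev=sdb   # assumed device name
flag=$(cat "/sys/block/$dev/queue/rotational" 2>/dev/null || echo 1)
echo "mount -o $(mount_opts_for "$flag") /dev/$dev /mnt"

# One-off defragmentation of a single file (hypothetical path), printed only.
echo "btrfs filesystem defragment /mnt/data/file.db"
```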

Subvolumes, Snapshots, and Seed Devices

The copy-on-write nature of Btrfs makes it easy for the file system to provide several features that facilitate the replication, migration, backup, and restoration of information.

  • Subvolumes. The linchpin of Btrfs, subvolumes are essentially named B-trees that hold files and directories. Subvolumes can optionally have quotas and are mounted as if they were disks.
  • Snapshots. In Btrfs, a snapshot starts as a copy of a subvolume taken at a given point in time. In essence, it is a clone of the subvolume. When left unchanged, a snapshot faithfully records the state of the subvolume at the time it was taken. Because snapshots are writable, they can be used as evolving clones of the original subvolume. You can create snapshots almost instantly, and initially they consume virtually no additional disk space. (The modest exception is a small amount of additional metadata.) This capability is useful when it is important to keep copies of older versions of a file hierarchy or move them to other systems for backup or restore operations. Individual files can be cloned using the cp --reflink command, which does for files what snapshots do for volumes.
  • Seed devices. Btrfs seed devices provide a read-only foundation to which multiple read/write file systems can point. All local updates go to these descendants. When the bulk of the data remains unchanged from the original seed device, there is considerable space savings. This can be considered another form of cloning.
  • Backup and restore. Btrfs does not provide built-in support for creating backups. A best practice is to create a snapshot of a volume and use traditional backup utilities to copy data off the file system. To help, a Btrfs feature is available (btrfs subvolume find-new) that identifies the files that have changed on a given subvolume. We find using this feature to be faster than traversing the entire file system with the find -mtime command to locate changed files.
  • Ext file system conversion. Btrfs supports the in-place conversion of existing ext3 and ext4 file systems. The original ext3 or ext4 file system metadata is kept in a snapshot, so the conversion can be reversed if necessary. Obviously, if a converted file system is modified heavily or over a protracted period of time, the ability to go back could have limited practical value. If the file system is reverted after only a short time, this can be a very useful feature. Once you have determined that you do not intend to revert, deleting the snapshot frees disk space.
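
The features in the list above map onto a handful of commands, sketched below. This is an illustration under stated assumptions, not the procedure used in this article: all device names and the mount point /srv/foo are hypothetical, the seed sequence follows the commonly documented order (flag the device, mount it, add a writable device, remount read/write), and run is a hypothetical dry-run wrapper that prints commands unless DRY_RUN=0 is set.

```shell
# Hypothetical dry-run wrapper: prints the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else printf '%s\n' "$*"; fi
}

subvolume_demo() {
    run btrfs subvolume list /mnt                      # list subvolumes and snapshots
    run mount -o subvol=subbasefoo /dev/sdb /srv/foo   # mount one subvolume by name
    run btrfs subvolume find-new /mnt/subbasefoo 0     # files changed since generation 0
}

seed_demo() {
    run btrfstune -S 1 /dev/sdb          # flag the unmounted device as a seed
    run mount /dev/sdb /mnt              # a seed device mounts read-only
    run btrfs device add /dev/sdd /mnt   # writable device that receives the updates
    run mount -o remount,rw /mnt
}

convert_demo() {
    run btrfs-convert /dev/sde       # in-place conversion from ext3 or ext4
    run btrfs-convert -r /dev/sde    # roll back using the saved ext image
}

subvolume_demo
seed_demo
convert_demo
```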

Administrative Interface

Btrfs is managed primarily using command-line utilities. The only dedicated GUI tools available focus on operating system installation and basic support capabilities. Access to the advanced features of the file system generally is not provided. Table 1 lists the key Btrfs administrative commands.

Table 1. Btrfs Administrative Commands
Task                                  Btrfs Command
Initialize a file system              mkfs.btrfs
Administer an existing file system    btrfs options

How to Create and Set Up a Btrfs File System

I found getting started with Btrfs to be very simple. To create file systems, you need to use the sudo command (or otherwise become the root user) and have unused disk devices attached to the system. The first step is to create a Btrfs file system using the mkfs.btrfs command. For example, I created a 10 GB file system that spans two physical 5 GB disks (/dev/sdb and /dev/sdc), using default file system configuration parameters.

/* Create a Btrfs file system on two devices using default options */
# mkfs.btrfs /dev/sdb /dev/sdc

adding device /dev/sdc id 2
fs created label (null) on /dev/sdb
nodesize 4096 leafsize 4096 sectorsize 4096 size 10.00GB
Btrfs Btrfs v0.19

Next, I used the btrfs filesystem show command to verify the file system was created on the two devices.

/* Display the file system configuration */
# btrfs filesystem show /dev/sdb

Label: none  uuid: b4f5c9a8-d8ec-4a5b-84f0-2b8c8d18b257
Total devices 2 FS bytes used 28.00KB
devid    1 size 5.00GB used 1.53GB path /dev/sdb
devid    2 size 5.00GB used 1.51GB path /dev/sdc

Btrfs Btrfs v0.19

The next step is to make the file system visible to the operating system so that it can be used. I used the standard Oracle Linux mount command to mount the file system on /mnt. Note that only the first device that comprises the file system needs to be mounted.

/* Mount the newly created file system */
# mount /dev/sdb /mnt

/* Note: only the first device needs to be mounted. Btrfs takes care of the rest. */
# mount /dev/sdc /mnt

mount: /dev/sdc already mounted or /mnt busy
mount: according to mtab, /dev/sdb is already mounted on /mnt


Next, I used the standard Oracle Linux df command to verify the size of the new file system, followed by the btrfs filesystem df command to get more detailed file system information.

/* Display the size of the file system */
# df -h /mnt

Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb         10G   56K  8.0G   1% /mnt

/* Get more detailed information */
# btrfs filesystem df /mnt

Data, RAID0: total=1.00GB, used=0.00
Data: total=8.00MB, used=0.00
System, RAID1: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, RAID1: total=1.00GB, used=24.00KB
Metadata: total=8.00MB, used=0.00
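
The listing above also shows the default redundancy profiles at work: data is striped (RAID0) and metadata is mirrored (RAID1). As a small convenience, the following sketch extracts just the profile of each block-group type from that output. The function name profiles is hypothetical, and the sample input is copied from the listing above.

```shell
# Print the allocation profile of each block-group type, reading
# "btrfs filesystem df" output from standard input. Lines without
# an explicit profile (for example "Data: total=...") are skipped.
profiles() {
    awk -F'[,:]' '/^[A-Za-z]+, / { sub(/^ /, "", $2); print $1, $2 }'
}

# Sample input copied from the listing above.
profiles <<'EOF'
Data, RAID0: total=1.00GB, used=0.00
Data: total=8.00MB, used=0.00
System, RAID1: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, RAID1: total=1.00GB, used=24.00KB
Metadata: total=8.00MB, used=0.00
EOF
# Prints:
#   Data RAID0
#   System RAID1
#   Metadata RAID1
```

On a live system, the same function could be fed directly, for example with btrfs filesystem df /mnt | profiles.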

Once it was created and verified, I put the Btrfs file system to work. First, I created a subvolume—a named B-tree to hold directories and files—named subbasefoo.

/* Create a subvolume named subbasefoo */
# btrfs subvolume create subbasefoo

Create subvolume './subbasefoo'

Next, I created three empty files (foobar1, foobar2, and foobar3) in the subbasefoo subvolume using the standard Oracle Linux touch command.

/* Create three empty files named foobar1, foobar2, and foobar3 */
# touch subbasefoo/foobar1 subbasefoo/foobar2 subbasefoo/foobar3

I wanted to determine how best to keep data safe. First, I created a snapshot of the subvolume using the btrfs subvolume snapshot command and verified its existence and contents using the standard Oracle Linux ls command. I named the snapshot subbasefoo-20120501.

/* Create a snapshot of the subbasefoo subvolume */
# btrfs subvolume snapshot subbasefoo/ subbasefoo-20120501

Create a snapshot of 'subbasefoo/' in './subbasefoo-20120501'

/* Verify the existence and contents of the snapshot by doing a recursive listing */
# ls -R
.:
subbasefoo  subbasefoo-20120501

./subbasefoo:
foobar1  foobar2  foobar3

./subbasefoo-20120501:
foobar1  foobar2  foobar3

Since snapshots persist until removed, deleting a file in the subbasefoo subvolume does not release any storage space. Keep in mind that disk space cannot be freed until all snapshots that reference the files in question are removed.

Snapshots are just subvolumes. As a result, all of the same commands apply. To emulate an "undo" facility, always create a new snapshot for experimentation. If you like the result, simply delete the previous generation snapshot. If you do not like the result, just delete the experimental version. It's a handy feature.
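
The "undo" workflow described above might look like the following sketch. The function names and the snapshot name subbasefoo-experiment are hypothetical, and run is a hypothetical dry-run wrapper that prints commands unless DRY_RUN=0 is set.

```shell
# Hypothetical dry-run wrapper: prints the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Start an experiment: clone the current generation into a writable snapshot.
begin_experiment() {   # $1 = current generation, $2 = experimental snapshot
    run btrfs subvolume snapshot "$1" "$2"
}

# Finish it: "keep" deletes the previous generation, anything else
# deletes the experimental copy.
end_experiment() {     # $1 = keep|discard, $2 = current, $3 = experimental
    if [ "$1" = "keep" ]; then
        run btrfs subvolume delete "$2"
    else
        run btrfs subvolume delete "$3"
    fi
}

begin_experiment subbasefoo-20120501 subbasefoo-experiment
end_experiment discard subbasefoo-20120501 subbasefoo-experiment
```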

When it is necessary to have a zero-space copy of a single file, use the --reflink option of the cp command. For example, the following commands check the space used in the subbasefoo subvolume and clone the file named rantest.tst (creating the file clonetest.tst). Subsequent use of the df command shows that the file clone does not consume additional disk space.

/* Create a clone of a 200 MB single file named rantest.tst */
# df -h .

Filesystem      Size  Used Avail Use% Mounted on
-                10G  201M  7.8G   3% /mnt/btrfs/subbasefoo

# cp --reflink rantest.tst clonetest.tst
# df -h .

Filesystem      Size  Used Avail Use% Mounted on
-                10G  201M  7.8G   3% /mnt/btrfs/subbasefoo
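
GNU cp accepts an explicit mode for --reflink, which is worth knowing when scripting clones. The sketch below wraps the two modes: "always" (the behavior of a bare --reflink) fails rather than silently making a full copy, while "auto" falls back to an ordinary copy when cloning is not supported. The function name clone_file is hypothetical, and run is a hypothetical dry-run wrapper that prints commands unless DRY_RUN=0 is set.

```shell
# Hypothetical dry-run wrapper: prints the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Clone a file with an explicit reflink mode.
#   $1 = source, $2 = destination, $3 = always|auto (default: always)
clone_file() {
    run cp --reflink="${3:-always}" "$1" "$2"
}

clone_file rantest.tst clonetest.tst
clone_file rantest.tst clonetest.tst auto
```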

Final Thoughts

My research into Btrfs revealed that it addresses long-standing deficiencies found in conventional file systems. Better yet, setting up and using a Btrfs file system is quick and easy, particularly if default configuration parameters are used. These defaults provide a reasonable amount of data protection and improved functionality—and little or no effort is required compared to the default file system. Many advanced features are in place to help improve data integrity and reliability, unify volume management, increase device utilization, and more. In our view, it is the best file system to use when deploying Oracle Linux platforms. As always, the choice is up to you.

About the Authors

Lenz Grimmer is a member of the Oracle Linux product management team. He has been involved in Linux and Open Source Software since 1995.

Margaret Bierman is a senior writer and trainer specializing in the research and development of technical marketing collateral for high-tech companies. Prior to writing, she worked as a software engineer on optical storage systems, specializing in the development of cross-platform file systems, hierarchical storage management systems, device drivers, and controller firmware. Margaret was also heavily involved in related standards committees, as well as training ISVs and helping them implement solutions. She received a B.S. in computational mathematics from Rensselaer Polytechnic Institute.


Revision 1.0, 07/11/2012