LibSAM for Sun StorageTek SAM and QFS: Beyond Backup

   
By Svati Chandra and the Sun StorageTek SAM-QFS Development Team, June 2007  

When applied to large file systems, traditional backup software presents three inherent problems:

  • The time needed to back up data, which is also the time during which data is taken offline
  • The time consumed to restore the data, in case of disaster
  • The high cost of resources required for storage, network, tape, and administration

An alternative is to use the LibSAM library for the Sun StorageTek Storage Archive Manager (SAM), a paradigm for data management that goes beyond backup.

What Is the LibSAM Library?

Designed for use with Sun StorageTek SAM and Sun StorageTek QFS software, the LibSAM library, or API, allows you to manage data in a samfs file system from within an application.

The model employed is client-server: A client process makes requests to a server process. The server processes the requests and returns the processing status to the client. In the simplest case, which is how LibSAM operates, the server and client run on the same machine. All requests are therefore local and translate into system calls to the kernel.

Basic Concepts and Implementation

Before this article delves into the details of how LibSAM is used to overcome the limitations of traditional backup mechanisms, you should understand some basic concepts associated with each function.

This article first discusses the four major components of the Sun StorageTek SAM archive management system: archiving, releasing, staging, and recycling.

Archiving

Archiving, the process of backing up a file by copying it from a file system to archive media, is typically the first component. The archive media can be a removable media cartridge or a disk partition of another file system. Figure 1 shows the basic components of the archiving process.

 
Figure 1: The Archiving Process

To archive a file managed by a samfs file system, LibSAM provides the function sam_archive.

     #include "/opt/SUNWsamfs/include/lib.h"
     int sam_archive(const char *path, const char *ops);

Given a pathname for a file, sam_archive sets the archive attributes on it, as specified by the ops argument. Typically, this argument is i, which specifies that the file be archived immediately. Table 1 lists all the archive options.

Table 1: Archive Options

  Archive Option   Description
  --------------   -----------
  C                Specifies concurrent archiving for the file. This means that the file can be archived even if opened for writing.
  I                Supports inconsistent archive copies. This means that an archive copy can be created even if the file is modified while it is being copied to the media.
  d                Resets the archive attributes on the file to the default.
  i                Specifies that the file be immediately archived if it is not already archived.
  w                Waits for the file to have at least one archive copy before completing.
  W                Waits for the file to have all its required archive copies before completing.
  n                Specifies that this file never be archived.
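
As a minimal sketch of how an application might screen an ops string against Table 1 before calling sam_archive, consider the following. The names sam_attr_fn, archive_ops_valid, dry_run_archive, and archive_file are hypothetical helpers, not part of LibSAM, and the sketch assumes that option letters can be concatenated into one string such as "iw" (check the sam_archive(3) man page). The real function is passed in as a pointer so that the example builds without the library installed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Same shape as the sam_archive prototype shown above. */
typedef int (*sam_attr_fn)(const char *path, const char *ops);

/* Option letters listed in Table 1. */
static const char ARCHIVE_OPTS[] = "CIdiwWn";

/* Returns 1 if ops is non-empty and uses only Table 1 letters, else 0.
 * It does not detect contradictory combinations such as "in". */
int archive_ops_valid(const char *ops)
{
    if (ops == NULL || *ops == '\0')
        return 0;
    return strspn(ops, ARCHIVE_OPTS) == strlen(ops);
}

/* Hypothetical stand-in for sam_archive so the sketch builds without
 * the library; it only prints what would be requested. */
int dry_run_archive(const char *path, const char *ops)
{
    printf("would archive %s with ops \"%s\"\n", path, ops);
    return 0;
}

/* Hypothetical wrapper: screens the arguments, then forwards the call.
 * Returns -1 on bad input, otherwise whatever archive_fn returns. */
int archive_file(sam_attr_fn archive_fn, const char *path, const char *ops)
{
    assert(archive_fn != NULL);
    if (path == NULL || !archive_ops_valid(ops))
        return -1;
    return archive_fn(path, ops);
}
```

On a system with the library installed, an application would include the lib.h header shown above and pass sam_archive itself in place of dry_run_archive, for example archive_file(sam_archive, "/sam1/report.dat", "iw"), where the path is only an illustration.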

As a best practice, an application should create and maintain four copies of each archive set. These copies should be stored on their own media to protect against data loss.

The copy managed directly by the file system is referred to as the on-disk copy, which is stored in the disk cache. Note that you can configure the archiving parameters in the /etc/opt/SUNWsamfs/archiver.cmd file.

Releasing

After a period of time during which a file's data has not been accessed or modified, the software releases the file's copy on disk. Releasing is the process of making disk cache space available by identifying archived files for which all archive copies have been made and freeing their online disk space.

The releaser is the second major component of Sun StorageTek SAM. Releasing frees disk space to allow creation of other files or retrieval of other file data from the archive media. Figure 2 illustrates the releasing process. Note: The releaser can release only files that have at least one active archived copy.

 
Figure 2: The Release Process

To release a file managed by a samfs file system, LibSAM provides the function sam_release.

     #include "/opt/SUNWsamfs/include/lib.h"
     int sam_release(const char *path, const char *ops);

Given a pathname for a file, the sam_release function sets the release attributes on it as specified by the ops argument. If this argument is i, the file is released immediately. Table 2 lists all the release options.

Table 2: Release Options

  Release Option   Description
  --------------   -----------
  a                Sets the attribute that specifies that a file's disk space be released when at least one archive copy of the file exists.
  d                Resets the release attributes on the file to the default.
  i                Specifies that the file's disk space be released immediately.
  n                Specifies that the disk space for this file never be released.
  p                Sets the partial attribute on the file so that, when the file's disk space is released, the first portion of that disk space will be retained.
  s n              Sets the partial attribute on the file so that, when the file's disk space is released, the first n kilobytes of that disk space will be retained.
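
The release rules described above (a file needs at least one archive copy, and the n attribute blocks release) can be modeled with a small check. This is only a conceptual sketch: struct file_view and can_release are hypothetical names, not LibSAM types or functions, and the real releaser applies further policy that is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of one file; not a LibSAM structure. */
struct file_view {
    int online;          /* 1 if the file's data is in the disk cache */
    int archive_copies;  /* number of valid archive copies */
    int never_release;   /* 1 if the n release attribute is set */
};

/* Returns 1 if the file's disk space may be released, else 0. */
int can_release(const struct file_view *f)
{
    assert(f != NULL);
    return f->online && !f->never_release && f->archive_copies >= 1;
}
```

An application could run a check like this before asking for an immediate release, so that it does not request release of a file that has no archive copy yet.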
 
 
Staging

Staging is the process of copying file data from an archive copy back into the disk cache. When a file whose data blocks have been released is accessed, the staging software automatically copies the file data -- or a portion of it -- back to the online disk cache. The stager is the third major component of Sun StorageTek SAM. Figure 3 illustrates this concept.

 
Figure 3: The Staging Process

To stage a file managed by a samfs file system, LibSAM provides the function sam_stage.

     #include "/opt/SUNWsamfs/include/lib.h"
     int sam_stage(const char *path, const char *ops);

Given a pathname for a file, sam_stage sets the staging attributes on a file or directory as specified by the ops argument. If the argument is i, the file is staged immediately.

The p argument requests partial staging, which stages an offline file's partial blocks so that you can view sections of the file before it is fully copied in from archive media. Partial staging optimizes system resources for better performance while allowing access to part of the file's data, which is especially useful for very large files.

Table 3 shows the full list of staging options.

Table 3: Staging Options

  Staging Option   Description
  --------------   -----------
  a                Sets the associative staging attribute on the file or directory.
  d                Resets the stage attributes on the file to the default.
  i                Specifies that the file be staged immediately.
  n                Specifies that the file never be automatically staged.
  p                Partially stages the blocks back online.
  s                Disables associative staging for the current stage.
  w                Waits for the file to be staged back online before completing.
  1, 2, 3, 4       Stages the archive copy specified by the option.
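
To illustrate how the options in Table 3 might be combined, the sketch below assembles an ops string for sam_stage from a small request description. The struct stage_request type and the build_stage_ops function are hypothetical and not part of LibSAM. The sketch also assumes that option letters and the copy number can be concatenated into a single string such as "iw2"; confirm this against the sam_stage(3) man page before relying on it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical request description; not a LibSAM structure. */
struct stage_request {
    int immediate;  /* nonzero adds "i": stage immediately */
    int wait;       /* nonzero adds "w": wait until the file is online */
    int copy;       /* 1-4 adds that copy number, 0 adds none */
};

/* Writes the option string into buf. Returns 0 on success, or -1 if
 * the request is empty or invalid, or if buf is too small. */
int build_stage_ops(const struct stage_request *req, char *buf, size_t buflen)
{
    size_t n = 0;
    char tmp[4];  /* at most "iw4" plus the terminator */

    assert(req != NULL && buf != NULL);
    if (req->copy < 0 || req->copy > 4)
        return -1;
    if (req->immediate)
        tmp[n++] = 'i';
    if (req->wait)
        tmp[n++] = 'w';
    if (req->copy != 0)
        tmp[n++] = (char)('0' + req->copy);
    if (n == 0 || n + 1 > buflen)
        return -1;
    tmp[n] = '\0';
    memcpy(buf, tmp, n + 1);
    return 0;
}
```

The resulting string would then be passed as the ops argument of sam_stage along with the file's pathname.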
 
 
Recycling

Over time, archived copies become obsolete and can be deleted from archive media to create space for newer versions or copies. Archived file data is stored in tar format along with data from other files. The process of removing obsolete archived file data and moving the remaining valid archived data is called recycling.

Recycling conserves archive media by reclaiming expired archive data, thus making room for additional file data to be archived. Recycling is the fourth and last major component of Sun StorageTek SAM. Figure 4 illustrates this concept.

 
Figure 4: The Recycling Process

Recycling is a process that typically does not require user interaction. Often, the recycler is invoked through root's crontab file at an off-peak time. However, the recycler can be invoked at any time using the sam-recycler command. For more information, download the man pages.

Real-Life Example

To put the pieces together, consider the example of a hospital, in which large volumes of data about patients are generated and subsequently sit unused for long periods of time. (Download the example program.)

A patient's hospital records might have the following data storage history:

  1. A patient comes in for surgery, and the physician creates the patient's file. It is placed on disk for the first time and immediately archived to tape.

  2. While the patient is in the hospital, the data in the patient's file is accessed multiple times a day and remains on disk. Each time it changes, it is archived.

  3. As the tape library fills, tapes are recycled and relabeled, removing access to obsolete copies of the patient's file. In this setting, obsolete archives are not kept, because in medical practice, a current patient file contains the patient's entire history. Everything that was in the obsolete archives will also therefore be in the current files.

  4. A week after the patient is discharged, the patient's file is released from disk.

  5. A month after the patient is discharged, the patient returns for a follow-up appointment. The patient's file is staged from tape, modified, and archived.

  6. When the tape library fills, the tapes containing the last, most current archives of the patient's data are exported, moved to storage, and replaced with empty tape.

  7. Years later, when that patient returns to the hospital, the patient's file is accessed. The tape that holds it is called up from the storage facility and imported, and the file data is then copied from tape back to the disk cache.
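
The history above can be read as a small state machine. The following sketch models it under simplifying assumptions: the state and event names are hypothetical, recycling and tape export are left out, and an offline file must be staged explicitly before it is modified (in practice, accessing a released file triggers staging automatically, as described in the Staging section).

```c
#include <assert.h>

enum file_state {
    ONLINE_UNARCHIVED,  /* on disk, newest data not yet archived */
    ONLINE_ARCHIVED,    /* on disk, archive copy is current */
    OFFLINE             /* released from disk, data only on archive media */
};

enum file_event { EV_MODIFY, EV_ARCHIVE, EV_RELEASE, EV_STAGE };

/* Returns the next state, or -1 if the event does not apply. */
int next_state(enum file_state s, enum file_event e)
{
    switch (e) {
    case EV_MODIFY:   /* in this model, the data must be on disk */
        return s == OFFLINE ? -1 : ONLINE_UNARCHIVED;
    case EV_ARCHIVE:  /* nothing to do if the copy is already current */
        return s == ONLINE_UNARCHIVED ? ONLINE_ARCHIVED : -1;
    case EV_RELEASE:  /* only archived files can be released */
        return s == ONLINE_ARCHIVED ? OFFLINE : -1;
    case EV_STAGE:    /* only released files need staging */
        return s == OFFLINE ? ONLINE_ARCHIVED : -1;
    }
    return -1;
}
```

Steps 1 through 5 correspond to the sequence archive, modify, archive, release, stage, modify, archive. A request to release a file that has not been archived is rejected, which mirrors the releaser rule noted earlier.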

Overcoming Limitations of Traditional Backup Software

In current backup methodologies, files are copied repeatedly at each full backup cycle, regardless of whether they have changed. With the Sun StorageTek Storage Archive Manager (SAM) software, a file never needs to be copied again unless it changes, because it is protected by the four copies made to different media.

Should a file change after it is archived, the advanced file system in Sun StorageTek SAM software automatically makes new copies to new media or new locations, protecting the new version of the file without having to wait for the full or incremental backup that traditional backup software requires.

After the new copy is made successfully, the file's metadata is updated to reflect its new location and media. This technique of automatically copying files only after they have been created or modified effectively eliminates the need for a specific period to back up data and can significantly decrease administrator overhead.
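
The difference in copied data volume can be estimated with simple arithmetic. The sketch below is a rough model with hypothetical function names and purely illustrative inputs: it assumes that every full backup cycle copies the entire data set once, and that archive-on-change copies each file once initially and afterward copies only the changed data, multiplied by the number of archive copies kept. Incremental backups and media recycling are ignored.

```c
#include <assert.h>

/* Data written by repeated full backups: the whole data set, once per
 * cycle. Sizes are in gigabytes. */
double full_backup_volume(double total_gb, int cycles)
{
    assert(total_gb >= 0 && cycles >= 0);
    return total_gb * cycles;
}

/* Data written by archive-on-change: every file once, plus the data
 * changed in each cycle, each written `copies` times. */
double archive_volume(double total_gb, double changed_gb_per_cycle,
                      int cycles, int copies)
{
    assert(total_gb >= 0 && changed_gb_per_cycle >= 0);
    assert(cycles >= 0 && copies >= 1);
    return copies * (total_gb + changed_gb_per_cycle * cycles);
}
```

For example, with an illustrative 1000 GB data set, 10 GB of changes per cycle, 52 cycles, and four archive copies, the model gives 52000 GB written by full backups and 6080 GB written by archiving. Actual savings depend on change rates and backup policy.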

In addition, the software features the capability to copy and archive from disk to disk, allowing organizations to develop more resilient disaster-recovery scenarios by copying to remote sites.

Restoring a file system protected by Sun StorageTek SAM software is also extremely fast when compared to traditional backup software because only the metadata needs to be restored before the file system can be mounted and used. This can take minutes rather than the hours or days that a full restoration from tape requires.

Once the metadata is restored and the file systems mounted, all references to the data are satisfied from the Sun StorageTek SAM archive. Files are migrated back to online storage as they are accessed, providing transparent access to the actual file data from near-line storage. And Sun StorageTek SAM software's read-behind feature enables users to begin reading the file even before it is fully restored, significantly benefiting users who need to access large files.

Last but not least, because Sun StorageTek SAM software copies only new and changed files, only the tape space for these files is necessary. The software can easily keep up with the rapid scaling and explosion of data in today's economy by never having to perform a full backup of all old and new data.

Not only does this reduce total cost of ownership by eliminating the need to add media to preserve old and stale data, it also saves valuable time and money by limiting administrator overhead and reducing the number of tape devices that are required to accomplish a full backup within a given time.

Additional Features

Enhanced policy-based administration and security features include quotas and access control lists (ACLs) to control space consumption and data access.

In addition, Sun StorageTek SAM includes the ability to manage very large files by making use of segments, as well as the ability to extend continuous archiving capabilities to remote sites with SAM Remote, libsamrpc. For an introduction to libsamrpc, download the man pages and refer to the intro_libsam(3) man page.

Table 4 shows the LibSAM API's functions and descriptions at a glance.

Table 4: LibSAM API

  Function                             Description
  --------                             -----------
  sam_advise                           Sets file attributes.
  sam_archive                          Sets archive attributes on a file or directory.
  sam_rearchive                        Sets rearchive attributes on a file or directory.
  sam_exarchive                        Exchanges archive copies of a file or directory.
  sam_unarchive                        Removes archive copies for a file or directory.
  sam_unrearch                         Removes rearchive attributes on a file or directory.
  sam_damage                           Sets damaged attribute on a file or directory.
  sam_undamage                         Clears damaged and stale status of a file or directory.
  sam_cancelstage                      Cancels a pending or in-progress stage on a file.
  sam_closecat                         Ends access to the catalog for an automated library.
  sam_devstat, sam_ndevstat            Gets device status.
  sam_devstr                           Translates numeric device status into a character string.
  sam_getcatalog                       Obtains a range of entries from the catalog for an automated library.
  sam_opencat                          Accesses the volume serial name (VSN) catalog for an automated library.
  sam_readrminfo                       Gets information for a removable media file.
  sam_release                          Releases and sets release attributes on a file.
  sam_request                          Creates a removable media file.
  sam_restore_copy                     Creates an archive copy for an existing file.
  sam_restore_file                     Creates an offline file.
  sam_segment                          Sets segment attributes on a file or directory.
  sam_segment_stat                     Obtains file information and follows symbolic links to a segmented file.
  sam_setfa                            Sets file or directory attributes.
  sam_ssum                             Sets checksum attributes on a file.
  sam_stage                            Stages and sets stage attributes on a file or directory.
  sam_stat, sam_lstat                  sam_stat obtains file information and follows symbolic links to the file. sam_lstat obtains file information, and if that file is a link, it returns information about the link.
  sam_vsn_stat, sam_segment_vsn_stat   Obtains VSN status for a file or a file's data segment that overflows VSNs.
 
 

All the APIs in LibSAM, except for sam_closecat, sam_getcatalog, and sam_opencat, are available for use with 64-bit programs.

For more details about each library routine, see the corresponding man page. The library routines contained in LibSAM are found in section 3 of the downloadable man pages.

Summary

Sun StorageTek SAM is often combined with Sun StorageTek QFS software. With this combination, also known as Sun SAM-QFS, Sun presents a new approach that helps organizations manage information assets according to their business needs. The software enables dynamic archiving, reduced backup windows, and fast recovery to help enhance productivity and improve resource use. It consolidates innovative archiving and backup methodologies in a high-performance file system with virtually unlimited scalability.

The software replaces traditional backups to improve storage resource use for applications in which data needs to be available continuously and quickly restored in the event of a business disruption. Administrators can set automatic archiving policies to determine when, where, and how information is stored, ensuring cost-effective management of large volumes of data. Metadata archiving and read-behind features help enterprises recover from business disruptions in minutes or hours, as opposed to days, and they let users begin reading files even before they are fully restored.

When you put it all together, Sun StorageTek SAM-QFS software enables enterprises to get great value from their information, meeting demanding business requirements across a wide array of applications, regulations, user needs, and corporate policies, while delivering lower overall costs.

References and Download Information

Download: LibSAM man pages
Download: LibSAM
Sun StorageTek SAM documentation
Sun StorageTek QFS documentation
Sun SAM-QFS project page on OpenSolaris.org

About the Authors

Svati Chandra is a staff member of the Sun StorageTek SAM-QFS development team.

The Sun StorageTek SAM-QFS development team works toward providing a cost-effective, high-performance, end-to-end solution for information lifecycle management (ILM).
