Best Way to Automate ZFS Snapshots and Track Software Updates

Part II of Software Management Best Practices for Oracle Solaris 11 Express

By Ginny Henningsen, May 2011

Part I - Best Way to Update Software with IPS
Part II - Best Way to Automate ZFS Snapshots and Track Software Updates



Introduction

This is the second article in a series highlighting best practices in Oracle Solaris 11 Express. The first article, Updating Software, introduced the Image Packaging System (IPS) software packaging model and discussed the best practice for performing updates: creating a new Boot Environment (BE) before applying an update. In some cases, such as with updates and full upgrades, IPS creates a new BE automatically.

BEs are a built-in safety net when you make software changes, much like Live Upgrade environments in Oracle Solaris 10. In Oracle Solaris 11 Express, the root file system is implemented using Oracle Solaris ZFS, so BEs are basically Oracle Solaris ZFS snapshots that are readable/writable and activated for booting. Because of this underlying technology, you can periodically generate snapshots of BEs just like you can take a snapshot of any Oracle Solaris ZFS volume.

This article describes how Oracle Solaris 11 Express uses automated snapshots to deliver Time Slider services, a new feature in the GNOME desktop. Using Time Slider, you can take periodic snapshots of the active BE, capturing the software state at regular intervals. This approach might prove useful if you forget to explicitly create a BE when you update software.

Since the history of software updates can be especially helpful in troubleshooting, this article also highlights commands for researching package changes. Note that all examples presume an authorized user (see “User Accounts, Roles, and Rights Profiles” in Getting Started With Oracle Solaris 11 Express).

Introducing Time Slider

The Time Slider service is a new component in the GNOME file manager (a.k.a. nautilus), and it relies on services that generate periodic Oracle Solaris ZFS snapshots. Similar in concept to Time Machine in Apple’s Mac OS, Time Slider provides an easy way for desktop users to restore individual files or directories from automatically scheduled, incremental snapshots of home directories. Once activated and set up, Time Slider services can generate “frequent” snapshots (every 15 minutes, by default), as well as hourly, daily, weekly, and monthly snapshots.

The services that automate periodic snapshots can do so for any file system, as well as BEs, even on non-desktop systems. However, to take automatic snapshots using Time Slider mechanisms, packages for the GNOME desktop environment (for example, the slim_install package group) are required even when the GDM display manager is not running. (This is because Time Slider services use notification mechanisms that are part of the GNOME stack.)

On a Desktop, How Do I Enable Time Slider Services?

By default, the services that support Time Slider are disabled, even on desktop systems:

  # svcs time-slider
  STATE          STIME    FMRI
  disabled       16:33:56 svc:/application/time-slider:default
  # svcs auto-snapshot
  STATE          STIME    FMRI
  disabled       16:33:56 svc:/system/filesystem/zfs/auto-snapshot:monthly
  disabled       16:33:56 svc:/system/filesystem/zfs/auto-snapshot:weekly
  disabled       16:33:56 svc:/system/filesystem/zfs/auto-snapshot:daily
  disabled       16:33:56 svc:/system/filesystem/zfs/auto-snapshot:hourly
  disabled       16:33:56 svc:/system/filesystem/zfs/auto-snapshot:frequent

The easiest way to enable Time Slider services on a desktop is to use the GUI (enter time-slider-setup on the command line or use the System > Administration > Time Slider menu). The GUI (shown in Figure 1) allows you to select which file systems to snapshot, specify whether to back up snapshots to an external drive, and set the capacity policy for snapshot removal.

Oracle Solaris 11 Express - Automating Snapshots and Tracking Updates
Figure 1. Time Slider GUI

How Do I Enable Time Slider Services Using the Command Line?

First, if you are not using the GUI, designate the file systems and BEs to be captured via automatic snapshots. To do this, set the com.sun:auto-snapshot property to true, as shown in the following commands. (These examples use a pool named rpool1; substitute the name of your own pool.) The first command designates the home directory of user jdoe, and the second command designates the home directory of user rkai. The third command lists the property for every dataset; those showing a value of true will be snapshotted.

  # zfs set com.sun:auto-snapshot=true rpool1/export/home/jdoe  
  # zfs set com.sun:auto-snapshot=true rpool1/export/home/rkai 
  # zfs get all |grep auto-snapshot

Since descendant file systems inherit Oracle Solaris ZFS properties from their parent, setting up automatic snapshots for the /export/home file system also captures every file system beneath it in that hierarchy.
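
As a minimal sketch of that inheritance, the following helper (the function name is hypothetical, not part of Oracle Solaris) sets the property once on a parent file system and then lists the value and its source for everything beneath it. It assumes the rpool1/export/home layout used in the commands above.

```shell
# Hypothetical helper: enable automatic snapshots on a parent file system
# and show how each descendant obtains the setting.
enable_auto_snapshot_tree() {
    parent=$1
    zfs set com.sun:auto-snapshot=true "$parent" || return 1
    # -r recurses into descendants; the source column reads "local" for the
    # parent and "inherited from <parent>" for datasets beneath it.
    zfs get -r -H -o name,value,source com.sun:auto-snapshot "$parent"
}
```

An authorized user would run it as enable_auto_snapshot_tree rpool1/export/home. Any dataset that reports a local value keeps its own setting rather than the inherited one.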

Whether or not you use the GUI to select what to snapshot, you must enable the Time Slider services. In the following example, all auto-snapshot services are enabled, as are the Time Slider plug-in services and the time-slider service itself, which are also needed:

  # svcadm enable auto-snapshot:frequent
  # svcadm enable auto-snapshot:hourly
  # svcadm enable auto-snapshot:daily 
  # svcadm enable auto-snapshot:weekly 
  # svcadm enable auto-snapshot:monthly 
  # svcadm enable time-slider/plugin:zfs-send
  # svcadm enable time-slider/plugin:rsync
  # svcadm enable time-slider

Since auto-snapshot and time-slider/plugin services depend on the time-slider service, be sure to enable the time-slider service last (or restart it). Again, these services require GNOME desktop packages, and they go into maintenance mode if they are enabled when GNOME packages are not installed.
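
To confirm that the services came online rather than dropping into maintenance mode, a small filter over svcs output can help. The sketch below is an illustration only; report_not_online is a hypothetical name, and the filter simply reads "state fmri" pairs from standard input.

```shell
# Hypothetical helper: read "state fmri" pairs on standard input and print
# every service that is not online. Returns non-zero if any are found.
report_not_online() {
    found=0
    while read -r state fmri; do
        if [ "$state" != online ]; then
            echo "$fmri is $state"
            found=1
        fi
    done
    return $found
}

# Intended use on a live system (svcs -H drops the header line and
# -o selects the columns to print):
#   svcs -H -o state,fmri auto-snapshot time-slider | report_not_online
```

For any service reported here, svcs -x with the service name explains why it is not running.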

On a Desktop, How Do I Use Time Slider?

Once services are activated and Oracle Solaris ZFS properties are set (either through the GUI or command line, as described above), Time Slider functionality is available within the GNOME file manager. Start the file manager and click the clock icon in the file manager’s navigation bar. (If the clock icon is grayed out, Time Slider services haven’t been enabled properly. Go back and check the previous steps and then restart the file manager.)

Time Slider functionality allows the desktop user to navigate among captured snapshots. In Figure 2, the left-most file manager shows home directory contents of an earlier snapshot while the right-most file manager shows the current state. A directory called “Project A” appears in the earlier snapshot but no longer exists in the current directory.

Figure 2. Comparing Time Slider Snapshots

On a Desktop, How Do I Restore a File Using Time Slider?

Suppose the “Project A” directory shown in Figure 2 was accidentally deleted. To restore it, the user simply drags and drops it from the previous snapshot into the desired target directory in the current state (labeled “Now” in the File Manager).
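
The same restore can be done without the GUI, because Oracle Solaris ZFS exposes each snapshot read-only under the .zfs/snapshot directory at the root of the file system. The following is a sketch under that assumption; restore_from_snapshot is a hypothetical helper, and it refuses to overwrite anything that already exists in the current state.

```shell
# Hypothetical helper: copy a file or directory out of a snapshot and back
# into the live file system.
#   $1 = mount point of the file system, $2 = snapshot name (text after @),
#   $3 = path relative to the mount point
restore_from_snapshot() {
    fs_mount=$1
    snap=$2
    rel=$3
    src="$fs_mount/.zfs/snapshot/$snap/$rel"
    dest="$fs_mount/$rel"
    if [ ! -e "$src" ]; then
        echo "not found in snapshot: $src" >&2
        return 1
    fi
    if [ -e "$dest" ]; then
        echo "already exists, not overwriting: $dest" >&2
        return 1
    fi
    # -R copies directories recursively; -p preserves ownership and times.
    cp -Rp "$src" "$dest"
}
```

For example, assuming the jdoe file system is mounted at /export/home/jdoe, the call restore_from_snapshot /export/home/jdoe zfs-auto-snap_hourly-2011-03-10-10h10 "Project A" would bring the directory back from that hourly snapshot.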

How Do I Automatically Snapshot a Boot Environment?

OK, great. Time Slider has obvious value for a user who’s inadvertently deleted or overwritten a file or directory, as in the “Project A” example above. Automatic snapshots can also periodically capture the current BE, which can be helpful if a software update goes awry. By regularly snapshotting the current software state in a BE, you can revert to a previous state if needed.

If you are an authorized user, you can initiate automatic snapshots of BEs using the GUI or command‑line equivalents. The procedure to designate a BE for automatic snapshots is basically the same as for a user’s home directory: Simply activate the services (as shown before) and set the BE’s com.sun:auto-snapshot property to true. As an example, these commands set the auto-snapshot property to true for the BEs solaris and BE2:

  # zfs set com.sun:auto-snapshot=true rpool1/ROOT/solaris
  # zfs set com.sun:auto-snapshot=true rpool1/ROOT/BE2

Of course, as an alternative, you can create a cron job to periodically snapshot a BE (issuing a command such as zfs snapshot rpool1/ROOT/solaris@backup). Additional examples of generating Oracle Solaris ZFS snapshots can be found in the Oracle Solaris ZFS Administration Guide.
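
A cron-driven version might look like the following sketch. The script path, the function name, and the backup- prefix are all assumptions made for illustration. The date stamp is built inside the script because cron treats a % character in a crontab command field specially.

```shell
# Hypothetical script body, for example saved as /usr/local/bin/snapshot-be.sh.
# Takes one snapshot of the given dataset with a date-stamped name.
snapshot_be() {
    dataset=$1
    stamp=$(date +%Y-%m-%d-%Hh%M)
    zfs snapshot "$dataset@backup-$stamp"
}

# Example crontab entry for root (nightly at 02:00), assuming the script
# ends with the line:  snapshot_be rpool1/ROOT/solaris
#   0 2 * * * /usr/local/bin/snapshot-be.sh
```

Unlike the auto-snapshot services, nothing here removes old snapshots, so a job like this needs its own cleanup policy.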

How Do I List Available Snapshots?

The following command lists all snapshots, both those of BEs and those of other file systems:

  # zfs list -t snapshot
  NAME                                                 USED AVAIL REFER MOUNTPOINT
  rpool/ROOT/BE2@zfs-auto-snap_hourly-2011-03-10-11h10              0  -  3.49G  -
  rpool/ROOT/BE2@zfs-auto-snap_frequent-2011-03-10-11h55            0  -  3.49G  -
  rpool/ROOT/solaris@install                                    26.5M  -  3.34G  -
  rpool/ROOT/solaris@2011-03-09-22:39:13                        11.6M  -  3.58G  -
  rpool/ROOT/solaris@zfs-auto-snap_hourly-2011-03-10-11h10      40.5K  -  3.58G  -
  rpool/ROOT/solaris@zfs-auto-snap_frequent-2011-03-10-11h40    40.5K  -  3.58G  -
  rpool/ROOT/solaris@zfs-auto-snap_frequent-2011-03-10-11h55        0  -  3.58G  -
  rpool/export/home/jdoe@zfs-auto-snap_monthly-2011-03-09-16h10 19.5K  -  39.0M  -
  rpool/export/home/jdoe@zfs-auto-snap_hourly-2011-03-09-22h10    19K  -  39.0M  -
  rpool/export/home/jdoe@zfs-auto-snap_hourly-2011-03-10-09h10   112K  -  39.0M  -
  rpool/export/home/jdoe@zfs-auto-snap_hourly-2011-03-10-10h10   282K  -   942K  -
  rpool/export/home/jdoe@zfs-auto-snap_hourly-2011-03-10-11h10    23K  -  31.6M  -
  rpool/export/home/jdoe@zfs-auto-snap_frequent-2011-03-10-11h25  23K  -  31.6M  -
  rpool/export/home/jdoe@zfs-auto-snap_frequent-2011-03-10-11h40  32K  -  38.1M  -
  rpool/export/home/jdoe@zfs-auto-snap_frequent-2011-03-10-11h55    0  -  38.1M  -

To show only snapshots for BEs, use this form of beadm:

  # beadm list -s
  BE/Snapshot                                      Space  Policy Created          
  -----------                                      -----  ------ -------          
  BE2
   BE2@zfs-auto-snap_frequent-2011-03-10-11h55     0      static 2011-03-10 11:55 
   BE2@zfs-auto-snap_hourly-2011-03-10-11h10       0      static 2011-03-10 11:10 
  solaris
   solaris@2011-03-09-22:39:13                     11.60M static 2011-03-09 16:39 
   solaris@install                                 26.54M static 2011-03-08 16:17 
   solaris@zfs-auto-snap_frequent-2011-03-10-11h40 40.5K  static 2011-03-10 11:40 
   solaris@zfs-auto-snap_frequent-2011-03-10-11h55 0      static 2011-03-10 11:55 
   solaris@zfs-auto-snap_hourly-2011-03-10-11h10   40.5K  static 2011-03-10 11:10

For automatic snapshots, snapshot names use the Oracle Solaris ZFS dataset name followed by @zfs-auto-snap_<type> (where type is frequent, hourly, and so on), along with a date and time stamp.
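
Because the naming pattern is regular, a script can take a name apart with ordinary shell parameter expansion. The helper below is a hypothetical illustration based only on the names shown in the listings above.

```shell
# Hypothetical helper: split an automatic snapshot name into its parts.
# Prints "dataset type stamp", or returns 1 for names that do not match.
split_auto_snap_name() {
    case $1 in
        *@zfs-auto-snap_*-*) ;;
        *) return 1 ;;
    esac
    dataset=${1%%@*}                # text before the @
    rest=${1#*@zfs-auto-snap_}      # for example hourly-2011-03-10-11h10
    echo "$dataset ${rest%%-*} ${rest#*-}"
}
```

Given rpool/ROOT/solaris@zfs-auto-snap_hourly-2011-03-10-11h10, it prints the dataset rpool/ROOT/solaris, the type hourly, and the stamp 2011-03-10-11h10. A manually created snapshot such as solaris@install is rejected.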

How Long Are Automatic Snapshots Retained?

A certain number of automatic Oracle Solaris ZFS snapshots are kept, as long as space permits, according to type. From the service manifest (/var/svc/manifest/system/filesystem/auto-snapshot.xml), the auto-snapshot service keeps (by default) 3 frequent, 23 hourly, 6 daily, 4 weekly, and 12 monthly snapshots into the past. Snapshots are deleted, however, if space is needed, with oldest snapshots being deleted first.
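
One way to see how a system compares with those limits is to count the automatic snapshots of each type. The following sketch (count_auto_snaps is a hypothetical name) reads snapshot names on standard input and relies only on the naming pattern shown earlier.

```shell
# Hypothetical helper: count automatic snapshots by type (frequent, hourly,
# and so on). Names that are not automatic snapshots are ignored.
count_auto_snaps() {
    sed -n 's/.*@zfs-auto-snap_\([a-z]*\)-.*/\1/p' | sort | uniq -c
}

# Intended use on a live system:
#   zfs list -H -o name -t snapshot | count_auto_snaps
```

The counts cover every dataset in the listing combined, so filter the input for a single dataset first when comparing against the per-dataset limits.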

How Do I Delete Unwanted Automatic Snapshots?

Carefully! The following command deletes all snapshots matching the pattern @zfs-auto-snap:

  # for s in $(zfs list -H -o name -t snapshot | grep '@zfs-auto-snap'); do
      zfs destroy "$s"
    done
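
Since destroyed snapshots cannot be recovered, it can be worth previewing the list first. The following is a cautious sketch, not a standard tool: destroy_auto_snaps and the CONFIRM variable are hypothetical names, and nothing is destroyed unless CONFIRM is set to yes.

```shell
# Hypothetical helper: list the snapshots that match a pattern and destroy
# them only when CONFIRM=yes is set. The pattern defaults to all automatic
# snapshots.
destroy_auto_snaps() {
    pattern=${1:-@zfs-auto-snap}
    zfs list -H -o name -t snapshot | grep "$pattern" |
    while read -r snap; do
        if [ "${CONFIRM:-no}" = yes ]; then
            zfs destroy "$snap"
        else
            echo "would destroy: $snap"
        fi
    done
}
```

Running destroy_auto_snaps @zfs-auto-snap_frequent prints what would be removed. Repeating the call as CONFIRM=yes destroy_auto_snaps @zfs-auto-snap_frequent performs the deletion.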

If you are an authorized user, you can also use the Time Slider GUI to delete unwanted snapshots. Deleting a BE deletes related snapshots. Compare the following output with the earlier beadm list command:

  # beadm destroy BE2
  Are you sure you want to destroy BE2?  This action cannot be undone(y/[n]): y
  # beadm list -s
  BE/Snapshot                                        Space  Policy Created          
  -----------                                        -----  ------ -------          
  solaris
   solaris@install                                 26.58M static 2011-03-08 16:17 
   solaris@zfs-auto-snap_frequent-2011-03-10-12h25 41.0K  static 2011-03-10 12:25 
   solaris@zfs-auto-snap_frequent-2011-03-10-12h40 41.0K  static 2011-03-10 12:40 
   solaris@zfs-auto-snap_frequent-2011-03-10-13h40 29.0K  static 2011-03-10 13:40 
   solaris@zfs-auto-snap_hourly-2011-03-10-11h10   40.5K  static 2011-03-10 11:10 
   solaris@zfs-auto-snap_hourly-2011-03-10-12h10   40.5K  static 2011-03-10 12:10 
   solaris@zfs-auto-snap_hourly-2011-03-10-13h10   41.0K  static 2011-03-10 13:10

How Do I Revert to a BE Preserved in an Automatic Snapshot?

To roll back using a previous snapshot of a BE, create a BE from the snapshot, activate the BE, and reboot. In the following example, a new BE called BEnew is created from a previous BE snapshot, and then BEnew is set to become the active BE upon reboot:

  # beadm create -e solaris@zfs-auto-snap_hourly-2011-03-10-11h10 BEnew
  # beadm activate BEnew
  # reboot       

How Do I Track Package Installs and Updates?

The pkg history command is quite useful for researching what software updates have been made to the current BE. When it is used without arguments, it provides an overview of software changes, as shown in the following example:

  # pkg history
  TIME                OPERATION                 CLIENT          OUTCOME
  2010-11-05T11:14:56 purge-history             pkg             Succeeded
  2011-03-14T09:16:00 uninstall                 pkg             Succeeded
  2011-03-14T09:16:09 uninstall                 pkg             Succeeded
  2011-03-14T09:16:15 set-property              pkg             Succeeded
  2011-03-14T09:16:17 update-publisher          pkg             Succeeded
  2011-03-14T09:16:18 set-property              pkg             Succeeded
  2011-03-14T10:58:31 refresh-publishers        pkg             Succeeded
  2011-03-14T10:58:31 install                   pkg             Succeeded
  2011-03-14T10:58:32 rebuild-image-catalogs    pkg             Succeeded
  2011-03-14T11:32:45 uninstall                 pkg             Succeeded
  2011-03-14T12:47:31 refresh-publishers        updatemanager   Succeeded
  2011-03-14T12:47:33 rebuild-image-catalogs    updatemanager   Succeeded
  2011-03-15T09:22:07 install                   pkg             Succeeded

Perhaps even more useful is the output provided with the -l option, which reports what changes were made, whether the changes succeeded, which user made the changes, when the changes were made, which command was used, and details about the starting and ending states.

For example, the following excerpt shows that user jdoe ran the pkg install gcc-3 command, which successfully installed four packages for the GNU compiler:

  Operation: install
          Outcome: Succeeded
           Client: pkg
          Version: 052adf36c3f4
             User: jdoe (101)
       Start Time: 2011-03-16T09:22:07
         End Time: 2011-03-16T09:23:41
          Command: /usr/bin/pkg install gcc-3
      Start State: 
  Solver: [ Variables: 887 Clauses: 6874 Iterations: 1 State: Succeeded]
  Timings: [phase 1:  0.577, phase 2:  0.116, phase 3:  0.256, phase 4:  0.000, 
  phase 5:  0.000, phase 6:  0.000, phase 7:  0.019, phase 8:  1.258, phase 9:  
  0.000, phase 10:  0.781, phase 11:  0.028, phase 12:  0.141]
  Maintained incorporations: pkg://solaris/consolidation/gfx/gfx-incorporation@0.5.11,5.11-0.151.0.1:20101105T053408Z
  .
  . 
  .
  pkg://solaris/consolidation/osnet/osnet-incorporation@0.5.11,5.11-0.151.0.1:20101104T230646Z
  Package version changes:
        End State:
  None -> pkg://solaris/developer/gnu-binutils@2.19,5.11-0.151.0.1:20101105T053803Z
  None -> pkg://solaris/developer/library/lint@0.5.11,5.11-0.151.0.1:20101104T231349Z
  None -> pkg://solaris/developer/gcc-3@3.4.3,5.11-0.151.0.1:20101105T053751Z
  None -> pkg://solaris/system/header@0.5.11,5.11-0.151.0.1:20101105T002136Z

XML records for pkg operations are stored in /var/pkg/history, which the pkg history command uses to compose its output. The command pkg purge-history wipes out these records, so use it with caution, because this history can be valuable when troubleshooting.
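
When the overview listing is long, it can help to narrow it to the operations that actually change installed software. The filter below is a sketch: package_changes is a hypothetical name, and the list of operation names (install, uninstall, image-update) is an assumption to adjust for your release.

```shell
# Hypothetical filter: keep only the rows of "pkg history" output whose
# operation adds, removes, or updates packages.
package_changes() {
    while read -r when op client outcome; do
        case $op in
            install|uninstall|image-update)
                echo "$when $op $client $outcome" ;;
        esac
    done
}

# Intended use on a live system:
#   pkg history | package_changes
```

Each surviving row keeps its time stamp, which is the value to compare against snapshot names when looking for a state that predates a change.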

How Might I Use Update Histories to Troubleshoot?

Suppose you take a day off and, upon returning, users report problems with a server. Examining the system’s package history for installations or updates might be a place to start. Suppose the history indicates that another system administrator performed a package installation while you were gone, which is evident in the third line of the following history excerpt:

  2011-03-14T12:47:31 refresh-publishers        updatemanager   Succeeded
  2011-03-14T12:47:33 rebuild-image-catalogs    updatemanager   Succeeded
  2011-03-16T09:22:07 install                   pkg             Succeeded

If further investigation confirms this install is the root cause, one possible corrective action is to revert to a previous BE until you can resolve the issue. Using the install’s time stamp, you can look for a BE snapshot that predates the installation and create a new BE from that earlier snapshot:

  # beadm create -e SLIM@zfs-auto-snap_frequent-2011-03-15-06h59 BEpre317
  # beadm activate BEpre317
  # reboot
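
Picking the right snapshot by eye gets tedious when many exist. As a sketch, the helper below (latest_snapshot_before is a hypothetical name) reads snapshot names on standard input and prints the newest automatic snapshot of one dataset that is older than a cutoff. The cutoff must be written in the same layout as the snapshot stamps, so a pkg history time of 2011-03-16T09:22:07 becomes 2011-03-16-09h22.

```shell
# Hypothetical helper: print the newest automatic snapshot of a dataset
# whose stamp is earlier than the cutoff.
#   $1 = dataset name, $2 = cutoff in the form YYYY-MM-DD-HHhMM
latest_snapshot_before() {
    dataset=$1
    cutoff=$2
    # Emit "stamp name" pairs for automatic snapshots, oldest first.
    sed -n 's/^\(.*@zfs-auto-snap_[a-z]*-\(.*\)\)$/\2 \1/p' | sort |
    {
        best=
        while read -r stamp name; do
            case $name in
                "$dataset"@*) ;;
                *) continue ;;
            esac
            # Zero-padded stamps order correctly as plain strings.
            if expr "x$stamp" \< "x$cutoff" >/dev/null; then
                best=$name
            fi
        done
        if [ -n "$best" ]; then
            echo "$best"
        fi
    }
}

# Intended use on a live system:
#   zfs list -H -o name -t snapshot |
#       latest_snapshot_before rpool/ROOT/solaris 2011-03-16-09h22
```

The portion of the result after the last slash (BE name, @, snapshot name) is the form that beadm create -e expects.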

Final Thoughts

In summary, Oracle Solaris 11 Express supplies new ways to help administrators cope with human errors, including their own. On desktops, enable Time Slider services and automatic snapshots of user directories (or the full /export/home file system). On servers, always create a new BE before performing an update, and optionally set up periodic snapshots for the current BE. Oracle Solaris 11 Express provides built-in safety nets, so best practice, of course, is to use them.

Revision 1.0, 05/06/2011