Playing with Swap Monitoring and Increasing Swap Space Using ZFS Volumes

In Oracle Solaris 11.1

by Alexandre Borges, Oracle ACE

Part 2 of a series that describes the key features of ZFS in Oracle Solaris 11.1 and provides step-by-step procedures explaining how to use them. This article describes how to monitor swap space and how to increase or decrease the swap space using ZFS volumes.


Published April 2014


  • Part 1 - Using COMSTAR and ZFS to Configure a Virtualized Storage Environment
  • Part 2 - Playing with Swap Monitoring and Increasing Swap Space Using ZFS Volumes
  • Part 3 - Playing with ZFS Shadow Migration
  • Part 4 - Delegating a ZFS Dataset to a Non-Global Zone
  • Part 5 - Playing with ZFS Encryption

During installation, Oracle Solaris 11 usually makes the swap space around one quarter of the RAM size. System and, particularly, application requirements can vary for each environment, so it's often appropriate to alter the swap space size by adding or removing space.

The swap space is an area of disk that holds paged-out anonymous memory and processes that are moved out of RAM when physical memory runs low.

Monitoring Swap Space

There are several ways to see the current size of the swap space for your system, for example:

root@solaris11-1:~# swap -l
swapfile                     dev  swaplo    blocks     free
/dev/zvol/dsk/rpool/swap   285,2       8   2097144  2097144

where:

  • swapfile indicates the swap space comes from a ZFS volume at /dev/zvol/dsk/rpool/swap.
  • dev shows the major number, which in this case confirms that the swap object is based on a ZFS volume:

    root@solaris11-1:~# more /etc/name_to_major | grep 285
    zfs 285
    
  • swaplo indicates the offset, in 512-byte blocks, at which the usable swap area begins. Here the offset is 8 blocks, which corresponds to one memory page (8 sectors x 512 bytes = 4K). To check it, the page size can be obtained by executing the following:

    root@solaris11-1:~# pagesize
    4096
    

    A value of 4K is typically found on Intel machines. However, with Oracle Solaris 11 on SPARC machines, the page size can vary from 8K to 2 GB (this upper limit also applies for Intel processors). This upper limit is mainly used as the page size for the System Global Area (SGA), a dedicated shared-memory area for an instance of Oracle Database 11g. Additionally, it is worth noting that 2 GB pages are supported with Oracle Solaris 10 8/11 or later Oracle Solaris releases and Oracle's SPARC T4 processor, but this page size isn't enabled by default. If it's suitable for some applications, we have to enable it by inserting set max_uheap_lpsize=0x80000000 in the /etc/system file and then rebooting the system.

    Furthermore, Oracle Solaris 11 supports multiple page sizes, which can be set manually according to the application profile or automatically through a new built-in memory prediction technology that is able to analyze the demands of applications in order to assign a suitable value.

    The supported page sizes can be shown by running the following command (in this case, on an Intel processor):

    root@solaris11-1:~# pagesize -a
    4096
    2097152
    

    The example above shows us that two page sizes are supported: 4K (4096 bytes) and 2 MB (2097152 bytes). The real reason for using larger memory pages is to improve Memory Management Unit (MMU) performance by reducing Translation Lookaside Buffer (TLB) misses. The number of TLB misses can be verified by using the trapstat command (although trapstat is not usually implemented on Intel platforms).

  • blocks is the total size of the swap space (2097144 x 512 bytes = 1 GB).
  • free represents the free swap space (1 GB).
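
As a quick check of the arithmetic above, the following is a minimal sketch (not part of the original procedure) that parses the swap -l output shown earlier and converts the 512-byte block counts to bytes. The function name parse_swap_l and the embedded sample text are illustrative assumptions; on a real system the text would come from running swap -l.

```python
SECTOR_BYTES = 512  # swap -l reports swaplo, blocks, and free in 512-byte blocks

# Sample copied from the swap -l output in this article.
SWAP_L_SAMPLE = """\
swapfile                     dev  swaplo    blocks     free
/dev/zvol/dsk/rpool/swap   285,2       8   2097144  2097144
"""

def parse_swap_l(text):
    """Return one dict per swap device listed in swap -l style output."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        swapfile, dev, swaplo, blocks, free = line.split()
        rows.append({
            "swapfile": swapfile,
            "dev": dev,
            "swaplo": int(swaplo),
            "bytes_total": int(blocks) * SECTOR_BYTES,
            "bytes_free": int(free) * SECTOR_BYTES,
        })
    return rows

devices = parse_swap_l(SWAP_L_SAMPLE)
for d in devices:
    print(d["swapfile"], d["bytes_total"], "bytes total,", d["bytes_free"], "bytes free")
```

The total comes out one page (4096 bytes) short of exactly 1 GB, which is why the article rounds it to 1 GB.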

Another very good way to monitor the swap space is the following command:

root@solaris11-1:~# swap -s
total: 680180k bytes allocated + 266516k reserved = 946696k used, 2321756k available

From this command output, we can see the following:

  • 680180k bytes allocated indicates the amount of swap space that already has been used (that is, touched previously but not necessarily still being used at this time) and continues to be available and reserved for use. A rough comparison would be a high-watermark threshold.
  • 266516k reserved indicates swap space that has not been allocated yet, but has been claimed for possible future use. Remember that swap space is reserved when the virtual memory (heap segment or anonymous memory) for a process is created, and the reserved swap space is then allocated when the process is run. Anonymous memory is made of pages that don't have a counterpart in any file system and that are migrated to the swap space due to a shortage of physical memory (RAM)—probably because the sum of the stack, the shared memory, and the process heap (from the malloc function, for example) is larger than the amount of available memory.
  • 946696k used indicates the total amount of swap space that is either allocated or reserved.
  • 2321756k available indicates the swap space available for future allocation.

Additionally, we must remember that some swap space is reserved when the virtual memory for a process is created, but only part of this reserved space is really associated with the address space of the process. Otherwise, the swap -s output can be misinterpreted: it is telling us that 946696k is, in the end, reserved (in order to allocate space, the space must have been reserved previously) and that 680180k of that swap space has been touched.

Another very important point is that the swap -l command reports the physical swap space (on disk) while swap -s reports virtual swap space, which is the sum of the physical swap space and the physical memory. Therefore, the available swap space from swap -s is the sum of free physical swap space plus free physical memory space. That's the reason that the swap -s command is not recommended for evaluating the physical swap space; instead, swap -l should be used for this goal.
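
The relationship between the swap -s fields can be checked with a small sketch like the one below. It assumes the single-line output format shown above; parse_swap_s is a hypothetical helper name, not a Solaris tool.

```python
import re

# Sample copied from the swap -s output in this article.
SWAP_S_SAMPLE = ("total: 680180k bytes allocated + 266516k reserved = "
                 "946696k used, 2321756k available")

SWAP_S_PATTERN = re.compile(
    r"total: (\d+)k bytes allocated \+ (\d+)k reserved = "
    r"(\d+)k used, (\d+)k available"
)

def parse_swap_s(line):
    """Split a swap -s line into its four counters (all in kilobytes)."""
    match = SWAP_S_PATTERN.match(line.strip())
    if match is None:
        raise ValueError("unexpected swap -s format: %r" % line)
    allocated, reserved, used, available = (int(v) for v in match.groups())
    return {"allocated": allocated, "reserved": reserved,
            "used": used, "available": available}

stats = parse_swap_s(SWAP_S_SAMPLE)
# "used" is simply allocated + reserved; "available" is virtual swap
# (disk plus memory), so it should not be read as free disk swap.
print(stats["allocated"] + stats["reserved"] == stats["used"])
```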

If we want to try another way to get the swap information, we can use the echo ::swapinfo | mdb -k command, for example:

root@solaris11-1:~# echo ::swapinfo | mdb -k
            ADDR            VNODE     PAGES      FREE NAME
ffffc10007798260 ffffc10007a7db40    262143    262143 /dev/zvol/dsk/rpool/swap

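It's simple to confirm that this matches the swap -l output: with a 4K page size, 262143 pages x 8 blocks per page = 2097144 blocks of 512 bytes (about 1 GB).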
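
To relate the page count reported by ::swapinfo to the block count reported by swap -l, here is a short sketch of the conversion. It assumes the 4096-byte page size returned by pagesize on this host and the 512-byte blocks used by swap -l; the function names are illustrative.

```python
PAGE_BYTES = 4096    # value reported by pagesize on this host
SECTOR_BYTES = 512   # block size used by swap -l

def pages_to_blocks(pages, page_bytes=PAGE_BYTES):
    """Convert memory pages to 512-byte blocks."""
    return pages * page_bytes // SECTOR_BYTES

def pages_to_kbytes(pages, page_bytes=PAGE_BYTES):
    """Convert memory pages to kilobytes."""
    return pages * page_bytes // 1024

swapinfo_pages = 262143  # PAGES column from ::swapinfo
print(pages_to_blocks(swapinfo_pages), "blocks")
print(pages_to_kbytes(swapinfo_pages), "kilobytes")
```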

As mentioned earlier, it's good to remember that anonymous memory doesn't have a counterpart in the file system. Usually, anonymous pages are the private data of a process, which includes the process heap (anonymous data) and the thread structure (the stack area, for example).

Swapping shouldn't be confused with paging. Swapping is an operation in which the swapper process (sched) swaps out processes that have been sleeping for more than 20 seconds (first their thread structures and then the stack and heap data [anonymous pages]). Paging is moving pages (normally 4 KB or 8 KB each) from memory to disk, and it usually results in very efficient memory management. However, one kind of paging has a horrible effect on system performance: anonymous paging (mainly anonymous page-in), because it increases application latency for reading back data from a disk.

Also, swapping shouldn't be confused with reaping, which is a technique to free memory from the kernel slab allocator caches and which is done by the function kmem_reap().

How can you verify whether a system is paging anonymous memory? In the following output, the columns that are interesting are apo (anonymous page-out) and api (anonymous page-in), which both ideally should be equal to zero. The latter is responsible for an increase in application latency.

root@solaris11-1:~# vmstat -p 1
     memory           page          executable      anonymous      filesystem
   swap      free  re  mf  fr  de  sr  epi  epo  epf  api  apo  apf  fpi  fpo  fpf
 2973844  2609240   3 18   0   0   3    0    0    0    0    0    0    0    0    0
 2895156  2544236  26 47   0   0   0    0    0    0    0    0    0    0    0    0
 2895156  2544092   0  0   0   0   0    0    0    0    0    0    0    0    0    0
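
The check on the api and apo columns can be automated. The following is a minimal sketch that reads vmstat -p style text and returns the (api, apo) pair for each sample; it assumes the two-line header layout shown above, and anon_paging_samples is a hypothetical helper name.

```python
# Sample copied from the vmstat -p output in this article.
VMSTAT_P_SAMPLE = """\
     memory           page          executable      anonymous      filesystem
   swap      free  re  mf  fr  de  sr  epi  epo  epf  api  apo  apf  fpi  fpo  fpf
 2973844  2609240   3 18   0   0   3    0    0    0    0    0    0    0    0    0
 2895156  2544236  26 47   0   0   0    0    0    0    0    0    0    0    0    0
 2895156  2544092   0  0   0   0   0    0    0    0    0    0    0    0    0    0
"""

def anon_paging_samples(text):
    """Return a list of (api, apo) tuples, one per data row."""
    lines = [line for line in text.splitlines() if line.strip()]
    columns = lines[1].split()          # second header line names the columns
    samples = []
    for line in lines[2:]:
        values = dict(zip(columns, (int(v) for v in line.split())))
        samples.append((values["api"], values["apo"]))
    return samples

samples = anon_paging_samples(VMSTAT_P_SAMPLE)
# Any nonzero api or apo value would suggest anonymous paging activity.
print(any(api or apo for api, apo in samples))
```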

To find out what process is doing anonymous page-in, use the following command:

root@solaris11-1:~# dtrace -n 'vminfo:::anonpgin { @[pid, execname] = count(); }'

Swapping is the last resort when paging is not able to free enough memory to meet the demands of an application, a situation that can be indicated by a high level of page scanning (searching for free memory pages).

Usually, page scanning starts when the amount of free memory goes below the amount specified by the lotsfree kernel parameter, and it becomes more intensive as free memory drops below the desfree value and then below the minfree value. If the amount of free memory stays below the desfree value for 30 seconds or more, the system starts swapping.

The worst form of swapping is hard swapping, which is when some inactive kernel modules are unloaded and moved to the swap space.

We can monitor whether the system is hard swapping by using the following command:

root@solaris11-1:~# echo "hardswap/D" | mdb -k
hardswap:
hardswap:       0

Hard swapping is rare because the following conditions must all be met:

  • The amount of free memory needs to be below desfree for more than 30 seconds, AND
  • There must constantly be two pending processes on the run queue (the r column in the vmstat output below), AND
  • freemem must be below minfree OR the number of page-ins plus page-outs must be greater than maxpgio, where maxpgio is the number of page-out requests that can be queued by the paging system.

In other words, maxpgio limits how many memory pages can be sent to swap at once, so that paging does not cause a disk I/O bottleneck. Therefore, a suitable maxpgio value depends on the number of swap devices that use their own disk controller. Its default value is 40 pages.
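
The three hard swapping conditions listed above can be written as a single predicate. This is only a sketch of the logic as described in this article, not kernel code: the parameter names are illustrative, and reading "two pending processes" as "at least two" is an assumption.

```python
def hard_swap_conditions_met(secs_below_desfree, run_queue, freemem,
                             minfree, page_ins, page_outs, maxpgio=40):
    """Evaluate the hard swapping conditions described in the article.

    All memory values are in pages; maxpgio defaults to the article's 40.
    """
    sustained_shortage = secs_below_desfree > 30
    busy_run_queue = run_queue >= 2            # assumption: "at least two"
    severe_shortage = freemem < minfree
    paging_saturated = (page_ins + page_outs) > maxpgio
    return (sustained_shortage and busy_run_queue
            and (severe_shortage or paging_saturated))

# Healthy host from this article: plenty of free memory, empty run queue.
print(hard_swap_conditions_met(0, 0, freemem=243665, minfree=4079,
                               page_ins=0, page_outs=0))
```

Because all three conditions are joined with AND, a long memory shortage alone is not enough; the run queue and either minfree or maxpgio must also be exceeded.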

More often, we might see a light kind of swapping called soft swapping, which happens when the amount of free memory is below the desfree value.

We can check for soft swapping by executing the following command:

root@solaris11-1:~# echo "softswap/D" | mdb -k
softswap:
softswap:       0  

By way of introduction (more details would be beyond the scope of this article), the minfree value equals desfree/2, and the desfree value equals lotsfree/2. The following is the formula for calculating lotsfree:

lotsfree = (memory - kernel) / (64 * page size)

These values can be seen by running the following commands:

root@solaris11-1:~# prtconf | grep -i memory
Memory size: 4096 Megabytes

root@solaris11-1:~# echo lotsfree/D | mdb -k
lotsfree:
lotsfree:       16318  

root@solaris11-1:~# echo desfree/E | mdb -k
desfree:
desfree:        8159       
     
root@solaris11-1:~# echo minfree/D | mdb -k
minfree:
minfree:        4079        
    
root@solaris11-1:~# bc
16318 * 4096 * 64
4277665792
root@solaris11-1:~#

The best method for getting the values of lotsfree, desfree, and minfree is executing the following command:

root@solaris11-1:~# kstat -n system_pages
module: unix                            instance: 0     
name:   system_pages                    class:    pages
        availrmem                       409132
        crtime                          0
        desfree                         8159
        desscan                         25
        econtig                         4229439488
        fastscan                        522183
        freemem                         243665
        kernelbase                      0
        lotsfree                        16318
        minfree                         4079
        nalloc                          110633425
        nalloc_calls                    31285
        nfree                           107403292
        nfree_calls                     23611
        nscan                           0
        pagesfree                       243665
        pageslocked                     635234
        pagestotal                      1044366
        physmem                         1044366
        pp_kernel                       649290
        slowscan                        100
        snaptime                        26017.87927546

Furthermore, returning to the page scanning subject, there are different values for page scanning that happen at different times. For example, fastscan is the number of pages scanned per second when free memory is equal to zero, desscan is the scan rate goal during page scanning, and nscan is the number of pages scanned during the last page scan action. In this example, there is enough memory and there isn't any page scanning activity (nscan equals 0).
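
To illustrate how these scan values relate to each other, here is a rough sketch that assumes the scan rate rises linearly from slowscan (when free memory is just under lotsfree) to fastscan (when free memory reaches zero). The linear ramp and the function name scan_rate are assumptions made for illustration; the article itself only states the fastscan end point.

```python
def scan_rate(freemem, lotsfree, slowscan, fastscan):
    """Estimated pages scanned per second for a given amount of free memory.

    Assumes no scanning at or above lotsfree and a linear ramp below it.
    All memory values are in pages.
    """
    if freemem >= lotsfree:
        return 0
    freemem = max(freemem, 0)
    return (slowscan * freemem + fastscan * (lotsfree - freemem)) // lotsfree

# Values taken from the kstat output in this article.
print(scan_rate(243665, 16318, 100, 522183))  # current freemem on this host
print(scan_rate(0, 16318, 100, 522183))       # no free memory at all
```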

This same information from kstat can be collected by running the following commands:

root@solaris11-1:~# echo fastscan/E | mdb -k
fastscan:
fastscan:       522183          
root@solaris11-1:~# echo slowscan/E | mdb -k
slowscan:
slowscan:       100             
root@solaris11-1:~# echo desscan/E | mdb -k
desscan:
desscan:        25              
root@solaris11-1:~# echo nscan/E | mdb -k
nscan:
nscan:          0               

To monitor the swap space, we can check the past and the present (real time) swapping statistics by executing this command:

root@solaris11-1:~# vmstat 1
 kthr      memory            page            disk          faults      cpu
 r b w    swap    free  re  mf pi po fr de sr s0 s2 s3 s4   in   sy   cs us sy id
 0 0 0 2972960 2608516   3  18  0  0  0  0  3  0  0  0  0  659  480  723  1  4 95
 0 0 0 2895104 2544208  26  49  0  0  0  0  0  0  0  0  0  660  648  694  1  4 95
 0 0 0 2895104 2544056   0   2  0  0  0  0  0  0  0  0  0  690 1839  847  4  4 92

The important column for us is w, which shows the number of swapped-out threads. A nonzero value indicates memory pressure, probably caused by the amount of free memory dropping below minfree or staying below desfree for more than 30 seconds, which causes idle processes to be swapped out to the swap space.

The following command shows the real-time swap status:

root@solaris11-1:~# vmstat -S 1
 kthr      memory            page            disk          faults      cpu
 r b w   swap     free  si  so pi po fr de sr s0 s2 s3 s4   in   sy   cs us sy id
 0 0 0 2972572 2608200   0   0  0  0  0  0  3  0  0  0  0  659  480  723  1  4 95
 0 0 0 2895032 2544000   0   0  0  0  0  0  0  0  0  0  0  706  875  901  2  5 93
 0 0 0 2895032 2544000   0   0  0  0  0  0  0  0  0  0  0  615  511  671  1  3 96

Columns so and si represent swapped-out pages and swapped-in pages, respectively, in real time. Again, ideally both should be zero for good performance.

Adding or Removing Swap Space Using a ZFS Volume

Now that we know how to monitor the swap space, it's time to learn to add space and delete disk space that is allocated to the swap area. The Oracle Solaris 11 host we are using (solaris11-1) has the following file system-related components:

root@solaris11-1:~# zfs list -r rpool
NAME                              USED  AVAIL  REFER  MOUNTPOINT
rpool                            28.5G  49.7G  4.91M  /rpool
rpool/ROOT                       25.4G  49.7G    31K  legacy
rpool/ROOT/solaris               25.4G  49.7G  24.4G  /
rpool/ROOT/solaris-backup-1       138K  49.7G  24.2G  /
rpool/ROOT/solaris-backup-1/var    64K  49.7G   291M  /var
rpool/ROOT/solaris/var            486M  49.7G   234M  /var
rpool/VARSHARE                     92K  49.7G    92K  /var/share
rpool/dump                       2.06G  49.8G  2.00G  -
rpool/export                      805K  49.7G    32K  /export
rpool/export/home                 773K  49.7G    32K  /export/home
rpool/export/home/ale             741K  49.7G   741K  /export/home/ale
rpool/swap                       1.03G  49.7G  1.00G  -

The last line indicates that the swap space is 1 GB and that it's a ZFS volume. This information can be verified by executing the following:

root@solaris11-1:~# ls -l /dev/zvol/rdsk/rpool/swap
lrwxrwxrwx   1 root     root           0 Dec  2 06:31 /dev/zvol/rdsk/rpool/swap -> ../../../..//devices/pseudo/zfs@0:2,raw

Thus, it's feasible to change its size because the rpool has some free space and the swap volume belongs to the rpool storage pool:

root@solaris11-1:~# zfs get volsize rpool/swap
NAME        PROPERTY  VALUE  SOURCE
rpool/swap  volsize   1G     local

root@solaris11-1:~# zfs set volsize=2G rpool/swap
root@solaris11-1:~# zfs get volsize rpool/swap
NAME        PROPERTY  VALUE  SOURCE
rpool/swap  volsize   2G     local

root@solaris11-1:~# swap -l  
swapfile                  dev      swaplo    blocks     free
/dev/zvol/dsk/rpool/swap  285,2         8   2097144  2097144
/dev/zvol/dsk/rpool/swap  285,2   2097160   2097144  2097144

root@solaris11-1:~# swap -s
total: 451556k bytes allocated + 259888k reserved = 711444k used, 3886000k available

root@solaris11-1:~# zfs list -r rpool/swap
NAME         USED  AVAIL  REFER  MOUNTPOINT
rpool/swap  2.06G  48.7G  2.00G  -
root@solaris11-1:~#
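
The swaplo and blocks values in the swap -l listings of this section appear to follow a simple rule: each swap extent skips its first page (8 blocks of 512 bytes). The sketch below reproduces the numbers under that assumption; the rule is inferred from the output in this article, and the function names are illustrative.

```python
SECTOR_BYTES = 512
PAGE_BYTES = 4096
SKIP_BLOCKS = PAGE_BYTES // SECTOR_BYTES  # 8 blocks skipped per extent
GIB = 2**30

def new_volume_entry(volsize_bytes):
    """(swaplo, blocks) expected for a freshly added swap volume."""
    return SKIP_BLOCKS, volsize_bytes // SECTOR_BYTES - SKIP_BLOCKS

def grown_volume_entry(old_bytes, new_bytes):
    """(swaplo, blocks) expected for the extent added when a volume grows."""
    start = old_bytes // SECTOR_BYTES
    added = (new_bytes - old_bytes) // SECTOR_BYTES
    return start + SKIP_BLOCKS, added - SKIP_BLOCKS

print(new_volume_entry(1 * GIB))             # original 1 GB swap volume
print(grown_volume_entry(1 * GIB, 2 * GIB))  # extent added by volsize=2G
print(new_volume_entry(2 * GIB))             # a separate 2 GB volume
```

This also explains why growing an in-use volume shows up as a second line for the same device rather than as one larger entry.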

However, it is not always possible to change the properties of the swap space, because it could be busy. So sometimes it's necessary to add a second volume to the rpool storage pool and, afterwards, to insert a line at the end of /etc/vfstab so that this volume is activated as swap automatically at boot:

root@solaris11-1:~# zfs create -V 2G rpool/newswap
root@solaris11-1:~# swap -a /dev/zvol/dsk/rpool/newswap  
root@solaris11-1:~# swap -l
swapfile                    dev    swaplo   blocks     free
/dev/zvol/dsk/rpool/swap    285,2       8  2097144  2097144
/dev/zvol/dsk/rpool/swap    285,2 2097160  2097144  2097144
/dev/zvol/dsk/rpool/newswap 285,4       8  4194296  4194296

root@solaris11-1:~# swap -s
total: 453668k bytes allocated + 260304k reserved = 713972k used, 5962264k available

root@solaris11-1:~# zfs list -r rpool   
NAME                              USED  AVAIL  REFER  MOUNTPOINT
rpool                            31.6G  46.6G  4.91M  /rpool
rpool/ROOT                       25.4G  46.6G    31K  legacy
rpool/ROOT/solaris               25.4G  46.6G  24.4G  /
rpool/ROOT/solaris-backup-1       138K  46.6G  24.2G  /
rpool/ROOT/solaris-backup-1/var    64K  46.6G   291M  /var
rpool/ROOT/solaris/var            486M  46.6G   234M  /var
rpool/VARSHARE                     92K  46.6G    92K  /var/share
rpool/dump                       2.06G  46.7G  2.00G  -
rpool/export                      805K  46.6G    32K  /export
rpool/export/home                 773K  46.6G    32K  /export/home
rpool/export/home/ale             741K  46.6G   741K  /export/home/ale
rpool/newswap                    2.06G  46.7G  2.00G  -
rpool/swap                       2.06G  46.7G  2.00G  -

root@solaris11-1:~# more /etc/vfstab
#device                    device   mount             FS      fsck    mount     mount
#to mount                  to fsck  point             type    pass    at boot   options
#
/devices                    -       /devices          devfs   -       no        -
/proc                       -       /proc             proc    -       no        -
ctfs                        -       /system/contract  ctfs    -       no        -
objfs                       -       /system/object    objfs   -       no        -
sharefs                     -       /etc/dfs/sharetab sharefs -       no        -
fd                          -       /dev/fd           fd      -       no        -
swap                        -       /tmp              tmpfs   -       yes       -

/dev/zvol/dsk/rpool/swap    -       -                 swap    -       no        -
/dev/zvol/dsk/rpool/newswap -       -                 swap    -       no        -
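
For scripted setups, the /etc/vfstab line for a swap volume can be generated rather than typed by hand. This is a small sketch that builds the seven fields in the same order as the entries above; vfstab_swap_entry is a hypothetical helper, and the result still has to be appended to /etc/vfstab by an administrator.

```python
def vfstab_swap_entry(volume, pool="rpool"):
    """Build a tab-separated /etc/vfstab line for a ZFS swap volume."""
    device = "/dev/zvol/dsk/%s/%s" % (pool, volume)
    # Fields: device to mount, device to fsck, mount point, FS type,
    #         fsck pass, mount at boot, mount options
    fields = [device, "-", "-", "swap", "-", "no", "-"]
    return "\t".join(fields)

print(vfstab_swap_entry("newswap"))
```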

Obviously, the process of removing swap space is the reverse. For example, the following command is executed and then the last line in the /etc/vfstab file is deleted:

root@solaris11-1:~# swap -d /dev/zvol/dsk/rpool/newswap

About the Author

Alexandre Borges is an Oracle ACE who worked as an employee and contracted instructor at Sun Microsystems from 2001 to 2010, teaching Oracle Solaris, Oracle Solaris Cluster, Oracle Solaris security, Java EE, Sun hardware, and MySQL courses. Nowadays, he teaches classes for Symantec, Oracle partners, Hitachi, and EC-Council, and he teaches several very specialized classes about information security. In addition, he is a regular writer and columnist at Linux Magazine Brazil.

Revision 1.0, 04/09/2014
