Best Practices for Oracle Solaris Kernel Zones

Oracle Solaris Kernel Zones provide all the flexibility, scalability, and efficiency of Oracle Solaris Zones while adding the capability to have zones that have independent kernels. This article discusses best practices for deploying kernel zones.

by John Levon
Published January 2015 (updated in October 2015 and July 2018)

Oracle Solaris Kernel Zones, available as of Oracle Solaris 11.2, are the newest type of Oracle Solaris Zone. Kernel zones provide all the flexibility, scalability, and efficiency of Oracle Solaris Zones while adding the capability to have zones that have independent kernels. This capability is highly useful when you are trying to coordinate the updating of multiple zone environments that belong to different owners.

With kernel zones, the updates can be done at the level of an individual kernel zone at a time that is convenient for each owner. In addition, applications that have specific version requirements can run side by side on the same system and benefit from the high consolidation ratios that Oracle Solaris Zones provide.

This article discusses general best practices for deploying kernel zones. It is worth noting that the best configuration for any particular environment is dependent on how the kernel zones will be used, and this article should be used as a starting point. For example, you might wish to dedicate specific resources to a kernel zone to optimize for repeatable performance. In this case, you wouldn't want to share CPU resources with other workloads on the system by using the default virtual-cpu resource used below.

Host Configuration

In general, you want to reserve your host for running zones rather than running applications directly in the global zone. It's OK to host different zone types on a single host, however.

When using kernel zones, it is necessary to provide a hint to the system about application memory usage. This information is used to limit the growth of the ZFS Adaptive Replacement Cache (ARC) so that more memory stays available for applications and, in this case, for the kernel zones themselves.

You provide this hint by setting the value of the user_reserve_hint_pct parameter. A script is provided for doing this. For example, the following command sets the value to 80.

root@global:~# ./ -f 80
Adjusting user_reserve_hint_pct from 0 to 80
Adjustment of user_reserve_hint_pct to 80 successful.

You can find this script and more information by visiting the My Oracle Support website and then accessing Doc ID 1663862.1.

Note that setting this value can affect ZFS performance on the host, because the host will no longer have as much cache. You might need to tweak this value if you want to use native zones or kernel zones with ZFS volume storage (both of which use the host ZFS driver).

Memory Configuration

The default solaris-kz memory size (capped-memory:physical) is 4 GB, which is a reasonable starting value, but you will likely need to adjust it for your application's workload. The SYSsolaris-kz-minimal template uses the minimum size, 2 GB, which is not usually suitable.

The RAM size should also depend on the number of CPUs configured for the kernel zone. If a large number of CPUs is used, allocate more memory, or the zone might have trouble allocating the kernel memory it needs for those CPUs. Consider an additional 2 GB for every additional 32 CPUs.
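As a rough illustration of the sizing rule, the following sketch computes a starting memory size from a CPU count. The function name kz_suggest_mem_gb is hypothetical, and the reading of the rule (4 GB base plus 2 GB for each full group of 32 CPUs) is an assumption; treat the result as a starting point, not a measured requirement.

```shell
#!/bin/sh
# Hypothetical helper: suggest a capped-memory:physical value (in GB)
# for a kernel zone with the given number of CPUs.
# Assumption: start from the 4 GB default and add 2 GB for each
# full group of 32 CPUs; the rule of thumb is only approximate.
kz_suggest_mem_gb() {
    ncpus=$1
    base_gb=4
    extra_gb=$(( (ncpus / 32) * 2 ))
    echo $(( base_gb + extra_gb ))
}

kz_suggest_mem_gb 4     # prints 4
kz_suggest_mem_gb 64    # prints 8
kz_suggest_mem_gb 128   # prints 12

# The chosen value could then be applied on the host, for example:
#   zonecfg -z kzone "select capped-memory; set physical=8g; end"
```

Here "kzone" is a placeholder zone name; adjust the final value upward if the application's own working set requires it.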

CPU Configuration

With the default template, kernel zones are given 4 virtual CPUs (that is, four threads of execution).

Note that due to CPU architecture, there are significant differences between x86 and SPARC platforms. A single SPARC "CPU" is not directly comparable in terms of execution performance to a single x86 "CPU." It's usually better to think in terms of CPU cores, as described below.

As with native zones, there are two basic choices: You can either dedicate host CPUs to the kernel zone, or you can share host CPUs with other zones.

Dedicated CPUs

For best performance, use the dedicated-cpu resource to dedicate host CPUs to the kernel zone. Then, only the kernel zone will run on those CPUs.

Each of the zone's CPU threads can be scheduled to run on any of the physical CPUs inside the CPU set, and may move between physical CPUs as the guest is running. However, guest CPUs are core-bound; they will never move between two physical cores.

It's usually best to assign CPUs in core-sized chunks (see psrinfo -vp on the host).

By default, setting dedicated-cpu:ncpus does not provide any control over which host CPUs are allocated. This can lead to inconsistent results across host reboots. It's therefore recommended to use dedicated-cpu:cpus to specify the exact CPUs to use.
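A minimal sketch of pinning a kernel zone to specific host CPUs follows. The zone name kzone and the CPU IDs 8-15 are placeholders chosen for illustration; on a real host, pick IDs that make up whole cores.

```shell
# Sketch only: "kzone" and the CPU range 8-15 are placeholder values.
# Choose IDs that cover whole cores on your host (see psrinfo -vp).
zonecfg -z kzone "add dedicated-cpu; set cpus=8-15; end"

# Review the resulting resource.
zonecfg -z kzone info dedicated-cpu
```

Because the CPU IDs are named explicitly, the zone should receive the same host CPUs after every host reboot.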

Latency Group Configuration

You might also want to specify CPUs from a particular latency group. For example, networking performance can sometimes be improved by using the same latency group as the underlying network device:

root# dladm show-phys
LINK              MEDIA                STATE      SPEED  DUPLEX    DEVICE
net8              Ethernet             up         10000  full      ixgbe0
root# lgrpinfo -d /dev/net/net8
lgroup ID : 1
root# lgrpinfo -G -c 1
lgroup 1 (leaf):
        CPUs: 0-9 40-49

To create an automatic network resource (anet) on top of the host's net8 datalink, you might want to allocate CPUs in the 0–9 or 40–49 range.

Virtual CPUs

Setting the virtual-cpu resource specifies the number of virtualized CPUs that the guest kernel sees. On the host, the virtualized CPUs behave similarly to an operating-system thread. They share CPU time with other zones' threads, including those of the global zone. This is the best option for maximizing consolidation, but it can have an effect on performance.
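For example, the number of virtual CPUs could be changed as sketched below. The zone name kzone and the value 8 are placeholders, and the sketch assumes the zone already has a virtual-cpu resource from its template.

```shell
# Sketch only: "kzone" and ncpus=8 are placeholder values.
# Assumes the zone's template already defines a virtual-cpu resource;
# if it does not, use "add virtual-cpu" in place of "select virtual-cpu".
zonecfg -z kzone "select virtual-cpu; set ncpus=8; end"

# Review the resulting resource.
zonecfg -z kzone info virtual-cpu
```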

Networking Configuration

It's recommended to use an anet with kernel zones. As with any zone setup, be sure to configure the anet's security settings appropriately (including allowed-address and link-protection).
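The following sketch shows one way these anet properties might be set. The zone name kzone, the anet id of 0, and the address 192.0.2.10/24 (a documentation address) are all placeholders; consult the zonecfg man page for the values that apply to your release.

```shell
# Sketch only: "kzone", id=0, and the address are placeholder values.
# allowed-address restricts the IP address the zone may use on this anet;
# link-protection enables datalink protection modes such as mac-nospoof.
zonecfg -z kzone "select anet id=0; set allowed-address=192.0.2.10/24; set link-protection=mac-nospoof; end"

# Review the resulting settings.
zonecfg -z kzone info anet
```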

You might want to configure the kernel zone to use the same latency group the networking devices are using (see above).

Storage Configuration

By default, a kernel zone installation will use a 16 GB ZFS volume for the boot disk. If a 16 GB root pool is not big enough, you can specify a different size at installation time by using the following command:

# zoneadm -z kzone install -x install-size=32g

However, ZFS volumes are not as performant as "real" disks. They are also not portable across systems.

Instead, it's often desirable to specify real disk devices. Such a device could be a directly attached LUN on a host, or, if portability between different hosts is required, it could, for example, be an iSCSI disk. Use a "zones on shared storage" (ZOSS) URI to specify such shared storage, for example:

# zonecfg -z kzone info device id=0
match not specified
storage: iscsi://iscsi.server/luname.naa.600144F0CBF8AF19000051B06914000A
id: 0
bootpri: 0

If you are adding storage to a kernel zone that will not become part of the root pool, be sure to clear the bootpri property.
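As an illustration, a data disk might be added and marked non-bootable as sketched below. The zone name kzone, the device id of 1, and the storage URI are placeholders; the LUN identifier must be replaced with a real one.

```shell
# Sketch only: all names below are placeholder values.
STORAGE_URI="iscsi://iscsi.server/luname.naa.REPLACE_WITH_LUN_ID"

# Add the disk as a new device resource.
zonecfg -z kzone "add device; set storage=$STORAGE_URI; end"

# Assuming the new device was assigned id 1, make sure it is not
# treated as a boot device.
zonecfg -z kzone "select device id=1; clear bootpri; end"

# Confirm the id and bootpri values that were actually assigned.
zonecfg -z kzone info device
```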

Suspend Image Storage

If you're configuring a suspend resource to allow suspend/resume capability for your kernel zone, make sure that the storage you're configuring is large enough for the maximum memory size of the kernel zone, plus a few megabytes.
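The sizing rule can be sketched as simple arithmetic. The function name kz_suspend_min_mb is hypothetical, and the 64 MB default headroom is an assumed stand-in for "a few megabytes", not a documented figure.

```shell
#!/bin/sh
# Hypothetical helper: minimum suspend storage (in MB) for a kernel zone.
# Arguments: zone memory in GB, optional headroom in MB.
# Assumption: 64 MB of headroom stands in for "a few megabytes".
kz_suspend_min_mb() {
    mem_gb=$1
    headroom_mb=${2:-64}
    echo $(( mem_gb * 1024 + headroom_mb ))
}

kz_suspend_min_mb 4        # prints 4160
kz_suspend_min_mb 16 128   # prints 16512
```

Size the storage for the largest memory setting the zone is ever expected to have, not only its current value.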

If you're storing on the host system, use an area shared by all boot environments; for example:

zonecfg:kzone> select suspend
zonecfg:kzone:suspend> set path=/export/zones/%{zonename}/suspend.img
zonecfg:kzone:suspend> end
zonecfg:kzone> exit
# mkdir -p /export/zones/kzone/

To support warm migration between hosts, the suspend storage should be accessible to both hosts under the same path. For example, you can use a ZOSS URI as shown above, or you can specify an NFS path that is identical on both host machines.

Live Migration

As a reminder, to migrate a kernel zone, its configuration must be applicable to both the source and the destination host. For example, all the zone's allocated storage must use a network-accessible method, such as a ZOSS URI, or NFS configured from within the zone itself. If possible, values such as anet:lower-link should use "auto" instead of specifying a particular link name, as that name might not apply on different systems. (If required, specific configuration can be set on each machine ahead of the actual migration.)

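To allow a non-root user to perform migrations, grant that user the Zone Migration rights profile. For example, for a user named zonemig:

# usermod -P +"Zone Migration" -A zonemig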

This will allow zonemig to run pfexec zoneadm migrate. If the configuration doesn't exist on the destination host, and you want the user to be able to define it, then you'll need to add the .config authorization:

# usermod -P +"Zone Configuration" -A zonemig2

To live migrate a zone, a RAD connection to the destination host is required. It's recommended to use an ssh RAD URI, in which case ssh must be configured in a passwordless mode. That is, a simple "ssh destination" (as the zonemig user) should not require interaction. This can be achieved with user public-key authentication and, optionally, ssh-agent, as described in "System Administration Guide: Security Services".
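A sketch of the sequence, run as the migrating user on the source host, might look like the following. The host name destination and the zone name kzone are placeholders, and the exact migrate options available depend on the Oracle Solaris release, so check the zoneadm man page before relying on them.

```shell
# Sketch only: "destination" and "kzone" are placeholder names.

# Confirm that ssh works without prompting. BatchMode makes ssh fail
# instead of asking for a password or passphrase.
ssh -o BatchMode=yes zonemig@destination true && echo "passwordless ssh OK"

# Start the live migration over an ssh RAD URI.
pfexec zoneadm -z kzone migrate ssh://zonemig@destination
```

If the first command prompts or fails, fix the key setup before attempting the migration.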

Both hosts for live migration should have accurate time set via a network service such as NTP; this is needed to maintain proper time in the zone itself across the migration.

Migration is bandwidth-hungry; the available bandwidth directly affects both the total migration time and the "blackout" time during which the zone is momentarily not executing. At least a 10 Gbit link is recommended; the more the better. If minimizing downtime is critical, consider limiting other traffic on that link (including any simultaneous migrations).
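To get a feel for the effect of bandwidth, the sketch below estimates a lower bound on the time needed to copy a zone's memory once across the link. The function name kz_min_transfer_secs is hypothetical, and the model is a deliberate simplification: it assumes decimal units, a fully available link, and no memory being re-sent, so real migrations will take longer.

```shell
#!/bin/sh
# Hypothetical helper: lower bound (whole seconds, rounded up) on the time
# needed to copy a zone's memory once over the migration link.
# Arguments: zone memory in GB, link speed in Gbit/s.
# Assumptions: 1 GB = 8 Gbit, the link is fully available, and no
# memory pages need to be sent more than once.
kz_min_transfer_secs() {
    mem_gb=$1
    link_gbit=$2
    gbits=$(( mem_gb * 8 ))
    echo $(( (gbits + link_gbit - 1) / link_gbit ))
}

kz_min_transfer_secs 32 10   # prints 26
kz_min_transfer_secs 32 1    # prints 256
kz_min_transfer_secs 64 40   # prints 13
```

Under these assumptions, a 32 GB zone needs roughly ten times longer on a 1 Gbit link than on a 10 Gbit link, which illustrates why the faster link is recommended.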

Live migration can be costly in terms of CPU time, especially for larger zones. The CPU time is taken from the zone's allocation, so if you're using dedicated-cpu, or other ways to limit CPU for the zone, be sure to allocate enough headroom both for the zone's workload and the live migration process.

About the Author

John Levon is a principal software engineer in the Oracle Solaris Virtualization group. He is based in the UK, and has worked on Oracle Solaris development for over a decade.

Revision 1.2, 07/02/2018
Revision 1.1, 10/09/2015
Revision 1.0, 01/05/2015