by Venkat Chennuru
Published January 2013 (updated January 2013)
The configuration described in this article enables the protection of guest domains from planned and unplanned downtime by automating the failover of a guest domain through restart on an alternate cluster node. Automated failover provides protection in case there is a component outage or the guest domain needs to be migrated for preventive maintenance.
Oracle Solaris Cluster delivers two different solutions to protect Oracle VM Server for SPARC deployments (formerly known as Sun Logical Domains or LDoms).
First, it is possible to use Oracle VM Server for SPARC domains as cluster nodes. This configuration is similar to traditional "physical" server clusters, but Oracle Solaris Cluster is installed in a virtual machine (VM) domain (a control, guest, or I/O domain). Applications running in that domain can be monitored through built-in or custom cluster agents, and they are moved over to another domain when the domain or the server fails, or on demand.
In this article, we will be discussing a second possibility. When installed in the server control domain, Oracle Solaris Cluster can manage not only applications but also guest domains as Oracle Solaris Cluster resources. In this case, a specific agent, called the Oracle Solaris Cluster Data Service for Oracle VM Server for SPARC, is used. This high availability (HA) agent can control and manage a guest domain as a "black box." It can fail over the guest domain in case of failure, but it can also use the domain migration procedures (live migration or warm migration) to operate a managed switchover.
The instructions in this article provide details on how to set up a guest domain under Oracle Solaris Cluster control. As a prerequisite, you will need to first install a two-node cluster on two SPARC T-Series systems from Oracle, using the two control domains as the cluster nodes.
Oracle VM Server for SPARC provides the ability to split a single physical system into multiple, independent virtual systems. This is achieved by an additional software application in the firmware layer—called the hypervisor—that is interposed between the operating system and the hardware platform. The hypervisor abstracts the hardware and can expose or hide various resources, allowing for the creation of resource partitions that can operate as discrete systems, complete with virtual CPU, memory, and I/O devices. The administrative operations to create and manage the VM domain are done in the control domain via the Logical Domains Manager interface.
Control domains need to be configured as Oracle Solaris Cluster nodes before they can host a failover guest domain service. The virtual services configuration must be identical on all the potential primary nodes. The guest domain that will be put under Oracle Solaris Cluster control can be created on any one of the nodes of the cluster. Once the domain is created, the domain configuration is retrieved by the ldm list-constraints -x ldom command and stored in the Cluster Configuration Repository (CCR), which is accessible from all cluster nodes. This globally accessible information is used by the Oracle Solaris Cluster Data Service for Oracle VM Server for SPARC to create or destroy the domain on the node where the resource group is brought online or offline, respectively.
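As a sketch of the mechanism described above, a domain's constraints can be exported to XML and replayed on another node that offers identical virtual services. The file path below is illustrative; this only approximates what the data service automates through the CCR.

```shell
# Export the configuration (constraints) of guest domain ldg1 as XML.
# This is the same kind of description the data service stores in the CCR.
ldm list-constraints -x ldg1 > /var/tmp/ldg1.xml

# On another node with identical virtual services, the domain can be
# re-created from that XML description.
ldm add-domain -i /var/tmp/ldg1.xml
```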
The Oracle Solaris Cluster Data Service for Oracle VM Server for SPARC provides a mechanism for orderly startup and shutdown, fault monitoring, and automatic failover of the Oracle VM Server for SPARC guest domain. When a guest domain under Oracle Solaris Cluster control needs to be relocated to another cluster node, the data service tries live migration of the guest domain first; if that fails for any reason, it resorts to normal migration. The live migration feature requires that the boot disk be accessible from the current primary node and the new primary node simultaneously.
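For reference, a comparable live migration can be triggered manually outside of cluster control with the ldm migrate-domain subcommand (the target host name here is illustrative):

```shell
# Live-migrate guest domain ldg1 to the control domain on phys-schost-2;
# the command prompts for credentials on the target machine.
ldm migrate-domain ldg1 phys-schost-2
```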
This section discusses several prerequisites, configuration assumptions, and preinstallation checks for the two-node cluster.
Install the two-node cluster. For help, see the article "How to Install and Configure a Two-Node Cluster" and the Oracle Solaris Cluster Software Installation Guide.
This article assumes the following conditions are met:
Figure 1. Oracle Solaris Cluster Hardware Configuration
For more information on the various topologies supported, please refer to the Oracle Solaris Cluster Concepts Guide.
Ensure the following criteria are met:
You can use /usr/cluster/bin/scdidadm -L or /usr/cluster/bin/cldevice list to see the shared disks. Each cluster node has a path to the shared disk, as shown in Listing 1.
root@phys-schost-1:~# /usr/cluster/bin/cldevice show d3

=== DID Device Instances ===

DID Device Name:        /dev/did/rdsk/d3
  Full Device Path:       phys-schost-2:/dev/rdsk/c0t60080E500017B52C00002B9D4EB40DB5d0
  Full Device Path:       phys-schost-1:/dev/rdsk/c0t60080E500017B52C00002B9D4EB40DB5d0
  Replication:            none
  default_fencing:        global

root@phys-schost-1:~#
Listing 1. Checking the Shared Disk
In this section, you perform two tasks: preparing the file system and preparing the domain configurations.
In a failover configuration, the logical domain's boot disk must be on a global file system, a network file system (NFS), or a raw shared disk. The boot disk must be accessible from all potential primary nodes simultaneously for live migration to work.
The example in this article uses a global file system mounted from a UFS file system created on an Oracle Solaris Volume Manager metadevice.
Oracle Solaris Cluster provides a specific service to manage a global file system or shared disk: the SUNW.HAStoragePlus service. We will be using an HAStoragePlus resource to manage the global file system used in this configuration.
phys-schost-1# /usr/cluster/bin/clrt register SUNW.HAStoragePlus
phys-schost-1# /usr/cluster/bin/clrg create ldom-rg
phys-schost-1# metaset -s bootdiskset -a -h phys-schost-1 phys-schost-2
phys-schost-1# metaset -s bootdiskset -a /dev/did/dsk/dX
phys-schost-1# metainit -s bootdiskset d0 1 1 /dev/did/dsk/dXs0
phys-schost-1# newfs -v /dev/md/bootdiskset/dsk/d0
Add the following entry to /etc/vfstab on both nodes, because you are using a global file system:

/dev/md/bootdiskset/dsk/d0 /dev/md/bootdiskset/rdsk/d0 /global/ldom ufs 2 no global
The global file system will be used to host the boot disk.
phys-schost-1# /usr/cluster/bin/clrs create -g ldom-rg -t SUNW.HAStoragePlus \
-p FilesystemMountPoints=/global/ldom ldom-hasp-rs
phys-schost-1# /usr/cluster/bin/clrg online -emM ldom-rg
The master domain's failure policy is controlled by setting one of the following values for the failure-policy property:

ignore: ignores any slave domains when the master domain fails.
panic: panics any slave domains when the master domain fails.
reset: resets any slave domains when the master domain fails.
stop: stops any slave domains when the master domain fails.
# ldm set-domain failure-policy=reset primary
# ldm list -o domain primary
The virtual service names must be exactly the same on both cluster nodes, because they are referenced in the guest domain's configuration.
ldm add-vds primary-vds0 primary
ldm add-vconscon port-range=5000-5100 primary-vcc0 primary
ldm add-vsw net-dev=net0 primary-vsw0 primary
ldm add-vdsdev bootdisk-path ldg1-boot@primary-vds0
The bootdisk-path depends on whether the boot disk is a raw disk or a file-backed virtual disk on a global file system. If it is a raw disk, it should be specified as /dev/global/dsk/dXs2. In this example, we are going to use a global file system and, hence, a file-backed virtual disk.
mkfile 20g /global/ldom/ldg1-boot
ldm add-vdsdev /global/ldom/ldg1-boot ldg1-boot@primary-vds0
In the ldm list-services output, the ldg1-boot and dvd disk services should match on both nodes, because they are used by the guest domain when it is brought online.

Check the services on phys-schost-1, as shown in Listing 2:
phys-schost-1# ldm list-services primary
VCC
    NAME             LDOM             PORT-RANGE
    primary-vcc0     primary          5000-5100

VSW
    NAME             LDOM             MAC               NET-DEV   ID   DEVICE     LINKPROP   DEFAULT-VLAN-ID   PVID   VID   MTU    MODE   INTER-VNET-LINK
    primary-vsw0     primary          00:14:4f:f9:5c:1a net0      0    switch@0              1                 1            1500          on

VDS
    NAME             LDOM             VOLUME      OPTIONS   MPGROUP   DEVICE
    primary-vds0     primary          ldg1-boot                       /global/ldom/ldg1-boot
                                      dvd                             /var/tmp/sol-11_1-20-text-sparc.iso
phys-schost-1#
Listing 2. Checking the Services on phys-schost-1

Check the services on phys-schost-2, as shown in Listing 3:
phys-schost-2# ldm list-services primary
VCC
    NAME             LDOM             PORT-RANGE
    primary-vcc0     primary          5000-5100

VSW
    NAME             LDOM             MAC               NET-DEV   ID   DEVICE     LINKPROP   DEFAULT-VLAN-ID   PVID   VID   MTU    MODE   INTER-VNET-LINK
    primary-vsw0     primary          00:14:4f:fb:02:5c net0      0    switch@0              1                 1            1500          on

VDS
    NAME             LDOM             VOLUME      OPTIONS   MPGROUP   DEVICE
    primary-vds0     primary          ldg1-boot                       /global/ldom/ldg1-boot
                                      dvd                             /var/tmp/sol-11_1-20-text-sparc.iso
phys-schost-2#
Listing 3. Checking the Services on phys-schost-2
The guest logical domain in a failover configuration must be configured on only one node. When the HA Oracle VM Server for SPARC resource is created, the configuration is stored in the CCR. When the logical domain resource comes online, it creates the logical domain on the node where it is coming online, and it then starts and boots the logical domain.
phys-schost-1# ldm add-domain ldg1
phys-schost-1# ldm set-vcpu 32 ldg1
phys-schost-1# ldm set-mem 8g ldg1
phys-schost-1# ldm add-vdisk bootdisk ldg1-boot@primary-vds0 ldg1
phys-schost-1# ldm add-vdisk dvd dvd@primary-vds0 ldg1
phys-schost-1# ldm add-vnet vnet0 primary-vsw0 ldg1
If there is a mix of architectures in the cluster setup, change the cpu-arch property to generic for the guest domain.
phys-schost-1# ldm set-domain cpu-arch=generic ldg1
The guest domain, ldg1, should be installed and started before placing the domain under Oracle Solaris Cluster control:
ldm bind ldg1
ldm start ldg1
Connect to the guest domain's console and boot the installation DVD:

telnet 0 5000
ok boot dvd
phys-schost-2# ldm ls -l ldg1
NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  UPTIME
ldg1             active     -n----  5000    32    8G       0.0%  4d 17h 17m

SOFTSTATE
Solaris running

UUID
    9fbee96f-3896-c224-e384-cb24ed9650e1

MAC
    00:14:4f:fb:4d:49

HOSTID
    0x84fb4d49

CONTROL
    failure-policy=ignore
    extended-mapin-space=off
    cpu-arch=generic

DEPENDENCY
    master=primary

CORE
    CID    CPUSET
    4      (32, 33, 34, 35, 36, 37, 38, 39)
    5      (40, 41, 42, 43, 44, 45, 46, 47)
    6      (48, 49, 50, 51, 52, 53, 54, 55)
    7      (56, 57, 58, 59, 60, 61, 62, 63)

VCPU
    VID    PID    CID    UTIL   STRAND
    0      32     4      0.3%   100%
    1      33     4      0.0%   100%
    2      34     4      0.0%   100%
    3      35     4      0.0%   100%
    4      36     4      0.0%   100%
    5      37     4      0.0%   100%
    6      38     4      0.0%   100%
    7      39     4      0.0%   100%
    8      40     5      0.0%   100%
    9      41     5      1.2%   100%
    10     42     5      0.0%   100%
    11     43     5      0.0%   100%
    12     44     5      0.0%   100%
    13     45     5      0.0%   100%
    14     46     5      0.1%   100%
    15     47     5      0.0%   100%
    16     48     6      0.0%   100%
    17     49     6      0.0%   100%
    18     50     6      0.0%   100%
    19     51     6      0.0%   100%
    20     52     6      0.0%   100%
    21     53     6      0.0%   100%
    22     54     6      0.0%   100%
    23     55     6      0.0%   100%
    24     56     7      0.0%   100%
    25     57     7      0.0%   100%
    26     58     7      0.0%   100%
    27     59     7      0.0%   100%
    28     60     7      0.0%   100%
    29     61     7      0.0%   100%
    30     62     7      0.0%   100%
    31     63     7      0.0%   100%

MEMORY
    RA               PA               SIZE
    0x10000000       0x200000000      256M
    0x400000000      0x220000000      7680M
    0x800000000      0x840000000      256M

CONSTRAINT
    threading=max-throughput

VARIABLES
    auto-boot?=false

NETWORK
    NAME    SERVICE                ID   DEVICE      MAC                MODE   PVID   VID   MTU    LINKPROP
    vnet0   primary-vsw0@primary   0    network@0   00:14:4f:fa:31:6c         1            1500

DISK
    NAME       VOLUME                   TOUT   ID   DEVICE   SERVER    MPGROUP
    bootdisk   ldg1-boot@primary-vds0          0    disk@0   primary
    dvd        dvd@primary-vds0                1    disk@1   primary

VCONS
    NAME    SERVICE                PORT   LOGGING
    ldg1    primary-vcc0@primary   5000   on
phys-schost-2#
phys-schost-2# ls -ld /var/tmp/passwd
-r--------   1 root     root           7 Jul 26 13:36 /var/tmp/passwd
Listing 4. Installation Procedure
Set the master property for the guest domain. The master property needs to be set to primary, so that if the primary node panics or reboots, the guest logical domain will be rebooted.
Each slave domain can specify up to four master domains by setting the master property.
phys-schost-1# ldm set-domain master=primary ldg1
phys-schost-1# ldm list -o domain ldg1
Each master domain can specify what happens to its slave domains in the event that the master domain fails. For instance, if a master domain fails, it might require its slave domains to panic. If a slave domain has more than one master domain, the first master domain to fail triggers its defined failure policy on all of its slave domains.
phys-schost-1 # /usr/cluster/bin/clrt register SUNW.ldom
Create a resource to place the ldg1 domain under the control of the data service:
phys-schost-1# /usr/cluster/bin/clrs create -g ldom-rg -t SUNW.ldom \
-p Domain_name=ldg1 -p Password_file=/global/ldom/pass \
-p Plugin_probe="/opt/SUNWscxvm/bin/ppkssh -P \
user1:/home/user1/.ssh/id_dsa:ldg1:multi-user-server:online" ldom-rs
The password file should contain the root password of the cluster nodes. For security reasons, this file should be owned by root, with read permission for root only. The -p Plugin_probe option in the command above runs the probe through ssh, as shown below.
ssh -i /home/user1/.ssh/id_dsa -l user1 ldg1 svcs -H -o state multi-user-server:default
Here, ldg1 is the host name of the domain ldg1. If the host name and the domain name are different, you need to supply the host name of the domain.
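As a minimal sketch of the recommended password-file permissions, using /tmp/pass as a stand-in for the /global/ldom/pass path from the example above:

```shell
# Create the password file with no group/other access from the start.
umask 077
printf 'root-password-here\n' > /tmp/pass   # placeholder, not a real password

# Restrict it to read-only for the owner; on the cluster it would be
# created by root so that only root can read it.
chmod 400 /tmp/pass

ls -l /tmp/pass
```

The permissions column of the ls output should read -r--------, matching the /var/tmp/passwd file shown at the end of Listing 4.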
phys-schost-2# /usr/cluster/bin/clrg status ldg1-rg

=== Cluster Resource Groups ===

Group Name     Node Name       Suspended   Status
----------     ---------       ---------   ------
ldg1-rg        phys-schost-1   No          Offline
               phys-schost-2   No          Online

phys-schost-2# /usr/cluster/bin/clrs status ldg1-rs

=== Cluster Resources ===

Resource Name   Node Name       State     Status Message
-------------   ---------       -----     --------------
ldg1-rs         phys-schost-1   Offline   Offline - Successfully stopped ldg1
                phys-schost-2   Online    Online - ldg1 is active (normal)
Listing 5. Status of Cluster Resources and Resource Groups
Log in to the failover guest domain through ssh or telnet, according to the configuration, and check the uptime:
ssh -l username hostname-of-failover-guest-domain w
The Oracle Solaris Cluster Data Service for Oracle VM Server for SPARC performs live migration, which ensures that the telnet connection survives the switchover.
phys-schost-1 # clrg switch -n phys-schost-2 ldom-rg
Open an ssh session to the failover domain's host name to check the uptime and verify that the guest domain did not reboot but was live migrated.
phys-schost-2 # clrg switch -n phys-schost-1 ldom-rg
Open an ssh session to the failover domain's host name to verify that the guest domain is alive.
This article described how to configure a failover guest domain using a two-node cluster with a global file system. It also explained how to verify that the cluster is behaving correctly by switching over the failover guest domain from the primary node to the secondary node and back.
For more information on configuring Oracle Solaris Cluster components, see the following resources.
Venkat Chennuru has been working as a quality lead in the Oracle Solaris Cluster group for the last 12 years. Prior to that, Venkat worked as a system administrator at US West Communications in Minneapolis, Minnesota, and before that as a system and network administrator at Intergraph India Pvt. Ltd. in Hyderabad, India. Venkat holds a special diploma in Electronics and Communications with a specialization in Industrial Electronics and Instrumentation.
|Revision 1.2, 01/23/2013|