by Jon Senger, Aik Zu Shyong, and Suzanne Zorn
Switching the underlying operating system on a single server is not trivial. Neither is dealing with the related conversion and compatibility issues. Imagine what's involved in switching the operating system on thousands of servers spread globally across an enterprise, like Dell just did.
In June of 2010, Dell made the decision to migrate 1,700 systems from SUSE Linux to Oracle Linux, while leaving the hardware and application layers unchanged. Standardization across the Linux platforms helped make this large-scale conversion possible. The majority of the site-specific operating system and application configuration could simply be backed up and restored directly on the new operating system. Configuration changes were minimal and most could be automated, easing the administration effort required and helping achieve a reliable and consistent transition procedure.
Dell had approximately 1,700 physical systems running SUSE Linux at the start of this migration process. These systems, geographically dispersed around the world, used a mix of eighth-generation (Dell PowerEdge 2850 and 2950 servers) and newer Dell hardware. Fibre Channel SAN storage comprised EMC Symmetrix and CLARiiON devices. The software environment included SUSE Linux 10 Service Pack 1 with multipath I/O (MPIO), Oracle Database 10g Release 2, Oracle Real Application Clusters (Oracle RAC), and Oracle Automatic Storage Management, as shown in Figure 1.
Figure 1. Dell's Deployment Environment Before and After Migration to Oracle Linux
The migration primarily involved moving the operating system from SUSE Linux 10 to Oracle Linux 5.5. The same physical server and storage hardware were retained during the migration. Similarly, the Oracle software remained unchanged after the migration. An additional change for Dell was a switch from SUSE Linux's built-in multipath I/O support to EMC PowerPath for automated data path management. (Note: the actual conversion from MPIO to PowerPath is tangential to the operating system migration and is beyond the scope of this document.)
This migration process also served as a time for Dell to re-evaluate its servers running SUSE Linux to determine whether the applications running on these servers could be decommissioned or deployed on an existing MegaGrid environment instead. Dell uses 16-node racks, each capable of hosting 300 databases, for their MegaGrid deployment. In some cases, there was sufficient capacity on an existing MegaGrid infrastructure, and the applications and databases could be migrated to the grid and the SUSE Linux server powered down and decommissioned. This consolidation provided savings in power, cooling, and reduced space requirements. In other cases where consolidation wasn't feasible, the SUSE Linux system was migrated to Oracle Linux, using the processes described in this document.
Given the scale of the migration, planning and automation were essential to the project's success. Aik Zu Shyong, in Core Engineering at Dell, reflects: "We put significant focus on engineering the operating system conversion to make sure we could deliver a simple, reliable, and repeatable automated process. Additionally, by designing the migration to be done in-place instead of using a much slower and cost-prohibitive replacement method, we were able to further reduce downtime and save data center space."
Dell's migration process included three main steps: preparation, reimaging the operating system, and postinstallation configuration. First, in the preparation step, Dell saved the existing environment's configuration and safely shut down the applications and database. Next, they reimaged the operating system from SUSE Linux 10 to Oracle Linux 5.5. After the reimaging was completed, the postinstallation steps were used to configure the new environment and restore the previous data.
The following preinstallation steps were used by Dell to prepare for their migration from SUSE Linux to Oracle Linux.
ORACLE_HOME directory. Sufficient space was reserved to hold this directory plus the various system configuration files that needed to be backed up.
init.d processes. Dell followed the Oracle-recommended shutdown order to stop the Oracle Database, Oracle Automatic Storage Management, applications running on the cluster nodes, and Cluster Ready Services (CRS). The chkconfig command was used to disable running services:
# chkconfig service_name off
lsof command to list any open files and make sure any NFS mounts were not in use. They also confirmed the ORACLE_HOME directory was free from resource utilization.
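The service-disabling and open-file checks described above can be sketched as a pair of small shell helpers. This is a minimal sketch under stated assumptions, not Dell's actual script: the function names disable_services and list_open_files, the RUN dry-run variable, and the service names in the usage comment are all hypothetical.

```shell
# Sketch only: RUN defaults to "echo", so commands are printed, not executed.
# Set RUN="" on the target host to run them for real.
RUN=${RUN-echo}

# Disable each named service so that it is not started at the next boot.
disable_services() {
    for svc in "$@"; do
        $RUN chkconfig "$svc" off
    done
}

# List open files under a directory tree (for example, ORACLE_HOME or an
# NFS mount point) so that anything still in use can be stopped first.
list_open_files() {
    $RUN lsof +D "$1"
}

# Example (service names are hypothetical placeholders):
#   disable_services service_a service_b
#   list_open_files "$ORACLE_HOME"
```

Keeping the default as a dry run makes it possible to review the exact commands on each host before anything is changed.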
|Hardware info||Archive hardware info using Dell OpenManage Server Administrator (OMSA) or native Linux commands; save to file (for example, |
|Network card info||Archive IP address, subnet and gateway info, MAC address, link speed/duplex info, and network bonding configuration; save to file (for example, |
|Memory info||(Optional) Archive the memory utilization records; use a maximum of one-week snapshot, if necessary, to prove equivalent or better performance; save to file (for example, |
|OS || |
|Kernel modules info|| |
|Authentication (PAM), users and groups, and || |
|Device manager ( || |
|Automount file system info|| |
|Bootloader configuration files|| |
| || |
|Runlevel config|| |
| || |
|Cron job config|| |
|MPIO config|| |
|Network config (NIC, routing, and so on)|| |
|NTP config|| |
|NFS config|| |
|Name service config|| |
|Hosts config|| |
|System configuration ( || |
| || |
|SSH config|| |
|SAR data files|| |
|Apache config files||Optional (needed only if running Apache); |
|FTP||Optional; needed only if running FTP services|
|CIFS||Optional; needed only if running CIFS services|
|Shell/profile information|| |
|Prelogin message|| |
| || |
|PowerPath licensing and config files|| |
|Additional software/applications||Back up any third-party non-Oracle software applications|
|Profiles for || |
|LUN mapping info|| |
|Oracle Inventory Pointer ( || |
|Oracle inventory file ( || |
|OCR file|| |
|SSH trusted key for || |
|Database-specific kernel settings|| |
|Home directory of ||Site-specific; |
|Home directory of Oracle software ( ||Site-specific; |
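Archiving the items in the table lends itself to scripting. The following is a minimal sketch, assuming the backup disk is already mounted; the function name backup_configs, the MISSING.txt report, and the paths in the usage comment are illustrative assumptions and not Dell's actual file list.

```shell
# Copy each listed file or directory into a backup directory, preserving
# the full original path, ownership, and timestamps. Paths that do not
# exist on this host are recorded in MISSING.txt for later review.
# All paths passed in must be absolute.
backup_configs() {
    dest=$1
    shift
    mkdir -p "$dest" || return 1
    : > "$dest/MISSING.txt"
    for path in "$@"; do
        if [ -e "$path" ]; then
            mkdir -p "$dest$(dirname "$path")" || return 1
            cp -Rp "$path" "$dest$path" || return 1
        else
            echo "$path" >> "$dest/MISSING.txt"
        fi
    done
}

# Example (mount point and paths are hypothetical):
#   backup_configs /mnt/backup/$(hostname) /etc/hosts /etc/fstab /etc/ntp.conf
```

Recording missing paths instead of failing keeps one checklist usable across hosts that run different optional services.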
After configuration information was saved and all essential services were moved to backup servers, the system was ready to have the new Oracle Linux operating system installed. The kickstart installation method was used to automatically perform the installation of the Oracle Linux 5.5 operating system across the network. Using kickstart helped ensure quick, efficient, and consistent operating system installations on the client systems.
The standard kickstart configuration and installation were employed, with a central kickstart server on the network used for the installations. ISO images for Oracle Linux 5.5 were copied to Dell's regional imaging server and made available over the network. A kickstart configuration file was created that specified kickstart options and the packages to be installed. The client machine was booted using a USB flash drive, and the kickstart configuration file was downloaded. Installation proceeded automatically and was completed without requiring user intervention.
Caution: Make sure that the installation process does not erase the backup disk, which is used to store the archived system information. Dell's kickstart process specifically touched only the /dev/sda disk, leaving /dev/sdb available for safely archiving the backup information.
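One way to enforce this restriction is in the disk section of the kickstart file itself. The sketch below generates such a section from a shell function; it is an illustration under stated assumptions, not Dell's kickstart file. The function name write_ks_disk_section, the partition layout, the sizes, and the ext3 file system type are all hypothetical, while ignoredisk, clearpart, bootloader, and part are standard kickstart directives.

```shell
# Print the disk-related part of a kickstart file that installs only to
# the given installation disk and never touches the backup disk.
write_ks_disk_section() {
    install_disk=$1
    backup_disk=$2
    cat <<EOF
# Never touch the disk that holds the archived configuration
ignoredisk --drives=$backup_disk
# Wipe and partition only the installation disk
clearpart --all --initlabel --drives=$install_disk
bootloader --location=mbr --driveorder=$install_disk
part /boot --fstype=ext3 --size=200 --ondisk=$install_disk
part swap --recommended --ondisk=$install_disk
part / --fstype=ext3 --size=1 --grow --ondisk=$install_disk
EOF
}

# Example: write_ks_disk_section sda sdb >> ks.cfg
```

Naming the disk explicitly on every partitioning directive, in addition to ignoring the backup disk, means that a mistake in one directive is less likely to reach the archive.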
The following key steps were part of the postinstallation process in Dell's migration to Oracle Linux:
grub.conf file is different for Oracle Linux, the equivalent SUSE Linux file could not be copied directly. Instead, an entry for the preferred I/O scheduler was added to the new Oracle Linux /boot/grub/grub.conf configuration file, for example:
kernel KERNEL_PARAMETERS elevator=deadline
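Adding the scheduler entry can be automated across many hosts. The following is a minimal sketch, assuming a grub.conf in which each boot entry has a line beginning with the word kernel; the function name add_elevator is hypothetical, and the script should be tried on a copy of the file first.

```shell
# Append elevator=<scheduler> to every kernel line in a grub.conf file
# that does not already set an elevator. Running it twice is harmless.
add_elevator() {
    conf=$1
    sched=${2:-deadline}
    awk -v sched="$sched" '
        /^[ \t]*kernel[ \t]/ && $0 !~ /elevator=/ {
            sub(/[ \t]*$/, "")              # drop trailing whitespace
            $0 = $0 " elevator=" sched      # append the scheduler option
        }
        { print }
    ' "$conf" > "$conf.new" || return 1
    # Overwrite in place so the original file's permissions are kept
    cat "$conf.new" > "$conf" && rm -f "$conf.new"
}

# Example: add_elevator /boot/grub/grub.conf deadline
```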
/etc/shadow files could not be copied directly. Instead, the passwords for the few local user accounts (for example, the oracle user) were manually restored.
ssh. After installing the new Oracle Linux operating system, the host keys returned by the ssh daemon changed. Therefore, new known_hosts key files, needed for ssh client access, were regenerated for the hosts in the cluster.
Note: While Dell chose to generate new client keys, it would also be possible to restore the old host keys from backup.
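The alternative mentioned in the note, restoring the old host keys, can be sketched as follows. This is a hedged illustration rather than a procedure Dell used: the function name restore_host_keys and the directory arguments are hypothetical, and the sketch assumes the archived keys follow the usual ssh_host_* naming.

```shell
# Copy archived SSH host keys back into the SSH configuration directory.
# Private keys are made readable by their owner only; public keys stay
# world-readable. Other archived files in the directory are ignored.
restore_host_keys() {
    backup_dir=$1
    ssh_dir=$2
    for key in "$backup_dir"/ssh_host_*; do
        [ -f "$key" ] || continue
        name=$(basename "$key")
        cp "$key" "$ssh_dir/$name" || return 1
        case $name in
            *.pub) chmod 644 "$ssh_dir/$name" ;;
            *)     chmod 600 "$ssh_dir/$name" ;;
        esac
    done
}

# Example (backup location is hypothetical):
#   restore_host_keys /mnt/backup/etc/ssh /etc/ssh
```

The ssh daemon would need to be restarted afterward for the restored keys to take effect, and with the old keys in place the existing known_hosts files on other cluster nodes remain valid.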
ifcfg-bondN configuration files. In contrast, Oracle Linux uses the /etc/modprobe.conf file to load the bonding kernel module and its options. Therefore, entries were added to the new Oracle Linux /etc/modprobe.conf file to load the bonding kernel modules and set options, for example:
alias bond0 bonding
options bond0 mode=active-backup miimon=100 downdelay=100 updelay=200
Both SUSE Linux and Oracle Linux store the network device information in ifcfg-ethN files. However, SUSE Linux stores these files in the /etc/sysconfig/network directory, and Oracle Linux uses the /etc/sysconfig/network-scripts directory. Table 3 shows an example ifcfg-bondN file for both the previous SUSE Linux environment and the new Oracle Linux environment.
SUSE Linux:
DEVICE=bond1
BOOTPROTO='static'
BROADCAST='192.168.255.255'
IPADDR='192.168.0.190'
NETMASK='255.255.0.0'
NETWORK='192.168.0.0'
REMOTE_IPADDR=''
MTU=''
STARTMODE='onboot'
BONDING_MASTER='yes'
BONDING_SLAVE_0='eth2'
BONDING_SLAVE_1='eth3'
BONDING_MODULE_OPTS='mode=active-backup miimon=100 downdelay=100 updelay=200'

Oracle Linux:
DEVICE=bond0
BOOTPROTO=none
ONBOOT=yes
IPADDR='192.168.0.190'
NETMASK=255.255.0.0
NETWORK=192.168.0.0
USERCTL=no
Table 4 shows an example ifcfg-eth2 file for both the previous SUSE Linux environment and the new Oracle Linux environment.

SUSE Linux:
DEVICE=eth2
STARTMODE='onboot'
BOOTPROTO='none'
MASTER='bond1'
SLAVE='yes'

Oracle Linux:
DEVICE=eth2
HWADDR=00:15:17:97:CD:4E
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
USERCTL=no
Refer to the Oracle Linux system administration documentation for more complete details on setting up network bonding on an Oracle Linux system.
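The bonding conversion described above is mechanical enough to script. The sketch below reads an archived SUSE Linux ifcfg-bondN file and prints the matching /etc/modprobe.conf entries and a candidate Oracle Linux ifcfg-bondN file. It is a minimal sketch under stated assumptions: the function names are hypothetical, the archived file is assumed to contain a DEVICE line as in the example, and the file is sourced as shell, so it should be used only on trusted backups.

```shell
# Print the modprobe.conf entries for a bond described by an archived
# SUSE Linux ifcfg-bondN file. The function body runs in a subshell, so
# the sourced variables do not leak into the caller. Pass a path that
# contains a slash (for example, an absolute path).
suse_bond_to_modprobe() (
    . "$1" || exit 1
    echo "alias $DEVICE bonding"
    echo "options $DEVICE $BONDING_MODULE_OPTS"
)

# Print a candidate Oracle Linux ifcfg-bondN file carrying over the
# address settings from the same archived SUSE Linux file.
suse_bond_to_ol_ifcfg() (
    . "$1" || exit 1
    cat <<EOF
DEVICE=$DEVICE
BOOTPROTO=none
ONBOOT=yes
IPADDR=$IPADDR
NETMASK=$NETMASK
NETWORK=$NETWORK
USERCTL=no
EOF
)

# Example (backup path is hypothetical):
#   suse_bond_to_modprobe /mnt/backup/etc/sysconfig/network/ifcfg-bond1
```

The generated output is meant to be reviewed before being installed, since device names and slave interfaces may be renumbered on the new system.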
.profile file, and Oracle Linux uses a .bash_profile file. Therefore, the .profile files for the svcgrid users were copied to .bash_profile files in the new Oracle Linux environment.
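The profile copy can be expressed as a short helper. This is a minimal sketch, assuming the archived .profile files have already been restored into each user's home directory; the function name copy_profiles and the home directory in the usage comment are hypothetical.

```shell
# For each home directory given, copy .profile to .bash_profile unless a
# .bash_profile already exists, preserving ownership and timestamps.
copy_profiles() {
    for home in "$@"; do
        if [ -f "$home/.profile" ] && [ ! -e "$home/.bash_profile" ]; then
            cp -p "$home/.profile" "$home/.bash_profile" || return 1
        fi
    done
}

# Example (home directory is hypothetical):
#   copy_profiles /home/svcgrid
```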
inittab file is different for the two operating systems. Therefore, entries for the three startup scripts for Oracle software were copied into the new inittab file rather than directly copying the inittab file in its entirety.
The three relevant lines in the original SUSE Linux inittab file, the entries for the Event Manager daemon (evmd), the Oracle Cluster Services Synchronization daemon (cssd), and the Cluster Ready Services daemon (crsd), were copied from the archived file and added to the end of the new Oracle Linux inittab file.
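Carrying the three entries over can be scripted by matching on the daemon names. The following is a minimal sketch, not Dell's actual procedure: the function name restore_crs_inittab is hypothetical, and it assumes that the only uncommented lines in the archived inittab mentioning evmd, cssd, or crsd are the three Oracle entries.

```shell
# Append the evmd, cssd, and crsd entries from an archived inittab to a
# new inittab. Commented-out lines are ignored, and entries that are
# already present are skipped, so the function can be re-run safely.
restore_crs_inittab() {
    old=$1   # archived SUSE Linux inittab
    new=$2   # new Oracle Linux inittab
    grep -E '(evmd|cssd|crsd)' "$old" | grep -v '^[[:space:]]*#' |
    while IFS= read -r line; do
        grep -qxF -- "$line" "$new" || printf '%s\n' "$line" >> "$new"
    done
}

# Example (backup path is hypothetical):
#   restore_crs_inittab /mnt/backup/etc/inittab /etc/inittab
```

After the file is updated, running telinit q asks init to re-read it without a reboot.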
Note: As a best practice, the Oracle product executables might need to be relinked after installing the new operating system. For more information, refer to How to Relink Oracle Software on Unix [ID 131321.1] on My Oracle Support (requires a valid Customer Support Identifier [CSI] to view).
Migrating 1,700 servers from SUSE Linux to Oracle Linux was an aggressive IT decision, one deemed necessary by Dell to gain better stability and support, easier administration, and lower costs. Extracting the underlying operating system layer and replacing it, while leaving the application layer intact, was possible only because of standardization across the Linux platforms. The bulk of the site-specific operating system configuration could simply be backed up and restored directly on the new operating system. Similarly, Oracle Database and other applications required only minor configuration changes to transition from SUSE Linux to Oracle Linux.
At the time this document was written in December of 2011, Dell was approximately halfway through the migration process, with an anticipated June 2012 completion date. Careful planning before starting the migration to itemize the needed site-specific configuration files and identify the files that required conversion was key to Dell's success. Automation via scripts and kickstart installations, plus attention to detail via checklists during the actual conversion process, reduced risk and provided consistency during the migration process.
Dell is confident that the decision to migrate these servers from SUSE Linux to Oracle Linux was the right decision for their business. According to Jon Senger, Enterprise Architect at Dell, "We took a risk doing a migration of such a large scale with such a challenging infrastructure, but it's really paid off. Not only do we lower our TCO for the environment, but since we have been able to standardize on Oracle Linux, we have achieved the stability and support that our customers demand."