by Jon Senger, Aik Zu Shyong, and Suzanne Zorn
Switching the underlying operating system on a single server is not trivial. Neither is dealing with the related conversion and compatibility issues. Imagine what's involved in switching the operating system on thousands of servers spread globally across an enterprise, like Dell just did.
In June of 2010, Dell made the decision to migrate 1,700 systems from SUSE Linux to Oracle Linux, while leaving the hardware and application layers unchanged. Standardization across the Linux platforms helped make this large-scale conversion possible. The majority of the site-specific operating system and application configuration could simply be backed up and restored directly on the new operating system. Configuration changes were minimal and most could be automated, easing the administration effort required and helping achieve a reliable and consistent transition procedure.
Dell had approximately 1,700 physical systems running SUSE Linux at the start of this migration process. These systems, geographically dispersed around the world, used a mix of eighth-generation (Dell PowerEdge 2850 and 2950 servers) and newer Dell hardware. Fibre Channel SAN storage comprised EMC Symmetrix and CLARiiON devices. The software environment included SUSE Linux 10 Service Pack 1 with multipath I/O (MPIO), Oracle Database 10g Release 2, Oracle Real Application Clusters (Oracle RAC), and Oracle Automatic Storage Management, as shown in Figure 1.

Figure 1. Dell's Deployment Environment Before and After Migration to Oracle Linux
The migration primarily involved moving the operating system from SUSE Linux 10 to Oracle Linux 5.5. The same physical server and storage hardware were retained during the migration. Similarly, the Oracle software remained unchanged after the migration. An additional change for Dell was a switch from SUSE Linux's built-in multipath I/O support to EMC PowerPath for automated data path management. (Note: the actual conversion from MPIO to PowerPath is tangential to the operating system migration and is beyond the scope of this document.)
This migration process also gave Dell an opportunity to re-evaluate its servers running SUSE Linux and determine whether the applications running on them could be decommissioned or deployed on an existing MegaGrid environment instead. Dell uses 16-node racks, each capable of hosting 300 databases, for its MegaGrid deployment. In some cases, there was sufficient capacity on an existing MegaGrid infrastructure, so the applications and databases could be migrated to the grid and the SUSE Linux server powered down and decommissioned. This consolidation provided savings in power, cooling, and space. In other cases where consolidation wasn't feasible, the SUSE Linux system was migrated to Oracle Linux, using the processes described in this document.
Given the scale of the migration, planning and automation were essential to the project's success. Aik Zu Shyong, in Core Engineering at Dell, reflects: "We put significant focus on engineering the operating system conversion to make sure we could deliver a simple, reliable, and repeatable automated process. Additionally, by designing the migration to be done in-place instead of using a much slower and cost-prohibitive replacement method, we were able to further reduce downtime and save data center space."
Dell's migration process included three main steps: preparation, reimaging the operating system, and postinstallation configuration. First, in the preparation step, Dell saved the existing environment's configuration and safely shut down the applications and database. Next, they reimaged the operating system from SUSE Linux 10 to Oracle Linux 5.5. After the reimaging was completed, the postinstallation steps were used to configure the new environment and restore the previous data.
The following preinstallation steps were used by Dell to prepare for their migration from SUSE Linux to Oracle Linux.
Back up the ORACLE_HOME directory. Sufficient space was reserved to hold this directory plus the various system configuration files that needed to be backed up.
Shut down the applications and init.d processes. Dell followed the Oracle-recommended shutdown order to stop the Oracle Database, Oracle Automatic Storage Management, applications running on the cluster nodes, and Cluster Ready Services (CRS). The chkconfig command was used to disable running services:
# chkconfig service_name off
Dell used the lsof command to list any open files and make sure any NFS mounts were not in use. They also confirmed that the ORACLE_HOME directory was free from resource utilization.
# lsof
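The shutdown and open-file checks above lend themselves to scripting. The following is a minimal sketch, not Dell's actual tooling: the function names, the `DRY_RUN` switch, and the service names passed in are all hypothetical, and the real service list is site-specific.

```shell
#!/bin/sh
# Sketch only: helper names and DRY_RUN are illustrative assumptions.
# With DRY_RUN=1 (the default) commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# disable_services SERVICE...
# Disables each named init.d service with chkconfig.
disable_services() {
    for svc in "$@"; do
        run chkconfig "$svc" off
    done
}

# dir_is_idle DIRECTORY
# Returns 0 when lsof reports no open files under the directory,
# 1 when open files remain, and 2 when the check cannot be performed.
dir_is_idle() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 2
    fi
    if ! command -v lsof >/dev/null 2>&1; then
        echo "lsof not found; cannot verify $dir" >&2
        return 2
    fi
    # lsof +D scans the directory tree and exits 0 when it finds open files.
    if lsof +D "$dir" >/dev/null 2>&1; then
        echo "open files remain under $dir" >&2
        return 1
    fi
    return 0
}
```

A run might call `disable_services` with the site's service names first in dry-run mode, review the printed commands, and then repeat with `DRY_RUN=0` as root before calling `dir_is_idle "$ORACLE_HOME"`.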
| Archive Step | Comments |
|---|---|
| Hardware info | Archive hardware info using Dell OpenManage Server Administrator (OMSA) or native Linux commands; save to file (for example, hardware.txt) |
| Network card info | Archive IP address, subnet and gateway info, MAC address, link speed/duplex info, and network bonding configuration; save to file (for example, network.txt) |
| Memory info | (Optional) Archive the memory utilization records; use a maximum of one-week snapshot, if necessary, to prove equivalent or better performance; save to file (for example, memory.txt) |
| OS *-release file | /etc/SuSE-release |
| Kernel modules info | /lib/modules/*, /etc/{modprobe.conf, modprobe.conf.local, modprobe.d/*} |
| Authentication (PAM), users and groups, and nsswitch.conf | /etc/pam.d/*, /etc/nsswitch.conf, /etc/passwd, /etc/shadow, /etc/group, /etc/sudoers, /etc/security/* |
| Device manager (udev) rules | /etc/udev/udev.conf, /etc/udev/rules.d/* |
| Automount file system info | /etc/auto.* (Optional; needed only if you are using automount) |
| Bootloader configuration files | /boot/grub/*, /etc/grub.conf, /etc/sysconfig/bootloader |
| /var/log/messages file | /var/log/{boot.msg, boot.omsg, localmessages, messages} |
| Runlevel config | /etc/inittab, /etc/init.d/boot.local |
| rclocal script | /etc/rc.d/rclocal |
| Cron job config | /etc/cron/{daily, hourly, monthly}/*, /var/spool/cron/tabs/* |
| MPIO config | /etc/multipath.conf |
| Network config (NIC, routing, and so on) | /etc/sysconfig/network/ifcfg-*, /etc/sysconfig/network/*, /etc/resolv.conf |
| NTP config | /etc/ntp.conf |
| NFS config | /etc/exports, /etc/fstab |
| Name service config | /etc/nscd.conf |
| Hosts config | /etc/{hosts, host.conf, hosts.allow, hosts.deny, HOSTNAME} |
| System configuration (sysconfig) | /etc/sysconfig/* (including all subdirectories), /etc/sys/* (including all subdirectories) |
| /proc/info files | /proc/* (including all subdirectories) |
| SSH config | /etc/ssh/*, /etc/sshd.config, /etc/pam.d/ssh |
| SAR data files | /var/log/sa/sa* |
| Apache config files | Optional (needed only if running Apache); /etc/httpd* |
| FTP | Optional; needed only if running FTP services |
| CIFS | Optional; needed only if running CIFS services |
| Shell/profile information | /etc/{bash.bashrc, csh.cshrc, csh.login, ksh.kshrc}, /etc/profile, /etc/profile.d/* |
| Prelogin message | /etc/issue |
| /etc/default directory files | /etc/default/* |
| PowerPath licensing and config files | /etc/emcp* |
| Additional software/applications | Back up any third-party non-Oracle software applications |
| Archive Step | Comments |
|---|---|
| Profiles for oracle and svcgrid users | .profile files for oracle and svcgrid users (Note: Oracle Linux files are named .bash_profile) |
| LUN mapping info | /u02 (this directory contained the symbolic links for the LUN mappings in the SUSE Linux environment) |
| Oracle Inventory Pointer (oraInst.loc) and oratab files | /etc/oraInst.loc, /etc/oratab |
| Oracle inventory file (oraInventory) | /etc/oracle/oraInventory |
| OCR file | /etc/oracle/ocr.loc |
| SSH trusted key for oracle user | ~oracle/.ssh/* |
| Database-specific kernel settings | /etc/sysctl.conf |
| Home directory of oracle user | Site-specific; /home/oracle for Dell configurations |
| Home directory of Oracle software (ORACLE_BASE) | Site-specific; /u01/app/oracle for Dell configurations |
The files listed above were archived using the tar utility.
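Archiving a checklist of paths with tar can be wrapped in a small helper so that optional files (automount maps, Apache configuration, and so on) do not abort the backup when they are absent. This is a sketch under assumptions: the function name is hypothetical, and the archive location in the usage note is a placeholder rather than Dell's actual backup path.

```shell
#!/bin/sh
# archive_paths ARCHIVE PATH...
# Creates a gzip-compressed tar archive containing every PATH that exists.
# Missing paths are reported on stderr and skipped.
archive_paths() {
    archive="$1"
    shift
    list=$(mktemp) || return 1
    for p in "$@"; do
        if [ -e "$p" ]; then
            printf '%s\n' "$p" >> "$list"
        else
            echo "skipping missing path: $p" >&2
        fi
    done
    # -T reads the names of the files to archive from the list file.
    tar -czf "$archive" -T "$list"
    status=$?
    rm -f "$list"
    return $status
}
```

For example, `archive_paths /backup/etc-config.tar.gz /etc/ntp.conf /etc/exports /etc/fstab /etc/sysctl.conf` would archive four of the files named in the tables, assuming the backup disk is mounted at the hypothetical `/backup` directory. Shell globs such as `/etc/ssh/*` are expanded by the caller before the function sees them.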
After configuration information was saved and all essential services were moved to backup servers, the system was ready to have the new Oracle Linux operating system installed. The kickstart installation method was used to automatically perform the installation of the Oracle Linux 5.5 operating system across the network. Using kickstart helped ensure quick, efficient, and consistent operating system installations on the client systems.
The standard kickstart configuration and installation were employed, with a central kickstart server on the network used for the installations. ISO images for Oracle Linux 5.5 were copied to Dell's regional imaging server and made available over the network. A kickstart configuration file was created that specified kickstart options and the packages to be installed. The client machine was booted using a USB flash drive, and the kickstart configuration file was downloaded. Installation proceeded automatically and was completed without requiring user intervention.
Caution: Make sure that the installation process does not erase the backup disk, which is used to store the archived system information. Dell's kickstart process specifically touched only the /dev/sda disk, leaving /dev/sdb available for safely archiving the backup information.
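One way to guard against the risk described in the caution is to scan the kickstart file before it is published and fail if any directive names the backup disk. The check below is an illustrative sketch, not part of Dell's documented process. The function name is hypothetical, and it assumes that lines beginning with the kickstart `ignoredisk` directive are protective and should not count as touching the disk.

```shell
#!/bin/sh
# ks_touches_disk KICKSTART_FILE DISK
# Returns 0 if any directive in the kickstart file mentions DISK
# (for example, sdb), and non-zero otherwise.
# Comment lines and ignoredisk lines are excluded from the search.
ks_touches_disk() {
    ks_file="$1"
    disk="$2"
    grep -v '^[[:space:]]*#' "$ks_file" |
        grep -v '^[[:space:]]*ignoredisk' |
        grep -q -- "$disk"
}
```

A pre-flight step could then run `if ks_touches_disk ks.cfg sdb; then echo "kickstart references the backup disk" >&2; exit 1; fi`, where `ks.cfg` is a placeholder file name. The match is a plain substring search, so it also flags partitions such as `sdb1`.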
The following key steps were part of the postinstallation process in Dell's migration to Oracle Linux:
Because the grub.conf file is different for Oracle Linux, the equivalent SUSE Linux file could not be copied directly. Instead, an entry for the preferred I/O scheduler was added to the new Oracle Linux /boot/grub/grub.conf configuration file, for example:
kernel KERNEL_PARAMETERS elevator=deadline
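When this change has to be repeated on many hosts, the scheduler entry can be appended by a script instead of by hand. The sketch below is one possible approach and is not taken from Dell's procedure. The function name is hypothetical, and it assumes that every `kernel` line in the file should receive the same `elevator=` setting.

```shell
#!/bin/sh
# add_elevator GRUB_CONF [SCHEDULER]
# Appends "elevator=SCHEDULER" (default: deadline) to every kernel line
# that does not already carry an elevator= parameter.
# Running it twice leaves the file unchanged the second time.
add_elevator() {
    conf="$1"
    sched="${2:-deadline}"
    tmp=$(mktemp) || return 1
    if awk -v sched="$sched" '
        /^[ \t]*kernel[ \t]/ && !/elevator=/ {
            sub(/[ \t]+$/, "")          # drop trailing whitespace
            $0 = $0 " elevator=" sched  # append the scheduler option
        }
        { print }
    ' "$conf" > "$tmp"; then
        # Overwrite in place so ownership and permissions are preserved.
        cat "$tmp" > "$conf"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return $status
}
```

For example, `add_elevator /boot/grub/grub.conf deadline` would be run as root after taking a copy of the file.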
The SUSE Linux /etc/passwd and /etc/shadow files could not be copied directly. Instead, the passwords for the few local user accounts (for example, the oracle user) were manually restored.
After the new Oracle Linux operating system was installed, the host keys returned by the ssh daemon changed. Therefore, new known_hosts key files, needed for ssh client access, were regenerated for the hosts in the cluster.
Note: While Dell chose to generate new client keys, it would also be possible to restore the old host keys from backup.
SUSE Linux sets the bonding module options in the ifcfg-bondN configuration files. In contrast, Oracle Linux uses the /etc/modprobe.conf file to load the bonding kernel module and its options. Therefore, entries were added to the new Oracle Linux /etc/modprobe.conf file to load the bonding kernel modules and set options, for example:
alias bond0 bonding
options bond0 mode=active-backup miimon=100 downdelay=100 updelay=200
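Because the archived SUSE Linux file already holds the bonding options, the two modprobe.conf lines can be derived from it rather than retyped. The following sketch assumes the archived ifcfg-bondN file uses shell variable syntax with a `BONDING_MODULE_OPTS` entry, as in the SUSE Linux example in this article. The function name is hypothetical, and the file is sourced in a subshell, so it should only be used on trusted backups.

```shell
#!/bin/sh
# bonding_modprobe_lines SUSE_IFCFG_FILE BOND_NAME
# Prints the alias and options lines for /etc/modprobe.conf, taking the
# options from BONDING_MODULE_OPTS in the archived SUSE Linux file.
# Pass an absolute or ./relative path so that "." does not search PATH.
bonding_modprobe_lines() {
    suse_ifcfg="$1"
    bond="$2"
    (
        BONDING_MODULE_OPTS=""
        . "$suse_ifcfg" || exit 1
        [ -n "$BONDING_MODULE_OPTS" ] || exit 1
        # Unquoted expansion collapses line breaks and repeated spaces.
        opts=$(echo $BONDING_MODULE_OPTS)
        printf 'alias %s bonding\n' "$bond"
        printf 'options %s %s\n' "$bond" "$opts"
    )
}
```

The output can be reviewed and then appended, for example with `bonding_modprobe_lines /backup/ifcfg-bond1 bond0 >> /etc/modprobe.conf`, where `/backup/ifcfg-bond1` is a placeholder for wherever the archived file was restored.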
Both SUSE Linux and Oracle Linux store the network device information in ifcfg-bondN and ethN files. However, SUSE Linux stores these files in the /etc/sysconfig/network directory, and Oracle Linux uses the /etc/sysconfig/network-scripts directory. Table 3 shows an example ifcfg-bondN file for both the previous SUSE Linux environment and the new Oracle Linux environment.
Table 3. Example ifcfg-bondN Files
| SUSE Linux | Oracle Linux |
|---|---|
| /etc/sysconfig/network/ifcfg-bond1 | /etc/sysconfig/network-scripts/ifcfg-bond0 |
| DEVICE=bond1<br>BOOTPROTO='static'<br>BROADCAST='192.168.255.255'<br>IPADDR='192.168.0.190'<br>NETMASK='255.255.0.0'<br>NETWORK='192.168.0.0'<br>REMOTE_IPADDR=''<br>MTU=''<br>STARTMODE='onboot'<br>BONDING_MASTER='yes'<br>BONDING_SLAVE_0='eth2'<br>BONDING_SLAVE_1='eth3'<br>BONDING_MODULE_OPTS='mode=active-backup miimon=100 downdelay=100 updelay=200' | DEVICE=bond0<br>BOOTPROTO=none<br>ONBOOT=yes<br>IPADDR='192.168.0.190'<br>NETMASK=255.255.0.0<br>NETWORK=192.168.0.0<br>USERCTL=no |
Table 4 shows an example ifcfg-eth2 file for both the previous SUSE Linux environment and the new Oracle Linux environment.
Table 4. Example ifcfg-eth2 Files
| SUSE Linux | Oracle Linux |
|---|---|
| /etc/sysconfig/network/ifcfg-eth2 | /etc/sysconfig/network-scripts/ifcfg-eth2 |
| DEVICE=eth2<br>STARTMODE='onboot'<br>BOOTPROTO='none'<br>MASTER='bond1'<br>SLAVE='yes' | DEVICE=eth2<br>HWADDR=00:15:17:97:CD:4E<br>BOOTPROTO=none<br>ONBOOT=yes<br>MASTER=bond0<br>SLAVE=yes<br>USERCTL=no |
Refer to the Oracle Linux system administration documentation for more complete details on setting up network bonding on an Oracle Linux system.
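Since every slave interface file on the Oracle Linux side follows the same pattern, the files can be generated from a template. This is a minimal sketch that mirrors the fields in the Oracle Linux ifcfg-eth2 example in this article. The function name is hypothetical, and the device name, bond name, and MAC address are supplied by the caller.

```shell
#!/bin/sh
# slave_ifcfg DEVICE MASTER HWADDR
# Prints an Oracle Linux ifcfg file for a bonded slave interface.
slave_ifcfg() {
    dev="$1"
    master="$2"
    hwaddr="$3"
    cat <<EOF
DEVICE=$dev
HWADDR=$hwaddr
BOOTPROTO=none
ONBOOT=yes
MASTER=$master
SLAVE=yes
USERCTL=no
EOF
}
```

For example, `slave_ifcfg eth2 bond0 00:15:17:97:CD:4E > /etc/sysconfig/network-scripts/ifcfg-eth2` reproduces the Oracle Linux file from the example. The MAC address has to be taken from the archived network information for each host.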
SUSE Linux uses a .profile file, and Oracle Linux uses a .bash_profile file. Therefore, the .profile files for the oracle and svcgrid users were copied to .bash_profile files in the new Oracle Linux environment.
The inittab file is different for the two operating systems. Therefore, rather than copying the inittab file in its entirety, Dell copied only the entries for the three Oracle startup scripts: the Event Manager daemon (evmd), the Oracle Cluster Services Synchronization daemon (cssd), and the Cluster Ready Services daemon (crsd). These three lines were taken from the archived SUSE Linux inittab file and added to the end of the new Oracle Linux inittab file.
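Both of these restores can be scripted. The sketch below is illustrative and makes two assumptions that the article does not confirm: that the relevant inittab entries can be recognized by the daemon names evmd, cssd, and crsd appearing on the line, and that the archived files have been restored to a directory the script can read. The function names are hypothetical.

```shell
#!/bin/sh
# restore_profile ARCHIVED_HOME NEW_HOME
# Copies the archived SUSE Linux .profile to .bash_profile, keeping
# the original mode and timestamps.
restore_profile() {
    cp -p "$1/.profile" "$2/.bash_profile"
}

# append_oracle_inittab ARCHIVED_INITTAB TARGET_INITTAB
# Appends the non-comment lines that mention evmd, cssd, or crsd to the
# end of the target file. Lines already present are not added again.
append_oracle_inittab() {
    archived="$1"
    target="$2"
    grep -v '^[[:space:]]*#' "$archived" |
        grep -E 'evmd|cssd|crsd' |
        while IFS= read -r line; do
            # -x matches the whole line, -F treats it as a fixed string.
            if ! grep -qxF -- "$line" "$target"; then
                printf '%s\n' "$line" >> "$target"
            fi
        done
}
```

A typical call might be `append_oracle_inittab /backup/etc/inittab /etc/inittab`, with `/backup` standing in for the actual restore location. Checking for existing lines makes the function safe to rerun if a conversion is interrupted.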
Note: As a best practice, the Oracle product executables might need to be relinked after installing the new operating system. For more information, refer to How to Relink Oracle Software on Unix [ID 131321.1] on My Oracle Support (requires a valid Customer Support Identifier [CSI] to view).
Migrating 1,700 servers from SUSE Linux to Oracle Linux was an aggressive IT decision, one deemed necessary by Dell to gain better stability and support, easier administration, and lower costs. Extracting the underlying operating system layer and replacing it, while leaving the application layer intact, was possible only because of standardization across the Linux platforms. The bulk of the site-specific operating system configuration could simply be backed up and restored directly on the new operating system. Similarly, Oracle Database and other applications required only minor configuration changes to transition from SUSE Linux to Oracle Linux.
At the time this document was written in December of 2011, Dell was approximately halfway through the migration process, with an anticipated June 2012 completion date. Careful planning before starting the migration to itemize the needed site-specific configuration files and identify the files that required conversion was key to Dell's success. Automation via scripts and kickstart installations, plus attention to detail via checklists during the actual conversion process, reduced risk and provided consistency during the migration process.
Dell is confident that the decision to migrate these servers from SUSE Linux to Oracle Linux was the right decision for their business. According to Jon Senger, Enterprise Architect at Dell, "We took a risk doing a migration of such a large scale with such a challenging infrastructure, but it's really paid off. Not only do we lower our TCO for the environment, but since we have been able to standardize on Oracle Linux, we have achieved the stability and support that our customers demand."