Build Your Own Oracle RAC Cluster on Oracle Enterprise Linux and iSCSI

1. Overview

One of the most efficient ways to become familiar with Oracle Real Application Clusters (RAC) 11g technology is to have access to an actual Oracle RAC 11g cluster. In learning this new technology, you will soon start to realize the benefits Oracle RAC 11g has to offer, such as fault tolerance, new levels of security, load balancing, and the ease of upgrading capacity.

Unfortunately, for many organizations, the price of the hardware required for a typical production RAC configuration makes this goal impossible. A small two-node cluster can cost from US$10,000 to well over US$20,000. This cost would not even include the heart of a production RAC environment, the shared storage. In most cases, this would be a Storage Area Network (SAN), which generally starts at US$10,000.

For those who simply want to become familiar with Oracle RAC 11g, this article provides a low-cost alternative: configuring an Oracle RAC 11g system for educational purposes using commercial off-the-shelf components and downloadable software. The estimated cost for this configuration could be anywhere from US$2,000 to US$2,700. The system will consist of a dual node cluster (two i386 nodes, each with a single processor), both running Linux (CentOS 5.1 for x86), Oracle 11g Release 1 for Linux x86, OCFS2, and ASMLib 2.0. All shared disk storage for Oracle RAC will be based on iSCSI using a Network Storage Server, namely Openfiler Release 2.2 (respin 2).

Powered by rPath Linux, Openfiler is a free browser-based network storage management utility that delivers file-based Network Attached Storage (NAS) and block-based Storage Area Networking (SAN) in a single framework. Openfiler supports CIFS, NFS, HTTP/DAV, and FTP; however, we will only be making use of its iSCSI capabilities to implement an inexpensive SAN for the shared storage components required by Oracle RAC 11g. A 500GB internal hard drive will be connected to the network storage server (sometimes referred to in this article as the Openfiler server) through an internal embedded SATA II controller. The Openfiler server will be configured to use this disk for iSCSI based storage and will be used in our Oracle RAC 11g configuration to store the shared files required by Oracle Clusterware as well as all Oracle ASM volumes.

This article is provided for educational purposes only, so the setup is kept simple to demonstrate ideas and concepts. For example, the disk mirroring configured in this article will be set up on one physical disk only, while in practice it should be performed across multiple physical drives. Also note that while this article provides detailed instructions for successfully installing a complete Oracle RAC 11g system, it is by no means a substitute for the official Oracle documentation. In addition to this article, users should also consult the following Oracle documents to gain a full understanding of alternative configuration options, installation, and administration with Oracle RAC 11g. Oracle's official documentation site is docs.oracle.com.

Oracle Clusterware Installation Guide - 11g Release 1 (11.1) for Linux
Oracle Clusterware Administration and Deployment Guide - 11g Release 1 (11.1)
Oracle Real Application Clusters Installation Guide - 11g Release 1 (11.1) for Linux and UNIX
Oracle Database 2 Day + Real Application Clusters Guide - 11g Release 1 (11.1)
Oracle Database Storage Administrator's Guide - 11g Release 1 (11.1)

Although in past articles I used raw partitions for storing files on shared storage, here we will make use of the Oracle Cluster File System V2 (OCFS2) and Oracle Automatic Storage Management (ASM). The two Oracle RAC nodes will be configured as follows:

Oracle Database Files
RAC Node Name	Instance Name	Database Name	$ORACLE_BASE	File System - Volume Manager for DB Files
linux1	orcl1	orcl	/u01/app/oracle	ASM
linux2	orcl2	orcl	/u01/app/oracle	ASM
Oracle Clusterware Shared Files
File Type	File Name	iSCSI Volume Name	Mount Point	File System
Oracle Cluster Registry (OCR)	/u02/oradata/orcl/OCRFile	crs	/u02	OCFS2
Voting Disk	/u02/oradata/orcl/CSSFile	crs	/u02	OCFS2

Note that as of Oracle Database 10g Release 2 (10.2), Cluster Ready Services, or CRS, is now called Oracle Clusterware.

The Oracle Clusterware software will be installed to /u01/app/crs on both of the nodes that make up the RAC cluster. Oracle Clusterware should be installed in a separate Oracle Clusterware home directory whose path is not release specific (that is, not a versioned path such as /u01/app/oracle/product/11.1.0/...) and which must never be a subdirectory of the ORACLE_BASE directory (/u01/app/oracle for example). This is a change to the Optimal Flexible Architecture (OFA) rules.
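The directory rule above can be checked mechanically. The following is only a minimal sketch in Python; the helper name is_under is my own and is not part of any Oracle tool. It uses the paths named in this article: /u01/app/crs for Clusterware, /u01/app/oracle for ORACLE_BASE, and the database home /u01/app/oracle/product/11.1.0/db_1 for contrast.

```python
from pathlib import PurePosixPath


def is_under(path: str, base: str) -> bool:
    """Return True if `path` is `base` itself or a directory beneath it.

    Hypothetical helper for illustration only; it compares path strings
    and never touches the file system.
    """
    p, b = PurePosixPath(path), PurePosixPath(base)
    return p == b or b in p.parents


ORACLE_BASE = "/u01/app/oracle"
CRS_HOME = "/u01/app/crs"                        # Clusterware home
DB_HOME = "/u01/app/oracle/product/11.1.0/db_1"  # database home

# The Clusterware home must NOT live under ORACLE_BASE.
print("CRS home under ORACLE_BASE:", is_under(CRS_HOME, ORACLE_BASE))
# The database home, by contrast, does live under ORACLE_BASE.
print("DB home under ORACLE_BASE: ", is_under(DB_HOME, ORACLE_BASE))
```

Comparing path components rather than raw string prefixes avoids a false match on a sibling directory such as /u01/app/oracle2.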

While the Oracle Clusterware software will be installed independently to the local disk on both Oracle RAC nodes, the Clusterware software requires that two of its files, the "Oracle Cluster Registry (OCR)" file and the "Voting Disk" file, be shared with both nodes in the cluster. These two files will be installed on shared storage using Oracle's Cluster File System, Release 2 (OCFS2). It is also possible to use RAW devices for these files; however, it is not possible to use ASM for these two shared Clusterware files.

The Oracle Database 11g Release 1 software will be installed into a separate Oracle Home; namely /u01/app/oracle/product/11.1.0/db_1 on both of the nodes that make up the RAC cluster. All of the Oracle physical database files (data, online redo logs, control files, archived redo logs) will be installed to shared volumes being managed by Automatic Storage Management (ASM). The Oracle database files can just as easily be stored on OCFS2. Using ASM, however, makes the article that much more interesting!

This article is only designed to work as documented with absolutely no substitutions!

The only exception here is the choice of vendor hardware (i.e. machines, networking equipment, and internal / external hard drives). Ensure that the hardware you purchase from the vendor is supported on Red Hat Linux 5 and Openfiler 2.2 (respin 2). I tend to stick with Dell hardware given their superb quality and compatibility with Linux. For a test system of this nature, I highly recommend purchasing pre-owned or refurbished Dell hardware from a reputable company like Stallard Technologies, Inc. Stallard Technologies has a proven track record of delivering the best value on pre-owned hardware combined with a commitment to superior customer service. I base my recommendation on my own outstanding personal experience with their organization. To learn more about Stallard Technologies, visit their website or contact William Buchanan.


2. Oracle RAC 11g Overview

Before introducing the details for building a RAC cluster, it might be helpful to first clarify what a cluster is. A cluster is a group of two or more interconnected computers or servers that appear as if they are one server to end users and applications and generally share the same set of physical disks. The key benefit of clustering is to provide a highly available framework where the failure of one node (for example a database server) does not bring down an entire application. If one of the servers fails, the surviving server (or servers) can take over the workload from the failed server and the application continues to function normally as if nothing has happened.

The concept of clustering computers actually started several decades ago. The first successful cluster product, ARCnet, was developed by DataPoint in 1977. ARCnet enjoyed much success in academic research labs, but didn't really take off in the commercial market. It wasn't until the 1980s, when Digital Equipment Corporation (DEC) released its VAX cluster product for the VAX/VMS operating system, that clustering gained commercial traction.

With the release of Oracle 6 for the Digital VAX cluster product, Oracle Corporation was the first commercial database to support clustering at the database level. It wasn't long, however, before Oracle realized the need for a more efficient and scalable distributed lock manager (DLM), as the one included with the VAX/VMS cluster product was not well suited for database applications. Oracle decided to design and write their own DLM for the VAX/VMS cluster product, which provided the fine-grain block level locking required by the database. Oracle's own DLM was included in Oracle 6.2, which gave birth to Oracle Parallel Server (OPS) - the first database to run the parallel server.

By Oracle 7, OPS was extended to include support for not only the VAX/VMS cluster product but also most flavors of UNIX. This framework required vendor-supplied clusterware, which worked well but made for a complex environment to set up and manage given the multiple layers involved. By Oracle 8, Oracle introduced a generic lock manager which was integrated into the Oracle kernel. In later releases of Oracle, this became known as the Integrated Distributed Lock Manager (IDLM) and relied on an additional layer known as the Operating System Dependent (OSD) layer. This new model paved the way for Oracle to not only have their own DLM, but to also create their own clusterware product in future releases.

Oracle Real Application Clusters (RAC), introduced with Oracle 9i, is the successor to Oracle Parallel Server. Using the same IDLM, Oracle 9i could still rely on external clusterware but was the first release to include their own clusterware product named Cluster Ready Services (CRS). With Oracle 9i, CRS was only available for Windows and Linux. By Oracle 10g, Oracle's clusterware product was available for all operating systems. With the release of Oracle Database 10g Release 2 (10.2), Cluster Ready Services was renamed to Oracle Clusterware. When using Oracle 10g or higher, Oracle Clusterware is the only clusterware that you need for most platforms on which Oracle RAC operates (except for Tru cluster, where you need vendor clusterware). You can still use clusterware from other vendors if the clusterware is certified for Oracle RAC. This guide uses Oracle Clusterware 11g.

Like OPS, Oracle RAC allows multiple instances to access the same database (storage) simultaneously. RAC provides fault tolerance, load balancing, and performance benefits by allowing the system to scale out. At the same time, since all instances access the same database, the failure of one node will not cause the loss of access to the database.

At the heart of Oracle RAC is a shared disk subsystem. Each instance in the cluster must be able to access all of the data, redo log files, control files and parameter file for all other instances in the cluster. The data disks must be globally available in order to allow all instances to access the database. Each instance has its own redo log files and UNDO tablespace that are locally read-writeable. The other instances in the cluster must be able to access them (read-only) in order to recover that instance in the event of a system failure. The redo log files for an instance are only writeable by that instance and will only be read from another instance during system failure. The UNDO, on the other hand, is read all the time during normal database operation (e.g. for CR fabrication).

The biggest difference between Oracle RAC and OPS is the addition of Cache Fusion. With OPS, a request for data from one instance to another required the data to be written to disk first; only then could the requesting instance read that data (after acquiring the required locks). With Cache Fusion, data is passed along a high-speed interconnect using a sophisticated locking algorithm.

Not all database clustering solutions use shared storage. Some vendors use an approach known as a Federated Cluster, in which data is spread across several machines rather than shared by all. With Oracle RAC, however, multiple instances use the same set of disks for storing data. Oracle's approach to clustering leverages the collective processing power of all the nodes in the cluster and at the same time provides failover security.

Pre-configured Oracle RAC solutions are available from vendors such as Dell, IBM and HP for production environments. This article, however, focuses on putting together your own Oracle RAC 11g environment for development and testing by using Linux servers and a low cost shared disk solution: iSCSI.

For more background about Oracle RAC, visit the Oracle RAC Product Center on OTN.

3. Shared-Storage Overview

Today, fibre channel is one of the most popular solutions for shared storage. Fibre channel is a high-speed serial-transfer interface that is used to connect systems and storage devices in point-to-point (FC-P2P), arbitrated loop (FC-AL), or switched (FC-SW) topologies. Protocols supported by fibre channel include SCSI and IP. Fibre channel configurations can support as many as 127 nodes and have a throughput of up to 2.12 gigabits per second in each direction, and 4.25 Gbps is expected.

Fibre channel, however, is very expensive. Just the fibre channel switch alone can start at around US$1,000. This does not even include the fibre channel storage array and high-end drives, which can reach prices of about US$300 for a 36GB drive. A typical fibre channel setup which includes fibre channel cards for the servers is roughly US$10,000, which does not include the cost of the servers that make up the cluster.

A less expensive alternative to fibre channel is SCSI. SCSI technology provides acceptable performance for shared storage, but for administrators and developers who are used to GPL-based Linux prices, even SCSI can come in over budget, at around US$2,000 to US$5,000 for a two-node cluster.

Another popular solution is the Sun NFS (Network File System) found on a NAS. It can be used for shared storage but only if you are using a network appliance or something similar. Specifically, you need servers that guarantee direct I/O over NFS, TCP as the transport protocol, and read/write block sizes of 32K. See the Certify page on Oracle Metalink for supported Network Attached Storage (NAS) devices that can be used with Oracle RAC. One of the key drawbacks that has limited the benefits of using NFS and NAS for database storage has been performance degradation and complex configuration requirements. Standard NFS client software (client systems that use the operating system provided NFS driver) is not optimized for Oracle database file I/O access patterns. With the introduction of Oracle 11g, a new feature known as Direct NFS Client integrates the NFS client functionality directly in the Oracle software. Through this integration, Oracle is able to optimize the I/O path between the Oracle software and the NFS server resulting in significant performance gains. Direct NFS Client can simplify, and in many cases automate, the performance optimization of the NFS client configuration for database workloads.

The shared storage that will be used for this article is based on iSCSI technology using a network storage server installed with Openfiler. This solution offers a low-cost alternative to fibre channel for testing and educational purposes, but given the low-end hardware being used, it is not recommended to be used in a production environment.

4. iSCSI Technology

For many years, the only technology that existed for building a network based storage solution was a Fibre Channel Storage Area Network (FC SAN). Based on an earlier set of ANSI protocols called Fiber Distributed Data Interface (FDDI), Fibre Channel was developed to move SCSI commands over a storage network.

The advantages of an FC SAN include greater performance, increased disk utilization, improved availability, better scalability, and, most important to us, support for server clustering! Still today, however, FC SANs suffer from three major disadvantages. The first is price. While the costs involved in building an FC SAN have come down in recent years, the cost of entry still remains prohibitive for small companies with limited IT budgets. The second is incompatible hardware components. Since its adoption, many product manufacturers have interpreted the Fibre Channel specifications differently from each other, which has resulted in scores of interconnect problems. When purchasing Fibre Channel components from a common manufacturer, this is usually not a problem. The third disadvantage is the fact that a Fibre Channel network is not Ethernet! It requires a separate network technology along with a second set of skills that need to exist within the datacenter staff.

With the popularity of Gigabit Ethernet and the demand for lower cost, Fibre Channel has recently been given a run for its money by iSCSI-based storage systems. Today, iSCSI SANs remain the leading competitor to FC SANs.

Ratified on February 11th, 2003 by the Internet Engineering Task Force (IETF), the Internet Small Computer System Interface, better known as iSCSI, is an Internet Protocol (IP)-based storage networking standard for establishing and managing connections between IP-based storage devices, hosts, and clients. iSCSI is a data transport protocol defined in the SCSI-3 specifications framework and is similar to Fibre Channel in that it is responsible for carrying block-level data over a storage network. Block-level communication means that data is transferred between the host and the client in chunks called blocks. Database servers depend on this type of communication (as opposed to the file level communication used by most NAS systems) in order to work properly. Like an FC SAN, an iSCSI SAN should be a separate physical network devoted entirely to storage; however, its components can be much the same as in a typical IP network (LAN).

While iSCSI has a promising future, many of its early critics were quick to point out some of its inherent shortcomings with regard to performance. The beauty of iSCSI is its ability to utilize an already familiar IP network as its transport mechanism. The TCP/IP protocol, however, is very complex and CPU intensive. With iSCSI, most of the processing of the data (both TCP and iSCSI) is handled in software and is much slower than Fibre Channel, which is handled completely in hardware. The overhead incurred in mapping every SCSI command onto an equivalent iSCSI transaction is excessive. For many, the solution is to do away with iSCSI software initiators and invest in specialized cards that can offload TCP/IP and iSCSI processing from a server's CPU. These specialized cards are sometimes referred to as an iSCSI Host Bus Adapter (HBA) or a TCP Offload Engine (TOE) card. Also consider that 10-Gigabit Ethernet is a reality today!

As with any new technology, iSCSI comes with its own set of acronyms and terminology. For the purpose of this article, it is only important to understand the difference between an iSCSI initiator and an iSCSI target.

iSCSI Initiator

Basically, an iSCSI initiator is a client device that connects and initiates requests to some service offered by a server (in this case an iSCSI target). The iSCSI initiator software will need to exist on each of the Oracle RAC nodes (linux1 and linux2).

An iSCSI initiator can be implemented using either software or hardware. Software iSCSI initiators are available for most major operating system platforms. For this article, we will be using the free Linux Open-iSCSI software driver found in the iscsi-initiator-utils RPM. The iSCSI software initiator is generally used with a standard network interface card (NIC) — a Gigabit Ethernet card in most cases. A hardware initiator is an iSCSI HBA (or a TCP Offload Engine (TOE) card), which is basically just a specialized Ethernet card with a SCSI ASIC on-board to offload all the work (TCP and SCSI commands) from the system CPU. iSCSI HBAs are available from a number of vendors, including Adaptec, Alacritech, Intel, and QLogic.

iSCSI Target

An iSCSI target is the "server" component of an iSCSI network. This is typically the storage device that contains the information you want and answers requests from the initiator(s). For the purpose of this article, the node openfiler1 will be the iSCSI target.

So with all of this talk about iSCSI, does this mean the death of Fibre Channel anytime soon? Probably not. Fibre Channel has clearly demonstrated its capabilities over the years with its capacity for extremely high speeds, flexibility, and robust reliability. Customers who have strict requirements for high performance storage, large complex connectivity, and mission critical reliability will undoubtedly continue to choose Fibre Channel.

Before closing out this section, I thought it would be appropriate to present the following chart that shows speed comparisons of the various types of disk interfaces and network technologies. For each interface, I provide the maximum transfer rates in kilobits (Kb), kilobytes (KB), megabits (Mb), megabytes (MB), gigabits (Gb), and gigabytes (GB) per second.

Disk Interface / Network / BUS	Speed
	Kb	KB	Mb	MB	Gb	GB
Serial	115	14.375	0.115	0.014
Parallel - (standard)	920	115	0.92	0.115
10Base-T Ethernet			10	1.25
IEEE 802.11b wireless Wi-Fi - (2.4 GHz band)			11	1.375
USB 1.1			12	1.5
Parallel - (ECP/EPP)			24	3
SCSI-1			40	5
IEEE 802.11g wireless WLAN - (2.4 GHz band)			54	6.75
SCSI-2 - (Fast SCSI / Fast Narrow SCSI)			80	10
100Base-T Ethernet - (Fast Ethernet)			100	12.5
ATA/100 - (parallel)			100	12.5
IDE			133.6	16.7
Fast Wide SCSI - (Wide SCSI)			160	20
Ultra SCSI - (SCSI-3 / Fast-20 / Ultra Narrow)			160	20
Ultra IDE			264	33
Wide Ultra SCSI - (Fast Wide 20)			320	40
Ultra2 SCSI			320	40
FireWire 400 - (IEEE1394a)			400	50
USB 2.0			480	60
Wide Ultra2 SCSI			640	80
Ultra3 SCSI			640	80
FireWire 800 - (IEEE1394b)			800	100
1000Base-T Ethernet - (Gigabit Ethernet)			1000	125	1
PCI - (33 MHz / 32-bit)			1064	133	1.064
Serial ATA I - (SATA I)			1200	150	1.2
Wide Ultra3 SCSI			1280	160	1.28
Ultra160 SCSI			1280	160	1.28
PCI - (33 MHz / 64-bit)			2128	266	2.128
PCI - (66 MHz / 32-bit)			2128	266	2.128
AGP 1x - (66 MHz / 32-bit)			2128	266	2.128
Serial ATA II - (SATA II)			2400	300	2.4
Ultra320 SCSI			2560	320	2.56
FC-AL Fibre Channel			3200	400	3.2
PCI-Express x1 - (bidirectional)			4000	500	4
PCI - (66 MHz / 64-bit)			4256	532	4.256
AGP 2x - (133 MHz / 32-bit)			4264	533	4.264
Serial ATA III - (SATA III)			4800	600	4.8
PCI-X - (100 MHz / 64-bit)			6400	800	6.4
PCI-X - (133 MHz / 64-bit)				1064	8.512	1
AGP 4x - (266 MHz / 32-bit)				1066	8.528	1
10G Ethernet - (IEEE 802.3ae)				1250	10	1.25
PCI-Express x4 - (bidirectional)				2000	16	2
AGP 8x - (533 MHz / 32-bit)				2133	17.064	2.1
PCI-Express x8 - (bidirectional)				4000	32	4
PCI-Express x16 - (bidirectional)				8000	64	8
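The columns in the chart above are related by simple arithmetic: eight bits to the byte, and decimal (factor of 1,000) prefixes between kilo, mega, and giga, which is the convention the chart's own figures follow. The short Python sketch below illustrates this; the function name rates_from_megabits is my own and is only meant as an illustration.

```python
BITS_PER_BYTE = 8


def rates_from_megabits(mbps: float) -> dict:
    """Express a transfer rate given in megabits/s in all six chart units.

    Uses decimal prefixes (1 Gb = 1,000 Mb = 1,000,000 Kb), matching
    the figures shown in the chart.
    """
    return {
        "Kb": mbps * 1000,
        "KB": mbps * 1000 / BITS_PER_BYTE,
        "Mb": mbps,
        "MB": mbps / BITS_PER_BYTE,
        "Gb": mbps / 1000,
        "GB": mbps / 1000 / BITS_PER_BYTE,
    }


# Gigabit Ethernet, the private network used in this article.
gige = rates_from_megabits(1000)
print("Gigabit Ethernet:", gige["MB"], "MB/s")

# Serial ATA II, the controller used for the shared disk.
sata2 = rates_from_megabits(2400)
print("SATA II:", sata2["MB"], "MB/s")
```

Keep in mind that these are theoretical maximums; a single Gigabit Ethernet link tops out at 125 MB per second, well below what the SATA II interface behind it can deliver.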

5. Hardware and Costs

The hardware used to build our example Oracle RAC 11g environment consists of three Linux servers (two Oracle RAC nodes and one Network Storage Server) and components that can be purchased at many local computer stores or over the Internet (e.g. Stallard Technologies, Inc.).

Oracle RAC Node 1 - (linux1)
Dell Dimension 2400 Series - Intel(R) Pentium(R) 4 Processor at 2.80GHz - 2GB DDR SDRAM (at 333MHz) - 40GB 7200 RPM Internal Hard Drive - Integrated Intel 3D AGP Graphics - Integrated 10/100 Ethernet - (Broadcom BCM4401) - CDROM (48X Max Variable) - 3.5" Floppy - No Keyboard, Monitor, or Mouse - (Connected to KVM Switch)	US$620
1 - Ethernet LAN Card Used for RAC interconnect to linux2 and Openfiler networked storage. Each Linux server for Oracle RAC should contain two NIC adapters. The Dell Dimension includes an integrated 10/100 Ethernet adapter that will be used to connect to the public network. The second NIC adapter will be used for the private network (RAC interconnect and Openfiler networked storage). Select the appropriate NIC adapter that is compatible with the maximum data transmission speed of the network switch to be used for the private network. For the purpose of this article, I used a Gigabit Ethernet switch (and 1Gb Ethernet cards) for the private network. Gigabit Ethernet Intel 10/100/1000Mbps PCI Desktop Adapter - (PWLA8391GT)	US$35
Oracle RAC Node 2 - (linux2)
Dell Dimension 2400 Series - Intel(R) Pentium(R) 4 Processor at 2.80GHz - 2GB DDR SDRAM (at 333MHz) - 40GB 7200 RPM Internal Hard Drive - Integrated Intel 3D AGP Graphics - Integrated 10/100 Ethernet - (Broadcom BCM4401) - CDROM (48X Max Variable) - 3.5" Floppy - No Keyboard, Monitor, or Mouse - (Connected to KVM Switch)	US$620
1 - Ethernet LAN Card Used for RAC interconnect to linux1 and Openfiler networked storage. Each Linux server for Oracle RAC should contain two NIC adapters. The Dell Dimension includes an integrated 10/100 Ethernet adapter that will be used to connect to the public network. The second NIC adapter will be used for the private network (RAC interconnect and Openfiler networked storage). Select the appropriate NIC adapter that is compatible with the maximum data transmission speed of the network switch to be used for the private network. For the purpose of this article, I used a Gigabit Ethernet switch (and 1Gb Ethernet cards) for the private network. Gigabit Ethernet Intel 10/100/1000Mbps PCI Desktop Adapter - (PWLA8391GT)	US$35
Network Storage Server - (openfiler1)
Dell PowerEdge 1800 - Dual 3.0GHz Xeon / 1MB Cache / 800FSB (SL7PE) - 2GB of ECC Memory - 40GB IDE Hard Drive - Single embedded Intel 10/100/1000 Gigabit NIC - 4 x Integrated USB 2.0 Ports - CDROM (48X Max Variable) - No Keyboard, Monitor, or Mouse - (Connected to KVM Switch)	US$650
1 - Ethernet LAN Card Used for networked storage on the private network. The Network Storage Server (Openfiler server) should contain two NIC adapters. The Dell PowerEdge 1800 machine included an integrated 10/100/1000 Ethernet adapter that will be used to connect to the public network. The second NIC adapter will be used for the private network (Openfiler networked storage). Select the appropriate NIC adapter that is compatible with the maximum data transmission speed of the network switch to be used for the private network. For the purpose of this article, I used a Gigabit Ethernet switch (and 1Gb Ethernet cards) for the private network. Gigabit Ethernet Intel 10/100/1000Mbps PCI Desktop Adapter - (PWLA8391GT)	US$35
Miscellaneous Components
Storage Device(s) - Internal Hard Drive The Openfiler server being used for this article (openfiler1) includes an internal 40GB hard drive which will be used to store the Openfiler software. For the database storage, I installed a separate internal Maxtor SATA hard drive (500GB) which connected to the Openfiler server through an integrated SATA controller. The Openfiler server will be configured to use this disk for iSCSI based storage and will be used in our Oracle RAC 11g configuration to store the shared files required by Oracle Clusterware as well as all Oracle ASM volumes. Note: Please be aware that any type of hard disk (internal or external) should work for database storage as long as it can be recognized by the network storage server (Openfiler) and has adequate space. Maxtor SATA hard drive (500GB)	US$260
1 - Ethernet Switch Used for the interconnect between linux1-priv and linux2-priv. This switch will also be used for network storage traffic for Openfiler. For the purpose of this article, I used a Gigabit Ethernet switch (and 1Gb Ethernet cards) for the private network. Gigabit Ethernet D-Link 8-port 10/100/1000 Desktop Switch - (DGS-2208)	US$50
6 - Network Cables Category 5e patch cable - (Connect linux1 to public network) Category 5e patch cable - (Connect linux2 to public network) Category 5e patch cable - (Connect openfiler1 to public network) Category 5e patch cable - (Connect linux1 to interconnect Ethernet switch) Category 5e patch cable - (Connect linux2 to interconnect Ethernet switch) Category 5e patch cable - (Connect openfiler1 to interconnect Ethernet switch)	US$5 US$5 US$5 US$5 US$5 US$5
Optional Components
KVM Switch This article requires access to the console of all nodes (servers) in order to install the operating system and perform several of the configuration tasks. When managing a very small number of servers, it might make sense to connect each server with its own monitor, keyboard, and mouse in order to access its console. However, as the number of servers to manage increases, this solution becomes unfeasible. A more practical solution would be to configure a dedicated computer which would include a single monitor, keyboard, and mouse that would have direct access to the console of each server. This solution is made possible using a Keyboard, Video, Mouse Switch, better known as a KVM Switch. A KVM switch is a hardware device that allows a user to control multiple computers from a single keyboard, video monitor and mouse. Avocent provides a high quality and economical 4-port switch which includes four 6' cables: SwitchView® 1000 - (4SV1000BND1-001) For a detailed explanation and guide on the use of KVM switches, please see the article "KVM Switches For the Home and the Enterprise".	US$340
Total	US$2,675
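As a quick sanity check on the total, the line items above can be added up in a few lines of Python. The prices are taken directly from the list; the dictionary keys are just my own shorthand labels.

```python
# Prices in US$, copied from the hardware list above.
costs = {
    "linux1 server": 620,
    "linux1 private NIC": 35,
    "linux2 server": 620,
    "linux2 private NIC": 35,
    "openfiler1 server": 650,
    "openfiler1 private NIC": 35,
    "500GB SATA hard drive": 260,
    "Gigabit Ethernet switch": 50,
    "network cables (6 x US$5)": 6 * 5,
    "KVM switch (optional)": 340,
}

total = sum(costs.values())
required_only = total - costs["KVM switch (optional)"]

print(f"Total with optional KVM switch: US${total:,}")
print(f"Total without KVM switch:       US${required_only:,}")
```

Leaving out the optional KVM switch is the single biggest saving available if you already have a way to reach each server's console.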

We are about to start the installation process. Now that we have talked about the hardware that will be used in this example, let's take a conceptual look at what the environment would look like:

Figure 1: Oracle RAC 11g Release 1 Testing Configuration

As we start to go into the details of the installation, it should be noted that most of the tasks within this document will need to be performed on both Oracle RAC nodes (linux1 and linux2). I will indicate at the beginning of each section whether or not the task(s) should be performed on both Oracle RAC nodes or on the network storage server (openfiler1).

6. Install the Linux Operating System
Perform the following installation on both Oracle RAC nodes in the cluster!

After procuring the required hardware, it is time to start the configuration process. The first task we need to perform is to install the Linux operating system. As already mentioned, this article will use CentOS 5.1. Although I have used Red Hat Fedora in the past, I wanted to switch to a Linux environment that would guarantee all of the functionality required by Oracle. This is where CentOS comes in. The CentOS project takes the Red Hat Enterprise Linux 5 source RPMs and compiles them into a free clone of the Red Hat Enterprise Server 5 product. This provides a free and stable version of the Red Hat Enterprise Linux 5 (AS/ES) operating environment that I can now use for testing different Oracle configurations. I have moved away from Fedora as I need a stable environment that is not only free, but as close to the actual Oracle supported operating system as possible. While CentOS is not the only project that does this, I tend to stick with it as it is stable and responds quickly to updates by Red Hat.

Downloading CentOS
Use the links below to download CentOS 5.1 for either x86 or x86_64 depending on your hardware architecture. The following example uses x86. After downloading CentOS, you will then want to burn each of the ISO images to CD.
- CentOS.org
If you are downloading the above ISO files to a MS Windows machine, there are many options for burning these images (ISO files) to a CD. You may already be familiar with and have the proper software to burn images to CD. If you are not familiar with this process and do not have the required software to burn images to CD, here are just two (of many) software packages that can be used:
- UltraISO
- Magic ISO Maker
Installing CentOS

This section provides a summary of the screens used to install CentOS. For more detailed installation instructions, it is possible to use the manuals from Red Hat Linux http://www.redhat.com/docs/manuals/. I would suggest, however, that the instructions I have provided below be used for this Oracle RAC 11g configuration.

Before installing the Linux operating system on both nodes, you should have the two NIC interfaces (cards) installed.

After downloading and burning the CentOS images (ISO files) to CD, insert CentOS Disk #1 into the first server (linux1 in this example), power it on, and answer the installation screen prompts as noted below. After completing the Linux installation on the first node, perform the same Linux installation on the second node, substituting the node name linux2 for linux1 and using the different IP addresses where appropriate.

Boot Screen

The first screen is the CentOS boot screen. At the boot: prompt, hit [Enter] to start the installation process.

Media Test

When asked to test the CD media, tab over to [Skip] and hit [Enter]. If there were any errors, the media burning software would have warned us. After several seconds, the installer should then detect the video card, monitor, and mouse. The installer then goes into GUI mode.

Welcome to CentOS

At the welcome screen, click [Next] to continue.

Language / Keyboard Selection

The next two screens prompt you for the Language and Keyboard settings. Make the appropriate selection for your configuration and click [Next] to continue.

Detect Previous Installation

Note that if the installer detects a previous version of CentOS, it will ask if you would like to "Install CentOS" or "Upgrade an existing Installation". Always select "Install CentOS".

Disk Partitioning Setup

Keep the default selection to [Remove linux partitions on selected drives and create default layout] and check the option to [Review and modify partitioning layout]. Click [Next] to continue.
You will then be prompted with a dialog window asking if you really want to remove all partitions. Click [Yes] to acknowledge this warning.

Partitioning

The installer will then allow you to view (and modify if needed) the disk partitions it automatically selected. For most automatic layouts, the installer will choose 100MB for /boot, double the amount of RAM (systems with < 2GB RAM) or an amount equal to RAM (systems with > 2GB RAM) for swap, and the rest going to the root (/) partition. Starting with EL 4, the installer creates the same disk configuration as just noted but does so using the Logical Volume Manager (LVM). For example, it will partition the first hard drive (/dev/hda for my configuration) into two partitions: one for the /boot partition (/dev/hda1) and the remainder of the disk dedicated to an LVM volume group named VolGroup00 (/dev/hda2). The LVM Volume Group (VolGroup00) is then divided into two logical volumes: one for the root filesystem (/) and another for swap.
The main concern during the partitioning phase is to ensure enough swap space is allocated as required by Oracle (which is a multiple of the available RAM). The following is Oracle's requirement for swap space:

Available RAM	Swap Space Required
Between 1 GB and 2 GB	1.5 times the size of RAM
Between 2 GB and 8 GB	Equal to the size of RAM
More than 8 GB	0.75 times the size of RAM

For the purpose of this install, I will accept all of the automatically selected sizes (including 2GB for swap, since I have 2GB of RAM installed).
If, for any reason, the automatic layout does not configure an adequate amount of swap space, you can easily change that from this screen. To increase the size of the swap partition, [Edit] the volume group VolGroup00. This will bring up the "Edit LVM Volume Group: VolGroup00" dialog. First, [Edit] and decrease the size of the root file system (/) by the amount you want to add to the swap partition. For example, to add another 512MB to swap, you would decrease the size of the root file system by 512MB (e.g. 36,032MB - 512MB = 35,520MB). Now add the space you removed from the root file system (512MB) to the swap partition. When completed, click [OK] on the "Edit LVM Volume Group: VolGroup00" dialog.
Once you are satisfied with the disk layout, click [Next] to continue.
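The swap table above can be turned into a small calculation. The following is only a sketch: the function name required_swap_mb is my own, and it assumes that a machine with exactly 2 GB of RAM falls into the "equal to RAM" row, which matches the 2GB of swap used in this article.

```shell
#!/bin/sh
# required_swap_mb RAM_MB
# Prints the swap size (in MB) that the swap table calls for.
# Assumption: exactly 2048 MB of RAM is treated as "equal to RAM".
required_swap_mb() {
    if [ "$1" -lt 2048 ]; then
        echo $(( $1 * 3 / 2 ))    # 1.5 times the size of RAM
    elif [ "$1" -le 8192 ]; then
        echo "$1"                 # equal to the size of RAM
    else
        echo $(( $1 * 3 / 4 ))    # 0.75 times the size of RAM
    fi
}

# Example: a node with 2GB of RAM
echo "Swap required for 2048 MB of RAM: $(required_swap_mb 2048) MB"
```

On an installed system, the actual values can be read from the MemTotal and SwapTotal lines of /proc/meminfo. Keep in mind that MemTotal reports slightly less than the installed RAM, so treat any comparison near a row boundary as approximate.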

Boot Loader Configuration

The installer will use the GRUB boot loader by default. To use the GRUB boot loader, accept all default values and click [Next] to continue.

Network Configuration

I made sure to install both NIC interfaces (cards) in each of the Linux machines before starting the operating system installation. This screen should have successfully detected each of the network devices. Since we will be using this machine to host an Oracle database, there will be several changes that need to be made to the network configuration. The settings you make here will, of course, depend on your network configuration. The key point to make is that the machine should never be configured with DHCP since it will be used to host the Oracle database server. You will need to configure the machine with static IP addresses. You will also need to configure the server with a real host name.

First, make sure that each of the network devices is checked to [Active on boot]. The installer may choose to not activate eth1 by default.
Second, [Edit] both eth0 and eth1 as follows. Verify that the option "Enable IPv4 support" is selected. Click off the option for "Use dynamic IP configuration (DHCP)" and configure a static IP address and Netmask for your environment. Click off the option to "Enable IPv6 support". You may choose to use different IP addresses for eth0 and eth1 than the ones I have documented in this guide, and that is OK. Put eth1 (the interconnect) on a different subnet than eth0 (the public network):
eth0:
- Check ON the option to [Enable IPv4 support]
- Check OFF the option to [Use dynamic IP configuration (DHCP)] - (select Manual configuration)
IPv4 Address: 192.168.1.100
Prefix (Netmask): 255.255.255.0
- Check OFF the option to [Enable IPv6 support]
eth1:
- Check ON the option to [Enable IPv4 support]
- Check OFF the option to [Use dynamic IP configuration (DHCP)] - (select Manual configuration)
IPv4 Address: 192.168.2.100
Prefix (Netmask): 255.255.255.0
- Check OFF the option to [Enable IPv6 support]
Continue by manually setting your hostname. I used "linux1" for the first node and "linux2" for the second. Finish this dialog off by supplying your gateway and DNS servers.
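If you pick your own addresses, it is worth confirming that eth0 and eth1 really do land on different subnets. The sketch below does the netmask arithmetic in plain shell; the helper names ip_to_int and same_subnet are hypothetical, and it assumes a shell with 64-bit arithmetic (the norm on current systems).

```shell
#!/bin/sh
# ip_to_int A.B.C.D -> prints the address as an integer.
# The body runs in a subshell so the IFS change does not leak out.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# same_subnet IP1 IP2 NETMASK -> exit status 0 if both addresses
# have the same network part under NETMASK.
same_subnet() (
    mask=$(ip_to_int "$3")
    net1=$(( $(ip_to_int "$1") & mask ))
    net2=$(( $(ip_to_int "$2") & mask ))
    [ "$net1" -eq "$net2" ]
)

# Example using the addresses chosen for linux1 in this article
if same_subnet 192.168.1.100 192.168.2.100 255.255.255.0; then
    echo "eth0 and eth1 share a subnet: move the interconnect to its own subnet"
else
    echo "eth0 and eth1 are on different subnets"
fi
```

With the addresses used in this guide, the example takes the second branch.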

Time Zone Selection

Select the appropriate time zone for your environment and click [Next] to continue.

Set Root Password

Select a root password and click [Next] to continue.

Package Installation Defaults

By default, CentOS installs most of the software required for a typical server. There are several other packages (RPMs), however, that are required to successfully install the Oracle Database software. For the purpose of this article, select the radio button [Customize now] and click [Next] to continue.
This is where you pick the packages to install. Most of the packages required for the Oracle software are grouped into "Package Groups" (e.g. Application -> Editors). Since these nodes will be hosting the Oracle Clusterware and Oracle RAC software, verify that at least the following package groups are selected for install. For many of the Linux package groups, not all of the packages associated with that group get selected for installation. (Note the "Optional packages" button after selecting a package group.) So although the package group gets selected for install, some of the packages required by Oracle do not get installed. In fact, there are some packages that are required by Oracle that do not belong to any of the available package groups (e.g. libaio-devel). Not to worry. A complete list of required packages for Oracle Clusterware 11g and Oracle RAC 11g will be provided at the end of this section. These packages will need to be manually installed from the CentOS CDs after the operating system install. For now, install the following package groups:
- Desktop Environments
  - GNOME Desktop Environment
- Applications
  - Editors
  - Graphical Internet
  - Text-based Internet
- Development
  - Development Libraries
  - Development Tools
  - Legacy Software Development
- Servers
  - Server Configuration Tools
- Base System
  - Administration Tools
  - Base
  - Java
  - Legacy Software Support
  - System Tools
  - X Window System
In addition to the above packages, select any additional packages you wish to install for this node. After selecting the packages to install click [Next] to continue.

About to Install

This screen is basically a confirmation screen. Click [Continue] to start the installation. During the installation process, you will be asked to switch CDs depending on which packages you selected to install.

Congratulations

And that's it. You have successfully installed CentOS on the first node (linux1). The installer will eject the CD from the CD-ROM drive. Take out the CD and click [Reboot] to reboot the system.

Post Installation Wizard Welcome Screen

When the system boots into CentOS for the first time, it will prompt you with another Welcome screen for the "Post Installation Wizard". The post installation wizard allows you to make final O/S configuration settings. On the "Welcome" screen, click [Forward] to continue.

Firewall

On this screen, make sure to select the [Disabled] option and click [Forward] to continue.
You will be prompted with a warning dialog about not setting the firewall. When this occurs, click [Yes] to continue.

SELinux

On the SELinux screen, choose the [Disabled] option and click [Forward] to continue.
You will be prompted with a dialog warning that changing the SELinux setting will require rebooting the system so the entire file system can be relabeled. When this occurs, click [Yes] to acknowledge that a reboot of the system will occur after firstboot (Post Installation Wizard) is completed.

Kdump

Accept the default setting on the Kdump screen (disabled) and click [Forward] to continue.

Date and Time Settings

Adjust the date and time settings if necessary and click [Forward] to continue.

Create User

Create any additional (non-oracle) operating system user accounts if desired and click [Forward] to continue. For the purpose of this article, I will not be creating any additional operating system accounts. I will be creating the "oracle" user account during the Oracle database installation later in this guide.
If you chose not to define any additional operating system user accounts, click [Continue] to acknowledge the warning dialog.

Sound Card

This screen will only appear if the wizard detects a sound card. On the sound card screen click [Forward] to continue.

Additional CDs

On the "Additional CDs" screen click [Finish] to continue.

Reboot System

Given we changed the SELinux option (to disabled), we are prompted to reboot the system. Click [OK] to reboot the system for normal use.

Login Screen

After rebooting the machine, you are presented with the login screen. Login using the "root" user account and the password you provided during the installation.

Perform the same installation on the second node

After completing the Linux installation on the first node, repeat the above steps for the second node (linux2). When configuring the machine name and networking, be sure to configure the appropriate values for your environment. For my installation, this is what I configured for linux2:

First, make sure that each of the network devices is checked to [Active on boot]. The installer may choose to not activate eth1.
Second, [Edit] both eth0 and eth1 as follows. Verify that the option "Enable IPv4 support" is selected. Click off the option for "Use dynamic IP configuration (DHCP)" and configure a static IP address and Netmask for your environment. Click off the option to "Enable IPv6 support". You may choose to use different IP addresses for eth0 and eth1 than the ones I have documented in this guide, and that is OK. Put eth1 (the interconnect) on a different subnet than eth0 (the public network):
eth0:
- Check ON the option to [Enable IPv4 support]
- Check OFF the option to [Use dynamic IP configuration (DHCP)] - (select Manual configuration)
IPv4 Address: 192.168.1.101
Prefix (Netmask): 255.255.255.0
- Check OFF the option to [Enable IPv6 support]
eth1:
- Check ON the option to [Enable IPv4 support]
- Check OFF the option to [Use dynamic IP configuration (DHCP)] - (select Manual configuration)
IPv4 Address: 192.168.2.101
Prefix (Netmask): 255.255.255.0
- Check OFF the option to [Enable IPv6 support]
Continue by setting your hostname manually. I used "linux2" for the second node. Finish this dialog off by supplying your gateway and DNS servers.

7. Install Required Linux Packages for Oracle RAC

Install the following required Linux packages on both Oracle RAC nodes in the cluster!

After installing CentOS, the next step is to verify and install all packages (RPMs) required by both Oracle Clusterware and Oracle RAC. The Oracle Universal Installer (OUI) performs checks on your machine during installation to verify that it meets the appropriate operating system package requirements. To ensure that these checks complete successfully, verify the software requirements documented in this section before starting the Oracle installs.
Although many of the required packages for Oracle were installed during the CentOS installation, several will be missing either because they were considered optional within the package group or simply didn't exist in any package group!
The packages listed in this section (or later versions) are required for Oracle Clusterware 11g Release 1 and Oracle RAC 11g Release 1 running on the x86 (32-bit) CentOS 5 platform.
- binutils-2.17.50.0.6-2.el5
- compat-libstdc++-33-3.2.3-61
- elfutils-libelf-0.97-5
- elfutils-libelf-devel-0.125
- glibc-2.5-12
- glibc-common-2.5-12
- glibc-devel-2.5-12
- gcc-4.1.1-52
- gcc-c++-4.1.1-52
- libaio-0.3.106
- libaio-devel-0.3.106
- libgcc-4.1.1-52
- libstdc++-4.1.1
- libstdc++-devel-4.1.1-52
- make-3.81-1.1
- sysstat-7.0.0
- unixODBC-2.2.11
- unixODBC-devel-2.2.11
Each of the packages listed above can be found on CD #1, CD #2, or CD #3 of the CentOS 5 CD set. While it is possible to query each individual package to determine which ones are missing and need to be installed, an easier method is to run the rpm -Uvh PackageName command from the three CDs as follows. For packages that already exist and are up to date, the rpm command will simply skip the install and print a message to the console that the package is already installed.
```
# From CentOS 5.1 - [CD #1]
mkdir -p /media/cdrom
mount -r /dev/cdrom /media/cdrom
cd /media/cdrom/CentOS
rpm -Uvh binutils-2.*
rpm -Uvh elfutils-libelf-0.*
rpm -Uvh glibc-2.*
rpm -Uvh glibc-common-2.*
rpm -Uvh libaio-0.*
rpm -Uvh libgcc-4.*
rpm -Uvh libstdc++-4.*
rpm -Uvh make-3.*
cd /
eject

# From CentOS 5.1 - [CD #2]
mount -r /dev/cdrom /media/cdrom
cd /media/cdrom/CentOS
rpm -Uvh elfutils-libelf-devel-0.*
rpm -Uvh glibc-devel-2.*
rpm -Uvh gcc-4.*
rpm -Uvh gcc-c++-4.*
rpm -Uvh libstdc++-devel-4.*
rpm -Uvh unixODBC-2.*
cd /
eject

# From CentOS 5.1 - [CD #3]
mount -r /dev/cdrom /media/cdrom
cd /media/cdrom/CentOS
rpm -Uvh compat-libstdc++-33*
rpm -Uvh libaio-devel-0.*
rpm -Uvh sysstat-7.*
rpm -Uvh unixODBC-devel-2.*
cd /
eject
```
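If you would rather query for missing packages, either before or after running the commands above, a loop over rpm -q is enough. This is a minimal sketch: the function name missing_packages and the PKG_QUERY override are my own inventions (PKG_QUERY exists only so the loop can be exercised on a machine without rpm), and the package names are taken from the list in this section with the version numbers removed. Note that rpm -q only tells you that some version is installed; it does not compare versions against the minimums listed.

```shell
#!/bin/sh
# Package names from the required list, without version numbers.
REQUIRED_PKGS="binutils compat-libstdc++-33 elfutils-libelf
elfutils-libelf-devel glibc glibc-common glibc-devel gcc gcc-c++
libaio libaio-devel libgcc libstdc++ libstdc++-devel make sysstat
unixODBC unixODBC-devel"

# missing_packages NAME... -> prints each package that is not installed.
# PKG_QUERY is a hypothetical hook; by default "rpm -q" is used.
missing_packages() (
    for pkg in "$@"; do
        ${PKG_QUERY:-rpm -q} "$pkg" >/dev/null 2>&1 || echo "$pkg"
    done
)

# Report anything still missing on this node
missing_packages $REQUIRED_PKGS
```

An empty result means every package in the list is present in some version.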

8. Network Configuration

Perform the following network configuration on both Oracle RAC nodes in the cluster!

Although we configured several of the network settings during the installation of CentOS, it is important to not skip this section as it contains critical steps that are required for a successful RAC environment.

Introduction to Network Settings

During the Linux O/S install we already configured the IP address and host name for both of the Oracle RAC nodes. We now need to configure the /etc/hosts file and adjust several of the network settings for the interconnect.

Both of the Oracle RAC nodes should have one static IP address for the public network and one static IP address for the private cluster interconnect. Do not use DHCP naming for the public IP address or the interconnects; you need static IP addresses! The private interconnect should only be used by Oracle to transfer Cluster Manager and Cache Fusion related data along with data for the network storage server (Openfiler). Note that Oracle does not support using the public network interface for the interconnect. You must have one network interface for the public network and another network interface for the private interconnect. For a production RAC implementation, the interconnect should be at least gigabit and be used only by Oracle, with the network storage server (Openfiler) on a separate gigabit network.

Configuring Public and Private Network

In our two node example, we need to configure the network on both Oracle RAC nodes for access to the public network as well as their private interconnect.

The easiest way to configure network settings in Red Hat Enterprise Linux is with the program "Network Configuration". This application can be started from the command-line as the "root" user account as follows:


# su -
# /usr/bin/system-config-network &

Do not use DHCP naming for the public IP address or the interconnects - we need static IP addresses!

Using the Network Configuration application, you need to configure both NIC devices as well as the /etc/hosts file. Both of these tasks can be completed using the Network Configuration GUI. Notice that the /etc/hosts settings are the same for both nodes and that I removed any entry that has to do with IPv6 (for example, ::1 localhost6.localdomain6 localhost6).

Our example configuration will use the following settings:

Oracle RAC Node 1 - (linux1)
Device	IP Address	Subnet	Gateway	Purpose
eth0	192.168.1.100	255.255.255.0	192.168.1.1	Connects linux1 to the public network
eth1	192.168.2.100	255.255.255.0		Connects linux1 (interconnect) to linux2 (linux2-priv)
/etc/hosts
127.0.0.1        localhost.localdomain   localhost
# Public Network - (eth0)
192.168.1.100    linux1
192.168.1.101    linux2
# Private Interconnect - (eth1)
192.168.2.100    linux1-priv
192.168.2.101    linux2-priv
# Public Virtual IP (VIP) addresses - (eth0:1)
192.168.1.200    linux1-vip
192.168.1.201    linux2-vip
# Private Storage Network for Openfiler - (eth1)
192.168.1.195    openfiler1
192.168.2.195    openfiler1-priv

Oracle RAC Node 2 - (linux2)
Device	IP Address	Subnet	Gateway	Purpose
eth0	192.168.1.101	255.255.255.0	192.168.1.1	Connects linux2 to the public network
eth1	192.168.2.101	255.255.255.0		Connects linux2 (interconnect) to linux1 (linux1-priv)
/etc/hosts
127.0.0.1        localhost.localdomain   localhost
# Public Network - (eth0)
192.168.1.100    linux1
192.168.1.101    linux2
# Private Interconnect - (eth1)
192.168.2.100    linux1-priv
192.168.2.101    linux2-priv
# Public Virtual IP (VIP) addresses - (eth0:1)
192.168.1.200    linux1-vip
192.168.1.201    linux2-vip
# Private Storage Network for Openfiler - (eth1)
192.168.1.195    openfiler1
192.168.2.195    openfiler1-priv

Note that the virtual IP addresses only need to be defined in the /etc/hosts file (or your DNS) for both Oracle RAC nodes. The public virtual IP addresses will be configured automatically by Oracle when you run the Oracle Universal Installer, which starts Oracle's Virtual Internet Protocol Configuration Assistant (VIPCA). All virtual IP addresses will be activated when the srvctl start nodeapps -n <node_name> command is run.

Although I am getting ahead of myself, the virtual host name/IP address is what will be configured in the client(s) tnsnames.ora file for each Oracle Net Service Name. All of this will be explained much later in this article!
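Because both nodes must carry the same /etc/hosts entries, a quick scripted check can catch a name that was left out. The following is a sketch only: hosts_ip and check_hosts are hypothetical helper names, and the list of host names is the one used in this article's example configuration.

```shell
#!/bin/sh
# hosts_ip FILE NAME -> prints the address that FILE assigns to NAME
hosts_ip() {
    awk -v name="$2" '
        { sub(/#.*/, "") }    # ignore comments
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
    ' "$1"
}

# check_hosts FILE -> lists every cluster name that FILE does not define;
# the exit status is non-zero if anything is missing.
check_hosts() (
    status=0
    for name in linux1 linux2 linux1-priv linux2-priv \
                linux1-vip linux2-vip openfiler1 openfiler1-priv; do
        if [ -z "$(hosts_ip "$1" "$name")" ]; then
            echo "missing: $name"
            status=1
        fi
    done
    return $status
)

# Example: check this node's own hosts file
if [ -r /etc/hosts ]; then
    check_hosts /etc/hosts || echo "fix /etc/hosts before continuing"
fi
```

Run it on both linux1 and linux2; neither node should report a missing name.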

In the screen shots below, only Oracle RAC Node 1 (linux1) is shown. Be sure to make all the proper network settings to both Oracle RAC nodes.

Figure 2: Network Configuration Screen - Node 1 (linux1)

Figure 3: Ethernet Device Screen - eth0 (linux1)

Figure 4: Ethernet Device Screen - eth1 (linux1)

Figure 5: Network Configuration Screen - /etc/hosts (linux1)

Once the network is configured, you can use the ifconfig command to verify everything is working. The following example is from linux1:


# /sbin/ifconfig -a

eth0      Link encap:Ethernet  HWaddr 00:14:6C:76:5C:71
          inet addr:192.168.1.100  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::214:6cff:fe76:5c71/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:474176 errors:0 dropped:0 overruns:0 frame:0
          TX packets:295424 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:646960868 (616.9 MiB)  TX bytes:138068364 (131.6 MiB)
          Interrupt:177 Base address:0x6f00

eth1      Link encap:Ethernet  HWaddr 00:0E:0C:64:D1:E5
          inet addr:192.168.2.100  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::20e:cff:fe64:d1e5/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1452602 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1804263 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1165724771 (1.0 GiB)  TX bytes:2205826792 (2.0 GiB)
          Base address:0xddc0 Memory:fe9c0000-fe9e0000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:1371 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1371 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2169814 (2.0 MiB)  TX bytes:2169814 (2.0 MiB)

sit0      Link encap:IPv6-in-IPv4
          NOARP  MTU:1480  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
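To compare the output above against the addresses you intended, it can help to reduce it to one line per interface. The sketch below assumes the ifconfig output format shown in this article (an "inet addr:" field); newer ifconfig versions print the address differently, so treat iface_addrs (a name of my own choosing) as specific to this environment.

```shell
#!/bin/sh
# iface_addrs: reads "ifconfig -a" output on stdin and prints
# "interface address" for every interface with an IPv4 address.
iface_addrs() {
    awk '
        /^[^ \t]/    { iface = $1 }    # interface blocks start in column 1
        /inet addr:/ { split($2, a, ":"); print iface, a[2] }
    '
}

# Example: summarize this node (skipped if ifconfig is not installed)
if [ -x /sbin/ifconfig ]; then
    /sbin/ifconfig -a | iface_addrs
fi
```

For the linux1 output shown above, this would list eth0 with 192.168.1.100, eth1 with 192.168.2.100, and lo with 127.0.0.1. The sit0 device has no IPv4 address and is omitted.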

About Virtual IP

Why do we have a Virtual IP (VIP) in 11g? Why does it just return a dead connection when its primary node fails?

It's all about availability of the application. When a node fails, the VIP associated with it is supposed to be automatically failed over to some other node. When this occurs, two things happen.

- The new node re-arps the world indicating a new MAC address for the address. For directly connected clients, this usually causes them to see errors on their connections to the old address.
- Subsequent packets sent to the VIP go to the new node, which will send error RST packets back to the clients. This results in the clients getting errors immediately.

This means that when the client issues SQL to the node that is now down, or traverses the address list while connecting, rather than waiting on a very long TCP/IP time-out (~10 minutes), the client receives a TCP reset. In the case of SQL, this is ORA-3113. In the case of connect, the next address in tnsnames is used.

Without using VIPs, clients connected to a node that died will often wait a 10 minute TCP timeout period before getting an error. As a result, you don't really have a good HA solution without using VIPs.

Source - Metalink: "RAC Frequently Asked Questions" (Note:220970.1)

Confirm the RAC Node Name is Not Listed in Loopback Address

Ensure that the node names (linux1 or linux2) are not included for the loopback address in the /etc/hosts file. If the machine name is listed in the loopback address entry as below:

127.0.0.1 linux1 localhost.localdomain localhost

it will need to be removed as shown below:

127.0.0.1 localhost.localdomain localhost

If the RAC node name is listed for the loopback address, you will receive the following error during the RAC installation:

ORA-00603: ORACLE server session terminated by fatal error

ORA-29702: error occurred in Cluster Group Service operation
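The loopback check described above is easy to script. This is a minimal sketch, with name_on_loopback being a hypothetical helper name; it looks only at the 127.0.0.1 entry and compares whole host names, so linux1-priv is not mistaken for linux1.

```shell
#!/bin/sh
# name_on_loopback FILE NAME -> exit status 0 if NAME is listed
# on the 127.0.0.1 line of FILE, non-zero otherwise.
name_on_loopback() {
    awk -v name="$2" '
        { sub(/#.*/, "") }    # ignore comments
        $1 == "127.0.0.1" { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit !found }
    ' "$1"
}

# Example: test this node's short host name against its own hosts file
node=$(uname -n)
node=${node%%.*}
if name_on_loopback /etc/hosts "$node"; then
    echo "$node is listed on the loopback line: remove it"
else
    echo "$node is not listed on the loopback line"
fi
```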

Adjusting Network Settings

With Oracle 9.2.0.1 and later, Oracle makes use of UDP as the default protocol on Linux for inter-process communication (IPC), such as Cache Fusion and Cluster Manager buffer transfers between instances within the RAC cluster.

Oracle strongly recommends adjusting the default and maximum receive buffer size (SO_RCVBUF socket option) to 4MB and the default and maximum send buffer size (SO_SNDBUF socket option) to 256KB.

The receive buffers are used by TCP and UDP to hold received data until it is read by the application. With TCP, the receive buffer cannot overflow because the peer is not allowed to send data beyond the buffer size window. UDP has no such flow control, so datagrams are discarded if they don't fit in the socket receive buffer, which means a fast sender can overwhelm the receiver.

The default and maximum window size can be changed in the /proc file system without reboot:


# su - root

# sysctl -w net.core.rmem_default=4194304
net.core.rmem_default = 4194304

# sysctl -w net.core.rmem_max=4194304
net.core.rmem_max = 4194304

# sysctl -w net.core.wmem_default=262144
net.core.wmem_default = 262144

# sysctl -w net.core.wmem_max=262144
net.core.wmem_max = 262144

The above commands make the changes to the already running O/S. You should now make the above changes permanent (for each reboot) by adding the following lines to the /etc/sysctl.conf file for both nodes in your RAC cluster:


# +---------------------------------------------------------+
# | ADJUSTING NETWORK SETTINGS                              |
# +---------------------------------------------------------+
# | With Oracle 9.2.0.1 and onwards, Oracle now makes use   |
# | of UDP as the default protocol on Linux for             |
# | inter-process communication (IPC), such as Cache Fusion |
# | and Cluster Manager buffer transfers between instances  |
# | within the RAC cluster. Oracle strongly suggests to     |
# | adjust the default and maximum receive buffer size      |
# | (SO_RCVBUF socket option) to 4096 KB, and the default   |
# | and maximum send buffer size (SO_SNDBUF socket option)  |
# | to 256 KB. The receive buffers are used by TCP and UDP  |
# | to hold received data until it is read by the           |
# | application. The receive buffer cannot overflow because |
# | the peer is not allowed to send data beyond the buffer  |
# | size window. This means that datagrams will be          |
# | discarded if they don't fit in the socket receive       |
# | buffer. This could cause the sender to overwhelm the    |
# | receiver.                                               |
# +---------------------------------------------------------+

# +---------------------------------------------------------+
# | Default setting in bytes of the socket "receive" buffer |
# | which may be set by using the SO_RCVBUF socket option.  |
# +---------------------------------------------------------+
net.core.rmem_default=4194304

# +---------------------------------------------------------+
# | Maximum setting in bytes of the socket "receive" buffer |
# | which may be set by using the SO_RCVBUF socket option.  |
# +---------------------------------------------------------+
net.core.rmem_max=4194304

# +---------------------------------------------------------+
# | Default setting in bytes of the socket "send" buffer    |
# | which may be set by using the SO_SNDBUF socket option.  |
# +---------------------------------------------------------+
net.core.wmem_default=262144

# +---------------------------------------------------------+
# | Maximum setting in bytes of the socket "send" buffer    |
# | which may be set by using the SO_SNDBUF socket option.  |
# +---------------------------------------------------------+
net.core.wmem_max=262144
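After editing /etc/sysctl.conf on each node, it is worth confirming that all four values made it into the file. The sketch below is one way to do that; conf_value and check_buffers are hypothetical helper names, and the expected values are simply 4 x 1024 x 1024 = 4194304 bytes for the receive buffers and 256 x 1024 = 262144 bytes for the send buffers.

```shell
#!/bin/sh
# conf_value FILE KEY -> prints the value assigned to KEY in a
# sysctl.conf-style file (the last assignment wins).
conf_value() {
    awk -F= -v key="$2" '
        /^[ \t]*#/ { next }    # skip comment lines
        { k = $1; v = $2; gsub(/[ \t]/, "", k); gsub(/[ \t]/, "", v) }
        k == key { val = v }
        END { print val }
    ' "$1"
}

# check_buffers FILE -> reports each buffer setting that is absent or
# different from the sizes used in this article.
check_buffers() (
    status=0
    for pair in net.core.rmem_default=4194304 net.core.rmem_max=4194304 \
                net.core.wmem_default=262144 net.core.wmem_max=262144; do
        key=${pair%%=*}
        want=${pair#*=}
        got=$(conf_value "$1" "$key")
        if [ "$got" != "$want" ]; then
            echo "$key: expected $want, found ${got:-nothing}"
            status=1
        fi
    done
    return $status
)

# Example: check this node's configuration file
if [ -r /etc/sysctl.conf ]; then
    check_buffers /etc/sysctl.conf || echo "update /etc/sysctl.conf on this node"
fi
```

Running sysctl -p as root loads the settings from /etc/sysctl.conf into the running kernel without a reboot.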

Check and turn off UDP ICMP rejections:

During the Linux installation process, I chose not to configure the firewall. By default, the installer selects the option to configure a firewall. This has burned me several times, so I like to double-check that the firewall is not configured and that UDP ICMP filtering is turned off.

If UDP ICMP is blocked or rejected by the firewall, the Oracle Clusterware software will crash after several minutes of running. When the Oracle Clusterware process fails, you will have something similar to the following in the <machine_name>_evmocr.log file:


08/29/2005 22:17:19
oac_init:2: Could not connect to server, clsc retcode = 9
08/29/2005 22:17:19
a_init:12!: Client init unsuccessful : [32]
ibctx:1:ERROR: INVALID FORMAT
proprinit:problem reading the bootblock or superbloc 22

When experiencing this type of error, the solution is to remove the udp ICMP (iptables) rejection rule - or to simply have the firewall option turned off. The Oracle Clusterware software will then start to operate normally and not crash. The following commands should be executed as the root user account:

Check to ensure that the firewall option is turned off. If the firewall option is stopped (like it is in my example below) you do not have to proceed with the following steps.


# /etc/rc.d/init.d/iptables status
Firewall is stopped.

If the firewall option is operating you will need to first manually disable UDP ICMP rejections:


# /etc/rc.d/init.d/iptables stop

Flushing firewall rules: [  OK  ]
Setting chains to policy ACCEPT: filter [  OK  ]
Unloading iptables modules: [  OK  ]

Then, to turn UDP ICMP rejections off for next server reboot (which should always be turned off):

# chkconfig iptables off
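To confirm that the firewall will stay off after a reboot, you can inspect the output of chkconfig --list iptables, which prints one "runlevel:on" or "runlevel:off" field per runlevel. The sketch below pulls out the runlevels that are still on; enabled_runlevels is a hypothetical helper name.

```shell
#!/bin/sh
# enabled_runlevels: reads "chkconfig --list SERVICE" output on stdin
# and prints the runlevels in which the service is still switched on.
enabled_runlevels() {
    awk '{
        out = ""; sep = ""
        for (i = 2; i <= NF; i++) {
            split($i, lv, ":")
            if (lv[2] == "on") { out = out sep lv[1]; sep = " " }
        }
        print out
    }'
}

# Example: an empty line means iptables is off in every runlevel
if command -v chkconfig >/dev/null 2>&1; then
    chkconfig --list iptables | enabled_runlevels
fi
```

After running chkconfig iptables off, the example should print an empty line on both nodes.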

9. Install Openfiler

Perform the following installation on the network storage server (openfiler1)!

With the network configured on both Oracle RAC nodes, the next step is to install the Openfiler software to the network storage server (openfiler1). Later in this article, the network storage server will be configured as an iSCSI storage device for all Oracle RAC 11g shared storage requirements.
Powered by rPath Linux, Openfiler is a free browser-based network storage management utility that delivers file-based Network Attached Storage (NAS) and block-based Storage Area Networking (SAN) in a single framework. The entire software stack interfaces with open source applications such as Apache, Samba, LVM2, ext3, Linux NFS and iSCSI Enterprise Target. Openfiler combines these ubiquitous technologies into a small, easy to manage solution fronted by a powerful web-based management interface.
Openfiler supports CIFS, NFS, HTTP/DAV, and FTP; however, we will only be making use of its iSCSI capabilities to implement an inexpensive SAN for the shared storage components required by Oracle RAC 11g. The Openfiler server being used for this article (openfiler1) includes an internal 40GB hard drive which will be used to store the Openfiler software. For the database storage, I installed a separate internal Maxtor SATA hard drive (500GB), which is connected to the Openfiler server through a Maxtor SATA PCI card. The Openfiler server will be configured to use this disk for iSCSI based storage, and the disk will be used in our Oracle RAC 11g configuration to store the shared files required by Oracle Clusterware as well as all Oracle ASM volumes.

Please be aware that any type of hard disk (internal or external) should work for database storage as long as it can be recognized by the network storage server (Openfiler) and has adequate space.

To learn more about Openfiler, please visit their website at http://www.openfiler.com/

Download Openfiler
Use the links below to download Openfiler 2.2 (respin 2) for either x86 or x86_64 depending on your hardware architecture. This example uses x86. After downloading Openfiler, you will then need to burn the ISO image to CD.

Note: At this time, Openfiler 2.3 is not supported by this article. Please use Openfiler 2.2 (respin 2). I currently plan to have this article updated and fully tested with CentOS 5.3 and Openfiler 2.3 (Final) by Q3 2009.
File	Processor Type	Size	SHA1SUM
openfiler-2.2-x86-disc1.iso	x86	323 MB	cae69e2452eb660a3b73c315c6435c99fc25976d
openfiler-2.2-x86_64-disc1.iso	x86_64	329 MB	bbe345362a49db5ff7c19ac5768fc2c67f48037c
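
Before burning the image, it may be worth confirming that the download matches the published SHA1SUM. The following is a minimal sketch, assuming the download was made on a Linux machine where the sha1sum utility is available; the function name verify_sha1 is my own and not part of Openfiler.

```shell
#!/bin/sh
# verify_sha1 FILE EXPECTED: succeed only if FILE's SHA-1 digest equals EXPECTED.
# (verify_sha1 is a hypothetical helper name used for illustration.)
verify_sha1() {
    # sha1sum prints "<digest>  <file>"; keep only the digest field.
    actual=$(sha1sum "$1" | awk '{print $1}')
    if [ "$actual" = "$2" ]; then
        echo "OK: $1"
    else
        echo "MISMATCH: $1 (got $actual)" >&2
        return 1
    fi
}

# Example call, using the checksum listed for the x86 image:
# verify_sha1 openfiler-2.2-x86-disc1.iso cae69e2452eb660a3b73c315c6435c99fc25976d
```

A mismatch usually means an incomplete or corrupted download, in which case the image should be downloaded again rather than burned.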

If you are downloading the above ISO file to a MS Windows machine, there are many options for burning the ISO image (ISO file) to a CD. You may already be familiar with and have the proper software to burn images to CD. If you are not familiar with this process and do not have the required software to burn images to CD, here are just two (of many) software packages that can be used:
- UltraISO
- Magic ISO Maker
Install Openfiler

This section provides a summary of the screens used to install the Openfiler software. For the purpose of this article, I opted to install Openfiler with all default options. The only manual change required was for configuring the local network settings.

Once the install has completed, the server will reboot to make sure all required components, services and drivers are started and recognized. After the reboot, any external and internal hard drives should be discovered by the Openfiler server.
For more detailed installation instructions, please visit http://www.openfiler.com/learn/. I would suggest, however, that the instructions I have provided below be used for this article.

Before installing the Openfiler software to the network storage server, you should have both NIC interfaces (cards) installed and any internal / external hard drives connected and turned on.

After downloading and burning the Openfiler ISO image (ISO file) to CD, insert the CD into the network storage server (openfiler1 in this example), power it on, and answer the installation screen prompts as noted below.
Boot Screen

The first screen is the Openfiler boot screen. At the boot: prompt, hit [Enter] to start the installation process.

Media Test

When asked to test the CD media, tab over to [Skip] and hit [Enter]. If there were any errors, the media burning software would have warned us. After several seconds, the installer should then detect the video card, monitor, and mouse. The installer then goes into GUI mode.

Welcome to Openfiler NAS/SAN Appliance

At the welcome screen, click [Next] to continue.

Keyboard Configuration

The next screen prompts you for the Keyboard settings. Make the appropriate selection for your configuration.

Disk Partitioning Setup

The next screen asks whether to perform disk partitioning using "Automatic Partitioning" or "Manual Partitioning with Disk Druid". You can choose either method here, although the official Openfiler documentation suggests using Manual Partitioning. Since the first internal hard drive I will be using for this install is small and only going to be used to store the Openfiler software (I will not be using any space on the internal 40GB hard drive for iSCSI storage), I opted to use "Automatic Partitioning".

Select [Automatically partition] and click [Next] to continue.
If there were a previous installation of Linux on this machine, the next screen will ask if you want to "remove" or "keep" old partitions. Select the option to [Remove all partitions on this system].
Important: Ensure that ONLY the hard drive you are going to use for the Openfiler software is selected for this installation (i.e. [hda]). If Openfiler detected any other internal or external disks that will be used for database storage, un-select them now. For example, in addition to the [hda] drive showing up, Openfiler also detected and selected the internal 500GB SATA hard drive (as [sda]) which I needed to "un-select".
I also keep the checkbox [Review (and modify if needed) the partitions created] selected. Click [Next] to continue.
You will then be prompted with a dialog window asking if you really want to remove all partitions. Click [Yes] to acknowledge this warning.

Partitioning

The installer will then allow you to view (and modify if needed) the disk partitions it automatically selected for /dev/hda. In almost all cases, the installer will choose 100MB for /boot, double the amount of RAM (systems with < 2GB RAM) or an amount equal to RAM (systems with > 2GB RAM) for swap, and the rest going to the root (/) partition. I like to have a minimum of 2GB for swap. For the purpose of this install, I will accept all automatically preferred sizes. (Including 2GB for swap since I have 2GB of RAM installed.)

Network Configuration

I made sure to install both NIC interfaces (cards) in the network storage server before starting the Openfiler installation. This screen should have successfully detected each of the network devices.

First, make sure that each of the network devices is set to [Active on boot]. The installer may choose to not activate eth1 by default.
Second, [Edit] both eth0 and eth1 as follows. You may choose to use different IP addresses for both eth0 and eth1 and that is OK. You must, however, configure eth1 (the storage network) to be on the same subnet you configured for eth1 on linux1 and linux2:
eth0:
- Check OFF the option to [Configure using DHCP]
- Leave the [Activate on boot] checked ON
- IP Address: 192.168.1.195
- Netmask: 255.255.255.0
eth1:
- Check OFF the option to [Configure using DHCP]
- Leave the [Activate on boot] checked ON
- IP Address: 192.168.2.195
- Netmask: 255.255.255.0
Continue by setting your hostname manually. I used a hostname of "openfiler1". Finish this dialog off by supplying your gateway and DNS servers.
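
The subnet requirement for eth1 can be checked with a little arithmetic: two addresses are on the same subnet when they are identical after being ANDed with the netmask. Here is a small sketch of that check. The helper names are my own, and the address 192.168.2.100 in the usage note is only an assumed example for a RAC node's eth1; substitute the value you actually configured.

```shell
#!/bin/sh
# ip_to_int A.B.C.D: print the address as a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the dotted quad into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# same_subnet IP1 IP2 MASK: succeed when both addresses share a network.
same_subnet() {
    n1=$(( $(ip_to_int "$1") & $(ip_to_int "$3") ))
    n2=$(( $(ip_to_int "$2") & $(ip_to_int "$3") ))
    [ "$n1" -eq "$n2" ]
}
```

For example, `same_subnet 192.168.2.195 192.168.2.100 255.255.255.0 && echo same` prints "same", while comparing 192.168.1.195 with 192.168.2.195 under the same netmask does not.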

Time Zone Selection

The next screen allows you to configure your time zone information. Make the appropriate selection for your location.

Set Root Password

Select a root password and click [Next] to continue.

About to Install

This screen is basically a confirmation screen. Click [Next] to start the installation.

Congratulations

And that's it. You have successfully installed Openfiler on the network storage server. The installer will eject the CD from the CD-ROM drive. Take out the CD and click [Reboot] to reboot the system.
If everything was successful after the reboot, you should now be presented with a text login screen and the URL(s) to use for administering the Openfiler server.

Modify /etc/hosts File on Openfiler Server

Although not mandatory, I typically copy the contents of the /etc/hosts file from one of the Oracle RAC nodes to the new Openfiler server. This allows convenient name resolution when testing the network for the cluster.
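
After copying the file, a quick lookup confirms that the names used in this article resolve as expected. The sketch below reads a hosts-style file directly; the helper name hosts_lookup is my own, and it assumes the file contains entries mapping openfiler1 to 192.168.1.195 and openfiler1-priv to 192.168.2.195.

```shell
#!/bin/sh
# hosts_lookup FILE NAME: print the IP address mapped to NAME in a hosts-style file.
# (hosts_lookup is a hypothetical helper name used for illustration.)
hosts_lookup() {
    awk -v name="$2" '
        /^[ \t]*#/ { next }                  # skip comment lines
        { for (i = 2; i <= NF; i++)          # fields 2..NF are host names/aliases
              if ($i == name) { print $1; exit } }
    ' "$1"
}

# Example: hosts_lookup /etc/hosts openfiler1-priv
```

If a name prints nothing, the entry is missing from the file and should be added before testing the cluster network.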

10. Configure iSCSI Volumes using Openfiler

Perform the following configuration tasks on the network storage server (openfiler1)!

Openfiler administration is performed using the Openfiler Storage Control Center, a browser-based tool that runs over an https connection on port 446. For example:

https://openfiler1:446/

From the Openfiler Storage Control Center home page, login as an administrator. The default administration login credentials for Openfiler are:

Username: openfiler
Password: password

The first page the administrator sees is the [Accounts] / [Authentication] screen. Configuring user accounts and groups is not necessary for this article and will therefore not be discussed.

To use Openfiler as an iSCSI storage server, we have to perform three major tasks: set up iSCSI services, configure network access, and create physical storage.

Services

To control services, use the Openfiler Storage Control Center and navigate to [Services] / [Enable/Disable]:

Figure 6: Enable iSCSI Openfiler Service

To enable the iSCSI service, click on 'Enable' under the 'iSCSI target' service name. After that, the 'iSCSI target' status should change to 'Enabled'.

The ietd program implements the user level part of iSCSI Enterprise Target software for building an iSCSI storage system on Linux. With the iSCSI target enabled, we should be able to SSH into the Openfiler server and see the iscsi-target service running:


[root@openfiler1 ~]# service iscsi-target status
ietd (pid 3784) is running...

Network Access Restriction

The next step is to configure network access in Openfiler so both Oracle RAC nodes (linux1 and linux2) have permissions to our iSCSI volumes through the storage (private) network.

iSCSI volumes will be created in the next section!

Again, this task can be completed using the Openfiler Storage Control Center by navigating to [General] / [Local Networks]. The Local Networks screen allows an administrator to set up networks and/or hosts that will be allowed to access resources exported by the Openfiler appliance. For the purpose of this article, we will want to add both Oracle RAC nodes individually rather than allowing the entire 192.168.2.0 network to have access to Openfiler resources.

When entering each of the Oracle RAC nodes, note that the 'Name' field is just a logical name used for reference only. As a convention when entering nodes, I simply use the node name defined for that IP address. Next, when entering the actual node in the 'Network/Host' field, always use its IP address even though its host name may already be defined in your /etc/hosts file or DNS. Lastly, when entering actual hosts in our Class C network, use a subnet mask of 255.255.255.255.

It is important to remember that you will be entering the IP address of the private network (eth1) for each of the RAC nodes in the cluster.

The following image shows the results of adding both Oracle RAC nodes:

Figure 7: Configure Openfiler Host Access for Oracle RAC Nodes

Physical Storage

Storage devices like internal IDE/SATA/SCSI disks, external USB or FireWire drives, or any other storage can be connected to the Openfiler server, and served to the clients. Once these devices are discovered at the OS level, Openfiler Storage Control Center can be used to set up and manage all that storage.

In this section, we will be creating the five iSCSI volumes to be used as shared storage by both of the Oracle RAC nodes in the cluster. This involves multiple steps that will be performed on the internal SATA 500GB hard drive connected to the Openfiler server.

For the purpose of this article, I have a 500GB SATA hard drive dedicated for all shared storage needs. On the Openfiler server my 500GB SATA hard drive was configured on /dev/sda (with description ATA Maxtor 6H500F0). To see this and to start the process of creating our iSCSI volumes, navigate to [Volumes] / [Physical Storage Mgmt.] from the Openfiler Storage Control Center:

Figure 8: Openfiler Physical Storage

Partitioning the Physical Disk

The first step we will perform is to create a single primary partition on the /dev/sda internal hard drive. By clicking on the /dev/sda link, we are presented with the options to 'Edit' or 'Create' a partition. Since we will be creating a single primary partition that spans the entire disk, most of the options can be left to their default setting where the only modification would be to change the 'Partition Type' from 'Extended partition' to 'Physical volume'. Here are the values I specified to create the primary partition on /dev/sda:

Mode: Primary
Partition Type: Physical volume
Starting Cylinder: 1
Ending Cylinder: 60801

The size now shows 465.76 GB. To accept that, we click on the Create button. This results in a new partition (/dev/sda1) on our internal hard drive:
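
The reported size can be reproduced from the ending cylinder. This is a short sketch of the arithmetic, assuming the common disk geometry of 255 heads, 63 sectors per track and 512-byte sectors (8,225,280 bytes per cylinder) and a GB of 1024 x 1024 x 1024 bytes; the geometry of your own disk may differ.

```shell
#!/bin/sh
# cyl_to_gib CYLINDERS: print the capacity in GB (2^30 bytes), two decimals.
# Assumed geometry: 255 heads x 63 sectors/track x 512 bytes = 8225280 bytes/cylinder.
cyl_to_gib() {
    awk -v c="$1" 'BEGIN { printf "%.2f\n", c * 255 * 63 * 512 / (1024 * 1024 * 1024) }'
}

cyl_to_gib 60801
```

With 60801 cylinders this prints 465.76, which agrees with the size shown on the Openfiler screen.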

Figure 9: Partition the Physical Volume

Volume Group Management

The next step is to create a Volume Group. We will be creating a single volume group named rac1 that contains the newly created primary partition.

From the Openfiler Storage Control Center, navigate to [Volumes] / [Volume Group Mgmt.]. There we would see any existing volume groups, or none as in our case. Using the Volume Group Management screen, enter the name of the new volume group (rac1), click on the checkbox in front of /dev/sda1 to select that partition, and finally click on the 'Add volume group' button. After that we are presented with the list that now shows our newly created volume group named "rac1":

Figure 10: New Volume Group Created

Logical Volumes

We can now create the five logical volumes in the newly created volume group (rac1).

From the Openfiler Storage Control Center, navigate to [Volumes] / [Create New Volume]. There we will see the newly created volume group (rac1) along with its block storage statistics. Also available at the bottom of this screen is the option to create a new volume in the selected volume group. Use this screen to create the following five logical (iSCSI) volumes. After creating each logical volume, the application will point you to the "List of Existing Volumes" screen. You will then need to click back to the "Create New Volume" tab to create the next logical volume until all five iSCSI volumes are created:

iSCSI / Logical Volumes
Volume Name	Volume Description	Required Space (MB)	Filesystem Type
crs	Oracle Clusterware	2,048	iSCSI
asm1	Oracle ASM Volume 1	118,720	iSCSI
asm2	Oracle ASM Volume 2	118,720	iSCSI
asm3	Oracle ASM Volume 3	118,720	iSCSI
asm4	Oracle ASM Volume 4	118,720	iSCSI
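
As a sanity check on the sizes in the table, the five volumes can be added up and compared with the size of the volume group. A minimal sketch (the helper name total_mb is my own):

```shell
#!/bin/sh
# total_mb: sum the second column (size in MB) of "name size" lines on stdin.
total_mb() {
    awk '{ sum += $2 } END { print sum }'
}

printf '%s\n' 'crs 2048' 'asm1 118720' 'asm2 118720' 'asm3 118720' 'asm4 118720' | total_mb
```

This prints 476928 (MB), or 465.75 GB when divided by 1024, which is just under the 465.76 GB partition created earlier. If your disk is a different size, scale the four ASM volumes accordingly.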

In effect we have created five iSCSI disks that can now be presented to iSCSI clients (linux1 and linux2) on the network. The "List of Existing Volumes" screen should look as follows:

Figure 11: New Logical (iSCSI) Volumes

Grant Access Rights to New Logical Volumes

Before an iSCSI client can have access to the newly created iSCSI volumes, it needs to be granted the appropriate permissions. A while back, we configured Openfiler with two hosts (the Oracle RAC nodes) that can be configured with access rights to resources. We now need to grant both of the Oracle RAC nodes access to each of the newly created iSCSI volumes.

From the Openfiler Storage Control Center, navigate to [Volumes] / [List of Existing Volumes]. This will present the screen shown in the previous section. For each of the logical volumes, click on the 'Edit' link (under the Properties column). This will bring up the 'Edit properties' screen for that volume. Scroll to the bottom of this screen; change both hosts from 'Deny' to 'Allow' and click the 'Update' button. Perform this task for all five logical volumes.

Figure 12: Grant Host Access to Logical (iSCSI) Volumes

Make iSCSI Targets Available to Clients

Every time a new logical volume is added, we need to restart the associated service on the Openfiler server. In our case we created five iSCSI logical volumes, so we have to restart the iSCSI target (iscsi-target) service. This will make the new iSCSI targets available to all clients on the network who have privileges to access them.

To restart the iSCSI target service, use the Openfiler Storage Control Center and navigate to [Services] / [Enable/Disable]. The iSCSI target service should already be enabled (several sections back). If so, disable the service then enable it again. (See Figure 6)

The same task can be achieved through an SSH session on the Openfiler server:


[root@openfiler1 ~]# service iscsi-target restart
Stopping iSCSI target service: [  OK  ]
Starting iSCSI target service: [  OK  ]

11. Configure iSCSI Volumes on Oracle RAC Nodes

Configure the iSCSI initiator on both Oracle RAC nodes in the cluster! Creating partitions, however, should only be executed on one of the nodes in the RAC cluster.

An iSCSI client can be any system (Linux, Unix, MS Windows, Apple Mac, etc.) for which iSCSI support (a driver) is available. In our case, the clients are two Linux servers, linux1 and linux2, running CentOS 5.1.

In this section we will be configuring the iSCSI software initiator on both of the Oracle RAC nodes. CentOS 5.1 includes the Open-iSCSI iSCSI software initiator which can be found in the iscsi-initiator-utils RPM. This is a change from previous versions of CentOS (4.x) which included the Linux iscsi-sfnet software driver developed as part of the Linux-iSCSI Project. All iSCSI management tasks like discovery and logins will use the command-line interface iscsiadm which is included with Open-iSCSI.

The iSCSI software initiator will be configured to automatically login to the network storage server (openfiler1) and discover the iSCSI volumes created in the previous section. We will then go through the steps of creating persistent local SCSI device names (i.e. /dev/iscsi/asm1) for each of the iSCSI target names discovered using udev. Having a consistent local SCSI device name and which iSCSI target it maps to is required in order to know which volume (device) is to be used for OCFS2 and which volumes belong to ASM. Before we can do any of this, however, we must first install the iSCSI initiator software!

Installing the iSCSI (initiator) Service

With CentOS 5.1, the Open-iSCSI iSCSI software initiator does not get installed by default. The software is included in the iscsi-initiator-utils package which can be found on CD #1. To determine if this package is installed (which in most cases, it will not be), perform the following on both Oracle RAC nodes:

# rpm -qa | grep iscsi-initiator-utils

If the iscsi-initiator-utils package is not installed, load CD #1 into each of the Oracle RAC nodes and perform the following:


# mount -r /dev/cdrom /media/cdrom
# cd /media/cdrom/CentOS
# rpm -Uvh iscsi-initiator-utils-6.2.0.865-0.8.el5.i386.rpm
# cd /
# eject

Configure the iSCSI (initiator) Service

After verifying that the iscsi-initiator-utils package is installed on both Oracle RAC nodes, start the iscsid service and enable it to automatically start when the system boots. We will also configure the iscsi service to automatically start which logs into iSCSI targets needed at system startup.




# service iscsid start
Turning off network shutdown. Starting iSCSI daemon: [  OK  ]
[  OK  ]

# chkconfig iscsid on
# chkconfig iscsi on

Now that the iSCSI service is started, use the iscsiadm command-line interface to discover all available targets on the network storage server. This should be performed on both Oracle RAC nodes to verify the configuration is functioning properly:


# iscsiadm -m discovery -t sendtargets -p openfiler1-priv
192.168.2.195:3260,1 iqn.2006-01.com.openfiler:rac1.asm1
192.168.2.195:3260,1 iqn.2006-01.com.openfiler:rac1.asm2
192.168.2.195:3260,1 iqn.2006-01.com.openfiler:rac1.asm3
192.168.2.195:3260,1 iqn.2006-01.com.openfiler:rac1.asm4
192.168.2.195:3260,1 iqn.2006-01.com.openfiler:rac1.crs
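
Each line of the discovery output has the form "portal,group target-name". When the target names are needed on their own (for the login commands that follow, for instance), they can be pulled out of the second column. A small sketch, with target_names being a helper name of my own:

```shell
#!/bin/sh
# target_names: read "portal,tpgt target" lines on stdin and print the target column.
# (target_names is a hypothetical helper name used for illustration.)
target_names() {
    awk 'NF >= 2 { print $2 }'
}

# Typical use on a RAC node (not run here):
#   iscsiadm -m discovery -t sendtargets -p openfiler1-priv | target_names
```

With the discovery output shown above, this would list the five iqn.2006-01.com.openfiler:rac1.* names, one per line.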

Manually Login to iSCSI Targets

At this point the iSCSI initiator service has been started and each of the Oracle RAC nodes was able to discover the available targets from the network storage server. The next step is to manually login to each of the available targets which can be done using the iscsiadm command-line interface. This needs to be run on both Oracle RAC nodes. Note that I had to specify the IP address and not the host name of the network storage server (openfiler1-priv) - I believe this is required given the discovery (above) shows the targets using the IP address.


# iscsiadm -m node -T iqn.2006-01.com.openfiler:rac1.asm1 -p 192.168.2.195 -l
# iscsiadm -m node -T iqn.2006-01.com.openfiler:rac1.asm2 -p 192.168.2.195 -l
# iscsiadm -m node -T iqn.2006-01.com.openfiler:rac1.asm3 -p 192.168.2.195 -l
# iscsiadm -m node -T iqn.2006-01.com.openfiler:rac1.asm4 -p 192.168.2.195 -l
# iscsiadm -m node -T iqn.2006-01.com.openfiler:rac1.crs -p 192.168.2.195 -l

Configure Automatic Login

The next step is to ensure the client will automatically login to each of the targets listed above when the machine is booted (or the iSCSI initiator service is started/restarted). As with the manual login process described above, perform the following on both Oracle RAC nodes:


# iscsiadm -m node -T iqn.2006-01.com.openfiler:rac1.asm1 -p 192.168.2.195 --op update -n node.startup -v automatic
# iscsiadm -m node -T iqn.2006-01.com.openfiler:rac1.asm2 -p 192.168.2.195 --op update -n node.startup -v automatic
# iscsiadm -m node -T iqn.2006-01.com.openfiler:rac1.asm3 -p 192.168.2.195 --op update -n node.startup -v automatic
# iscsiadm -m node -T iqn.2006-01.com.openfiler:rac1.asm4 -p 192.168.2.195 --op update -n node.startup -v automatic
# iscsiadm -m node -T iqn.2006-01.com.openfiler:rac1.crs -p 192.168.2.195 --op update -n node.startup -v automatic
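
Since the login and automatic-startup commands differ only in the volume name, they can be generated with a loop instead of being typed ten times. The following sketch only prints the commands (a dry run) so they can be reviewed first; the variable and function names are my own, and the portal address and target prefix are the ones used in this article.

```shell
#!/bin/sh
PORTAL=192.168.2.195                      # storage-network address of openfiler1
IQN_PREFIX=iqn.2006-01.com.openfiler:rac1 # common prefix of the five targets

# print_iscsi_commands: emit the login and automatic-startup commands for each volume.
print_iscsi_commands() {
    for vol in asm1 asm2 asm3 asm4 crs; do
        echo "iscsiadm -m node -T ${IQN_PREFIX}.${vol} -p ${PORTAL} -l"
        echo "iscsiadm -m node -T ${IQN_PREFIX}.${vol} -p ${PORTAL} --op update -n node.startup -v automatic"
    done
}

print_iscsi_commands
```

Once the ten printed lines look right, they could be run as root on each Oracle RAC node, for example by piping the output to sh.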

Create Persistent Local SCSI Device Names

In this section, we will go through the steps to create persistent local SCSI device names for each of the iSCSI target names. This will be done using udev. Having a consistent local SCSI device name and which iSCSI target it maps to is required in order to know which volume (device) is to be used for OCFS2 and which volumes belong to ASM.

When either of the Oracle RAC nodes boot and the iSCSI initiator service is started, it will automatically login to each of the targets configured in a random fashion and map them to the next available local SCSI device name. For example, the target iqn.2006-01.com.openfiler:rac1.asm1 may get mapped to /dev/sda. I can actually determine the current mappings for all targets by looking at the /dev/disk/by-path directory:


# (cd /dev/disk/by-path; ls -l *openfiler* | awk '{FS=" "; print $9 " " $10 " " $11}')
ip-192.168.2.195:3260-iscsi-iqn.2006-01.com.openfiler:rac1.asm1 -> ../../sda
ip-192.168.2.195:3260-iscsi-iqn.2006-01.com.openfiler:rac1.asm2 -> ../../sdb
ip-192.168.2.195:3260-iscsi-iqn.2006-01.com.openfiler:rac1.asm3 -> ../../sdc
ip-192.168.2.195:3260-iscsi-iqn.2006-01.com.openfiler:rac1.asm4 -> ../../sdd
ip-192.168.2.195:3260-iscsi-iqn.2006-01.com.openfiler:rac1.crs -> ../../sde

Using the output from the above listing, we can establish the following current mappings:

Current iSCSI Target Name to local SCSI Device Name Mappings
iSCSI Target Name	SCSI Device Name
`iqn.2006-01.com.openfiler:rac1.asm1`	`/dev/sda`
`iqn.2006-01.com.openfiler:rac1.asm2`	`/dev/sdb`
`iqn.2006-01.com.openfiler:rac1.asm3`	`/dev/sdc`
`iqn.2006-01.com.openfiler:rac1.asm4`	`/dev/sdd`
`iqn.2006-01.com.openfiler:rac1.crs`	`/dev/sde`
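
The table above can also be produced directly from the listing rather than by hand. Here is a minimal sketch that rewrites each by-path line into a "target device" pair. The helper name bypath_to_table is my own, and the pattern assumes the entry names look exactly like those in the listing shown earlier; other systems may name these entries differently.

```shell
#!/bin/sh
# bypath_to_table: turn "ip-<portal>-iscsi-<target> -> ../../<dev>" lines into
# "<target> /dev/<dev>" pairs.
# (bypath_to_table is a hypothetical helper name used for illustration.)
bypath_to_table() {
    sed -n 's|^ip-.*-iscsi-\(iqn[^ ]*\) -> \.\./\.\./\(.*\)$|\1 /dev/\2|p'
}

# Typical use on a RAC node (not run here):
#   (cd /dev/disk/by-path; ls -l *openfiler* | awk '{print $9 " " $10 " " $11}') | bypath_to_table
```

Running this on both nodes makes it easy to see that the two mappings do not necessarily agree.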

This mapping, however, may change every time the Oracle RAC node is rebooted. For example, after a reboot it may be determined that the iSCSI target iqn.2006-01.com.openfiler:rac1.asm1 gets mapped to the local SCSI device /dev/sdd. It is therefore impractical to rely on using the local SCSI device name given there is no way to predict the iSCSI target mappings after a reboot.

What we need is a consistent device name we can reference (i.e. /dev/iscsi/asm1) that will always point to the appropriate iSCSI target through reboots. This is where the Dynamic Device Management tool named udev comes in. udev provides a dynamic device directory using symbolic links that point to the actual device using a configurable set of rules. When udev receives a device event (for example, the client logging in to an iSCSI target), it matches its configured rules against the available device attributes provided in sysfs to identify the device. Rules that match may provide additional device information or specify a device node name and multiple symlink names and instruct udev to run additional programs (a SHELL script for example) as part of the device event handling process.

The first step is to create a new rules file. The file will be named /etc/udev/rules.d/55-openiscsi.rules and contain only a single line of name=value pairs used to receive events we are interested in. It will also define a call-out SHELL script (/etc/udev/scripts/iscsidev.sh) to handle the event.

Create the following rules file /etc/udev/rules.d/55-openiscsi.rules on both Oracle RAC nodes:

/etc/udev/rules.d/55-openiscsi.rules


# /etc/udev/rules.d/55-openiscsi.rules
KERNEL=="sd*", BUS=="scsi", PROGRAM="/etc/udev/scripts/iscsidev.sh %b",SYMLINK+="iscsi/%c/part%n"

We now need to create the UNIX SHELL script that will be called when this event is received. Let's first create a separate directory on both Oracle RAC nodes where udev scripts can be stored:

# mkdir -p /etc/udev/scripts

Next, create the UNIX shell script /etc/udev/scripts/iscsidev.sh on both Oracle RAC nodes:

/etc/udev/scripts/iscsidev.sh


#!/bin/sh

# FILE: /etc/udev/scripts/iscsidev.sh

BUS=${1}
HOST=${BUS%%:*}

[ -e /sys/class/iscsi_host ] || exit 1

file="/sys/class/iscsi_host/host${HOST}/device/session*/iscsi_session*/targetname"

target_name=$(cat ${file})

# This is not an Open-iSCSI drive
if [ -z "${target_name}" ]; then
   exit 1
fi

# Check if QNAP drive
check_qnap_target_name=${target_name%%:*}
if [ "$check_qnap_target_name" = "iqn.2004-04.com.qnap" ]; then
    target_name="${target_name%.*}"
fi

echo "${target_name##*.}"

After creating the UNIX SHELL script, change it to executable:

# chmod 755 /etc/udev/scripts/iscsidev.sh
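
To make the rule and the script easier to follow, the sketch below walks through the two shell parameter expansions the script relies on. In the rule, %b passes the device's bus id to the script, %c is replaced by whatever the script prints, and %n is the kernel number of the device (empty for a whole disk, 1 for its first partition). The bus id value "3:0:0:0" is only an assumed example of what udev might pass.

```shell
#!/bin/sh
# Example inputs (assumed for illustration): a SCSI bus id as passed by %b,
# and one of the target names from this article.
BUS="3:0:0:0"
target_name="iqn.2006-01.com.openfiler:rac1.asm1"

HOST=${BUS%%:*}               # drop the longest suffix starting at the first ":"
short_name=${target_name##*.} # drop the longest prefix ending at the last "."

echo "host=${HOST} short=${short_name} link=/dev/iscsi/${short_name}/part"
```

This prints "host=3 short=asm1 link=/dev/iscsi/asm1/part". The host number is what the script uses to find the session's targetname file under /sys/class/iscsi_host, and the short name is what ends up in the symbolic link path.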

Now that udev is configured, restart the iSCSI service on both Oracle RAC nodes:


# service iscsi stop
Logout session [sid: 1, target: iqn.2006-01.com.openfiler:rac1.asm1, portal: 192.168.2.195,3260]
Logout session [sid: 2, target: iqn.2006-01.com.openfiler:rac1.asm2, portal: 192.168.2.195,3260]
Logout session [sid: 3, target: iqn.2006-01.com.openfiler:rac1.asm3, portal: 192.168.2.195,3260]
Logout session [sid: 4, target: iqn.2006-01.com.openfiler:rac1.asm4, portal: 192.168.2.195,3260]
Logout session [sid: 5, target: iqn.2006-01.com.openfiler:rac1.crs, portal: 192.168.2.195,3260]
Stopping iSCSI daemon: /etc/init.d/iscsi: line 33:  3277 Killed                  /etc/init.d/iscsid stop


# service iscsi start
iscsid dead but pid file exists
Turning off network shutdown. Starting iSCSI daemon: [  OK  ]
[  OK  ]
Setting up iSCSI targets: Login session [iface: default, target: iqn.2006-01.com.openfiler:rac1.crs, portal: 192.168.2.195,3260]
Login session [iface: default, target: iqn.2006-01.com.openfiler:rac1.asm3, portal: 192.168.2.195,3260]
Login session [iface: default, target: iqn.2006-01.com.openfiler:rac1.asm4, portal: 192.168.2.195,3260]
Login session [iface: default, target: iqn.2006-01.com.openfiler:rac1.asm2, portal: 192.168.2.195,3260]
Login session [iface: default, target: iqn.2006-01.com.openfiler:rac1.asm1, portal: 192.168.2.195,3260]
[  OK  ]

Let's see if our hard work paid off:


# ls -l /dev/iscsi/*
/dev/iscsi/asm1:
total 0
lrwxrwxrwx 1 root root 9 Dec 12 18:25 part -> ../../sde

/dev/iscsi/asm2:
total 0
lrwxrwxrwx 1 root root 9 Dec 12 18:25 part -> ../../sdd

/dev/iscsi/asm3:
total 0
lrwxrwxrwx 1 root root 9 Dec 12 18:25 part -> ../../sdb

/dev/iscsi/asm4:
total 0
lrwxrwxrwx 1 root root 9 Dec 12 18:25 part -> ../../sdc

/dev/iscsi/crs:
total 0
lrwxrwxrwx 1 root root 9 Dec 12 18:25 part -> ../../sda

The listing above shows that udev did the job it was supposed to do! We now have a consistent set of local device names that can be used to reference the iSCSI targets. For example, we can safely assume that the device name /dev/iscsi/asm1/part will always reference the iSCSI target iqn.2006-01.com.openfiler:rac1.asm1. We now have a consistent iSCSI target name to local device name mapping which is described in the following table:

iSCSI Target Name to Local Device Name Mappings
iSCSI Target Name	Local Device Name
`iqn.2006-01.com.openfiler:rac1.asm1`	`/dev/iscsi/asm1/part`
`iqn.2006-01.com.openfiler:rac1.asm2`	`/dev/iscsi/asm2/part`
`iqn.2006-01.com.openfiler:rac1.asm3`	`/dev/iscsi/asm3/part`
`iqn.2006-01.com.openfiler:rac1.asm4`	`/dev/iscsi/asm4/part`
`iqn.2006-01.com.openfiler:rac1.crs`	`/dev/iscsi/crs/part`
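
Because later steps depend on all five names being present on both nodes, it can be handy to check for them in one go, for example after a reboot. A small sketch, with check_iscsi_links being a helper name of my own:

```shell
#!/bin/sh
# check_iscsi_links DIR NAME...: report whether DIR/NAME/part is a symbolic link
# for every NAME; return non-zero if any is missing.
# (check_iscsi_links is a hypothetical helper name used for illustration.)
check_iscsi_links() {
    dir=$1
    shift
    missing=0
    for name in "$@"; do
        if [ -L "${dir}/${name}/part" ]; then
            echo "ok      ${dir}/${name}/part -> $(readlink "${dir}/${name}/part")"
        else
            echo "missing ${dir}/${name}/part" >&2
            missing=1
        fi
    done
    return "$missing"
}

# On a RAC node: check_iscsi_links /dev/iscsi asm1 asm2 asm3 asm4 crs
```

A "missing" line would suggest that the node did not log in to that target or that the udev rule did not fire for it.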

Create Partitions on iSCSI Volumes

We now need to create a single primary partition on each of the iSCSI volumes that spans the entire size of the volume. As mentioned earlier in this article, I will be using Oracle's Cluster File System, Release 2 (OCFS2) to store the two files to be shared for Oracle's Clusterware software. We will then be using Automatic Storage Management (ASM) to create four ASM volumes; two for all physical database files (data/index files, online redo log files, and control files) and two for the Flash Recovery Area (RMAN backups and archived redo log files).

The following table lists the five iSCSI volumes and what file systems they will support:

Oracle Shared Drive Configuration
File System Type	iSCSI Target (short) Name	Size	Mount Point	ASM Diskgroup Name	File Types
OCFS2	crs	2 GB	/u02		Oracle Cluster Registry (OCR) File - (~250 MB) Voting Disk - (~20MB)
ASM	asm1	118 GB	ORCL:VOL1	+ORCL_DATA1	Oracle Database Files
ASM	asm2	118 GB	ORCL:VOL2	+ORCL_DATA1	Oracle Database Files
ASM	asm3	118 GB	ORCL:VOL3	+FLASH_RECOVERY_AREA	Oracle Flash Recovery Area
ASM	asm4	118 GB	ORCL:VOL4	+FLASH_RECOVERY_AREA	Oracle Flash Recovery Area
Total		474 GB

As shown in the table above, we will need to create a single Linux primary partition on each of the five iSCSI volumes. The fdisk command is used in Linux for creating (and removing) partitions. For each of the five iSCSI volumes, you can use the default values when creating the primary partition as the default action is to use the entire disk. You can safely ignore any warnings that may indicate the device does not contain a valid DOS partition (or Sun, SGI or OSF disklabel).

In this example, I will be running the fdisk command from linux1 to create a single primary partition on each iSCSI target using the local device names created by udev in the previous section:

/dev/iscsi/asm1/part
/dev/iscsi/asm2/part
/dev/iscsi/asm3/part
/dev/iscsi/asm4/part
/dev/iscsi/crs/part

Creating the single partition on each of the iSCSI volumes must only be run from one of the nodes in the Oracle RAC cluster! (i.e. linux1)


# ---------------------------------------

# fdisk /dev/iscsi/asm1/part
Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-15134, default 1): 1
Last cylinder or +size or +sizeM or +sizeK (1-15134, default 15134): 15134

Command (m for help): p

Disk /dev/iscsi/asm1/part: 124.4 GB, 124486942720 bytes
255 heads, 63 sectors/track, 15134 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

               Device Boot      Start         End      Blocks   Id  System
/dev/iscsi/asm1/part1               1       15134   121563823+  83  Linux

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

# ---------------------------------------

# fdisk /dev/iscsi/asm2/part
Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-15134, default 1): 1
Last cylinder or +size or +sizeM or +sizeK (1-15134, default 15134): 15134

Command (m for help): p

Disk /dev/iscsi/asm2/part: 124.4 GB, 124486942720 bytes
255 heads, 63 sectors/track, 15134 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

               Device Boot      Start         End      Blocks   Id  System
/dev/iscsi/asm2/part1               1       15134   121563823+  83  Linux

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

# ---------------------------------------

# fdisk /dev/iscsi/asm3/part
Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-15134, default 1): 1
Last cylinder or +size or +sizeM or +sizeK (1-15134, default 15134): 15134

Command (m for help): p

Disk /dev/iscsi/asm3/part: 124.4 GB, 124486942720 bytes
255 heads, 63 sectors/track, 15134 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

               Device Boot      Start         End      Blocks   Id  System
/dev/iscsi/asm3/part1               1       15134   121563823+  83  Linux

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

# ---------------------------------------

# fdisk /dev/iscsi/asm4/part
Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-15134, default 1): 1
Last cylinder or +size or +sizeM or +sizeK (1-15134, default 15134): 15134

Command (m for help): p

Disk /dev/iscsi/asm4/part: 124.4 GB, 124486942720 bytes
255 heads, 63 sectors/track, 15134 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

               Device Boot      Start         End      Blocks   Id  System
/dev/iscsi/asm4/part1               1       15134   121563823+  83  Linux

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

# ---------------------------------------

# fdisk /dev/iscsi/crs/part
Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-1009, default 1): 1
Last cylinder or +size or +sizeM or +sizeK (1-1009, default 1009): 1009

Command (m for help): p

Disk /dev/iscsi/crs/part: 2147 MB, 2147483648 bytes
67 heads, 62 sectors/track, 1009 cylinders
Units = cylinders of 4154 * 512 = 2126848 bytes

              Device Boot      Start         End      Blocks   Id  System
/dev/iscsi/crs/part1               1        1009     2095662   83  Linux

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.
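
The five interactive fdisk sessions above all use the same answers, so they could be scripted. The following is only a minimal sketch, not part of the original procedure: the function names fdisk_answers and partition_volume and the DRY_RUN variable are made up for illustration. It assumes fdisk accepts its answers on standard input, and that an empty line accepts the default first and last cylinder (the same values typed by hand above). It defaults to a dry run that only prints what it would do.

```shell
#!/bin/bash
# Sketch only: helper names and DRY_RUN are illustrative assumptions.

# Answers in the order fdisk asks for them:
#   n = new partition, p = primary, 1 = partition number,
#   two empty lines = accept default first and last cylinder, w = write.
fdisk_answers() {
    printf 'n\np\n1\n\n\nw\n'
}

# Partition one device. With DRY_RUN=1 (the default) nothing is written.
partition_volume() {
    local dev=$1
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: fdisk $dev"
    else
        fdisk_answers | fdisk "$dev"
    fi
}

# The five iSCSI volumes used in this article.
for vol in asm1 asm2 asm3 asm4 crs; do
    partition_volume "/dev/iscsi/$vol/part"
done
```

Running it with DRY_RUN=0 as root on linux1 would actually write the partition tables, so the interactive sessions remain the safer choice when in doubt.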

Verify New Partitions

After creating all required partitions from linux1, inform the kernel of the partition changes by running the following command as the "root" user account on all remaining nodes in the Oracle RAC cluster (linux2). Note that the mapping between the iSCSI target names discovered from Openfiler and the local SCSI device names may differ between the two Oracle RAC nodes. This is not a concern and will not cause any problems, since we will not be using the local SCSI device names but rather the local device names created by udev in the previous section.

From linux2, run the following commands:


# partprobe

# fdisk -l

Disk /dev/hda: 40.0 GB, 40000000000 bytes
255 heads, 63 sectors/track, 4863 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1   *           1          13      104391   83  Linux
/dev/hda2              14        4863    38957625   8e  Linux LVM

Disk /dev/sda: 124.4 GB, 124486942720 bytes
255 heads, 63 sectors/track, 15134 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1       15134   121563823+  83  Linux

Disk /dev/sdb: 124.4 GB, 124486942720 bytes
255 heads, 63 sectors/track, 15134 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1       15134   121563823+  83  Linux

Disk /dev/sdc: 124.4 GB, 124486942720 bytes
255 heads, 63 sectors/track, 15134 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1       15134   121563823+  83  Linux

Disk /dev/sdd: 124.4 GB, 124486942720 bytes
255 heads, 63 sectors/track, 15134 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1               1       15134   121563823+  83  Linux

Disk /dev/sde: 2147 MB, 2147483648 bytes
67 heads, 62 sectors/track, 1009 cylinders
Units = cylinders of 4154 * 512 = 2126848 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sde1               1        1009     2095662   83  Linux
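
The Blocks column in the fdisk output above can be cross-checked from the reported geometry. The sketch below assumes the traditional DOS-compatible layout, in which the first partition starts one track into the disk, so one track of sectors is subtracted from the total. It also assumes that fdisk reports blocks of 1 KB (two 512-byte sectors) and marks a leftover odd sector with a trailing "+". The function name blocks_for is illustrative only.

```shell
#!/bin/bash
# Sketch only: blocks_for is an illustrative helper, not an fdisk feature.

# blocks_for CYLINDERS HEADS SECTORS_PER_TRACK
# Prints the expected value of the fdisk "Blocks" column for a single
# partition that spans the whole disk and starts after the first track.
blocks_for() {
    local cylinders=$1 heads=$2 spt=$3
    local sectors=$(( cylinders * heads * spt - spt ))
    local blocks=$(( sectors / 2 ))
    if (( sectors % 2 )); then
        echo "${blocks}+"     # odd sector count: half a block left over
    else
        echo "${blocks}"
    fi
}

blocks_for 15134 255 63   # geometry of each ASM volume
blocks_for 1009 67 62     # geometry of the CRS volume
```

For the geometries shown above this prints 121563823+ and 2095662, matching the Blocks values that fdisk reports for the ASM and CRS partitions.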

As a final step, run the following command on both Oracle RAC nodes to verify that udev created the new symbolic links for each new partition:


# (cd /dev/disk/by-path; ls -l *openfiler* | awk '{FS=" "; print $9 " " $10 " " $11}')
ip-192.168.2.195:3260-iscsi-iqn.2006-01.com.openfiler:rac1.asm1 -> ../../sde
ip-192.168.2.195:3260-iscsi-iqn.2006-01.com.openfiler:rac1.asm1-part1 -> ../../sde1
ip-192.168.2.195:3260-iscsi-iqn.2006-01.com.openfiler:rac1.asm2 -> ../../sdd
ip-192.168.2.195:3260-iscsi-iqn.2006-01.com.openfiler:rac1.asm2-part1 -> ../../sdd1
ip-192.168.2.195:3260-iscsi-iqn.2006-01.com.openfiler:rac1.asm3 -> ../../sdb
ip-192.168.2.195:3260-iscsi-iqn.2006-01.com.openfiler:rac1.asm3-part1 -> ../../sdb1
ip-192.168.2.195:3260-iscsi-iqn.2006-01.com.openfiler:rac1.asm4 -> ../../sdc
ip-192.168.2.195:3260-iscsi-iqn.2006-01.com.openfiler:rac1.asm4-part1 -> ../../sdc1
ip-192.168.2.195:3260-iscsi-iqn.2006-01.com.openfiler:rac1.crs -> ../../sda
ip-192.168.2.195:3260-iscsi-iqn.2006-01.com.openfiler:rac1.crs-part1 -> ../../sda1

The listing above shows that udev did indeed create new device names for each of the new partitions. We will be using these new device names when configuring the volumes for OCFS2 and ASMLib later in this guide:

/dev/iscsi/asm1/part1
/dev/iscsi/asm2/part1
/dev/iscsi/asm3/part1
/dev/iscsi/asm4/part1
/dev/iscsi/crs/part1
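
To confirm on each node that all five device names are present, and to see which local SCSI device each one currently resolves to, a small check such as the following could be used. It is a sketch under assumptions: the function name report_partitions is hypothetical, the base directory /dev/iscsi and the volume names are taken from the list above, and the test only checks that each name exists (a stricter check could use -b to require a block device).

```shell
#!/bin/bash
# Sketch only: report_partitions is an illustrative helper name.

# report_partitions [BASE_DIR]
# Prints each expected partition name and the device it resolves to.
# The return status is the number of missing partitions (0 = all present).
report_partitions() {
    local base=${1:-/dev/iscsi}
    local missing=0 vol dev
    for vol in asm1 asm2 asm3 asm4 crs; do
        dev="$base/$vol/part1"
        if [ -e "$dev" ]; then
            # readlink -f follows the udev symbolic link to the real device
            echo "$dev -> $(readlink -f "$dev")"
        else
            echo "$dev MISSING"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

report_partitions /dev/iscsi || echo "some partitions are not visible on $(uname -n)"
```

Because the resolved SCSI device names can differ between linux1 and linux2, only the names on the left-hand side are expected to match across nodes.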

