Build Your Own Oracle RAC Cluster on Oracle Enterprise Linux and iSCSI

by Jeffrey Hunter

Learn how to set up and configure an Oracle RAC 10g Release 2 development cluster on Oracle Enterprise Linux for less than US$2,700.

The information in this guide is not validated by Oracle, is not supported by Oracle, and should only be used at your own risk; it is for educational purposes only.

Updated September 2008

Contents

  1. Introduction
  2. Oracle RAC 10g Overview
  3. Shared-Storage Overview
  4. iSCSI Technology
  5. Hardware & Costs
  6. Install the Linux Operating System
  7. Network Configuration
  8. Install Openfiler
  9. Configure iSCSI Volumes using Openfiler
  10. Configure iSCSI Volumes on Oracle RAC Nodes
  11. Create "oracle" User and Directories
  12. Configure the Linux Servers for Oracle
  13. Configure the hangcheck-timer Kernel Module
  14. Configure RAC Nodes for Remote Access
  15. All Startup Commands for Both Oracle RAC Nodes
  16. Install & Configure Oracle Cluster File System (OCFS2)
  17. Install & Configure Automatic Storage Management (ASMLib 2.0)
  18. Download Oracle RAC 10g Software
  19. Pre-Installation Tasks for Oracle Database 10g Release 2
  20. Install Oracle 10g Clusterware Software
  21. Install Oracle Database 10g Software
  22. Install Oracle Database 10g Companion CD Software
  23. Create TNS Listener Process
  24. Create the Oracle Cluster Database
  25. Post-Installation Tasks - (Optional)
  26. Verify TNS Networking Files
  27. Create / Alter Tablespaces
  28. Verify the RAC Cluster & Database Configuration
  29. Starting / Stopping the Cluster
  30. Transparent Application Failover - (TAF)
  31. Troubleshooting
  32. Conclusion
  33. Acknowledgements

Downloads for this guide:
Oracle Enterprise Linux Release 4 Update 5 (available for x86 and x86_64)
Oracle Database 10g Release 2 EE, Clusterware, Companion CD - (10.2.0.1.0)
Openfiler 2.2 (respin 2) - (openfiler-2.2-x86-disc1.iso -OR- openfiler-2.2-x86_64-disc1.iso)
ASMLib 2.0 Library - (2.0.2-1) - oracleasmlib-2.0.2-1.i386.rpm
Support files

 


See the Oracle RAC 11g Release 1 version of this guide here

1. Introduction

One of the most efficient ways to become familiar with Oracle Real Application Clusters (RAC) 10g technology is to have access to an actual Oracle RAC 10g cluster. There's no better way to understand its benefits—including fault tolerance, security, load balancing, and scalability—than to experience them directly.

Unfortunately, for many shops, the price of the hardware required for a typical production RAC configuration makes this goal impossible. A small two-node cluster can cost from US$10,000 to well over US$20,000. That cost would not even include the heart of a production RAC environment—typically a storage area network—which can start at US$10,000.

For those who want to become familiar with Oracle RAC 10g without a major cash outlay, this guide provides a low-cost alternative: configuring an Oracle RAC 10g Release 2 system using commercial off-the-shelf components and downloadable software at an estimated cost of US$2,200 to US$2,700. The system will consist of a dual-node cluster (each node with a single processor), with both nodes running Oracle Enterprise Linux (Release 4 Update 5), Oracle Database 10g Release 2, OCFS2, and ASMLib 2.0. All shared disk storage for Oracle RAC will be based on iSCSI using a network storage server, namely Openfiler Release 2.2.

Although this article should work with Red Hat Enterprise Linux, Oracle Enterprise Linux (available for free) will provide the same if not better stability and already includes the OCFS2 and ASMLib software packages (with the exception of the ASMLib userspace libraries, which are a separate download).

Powered by rPath Linux, Openfiler is a free browser-based network storage management utility that delivers file-based Network Attached Storage (NAS) and block-based Storage Area Networking (SAN) in a single framework. Openfiler supports CIFS, NFS, HTTP/DAV, and FTP; however, we will only be making use of its iSCSI capabilities to implement an inexpensive SAN for the shared storage components required by Oracle RAC 10g. A 500GB external hard drive will be connected to the network storage server (sometimes referred to in this article as the Openfiler server) via its USB 2.0 interface. The Openfiler server will be configured to use this disk for iSCSI-based storage, and the disk will be used in our Oracle RAC 10g configuration to store the shared files required by Oracle Clusterware as well as all Oracle ASM volumes.

Note: This article is provided for educational purposes only, so the setup is kept simple to demonstrate ideas and concepts. For example, the disk mirroring configured in this article will be set up on one physical disk only, while in practice it should be performed across multiple physical drives. Also note that while this article provides detailed instructions for successfully installing a complete Oracle RAC 10g system, it is by no means a substitute for the official Oracle documentation. In addition to this article, users should also consult the official Oracle documentation to gain a full understanding of alternative configuration options, installation, and administration with Oracle RAC 10g. Oracle's official documentation site is docs.oracle.com.

This is not the only way to build a low-cost Oracle RAC 10g system. I have worked on other solutions that use SCSI for the shared storage component. In some cases, SCSI will cost more than the implementation described in this article. An inexpensive SCSI configuration will consist of:

  • SCSI Controller: Two SCSI controllers priced from $20 (Adaptec AHA-2940UW) to $220 (Adaptec 39320A-R) each
  • SCSI Enclosure: $70 - (Inclose 1 Bay 3.5" U320 SCSI Case)
  • SCSI Hard Drive: $140 - (36GB 15K 68p U320 SCSI Hard Drive)
  • SCSI Cables: Two SCSI cables priced at $20 each - (3ft External HD68 to HD68 U320 Cable)

Keep in mind that some motherboards may already include built-in SCSI controllers.

The previous Oracle9i and Oracle 10g Release 1 guides used raw partitions for storing files on shared storage, but here we will make use of the Oracle Cluster File System Release 2 (OCFS2) and Oracle Automatic Storage Management (ASM) feature. The two Oracle RAC nodes will be configured as follows:

Oracle Database Files
RAC Node Name | Instance Name | Database Name | $ORACLE_BASE | File System / Volume Manager for DB Files
linux1 | orcl1 | orcl | /u01/app/oracle | ASM
linux2 | orcl2 | orcl | /u01/app/oracle | ASM

Oracle Clusterware Shared Files
File Type | File Name | iSCSI Volume Name | Mount Point | File System
Oracle Cluster Registry | /u02/oradata/orcl/OCRFile | crs | /u02/oradata/orcl | OCFS2
CRS Voting Disk | /u02/oradata/orcl/CSSFile | crs | /u02/oradata/orcl | OCFS2

Note that with Oracle Database 10g Release 2 (10.2), Cluster Ready Services, or CRS, is now called Oracle Clusterware.

The Oracle Clusterware software will be installed to /u01/app/crs on both of the nodes that make up the RAC cluster. Starting with Oracle Database 10g Release 2 (10.2), Oracle Clusterware should be installed in a separate Oracle Clusterware home directory that is not release specific (a path such as /u01/app/oracle/product/10.2.0/... is not allowed, for example) and must never be a subdirectory of the ORACLE_BASE directory (/u01/app/oracle, for example). This is a change to the Optimal Flexible Architecture (OFA) rules. Note that the Oracle Clusterware and Oracle Real Application Clusters installation documentation from Oracle incorrectly states that the Oracle Clusterware home directory can be a subdirectory of the ORACLE_BASE directory. For example, in Chapter 2, "Preinstallation", the section "Oracle Clusterware home directory" lists the path /u01/app/oracle/product/crs as a possible Oracle Clusterware home (or CRS home) path. This is incorrect: the default ORACLE_BASE path is /u01/app/oracle, and the Oracle Clusterware home must never be a subdirectory of the ORACLE_BASE directory. This issue is tracked with Oracle documentation bug "5843155" - (B14203-08 HAS CONFLICTING CRS_HOME LOCATIONS) and is fixed in Oracle 11g.
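The directory rule above can be checked mechanically before starting the installer. The following is only a minimal sketch: the function name is_valid_crs_home is hypothetical (it is not part of any Oracle tool), and the check is purely lexical, so it does not resolve symbolic links.

```python
from pathlib import PurePosixPath


def is_valid_crs_home(crs_home: str, oracle_base: str) -> bool:
    """Return True when crs_home is neither ORACLE_BASE itself nor nested under it.

    Hypothetical helper for illustration only; it compares path components
    lexically and never touches the file system.
    """
    home = PurePosixPath(crs_home)
    base = PurePosixPath(oracle_base)
    return home != base and base not in home.parents


if __name__ == "__main__":
    # The layout used in this article.
    print(is_valid_crs_home("/u01/app/crs", "/u01/app/oracle"))
    # The path the installation documentation wrongly suggests.
    print(is_valid_crs_home("/u01/app/oracle/product/crs", "/u01/app/oracle"))
```

The first call reports the article's /u01/app/crs location as acceptable, while the second rejects /u01/app/oracle/product/crs because it sits beneath ORACLE_BASE.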

The Oracle Clusterware software will be installed to /u01/app/crs on both of the nodes that make up the RAC cluster. The Clusterware software, however, requires that two of its files, the "Oracle Cluster Registry (OCR)" file and the "Voting Disk" file, be shared with both nodes in the cluster. These two files will be installed on shared storage using Oracle's Cluster File System, Release 2 (OCFS2). It is also possible to use RAW devices for these files; it is not possible, however, to use ASM for these two shared Clusterware files.

The Oracle Database 10g Release 2 software will be installed into a separate Oracle home, namely /u01/app/oracle/product/10.2.0/db_1, on both of the nodes that make up the RAC cluster. All of the Oracle physical database files (data, online redo logs, control files, archived redo logs) will be installed to shared volumes being managed by Automatic Storage Management (ASM). (The Oracle database files can just as easily be stored on OCFS2. Using ASM, however, makes the article that much more interesting!)

Note: This article is only designed to work as documented with absolutely no substitutions. The only exception here is the choice of vendor hardware (i.e. machines, networking equipment, and internal / external hard drives). Ensure that the hardware you purchase from the vendor is supported on Oracle Enterprise Linux (Release 4 Update 5). I tend to stick with Dell hardware given their superb quality and compatibility with Linux. For a test system of this nature, I highly recommend purchasing pre-owned or refurbished Dell hardware from a reputable company like Stallard Technologies, Inc. Stallard Technologies has a proven track record of delivering the best value on pre-owned hardware combined with a commitment to superior customer service. I base my recommendation on my own outstanding personal experience with their organization. To learn more about Stallard Technologies, visit their website or contact John Brauer.

If you are looking for an example that takes advantage of Oracle RAC 11g Release 1 with OEL using iSCSI, click here.

If you are looking for an example that takes advantage of Oracle RAC 10g Release 2 with RHEL 4 using FireWire, click here.

 


2. Oracle RAC 10g Overview

Oracle RAC, introduced with Oracle9i, is the successor to Oracle Parallel Server (OPS). Oracle RAC allows multiple instances to access the same database (storage) simultaneously. RAC provides fault tolerance, load balancing, and performance benefits by allowing the system to scale out. At the same time, since all nodes access the same database, the failure of one instance will not cause the loss of access to the database.

At the heart of Oracle10g RAC is a shared disk subsystem. All nodes in the cluster must be able to access all of the data, redo log files, control files and parameter files for all nodes in the cluster. The data disks must be globally available in order to allow all nodes to access the database. Each node has its own redo log file(s) and UNDO tablespace, but the other nodes must be able to access them (and the shared control file) in order to recover that node in the event of a system failure.

The biggest difference between Oracle RAC and OPS is the addition of Cache Fusion. With OPS, a request for data from one node to another required the data to be written to disk first; only then could the requesting node read that data. With Cache Fusion, data is passed along a high-speed interconnect using a sophisticated locking algorithm.

Not all clustering solutions use shared storage. Some vendors use an approach known as a Federated Cluster, in which data is spread across several machines rather than shared by all. With Oracle10g RAC, however, multiple nodes use the same set of disks for storing data. With Oracle10g RAC, the data files, redo log files, control files, and archived log files reside on shared storage on raw-disk devices, a NAS, ASM, or on a clustered file system. Oracle's approach to clustering leverages the collective processing power of all the nodes in the cluster and at the same time provides failover security.

Pre-configured Oracle10g RAC solutions are available from vendors such as Dell, IBM and HP for production environments. This article, however, focuses on putting together your own Oracle10g RAC environment for development and testing by using Linux servers and a low-cost shared disk solution: iSCSI.

For more background about Oracle RAC, visit the Oracle RAC Product Center on OTN.

 


3. Shared-Storage Overview

Today, fibre channel is one of the most popular solutions for shared storage. Fibre channel is a high-speed serial-transfer interface that is used to connect systems and storage devices in either point-to-point (FC-P2P), arbitrated loop (FC-AL), or switched topologies (FC-SW). Protocols supported by fibre channel include SCSI and IP. Fibre channel configurations can support as many as 127 nodes and have a throughput of up to 2.12 gigabits per second in each direction, and 4.25 Gbps is expected.

Fibre channel, however, is very expensive. Just the fibre channel switch alone can start at around US$1,000. This does not even include the fibre channel storage array and high-end drives, which can reach prices of about US$300 for a 36GB drive. A typical fibre channel setup which includes fibre channel cards for the servers is roughly US$10,000, which does not include the cost of the servers that make up the cluster.

A less expensive alternative to fibre channel is SCSI. SCSI technology provides acceptable performance for shared storage, but for administrators and developers who are used to GPL-based Linux prices, even SCSI can come in over budget, at around US$2,000 to US$5,000 for a two-node cluster.

Another popular solution is the Sun NFS (Network File System) found on a NAS. It can be used for shared storage but only if you are using a network appliance or something similar. Specifically, you need servers that guarantee direct I/O over NFS, TCP as the transport protocol, and read/write block sizes of 32K.

The shared storage that will be used for this article is based on iSCSI technology using a network storage server installed with Openfiler. This solution offers a low-cost alternative to fibre channel for testing and educational purposes, but given the low-end hardware being used, it should not be used in a production environment.

 


4. iSCSI Technology

For many years, the only technology that existed for building a network based storage solution was a Fibre Channel Storage Area Network (FC SAN). Based on an earlier set of ANSI protocols called Fiber Distributed Data Interface (FDDI), Fibre Channel was developed to move SCSI commands over a storage network.

Several of the advantages to FC SAN include greater performance, increased disk utilization, improved availability, better scalability, and most important to us — support for server clustering! Still today, however, FC SANs suffer from three major disadvantages. The first is price. While the costs involved in building a FC SAN have come down in recent years, the cost of entry still remains prohibitive for small companies with limited IT budgets. The second is incompatible hardware components. Since its adoption, many product manufacturers have interpreted the Fibre Channel specifications differently from each other which has resulted in scores of interconnect problems. When purchasing Fibre Channel components from a common manufacturer, this is usually not a problem. The third disadvantage is the fact that a Fibre Channel network is not Ethernet! It requires a separate network technology along with a second set of skill sets that need to exist with the datacenter staff.

With the popularity of Gigabit Ethernet and the demand for lower cost, Fibre Channel has recently been given a run for its money by iSCSI-based storage systems. Today, iSCSI SANs remain the leading competitor to FC SANs.

Ratified on February 11, 2003 by the Internet Engineering Task Force (IETF), the Internet Small Computer System Interface, better known as iSCSI, is an Internet Protocol (IP)-based storage networking standard for establishing and managing connections between IP-based storage devices, hosts, and clients. iSCSI is a data transport protocol defined in the SCSI-3 specifications framework and is similar to Fibre Channel in that it is responsible for carrying block-level data over a storage network. Block-level communication means that data is transferred between the host and the client in chunks called blocks. Database servers depend on this type of communication (as opposed to the file-level communication used by most NAS systems) in order to work properly. Like a FC SAN, an iSCSI SAN should be a separate physical network devoted entirely to storage; its components, however, can be much the same as in a typical IP network (LAN).

While iSCSI has a promising future, many of its early critics were quick to point out some of its inherent shortcomings with regards to performance. The beauty of iSCSI is its ability to utilize an already familiar IP network as its transport mechanism. The TCP/IP protocol, however, is very complex and CPU intensive. With iSCSI, most of the processing of the data (both TCP and iSCSI) is handled in software and is much slower than Fibre Channel which is handled completely in hardware. The overhead incurred in mapping every SCSI command onto an equivalent iSCSI transaction is excessive. For many the solution is to do away with iSCSI software initiators and invest in specialized cards that can offload TCP/IP and iSCSI processing from a server's CPU. These specialized cards are sometimes referred to as an iSCSI Host Bus Adaptor (HBA) or a TCP Offload Engine (TOE) card. Also consider that 10-Gigabit Ethernet is a reality today!

As with any new technology, iSCSI comes with its own set of acronyms and terminology. For the purpose of this article, it is only important to understand the difference between an iSCSI initiator and an iSCSI target.

iSCSI Initiator

Basically, an iSCSI initiator is a client device that connects and initiates requests to some service offered by a server (in this case an iSCSI target). The iSCSI initiator software will need to exist on each of the Oracle RAC nodes (linux1 and linux2).

An iSCSI initiator can be implemented using either software or hardware. Software iSCSI initiators are available for most major operating system platforms. For this article, we will be using the free Linux iscsi-sfnet software driver found in the iscsi-initiator-utils RPM — developed as part of the Linux-iSCSI Project. The iSCSI software initiator is generally used with a standard network interface card (NIC) — a Gigabit Ethernet card in most cases. A hardware initiator is an iSCSI HBA (or a TCP Offload Engine (TOE) card), which is basically just a specialized Ethernet card with a SCSI ASIC on-board to offload all the work (TCP and SCSI commands) from the system CPU. iSCSI HBAs are available from a number of vendors, including Adaptec, Alacritech, Intel, and QLogic.

iSCSI Target

An iSCSI target is the "server" component of an iSCSI network. This is typically the storage device that contains the information you want and answers requests from the initiator(s). For the purpose of this article, the node openfiler1 will be the iSCSI target.

So with all of this talk about iSCSI, does this mean the death of Fibre Channel anytime soon? Probably not. Fibre Channel has clearly demonstrated its capabilities over the years with its capacity for extremely high speeds, flexibility, and robust reliability. Customers who have strict requirements for high performance storage, large complex connectivity, and mission critical reliability will undoubtedly continue to choose Fibre Channel.

Before closing out this section, I thought it would be appropriate to present the following chart that shows speed comparisons of the various types of disk interfaces and network technologies. For each interface, I provide the maximum transfer rates in kilobits (Kb), kilobytes (KB), megabits (Mb), megabytes (MB), gigabits (Gb), and gigabytes (GB) per second.

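Disk Interface / Network / Bus Speed (maximum transfer rate per second; "-" means no value listed)
Interface | Kb | KB | Mb | MB | Gb | GB
Serial | 115 | 14.375 | 0.115 | 0.014 | - | -
Parallel (standard) | 920 | 115 | 0.92 | 0.115 | - | -
10Base-T Ethernet | - | - | 10 | 1.25 | - | -
IEEE 802.11b wireless Wi-Fi (2.4 GHz band) | - | - | 11 | 1.375 | - | -
USB 1.1 | - | - | 12 | 1.5 | - | -
Parallel (ECP/EPP) | - | - | 24 | 3 | - | -
SCSI-1 | - | - | 40 | 5 | - | -
IEEE 802.11g wireless WLAN (2.4 GHz band) | - | - | 54 | 6.75 | - | -
SCSI-2 (Fast SCSI / Fast Narrow SCSI) | - | - | 80 | 10 | - | -
100Base-T Ethernet (Fast Ethernet) | - | - | 100 | 12.5 | - | -
ATA/100 (parallel) | - | - | 100 | 12.5 | - | -
IDE | - | - | 133.6 | 16.7 | - | -
Fast Wide SCSI (Wide SCSI) | - | - | 160 | 20 | - | -
Ultra SCSI (SCSI-3 / Fast-20 / Ultra Narrow) | - | - | 160 | 20 | - | -
Ultra IDE | - | - | 264 | 33 | - | -
Wide Ultra SCSI (Fast Wide 20) | - | - | 320 | 40 | - | -
Ultra2 SCSI | - | - | 320 | 40 | - | -
FireWire 400 - (IEEE1394a) | - | - | 400 | 50 | - | -
USB 2.0 | - | - | 480 | 60 | - | -
Wide Ultra2 SCSI | - | - | 640 | 80 | - | -
Ultra3 SCSI | - | - | 640 | 80 | - | -
FireWire 800 - (IEEE1394b) | - | - | 800 | 100 | - | -
Gigabit Ethernet | - | - | 1000 | 125 | 1 | -
PCI - (33 MHz / 32-bit) | - | - | 1064 | 133 | 1.064 | -
Serial ATA I - (SATA I) | - | - | 1200 | 150 | 1.2 | -
Wide Ultra3 SCSI | - | - | 1280 | 160 | 1.28 | -
Ultra160 SCSI | - | - | 1280 | 160 | 1.28 | -
PCI - (33 MHz / 64-bit) | - | - | 2128 | 266 | 2.128 | -
PCI - (66 MHz / 32-bit) | - | - | 2128 | 266 | 2.128 | -
AGP 1x - (66 MHz / 32-bit) | - | - | 2128 | 266 | 2.128 | -
Serial ATA II - (SATA II) | - | - | 2400 | 300 | 2.4 | -
Ultra320 SCSI | - | - | 2560 | 320 | 2.56 | -
FC-AL Fibre Channel | - | - | 3200 | 400 | 3.2 | -
PCI-Express x1 - (bidirectional) | - | - | 4000 | 500 | 4 | -
PCI - (66 MHz / 64-bit) | - | - | 4256 | 532 | 4.256 | -
AGP 2x - (133 MHz / 32-bit) | - | - | 4264 | 533 | 4.264 | -
Serial ATA III - (SATA III) | - | - | 4800 | 600 | 4.8 | -
PCI-X - (100 MHz / 64-bit) | - | - | 6400 | 800 | 6.4 | -
PCI-X - (133 MHz / 64-bit) | - | - | - | 1064 | 8.512 | 1
AGP 4x - (266 MHz / 32-bit) | - | - | - | 1066 | 8.528 | 1
10G Ethernet - (IEEE 802.3ae) | - | - | - | 1250 | 10 | 1.25
PCI-Express x4 - (bidirectional) | - | - | - | 2000 | 16 | 2
AGP 8x - (533 MHz / 32-bit) | - | - | - | 2133 | 17.064 | 2.1
PCI-Express x8 - (bidirectional) | - | - | - | 4000 | 32 | 4
PCI-Express x16 - (bidirectional) | - | - | - | 8000 | 64 | 8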
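The columns in the chart above are related by simple arithmetic: eight bits per byte and decimal (factor of 1,000) prefixes. As a rough sketch, the helper below reproduces a row from its megabit rate; the function name rates_from_megabits is made up for this illustration, and the decimal-prefix assumption is inferred from rows such as Gigabit Ethernet (1000 Mb = 125 MB = 1 Gb).

```python
def rates_from_megabits(megabits_per_second: float) -> dict:
    """Derive the other chart columns from a rate given in megabits per second.

    Assumes 8 bits per byte and decimal prefixes (1 Gb = 1000 Mb), which is
    what the chart's own figures imply.
    """
    return {
        "Mb": megabits_per_second,
        "MB": megabits_per_second / 8,
        "Gb": megabits_per_second / 1000,
        "GB": megabits_per_second / 8000,
    }


if __name__ == "__main__":
    # The two links used in this article: USB 2.0 to the external disk and
    # Gigabit Ethernet on the private network.
    for name, mbps in [("USB 2.0", 480), ("Gigabit Ethernet", 1000)]:
        print(name, rates_from_megabits(mbps))
```

These are theoretical maximums only; real throughput over either link will be lower.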

 


5. Hardware & Costs

The hardware used to build our example Oracle RAC 10g environment consists of three Linux servers (two Oracle RAC nodes and one Network Storage Server) and components that can be purchased at many local computer stores or over the Internet (e.g. Stallard Technologies, Inc.).

Oracle RAC Node 1 - (linux1)
Dimension 2400 Series
  • Intel(R) Pentium(R) 4 Processor at 2.80GHz
  • 1GB DDR SDRAM (at 333MHz)
  • 40GB 7200 RPM Internal Hard Drive
  • Integrated Intel 3D AGP Graphics
  • Integrated 10/100 Ethernet - (Broadcom BCM4401)
  • CDROM (48X Max Variable)
  • 3.5" Floppy
  • No Keyboard, Monitor, or Mouse - (Connected to KVM Switch)
  • US$620
    1 - Ethernet LAN Card

    Used for RAC interconnect to linux2 and Openfiler networked storage.

    Each Linux server for Oracle RAC should contain two NIC adapters. The Dell Dimension includes an integrated 10/100 Ethernet adapter that will be used to connect to the public network. The second NIC adapter will be used for the private network (RAC interconnect and Openfiler networked storage). Select the appropriate NIC adapter that is compatible with the maximum data transmission speed of the network switch to be used for the private network. For the purpose of this article, I used a Gigabit Ethernet switch (and 1Gb Ethernet cards) for the private network.

         Gigabit Ethernet

         10/100 Ethernet
    US$35
    Oracle RAC Node 2 - (linux2)
    Dimension 2400 Series
  • Intel(R) Pentium(R) 4 Processor at 2.80GHz
  • 1GB DDR SDRAM (at 333MHz)
  • 40GB 7200 RPM Internal Hard Drive
  • Integrated Intel 3D AGP Graphics
  • Integrated 10/100 Ethernet - (Broadcom BCM4401)
  • CDROM (48X Max Variable)
  • 3.5" Floppy
  • No Keyboard, Monitor, or Mouse - (Connected to KVM Switch)
  • US$620
    1 - Ethernet LAN Card

    Used for RAC interconnect to linux1 and Openfiler networked storage.

    Each Linux server for Oracle RAC should contain two NIC adapters. The Dell Dimension includes an integrated 10/100 Ethernet adapter that will be used to connect to the public network. The second NIC adapter will be used for the private network (RAC interconnect and Openfiler networked storage). Select the appropriate NIC adapter that is compatible with the maximum data transmission speed of the network switch to be used for the private network. For the purpose of this article, I used a Gigabit Ethernet switch (and 1Gb Ethernet cards) for the private network.

         Gigabit Ethernet

         10/100 Ethernet
    US$35
    Network Storage Server - (openfiler1)
    Dell PowerEdge 1800
  • Dual 3.0GHz Xeon / 1MB Cache / 800FSB (SL7PE)
  • 2GB of ECC Memory
  • 40GB IDE Hard Drive
  • Single embedded Intel 10/100/1000 Gigabit NIC
  • 4 x Integrated USB 2.0 Ports
  • No Keyboard, Monitor, or Mouse - (Connected to KVM Switch)
  • US$650
    1 - Ethernet LAN Card

    Used for networked storage on the private network.

    The Network Storage Server (Openfiler server) should contain two NIC adapters. The Dell PowerEdge 1800 machine includes an integrated 10/100/1000 Ethernet adapter that will be used to connect to the public network. The second NIC adapter will be used for the private network (Openfiler networked storage). Select the appropriate NIC adapter that is compatible with the maximum data transmission speed of the network switch to be used for the private network. For the purpose of this article, I used a Gigabit Ethernet switch (and 1Gb Ethernet cards) for the private network.

         Gigabit Ethernet

         10/100 Ethernet
    US$35
    Miscellaneous Components
    Storage Device(s) - External Hard Drive

    For the database storage I used a single external LaCie d2 Hard Drive Extreme with Triple Interface (500GB) drive which was connected to the Openfiler server via its USB 2.0 interface. The Openfiler server will be configured to use this disk for iSCSI based storage and will be used in our Oracle RAC 10g configuration to store the shared files required by Oracle Clusterware as well as all Oracle ASM volumes.

    Note: Since the writing of this article, LaCie has discontinued the 500GB version of this external hard drive and only the 250GB and 320GB capacities exist. Please be aware that any type of hard disk (internal or external) should work for database storage as long as it can be recognized by the network storage server (Openfiler) and has adequate space.

    US$260
    1 - Ethernet Switch

    Used for the interconnect between linux1-priv and linux2-priv. This switch will also be used for network storage traffic for Openfiler. For the purpose of this article, I used a Gigabit Ethernet switch (and 1Gb Ethernet cards) for the private network.

         Gigabit Ethernet

         10/100 Ethernet
    US$50
    6 - Network Cables (US$5 each)
    US$30
    Optional Components
    KVM Switch

    This article requires access to the console of all nodes (servers) in order to install the operating system and perform several of the configuration tasks. When managing a very small number of servers, it might make sense to connect each server with its own monitor, keyboard, and mouse in order to access its console. However, as the number of servers to manage increases, this solution becomes unfeasible. A more practical solution would be to configure a dedicated computer which would include a single monitor, keyboard, and mouse that would have direct access to the console of each server. This solution is made possible using a Keyboard, Video, Mouse switch, better known as a KVM switch. A KVM switch is a hardware device that allows a user to control multiple computers from a single keyboard, video monitor and mouse. Avocent provides a high quality and economical 4-port switch which includes four 6' cables.

    For a detailed explanation and guide on the use of KVM switches, please see the article "KVM Switches For the Home and the Enterprise".

    US$340
    Total    US$2,675  
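The total above can be re-derived from the itemized prices, which is handy if you substitute your own hardware. The snippet below is just an illustrative tally: the prices are the ones listed in this section, while the component labels and the total_cost helper are names chosen for this sketch.

```python
# (label, unit price in US$, quantity), taken from the list in this section.
COMPONENTS = [
    ("Oracle RAC node (Dell Dimension 2400)", 620, 2),
    ("Network storage server (Dell PowerEdge 1800)", 650, 1),
    ("Ethernet LAN card (one per server)", 35, 3),
    ("External hard drive", 260, 1),
    ("Ethernet switch", 50, 1),
    ("Network cable", 5, 6),
    ("KVM switch (optional)", 340, 1),
]


def total_cost(components, include_optional=True):
    """Sum price * quantity, optionally skipping items labeled as optional."""
    return sum(
        price * quantity
        for label, price, quantity in components
        if include_optional or "(optional)" not in label
    )


if __name__ == "__main__":
    print("With KVM switch:   ", total_cost(COMPONENTS))
    print("Without KVM switch:", total_cost(COMPONENTS, include_optional=False))
```

Leaving out the optional KVM switch brings the figure down toward the lower end of the estimate given in the introduction.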

    We are about to start the installation process. Now that we have talked about the hardware that will be used in this example, let's take a conceptual look at what the environment will look like:



    Figure 1 Architecture

    As we start to go into the details of the installation, it should be noted that most of the tasks within this document will need to be performed on both Oracle RAC nodes (linux1 and linux2). I will indicate at the beginning of each section whether or not the task(s) should be performed on both Oracle RAC nodes or on the network storage server (openfiler1).

     


    6. Install the Linux Operating System

    Perform the following installation on both Oracle RAC nodes in the cluster!

    This section provides a summary of the screens used to install the Linux operating system. This guide is designed to work with Oracle's Enterprise Linux Release 4 Update 5.

    For more detailed installation instructions, it is possible to use the manuals from Red Hat Linux. I would suggest, however, that the instructions I have provided below be used for this configuration.

    Before installing the Enterprise Linux operating system on both Oracle RAC nodes, you should have the two NIC interfaces (cards) installed.

    Download the following ISO images for Enterprise Linux Release 4 Update 5:

    Oracle E-Delivery Web site for Enterprise Linux
    • V10378-01_1of4.zip   (572 MB)
    • V10378-01_2of4.zip   (619 MB)
    • V10378-01_3of4.zip   (621 MB)
    • V10378-01_4of4.zip   (269 MB)

    After downloading the Enterprise Linux software, unzip each of the files. You will then have the following ISO images which will need to be burned to CDs:

    • Enterprise-R4-U5-i386-disc1.iso
    • Enterprise-R4-U5-i386-disc2.iso
    • Enterprise-R4-U5-i386-disc3.iso
    • Enterprise-R4-U5-i386-disc4.iso

    If you are downloading the above ISO files to a MS Windows machine, there are many options for burning these images (ISO files) to a CD. You may already be familiar with and have the proper software to burn images to CD. If you are not familiar with this process, any CD-burning software package that can write ISO images will do.

    After downloading and burning the Enterprise Linux images (ISO files) to CD, insert Enterprise Linux Disk #1 into the first server (linux1 in this example), power it on, and answer the installation screen prompts as noted below. After completing the Linux installation on the first node, perform the same Linux installation on the second node, substituting the node name linux2 for linux1 and using the different IP addresses where appropriate.

    Boot Screen
    The first screen is the Enterprise Linux boot screen. At the boot: prompt, hit [Enter] to start the installation process.

    Media Test
    When asked to test the CD media, tab over to [Skip] and hit [Enter]. If there were any errors, the media burning software would have warned us. After several seconds, the installer should then detect the video card, monitor, and mouse. The installer then goes into GUI mode.

    Welcome to Enterprise Linux
    At the welcome screen, click [Next] to continue.

    Language / Keyboard Selection
    The next two screens prompt you for the Language and Keyboard settings. Make the appropriate selections for your configuration.

    Detect Previous Installation
    Note that if the installer detects a previous version of Enterprise Linux, it will ask if you would like to "Install Enterprise Linux" or "Upgrade an existing Installation". Always select to "Install Enterprise Linux".

    Disk Partitioning Setup
    Select [Automatically partition] and click [Next] to continue.

    If there were a previous installation of Linux on this machine, the next screen will ask if you want to "remove" or "keep" old partitions. Select the option to [Remove all partitions on this system]. Also, ensure that the correct hard drive ("hda" for my configuration) is selected for the Linux installation. I also keep the checkbox [Review (and modify if needed) the partitions created] selected. Click [Next] to continue.

    You will then be prompted with a dialog window asking if you really want to remove all partitions. Click [Yes] to acknowledge this warning.

    Partitioning
    The installer will then allow you to view (and modify if needed) the disk partitions it automatically selected. In almost all cases, the installer will choose 100MB for /boot, double the amount of RAM for swap, and the rest going to the root (/) partition. I like to have a minimum of 1GB for swap. For the purpose of this install, I will accept all automatically preferred sizes. (Including 2GB for swap since I have 1GB of RAM installed.)

    Starting with EL 4, the installer will create the same disk configuration as just noted but will create it using the Logical Volume Manager (LVM). For example, it will partition the first hard drive (/dev/hda for my configuration) into two partitions: one for the /boot partition (/dev/hda1) and the remainder of the disk dedicated to an LVM volume group named VolGroup00 (/dev/hda2). The LVM volume group (VolGroup00) is then divided into two logical volumes, one for the root filesystem (/) and another for swap. I basically check that it created at least 1GB of swap. Since I have 1GB of RAM installed, the installer created 2GB of swap. That said, I just accept the default disk layout.
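
    Once the node is up, the swap size can be confirmed from /proc/meminfo. The following is only a minimal sketch: the function name swap_ok and the default threshold of roughly 1GB (taken from my stated preference above) are illustrative assumptions, not something the installer provides.

```shell
# swap_ok [MIN_KB]: read /proc/meminfo-style text on stdin and succeed
# if SwapTotal is at least MIN_KB kilobytes.
# Assumption: the default of 1000000 kB approximates "1GB" while leaving
# a little slack below 1048576 kB.
swap_ok() {
    min_kb="${1:-1000000}"
    awk -v min="$min_kb" '
        $1 == "SwapTotal:" { found = 1; kb = $2 }
        END { exit !(found && kb >= min) }
    '
}
```

    For example, running swap_ok < /proc/meminfo && echo "swap OK" on each node should print "swap OK" when the default layout was accepted.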

    Boot Loader Configuration
    The installer will use the GRUB boot loader by default. To use the GRUB boot loader, accept all default values and click [Next] to continue.

    Network Configuration
    I made sure to install both NIC interfaces (cards) in each of the Linux machines before starting the operating system installation. This screen should have successfully detected each of the network devices.

    First, make sure that each of the network devices is checked as [Active on boot]. The installer may choose to not activate eth1.

    Second, [Edit] both eth0 and eth1 as follows. You may choose to use different IP addresses for both eth0 and eth1 and that is OK. If possible, try to put eth1 (the interconnect) on a different subnet than eth0 (the public network):

    eth0:
    - Uncheck the option [Configure using DHCP]
    - Leave the [Activate on boot] checked
    - IP Address: 192.168.1.100
    - Netmask: 255.255.255.0

    eth1:
    - Uncheck the option [Configure using DHCP]
    - Leave the [Activate on boot] checked
    - IP Address: 192.168.2.100
    - Netmask: 255.255.255.0

    Continue by setting your hostname manually. I used "linux1" for the first node and "linux2" for the second one. Finish this dialog off by supplying your gateway and DNS servers.

    Firewall
    On this screen, make sure to select [No firewall]

    Also, under the option to "Enable SELinux?", select [Disabled].

    Click [Next] to continue.

    You may be prompted with a warning dialog about not setting the firewall. If this occurs, simply hit [Proceed] to continue.

    Additional Language Support/Time Zone
    The next two screens allow you to select additional language support and time zone information. Make the appropriate selection for your configuration.

    Set Root Password
    Select a root password and click [Next] to continue.

    Package Installation Defaults
    By default, Enterprise Linux installs most of the software required for a typical server. There are several other packages, however, that are required to successfully install the Oracle Database software. For the purpose of this article, select the radio button [Customize software packages to be installed].

    Package Group Selection
    Scroll down to the bottom of this screen and select [Everything] under the "Miscellaneous" section. Click [Next] to continue.

    Please note that the installation of Oracle does not require all Linux packages to be installed. My decision to install all packages was for the sake of brevity. Please see Section 19 ("Pre-Installation Tasks for Oracle Database 10g Release 2") for a more detailed look at the critical packages required for a successful Oracle installation.

    Also note that with some Oracle Enterprise Linux 4 distributions, you will not get the "Package Group Selection" screen by default. There, you are asked to simply "Install default software packages" or "Customize software packages to be installed". Select the option to "Customize software packages to be installed" and click [Next] to continue. This will then bring up the "Package Group Selection" screen. Now, scroll down to the bottom of this screen and select [Everything] under the "Miscellaneous" section. Click [Next] to continue.

    About to Install
    This screen is basically a confirmation screen. Click [Continue] to start the installation. During the installation process, you will be asked to switch disks to Disk #2, Disk #3, and then Disk #4.

    Congratulations
    And that's it. You have successfully installed Enterprise Linux on the first node (linux1). The installer will eject the CD from the CD-ROM drive. Take out the CD and click [Reboot] to reboot the system.

    When the system boots into Enterprise Linux for the first time, it will prompt you with another Welcome screen. The following wizard allows you to configure the date and time, add any additional users, test the sound card, and to install any additional CDs. The only screen I care about is the time and date. As for the others, simply run through them as there is nothing additional that needs to be installed (at this point anyways!). If everything was successful, you should now be presented with the Enterprise Linux login screen.

    Perform the same installation on the second node
    After completing the Linux installation on the first node, repeat the above steps for the second node (linux2). When configuring the machine name and networking, be sure to configure the proper values. For my installation, this is what I configured for linux2:

    First, make sure that each of the network devices is checked as [Active on boot]. The installer may choose to not activate eth1.

    Second, [Edit] both eth0 and eth1 as follows:

    eth0:
    - Uncheck the option [Configure using DHCP]
    - Leave the [Activate on boot] checked
    - IP Address: 192.168.1.101
    - Netmask: 255.255.255.0

    eth1:
    - Uncheck the option [Configure using DHCP]
    - Leave the [Activate on boot] checked
    - IP Address: 192.168.2.101
    - Netmask: 255.255.255.0

    Continue by setting your hostname manually. I used "linux2" for the second node. Finish this dialog off by supplying your gateway and DNS servers.

     


    7. Network Configuration

    Perform the following network configuration on both Oracle RAC nodes in the cluster!

    Note: Although we configured several of the network settings during the Linux installation, it is important to not skip this section as it contains critical steps that are required for the RAC environment.

    Introduction to Network Settings

    During the Linux O/S install we already configured the IP address and host name for both of the Oracle RAC nodes. We now need to configure the /etc/hosts file and adjust several of the network settings for the interconnect.

    Both of the Oracle RAC nodes should have one static IP address for the public network and one static IP address for the private cluster interconnect. Do not use DHCP for the public IP address or the interconnect; you need static IP addresses! The private interconnect should only be used by Oracle to transfer Cluster Manager and Cache Fusion related data along with data for the network storage server (Openfiler). Note that Oracle does not support using the public network interface for the interconnect. You must have one network interface for the public network and another network interface for the private interconnect. For a production RAC implementation, the interconnect should be at least Gigabit Ethernet and used only by Oracle, with the network storage server (Openfiler) on a separate gigabit network.

    Configuring Public and Private Network

    In our two node example, we need to configure the network on both Oracle RAC nodes for access to the public network as well as their private interconnect.

    The easiest way to configure network settings in Oracle Enterprise Linux is with the program "Network Configuration". This application can be started from the command-line as the "root" user account as follows:

    # su -
    # /usr/bin/system-config-network &

    Using the Network Configuration application, you need to configure both NIC devices as well as the /etc/hosts file. Both of these tasks can be completed using the Network Configuration GUI. Notice that the /etc/hosts settings are the same for both nodes.
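
    On Red Hat-style systems such as Enterprise Linux 4, the per-device settings end up in files named /etc/sysconfig/network-scripts/ifcfg-<device> as KEY=value lines. As a rough sketch for checking what the GUI saved, the helper below prints one key from such a file. The function name ifcfg_get is my own illustration and not part of any Oracle or Linux tool.

```shell
# ifcfg_get FILE KEY: print the value of KEY from an ifcfg-style file
# (KEY=value lines), stripping optional surrounding double quotes.
# Only the first match is printed. The function name is illustrative.
ifcfg_get() {
    sed -n "s/^$2=//p" "$1" | sed -e 's/^"//' -e 's/"$//' | head -n 1
}
```

    For example, ifcfg_get /etc/sysconfig/network-scripts/ifcfg-eth1 IPADDR should print 192.168.2.100 on linux1, and the BOOTPROTO key should report something other than dhcp for a statically configured device.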

    Our example configuration will use the following settings:

    Oracle RAC Node 1 - (linux1)
    Device   IP Address      Subnet          Gateway       Purpose
    eth0     192.168.1.100   255.255.255.0   192.168.1.1   Connects linux1 to the public network
    eth1     192.168.2.100   255.255.255.0                 Connects linux1 (interconnect) to linux2 (linux2-priv)
    /etc/hosts
    127.0.0.1        localhost.localdomain localhost
    
    # Public Network - (eth0)
    192.168.1.100    linux1
    192.168.1.101    linux2
    
    # Private Interconnect - (eth1)
    192.168.2.100    linux1-priv
    192.168.2.101    linux2-priv
    
    # Public Virtual IP (VIP) addresses - (eth0)
    192.168.1.200    linux1-vip
    192.168.1.201    linux2-vip
    
    # Private Storage Network for Openfiler - (eth1)
    192.168.1.195    openfiler1
    192.168.2.195    openfiler1-priv

    Oracle RAC Node 2 - (linux2)
    Device   IP Address      Subnet          Gateway       Purpose
    eth0     192.168.1.101   255.255.255.0   192.168.1.1   Connects linux2 to the public network
    eth1     192.168.2.101   255.255.255.0                 Connects linux2 (interconnect) to linux1 (linux1-priv)
    /etc/hosts
    127.0.0.1        localhost.localdomain localhost
    
    # Public Network - (eth0)
    192.168.1.100    linux1
    192.168.1.101    linux2
    
    # Private Interconnect - (eth1)
    192.168.2.100    linux1-priv
    192.168.2.101    linux2-priv
    
    # Public Virtual IP (VIP) addresses - (eth0)
    192.168.1.200    linux1-vip
    192.168.1.201    linux2-vip
    
    # Private Storage Network for Openfiler - (eth1)
    192.168.1.195    openfiler1
    192.168.2.195    openfiler1-priv

    Note that the virtual IP addresses only need to be defined in the /etc/hosts file (or your DNS) for both Oracle RAC nodes. The public virtual IP addresses will be configured automatically by Oracle when you run the Oracle Universal Installer, which starts Oracle's Virtual Internet Protocol Configuration Assistant (VIPCA). All virtual IP addresses will be activated when the srvctl start nodeapps -n <node_name> command is run. This is the Host Name/IP Address that will be configured in the client(s) tnsnames.ora file (more details later).
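
    Because the same /etc/hosts entries must exist on both nodes, it can help to check for missing names before moving on. The following is a minimal sketch; the function name hosts_missing is an illustrative assumption, and it only looks at the file, not at DNS.

```shell
# hosts_missing FILE NAME...: print each NAME that has no entry in the
# hosts-format FILE (one name per line). Comment lines are ignored.
# The function name is illustrative.
hosts_missing() {
    file="$1"; shift
    for name in "$@"; do
        awk -v n="$name" '
            /^[ \t]*#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }
        ' "$file" || echo "$name"
    done
}
```

    Running hosts_missing /etc/hosts linux1 linux2 linux1-priv linux2-priv linux1-vip linux2-vip openfiler1 openfiler1-priv on each node should print nothing when the file matches the example configuration.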

    In the screenshots below, only Oracle RAC Node 1 (linux1) is shown. Be sure to make all the proper network settings to both Oracle RAC nodes.



    Figure 2 Network Configuration Screen, Node 1 (linux1)



    Figure 3 Ethernet Device Screen, eth0 (linux1)



    Figure 4 Ethernet Device Screen, eth1 (linux1)



    Figure 5 Network Configuration Screen, /etc/hosts (linux1)

    Once the network is configured, you can use the ifconfig command to verify everything is working. The following example is from linux1:

    # /sbin/ifconfig -a
    eth0      Link encap:Ethernet  HWaddr 00:14:6C:76:5C:71
              inet addr:192.168.1.100  Bcast:192.168.1.255  Mask:255.255.255.0
              inet6 addr: fe80::214:6cff:fe76:5c71/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:1546 errors:0 dropped:0 overruns:0 frame:0
              TX packets:1273 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:1179157 (1.1 MiB)  TX bytes:183011 (178.7 KiB)
              Interrupt:169 Base address:0x2f00
    
    eth1      Link encap:Ethernet  HWaddr 00:0E:0C:64:D1:E5
              inet addr:192.168.2.100  Bcast:192.168.2.255  Mask:255.255.255.0
              inet6 addr: fe80::20e:cff:fe64:d1e5/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:0 errors:0 dropped:0 overruns:0 frame:0
              TX packets:11 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:0 (0.0 b)  TX bytes:782 (782.0 b)
              Base address:0xddc0 Memory:fe9c0000-fe9e0000
    
    lo        Link encap:Local Loopback
              inet addr:127.0.0.1  Mask:255.0.0.0
              inet6 addr: ::1/128 Scope:Host
              UP LOOPBACK RUNNING  MTU:16436  Metric:1
              RX packets:4893 errors:0 dropped:0 overruns:0 frame:0
              TX packets:4893 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0
              RX bytes:6521518 (6.2 MiB)  TX bytes:6521518 (6.2 MiB)
    
    sit0      Link encap:IPv6-in-IPv4
              NOARP  MTU:1480  Metric:1
              RX packets:0 errors:0 dropped:0 overruns:0 frame:0
              TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0
              RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
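
    Rather than reading the ifconfig output by eye on each node, the address of a given device can be pulled out with a short filter. This is a sketch that assumes the older ifconfig output format shown above (an "inet addr:" field); the function name iface_addr is illustrative.

```shell
# iface_addr DEVICE: read `ifconfig -a` output on stdin and print the
# IPv4 address of DEVICE, or nothing if the device has none.
# Assumption: the "inet addr:" layout shown in the listing above.
iface_addr() {
    awk -v ifc="$1" '
        $1 == ifc { in_block = 1; next }   # first line of our device block
        /^[^ \t]/ { in_block = 0 }         # any other device starts a new block
        in_block && !seen && /inet addr:/ {
            sub(/.*inet addr:/, "")
            print $1
            seen = 1
        }
    '
}
```

    For example, /sbin/ifconfig -a | iface_addr eth1 should print 192.168.2.100 on linux1 and 192.168.2.101 on linux2.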

    About Virtual IP

    Why is there a Virtual IP (VIP) in 10g? Why does it just return a dead connection when its primary node fails?

    It's all about availability of the application. When a node fails, the VIP associated with it is supposed to be automatically failed over to some other node. When this occurs, two things happen.

    1. The new node re-arps the world indicating a new MAC address for the address. For directly connected clients, this usually causes them to see errors on their connections to the old address.
    2. Subsequent packets sent to the VIP go to the new node, which will send error RST packets back to the clients. This results in the clients getting errors immediately.

    This means that when the client issues SQL to the node that is now down, or traverses the address list while connecting, rather than waiting on a very long TCP/IP time-out (~10 minutes), the client receives a TCP reset. In the case of SQL, this is ORA-3113. In the case of connect, the next address in tnsnames is used.

    Going one step further is making use of Transparent Application Failover (TAF). With TAF successfully configured, it is possible to completely avoid ORA-3113 errors altogether! TAF will be discussed in more detail in Section 30 ("Transparent Application Failover - (TAF)").

    Without using VIPs, clients connected to a node that died will often wait a 10-minute TCP timeout period before getting an error. As a result, you don't really have a good HA solution without using VIPs (Source - Metalink Note 220970.1).

    Confirm the RAC Node Name is Not Listed in Loopback Address

    Ensure that the node names (linux1 or linux2) are not included for the loopback address in the /etc/hosts file. If the machine name is listed in the loopback address entry as below:

    127.0.0.1 linux1 localhost.localdomain localhost
    it will need to be removed as shown below:
    127.0.0.1 localhost.localdomain localhost

    If the RAC node name is listed for the loopback address, you will receive the following error during the RAC installation:

    ORA-00603: ORACLE server session terminated by fatal error
    or
    ORA-29702: error occurred in Cluster Group Service operation

    Confirm localhost is defined in the /etc/hosts file for the loopback address

    Ensure that the entries for localhost.localdomain and localhost are included for the loopback address in the /etc/hosts file for each of the Oracle RAC nodes:

        127.0.0.1        localhost.localdomain localhost
    If an entry does not exist for localhost in the /etc/hosts file, Oracle Clusterware will be unable to start the application resources — notably the ONS process. The error would indicate "Failed to get IP for localhost" and will be written to the log file for ONS. For example:
    CRS-0215 could not start resource 'ora.linux1.ons'. Check log file
    "/u01/app/crs/log/linux1/racg/ora.linux1.ons.log"
    for more details.
    The ONS log file will contain lines similar to the following:

    Oracle Database 10g CRS Release 10.2.0.1.0 Production Copyright 1996, 2005 Oracle. All rights reserved.
    2007-04-14 13:10:02.729: [ RACG][3086871296][13316][3086871296][ora.linux1.ons]: Failed to get IP for localhost (1)
    Failed to get IP for localhost (1)
    Failed to get IP for localhost (1)
    onsctl: ons failed to start
    ...
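
    Both loopback rules above (the node name must not appear on the 127.0.0.1 line, while localhost.localdomain and localhost must) can be checked in one pass. The following is a minimal sketch; the function name loopback_ok is an illustrative assumption.

```shell
# loopback_ok FILE NODE: succeed only if the 127.0.0.1 line of the
# hosts-format FILE names both localhost.localdomain and localhost
# and does not name NODE. The function name is illustrative.
loopback_ok() {
    awk -v node="$2" '
        $1 == "127.0.0.1" {
            for (i = 2; i <= NF; i++) {
                if ($i == "localhost") has_lh = 1
                if ($i == "localhost.localdomain") has_lhd = 1
                if ($i == node) has_node = 1
            }
        }
        END { exit !(has_lh && has_lhd && !has_node) }
    ' "$1"
}
```

    On the first node, loopback_ok /etc/hosts linux1 || echo "fix the loopback entry" should stay silent; use linux2 as the second argument on the other node.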

    Adjusting Network Settings

    With Oracle 9.2.0.1 and later, Oracle makes use of UDP as the default protocol on Linux for inter-process communication (IPC), such as Cache Fusion and Cluster Manager buffer transfers between instances within the RAC cluster.

    Oracle strongly suggests adjusting the default and maximum send buffer size (SO_SNDBUF socket option) to 256KB, and the default and maximum receive buffer size (SO_RCVBUF socket option) to 256KB.

    The receive buffers are used by TCP and UDP to hold received data until it is read by the application. With TCP, the receive buffer cannot overflow because the peer is not allowed to send data beyond the advertised buffer size window. UDP has no such flow control, so datagrams are discarded if they don't fit in the socket receive buffer, and a fast sender can overwhelm the receiver.

    The default and maximum window size can be changed in the /proc file system without a reboot:

    # su - root
    
    # sysctl -w net.core.rmem_default=262144
    net.core.rmem_default = 262144
    
    # sysctl -w net.core.wmem_default=262144
    net.core.wmem_default = 262144
    
    # sysctl -w net.core.rmem_max=262144
    net.core.rmem_max = 262144
    
    # sysctl -w net.core.wmem_max=262144
    net.core.wmem_max = 262144

    The above commands made the changes to the already running OS. You should now make the above changes permanent (for each reboot) by adding the following lines to the /etc/sysctl.conf file for both nodes in your RAC cluster:

    # Default setting in bytes of the socket receive buffer
    net.core.rmem_default=262144
    
    # Default setting in bytes of the socket send buffer
    net.core.wmem_default=262144
    
    # Maximum socket receive buffer size which may be set by using
    # the SO_RCVBUF socket option
    net.core.rmem_max=262144
    
    # Maximum socket send buffer size which may be set by using 
    # the SO_SNDBUF socket option
    net.core.wmem_max=262144
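
    After a reboot it is worth confirming that the values from /etc/sysctl.conf were actually applied. The sketch below reads the four settings back from /proc/sys/net/core and compares them with 262144 bytes (256 x 1024). The function name check_net_buffers and the optional directory argument (useful for trying it against a copy) are illustrative assumptions.

```shell
# check_net_buffers [DIR]: compare the four socket buffer settings under
# DIR (default /proc/sys/net/core) with 262144 bytes and print any that
# differ. Returns non-zero if a value differs or cannot be read.
# The function only reads; it changes nothing.
check_net_buffers() {
    root="${1:-/proc/sys/net/core}"
    want=$((256 * 1024))
    rc=0
    for key in rmem_default wmem_default rmem_max wmem_max; do
        have=$(cat "$root/$key" 2>/dev/null)
        if [ "$have" != "$want" ]; then
            echo "net.core.$key = ${have:-unset} (expected $want)"
            rc=1
        fi
    done
    return $rc
}
```

    Running check_net_buffers on both nodes should print nothing once the settings are in place.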
    

    Check and turn off UDP ICMP rejections

    During the Linux installation process, I indicated to not configure the firewall option. By default, the installer selects the option to configure a firewall. This has burned me several times, so I like to double-check that the firewall is not configured and that UDP ICMP filtering is turned off.

    If UDP ICMP is blocked or rejected by the firewall, the Oracle Clusterware software will crash after several minutes of running. When the Oracle Clusterware process fails, you will have something similar to the following in the <machine_name>_evmocr.log file:

    08/29/2005 22:17:19
    oac_init:2: Could not connect to server, clsc retcode = 9
    08/29/2005 22:17:19
    a_init:12!: Client init unsuccessful : [32]
    ibctx:1:ERROR: INVALID FORMAT
    proprinit:problem reading the bootblock or superbloc 22

    When experiencing this type of error, the solution was to remove the UDP ICMP (iptables) rejection rule, or to simply have the firewall option turned off. The Oracle Clusterware software will then start to operate normally and not crash. The following commands should be executed as the root user account:

    1. Check to ensure that the firewall option is turned off. If the firewall option is stopped (like it is in my example below) you do not have to proceed with the following steps.
      # /etc/rc.d/init.d/iptables status
      Firewall is stopped.

    2. If the firewall option is operating you will need to first manually disable UDP ICMP rejections:
      # /etc/rc.d/init.d/iptables stop
      Flushing firewall rules: [ OK ]
      Setting chains to policy ACCEPT: filter [ OK ]
      Unloading iptables modules: [ OK ]

    3. Then, to keep UDP ICMP rejections turned off across server reboots (they should always be off):
      # chkconfig iptables off
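
    To confirm that the firewall will stay off after a reboot, the output of chkconfig --list iptables can be inspected; it lists each runlevel as N:on or N:off. The filter below is a small sketch that prints only the runlevels still marked "on". The function name enabled_runlevels is illustrative.

```shell
# enabled_runlevels: read one line of `chkconfig --list <service>` output
# on stdin (service name followed by N:on / N:off fields) and print the
# runlevels where the service is "on", separated by spaces.
enabled_runlevels() {
    awk '{
        out = ""
        for (i = 2; i <= NF; i++) {
            split($i, kv, ":")
            if (kv[2] == "on") out = out (out == "" ? "" : " ") kv[1]
        }
        print out
    }'
}
```

    After running chkconfig iptables off, the command chkconfig --list iptables | enabled_runlevels should print an empty line.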

     


    8. Install Openfiler

    Perform the following installation on the network storage server (openfiler1)!

    With the network configured on both Oracle RAC nodes, the next step is to install the Openfiler software to the network storage server (openfiler1). Later in this article, the network storage server will be configured as an iSCSI storage device for all Oracle RAC 10g shared storage requirements.

    Powered by rPath Linux, Openfiler is a free browser-based network storage management utility that delivers file-based Network Attached Storage (NAS) and block-based Storage Area Networking (SAN) in a single framework. The entire software stack interfaces with open source applications such as Apache, Samba, LVM2, ext3, Linux NFS and iSCSI Enterprise Target. Openfiler combines these ubiquitous technologies into a small, easy to manage solution fronted by a powerful web-based management interface.

    Openfiler supports CIFS, NFS, HTTP/DAV, and FTP; however, we will only be making use of its iSCSI capabilities to implement an inexpensive SAN for the shared storage components required by Oracle RAC 10g. A 500GB external hard drive will be connected to the Openfiler server via its USB 2.0 interface. The Openfiler server will be configured to use this disk for iSCSI based storage and will be used in our Oracle RAC 10g configuration to store the shared files required by Oracle Clusterware as well as all Oracle ASM volumes.

    To learn more about Openfiler, please visit their website at http://www.openfiler.com/

    Download Openfiler

    Use the links (below) to download Openfiler 2.2 x86 (respin 2). After downloading Openfiler, you will then need to burn the ISO image to CD.

    If you are downloading the ISO file to a MS Windows machine, there are many options for burning the image to a CD. You may already be familiar with and have the proper software to burn images to CD. If not, any CD-burning software package that can write ISO images will do.

    Install Openfiler

    This section provides a summary of the screens used to install the Openfiler software. For the purpose of this article, I opted to install Openfiler with all default options. The only manual change required was for configuring the local network settings.

    Once the install has completed, the server will reboot to make sure all required components, services and drivers are started and recognized. After the reboot, the external hard drive should be discovered by the Openfiler server as the device /dev/sda.

    For more detailed installation instructions, please visit http://www.openfiler.com/docs/. I would suggest, however, that the instructions I have provided below be used for this Oracle RAC 10g configuration.

    Before installing the Openfiler software to the network storage server, you should have both NIC interfaces (cards) installed and any external hard drives connected and turned on.

    After downloading and burning the Openfiler ISO image (ISO file) to CD, insert the CD into the network storage server (openfiler1 in this example), power it on, and answer the installation screen prompts as noted below.

    Boot Screen
    The first screen is the Openfiler boot screen. At the boot: prompt, hit [Enter] to start the installation process.

    Media Test
    When asked to test the CD media, tab over to [Skip] and hit [Enter]. If there were any errors, the media burning software would have warned us. After several seconds, the installer should then detect the video card, monitor, and mouse. The installer then goes into GUI mode.

    Welcome to Openfiler NAS/SAN Appliance
    At the welcome screen, click [Next] to continue.

    Keyboard Configuration
    The next screen prompts you for the Keyboard settings. Make the appropriate selection for your configuration.

    Disk Partitioning Setup
    The next screen asks whether to perform disk partitioning using "Automatic Partitioning" or "Manual Partitioning with Disk Druid". You can choose either method here, although the official Openfiler documentation suggests using Manual Partitioning. Since the internal hard drive I will be using for this install is small and only going to be used to store the Openfiler software (I will not be using any space on the internal 40GB hard drive for iSCSI storage), I opted to use "Automatic Partitioning".

    Select [Automatically partition] and click [Next] to continue.

    If there were a previous installation of Linux on this machine, the next screen will ask if you want to "remove" or "keep" old partitions. Select the option to [Remove all partitions on this system]. Also, ensure that ONLY the [hda] drive is selected for this installation. I also keep the checkbox [Review (and modify if needed) the partitions created] selected. Click [Next] to continue.

    You will then be prompted with a dialog window asking if you really want to remove all partitions. Click [Yes] to acknowledge this warning.

    Partitioning
    The installer will then allow you to view (and modify if needed) the disk partitions it automatically selected for /dev/hda. In almost all cases, the installer will choose 100MB for /boot, double the amount of RAM for swap, and the rest going to the root (/) partition. I like to have a minimum of 1GB for swap. For the purpose of this install, I will accept all automatically preferred sizes. (Including 2GB for swap since I have 2GB of RAM installed.)

    Network Configuration
    I made sure to install both NIC interfaces (cards) in the network storage server before starting the Openfiler installation. This screen should have successfully detected each of the network devices.

    First, make sure that each of the network devices is checked as [Active on boot]. The installer may choose to not activate eth1 by default.

    Second, [Edit] both eth0 and eth1 as follows. You may choose to use different IP addresses for both eth0 and eth1 and that is OK. You must, however, configure eth1 (the storage network) to be on the same subnet you configured for eth1 on linux1 and linux2:

    eth0:
    - Uncheck the option [Configure using DHCP]
    - Leave the [Activate on boot] checked
    - IP Address: 192.168.1.195
    - Netmask: 255.255.255.0

    eth1:
    - Uncheck the option [Configure using DHCP]
    - Leave the [Activate on boot] checked
    - IP Address: 192.168.2.195
    - Netmask: 255.255.255.0

    Continue by setting your hostname manually. I used a hostname of "openfiler1". Finish this dialog off by supplying your gateway and DNS servers.

    Time Zone Selection
    The next screen allows you to configure your time zone information. Make the appropriate selection for your location.

    Set Root Password
    Select a root password and click [Next] to continue.

    About to Install
    This screen is basically a confirmation screen. Click [Next] to start the installation.

    Congratulations
    And that's it. You have successfully installed Openfiler on the network storage server. The installer will eject the CD from the CD-ROM drive. Take out the CD and click [Reboot] to reboot the system.

    If everything was successful after the reboot, you should now be presented with a text login screen and the URL to use for administering the Openfiler server.

    Modify /etc/hosts File on Openfiler Server
    Although not mandatory, I typically copy the contents of the /etc/hosts file from one of the Oracle RAC nodes to the new Openfiler server. This allows convenient name resolution when testing the network for the cluster.

     


    9. Configure iSCSI Volumes using Openfiler

    Perform the following configuration tasks on the network storage server (openfiler1)!

    Openfiler administration is performed using the Openfiler Storage Control Center — a browser based tool over an https connection on port 446. For example:

    https://openfiler1:446/

    From the Openfiler Storage Control Center home page, login as an administrator. The default administration login credentials for Openfiler are:

    • Username: openfiler
    • Password: password

    The first page the administrator sees is the [Accounts] / [Authentication] screen. Configuring user accounts and groups is not necessary for this article and will therefore not be discussed.

    To use Openfiler as an iSCSI storage server, we have to perform three major tasks: set up iSCSI services, configure network access, and create physical storage.

    Services

    To control services, we use the Openfiler Storage Control Center and navigate to [Services] / [Enable/Disable]:



    Figure 6 Enable iSCSI Openfiler Service

    To enable the iSCSI service, click on 'Enable' under the 'iSCSI target' service name. After that, the 'iSCSI target' status should change to 'Enabled'.

    The ietd program implements the user level part of iSCSI Enterprise Target software for building an iSCSI storage system on Linux. With the iSCSI target enabled, we should be able to SSH into the Openfiler server and see the iscsi-target service running:

    [root@openfiler1 ~]# service iscsi-target status
    ietd (pid 3784) is running...
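
    Beyond the service status, we can also check that something is listening for iSCSI connections. TCP port 3260 is the registered iSCSI port; that ietd on this Openfiler release uses it unchanged is my assumption. The sketch below filters netstat -tln output for a listening port, and the function name port_listening is illustrative.

```shell
# port_listening PORT: read `netstat -tln` output on stdin and succeed
# if some TCP socket is in LISTEN state on PORT.
port_listening() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $NF == "LISTEN" {
            n = split($4, a, ":")          # local address is host:port
            if (a[n] == port) found = 1
        }
        END { exit !found }
    '
}
```

    On the Openfiler server, netstat -tln | port_listening 3260 && echo "iSCSI target is listening" gives a quick confirmation.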

    Network Access Restriction

    The next step is to configure network access in Openfiler so both Oracle RAC nodes (linux1 and linux2) have permissions to our iSCSI volumes through the storage (private) network. (iSCSI volumes will be created in the next section!)

    Again, this task can be completed using the Openfiler Storage Control Center by navigating to [General] / [Local Networks]. The Local Networks screen allows an administrator to set up networks and/or hosts that will be allowed to access resources exported by the Openfiler appliance. For the purpose of this article, we will want to add both Oracle RAC nodes individually rather than allowing the entire 192.168.2.0 network to have access to Openfiler resources.

    When entering each of the Oracle RAC nodes, note that the 'Name' field is just a logical name used for reference only. As a convention when entering nodes, I simply use the node name defined for that IP address. Next, when entering the actual node in the 'Network/Host' field, always use its IP address even though its host name may already be defined in your /etc/hosts file or DNS. Lastly, when entering actual hosts in our Class C network, use a subnet mask of 255.255.255.255.

    It is important to remember that you will be entering the IP address of the private network (eth1) for each of the RAC nodes in the cluster.
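
    The reason for the 255.255.255.255 mask is that all 32 bits are set, so the entry matches exactly one address, whereas 255.255.255.0 would match every host on the 192.168.2.0 network. As a small illustration, the sketch below converts a dotted netmask to its prefix length. The function name mask_to_prefix is my own, and it does not verify that the one-bits are contiguous.

```shell
# mask_to_prefix MASK: print the number of one-bits in a dotted-quad
# netmask (for example 24 for 255.255.255.0).
# Assumption: MASK is a valid netmask; contiguity is not checked.
mask_to_prefix() {
    bits=0
    old_ifs=$IFS; IFS=.
    set -- $1                # split the mask into its four octets
    IFS=$old_ifs
    for octet in "$@"; do
        case "$octet" in
            255) bits=$((bits + 8)) ;;
            254) bits=$((bits + 7)) ;;
            252) bits=$((bits + 6)) ;;
            248) bits=$((bits + 5)) ;;
            240) bits=$((bits + 4)) ;;
            224) bits=$((bits + 3)) ;;
            192) bits=$((bits + 2)) ;;
            128) bits=$((bits + 1)) ;;
            0) ;;
            *) echo "invalid netmask octet: $octet" >&2; return 1 ;;
        esac
    done
    echo "$bits"
}
```

    Here mask_to_prefix 255.255.255.255 yields a prefix length of 32 (a single host), while mask_to_prefix 255.255.255.0 yields 24.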

    The following image shows the results of adding both Oracle RAC nodes:



    Figure 7 Configure Openfiler Host Access for Oracle RAC Nodes

    Physical Storage

    In this section, we will be creating the five iSCSI volumes to be used as shared storage by both of the Oracle RAC nodes in the cluster. This involves multiple steps that will be performed on the external USB hard drive connected to the Openfiler server.

    Storage devices like internal IDE/SATA/SCSI disks, external USB or FireWire drives, or any other storage can be connected to the Openfiler server, and served to the clients. Once these devices are discovered at the OS level, Openfiler Storage Control Center can be used to set up and manage all that storage.

    In our case, we have a 500GB external USB hard drive for our storage needs. On the Openfiler server this drive is seen as /dev/sda (HDS72505 0KLAT80). To see this and to start the process of creating our iSCSI volumes, navigate to [Volumes] / [Physical Storage Mgmt.] from the Openfiler Storage Control Center:



    Figure 8 Openfiler Physical Storage

    Partitioning the Physical Disk

    The first step is to create a single primary partition on the /dev/sda external USB hard drive. Clicking on the /dev/sda link presents the options to 'Edit' or 'Create' a partition. Since we will be creating a single primary partition that spans the entire disk, most of the options can be left at their default settings; the only change needed is to set the 'Partition Type' from 'Extended partition' to 'Physical volume'. Here are the values I specified to create the primary partition on /dev/sda:

    Mode: Primary
    Partition Type: Physical volume
    Starting Cylinder: 1
    Ending Cylinder: 60801

    The size now shows 465.76 GB. To accept that we click on the Create button. This results in a new partition (/dev/sda1) on our external hard drive:



    Figure 9 Partition the Physical Volume

    Volume Group Management

    The next step is to create a Volume Group. We will be creating a single volume group named rac1 that contains the newly created primary partition.

    From the Openfiler Storage Control Center, navigate to [Volumes] / [Volume Group Mgmt.]. There we would see any existing volume groups, or none as in our case. Using the Volume Group Management screen, enter the name of the new volume group (rac1), click on the checkbox in front of /dev/sda1 to select that partition, and finally click on the 'Add volume group' button. After that we are presented with the list that now shows our newly created volume group named "rac1":



    Figure 10 New Volume Group Created

    Logical Volumes

    We can now create the five logical volumes in the newly created volume group (rac1).

    From the Openfiler Storage Control Center, navigate to [Volumes] / [Create New Volume]. There we will see the newly created volume group (rac1) along with its block storage statistics. Also available at the bottom of this screen is the option to create a new volume in the selected volume group. Use this screen to create the following five logical (iSCSI) volumes. After creating each logical volume, the application will point you to the "List of Existing Volumes" screen. You will then need to click back to the "Create New Volume" tab to create the next logical volume until all five iSCSI volumes are created:

    iSCSI / Logical Volumes
    Volume Name  Volume Description   Required Space (MB)  Filesystem Type
    -----------  -------------------  -------------------  ---------------
    crs          Oracle Clusterware                 2,048  iSCSI
    asm1         Oracle ASM Volume 1              118,720  iSCSI
    asm2         Oracle ASM Volume 2              118,720  iSCSI
    asm3         Oracle ASM Volume 3              118,720  iSCSI
    asm4         Oracle ASM Volume 4              118,720  iSCSI
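
    As a quick sanity check, the sizes in the table can be added up and compared with the 465.76 GB partition created earlier. This is only a sketch: it assumes Openfiler reports sizes in binary units (1 GB = 1024 MB), and the variable and function names are made up for the example.

```shell
# Sizes in MB, taken from the volume table above.
crs_mb=2048
asm_mb=118720
asm_count=4

# Sum of all five logical volumes in MB.
total_volume_mb() {
    echo $((crs_mb + asm_count * asm_mb))
}

# Convert MB to GB with two decimals, assuming 1 GB = 1024 MB.
mb_to_gb() {
    awk -v mb="$1" 'BEGIN { printf "%.2f\n", mb / 1024 }'
}

total_mb=$(total_volume_mb)
echo "Total requested: ${total_mb} MB ($(mb_to_gb "$total_mb") GB)"
```

    This prints "Total requested: 476928 MB (465.75 GB)", which, under the assumption above, is just under the 465.76 GB shown for the partition. That explains the odd-looking figure of 118,720 MB for each ASM volume: the four ASM volumes share nearly all of the space left after the 2,048 MB crs volume.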

    In effect we have created five iSCSI disks that can now be presented to iSCSI clients (linux1 and linux2) on the network. The "List of Existing Volumes" screen should look as follows:



    Figure 11 New Logical (iSCSI) Volumes

    Grant Access Rights to New Logical Volumes

    Before an iSCSI client can have access to the newly created iSCSI volumes, it needs to be granted the appropriate permissions. Earlier, we configured Openfiler with two hosts (the Oracle RAC nodes) that can be granted access rights to resources. We now need to grant both of the Oracle RAC nodes access to each of the newly created iSCSI volumes.

    From the Openfiler Storage Control Center, navigate to [Volumes] / [List of Existing Volumes]. This will present the screen shown in the previous section. For each of the logical volumes, click on the 'Edit' link (under the Properties column). This will bring up the 'Edit properties' screen for that volume. Scroll to the bottom of this screen, change both hosts from 'Deny' to 'Allow' and click the 'Update' button:



    Figure 12 Grant Host Access to Logical (iSCSI) Volumes

    Perform this task for all five logical volumes.

    Make iSCSI Targets Available to Clients

    Every time a new logical volume is added, we need to restart the associated service on the Openfiler server. In our case we created iSCSI logical volumes, so we have to restart the iSCSI target (iscsi-target) service. This will make the new iSCSI targets available to all clients on the network who have privileges to access them.

    To restart the iSCSI target service, use the Openfiler Storage Control Center and navigate to [Services] / [Enable/Disable]. The iSCSI target service should already be enabled (several sections back). If so, disable the service then enable it again. (See Figure 6)

    The same task can be achieved through an SSH session on the Openfiler server:

    [root@openfiler1 ~]# service iscsi-target restart
    Stopping iSCSI target service: [  OK  ]
    Starting iSCSI target service: [  OK  ]

     


    10. Configure iSCSI Volumes on Oracle RAC Nodes

    Configure the iSCSI initiator on both Oracle RAC nodes in the cluster! Creating partitions, however, should only be executed on one of the nodes in the RAC cluster.

    An iSCSI client can be any system (Linux, Unix, MS Windows, Apple Mac, etc.) for which iSCSI support (a driver) is available. In our case, the clients are two Linux servers (linux1 and linux2) running Oracle Enterprise Linux 4.

    In this section we will configure the iSCSI initiator on both of the Oracle RAC nodes. This involves editing the /etc/iscsi.conf file on both nodes to name the network storage server (openfiler1) so that they can discover the iSCSI volumes created in the previous section. We then go through the arduous task of mapping the iSCSI target names discovered from Openfiler to the local SCSI device names on one of the nodes, namely linux1 (the node from which we will be partitioning the iSCSI volumes). This is often considered a lengthy task, but it only needs to be performed in this section and when formatting the iSCSI volumes with the Oracle Cluster File System (OCFS2) and Automatic Storage Management (ASM). Knowing the local SCSI device name and which iSCSI target it maps to is required in order to know which volume (device) is to be used for OCFS2 and which volumes belong to ASM.

    Note that every time one of the Oracle RAC nodes is rebooted, the mappings may be different. For example, the iSCSI target name "iqn.2006-01.com.openfiler:rac1.crs" may have been discovered as /dev/sdd on linux1 during the process of configuring the volumes (as it was for me when writing this section!). After rebooting this node, however, "iqn.2006-01.com.openfiler:rac1.crs" may get discovered as /dev/sde. This will not be a problem since all disks will be labeled either by OCFS2 or ASM (later in this article). When either of these services attempts to mount a volume, it will do so using the volume's label and not its local SCSI device name.

    iSCSI (initiator) service

    On each of the Oracle RAC nodes, we have to make sure the iSCSI (initiator) service is up and running. If not installed as part of the operating system setup, the iscsi-initiator-utils RPM (i.e. iscsi-initiator-utils-4.0.3.0-5.i386.rpm) should be downloaded and installed on each of the Oracle RAC nodes.

    To determine if this package is installed, perform the following on both Oracle RAC nodes:

    # rpm -qa | grep iscsi
    iscsi-initiator-utils-4.0.3.0-5

    If not installed, the iscsi-initiator-utils RPM package can be found on disk 3 of 4 of the Enterprise Linux 4 Update 5 distribution or downloaded from one of the Internet RPM resources.

    Use the following command to install the iscsi-initiator-utils RPM package if not present:

    # rpm -Uvh iscsi-initiator-utils-4.0.3.0-5.i386.rpm
    warning: iscsi-initiator-utils-4.0.3.0-5.i386.rpm: 
      V3 DSA signature: NOKEY, key ID 443e1821
    Preparing...                ########################################### [100%]
       1:iscsi-initiator-utils  ########################################### [100%]

    After verifying that the iscsi-initiator-utils RPM is installed, the only configuration step required on the Oracle RAC nodes (iSCSI client) is to specify the network storage server (iSCSI server) in the /etc/iscsi.conf file. Edit the /etc/iscsi.conf file and include an entry for DiscoveryAddress which specifies the hostname of the Openfiler network storage server. In our case that was:

    ...
    DiscoveryAddress=openfiler1-priv
    ...
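
    Because the same edit has to be made on both nodes, it can help to script it so that running it twice does not add a duplicate entry. The sketch below is one way to do that; add_discovery_address is a hypothetical helper, and the demonstration writes to a temporary file rather than the real /etc/iscsi.conf.

```shell
# Append a DiscoveryAddress entry to an iscsi.conf-style file unless the
# exact same line is already present. Hypothetical helper for illustration.
add_discovery_address() {
    conf_file=$1
    target_host=$2
    entry="DiscoveryAddress=${target_host}"
    # -x: match the whole line, -F: treat the entry as a fixed string.
    if grep -qxF "$entry" "$conf_file" 2>/dev/null; then
        return 0
    fi
    printf '%s\n' "$entry" >> "$conf_file"
}

# Demonstration against a scratch file; calling the helper twice still
# leaves a single entry.
demo_conf=$(mktemp)
add_discovery_address "$demo_conf" openfiler1-priv
add_discovery_address "$demo_conf" openfiler1-priv
cat "$demo_conf"
rm -f "$demo_conf"
```

    On a RAC node the call would be add_discovery_address /etc/iscsi.conf openfiler1-priv, run as root. Keep a backup copy of the file first.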

    After making that change to the /etc/iscsi.conf file on both Oracle RAC nodes, we can start (or restart) the iscsi initiator service on both nodes:

    # service iscsi restart
    Searching for iscsi-based multipath maps
    Found 0 maps
    Stopping iscsid: iscsid not running
    
    Checking iscsi config:  [  OK  ]
    Loading iscsi driver:  [  OK  ]
    Starting iscsid: [  OK  ]

    We should also configure the iSCSI service to be active across machine reboots for both Oracle RAC nodes. The Linux command chkconfig can be used to achieve that as follows:

    # chkconfig --level 345 iscsi on
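
    To confirm the setting took effect, the output of chkconfig --list iscsi can be inspected on each node. The following sketch checks such a line for runlevels 3, 4 and 5; the sample line is an assumed example of the usual chkconfig --list format, not output captured from linux1, and enabled_in_345 is a hypothetical helper name.

```shell
# Succeed only if runlevels 3, 4 and 5 are all "on" in a line of
# chkconfig --list output. Hypothetical helper for illustration.
enabled_in_345() {
    echo "$1" | awk '
        {
            on = 0
            for (i = 2; i <= NF; i++)
                if ($i == "3:on" || $i == "4:on" || $i == "5:on")
                    on++
        }
        END { if (on == 3) exit 0; exit 1 }'
}

# Assumed sample line in the format printed by chkconfig --list.
sample="iscsi           0:off   1:off   2:off   3:on    4:on    5:on    6:off"

if enabled_in_345 "$sample"; then
    echo "iscsi is enabled for runlevels 3, 4 and 5"
else
    echo "iscsi is NOT enabled for all of runlevels 3, 4 and 5"
fi

# On a RAC node the live check would be:
#   enabled_in_345 "$(chkconfig --list iscsi)"
```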

    Discovering iSCSI Targets

    Although the iSCSI initiator service has been configured and is running on both of the Oracle RAC nodes, the discovery instructions in this section only need to be run from the node we will be partitioning and labeling volumes from, namely linux1.

    When the Openfiler server publishes available iSCSI targets (that happens when the iscsi-target service gets started/restarted on the Openfiler server), or when the iSCSI initiator service is started/restarted on the client, configured clients get the message that new iSCSI disks are now available. We would see something like this in the client's /var/log/messages file:

    ...
    Jun 25 12:56:10 linux1 iscsi: iscsid startup succeeded
    Jun 25 12:56:10 linux1 iscsid[10315]: Connected to Discovery Address 192.168.2.195
    Jun 25 12:56:10 linux1 kernel: iscsi-sfnet:host1: Session established
    Jun 25 12:56:10 linux1 kernel: iscsi-sfnet:host0: Session established
    Jun 25 12:56:10 linux1 kernel: iscsi-sfnet:host2: Session established
    Jun 25 12:56:10 linux1 kernel: scsi0 : SFNet iSCSI driver
    Jun 25 12:56:10 linux1 kernel: iscsi-sfnet:host3: Session established
    Jun 25 12:56:10 linux1 kernel: scsi1 : SFNet iSCSI driver
    Jun 25 12:56:10 linux1 kernel:   Vendor: Openfile  Model: Virtual disk      Rev: 0
    Jun 25 12:56:10 linux1 kernel:   Type:   Direct-Access                      ANSI SCSI revision: 04
    Jun 25 12:56:10 linux1 kernel: SCSI device sda: 243138560 512-byte hdwr sectors (124487 MB)
    Jun 25 12:56:10 linux1 kernel: iscsi-sfnet:host4: Session established
    Jun 25 12:56:10 linux1 kernel: scsi4 : SFNet iSCSI driver
    Jun 25 12:56:10 linux1 kernel: scsi2 : SFNet iSCSI driver
    Jun 25 12:56:10 linux1 kernel:   Vendor: Openfile  Model: Virtual disk      Rev: 0
    Jun 25 12:56:10 linux1 kernel:   Type:   Direct-Access                      ANSI SCSI revision: 04
    Jun 25 12:56:10 linux1 kernel: SCSI device sda: drive cache: write through
    Jun 25 12:56:10 linux1 kernel:   Vendor: Openfile  Model: Virtual disk      Rev: 0
    Jun 25 12:56:10 linux1 kernel:   Type:   Direct-Access                      ANSI SCSI revision: 04
    Jun 25 12:56:10 linux1 kernel: scsi3 : SFNet iSCSI driver
    Jun 25 12:56:10 linux1 kernel:   Vendor: Openfile  Model: Virtual disk      Rev: 0
    Jun 25 12:56:10 linux1 kernel:   Type:   Direct-Access                      ANSI SCSI revision: 04
    Jun 25 12:56:10 linux1 kernel: SCSI device sda: 243138560 512-byte hdwr sectors (124487 MB)
    Jun 25 12:56:10 linux1 kernel:   Vendor: Openfile  Model: Virtual disk      Rev: 0
    Jun 25 12:56:10 linux1 kernel:   Type:   Direct-Access                      ANSI SCSI revision: 04
    Jun 25 12:56:10 linux1 kernel: SCSI device sda: drive cache: write through
    Jun 25 12:56:10 linux1 kernel:  sda: unknown partition table
    Jun 25 12:56:10 linux1 kernel: Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
    Jun 25 12:56:10 linux1 kernel: SCSI device sdb: 4194304 512-byte hdwr sectors (2147 MB)
    Jun 25 12:56:10 linux1 kernel: SCSI device sdb: drive cache: write through
    Jun 25 12:56:10 linux1 kernel: SCSI device sdb: 4194304 512-byte hdwr sectors (2147 MB)
    Jun 25 12:56:10 linux1 kernel: SCSI device sdb: drive cache: write through
    Jun 25 12:56:10 linux1 kernel:  sdb: unknown partition table
    Jun 25 12:56:10 linux1 kernel: Attached scsi disk sdb at scsi4, channel 0, id 0, lun 0
    Jun 25 12:56:10 linux1 kernel: SCSI device sdc: 243138560 512-byte hdwr sectors (124487 MB)
    Jun 25 12:56:10 linux1 kernel: SCSI device sdc: drive cache: write through
    Jun 25 12:56:10 linux1 kernel: SCSI device sdc: 243138560 512-byte hdwr sectors (124487 MB)
    Jun 25 12:56:10 linux1 kernel: SCSI device sdc: drive cache: write through
    Jun 25 12:56:11 linux1 scsi.agent[10455]: disk at /devices/platform/host0/target0:0:0/0:0:0:0
    Jun 25 12:56:11 linux1 scsi.agent[10476]: disk at /devices/platform/host4/target4:0:0/4:0:0:0
    Jun 25 12:56:11 linux1 kernel:  sdc: unknown partition table
    Jun 25 12:56:11 linux1 kernel: Attached scsi disk sdc at scsi2, channel 0, id 0, lun 0
    Jun 25 12:56:11 linux1 kernel: SCSI device sdd: 243138560 512-byte hdwr sectors (124487 MB)
    Jun 25 12:56:11 linux1 kernel: SCSI device sdd: drive cache: write through
    Jun 25 12:56:11 linux1 kernel: SCSI device sdd: 243138560 512-byte hdwr sectors (124487 MB)
    Jun 25 12:56:11 linux1 kernel: SCSI device sdd: drive cache: write through
    Jun 25 12:56:11 linux1 hald[3898]: Timed out waiting for hotplug event 626. Rebasing to 629
    Jun 25 12:56:11 linux1 kernel:  sdd: unknown partition table
    Jun 25 12:56:11 linux1 scsi.agent[10499]: disk at /devices/platform/host2/target2:0:0/2:0:0:0
    Jun 25 12:56:11 linux1 kernel: Attached scsi disk sdd at scsi3, channel 0, id 0, lun 0
    Jun 25 12:56:11 linux1 kernel: SCSI device sde: 243138560 512-byte hdwr sectors (124487 MB)
    Jun 25 12:56:11 linux1 kernel: SCSI device sde: drive cache: write through
    Jun 25 12:56:11 linux1 kernel: SCSI device sde: 243138560 512-byte hdwr sectors (124487 MB)
    Jun 25 12:56:11 linux1 kernel: SCSI device sde: drive cache: write through
    Jun 25 12:56:11 linux1 scsi.agent[10528]: disk at /devices/platform/host3/target3:0:0/3:0:0:0
    Jun 25 12:56:11 linux1 kernel:  sde: unknown partition table
    Jun 25 12:56:11 linux1 kernel: Attached scsi disk sde at scsi1, channel 0, id 0, lun 0
    Jun 25 12:56:11 linux1 scsi.agent[10559]: disk at /devices/platform/host1/target1:0:0/1:0:0:0
    ...
    

    The above entries show that the client (linux1) was able to establish the iSCSI sessions with the iSCSI storage server (openfiler1-priv at 192.168.2.195).

    We also see how the local SCSI device names map to iSCSI targets' host IDs and LUNs:

    Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
    Attached scsi disk sdb at scsi4, channel 0, id 0, lun 0
    Attached scsi disk sdc at scsi2, channel 0, id 0, lun 0
    Attached scsi disk sdd at scsi3, channel 0, id 0, lun 0
    Attached scsi disk sde at scsi1, channel 0, id 0, lun 0

    Another way to determine how local SCSI device names map to iSCSI targets' host IDs and LUNs is with the dmesg command:

    # dmesg | sort | grep '^Attached scsi disk'
    Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
    Attached scsi disk sdb at scsi4, channel 0, id 0, lun 0
    Attached scsi disk sdc at scsi2, channel 0, id 0, lun 0
    Attached scsi disk sdd at scsi3, channel 0, id 0, lun 0
    Attached scsi disk sde at scsi1, channel 0, id 0, lun 0
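
    The "Attached scsi disk" lines can be reduced to plain "host ID, device" pairs, which makes the next step easier to follow. The sketch below does this with awk on the five lines shown above; attached_to_pairs is a hypothetical helper name, and on a live node the input would come from dmesg instead of the embedded sample.

```shell
# Turn "Attached scsi disk sda at scsi0, channel 0, id 0, lun 0" into
# "0 /dev/sda" (host ID first, then device), sorted by host ID.
# Hypothetical helper for illustration.
attached_to_pairs() {
    awk '/^Attached scsi disk/ {
        host = $6
        gsub(/[^0-9]/, "", host)     # "scsi0," becomes "0"
        print host, "/dev/" $4
    }' | sort -n
}

# Sample input copied from the dmesg output above.
attached_to_pairs <<'EOF'
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Attached scsi disk sdb at scsi4, channel 0, id 0, lun 0
Attached scsi disk sdc at scsi2, channel 0, id 0, lun 0
Attached scsi disk sdd at scsi3, channel 0, id 0, lun 0
Attached scsi disk sde at scsi1, channel 0, id 0, lun 0
EOF
```

    For the sample this prints the pairs 0 /dev/sda, 1 /dev/sde, 2 /dev/sdc, 3 /dev/sdd and 4 /dev/sdb, one per line. On linux1 the equivalent live command would be dmesg | attached_to_pairs.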

    We now have to work out the mapping of iSCSI target names to local SCSI IDs (displayed as HOST ID below) by running the iscsi-ls command on the client (linux1):

    # iscsi-ls
    *******************************************************************************
    SFNet iSCSI Driver Version ...4:0.1.11-4(15-Jan-2007)
    *******************************************************************************
    TARGET NAME             : iqn.2006-01.com.openfiler:rac1.asm4
    TARGET ALIAS            :
    HOST ID                 : 0
    BUS ID                  : 0
    TARGET ID               : 0
    TARGET ADDRESS          : 192.168.2.195:3260,1
    SESSION STATUS          : ESTABLISHED AT Mon Jun 25 12:56:10 EDT 2007
    SESSION ID              : ISID 00023d000001 TSIH 200
    *******************************************************************************
    TARGET NAME             : iqn.2006-01.com.openfiler:rac1.asm3
    TARGET ALIAS            :
    HOST ID                 : 1
    BUS ID                  : 0
    TARGET ID               : 0
    TARGET ADDRESS          : 192.168.2.195:3260,1
    SESSION STATUS          : ESTABLISHED AT Mon Jun 25 12:56:10 EDT 2007
    SESSION ID              : ISID 00023d000001 TSIH 100
    *******************************************************************************
    TARGET NAME             : iqn.2006-01.com.openfiler:rac1.asm2
    TARGET ALIAS            :
    HOST ID                 : 2
    BUS ID                  : 0
    TARGET ID               : 0
    TARGET ADDRESS          : 192.168.2.195:3260,1
    SESSION STATUS          : ESTABLISHED AT Mon Jun 25 12:56:10 EDT 2007
    SESSION ID              : ISID 00023d000001 TSIH 300
    *******************************************************************************
    TARGET NAME             : iqn.2006-01.com.openfiler:rac1.asm1
    TARGET ALIAS            :
    HOST ID                 : 3
    BUS ID                  : 0
    TARGET ID               : 0
    TARGET ADDRESS          : 192.168.2.195:3260,1
    SESSION STATUS          : ESTABLISHED AT Mon Jun 25 12:56:10 EDT 2007
    SESSION ID              : ISID 00023d000001 TSIH 400
    *******************************************************************************
    TARGET NAME             : iqn.2006-01.com.openfiler:rac1.crs
    TARGET ALIAS            :
    HOST ID                 : 4
    BUS ID                  : 0
    TARGET ID               : 0
    TARGET ADDRESS          : 192.168.2.195:3260,1
    SESSION STATUS          : ESTABLISHED AT Mon Jun 25 12:56:10 EDT 2007
    SESSION ID              : ISID 00023d000001 TSIH 500
    *******************************************************************************
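
    Only two fields of each iscsi-ls record matter for the mapping: TARGET NAME and HOST ID. As a rough sketch, they can be extracted with awk as shown below. The helper name targets_to_pairs is hypothetical, and the embedded sample is an abbreviated copy of two of the records above.

```shell
# Reduce iscsi-ls output to "host-id target-name" lines, sorted by host ID.
# Hypothetical helper for illustration.
targets_to_pairs() {
    awk '
        $1 == "TARGET" && $2 == "NAME" { name = $4 }       # remember the target
        $1 == "HOST"   && $2 == "ID"   { print $4, name }  # emit it with its host ID
    ' | sort -n
}

# Abbreviated sample taken from the iscsi-ls output above.
targets_to_pairs <<'EOF'
TARGET NAME             : iqn.2006-01.com.openfiler:rac1.crs
TARGET ALIAS            :
HOST ID                 : 4
BUS ID                  : 0
TARGET NAME             : iqn.2006-01.com.openfiler:rac1.asm4
TARGET ALIAS            :
HOST ID                 : 0
BUS ID                  : 0
EOF
```

    For this sample the output lists host ID 0 with the asm4 target first, followed by host ID 4 with the crs target. On linux1 the live equivalent would be iscsi-ls | targets_to_pairs.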

    Combining the two mappings, local SCSI device name to host ID (from the "Attached scsi disk" messages) and iSCSI target name to host ID (from iscsi-ls), gives the full mapping from iSCSI target name to local SCSI device name for the host linux1:

    iSCSI Target Name to local SCSI Device Name
    iSCSI Target Name                    Host / SCSI ID  SCSI Device Name
    -----------------------------------  --------------  ----------------
    iqn.2006-01.com.openfiler:rac1.asm4  0               /dev/sda
    iqn.2006-01.com.openfiler:rac1.asm3  1               /dev/sde
    iqn.2006-01.com.openfiler:rac1.asm2  2               /dev/sdc
    iqn.2006-01.com.openfiler:rac1.asm1  3               /dev/sdd
    iqn.2006-01.com.openfiler:rac1.crs   4               /dev/sdb

    Note that the method I used above to create the mapping of iSCSI Target Names to local SCSI Device Names can become pretty cumbersome and is very prone to errors.

    A much more efficient process in generating this mapping comes from a script written by Martin Jones:

    iscsi-ls-map.sh
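    #!/bin/bash
    # ---------------------
    # FILE: iscsi-ls-map.sh
    # ---------------------
    
    # The script insists on being run as root.
    RUN_USERID=root
    export RUN_USERID
    
    # Pull the user name out of the id output, e.g. "uid=0(root) ..." gives "root".
    RUID=`id | awk -F\( '{print $2}'|awk -F\) '{print $1}'`
    if [[ ${RUID} != "$RUN_USERID" ]];then
        echo " "
        echo "You must be logged in as $RUN_USERID to run this script."
        echo "Exiting script."
        echo " "
        exit 1
    fi
    
    # Build "<device> <host ID>" pairs from the kernel messages, so that
    # "Attached scsi disk sda at scsi0, ..." becomes "/dev/sda 0".
    dmesg | grep "^Attach"  \
          | awk -F" " '{ print "/dev/"$4 " " $6 }'  \
          | sed -e 's/,//' | sed -e 's/scsi//'  \
          | sort -n -k2  \
          | sed -e '/disk1/d' > /tmp/tmp_scsi_dev
    
    # Build "<short target name> <host ID>" pairs from iscsi-ls, so that
    # "iqn.2006-01.com.openfiler:rac1.asm4" with HOST ID 0 becomes "asm4 0".
    iscsi-ls | egrep -e "TARGET NAME" -e "HOST ID"   \
             | awk -F" " '{ if ($0 ~ /^TARGET.*/) printf "%s", $4; if ( $0 ~ /^HOST/) printf " %s\n",$4}'  \
             | sort -n -k2  \
             | cut -d':' -f2-  \
             | cut -d'.' -f2- > /tmp/tmp_scsi_targets
    
    # Join the two lists on the host ID (field 2 in both files).
    join -t" " -1 2 -2 2 /tmp/tmp_scsi_dev /tmp/tmp_scsi_targets > MAP
    
    # Print the mapping as aligned columns, then remove the work file.
    echo "Host / SCSI ID    SCSI Device Name         iSCSI Target Name"
    echo "----------------  -----------------------  -----------------"
    
    cat MAP | sed -e 's/ /                 /g'
    
    rm -f MAP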

    Example run:

    # ./iscsi-ls-map.sh
    
    Host / SCSI ID    SCSI Device Name         iSCSI Target Name
    ----------------  -----------------------  -----------------
    0                 /dev/sda                 asm4
    1                 /dev/sde                 asm3
    2                 /dev/sdc                 asm2
    3                 /dev/sdd                 asm1
    4                 /dev/sdb                 crs

    Create Partitions on iSCSI Volumes

    The next step is to create a single primary partition on each of the iSCSI volumes that spans the entire size of the volume. As mentioned earlier in this article, I will be using Oracle's Cluster File System, Release 2 (OCFS2) to store the two files to be shared for Oracle's Clusterware software. We will then be using Automatic Storage Management (ASM) to create four ASM volumes; two for all physical database files (data/index files, online redo log files, and control files) and two for the Flash Recovery Area (RMAN backups and archived redo log files).

    The following table lists the five iSCSI volumes and what file systems they will support:

    Oracle Shared Drive Configuration
    File System Type  iSCSI Target (short) Name  Size    Mount Point        ASM Diskgroup Name    File Types
    ----------------  -------------------------  ------  -----------------  --------------------  ----------
    OCFS2             crs                        2 GB    /u02/oradata/orcl                        Oracle Cluster Registry (OCR) File - (~100 MB); Voting Disk - (~20 MB)
    ASM               asm1                       118 GB  ORCL:VOL1          +ORCL_DATA1           Oracle Database Files
    ASM               asm2                       118 GB  ORCL:VOL2          +ORCL_DATA1           Oracle Database Files
    ASM               asm3                       118 GB  ORCL:VOL3          +FLASH_RECOVERY_AREA  Oracle Flash Recovery Area
    ASM               asm4                       118 GB  ORCL:VOL4          +FLASH_RECOVERY_AREA  Oracle Flash Recovery Area
    Total                                        474 GB

    As shown in the table above, we will need to create a single Linux primary partition on each of the five iSCSI volumes. The fdisk command is used in Linux for creating (and removing) partitions. For each of the five iSCSI volumes, you can use the default values when creating the primary partition as the default action is to use the entire disk. You can safely ignore any warnings that may indicate the device does not contain a valid DOS partition (or Sun, SGI or OSF disklabel).

    For the purpose of this example, I will be running the fdisk command from linux1 to create a single primary partition for each of the local SCSI devices identified in the previous section:

    • /dev/sda
    • /dev/sdb
    • /dev/sdc
    • /dev/sdd
    • /dev/sde

    Please note that creating the partition on each of the iSCSI volumes must only be run from one of the nodes in the Oracle RAC cluster!

    # ---------------------------------------
    
    # fdisk /dev/sda
    Command (m for help): n
    Command action
       e   extended
       p   primary partition (1-4)
    p
    Partition number (1-4): 1
    First cylinder (1-15134, default 1): 1
    Last cylinder or +size or +sizeM or +sizeK (1-15134, default 15134): 15134
    
    Command (m for help): p
    
    Disk /dev/sda: 124.4 GB, 124486942720 bytes
    255 heads, 63 sectors/track, 15134 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes
    
       Device Boot      Start         End      Blocks   Id  System
    /dev/sda1               1       15134   121563823+  83  Linux
    
    Command (m for help): w
    The partition table has been altered!
    
    Calling ioctl() to re-read partition table.
    Syncing disks.
    
    # ---------------------------------------
    
    # fdisk /dev/sdb
    Command (m for help): n
    Command action
       e   extended
       p   primary partition (1-4)
    p
    Partition number (1-4): 1
    First cylinder (1-1009, default 1): 1
    Last cylinder or +size or +sizeM or +sizeK (1-1009, default 1009): 1009
    
    Command (m for help): p
    
    Disk /dev/sdb: 2147 MB, 2147483648 bytes
    67 heads, 62 sectors/track, 1009 cylinders
    Units = cylinders of 4154 * 512 = 2126848 bytes
    
       Device Boot      Start         End      Blocks   Id  System
    /dev/sdb1               1        1009     2095662   83  Linux
    
    Command (m for help): w
    The partition table has been altered!
    
    Calling ioctl() to re-read partition table.
    Syncing disks.
    
    # ---------------------------------------
    
    # fdisk /dev/sdc
    Command (m for help): n
    Command action
       e   extended
       p   primary partition (1-4)
    p
    Partition number (1-4): 1
    First cylinder (1-15134, default 1): 1
    Last cylinder or +size or +sizeM or +sizeK (1-15134, default 15134): 15134
    
    Command (m for help): p
    
    Disk /dev/sdc: 124.4 GB, 124486942720 bytes
    255 heads, 63 sectors/track, 15134 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes
    
       Device Boot      Start         End      Blocks   Id  System
    /dev/sdc1               1       15134   121563823+  83  Linux
    
    Command (m for help): w
    The partition table has been altered!
    
    Calling ioctl() to re-read partition table.
    Syncing disks.
    
    # ---------------------------------------
    
    # fdisk /dev/sdd
    Command (m for help): n
    Command action
       e   extended
       p   primary partition (1-4)
    p
    Partition number (1-4): 1
    First cylinder (1-15134, default 1): 1
    Last cylinder or +size or +sizeM or +sizeK (1-15134, default 15134): 15134
    
    Command (m for help): p
    
    Disk /dev/sdd: 124.4 GB, 124486942720 bytes
    255 heads, 63 sectors/track, 15134 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes
    
       Device Boot      Start         End      Blocks   Id  System
    /dev/sdd1               1       15134   121563823+  83  Linux
    
    Command (m for help): w
    The partition table has been altered!
    
    Calling ioctl() to re-read partition table.
    Syncing disks.
    
    # ---------------------------------------
    
    # fdisk /dev/sde
    Command (m for help): n
    Command action
       e   extended
       p   primary partition (1-4)
    p
    Partition number (1-4): 1
    First cylinder (1-15134, default 1): 1
    Last cylinder or +size or +sizeM or +sizeK (1-15134, default 15134): 15134
    
    Command (m for help): p
    
    Disk /dev/sde: 124.4 GB, 124486942720 bytes
    255 heads, 63 sectors/track, 15134 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes
    
       Device Boot      Start         End      Blocks   Id  System
    /dev/sde1               1       15134   121563823+  83  Linux
    
    Command (m for help): w
    The partition table has been altered!
    
    Calling ioctl() to re-read partition table.
    Syncing disks.
    
    # ---------------------------------------
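
    The five interactive sessions above all use the same answers, so they could also be fed to fdisk on standard input. The sketch below only prints what it would do unless the hypothetical DRY_RUN variable is set to "no". Both fdisk_answers and partition_device are made-up helper names, and the empty answers rely on the defaults (first and last cylinder) that fdisk offers at those prompts. Treat it as an illustration: it is destructive when enabled and must only ever be run from one node.

```shell
# Keystrokes matching the sessions above: new partition (n), primary (p),
# partition number 1, two empty lines to accept the default first and
# last cylinder, then write the table (w).
fdisk_answers() {
    printf 'n\np\n1\n\n\nw\n'
}

# Partition one device, or just report the intent when DRY_RUN is not "no".
# DRY_RUN is a hypothetical safety switch introduced for this sketch.
partition_device() {
    dev=$1
    if [ "${DRY_RUN:-yes}" = "no" ]; then
        fdisk_answers | fdisk "$dev"
    else
        echo "dry run: would partition $dev"
    fi
}

# Device names as discovered on linux1 in this article; they may differ
# on another system or after a reboot.
for dev in /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde; do
    partition_device "$dev"
done
```

    Run as shown, the loop prints one "dry run" line per device and changes nothing. Verify the device list against the current target mapping before running it with DRY_RUN=no.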

    After creating all required partitions, inform the kernel of the partition changes by running the following command as the "root" user account on both of the Oracle RAC nodes in the cluster. Note that the mapping between the iSCSI target names discovered from Openfiler and the local SCSI device names may differ between the two Oracle RAC nodes. This will not cause any problems since the volumes will be mounted by label as opposed to their local SCSI device name.

    # partprobe
    
    # fdisk -l
    
    Disk /dev/hda: 40.0 GB, 40000000000 bytes
    255 heads, 63 sectors/track, 4863 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes
    
       Device Boot      Start         End      Blocks   Id  System
    /dev/hda1   *           1          13      104391   83  Linux
    /dev/hda2              14        4863    38957625   8e  Linux LVM
    
    Disk /dev/sda: 124.4 GB, 124486942720 bytes
    255 heads, 63 sectors/track, 15134 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes
    
       Device Boot      Start         End      Blocks   Id  System
    /dev/sda1               1       15134   121563823+  83  Linux
    
    Disk /dev/sdb: 2147 MB, 2147483648 bytes
    67 heads, 62 sectors/track, 1009 cylinders
    Units = cylinders of 4154 * 512 = 2126848 bytes
    
       Device Boot      Start         End      Blocks   Id  System
    /dev/sdb1               1        1009     2095662   83  Linux
    
    Disk /dev/sdc: 124.4 GB, 124486942720 bytes
    255 heads, 63 sectors/track, 15134 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes
    
       Device Boot      Start         End      Blocks   Id  System
    /dev/sdc1               1       15134   121563823+  83  Linux
    
    Disk /dev/sdd: 124.4 GB, 124486942720 bytes
    255 heads, 63 sectors/track, 15134 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes
    
       Device Boot      Start         End      Blocks   Id  System
    /dev/sdd1               1       15134   121563823+  83  Linux
    
    Disk /dev/sde: 124.4 GB, 124486942720 bytes
    255 heads, 63 sectors/track, 15134 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes
    
       Device Boot      Start         End      Blocks   Id  System
    /dev/sde1               1       15134   121563823+  83  Linux
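
    The byte counts reported by fdisk can be tied back to the sizes entered in Openfiler, which gives a second way to tell the crs volume apart from the ASM volumes on either node. The sketch below assumes that the sizes entered in Openfiler are binary megabytes (1 MB = 1024 * 1024 bytes); mb_to_bytes is a hypothetical helper name.

```shell
# Convert a size in MB, as entered in Openfiler, to bytes,
# assuming 1 MB = 1024 * 1024 bytes.
mb_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

echo "crs volume : $(mb_to_bytes 2048) bytes"
echo "asm volume : $(mb_to_bytes 118720) bytes"
```

    The results, 2147483648 and 124486942720 bytes, match the sizes fdisk prints for /dev/sdb and for the other four devices respectively. Under that assumption, the single 2147 MB device on a node is the crs volume, whatever device name it was given.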

