See the Oracle RAC 11g Release 1 version of this guide here
1.
Introduction
One of the most
efficient ways to become familiar with Oracle Real Application Clusters
(RAC) 10g technology is to have access to an actual
Oracle RAC 10g cluster. There's no better way to
understand its benefits—including fault tolerance,
security, load balancing, and scalability—than to
experience them directly.
Unfortunately, for many
shops, the price of the hardware required for a typical production RAC
configuration makes this goal impossible. A small two-node cluster can
cost from US$10,000 to well over US$20,000. That cost would not even
include the heart of a production RAC environment—typically
a storage area network—which can start at US$10,000.
For those who want to become familiar
with Oracle RAC 10g without a major cash outlay,
this guide provides a low-cost alternative to configuring an Oracle RAC
10g Release 2 system using commercial off-the-shelf
components and downloadable software at an estimated cost of US$2,200
to US$2,600. The system
will consist of a dual node cluster (each with a single processor), both running Oracle's Enterprise Linux
(Release 4 Update 5),
Oracle10g Release 2, OCFS2, and ASMLib 2.0. All shared disk storage for Oracle RAC
will be based on iSCSI
using a Network Storage Server, namely Openfiler Release 2.2.
Although this article should work with Red Hat Enterprise Linux,
Oracle's Enterprise Linux (available for free) will provide the same if not better
stability and will already include the OCFS2 and ASMLib software packages
(with the exception of the ASMLib userspace libraries, which are a separate download).
Powered by rPath Linux,
Openfiler
is a free browser-based network storage management utility that delivers file-based
Network Attached Storage (NAS) and block-based Storage Area Networking (SAN) in a single framework.
Openfiler supports CIFS, NFS, HTTP/DAV, and FTP; however, we will only be making use of its
iSCSI capabilities to implement an inexpensive SAN for the shared storage components
required by Oracle RAC 10g. A 500GB external hard drive will be connected to the
network storage server (sometimes referred to in this article as the
Openfiler server) via its USB 2.0 interface. The Openfiler server will be configured to
use this disk for iSCSI based storage and will be used in our Oracle RAC 10g
configuration to store the shared files required by Oracle Clusterware as well as all
Oracle ASM volumes.
Note: This article is provided for educational purposes only, so the setup is kept simple to
demonstrate ideas and concepts. For example, the disk mirroring configured in this article will be set up on one
physical disk only, while in practice that should be performed on multiple physical drives. Also note that
while this article provides detailed instructions for successfully installing a complete
Oracle RAC 10g system, it is by no means a substitute for the official
Oracle documentation. In addition to this article, users should also consult the following
Oracle documents to gain a full understanding of alternative configuration options, installation, and
administration with Oracle RAC 10g. Oracle's official documentation site is
docs.oracle.com.
This
is not the only way to build a low-cost
Oracle RAC 10g system.
I have worked on other solutions that utilize an implementation based on SCSI
for the shared storage component. In some cases, SCSI will cost more than
the implementation described in this article where an inexpensive SCSI configuration
will consist of:
- SCSI Controller: Two SCSI controllers priced from $20 (Adaptec AHA-2940UW) to $220 (Adaptec 39320A-R) each
- SCSI Enclosure: $70 - (Inclose 1 Bay 3.5" U320 SCSI Case)
- SCSI Hard Drive: $140 - (36GB 15K 68p U320 SCSI Hard Drive)
- SCSI Cables: Two SCSI cables priced at $20 each - (3ft External HD68 to HD68 U320 Cable)
Keep in mind that some motherboards may already include
built-in SCSI controllers.
The previous Oracle9i
and Oracle 10g Release 1 guides used raw
partitions for storing files on shared storage, but here we will make
use of the Oracle Cluster File System Release 2 (OCFS2) and Oracle
Automatic Storage Management (ASM) feature. The two Oracle RAC nodes will
be configured as follows:
Oracle Database Files

| RAC Node Name | Instance Name | Database Name | $ORACLE_BASE | File System / Volume Manager for DB Files |
| --- | --- | --- | --- | --- |
| linux1 | orcl1 | orcl | /u01/app/oracle | ASM |
| linux2 | orcl2 | orcl | /u01/app/oracle | ASM |

Oracle Clusterware Shared Files

| File Type | File Name | iSCSI Volume Name | Mount Point | File System |
| --- | --- | --- | --- | --- |
| Oracle Cluster Registry | /u02/oradata/orcl/OCRFile | crs | /u02/oradata/orcl | OCFS2 |
| CRS Voting Disk | /u02/oradata/orcl/CSSFile | crs | /u02/oradata/orcl | OCFS2 |
Note that with Oracle Database 10g Release 2 (10.2),
Cluster Ready Services, or CRS, is now
called Oracle Clusterware.
Starting with Oracle Database 10g Release 2 (10.2), Oracle Clusterware should be
installed in a separate Oracle Clusterware home directory that is not release specific
(unlike, for example, /u01/app/oracle/product/10.2.0/...)
and must never be a subdirectory of the ORACLE_BASE directory (/u01/app/oracle for example).
This is a change to the Optimal Flexible Architecture (OFA) rules. Note that the
Oracle Clusterware and Oracle Real Application Clusters installation documentation from Oracle
incorrectly states that the Oracle Clusterware home directory can be a subdirectory of the
ORACLE_BASE directory. For example, in Chapter 2, "Preinstallation", in the section "Oracle
Clusterware home directory", it lists the path /u01/app/oracle/product/crs as
a possible Oracle Clusterware home (or CRS home) path. The default ORACLE_BASE path
is /u01/app/oracle, so that path is a subdirectory of ORACLE_BASE and therefore not a valid
Oracle Clusterware home. This issue is tracked with Oracle documentation bug
5843155 (B14203-08 HAS CONFLICTING CRS_HOME LOCATIONS)
and is fixed in Oracle 11g.
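The directory rule above is easy to check mechanically. The following is a minimal sketch in Python, not part of the Oracle installer; the function name `is_valid_crs_home` is hypothetical, and the paths are the ones quoted in this article.

```python
from pathlib import PurePosixPath


def is_valid_crs_home(crs_home: str, oracle_base: str) -> bool:
    """Return True when crs_home is neither ORACLE_BASE itself nor below it.

    Hypothetical helper for illustration only; it compares path components
    and does not touch the file system.
    """
    crs = PurePosixPath(crs_home)
    base = PurePosixPath(oracle_base)
    return crs != base and base not in crs.parents


if __name__ == "__main__":
    oracle_base = "/u01/app/oracle"
    for candidate in ("/u01/app/crs", "/u01/app/oracle/product/crs"):
        ok = is_valid_crs_home(candidate, oracle_base)
        verdict = "acceptable" if ok else "rejected (under ORACLE_BASE)"
        print(f"{candidate}: {verdict}")
```

Comparing path components rather than raw strings avoids treating a sibling such as /u01/app/oracle2 as if it were inside /u01/app/oracle.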
The Oracle Clusterware software will be installed to
/u01/app/crs on both of the nodes
that make up the RAC cluster. The Clusterware software, however, requires that two
of its files, the
"Oracle Cluster Registry (OCR)" file and the "Voting Disk" file,
be shared with both nodes in the cluster. These two files will
be installed on shared storage using Oracle's Cluster File System, Release 2 (OCFS2).
It is also possible to use RAW devices for these files; however, it is not possible
to use ASM for these two shared Clusterware files.
The Oracle10g Release 2 Database software will be installed into a separate
Oracle Home, namely /u01/app/oracle/product/10.2.0/db_1, on both of the nodes
that make up the RAC cluster.
All of the Oracle physical database files (data, online redo logs, control files, archived redo logs)
will be installed to shared volumes being
managed by Automatic Storage Management (ASM). (The Oracle database files can just as easily be
stored on OCFS2. Using ASM, however, makes the article that much more
interesting!)
Note: This article is only designed
to work as documented with absolutely no substitutions.
The only exception here is the choice of vendor hardware
(i.e. machines, networking equipment, and external hard drive).
Ensure that the hardware you purchase from the vendor is supported on Oracle Enterprise Linux (Release 4 Update 5).
If you are
looking for an example that takes advantage of Oracle RAC 11g
Release 1 with OEL using iSCSI, click here.
If you are
looking for an example that takes advantage of Oracle RAC 10g
Release 2 with RHEL 4 using FireWire, click here.
If you are
looking for an example that takes advantage of Oracle RAC 10g
Release 1 with RHEL 3, click here.
For the previously
published Oracle9i RAC version of this guide, click here.
2.
Oracle RAC 10g Overview
Oracle RAC, introduced with Oracle9i, is the successor to Oracle Parallel Server (OPS).
Oracle RAC allows multiple instances to access the same
database (storage) simultaneously. RAC provides fault tolerance, load balancing, and
performance benefits by allowing the system to scale out, and at the same
time since all nodes access the same database, the failure of one instance
will not cause the loss of access to the database.
At the heart of Oracle10g RAC is a shared disk subsystem. All nodes in the
cluster must be able to access all of the data, redo log files, control
files and parameter files for all nodes in the cluster. The data disks must
be globally available in order to allow all nodes to access the database. Each
node has its own redo log file(s) and UNDO tablespace, but the other nodes must be able to
access them (and the shared control file) in order to recover that node in the event of a system failure.
The biggest difference between Oracle RAC and OPS is the addition of Cache
Fusion. With OPS, a request for data from one node to another required the
data to be written to disk first; only then could the requesting node read that
data. With Cache Fusion, data is passed along a high-speed interconnect
using a sophisticated locking algorithm.
Not all clustering solutions use shared storage. Some vendors use an approach
known as a Federated Cluster, in which data is spread across several machines
rather than shared by all. With Oracle10g RAC, however, multiple nodes use the same
set of disks for storing data. With Oracle10g RAC, the data files, redo log files, control files,
and archived log files reside on shared storage on raw-disk devices, a NAS, ASM, or on a
clustered file system. Oracle's approach to clustering leverages the collective
processing power of all the nodes in the cluster and at the same time provides
failover security.
Pre-configured Oracle10g RAC solutions are available from vendors such as
Dell, IBM and HP for production environments. This article, however,
focuses on putting together your own Oracle10g RAC environment for development
and testing by using Linux servers and a low-cost shared disk solution: iSCSI.
For more background about Oracle RAC,
visit the Oracle RAC Product
Center on OTN.
3.
Shared-Storage Overview
Today, fibre channel is one of the most popular solutions for shared storage.
Fibre channel is a high-speed serial-transfer interface
that is used to connect systems and storage devices in either point-to-point (FC-P2P),
arbitrated loop (FC-AL), or switched topologies (FC-SW). Protocols supported by Fibre Channel include SCSI
and IP. Fibre channel configurations can support as many as 127 nodes
and have a throughput of up to 2.12 gigabits per second in each direction,
and 4.25 Gbps is expected.
Fibre channel, however, is
very expensive. Just the fibre channel switch alone can start at around US$1,000. This
does not even include the fibre channel storage array and high-end drives,
which can reach prices of about US$300 for a 36GB drive. A typical fibre channel setup
which includes fibre channel cards for the servers is roughly US$10,000,
which does not include the cost of the servers that make up the cluster.
A less expensive alternative to fibre channel is SCSI. SCSI technology provides
acceptable performance for shared storage, but for administrators and developers who
are used to GPL-based Linux prices, even SCSI can come in over budget, at around
US$2,000 to US$5,000 for a two-node cluster.
Another popular solution is the Sun NFS (Network File System) found on a NAS. It can be used for
shared storage but only if you are using a network appliance or something similar.
Specifically, you need servers that guarantee direct I/O over NFS, TCP as the transport protocol,
and read/write block sizes of 32K.
The shared storage that will be used for this article is based on
iSCSI technology using a network storage server installed with Openfiler.
This solution offers a low-cost alternative
to fibre channel for testing and educational purposes, but given the
low-end hardware being used, it should not be used in a production
environment.
4.
iSCSI Technology
For many years, the only technology that existed for building a
network based storage solution was a Fibre Channel Storage Area Network
(FC SAN).
Based on an earlier set of ANSI protocols called
Fiber Distributed Data Interface (FDDI), Fibre Channel was developed to
move SCSI commands over a storage network.
Several of the advantages to FC SAN include greater performance,
increased disk utilization, improved availability, better scalability,
and, most important to us, support for server clustering! Still today, however,
FC SANs suffer from three major disadvantages. The first is price.
While the costs involved in building a FC SAN have come down in recent
years, the cost of entry still remains prohibitive for small companies
with limited IT budgets.
The second is incompatible hardware components. Since its adoption, many
product manufacturers have interpreted the Fibre Channel specifications
differently from each other which has resulted in scores of interconnect
problems. When purchasing Fibre Channel components from a common manufacturer,
this is usually not a problem.
The third disadvantage is the fact that a Fibre Channel network is not
Ethernet! It requires a separate network technology along with a second
set of skills that the datacenter staff must have.
With the popularity of Gigabit Ethernet and the demand for lower cost,
Fibre Channel has recently been given a run for its money by iSCSI-based
storage systems. Today, iSCSI SANs remain the leading competitor to FC SANs.
Ratified on February 11, 2003 by the Internet Engineering Task Force (IETF),
the Internet Small Computer System Interface, better known as iSCSI,
is an Internet Protocol (IP)-based storage networking standard for
establishing and managing connections between IP-based storage devices,
hosts, and clients. iSCSI is a data transport protocol defined in the
SCSI-3 specifications framework
and is similar to Fibre Channel in that it is responsible for carrying
block-level data over a storage network.
Block-level communication means that data is transferred between the host
and the client in chunks called blocks. Database servers depend on this
type of communication (as opposed to the file level communication used
by most NAS systems) in order to work properly.
Like a FC SAN, an iSCSI SAN should
be a separate physical network devoted entirely to storage; however,
its components can be much the same as in a typical IP network (LAN).
While iSCSI has a promising future, many of its early critics were quick
to point out some of its inherent shortcomings with regards to performance.
The beauty of iSCSI is its ability to utilize an already familiar IP network
as its transport mechanism. The TCP/IP protocol, however, is very
complex and CPU intensive. With iSCSI, most of the processing of the data
(both TCP and iSCSI) is handled in software and is much slower than Fibre Channel
which is handled completely in hardware. The overhead incurred in mapping
every SCSI command onto an equivalent iSCSI transaction is excessive. For many
the solution is to do away with iSCSI software initiators and invest in
specialized cards that can offload TCP/IP and iSCSI processing from a server's
CPU. These specialized cards are sometimes referred to as an iSCSI Host Bus Adaptor (HBA)
or a TCP Offload Engine (TOE) card. Also consider that 10-Gigabit Ethernet is a
reality today!
As with any new technology, iSCSI comes with its own set of acronyms and terminology.
For the purpose of this article, it is only important to understand the difference
between an iSCSI initiator and an iSCSI target.
iSCSI Initiator
Basically, an iSCSI initiator is a
client device that connects and initiates requests to some service offered
by a server (in this case an iSCSI target). The iSCSI initiator software will
need to exist on each of the Oracle RAC nodes (linux1 and linux2).
An iSCSI initiator can be implemented using either software or hardware.
Software iSCSI initiators are available for most major operating system platforms.
For this article, we will be using the free Linux iscsi-sfnet software driver found
in the iscsi-initiator-utils RPM developed as part of the Linux-iSCSI Project.
The iSCSI software initiator is generally used with a standard
network interface card (NIC), a Gigabit Ethernet card in most cases.
A hardware initiator is an iSCSI HBA (or a TCP Offload Engine (TOE) card), which is
basically just a specialized Ethernet card with a SCSI ASIC on-board to offload
all the work (TCP and SCSI commands) from the system CPU. iSCSI HBAs
are available from a number of vendors, including Adaptec, Alacritech, Intel, and QLogic.
iSCSI Target
An iSCSI target is the "server" component of an iSCSI network.
This is typically the storage device that contains the information
you want and answers requests from the initiator(s). For the purpose
of this article, the node openfiler1 will be the iSCSI target.
So with all of this talk about iSCSI, does this mean the death of Fibre Channel anytime soon? Probably not.
Fibre Channel has clearly demonstrated its capabilities over the years with its capacity for
extremely high speeds, flexibility, and robust reliability.
Customers who have strict requirements for high performance storage, large complex connectivity,
and mission critical reliability will undoubtedly continue to choose Fibre Channel.
Before closing out this section, I thought it would be appropriate to
present the following chart that shows speed comparisons of the various types of
disk interfaces and network technologies.
For each interface, I provide the maximum transfer rates in kilobits (Kb), kilobytes (KB),
megabits (Mb), megabytes (MB), gigabits (Gb), and gigabytes (GB) per second.
| Disk Interface / Network / BUS | Kb | KB | Mb | MB | Gb | GB |
| --- | --- | --- | --- | --- | --- | --- |
| Serial | 115 | 14.375 | 0.115 | 0.014 | | |
| Parallel (standard) | 920 | 115 | 0.92 | 0.115 | | |
| 10Base-T Ethernet | | | 10 | 1.25 | | |
| IEEE 802.11b wireless Wi-Fi (2.4 GHz band) | | | 11 | 1.375 | | |
| USB 1.1 | | | 12 | 1.5 | | |
| Parallel (ECP/EPP) | | | 24 | 3 | | |
| SCSI-1 | | | 40 | 5 | | |
| IEEE 802.11g wireless WLAN (2.4 GHz band) | | | 54 | 6.75 | | |
| SCSI-2 (Fast SCSI / Fast Narrow SCSI) | | | 80 | 10 | | |
| 100Base-T Ethernet (Fast Ethernet) | | | 100 | 12.5 | | |
| ATA/100 (parallel) | | | 100 | 12.5 | | |
| IDE | | | 133.6 | 16.7 | | |
| Fast Wide SCSI (Wide SCSI) | | | 160 | 20 | | |
| Ultra SCSI (SCSI-3 / Fast-20 / Ultra Narrow) | | | 160 | 20 | | |
| Ultra IDE | | | 264 | 33 | | |
| Wide Ultra SCSI (Fast Wide 20) | | | 320 | 40 | | |
| Ultra2 SCSI | | | 320 | 40 | | |
| FireWire 400 - (IEEE1394a) | | | 400 | 50 | | |
| USB 2.0 | | | 480 | 60 | | |
| Wide Ultra2 SCSI | | | 640 | 80 | | |
| Ultra3 SCSI | | | 640 | 80 | | |
| FireWire 800 - (IEEE1394b) | | | 800 | 100 | | |
| Gigabit Ethernet | | | 1000 | 125 | 1 | |
| PCI - (33 MHz / 32-bit) | | | 1064 | 133 | 1.064 | |
| Serial ATA I - (SATA I) | | | 1200 | 150 | 1.2 | |
| Wide Ultra3 SCSI | | | 1280 | 160 | 1.28 | |
| Ultra160 SCSI | | | 1280 | 160 | 1.28 | |
| PCI - (33 MHz / 64-bit) | | | 2128 | 266 | 2.128 | |
| PCI - (66 MHz / 32-bit) | | | 2128 | 266 | 2.128 | |
| AGP 1x - (66 MHz / 32-bit) | | | 2128 | 266 | 2.128 | |
| Serial ATA II - (SATA II) | | | 2400 | 300 | 2.4 | |
| Ultra320 SCSI | | | 2560 | 320 | 2.56 | |
| FC-AL Fibre Channel | | | 3200 | 400 | 3.2 | |
| PCI-Express x1 - (bidirectional) | | | 4000 | 500 | 4 | |
| PCI - (66 MHz / 64-bit) | | | 4256 | 532 | 4.256 | |
| AGP 2x - (133 MHz / 32-bit) | | | 4264 | 533 | 4.264 | |
| Serial ATA III - (SATA III) | | | 4800 | 600 | 4.8 | |
| PCI-X - (100 MHz / 64-bit) | | | 6400 | 800 | 6.4 | |
| PCI-X - (133 MHz / 64-bit) | | | | 1064 | 8.512 | 1 |
| AGP 4x - (266 MHz / 32-bit) | | | | 1066 | 8.528 | 1 |
| 10G Ethernet - (IEEE 802.3ae) | | | | 1250 | 10 | 1.25 |
| PCI-Express x4 - (bidirectional) | | | | 2000 | 16 | 2 |
| AGP 8x - (533 MHz / 32-bit) | | | | 2133 | 17.064 | 2.1 |
| PCI-Express x8 - (bidirectional) | | | | 4000 | 32 | 4 |
| PCI-Express x16 - (bidirectional) | | | | 8000 | 64 | 8 |
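The columns in the chart are related by simple arithmetic: eight bits per byte, and decimal steps of 1,000 between the kilo, mega, and giga prefixes. The short Python sketch below shows that arithmetic for a few rows; the function names are hypothetical and the decimal prefixes are an assumption inferred from the figures in the chart.

```python
BITS_PER_BYTE = 8
DECIMAL_STEP = 1000  # assumed: the chart uses 1 Gb = 1000 Mb


def megabits_to_megabytes(megabits_per_s: float) -> float:
    """Convert a rate in megabits per second to megabytes per second."""
    return megabits_per_s / BITS_PER_BYTE


def megabytes_to_gigabits(megabytes_per_s: float) -> float:
    """Convert a rate in megabytes per second to gigabits per second."""
    return megabytes_per_s * BITS_PER_BYTE / DECIMAL_STEP


if __name__ == "__main__":
    # Rates taken from the Mb column of the chart.
    for name, mbit in (("USB 2.0", 480), ("Gigabit Ethernet", 1000)):
        print(f"{name}: {mbit} Mb/s = {megabits_to_megabytes(mbit)} MB/s")
    # Rate taken from the MB column of the chart.
    print(f"PCI-X (133 MHz / 64-bit): 1064 MB/s = {megabytes_to_gigabits(1064)} Gb/s")
```

These are theoretical maximums; real throughput over iSCSI on Gigabit Ethernet will be lower once protocol overhead is taken into account.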
5.
Hardware & Costs
The hardware used to build our example Oracle RAC 10g environment
consists of three Linux servers (two Oracle RAC nodes and one Network Storage Server)
and components that can be purchased at many local computer stores or over the Internet.
| Oracle RAC Node 1 - (linux1) |
Dimension 2400 Series
Intel(R) Pentium(R) 4 Processor at 2.80GHz
1GB DDR SDRAM (at 333MHz)
40GB 7200 RPM Internal Hard Drive
Integrated Intel 3D AGP Graphics
Integrated 10/100 Ethernet - (Broadcom BCM4401)
CDROM (48X Max Variable)
3.5" Floppy
No Keyboard, Monitor, or Mouse - (Connected to KVM Switch)
|
US$620 |
| 1 - Ethernet LAN Card
Used for RAC interconnect to linux2 and
Openfiler networked storage.
Each Linux server for Oracle RAC should contain two NIC adapters.
The Dell Dimension includes an integrated 10/100 Ethernet adapter
that will be used to connect to the public network. The second NIC adapter
will be used for the private network (RAC interconnect and
Openfiler networked storage). Select the appropriate NIC adapter
that is compatible with the maximum data transmission speed of the
network switch to be used for the private network.
For the purpose of this article, I used a Gigabit Ethernet switch (and 1Gb Ethernet cards)
for the private network.
Gigabit Ethernet
10/100 Ethernet
|
US$35 |
| Oracle RAC Node 2 - (linux2) |
Dimension 2400 Series
Intel(R) Pentium(R) 4 Processor at 2.80GHz
1GB DDR SDRAM (at 333MHz)
40GB 7200 RPM Internal Hard Drive
Integrated Intel 3D AGP Graphics
Integrated 10/100 Ethernet - (Broadcom BCM4401)
CDROM (48X Max Variable)
3.5" Floppy
No Keyboard, Monitor, or Mouse - (Connected to KVM Switch)
|
US$620 |
| 1 - Ethernet LAN Card
Used for RAC interconnect to linux1 and
Openfiler networked storage.
Each Linux server for Oracle RAC should contain two NIC adapters.
The Dell Dimension includes an integrated 10/100 Ethernet adapter
that
will be used to connect to the public network. The second NIC adapter
will be used for the private network (RAC interconnect and
Openfiler networked storage). Select the appropriate NIC adapter
that is compatible with the maximum data transmission speed of the
network switch to be used for the private network.
For the purpose of this article, I used a Gigabit Ethernet switch (and 1Gb Ethernet cards)
for the private network.
Gigabit Ethernet
10/100 Ethernet
|
US$35 |
| Network Storage Server - (openfiler1) |
Clone / Pentium 4
Intel(R) Pentium(R) 4 CPU 1.80GHz
1GB DDR SDRAM (at 333MHz)
40GB 7200 RPM Internal Hard Drive
NVIDIA GeForce FX 5200 / AGP Graphics
Integrated 10/100 Ethernet - (Realtek Semiconductor, RTL-8139/8139C/8139C+ Series)
4 x Integrated USB 2.0 Ports
CDROM (48X Max Variable)
3.5" Floppy
No Keyboard, Monitor, or Mouse - (Connected to KVM Switch)
|
US$500 |
| 1 - Ethernet LAN Card
Used for networked storage on the private network.
The Network Storage Server (Openfiler server) should contain two NIC adapters.
The Clone / Pentium 4 machine included an integrated 10/100 Ethernet adapter that
will be used to connect to the public network. The second NIC adapter
will be used for the private network (Openfiler networked storage).
Select the appropriate NIC adapter that is compatible with the maximum data
transmission speed of the network switch to be used for the private network.
For the purpose of this article, I used a Gigabit Ethernet switch (and 1Gb Ethernet cards)
for the private network.
Gigabit Ethernet
10/100 Ethernet
|
US$35 |
| Miscellaneous Components |
|
Storage Device(s) - External Hard Drive
For the database storage I used a single external LaCie d2 Hard Drive Extreme with Triple Interface (500GB) drive which was
connected to the Openfiler server via its USB 2.0 interface. The Openfiler server
will be configured to use this disk for iSCSI based storage and will be used in our
Oracle RAC 10g configuration to store the shared files required by Oracle Clusterware
as well as all Oracle ASM volumes.
Note: Since the writing of this article, LaCie has
discontinued the 500GB version of this external hard drive and only the 250GB and 320GB capacities exist.
Please be aware that any
type of hard disk (internal or external) should work for database storage as long as it can be recognized
by the network storage server (Openfiler) and has adequate space.
|
US$260 |
| 1 - Ethernet Switch
Used for the interconnect between linux1-priv and linux2-priv.
This switch will also be used for network storage traffic for
Openfiler. For the purpose of this article, I used
a Gigabit Ethernet switch (and 1Gb Ethernet cards) for the private
network.
Gigabit Ethernet
10/100 Ethernet
|
US$50 |
| 6 - Network Cables
|
US$30 (6 x US$5) |
| Optional Components |
|
KVM Switch
This article requires access to the console of all nodes (servers) in order to install the
operating system and perform several of the configuration tasks.
When managing a very small number of servers, it might make sense to connect each server
with its own monitor, keyboard, and mouse in order to access its console. However, as
the number of servers to manage increases, this solution becomes unfeasible. A more
practical solution would be to configure a dedicated computer which would include a single
monitor, keyboard, and mouse that would have direct access to the console of each server.
This solution is made possible using a Keyboard, Video, Mouse Switch better known as
a KVM Switch. A KVM switch is a hardware device that allows a user to control multiple
computers from a single keyboard, video monitor and mouse. Avocent provides a high quality
and economical 4-port switch which includes four 6' cables:
For a detailed explanation and guide on the use of KVM switches, please see the article
"KVM Switches For the Home and the Enterprise".
|
US$340 |
| Total |
US$2,525 |
We are about to start the installation process.
Now that we have talked about the hardware that will be used in this example,
let's take a conceptual look at what the environment will look like:
Figure 1 Architecture
As we start to go into the details of the
installation, it should be noted that most of the tasks within this
document will need to be performed on both Oracle RAC nodes (linux1 and linux2).
I will indicate at the beginning of each section whether or not the task(s)
should be performed on both Oracle RAC nodes or on the network storage
server (openfiler1).
6.
Install the Linux Operating System
Perform the following installation
on both Oracle RAC nodes in the cluster!
This section provides a
summary of the screens used to install the Linux operating system. This
guide is designed to work with Oracle's Enterprise Linux Release 4 Update 5.
For more detailed installation
instructions, it is possible to use the manuals
from Red Hat Linux. I would suggest, however, that the instructions I
have provided below be used for this configuration.
Before installing the Enterprise Linux operating system on
both Oracle RAC nodes, you should have the two NIC interfaces (cards) installed.
Download the following ISO images for
Enterprise Linux Release 4 Update 5:
Oracle E-Delivery Web site for Enterprise Linux
- V10378-01_1of4.zip (572 MB)
- V10378-01_2of4.zip (619 MB)
- V10378-01_3of4.zip (621 MB)
- V10378-01_4of4.zip (269 MB)
After downloading the Enterprise Linux software,
unzip each of the files. You will then have the following ISO images which
will need to be burned to CDs:
- Enterprise-R4-U5-i386-disc1.iso
- Enterprise-R4-U5-i386-disc2.iso
- Enterprise-R4-U5-i386-disc3.iso
- Enterprise-R4-U5-i386-disc4.iso
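If you prefer to script the unzip step, the following Python sketch extracts the ISO images from the downloaded archives. It is only an illustration: the function name `extract_iso_archives` and the directory names are hypothetical, and it assumes the four zip files listed above sit together in one download directory.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_iso_archives(download_dir: str, dest_dir: str,
                         pattern: str = "V10378-01_*of4.zip") -> List[str]:
    """Extract every .iso member from the matching zip archives.

    Returns the names of the extracted ISO files in archive order.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in sorted(Path(download_dir).glob(pattern)):
        with zipfile.ZipFile(archive) as zf:
            for member in zf.namelist():
                if member.lower().endswith(".iso"):
                    zf.extract(member, dest)
                    extracted.append(member)
    return extracted


if __name__ == "__main__":
    # Hypothetical directory names; adjust to where the files were saved.
    for iso in extract_iso_archives("downloads", "iso_images"):
        print("extracted", iso)
```

The extracted images still need to be burned to CD with whatever burning software you have available.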
If you are downloading the above ISO files to a
MS Windows machine, there are many options for burning these images
(ISO files) to a CD. You may already be familiar with and have the proper
software to burn images to CD. If you are not familiar with this process and
do not have the required software to burn images to CD, here are just two
(of many) software packages that can be used:
After downloading
and burning the Enterprise Linux images (ISO files) to CD, insert Enterprise Linux Disk #1
into the first server (linux1 in this example),
power it on, and answer the installation screen prompts as noted below.
After completing the Linux installation on the first node, perform the
same Linux installation on the second node while substituting the node
name linux2 for linux1 and
the different IP addresses where appropriate.
Boot Screen
The first screen is the Enterprise Linux boot screen. At the
boot: prompt, hit [Enter] to start the installation process.
Media Test
When asked to test the CD media, tab over to [Skip] and hit [Enter]. If
there were any errors, the media burning software would have warned us.
After several seconds, the installer should then detect the video card,
monitor, and mouse. The installer then goes into GUI mode.
Welcome to Enterprise Linux
At the welcome screen, click [Next] to continue.
Language / Keyboard Selection
The next two screens prompt you for the Language and Keyboard settings.
Make the appropriate selections for your configuration.
Detect Previous Installation
Note that if the installer detects a previous version of Enterprise Linux,
it will ask if you would like to "Install Enterprise Linux" or "Upgrade an
existing Installation". Always select to "Install Enterprise Linux".
Disk Partitioning Setup
Select [Automatically partition] and click [Next] to continue.
If there were a previous installation
of Linux on this machine, the next screen will ask if you want to
"remove" or "keep" old partitions. Select the option to [Remove all
partitions on this system]. Also, ensure that the correct hard drive
("hda" for my configuration) is
selected for the Linux installation. I also keep the checkbox [Review (and
modify if needed) the partitions created] selected. Click [Next] to
continue.
You will then be prompted with a
dialog window asking if you really want to remove all partitions. Click
[Yes] to acknowledge this warning.
Partitioning
The installer will then allow you to view (and modify if needed) the
disk partitions it automatically selected. In almost all cases, the
installer will choose 100MB for /boot, double the amount of RAM for
swap, and the rest going to the root (/) partition. I like to have a
minimum of 1GB for swap. For the purpose of this install, I will accept
all automatically preferred sizes. (Including 2GB for swap since I have
1GB of RAM installed.)
Starting with EL 4, the installer
will create the same disk configuration as just noted but will create
them using the Logical Volume Manager (LVM). For example, it will
partition the first hard drive (/dev/hda for my configuration) into two
partitions—one for the /boot partition (/dev/hda1) and the
remainder of the disk dedicated to an LVM volume group named VolGroup00 (/dev/hda2).
The LVM Volume Group (VolGroup00) is then partitioned into two LVM
partitions - one for the root filesystem (/) and another for swap. I
basically check that it created at least 1GB of swap. Since I have 1GB
of RAM installed, the installer created 2GB of swap. That said, I
just accept the default disk layout.
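The sizing rule described above (100MB for /boot, swap at twice the installed RAM, the remainder for the root file system) can be written out as a small calculation. This Python sketch is only an approximation of what the installer does; the function name is hypothetical and the installer's real layout may differ slightly because of rounding and LVM overhead.

```python
from typing import Dict


def default_layout_mb(disk_mb: int, ram_mb: int, boot_mb: int = 100) -> Dict[str, int]:
    """Estimate the automatic layout: /boot, swap = 2 x RAM, root = remainder.

    Hypothetical helper; all sizes are in megabytes.
    """
    swap_mb = 2 * ram_mb
    root_mb = disk_mb - boot_mb - swap_mb
    if root_mb <= 0:
        raise ValueError("disk is too small for /boot plus swap")
    return {"/boot": boot_mb, "swap": swap_mb, "/": root_mb}


if __name__ == "__main__":
    # A 40GB drive with 1GB of RAM, as in the machines used for this article.
    layout = default_layout_mb(disk_mb=40 * 1024, ram_mb=1024)
    for mount_point, size_mb in layout.items():
        print(f"{mount_point}: {size_mb} MB")
```

With 1GB of RAM this yields 2GB (2048MB) of swap, which matches what the installer created on my machines.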
Boot Loader Configuration
The installer will use the GRUB boot loader by default. To use the GRUB
boot loader, accept all default values and click [Next] to continue.
Network Configuration
I made sure to install both NIC interfaces (cards) in each of the Linux
machines before starting the operating system installation. This screen
should have successfully detected each of the network devices.
First, make sure that each of the
network devices is checked to [Active on boot]. The installer may
choose to not activate eth1.
Second, [Edit] both eth0 and eth1 as
follows. You may choose to use different IP addresses for both eth0 and
eth1 and that is OK. If possible, try to put eth1 (the interconnect) on
a different subnet than eth0 (the public network):
eth0:
- Uncheck the option [Configure using DHCP]
- Leave the [Activate on boot] checked
- IP Address: 192.168.1.100
- Netmask: 255.255.255.0
eth1:
- Uncheck the option [Configure using DHCP]
- Leave the [Activate on boot] checked
- IP Address: 192.168.2.100
- Netmask: 255.255.255.0
Continue by setting your hostname
manually. I used "linux1" for the first node and "linux2" for the
second one. Finish this dialog off by supplying your gateway and DNS
servers.
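If you choose your own addresses, a quick way to confirm that eth0 and eth1 really land on different subnets is to AND each address with its netmask and compare the results. The sketch below does this in plain shell arithmetic; network_of and same_subnet are hypothetical helper names, and the addresses are the ones used in this article.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative assumptions.

# Print the network address for a dotted-quad IP ($1) and netmask ($2).
# The body runs in a subshell so the IFS change does not leak out.
network_of() (
    IFS=.
    set -- $1 $2          # split both values into eight octets
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
)

# Succeed when two addresses ($1, $2) share a network under netmask $3.
same_subnet() {
    [ "$(network_of "$1" "$3")" = "$(network_of "$2" "$3")" ]
}

# Compare the public (eth0) and interconnect (eth1) addresses for linux1.
if same_subnet 192.168.1.100 192.168.2.100 255.255.255.0; then
    echo "WARNING: eth0 and eth1 share a subnet"
else
    echo "eth0 and eth1 are on different subnets"
fi
```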
Firewall
On this screen, make sure to select [No firewall].
Also, under the option to "Enable SELinux?", select [Disabled].
Click [Next] to continue.
You may be prompted with a warning dialog about not setting
the firewall. If this occurs, simply hit [Proceed] to continue.
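After the installation finishes and the node has rebooted, the SELinux choice made on this screen can be double-checked from the command line. This is a minimal sketch that assumes the setting is stored as a SELINUX= line in /etc/selinux/config (the usual location on Red Hat style systems); selinux_setting is a hypothetical helper name. Where it is installed, the getenforce command reports the running state as well.

```shell
#!/bin/sh
# Sketch only: the helper name is an illustrative assumption.

# Print the value of the SELINUX= line from a config file
# (SELINUXTYPE= lines are not matched).
selinux_setting() {
    sed -n 's/^SELINUX=\([a-z]*\).*/\1/p' "$1"
}

cfg=/etc/selinux/config
if [ -r "$cfg" ]; then
    echo "SELinux setting in $cfg: $(selinux_setting "$cfg")"
fi
```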
Additional Language Support/Time Zone
The next two screens allow you to select additional language support
and time zone information. Make the appropriate selection for your configuration.
Set Root Password
Select a root password and click [Next] to continue.
Package Installation Defaults
By default, Enterprise Linux installs most of the software required
for a typical server. There are several other packages, however, that
are required to successfully install the Oracle Database software.
For the purpose of this article, select the radio button [Customize software packages to be installed].
Package Group Selection
Scroll down to the bottom of this screen and select [Everything] under
the "Miscellaneous" section. Click [Next] to continue.
Please note that the installation of Oracle does not require
all Linux packages to be installed. My decision to install all packages
was for the sake of brevity. Please see Section 19
("Pre-Installation Tasks for Oracle10g Release 2") for
a more detailed look at the critical packages required for a successful
Oracle installation.
Also note that with some Oracle Enterprise Linux 4 distributions, you will not get the "Package Group Selection"
screen by default. There, you are asked to simply "Install default
software packages" or "Customize software packages to be installed".
Select the option to "Customize software packages to be installed"
and click [Next] to continue. This will then bring up the
"Package Group Selection" screen. Now, scroll down to the bottom of this screen and select
[Everything] under the "Miscellaneous" section. Click
[Next] to continue.
About to Install
This screen is basically a confirmation screen. Click [Continue] to start
the installation. During the installation process, you will be asked to
switch disks to Disk #2, Disk #3, and then Disk #4.
Congratulations
And that's it. You have successfully installed Enterprise Linux
on the first node (linux1). The installer will eject the CD from the
CD-ROM drive. Take out the CD and click [Reboot] to reboot the system.
When the system boots into Enterprise Linux for
the first time, it will prompt you with another Welcome screen. The
following wizard allows you to configure the date and time, add any
additional users, test the sound card, and to install any additional
CDs. The only screen I care about is the time and date. As for the others,
simply run through them as there is nothing additional that needs to be
installed (at this point anyways!). If everything was successful, you
should now be presented with the Enterprise Linux login screen.
Perform the same installation on the second node
After completing the Linux installation on the first node, repeat the
above steps for the second node (linux2). When configuring the machine
name and networking, be sure to configure the proper values. For my
installation, this is what I configured for linux2:
First, make sure that each of the
network devices is checked as [Active on boot]. The installer may
choose not to activate eth1.
Second, [Edit] both eth0 and eth1 as
follows:
eth0:
- Uncheck the option [Configure using DHCP]
- Leave the [Activate on boot] checked
- IP Address: 192.168.1.101
- Netmask: 255.255.255.0
eth1:
- Uncheck the option [Configure using DHCP]
- Leave the [Activate on boot] checked
- IP Address: 192.168.2.101
- Netmask: 255.255.255.0
Continue by setting your hostname
manually. I used "linux2" for the second node. Finish this dialog off
by supplying your gateway and DNS servers.
7.
Network Configuration
Perform the following network configuration
on both Oracle RAC nodes in the cluster!
Note:
Although we configured several of the network settings during the Linux
installation, it is important to not skip this
section as it contains critical steps that are required for the RAC
environment.
Introduction to Network Settings
During the Linux O/S install we already configured the IP address and
host name for both of the Oracle RAC nodes.
We now need to configure
the /etc/hosts file as well as adjust several of the
network settings for the interconnect.
Both of the Oracle RAC nodes should have one static IP address for the public network
and one static IP address for the private cluster interconnect. Do not use DHCP naming for the
public IP address or the interconnects; you need static IP addresses! The
private interconnect should only be used by Oracle to transfer
Cluster Manager and Cache Fusion related data along with data for the
network storage server (Openfiler). Although it is possible
to use the public network for the interconnect, this is not recommended as
it may degrade database performance (by reducing the amount of bandwidth
available for Cache Fusion and Cluster Manager traffic). For a production RAC implementation,
the interconnect should be at least gigabit and used only by Oracle,
with the network storage server (Openfiler) on a separate gigabit network.
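To confirm that neither interface is still set to DHCP, you can inspect the interface files that Red Hat style systems keep under /etc/sysconfig/network-scripts. The sketch below assumes the usual BOOTPROTO= key in ifcfg-eth0 and ifcfg-eth1 and treats any value other than dhcp or bootp as static; boot_protocol and uses_dhcp are hypothetical helper names.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative assumptions.

# Print the BOOTPROTO value from an ifcfg file, lower-cased and unquoted.
boot_protocol() {
    awk -F= '$1 == "BOOTPROTO" { print tolower($2) }' "$1" | tr -d "\"'"
}

# Succeed when the interface file asks for a dynamically assigned address.
uses_dhcp() {
    case "$(boot_protocol "$1")" in
        dhcp|bootp) return 0 ;;
        *)          return 1 ;;
    esac
}

# Check both interfaces used in this article, skipping files that do not exist.
for dev in eth0 eth1; do
    f=/etc/sysconfig/network-scripts/ifcfg-$dev
    [ -r "$f" ] || continue
    if uses_dhcp "$f"; then
        echo "$dev: DHCP (change this to a static address)"
    else
        echo "$dev: static"
    fi
done
```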
Configuring Public and Private Network
In our two node example, we need to configure the network on both Oracle RAC nodes
for access to the public network as well as their private interconnect.
The easiest way to configure network settings in Oracle Enterprise Linux is with the program
"Network Configuration". This application can be started from the command-line
as the "root" user account as follows:
# su -
# /usr/bin/system-config-network &
Using the Network Configuration
application, you need to configure both NIC devices as well as the /etc/hosts
file. Both of these tasks can be completed using the Network
Configuration GUI. Notice that the /etc/hosts
settings are the same for both nodes.
Our example
configuration will use the following settings:
Oracle RAC Node 1 - (linux1)
| Device | IP Address | Subnet | Gateway | Purpose |
| eth0 | 192.168.1.100 | 255.255.255.0 | 192.168.1.1 | Connects linux1 to the public network |
| eth1 | 192.168.2.100 | 255.255.255.0 | | Connects linux1 (interconnect) to linux2 (linux2-priv) |
/etc/hosts
127.0.0.1 localhost.localdomain localhost
# Public Network - (eth0)
192.168.1.100 linux1
192.168.1.101 linux2
# Private Interconnect - (eth1)
192.168.2.100 linux1-priv
192.168.2.101 linux2-priv
# Public Virtual IP (VIP) addresses - (eth0)
192.168.1.200 linux1-vip
192.168.1.201 linux2-vip
# Private Storage Network for Openfiler - (eth1)
192.168.1.195 openfiler1
192.168.2.195 openfiler1-priv
Oracle RAC Node 2 - (linux2)
| Device | IP Address | Subnet | Gateway | Purpose |
| eth0 | 192.168.1.101 | 255.255.255.0 | 192.168.1.1 | Connects linux2 to the public network |
| eth1 | 192.168.2.101 | 255.255.255.0 | | Connects linux2 (interconnect) to linux1 (linux1-priv) |
/etc/hosts
127.0.0.1 localhost.localdomain localhost
# Public Network - (eth0)
192.168.1.100 linux1
192.168.1.101 linux2
# Private Interconnect - (eth1)
192.168.2.100 linux1-priv
192.168.2.101 linux2-priv
# Public Virtual IP (VIP) addresses - (eth0)
192.168.1.200 linux1-vip
192.168.1.201 linux2-vip
# Private Storage Network for Openfiler - (eth1)
192.168.1.195 openfiler1
192.168.2.195 openfiler1-priv
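Because the /etc/hosts entries must be identical on both nodes, it can help to check each file against the expected name-to-address list rather than comparing by eye. The following is a minimal sketch using the names and addresses from the settings above; hosts_lookup and check_hosts are hypothetical helper names.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative assumptions.

# Print the IP address that a hosts-style file ($2) maps to a name ($1).
hosts_lookup() {
    awk -v name="$1" '
        { sub(/#.*/, "") }      # ignore comments
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
    ' "${2:-/etc/hosts}"
}

# Compare a hosts file against the entries expected by this article.
# Prints one line per mismatch and returns non-zero if any were found.
check_hosts() {
    file=$1
    rc=0
    while read -r name ip; do
        got=$(hosts_lookup "$name" "$file")
        if [ "$got" != "$ip" ]; then
            echo "MISMATCH $name: expected $ip, found ${got:-nothing}"
            rc=1
        fi
    done <<EOF
linux1 192.168.1.100
linux2 192.168.1.101
linux1-priv 192.168.2.100
linux2-priv 192.168.2.101
linux1-vip 192.168.1.200
linux2-vip 192.168.1.201
openfiler1 192.168.1.195
openfiler1-priv 192.168.2.195
EOF
    return $rc
}
```

On each node, running check_hosts /etc/hosts should print nothing when the file matches the list.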
Note that the virtual IP addresses
only need to be defined in the /etc/hosts file
(or your DNS) for both Oracle RAC nodes. The public virtual IP addresses will be
configured automatically by Oracle when you run the Oracle Universal
Installer, which starts Oracle's Virtual Internet Protocol
Configuration Assistant (VIPCA). All virtual IP addresses will be
activated when the srvctl start nodeapps -n
<node_name> command is run. This is the Host
Name/IP Address that will be configured in the client(s) tnsnames.ora
file (more details later).
In the screenshots below, only Oracle RAC Node 1
(linux1) is shown. Be sure to make all the proper network settings to
both Oracle RAC nodes.
Figure 2: Network Configuration Screen, Node 1 (linux1)
Figure 3: Ethernet Device Screen, eth0 (linux1)
Figure 4: Ethernet Device Screen, eth1 (linux1)
Figure 5: Network Configuration Screen, /etc/hosts (linux1)
Once the network is configured, you
can use the ifconfig command to verify everything
is working. The following example is from linux1:
# /sbin/ifconfig -a
eth0 Link encap:Ethernet HWaddr 00:14:6C:76:5C:71
inet addr:192.168.1.100 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::214:6cff:fe76:5c71/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1546 errors:0 dropped:0 overruns:0 frame:0
TX packets:1273 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1179157 (1.1 MiB) TX bytes:183011 (178.7 KiB)
Interrupt:169 Base address:0x2f00
eth1 Link encap:Ethernet HWaddr 00:0E:0C:64:D1:E5
inet addr:192.168.2.100 Bcast:192.168.2.255 Mask:255.255.255.0
inet6 addr: fe80::20e:cff:fe64:d1e5/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:11 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 b) TX bytes:782 (782.0 b)
Base address:0xddc0 Memory:fe9c0000-fe9e0000
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:4893 errors:0 dropped:0 overruns:0 frame:0
TX packets:4893 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:6521518 (6.2 MiB) TX bytes:6521518 (6.2 MiB)
sit0 Link encap:IPv6-in-IPv4
NOARP MTU:1480 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
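Rather than reading the full listing, you can pull out just the IPv4 address of each interface and compare it with the settings table. This is a minimal sketch that assumes the older ifconfig output format shown above, with an "inet addr:" field and interface names starting in the first column; inet_addr_of is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: the helper name is an illustrative assumption.

# Read ifconfig-style text on stdin and print the IPv4 address of device $1.
inet_addr_of() {
    awk -v dev="$1" '
        /^[^ \t]/ { current = $1 }            # unindented line starts a device
        current == dev && /inet addr:/ {
            sub(/.*inet addr:/, "")           # keep only what follows the label
            print $1
            exit
        }
    '
}

# Report both interfaces when ifconfig is installed in its usual location.
if [ -x /sbin/ifconfig ]; then
    for dev in eth0 eth1; do
        echo "$dev: $(/sbin/ifconfig -a | inet_addr_of "$dev")"
    done
fi
```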
About Virtual IP
Why is there a Virtual IP (VIP) in 10g?
Why does it just return a dead connection when its primary node fails?
It's all about availability of the
application. When a node fails, the VIP associated with it is supposed
to be automatically failed over to some other node. When this occurs,
two things happen.
- The new node re-arps the world
indicating a new MAC address for the address. For directly connected
clients, this usually causes them to see errors on their connections to
the old address.
- Subsequent packets sent to the VIP
go to the new node, which will send error RST packets back to the
clients. This results in the clients getting errors immediately.
This means that when the client
issues SQL to the node that is now down, or traverses the address list
while connecting, rather than waiting on a very long TCP/IP time-out
(~10 minutes), the client receives a TCP reset. In the case of SQL,
this is ORA-3113. In the case of connect, the
next address in tnsnames is used.
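The connect-time behavior can be pictured as walking an address list and taking the first entry that answers. The sketch below is only an analogy in shell and is not how Oracle Net is implemented: first_reachable and ping_once are hypothetical helpers, and ping stands in for a real connection attempt. The point it illustrates is that the walk stays fast only when a dead address fails quickly, which is what the failed-over VIP provides.

```shell
#!/bin/sh
# Sketch only: an analogy for address-list traversal, not Oracle Net itself.

# Try each address in order with the probe command named in $1;
# print the first address that the probe accepts.
first_reachable() {
    probe=$1
    shift
    for addr in "$@"; do
        if "$probe" "$addr" >/dev/null 2>&1; then
            echo "$addr"
            return 0
        fi
    done
    return 1
}

# Hypothetical probe: one ICMP echo request with a 2-second deadline.
ping_once() {
    ping -c 1 -w 2 "$1"
}
```

For example, first_reachable ping_once linux1-vip linux2-vip would print whichever VIP name answers first, trying them in the order given.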
Going one step fu |