| Page 1 Page 2 Page 3
Build Your Own
Oracle RAC Cluster on Oracle Enterprise Linux and
iSCSI (Continued)
The information in this guide is not validated by Oracle, is not supported by Oracle, and should only be used at your own risk; it is for educational purposes only.
21.
Install Oracle 10g Clusterware Software
Perform the following installation procedures from only one of the Oracle RAC nodes in the cluster (linux1)!
The Oracle Clusterware software will be installed to both of the Oracle RAC nodes in the
cluster by the Oracle Universal Installer.
You are now ready to install the "cluster" part of the
environment: the Oracle Clusterware. In the previous section, you
downloaded and extracted the install files for Oracle Clusterware to linux1
in the directory /home/oracle/orainstall/clusterware.
This is the only node from which you need to perform the install from.
During the installation of Oracle Clusterware, you will
be asked for the nodes involved and to configure in the RAC cluster.
Once the actual installation starts, it will copy the required software
to all nodes using the remote access we configured in the section
Section 15 ("Configure RAC Nodes for Remote Access using SSH").
So, what exactly is the Oracle Clusterware responsible
for? It contains all of the cluster and database configuration metadata
along with several system management features for RAC. It allows the
DBA to register and invite an Oracle instance (or instances) to the
cluster. During normal operation, Oracle Clusterware will send messages
(via a special ping operation) to all nodes configured in the cluster,
often called the "heartbeat." If the heartbeat fails for any of the
nodes, it checks with the Oracle Clusterware configuration files (on
the shared disk) to distinguish between a real node failure and a
network failure.
After installing Oracle Clusterware, the Oracle
Universal Installer (OUI) used to install the Oracle Database
software (next section) will automatically recognize these
nodes. Like the Oracle Clusterware install you will be performing in
this section, the Oracle Database software only
needs to be run from one node. The OUI will copy the software packages
to all nodes configured in the RAC cluster.
Oracle Clusterware Shared Files
The two shared files (actually file groups) used by
Oracle Clusterware will be stored on the Oracle Cluster File System,
Release 2 (OFCS2) we created earlier. The two shared Oracle Clusterware
file groups are:
- Oracle Cluster Registry (OCR)
- File 1 : /u02/oradata/racdb/OCRFile
- File 2 : /u02/oradata/racdb/OCRFile_mirror
- Size : (2 * 100MB) = 200M
- CRS Voting Disk
- File 1 : /u02/oradata/racdb/CSSFile
- File 2 : /u02/oradata/racdb/CSSFile_mirror1
- File 3 : /u02/oradata/racdb/CSSFile_mirror2
- Size : (3 * 20MB) = 60MB
Note: It is not possible
to use Automatic Storage Management (ASM) for the two shared Oracle
Clusterware files: Oracle Cluster Registry (OCR) or
the CRS Voting Disk files. The problem is that
these files need to be in place and accessible BEFORE any Oracle
instances can be started. For ASM to be available, the ASM instance
would need to be run first.
Also note that the two shared files could be stored on
the OCFS2, shared RAW devices, or another vendor's clustered file
system.
Verifying Terminal Shell Environment
Before starting the Oracle Universal Installer, you should first verify
you are logged onto the server you will be running the installer from
(i.e. linux1) then run the xhost
command as root from the console to allow X
Server connections. Next, login as the oracle
user account. If you are using a remote client to connect to the node
performing the installation (SSH or Telnet to linux1
from a workstation configured with an X Server), you will need to set
the DISPLAY variable to point to your local
workstation. Finally, verify remote access / user equivalence to all
nodes in the cluster:
Verify Server and Enable X Server Access
# hostname
linux1
# xhost +
access control disabled, clients can connect from any host
Login as the oracle
User Account and Set DISPLAY (if necessary)
# su - oracle
$ # IF YOU ARE USING A REMOTE CLIENT TO CONNECT TO THE
$ # NODE PERFORMING THE INSTALL
$ DISPLAY=<your local workstation>:0.0
$ export DISPLAY
Verify Remote Access / User Equivalence
Verify you are able to run the
Secure Shell
commands (ssh or scp)
on the Linux server you will be running the Oracle Universal Installer
from against all other Linux servers in the cluster without being
prompted for a password. When using the
secure shell method,
user equivalence
will need to be enabled for the terminal shell session before attempting
to run the OUI. To enable
user equivalence for the current terminal shell session, perform the
following steps remembering to enter the pass phrase for the RSA key
you generated when prompted:
$ exec /usr/bin/ssh-agent $SHELL
$ /usr/bin/ssh-add
Enter passphrase for /home/oracle/.ssh/id_rsa: xxxxx
Identity added: /home/oracle/.ssh/id_rsa (/home/oracle/.ssh/id_rsa)
$ ssh linux1 "date;hostname"
Fri Jul 31 22:28:43 EDT 2009
linux1
$ ssh linux2 "date;hostname"
Fri Jul 31 22:29:00 EDT 2009
linux2
Installing Cluster Ready Services
Perform the following tasks to install the Oracle Clusterware:
$ cd ~oracle
$ /home/oracle/orainstall/clusterware/runInstaller -ignoreSysPrereqs
| Screen Name |
Response |
| Welcome Screen |
Click Next |
| Specify Inventory directory and credentials |
Accept the default values:
Inventory directory:
/u01/app/oracle/oraInventory
Operating System group name:
oinstall |
| Specify Home Details |
Set the Name and Path
for the ORACLE_HOME (actually the $ORA_CRS_HOME that I will be using in
this article) as follows:
Name: OraCrs10g_home
Path: /u01/app/crs |
| Product-Specific Prerequisite Checks |
The installer will run through a series of
checks to determine if the node meets the minimum requirements for
installing and configuring the Oracle Clusterware software. If any of
the checks fail, you will need to manually verify the check that failed
by clicking on the checkbox. For my installation, all checks passed
with no problems.
Click Next to continue. |
| Specify Cluster Configuration |
Cluster Name: crs
| Public Node Name |
Private Node Name |
Virtual Node Name |
| linux1 |
linux1-priv |
linux1-vip |
| linux2 |
linux2-priv |
linux2-vip |
|
| Specify Network Interface Usage |
| Interface Name |
Subnet |
Interface Type |
| eth0 |
192.168.1.0 |
Public |
| eth1 |
192.168.2.0 |
Private |
|
| Specify OCR Location |
Starting with Oracle Database 10g
Release 2 (10.2) with RAC, Oracle Clusterware provides for the creation
of a mirrored OCR file, enhancing cluster reliability. For the purpose
of this example, I did choose to mirror the OCR file by keeping the
default option of "Normal Redundancy":
Specify OCR Location: /u02/oradata/racdb/OCRFile
Specify OCR Mirror Location: /u02/oradata/racdb/OCRFile_mirror |
| Specify Voting Disk Location |
Starting with Oracle Database 10g
Release 2 (10.2) with RAC, CSS has been modified to allow you to
configure CSS with multiple voting disks. In Release 1 (10.1), you
could configure only one voting disk. By enabling multiple voting disk
configuration, the redundant voting disks allow you to configure a RAC
database with multiple voting disks on independent shared physical
disks. This option facilitates the use of the iSCSI network protocol,
and other Network Attached Storage (NAS) storage solutions. Note that
to take advantage of the benefits of multiple voting disks, you must
configure at least three voting disks. For the purpose of this example,
I did choose to mirror the voting disk by keeping the default option of
"Normal Redundancy":
Voting Disk Location: /u02/oradata/racdb/CSSFile
Additional Voting Disk 1 Location: /u02/oradata/racdb/CSSFile_mirror1
Additional Voting Disk 2 Location: /u02/oradata/racdb/CSSFile_mirror2 |
| Summary |
Click Install to start the installation! |
| Execute Configuration Scripts |
orainstRoot.sh
After the installation has completed, you will be prompted to run the orainstRoot.sh
and root.sh script. Open a new console window on
both Oracle RAC nodes in the cluster, (starting with the node you are
performing the install from), as the "root" user account.
Navigate to the /u01/app/oracle/oraInventory directory and
run orainstRoot.sh on all nodes in the RAC cluster.
Note: After executing the orainstRoot.sh on
both nodes, verify the permissions of the file /etc/oraInst.loc are
644 (-rw-r--r--)
and owned by root. Problems can occur during the installation
of Oracle if the oracle user account does not have read permissions to this file -
(the location of the oraInventory directory cannot be determined). For example, during the
Oracle Clusterware post-installation process (while running the Oracle Clusterware
Verification Utility), the following error will occur:
"CRS is not installed on any of the nodes."
If the permissions to
/etc/oraInst.loc are not set correctly, it is possible you didn't run orainstRoot.sh
on both nodes before running root.sh. Also, the umask setting may be off - it should be
0022.
Run the following on both nodes in the RAC cluster to correct this problem:
# chmod 644 /etc/oraInst.loc
# ls -l /etc/oraInst.loc
-rw-r--r-- 1 root root 63 Aug 1 12:31 /etc/oraInst.loc
root.sh
Within the same new console window on both Oracle RAC nodes
in the cluster, (starting with the node you are performing the
install from), stay logged in as the "root" user account.
Navigate to the /u01/app/crs
directory and locate the root.sh file for each
node in the cluster - (starting with the node you are performing the
install from). Run the root.sh file ON
ALL NODES in the RAC cluster ONE AT A TIME.
If the Oracle Clusterware home directory is a sub-directory of the ORACLE_BASE directory (which should never be!),
you will receive several warnings regarding permission while running the root.sh script
on both nodes. These warnings can be safely ignored.
The root.sh may take awhile
to run.
Running root.sh on the Last Node will Fail!
After receiving any of the errors described in this section, please
leave the OUI up. Do not hit the OK button on the
"Execute Configuration Scripts" dialog until all of the
issues described in this section have been resolved.
Issue #1
At the end of root.sh on the last node, vipca will fail
to run with the following error:
Oracle CRS stack installed and running under init(1M)
Running vipca(silent) for configuring nodeapps
/u01/app/crs/jdk/jre//bin/java: error while loading
shared libraries: libpthread.so.0:
cannot open shared object file: No such file or directory
Note that srvctl will produce similar output
until the workaround described below is performed.
This error occurs because these releases of the Linux kernel
fix an old bug in the Linux threading that Oracle worked around
using LD_ASSUME_KERNEL settings in both vipca
and srvctl, this workaround is no longer valid on
OEL5 or RHEL5 or SLES10 hence the failures.
To workaround Issue #1 above, edit vipca (in the CRS bin directory
on all nodes) to undo the setting of LD_ASSUME_KERNEL.
After the IF statement around line 120, add an unset
command to ensure LD_ASSUME_KERNEL is not set as follows:
if [ "$arch" = "i686" -o "$arch" = "ia64" ]
then
LD_ASSUME_KERNEL=2.4.19
export LD_ASSUME_KERNEL
fi
unset LD_ASSUME_KERNEL <<== Line to be added
|
Similarly for srvctl (in both the CRS and,
when installed, RDBMS and ASM bin directories on all nodes),
unset LD_ASSUME_KERNEL by adding one line, around
line 168 should look like this:
LD_ASSUME_KERNEL=2.4.19
export LD_ASSUME_KERNEL
unset LD_ASSUME_KERNEL <<== Line to be added
|
Note: Remember to re-edit these files on all nodes:
<ORA_CRS_HOME>/bin/vipca
<ORA_CRS_HOME>/bin/srvctl
<RDBMS_HOME>/bin/srvctl
<ASM_HOME>/bin/srvctl # (If exists)
after applying the 10.2.0.2 or 10.2.0.3 patch sets, as these patchset will
still include those settings unnecessary for OEL5 or RHEL5 or SLES10.
This issue was raised with development and is fixed in the
10.2.0.4 patchsets.
Also note that we are explicitly unsetting LD_ASSUME_KERNEL
and not merely commenting out its setting to handle a case where
the user has it set in their environment (login shell).
Issue #2
After working around Issue #1 above, vipca will now fail
to run with the following error if the VIP IP's are in a non-routable
range [10.x.x.x, 172.(16-31).x.x, or 192.168.x.x]:
[root@linux2 ~]# $ORA_CRS_HOME/bin/vipca
Error 0(Native: listNetInterfaces:[3])
[Error 0(Native: listNetInterfaces:[3])]
There are several ways to workaround Issue #2. The goal to this workaround
is to get the output of "$ORA_CRS_HOME/bin/oifcfg getif" to include both
public and cluster_interconnect interfaces. If you try to run the above
command, you will notice it provides nothing which means we have some work
to do!
The first step is to identify the current interfaces and IP addresses:
[root@linux2 ~]# $ORA_CRS_HOME/bin/oifcfg iflist
eth1 192.168.2.0
eth0 192.168.1.0
Remember during the Oracle Clusterware install that
192.168.1.0 is my public interface while
192.168.2.0 is the cluster_interconnect interface.
Using this information, we can manually set the public / private
interfaces accordingly using the setif option of the
$ORA_CRS_HOME/bin/oifcfg command:
# $ORA_CRS_HOME/bin/oifcfg setif -global eth0/192.168.1.0:public
# $ORA_CRS_HOME/bin/oifcfg setif -global eth1/192.168.2.0:cluster_interconnect
Let's know run the "$ORA_CRS_HOME/bin/oifcfg getif" command again
to verify its output:
[root@linux2 ~]# $ORA_CRS_HOME/bin/oifcfg getif
eth0 192.168.1.0 global public
eth1 192.168.2.0 global cluster_interconnect
Manually Run vipca from Last Node
After resolving all of the issues above, manually
re-run vipca (GUI) as root from the last node in which the
errors occurred. Please keep in mind that vipca is a GUI
and will need to set your DISPLAY variable accordingly
to your X server:
# $ORA_CRS_HOME/bin/vipca
When the "VIP Configuration Assistant" appears,
this is how I answered the screen prompts:
Welcome:
Click Next
Network interfaces: Select only the public interface - eth0
Virtual IPs for cluster nodes:
Node Name: linux1
IP Alias Name: linux1-vip
IP Address: 192.168.1.200
Subnet Mask: 255.255.255.0
Node Name: linux2
IP Alias Name: linux2-vip
IP Address: 192.168.1.201
Subnet Mask: 255.255.255.0
Summary:
Click Finish
Configuration Assistant Progress Dialog:
Click OK after configuration is complete.
Configuration Results: Click Exit
Go back to the OUI and acknowledge the "Execute
Configuration scripts" dialog window. |
| End of installation |
At the end of the installation, exit from the OUI. |
Verify Oracle Clusterware Installation
After the installation of Oracle Clusterware, we can run through several tests to
verify the install was successful. Run the following commands
on both nodes in the RAC cluster.
Check cluster nodes
$ /u01/app/crs/bin/olsnodes -n
linux1 1
linux2 2
Confirm Oracle Clusterware Function
$ $ORA_CRS_HOME/bin/crs_stat -t -v
Name Type R/RA F/FT Target State Host
----------------------------------------------------------------------
ora.linux1.gsd application 0/5 0/0 ONLINE ONLINE linux1
ora.linux1.ons application 0/3 0/0 ONLINE ONLINE linux1
ora.linux1.vip application 0/0 0/0 ONLINE ONLINE linux1
ora.linux2.gsd application 0/5 0/0 ONLINE ONLINE linux2
ora.linux2.ons application 0/3 0/0 ONLINE ONLINE linux2
ora.linux2.vip application 0/0 0/0 ONLINE ONLINE linux2
Check CRS Status
$ $ORA_CRS_HOME/bin/crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy
Check Oracle Clusterware Auto-Start Scripts
$ ls -l /etc/init.d/init.*
-r-xr-xr-x 1 root root 1951 Aug 1 12:49 /etc/init.d/init.crs
-r-xr-xr-x 1 root root 4699 Aug 1 12:49 /etc/init.d/init.crsd
-r-xr-xr-x 1 root root 35379 Aug 1 12:49 /etc/init.d/init.cssd
-r-xr-xr-x 1 root root 3175 Aug 1 12:49 /etc/init.d/init.evmd
22.
Install Oracle Database 10g Software
Perform the following installation procedures from only one of the Oracle RAC nodes in the cluster (linux1)!
The Oracle Database software will be installed to both of Oracle RAC nodes in the
cluster by the Oracle Universal Installer.
After successfully installing the Oracle Clusterware
software, the next step is to install Oracle Database 10g
Release 2 (10.2.0.1.0) with RAC.
For the purpose of this example, you will forgo the
"Create Database" option when installing the software. You will,
instead, create the database using the Database Configuration Assistant
(DBCA) after the install.
Like the Oracle Clusterware install (previous section),
the Oracle 10g database software only needs to be run
from one node. The OUI will copy the software packages to all nodes
configured in the RAC cluster.
Verifying Terminal Shell Environment
As discussed in the previous section, (Install Oracle Clusterware 10g Software),
the terminal shell environment needs to
be configured for remote access and user equivalence to all nodes in the cluster
before running the Oracle Universal Installer. Note
that you can utilize the same terminal shell session used in the previous section
which in this case, you do not
have to perform any of the actions described below with regards to setting up remote
access and the DISPLAY variable:
Login as the oracle
User Account and Set DISPLAY (if necessary)
# su - oracle
$ # IF YOU ARE USING A REMOTE CLIENT TO CONNECT TO THE
$ # NODE PERFORMING THE INSTALL
$ DISPLAY=<your local workstation>:0.0
$ export DISPLAY
Verify Remote Access / User Equivalence
Verify you are able to run the
Secure Shell
commands (ssh or scp)
on the Linux server you will be running the Oracle Universal Installer
from against all other Linux servers in the cluster without being
prompted for a password. When using the
secure shell method,
user equivalence
will need to be enabled for the terminal shell session before attempting
to run the OUI. To enable
user equivalence for the current terminal shell session, perform the
following steps remembering to enter the pass phrase for the RSA key
you generated when prompted:
$ exec /usr/bin/ssh-agent $SHELL
$ /usr/bin/ssh-add
Enter passphrase for /home/oracle/.ssh/id_rsa: xxxxx
Identity added: /home/oracle/.ssh/id_rsa (/home/oracle/.ssh/id_rsa)
$ ssh linux1 "date;hostname"
Sat Aug 1 14:14:42 EDT 2009
linux1
$ ssh linux2 "date;hostname"
Sat Aug 1 14:15:16 EDT 2009
linux2
Run the Oracle Cluster Verification Utility
Before installing the Oracle Database Software, we should run
the following database pre-installation check using the
Cluster Verification Utility (CVU).
Note: Instructions for configuring CVU can
be found in the section
"Prerequisites for Using Cluster Verification Utility
discussed earlier in this article.
$ cd /home/oracle/orainstall/clusterware/cluvfy
$ CV_HOME=/home/oracle/orainstall/clusterware/cluvfy; export CV_HOME
$ CV_JDKHOME=/home/oracle/orainstall/clusterware/cluvfy/jdk14; export CV_JDKHOME
$ ./runcluvfy.sh stage -pre dbinst -n linux1,linux2 -r 10gR2 -verbose
Review the CVU report. Note that this report will contain
the same errors we received when checking pre-installation tasks
for CRS failure to find a suitable set of interfaces for VIPs
and the failure to find specific RPM packages that are not required with Oracle Enterprise Linux 5.
These two errors can be safely ignored.
Install Oracle Database 10g Release 2 Software
Install the Oracle Database 10g
Release 2 software with the following:
$ cd ~oracle
$ /home/oracle/orainstall/database/runInstaller -ignoreSysPrereqs
| Screen Name |
Response |
| Welcome Screen |
Click Next |
| Select Installation Type |
Select the type of installation you would like to perform:
"Enterprise Edition", "Standard Edition", or "Custom".
If you need advanced components like Oracle Label Security
or if you simply want to customize the environment,
select "Custom".
For the purpose of this article, I selected the "Custom" option. |
| Specify Home Details |
Set the Name and Path
for the ORACLE_HOME as follows:
Name: OraDb10g_home1
Path: /u01/app/oracle/product/10.2.0/db_1 |
| Specify Hardware Cluster Installation Mode |
Select the Cluster Installation option then
select all nodes available. Click "Select All" to select all servers:
linux1 and linux2.
If the installation stops here and the status of
any of the RAC nodes is "Node not reachable", perform the following
checks:
- Ensure Oracle Clusterware is running on the node in question.
- Ensure you are able to reach the node in question from the node
you are performing the installation from.
|
Available Product Components (Custom Database Installs Only) |
Select the components that you plan on using for your database
environment. |
| Product-Specific Prerequisite Checks |
The installer will run through a series of
checks to determine if the node meets the minimum requirements for
installing and configuring the Oracle database software. If any of the
checks fail, you will need to manually verify the check that failed by
clicking on the checkbox.
It is possible to receive an error about the available swap
space not meeting its minimum requirements:
Checking available swap space requirements...
Expected result: 3036MB
Actual Result: 1983MB
In most cases, you will have the minimum required swap space
(as shown above) and this can be safely ignored.
Simply click the check-box for "Checking available swap space requirements..." and
click Next to continue. |
Privileged Operating System Groups (Custom Database Installs Only) |
Select the UNIX groups that will be used for
each of the Oracle group names as follows:
Database Administrator (OSDBA) Group: dba
Database Operator (OSOPER) Group: oper
|
| Create Database |
Select the option to "Install database Software only."
Remember that we will create the clustered
database as a separate step using DBCA. |
| Summary |
Click on Install to start the installation! |
| Root Script Window - Run root.sh |
After the installation has completed, you will
be prompted to run the root.sh script. It is
important to keep in mind that the root.sh script will need to be run on
all nodes in the RAC cluster one at a time
starting with the node you are running the database installation from.
First, open a new console window on the node you
are installing the Oracle 10g database software from
as the root user account. For me, this was "linux1".
Navigate to the /u01/app/oracle/product/10.2.0/db_1
directory and run root.sh.
After running the root.sh
script on all nodes in the cluster, go back to the OUI and acknowledge
the "Execute Configuration scripts" dialog window. |
| End of installation |
At the end of the installation, exit from the OUI. |
Unset LD_ASSUME_KERNEL in SRVCTL
During the Oracle Clusterware installation in the
previous section,
we needed to modify both vipca and srvctl in the CRS bin
directory. Those same modifications need to be performed with the
new srvctl in the Database bin directory:
$ $ORACLE_HOME/bin/srvctl
/u01/app/oracle/product/10.2.0/db_1/jdk/jre/bin/java: error while loading
shared libraries: libpthread.so.0:
cannot open shared object file: No such file or directory
This error occurs because these releases of the Linux kernel
fix an old bug in the Linux threading that Oracle worked around
using LD_ASSUME_KERNEL settings in both vipca
and srvctl, this workaround is no longer valid on
OEL5 or RHEL5 or SLES10 hence the failures.
To workaround this issue, edit srvctl (in the Database bin directory
on all nodes) to undo the setting of LD_ASSUME_KERNEL
by adding one line, around line 168:
LD_ASSUME_KERNEL=2.4.19
export LD_ASSUME_KERNEL
unset LD_ASSUME_KERNEL <<== Line to be added
|
Note: Remember to re-edit these files on all nodes:
<ORA_CRS_HOME>/bin/vipca
<ORA_CRS_HOME>/bin/srvctl
<RDBMS_HOME>/bin/srvctl
<ASM_HOME>/bin/srvctl # (If exists)
after applying the 10.2.0.2 or 10.2.0.3 patch sets, as these patchset will
still include those settings unnecessary for OEL5 or RHEL5 or SLES10.
This issue was raised with development and is fixed in the
10.2.0.4 patchsets.
Also note that we are explicitly unsetting LD_ASSUME_KERNEL
and not merely commenting out its setting to handle a case where
the user has it set in their environment (login shell).
23.
Install Oracle 10g Companion CD Software
Perform the following installation procedures from only one of the Oracle RAC nodes in the cluster (linux1)!
The Oracle10g Companion CD software will be installed to both of Oracle RAC nodes in the
cluster by the Oracle Universal Installer.
After successfully installing the Oracle Database software, the next
step is to install the Oracle Database 10g Companion CD
Release 2 software (10.2.0.1.0).
Please keep in mind that this is an optional step. For the purpose of this article,
my testing database will often make use of the
Java Virtual Machine (Java VM) and Oracle interMedia and therefore will require
the installation of the Oracle Database 10g Companion CD. The type of
installation to perform will be the Oracle Database 10g Products installation type.
This installation type includes the Natively Compiled
Java Libraries (NCOMP) files to improve Java performance. If you do not
install the NCOMP files, the ORA-29558:JAccelerator (NCOMP)
not installed error occurs when a database that uses Java VM
is upgraded to the patch release.
Verifying Terminal Shell Environment
As discussed in the previous section, (Install Oracle Database 10g Software),
the terminal shell environment needs to
be configured for remote access and user equivalence to all nodes in the cluster
before running the Oracle Universal Installer. Note
that you can utilize the same terminal shell session used in the previous section
which in this case, you do not
have to perform any of the actions described below with regards to setting up remote
access and the DISPLAY variable:
Login as the oracle
User Account and Set DISPLAY (if necessary)
# su - oracle
$ # IF YOU ARE USING A REMOTE CLIENT TO CONNECT TO THE
$ # NODE PERFORMING THE INSTALL
$ DISPLAY=<your local workstation>:0.0
$ export DISPLAY
Verify Remote Access / User Equivalence
Verify you are able to run the
Secure Shell
commands (ssh or scp)
on the Linux server you will be running the Oracle Universal Installer
from against all other Linux servers in the cluster without being
prompted for a password. When using the
secure shell method,
user equivalence
will need to be enabled for the terminal shell session before attempting
to run the OUI. To enable
user equivalence for the current terminal shell session, perform the
following steps remembering to enter the pass phrase for the RSA key
you generated when prompted:
$ exec /usr/bin/ssh-agent $SHELL
$ /usr/bin/ssh-add
Enter passphrase for /home/oracle/.ssh/id_rsa: xxxxx
Identity added: /home/oracle/.ssh/id_rsa (/home/oracle/.ssh/id_rsa)
$ ssh linux1 "date;hostname"
Sat Aug 1 14:14:42 EDT 2009
linux1
$ ssh linux2 "date;hostname"
Sat Aug 1 14:15:16 EDT 2009
linux2
Install Oracle Database 10g Companion CD Software
Install the Oracle Database 10g Companion CD software with the following:
$ cd ~oracle
$ /home/oracle/orainstall/companion/runInstaller -ignoreSysPrereqs
| Screen Name |
Response |
| Welcome Screen |
Click Next |
| Select a Product to Install |
Select the "Oracle Database 10g Products 10.2.0.1.0" option. |
| Specify Home Details |
Set the destination for the ORACLE_HOME Name
and Path to that of the previous Oracle10g Database
software install as follows:
Name: OraDb10g_home1
Path: /u01/app/oracle/product/10.2.0/db_1 |
| Specify Hardware Cluster Installation Mode |
The Cluster Installation option will be selected
along with all of the available nodes in the cluster by default. Stay
with these default options and click Next to continue.
If the installation stops here and the status of
any of the RAC nodes is "Node not reachable", perform the following
checks:
- Ensure Oracle Clusterware is running on the
node in question.
- Ensure you are able to reach the node in
question from the node you are performing the installation from.
|
| Product-Specific Prerequisite Checks |
The installer will run through a series of
checks to determine if the node meets the minimum requirements for
installing and configuring the Companion CD Software. If any of the
checks fail, you will need to manually verify the check that failed by
clicking on the checkbox. For my installation, all checks passed with
no problems.
Click on Next to continue. |
| Summary |
On the Summary screen, click Install to start
the installation! |
| End of installation |
At the end of the installation, exit from the OUI. |
24.
Create TNS Listener Process
Perform the following configuration procedures
from only one of the Oracle RAC nodes in the cluster (linux1)!
The Network Configuration Assistant (NETCA) will setup the TNS
listener in a clustered configuration on both of Oracle RAC nodes in the cluster.
The DBCA requires the Oracle TNS Listener process to be
configured and running on all nodes in the RAC cluster before it can
create the clustered database.
The process of creating the TNS listener only needs to
be performed on one node in the cluster. All changes will be made and
replicated to all nodes in the cluster. On one of the nodes (I will be
using linux1) bring up the NETCA and run through
the process of creating a new TNS listener process and also configure
the node for local access.
Verifying Terminal Shell Environment
As discussed in the previous section, (Install Oracle Database 10g Companion CD Software),
the terminal shell environment needs to
be configured for remote access and user equivalence to all nodes in the cluster
before running the Oracle Universal Installer. Note
that you can utilize the same terminal shell session used in the previous section
which in this case, you do not
have to perform any of the actions described below with regards to setting up remote
access and the DISPLAY variable:
Login as the oracle
User Account and Set DISPLAY (if necessary)
# su - oracle
$ # IF YOU ARE USING A REMOTE CLIENT TO CONNECT TO THE
$ # NODE PERFORMING THE INSTALL
$ DISPLAY=<your local workstation>:0.0
$ export DISPLAY
Verify Remote Access / User Equivalence
Verify you are able to run the
Secure Shell
commands (ssh or scp)
on the Linux server you will be running the Oracle Universal Installer
from against all other Linux servers in the cluster without being
prompted for a password. When using the
secure shell method,
user equivalence
will need to be enabled for the terminal shell session before attempting
to run the OUI. To enable
user equivalence for the current terminal shell session, perform the
following steps remembering to enter the pass phrase for the RSA key
you generated when prompted:
$ exec /usr/bin/ssh-agent $SHELL
$ /usr/bin/ssh-add
Enter passphrase for /home/oracle/.ssh/id_rsa: xxxxx
Identity added: /home/oracle/.ssh/id_rsa (/home/oracle/.ssh/id_rsa)
$ ssh linux1 "date;hostname"
Sat Aug 1 14:14:42 EDT 2009
linux1
$ ssh linux2 "date;hostname"
Sat Aug 1 14:15:16 EDT 2009
linux2
Run the Network Configuration Assistant
To start the NETCA, run the following:
$ netca &
The following table walks you through the process of creating a new
Oracle listener for our RAC environment.
| Screen Name |
Response |
Select the Type of Oracle
Net Services Configuration |
Select Cluster Configuration |
| Select the nodes to configure |
Select all of the nodes: linux1 and linux2. |
| Type of Configuration |
Select Listener configuration. |
| Listener Configuration - Next 6 Screens |
The following screens are now like any other
normal listener configuration. You can simply accept the default
parameters for the next six screens:
What do you want to do: Add
Listener name: LISTENER
Selected protocols: TCP
Port number: 1521
Configure another listener: No
Listener configuration complete!
[ Next ]
You will be returned to this Welcome (Type of Configuration) Screen. |
| Type of Configuration |
Select Naming Methods configuration. |
| Naming Methods Configuration |
The following screens are:
Selected Naming Methods: Local Naming
Naming Methods configuration complete!
[ Next ]
You will be returned to this Welcome (Type of Configuration) Screen. |
| Type of Configuration |
Click Finish to exit the NETCA. |
The Oracle TNS listener process should now be running
on all nodes in the RAC cluster:
$ hostname
linux1
$ ps -ef | grep lsnr | grep -v 'grep' | grep -v 'ocfs' | awk '{print $9}'
LISTENER_LINUX1
=====================
$ hostname
linux2
$ ps -ef | grep lsnr | grep -v 'grep' | grep -v 'ocfs' | awk '{print $9}'
LISTENER_LINUX2
25.
Create the Oracle Cluster Database
The database creation process should only be performed
from one of the Oracle RAC nodes in the cluster (linux1)!
We will use the DBCA to create the clustered database.
Before executing the DBCA, make sure that $ORACLE_HOME
and $PATH are set appropriately for the $ORACLE_BASE/product/10.2.0/db_1
environment.
You should also verify that all services we have
installed up to this point (Oracle TNS listener, Oracle Clusterware
processes, etc.) are running before attempting to start the clustered
database creation process.
Verifying Terminal Shell Environment
As discussed in the previous section, (Create TNS Listener Process),
the terminal shell environment needs to
be configured for remote access and user equivalence to all nodes in the cluster
before running the Oracle Universal Installer. Note
that you can utilize the same terminal shell session used in the previous section
which in this case, you do not
have to perform any of the actions described below with regards to setting up remote
access and the DISPLAY variable:
Login as the oracle
User Account and Set DISPLAY (if necessary)
# su - oracle
$ # IF YOU ARE USING A REMOTE CLIENT TO CONNECT TO THE
$ # NODE PERFORMING THE INSTALL
$ DISPLAY=<your local workstation>:0.0
$ export DISPLAY
Verify Remote Access / User Equivalence
Verify you are able to run the
Secure Shell
commands (ssh or scp)
on the Linux server you will be running the Oracle Universal Installer
from against all other Linux servers in the cluster without being
prompted for a password. When using the
secure shell method,
user equivalence
will need to be enabled for the terminal shell session before attempting
to run the OUI. To enable
user equivalence for the current terminal shell session, perform the
following steps remembering to enter the pass phrase for the RSA key
you generated when prompted:
$ exec /usr/bin/ssh-agent $SHELL
$ /usr/bin/ssh-add
Enter passphrase for /home/oracle/.ssh/id_rsa: xxxxx
Identity added: /home/oracle/.ssh/id_rsa (/home/oracle/.ssh/id_rsa)
$ ssh linux1 "date;hostname"
Sat Aug 1 14:14:42 EDT 2009
linux1
$ ssh linux2 "date;hostname"
Sat Aug 1 14:15:16 EDT 2009
linux2
Run the Oracle Cluster Verification Utility
Before creating the Oracle clustered database, we should run
the following database configuration check using the
Cluster Verification Utility (CVU).
Note: Instructions for configuring CVU can
be found in the section
"Prerequisites for Using Cluster Verification Utility
discussed earlier in this article.
$ cd /home/oracle/orainstall/clusterware/cluvfy
$ CV_HOME=/home/oracle/orainstall/clusterware/cluvfy; export CV_HOME
$ CV_JDKHOME=/home/oracle/orainstall/clusterware/cluvfy/jdk14; export CV_JDKHOME
$ ./runcluvfy.sh stage -pre dbcfg -n linux1,linux2 -d ${ORACLE_HOME} -verbose
Review the CVU report. Note that this report will contain
the same error we received when checking pre-installation tasks
for CRS failure to find a suitable set of interfaces for VIPs.
This error can be safely ignored.
Create the Clustered Database
To start the database creation process, run the
following:
$ dbca &
| Screen Name |
Response |
| Welcome Screen |
Select "Oracle Real Application Clusters database." |
| Operations |
Select Create a Database. |
| Node Selection |
Click on the Select All
button to select all servers: linux1 and linux2. |
| Database Templates |
Select Custom Database. |
| Database Identification |
Select:
Global Database Name:
racdb.idevelopment.info
SID Prefix: racdb
I used idevelopment.info for the database domain.
You may use any domain. Keep in mind that this domain does not have to
be a valid DNS domain. |
| Management Option |
Leave the default options here, which is to
"Configure the Database with Enterprise Manager / Use Database Control
for Database Management." |
| Database Credentials |
I selected to Use the Same
Password for All Accounts. Enter the password (twice)
and make sure the password does not start with a digit number. |
| Storage Options |
For this guide, we will select to use
Automatic Storage Management (ASM). |
| Create ASM Instance |
Supply the SYS password to use for the new ASM instance.
Also, starting with Oracle10g Release 2, the ASM instance
server parameter file (SPFILE) needs to be on a shared disk. You will
need to modify the default entry for "Create server parameter file
(SPFILE)" to reside on the OCFS2 partition as follows: /u02/oradata/racdb/dbs/spfile+ASM.ora.
All other options can stay at their defaults.
You will then be prompted with a dialog box
asking if you want to create and start the ASM instance. Select the OK
button to acknowledge this dialog.
The OUI will now create and start the ASM
instance on all nodes in the RAC cluster. |
| ASM Disk Groups |
To start, click the Create New button.
This will bring up the "Create Disk Group" window with the four volumes
we configured earlier using ASMLib.
If the volumes we created earlier in this article do not show up in the
"Select Member Disks" window:
(ORCL:VOL1,
ORCL:VOL2,
ORCL:VOL3, and ORCL:VOL4)
then click on the "Change Disk Discovery Path" button and input "ORCL:VOL*".
For the first "Disk Group Name", I used the string "RACDB_DATA1".
Select the first two ASM volumes
(ORCL:VOL1 and ORCL:VOL2)
in the "Select Member Disks" window.
Keep the "Redundancy" setting to "Normal".
After verifying all values in this window are correct, click the [OK] button.
This will present the "ASM Disk Group Creation" dialog. When the ASM Disk Group Creation
process is finished, you will be returned to the "ASM Disk Groups" windows.
Click the Create New button again.
For the second "Disk Group Name", I used the string "FLASH_RECOVERY_AREA".
Select the last two ASM volumes
(ORCL:VOL3 and ORCL:VOL4) in the "Select Member Disks" window.
Keep the "Redundancy" setting to "Normal".
After verifying all values in this window are correct, click the [OK] button.
This will present the "ASM Disk Group Creation" dialog.
When the ASM Disk Group Creation process is finished, you will be returned to
the "ASM Disk Groups" window with two disk groups created and selected.
Select only one of the disk groups by using the checkbox next to the newly created
Disk Group Name "RACDB_DATA1" (ensure that the
disk group for "FLASH_RECOVERY_AREA" is not selected) and click [Next] to continue. |
| Database File Locations |
I selected to use the default, which is to use Oracle Managed Files:
Database Area: +RACDB_DATA1 |
| Recovery Configuration |
Check the option for "Specify Flash Recovery Area".
For the "Flash Recovery Area", click the [Browse]
button and select the disk group name "+FLASH_RECOVERY_AREA".
My disk group has
a size of about 17GB. When defining the Flash Recovery Area size, use the entire
volume minus 10% (17-10%=15 GB).
I used a "Flash Recovery Area Size"
of 15 GB (15360 MB). |
| Database Content |
I left all of the Database Components (and
destination tablespaces) set to their default value, although it is
perfectly OK to select the Sample Schemas. This option is available
since we installed the Oracle Companion CD software. |
| Database Services |
For this test configuration, click Add,
and enter racdb_taf as the "Service Name." Leave
both instances set to Preferred and for the "TAF Policy" select "Basic". |
| Initialization Parameters |
Change any parameters for your environment. I
left them all at their default settings. |
| Database Storage |
Change any parameters for your environment. I
left them all at their default settings. |
| Creation Options |
Keep the default option Create Database selected. I also
always select to "Generate Database Creation Scripts".
Click Finish to start the database
creation process. After acknowledging the database creation report
and script generation dialog, the database creation will start.
Click OK on the "Summary" screen. |
| End of Database Creation |
At the end of the database creation, exit from the DBCA.
Note: When exiting the DBCA you will not receive
any feedback from the dialog window for around 30-60 seconds. After awhile, another
dialog will come up indicating that it is starting all Oracle instances and HA
service "racdb_taf". This may take several minutes to complete. When finished,
all windows and dialog boxes will disappear. |
When the DBCA has completed, you will have a fully
functional Oracle RAC cluster running!
$ $ORA_CRS_HOME/bin/crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....SM1.asm application ONLINE ONLINE linux1
ora....X1.lsnr application ONLINE ONLINE linux1
ora.linux1.gsd application ONLINE ONLINE linux1
ora.linux1.ons application ONLINE ONLINE linux1
ora.linux1.vip application ONLINE ONLINE linux1
ora....SM2.asm application ONLINE ONLINE linux2
ora....X2.lsnr application ONLINE ONLINE linux2
ora.linux2.gsd application ONLINE ONLINE linux2
ora.linux2.ons application ONLINE ONLINE linux2
ora.linux2.vip application ONLINE ONLINE linux2
ora.racdb.db application ONLINE ONLINE linux1
ora....b1.inst application ONLINE ONLINE linux1
ora....b2.inst application ONLINE ONLINE linux2
ora...._taf.cs application ONLINE ONLINE linux1
ora....db1.srv application ONLINE ONLINE linux1
ora....db2.srv application ONLINE ONLINE linux2
Verify the racdb_taf Service
During the creation of the Oracle clustered database,
you added a service named racdb_taf that will be
used to connect to the database with TAF enabled.
Use the following to verify the racdb_taf
service was successfully added:
SQL> show parameter service
NAME TYPE VALUE
-------------------- ----------- --------------------------------
service_names string racdb.idevelopment.info, racdb_taf
If the only service defined was for racdb.idevelopment.info,
then you will need to manually add the service to both instances:
SQL> show parameter service
NAME TYPE VALUE
-------------------- ----------- --------------------------
service_names string racdb.idevelopment.info
SQL> alter system set service_names =
2 'racdb.idevelopment.info, racdb_taf' scope=both;
26.
Post-Installation Tasks - (Optional)
This chapter describes several optional tasks that can be applied to your
new Oracle 10g environment in order to enhance availability as well as database management.
Re-compile Invalid Objects
Run the utlrp.sql script to recompile all invalid PL/SQL
packages now instead of when the packages are accessed for the first time.
This step is optional but recommended.
$ sqlplus / as sysdba
SQL> @?/rdbms/admin/utlrp.sql
Enabling Archive Logs in a RAC Environment
Whether a single instance or clustered database, Oracle tracks and logs all changes to database blocks in
online redolog files. In an Oracle RAC environment, each instance will have its own set of online
redolog files known as a thread. Each Oracle instance will use its group of online redologs
in a circular manner. Once an online redolog fills, Oracle moves to the next one. If the database is in
"Archive Log Mode", Oracle will make a copy of the online redo log before it gets reused. A thread must
contain at least two online redologs (or online redolog groups). The same holds true for a single instance
configuration. The single instance must contain at least two online redologs (or online redolog groups).
The size of an online redolog file is completely independent of another intances' redolog size. Although
in most configurations the size is the same, it may be different depending on the workload and
backup / recovery considerations for each node. It is also worth mentioning that each instance has exclusive
write access to its own online redolog files. In a correctly configured RAC environment, however, each
instance can read another instance's current online redolog file to perform instance recovery if that
instance was terminated abnormally. It is therefore a requirement that online redo logs be located on a
shared storage device (just like the database files).
As already mentioned, Oracle writes to its online redolog files in a circular manner. When the current
online redolog fills, Oracle will switch to the next one. To facilitate media recovery, Oracle allows
the DBA to put the database into "Archive Log Mode" which makes a copy of the online redolog after it
fills (and before it gets reused). This is a process known as archiving.
The Database Configuration Assistant (DBCA) allows users to configure a new database to be in archive log
mode, however most DBA's opt to bypass this option during initial database creation. In cases like this where the database is in no archive
log mode, it is a simple task to put the database into archive log mode. Note however that this will require
a short database outage. From one of the nodes in the Oracle RAC configuration, use the following
tasks to put a RAC enabled database into archive log mode. For the purpose of this article, I will use
the node linux1 which runs the racdb1 instance:
- Login to one of the nodes (i.e. linux1) and disable the cluster instance parameter
by setting cluster_database to FALSE from the current instance:
$ sqlplus / as sysdba
SQL> alter system set cluster_database=false scope=spfile sid='racdb1';
- Shutdown all instances accessing the clustered database:
$ srvctl stop database -d racdb
- Using the local instance, MOUNT the database:
$ sqlplus / as sysdba
SQL> startup mount
- Enable archiving:
SQL> alter database archivelog;
- Re-enable support for clustering by modifying the instance parameter
cluster_database to TRUE from the current instance:
SQL> alter system set cluster_database=true scope=spfile sid='racdb1';
- Shutdown the local instance:
SQL> shutdown immediate
- Bring all instance back up using srvctl:
$ srvctl start database -d racdb
- (Optional) Bring any services (i.e. TAF) back up using srvctl:
$ srvctl start service -d racdb
- Login to the local instance and verify Archive Log Mode is enabled:
$ sqlplus / as sysdba
SQL> archive log list
Database log mode Archive Mode
Automatic archival Enabled
Archive destination USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence 29
Next log sequence to archive 30
Current log sequence 30
After enabling Archive Log Mode, each instance in the RAC configuration can automatically
archive redologs!
Download and Install Custom Oracle Database Scripts
DBA's rely on Oracle's data dictionary views and dynamic performance views in order to
support and better manage their databases. Although these views provide a simple and easy mechanism to
query critical information regarding the database, it helps to have a collection of accurate and readily
available SQL scripts to query these views.
In this section you will download and install a collection of Oracle DBA scripts that can
be used to manage
many aspects of your database including space management, performance, backups,
security, and session management. The Oracle DBA scripts archive can be downloaded
using the following link
http://www.idevelopment.info/data/Oracle/DBA_scripts/common.zip. As the oracle user account, download the common.zip archive to the
$ORACLE_BASE directory of
each node in the cluster. For the purpose of this example, the common.zip archive
will be copied to /u01/app/oracle. Next, unzip the archive file to the $ORACLE_BASE directory.
For example, perform the following on both nodes in the Oracle RAC cluster as the oracle
user account:
$ mv common.zip /u01/app/oracle
$ cd /u01/app/oracle
$ unzip common.zip
The final step is to verify (or set) the appropriate environment variable for the
current UNIX shell to ensure the Oracle SQL scripts can be run from SQL*Plus while
in any directory. For UNIX verify the following environment variable is set
and included in your login shell script:
ORACLE_PATH=$ORACLE_BASE/common/oracle/sql:.:$ORACLE_HOME/rdbms/admin
export ORACLE_PATH
Note: The ORACLE_PATH environment variable should
already be set in the .bash_profile login script that was created
in the section
Create Login Script for oracle User Account.
Now that the Oracle DBA scripts have been unzipped and the UNIX environment variable
($ORACLE_PATH) has been set to the appropriate directory, you should now be able
to run any of the SQL scripts in your $ORACLE_BASE/common/oracle/sql while logged
into SQL*Plus. For example, to query tablespace information while logged into the Oracle
database as a DBA user:
SQL> @dba_tablespaces
Status Tablespace Name TS Type Ext. Mgt. Seg. Mgt. Tablespace Size Used (in bytes) Pct. Used
--------- --------------- ------------ ---------- --------- ------------------ ------------------ ---------
ONLINE UNDOTBS1 UNDO LOCAL MANUAL 1,283,457,024 9,043,968 1
ONLINE SYSAUX PERMANENT LOCAL AUTO 524,288,000 378,732,544 72
ONLINE USERS PERMANENT LOCAL AUTO 2,147,483,648 321,257,472 15
ONLINE SYSTEM PERMANENT LOCAL MANUAL 838,860,800 505,544,704 60
ONLINE INDX PERMANENT LOCAL AUTO 1,073,741,824 65,536 0
ONLINE UNDOTBS2 UNDO LOCAL MANUAL 1,283,457,024 22,282,240 2
ONLINE TEMP TEMPORARY LOCAL MANUAL 1,073,741,824 92,274,688 9
------------------ ------------------ ---------
avg 23
sum 8,225,030,144 1,329,201,152
7 rows selected.
To obtain a list of all available Oracle DBA scripts while logged into SQL*Plus, run the help.sql
script:
SQL> @help.sql
========================================
Automatic Shared Memory Management
========================================
asmm_components.sql
========================================
Automatic Storage Management
========================================
asm_alias.sql
asm_clients.sql
asm_diskgroups.sql
asm_disks.sql
asm_disks_perf.sql
asm_drop_files.sql
asm_files.sql
asm_files2.sql
asm_templates.sql
< --- SNIP --- >
perf_top_sql_by_buffer_gets.sql
perf_top_sql_by_disk_reads.sql
========================================
Workspace Manager
========================================
wm_create_workspace.sql
wm_disable_versioning.sql
wm_enable_versioning.sql
wm_freeze_workspace.sql
wm_get_workspace.sql
wm_goto_workspace.sql
wm_merge_workspace.sql
wm_refresh_workspace.sql
wm_remove_workspace.sql
wm_unfreeze_workspace.sql
wm_workspaces.sql
Create Shared Oracle Password Files
In this section, I present the steps required to configure a shared Oracle password
file between all instances in the Oracle clustered database.
The password file for the database in UNIX is located at
$ORACLE_HOME/dbs/orapw<ORACLE_SID> for each instance and contains a list
of all database users that have SYSDBA privileges. When a database user is granted the
SYSDBA role, the instance records this in the database password file for the instance
you are logged into. But what about the other instances in the cluster? The database password
file on other instances do not get updated and will not contain the user who was just
granted the SYSDBA role. Therefore a program (like RMAN) that tries to login as
this new user with SYSDBA privileges will fail if it tries to use an
instance with a password file that does not contain his or her name.
To resolve this problem, a common solution is to place a single database password file
on a shared / clustered file system and then create symbolic links from each of the instances
to this single version of the database password file. Since the environment described in this
article makes use of the Oracle Clustered File System (OCFS2), we will use it to store
the single version of the database password file.
In this section, we will also be including the Oracle password file for the ASM instance.
- Create the database password directory on the clustered file system mounted
on /u02/oradata/racdb. Perform the following from only one node in the
cluster as the oracle user account - (linux1):
$ mkdir -p /u02/oradata/racdb/dbs
- From one node in the cluster (linux1), move the database password files
to the database password directory on the clustered file system.
Chose a node that contains a database password file that has the most recent
SYSDBA additions. In most cases, this will not matter since any missing
entries can be easily added by granting them the SYSDBA role - (plus the fact
that this is a fresh install and unlikely you created any SYSDBA users at this point!).
Note that the
database server does not need to be shutdown while performing the following actions.
From linux1 as the oracle user account:
$ mv $ORACLE_HOME/dbs/orapw+ASM1 /u02/oradata/racdb/dbs/orapw+ASM
$ mv $ORACLE_HOME/dbs/orapwracdb1 /u02/oradata/racdb/dbs/orapwracdb
$ ln -s /u02/oradata/racdb/dbs/orapw+ASM $ORACLE_HOME/dbs/orapw+ASM1
$ ln -s /u02/oradata/racdb/dbs/orapwracdb $ORACLE_HOME/dbs/orapwracdb1
- From the second node in the cluster (linux2):
$ rm $ORACLE_HOME/dbs/orapw+ASM2
$ rm $ORACLE_HOME/dbs/orapwracdb2
$ ln -s /u02/oradata/racdb/dbs/orapw+ASM $ORACLE_HOME/dbs/orapw+ASM2
$ ln -s /u02/oradata/racdb/dbs/orapwracdb $ORACLE_HOME/dbs/orapwracdb2
Now, when a user is granted the SYSDBA role, all instances will have access
to the same password file:
SQL> GRANT sysdba TO scott;
27.
Verify TNS Networking Files
Ensure that the TNS networking files are configured on
both Oracle RAC nodes in the cluster!
listener.ora
We already covered how to create a TNS listener
configuration file (listener.ora) for a clustered
environment in Section
24. The listener.ora file should be
properly configured and no modifications should be needed.
For clarity, I have included a copy of the listener.ora
file from my node linux1 in this guide support files. I've also included
a copy of my tnsnames.ora file that was
configured by Oracle and can be used for testing the Transparent
Application Failover (TAF). This file should already be configured on
both Oracle RAC nodes in the cluster.
You can include any of these entries on other client
machines that need access to the clustered database.
Connecting to Clustered Database From an
External Client
This is an optional step, but I like to perform it in order to verify
my TNS files are configured correctly. Use another machine (i.e. a Windows machine connected to
the network) that has Oracle
installed and add the TNS entries
(in the tnsnames.ora) from either of the nodes in the cluster that were created
for the clustered database.
Note: Verify that the machine you are connecting from can resolve
all host names exactly how they appear in the listener.ora and tnsnames.ora files.
For the purpose of this document, the machine you are connecting from should be
able to resolve the following host names in the local hosts file or through DNS:
192.168.1.100 linux1
192.168.1.101 linux2
192.168.1.200 linux1-vip
192.168.1.201 linux2-vip
Try to connect to the clustered database using all
available service names defined in the tnsnames.ora
file:
C:\> sqlplus system/manager@racdb2
C:\> sqlplus system/manager@racdb1
C:\> sqlplus system/manager@racdb_taf
C:\> sqlplus system/manager@racdb
28.
Create / Alter Tablespaces
When creating the clustered database, we left all
tablespaces set to their default size. If you are using a large drive
for the shared storage, you may want to make a sizable testing
database.
Below are several optional SQL commands for modifying
and creating all tablespaces for the test database. Please keep in mind
that the database file names (OMF files) used in this example may
differ from what the Oracle Database Configuration Assistant (DBCA) creates
for your environment. When working through this section, substitute the data
file names that were created in your environment where appropriate. The following
query can be used to determine the file names for your environment:
SQL> select tablespace_name, file_name
2 from dba_data_files
3 union
4 select tablespace_name, file_name
5 from dba_temp_files;
TABLESPACE_NAME FILE_NAME
--------------- --------------------------------------------------
EXAMPLE +RACDB_DATA1/racdb/datafile/example.257.570913311
INDX +RACDB_DATA1/racdb/datafile/indx.270.570920045
SYSAUX +RACDB_DATA1/racdb/datafile/sysaux.260.570913287
SYSTEM +RACDB_DATA1/racdb/datafile/system.262.570913215
TEMP +RACDB_DATA1/racdb/tempfile/temp.258.570913303
UNDOTBS1 +RACDB_DATA1/racdb/datafile/undotbs1.261.570913263
UNDOTBS2 +RACDB_DATA1/racdb/datafile/undotbs2.265.570913331
USERS +RACDB_DATA1/racdb/datafile/users.264.570913355
$ sqlplus / as sysdba
SQL> create user scott identified by tiger default tablespace users;
SQL> grant dba, resource, connect to scott;
SQL> alter database datafile '+RACDB_DATA1/racdb/datafile/users.264.570913355' resize 1024m;
SQL> alter tablespace users add datafile '+RACDB_DATA1' size 1024m autoextend off;
SQL> create tablespace indx datafile '+RACDB_DATA1' size 1024m
2 autoextend on next 50m maxsize unlimited
3 extent management local autoallocate
4 segment space management auto;
SQL> alter database datafile '+RACDB_DATA1/racdb/datafile/system.262.570913215' resize 800m;
SQL> alter database datafile '+RACDB_DATA1/racdb/datafile/sysaux.260.570913287' resize 500m;
SQL> alter tablespace undotbs1 add datafile '+RACDB_DATA1' size 1024m
2 autoextend on next 50m maxsize 2048m;
SQL> alter tablespace undotbs2 add datafile '+RACDB_DATA1' size 1024m
2 autoextend on next 50m maxsize 2048m;
SQL> alter database tempfile '+RACDB_DATA1/racdb/tempfile/temp.258.570913303' resize 1024m;
Here is a snapshot of the tablespaces I have defined for my test
database environment:
Status Tablespace Name TS Type Ext. Mgt. Seg. Mgt. Tablespace Size Used (in bytes) Pct. Used
--------- --------------- ------------ ---------- --------- ------------------ ------------------ ---------
ONLINE UNDOTBS1 UNDO LOCAL MANUAL 1,283,457,024 85,065,728 7
ONLINE SYSAUX PERMANENT LOCAL AUTO 524,288,000 275,906,560 53
ONLINE USERS PERMANENT LOCAL AUTO 2,147,483,648 131,072 0
ONLINE SYSTEM PERMANENT LOCAL MANUAL 838,860,800 500,301,824 60
ONLINE EXAMPLE PERMANENT LOCAL AUTO 157,286,400 83,820,544 53
ONLINE INDX PERMANENT LOCAL AUTO 1,073,741,824 65,536 0
ONLINE UNDOTBS2 UNDO LOCAL MANUAL 1,283,457,024 3,801,088 0
ONLINE TEMP TEMPORARY LOCAL MANUAL 1,073,741,824 27,262,976 3
------------------ ------------------ ---------
avg 22
sum 8,382,316,544 976,355,328
8 rows selected.
29.
Verify the RAC Cluster & Database Configuration
The following RAC verification checks should be performed
on both Oracle RAC nodes in the cluster! For this article, however, I will only
be performing checks from linux1.
This section provides several srvctl
commands and SQL queries you can use to validate your Oracle RAC 10g
configuration.
There are five node-level tasks defined for SRVCTL:
- Adding and deleting node-level applications
- Setting and unsetting the environment for node-level applications
- Administering node applications
- Administering ASM instances
- Starting and stopping a group of programs that
includes virtual IP addresses, listeners, Oracle Notification Services,
and Oracle Enterprise Manager agents (for maintenance purposes).
Status of all instances and services
$ srvctl status database -d racdb
Instance racdb1 is running on node linux1
Instance racdb2 is running on node linux2
Status of a single instance
$ srvctl status instance -d racdb -i racdb2
Instance racdb2 is running on node linux2
Status of a named service globally across the database
$ srvctl status service -d racdb -s racdb_taf
Service racdb_taf is running on instance(s) racdb1, racdb2
Status of node applications on a particular node
$ srvctl status nodeapps -n linux1
VIP is running on node: linux1
GSD is running on node: linux1
Listener is running on node: linux1
ONS daemon is running on node: linux1
Status of an ASM instance
$ srvctl status asm -n linux1
ASM instance +ASM1 is running on node linux1.
List all configured databases
$ srvctl config database
racdb
Display configuration for our RAC database
$ srvctl config database -d racdb
linux1 racdb1 /u01/app/oracle/product/10.2.0/db_1
linux2 racdb2 /u01/app/oracle/product/10.2.0/db_1
Display all services for the specified cluster database
$ srvctl config service -d racdb
racdb_taf PREF: racdb1 racdb2 AVAIL:
Display the configuration for node applications - (VIP, GSD, ONS, Listener)
$ srvctl config nodeapps -n linux1 -a -g -s -l
VIP exists.: /linux1-vip/192.168.1.200/255.255.255.0/eth0
GSD exists.
ONS daemon exists.
Listener exists.
Display the configuration for the ASM instance(s)
$ srvctl config asm -n linux1
+ASM1 /u01/app/oracle/product/10.2.0/db_1
All running instances in the cluster
SELECT
inst_id
, instance_number inst_no
, instance_name inst_name
, parallel
, status
, database_status db_status
, active_state state
, host_name host
FROM gv$instance
ORDER BY inst_id;
INST_ID INST_NO INST_NAME PAR STATUS DB_STATUS STATE HOST
-------- -------- ---------- --- ------- ------------ --------- -------
1 1 racdb1 YES OPEN ACTIVE NORMAL linux1
2 2 racdb2 YES OPEN ACTIVE NORMAL linux2
All data files which are in the disk group
select name from v$datafile
union
select member from v$logfile
union
select name from v$controlfile
union
select name from v$tempfile;
NAME
-------------------------------------------
+FLASH_RECOVERY_AREA/racdb/controlfile/current.258.570913191
+FLASH_RECOVERY_AREA/racdb/onlinelog/group_1.257.570913201
+FLASH_RECOVERY_AREA/racdb/onlinelog/group_2.256.570913211
+FLASH_RECOVERY_AREA/racdb/onlinelog/group_3.259.570918285
+FLASH_RECOVERY_AREA/racdb/onlinelog/group_4.260.570918295
+RACDB_DATA1/racdb/controlfile/current.259.570913189
+RACDB_DATA1/racdb/datafile/example.257.570913311
+RACDB_DATA1/racdb/datafile/indx.270.570920045
+RACDB_DATA1/racdb/datafile/sysaux.260.570913287
+RACDB_DATA1/racdb/datafile/system.262.570913215
+RACDB_DATA1/racdb/datafile/undotbs1.261.570913263
+RACDB_DATA1/racdb/datafile/undotbs1.271.570920865
+RACDB_DATA1/racdb/datafile/undotbs2.265.570913331
+RACDB_DATA1/racdb/datafile/undotbs2.272.570921065
+RACDB_DATA1/racdb/datafile/users.264.570913355
+RACDB_DATA1/racdb/datafile/users.269.570919829
+RACDB_DATA1/racdb/onlinelog/group_1.256.570913195
+RACDB_DATA1/racdb/onlinelog/group_2.263.570913205
+RACDB_DATA1/racdb/onlinelog/group_3.266.570918279
+RACDB_DATA1/racdb/onlinelog/group_4.267.570918289
+RACDB_DATA1/racdb/tempfile/temp.258.570913303
21 rows selected.
All ASM disk that belong to the 'RACDB_DATA1' disk group
SELECT path
FROM v$asm_disk
WHERE group_number IN (select group_number
from v$asm_diskgroup
where name = 'RACDB_DATA1');
PATH
----------------------------------
ORCL:VOL1
ORCL:VOL2
30.
Starting / Stopping the Cluster
At this point, we've installed and configured Oracle
RAC 10g entirely and have a fully functional
clustered database.
After all the work done up to this point, you may well
ask, "OK, so how do I start and stop services?". If you have followed
the instructions in this guide, all services—including Oracle
Clusterware, all Oracle instances, Enterprise Manager Database Console,
and so on—should start automatically on each reboot of the
Linux nodes.
There are times, however, when you might want to shut
down a node and manually start it back up. Or you may find that
Enterprise Manager is not running and need to start it. This section
provides the commands responsible
for starting and stopping the cluster environment.
Ensure that you are logged in as the oracle
UNIX user. We will run all commands in this section from linux1:
# su - oracle
$ hostname
linux1
Stopping the Oracle RAC 10g
Environment
The first step is to stop the Oracle instance. When the
instance (and related services) is down, then bring down the ASM
instance. Finally, shut down the node applications (Virtual IP, GSD,
TNS Listener, and ONS).
$ export ORACLE_SID=racdb1
$ emctl stop dbconsole
$ srvctl stop instance -d racdb -i racdb1
$ srvctl stop asm -n linux1
$ srvctl stop nodeapps -n linux1
Starting the Oracle RAC 10g Environment
The first step is to start the node applications
(Virtual IP, GSD, TNS Listener, and ONS). When the node applications
are successfully started, then bring up the ASM instance. Finally,
bring up the Oracle instance (and related services) and the Enterprise
Manager Database console.
$ export ORACLE_SID=racdb1
$ srvctl start nodeapps -n linux1
$ srvctl start asm -n linux1
$ srvctl start instance -d racdb -i racdb1
$ emctl start dbconsole
Start/Stop All Instances with SRVCTL
Start/stop all the instances and their enabled
services. I have included this step just for fun as a way to bring down
all instances!
$ srvctl start database -d racdb
$ srvctl stop database -d racdb
31.
Transparent Application Failover (TAF)
It is not uncommon for businesses to demand 99.99% (or
even 99.999%) availability for their enterprise applications. Think
about what it would take to ensure a downtime of no more than .5 hours
or even no downtime during the year. To answer many of these
high-availability requirements, businesses are investing in mechanisms
that provide for automatic failover when one participating system
fails. When considering the availability of the Oracle database, Oracle
RAC 10g provides a superior solution with its
advanced failover mechanisms. Oracle RAC 10g
includes the required components that all work within a clustered
configuration responsible for providing continuous availability; when
one of the participating systems fail within the cluster, the users are
automatically migrated to the other available systems.
A major component of Oracle RAC 10g
that is responsible for failover processing is the Transparent
Application Failover (TAF) option. All database connections (and
processes) that lose connections are reconnected to another node within
the cluster. The failover is completely transparent to the user.
This final section provides a short demonstration on
how TAF works in Oracle RAC 10g. Please note that a
complete discussion of failover in Oracle RAC 10g
would require an article in itself; my intention here is to present
only a brief overview.
One important note is that TAF happens automatically
within the OCI libraries. Thus your application (client) code does not
need to change in order to take advantage of TAF. Certain configuration
steps, however, will need to be done on the Oracle TNS file
tnsnames.ora. (Keep in mind that as of this writing, the Java thin
client will not be able to participate in TAF because it never reads
tnsnames.ora.)
Setup the tnsnames.ora File
Before demonstrating TAF, we need to verify that a
valid entry exists in the tnsnames.ora file on a non-RAC client machine
(if you have a Windows machine lying around). Ensure that you have the
Oracle RDBMS software installed. (Actually, you only need a client
install of the Oracle software.)
During the creation of the clustered database in this
guide, we created a new service that will be used for testing TAF named
RACDB_TAF. It provides all the necessary configuration parameters for
load balancing and failover. You can copy the contents of this entry to
the %ORACLE_HOME%\network\admin\tnsnames.ora file on the client machine
(my Windows laptop is being used in this example) in order to connect
to the new Oracle clustered database:
...
RACDB_TAF =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = linux1-vip)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = linux2-vip)(PORT = 1521))
(LOAD_BALANCE = yes)
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = racdb_taf.idevelopment.info)
(FAILOVER_MODE =
(TYPE = SELECT)
(METHOD = BASIC)
(RETRIES = 180)
(DELAY = 5)
)
)
)
...
SQL Query to Check the Session's Failover Information
The following SQL query can be used to check a
session's failover type, failover method, and if a failover has
occurred. We will be using this query throughout this example.
COLUMN instance_name FORMAT a13
COLUMN host_name FORMAT a9
COLUMN failover_method FORMAT a15
COLUMN failed_over FORMAT a11
SELECT
instance_name
, host_name
, NULL AS failover_type
, NULL AS failover_method
, NULL AS failed_over
FROM v$instance
UNION
SELECT
NULL
, NULL
, failover_type
, failover_method
, failed_over
FROM v$session
WHERE username = 'SYSTEM';
TAF Demo
From a Windows machine (or other non-RAC client
machine), login to the clustered database using the racdb_taf
service as the SYSTEM user:
C:\> sqlplus system/manager@racdb_taf
COLUMN instance_name FORMAT a13
COLUMN host_name FORMAT a9
COLUMN failover_method FORMAT a15
COLUMN failed_over FORMAT a11
SELECT
instance_name
, host_name
, NULL AS failover_type
, NULL AS failover_method
, NULL AS failed_over
FROM v$instance
UNION
SELECT
NULL
, NULL
, failover_type
, failover_method
, failed_over
FROM v$session
WHERE username = 'SYSTEM';
INSTANCE_NAME HOST_NAME FAILOVER_TYPE FAILOVER_METHOD FAILED_OVER
------------- --------- ------------- --------------- -----------
racdb1 linux1
SELECT BASIC NO
DO NOT log out of the above SQL*Plus session!
Now that we have run the query (above), we should now
shutdown the instance racdb1 on linux1
using the abort option. To perform this
operation, you can use the srvctl command-line
utility as follows:
# su - oracle
$ srvctl status database -d racdb
Instance racdb1 is running on node linux1
Instance racdb2 is running on node linux2
$ srvctl stop instance -d racdb -i racdb1 -o abort
$ srvctl status database -d racdb
Instance racdb1 is not running on node linux1
Instance racdb2 is running on node linux2
Now let's go back to our SQL session and rerun the SQL statement
in the buffer:
COLUMN instance_name FORMAT a13
COLUMN host_name FORMAT a9
COLUMN failover_method FORMAT a15
COLUMN failed_over FORMAT a11
SELECT
instance_name
, host_name
, NULL AS failover_type
, NULL AS failover_method
, NULL AS failed_over
FROM v$instance
UNION
SELECT
NULL
, NULL
, failover_type
, failover_method
, failed_over
FROM v$session
WHERE username = 'SYSTEM';
INSTANCE_NAME HOST_NAME FAILOVER_TYPE FAILOVER_METHOD FAILED_OVER
------------- --------- ------------- --------------- -----------
racdb2 linux2
SELECT BASIC YES
SQL> exit
From the above demonstration, we can see that the above
session has now been failed over to instance racdb2
on linux2.
32.
Troubleshooting
Confirm the RAC Node Name is Not Listed in Loopback Address
Ensure that the node names (linux1
or linux2) are not included
for the loopback address in the /etc/hosts file.
If the machine name is listed in the in the loopback address entry as
below:
127.0.0.1 linux1 localhost.localdomain localhost
it will need to be removed as shown
below:
127.0.0.1 localhost.localdomain localhost
If the RAC node name is listed for
the loopback address, you will receive the following error during the
RAC installation:
ORA-00603: ORACLE server session terminated by fatal error
or
ORA-29702: error occurred in Cluster Group Service operation
Confirm localhost is defined in the /etc/hosts file for the loopback address
Ensure that the entry for localhost.localdomain and localhost are
included for the loopback address in the /etc/hosts file for each of the Oracle RAC
nodes:
127.0.0.1 localhost.localdomain localhost
If an entry does not exist for localhost in the /etc/hosts
file, Oracle Clusterware will be unable to start the application resources notably the ONS process.
The error would indicate "Failed to get IP for localhost"
and will be written to the log file for ONS. For example:
CRS-0215 could not start resource 'ora.linux1.ons'. Check log file
"/u01/app/crs/log/linux1/racg/ora.linux1.ons.log"
for more details.
The ONS log file will contain lines similar to the following:
Oracle Database 10g CRS Release 10.2.0.1.0 Production Copyright 1996, 2005 Oracle. All rights reserved.
2007-04-14 13:10:02.729: [ RACG][3086871296][13316][3086871296][ora.linux1.ons]: Failed to get IP for localhost (1)
Failed to get IP for localhost (1)
Failed to get IP for localhost (1)
onsctl: ons failed to start
...
Setting the Correct Date and Time on All Cluster Nodes
During the installation of Oracle Clusterware, the Database, and the
Companion CD, the Oracle Universal Installer (OUI) first installs the
software to the local node running the installer (i.e. linux1). The
software is then copied remotely to all of the remaining nodes in the
cluster (i.e. linux2). During the remote copy process, the OUI will
execute the UNIX "tar" command on each of the
remote nodes to extract the files that were archived and copied over.
If the date and time on the node performing the install is greater than
that of the node it is copying to, the OUI will throw an error from the
"tar" command indicating it is attempting to
extract files stamped with a time in the future:
Error while copying directory
/u01/app/crs with exclude file list 'null' to nodes 'linux2'.
[PRKC-1002 : All the submitted commands did not execute successfully]
---------------------------------------------
linux2:
/bin/tar: ./bin/lsnodes: time stamp 2009-07-28 09:21:34 is 735 s in the future
/bin/tar: ./bin/olsnodes: time stamp 2009-07-28 09:21:34 is 735 s in the future
...(more errors on this node)
Please note that although this would seem like a severe error from the
OUI, it can safely be disregarded as a warning. The "tar"
command DOES actually extract the files; however, when you perform a
listing of the files (using ls -l) on the remote
node, they will be missing the time field until the time on the server
is greater than the timestamp of the file.
Before starting any of the above noted installations, ensure that each
member node of the cluster is set as closely as possible to the same
date and time. Oracle strongly recommends using the Network
Time Protocol feature of most operating systems for this
purpose, with both Oracle RAC nodes using the same reference Network Time Protocol
server.
Accessing a Network Time Protocol server, however, may not always be an
option. In this case, when manually setting the date and time for the
nodes in the cluster, ensure that the date and time of the node you are
performing the software installations from (linux1) is less than all
other nodes in the cluster (linux2). I generally use a 20 second
difference as shown in the following example:
Setting the date and time from linux1:
# date -s "7/28/2009 23:00:00"
Setting the date and time from linux2:
# date -s "7/28/2009 23:00:20"
The two-node RAC configuration described in this article does not make
use of a Network Time Protocol server.
Openfiler - Logical Volumes Not Active on Boot
One issue that I have run into several times occurs when using a USB
drive connected to the Openfiler server. When the Openfiler server is rebooted,
the system is able to recognize the USB drive however, it
is not able to load the logical volumes and writes the following message to
/var/log/messages - (also available through dmesg):
iSCSI Enterprise Target Software - version 0.4.14
iotype_init(91) register fileio
iotype_init(91) register blockio
iotype_init(91) register nullio
open_path(120) Can't open /dev/rac1/crs -2
fileio_attach(268) -2
open_path(120) Can't open /dev/rac1/asm1 -2
fileio_attach(268) -2
open_path(120) Can't open /dev/rac1/asm2 -2
fileio_attach(268) -2
open_path(120) Can't open /dev/rac1/asm3 -2
fileio_attach(268) -2
open_path(120) Can't open /dev/rac1/asm4 -2
fileio_attach(268) -2
Please note that I am not suggesting that this only occurs with USB drives
connected to the Openfiler server. It may occur with other types of drives, however
I have only seen it with USB drives!
If you do receive this error, you should first check the status of all
logical volumes using the lvscan command from the Openfiler server:
# lvscan
inactive '/dev/rac1/crs' [2.00 GB] inherit
inactive '/dev/rac1/asm1' [115.94 GB] inherit
inactive '/dev/rac1/asm2' [115.94 GB] inherit
inactive '/dev/rac1/asm3' [115.94 GB] inherit
inactive '/dev/rac1/asm4' [115.94 GB] inherit
Notice that the status for each of the logical volumes is set to inactive - (the status
for each logical volume on a working system would be set to ACTIVE).
I currently know of two methods to get Openfiler to automatically load
the logical volumes on reboot, both of which are described below.
Method 1
One of the first steps is to shutdown both of the Oracle RAC nodes in the cluster - (linux1
and linux2). Then, from the Openfiler server, manually set each of the logical
volumes to ACTIVE for each consecutive reboot:
# lvchange -a y /dev/rac1/crs
# lvchange -a y /dev/rac1/asm1
# lvchange -a y /dev/rac1/asm2
# lvchange -a y /dev/rac1/asm3
# lvchange -a y /dev/rac1/asm4
Another method to set the status to active for all logical volumes is to use the Volume Group
change command as follows:
# vgscan
Reading all physical volumes. This may take a while...
Found volume group "rac1" using metadata type lvm2
# vgchange -ay
5 logical volume(s) in volume group "rac1" now active
After setting each of the logical volumes to active, use the lvscan command
again to verify the status:
# lvscan
ACTIVE '/dev/rac1/crs' [2.00 GB] inherit
ACTIVE '/dev/rac1/asm1' [115.94 GB] inherit
ACTIVE '/dev/rac1/asm2' [115.94 GB] inherit
ACTIVE '/dev/rac1/asm3' [115.94 GB] inherit
ACTIVE '/dev/rac1/asm4' [115.94 GB] inherit
As a final test, reboot the Openfiler server to ensure each of the logical volumes
will be set to ACTIVE after the boot process. After you have verified that
each of the logical volumes will be active on boot, check that the iSCSI
target service is running:
# service iscsi-target status
ietd (pid 2668) is running...
Finally, restart each of the Oracle RAC nodes in the
cluster - (linux1 and linux2).
Method 2
This method was kindly provided by
Martin Jones.
His workaround includes amending the /etc/rc.sysinit script to
basically wait for the USB disk (/dev/sda in my example) to be detected.
After making the changes to the /etc/rc.sysinit script (described below),
verify the external drives are powered on and then reboot the Openfiler server.
The following is a small portion of the /etc/rc.sysinit script on the Openfiler
server with the changes (highlighted in blue) proposed by Martin:
..............................................................
# LVM2 initialization, take 2
if [ -c /dev/mapper/control ]; then
if [ -x /sbin/multipath.static ]; then
modprobe dm-multipath >/dev/null 2>&1
/sbin/multipath.static -v 0
if [ -x /sbin/kpartx ]; then
/sbin/dmsetup ls --target multipath --exec "/sbin/kpartx -a"
fi
fi
if [ -x /sbin/dmraid ]; then
modprobe dm-mirror > /dev/null 2>&1
/sbin/dmraid -i -a y
fi
#-----
#----- MJONES - Customisation Start
#-----
# Check if /dev/sda is ready
while [ ! -e /dev/sda ]
do
echo "Device /dev/sda for first USB Drive is not yet ready."
echo "Waiting..."
sleep 5
done
echo "INFO - Device /dev/sda for first USB Drive is ready."
#-----
#----- MJONES - Customisation END
#-----
if [ -x /sbin/lvm.static ]; then
if /sbin/lvm.static vgscan > /dev/null 2>&1 ; then
action $"Setting up Logical Volume
Management:" /sbin/lvm.static vgscan --mknodes --ignorelockingfailure &&
/sbin/lvm.static vgchange -a y --ignorelockingfailure
fi
fi
fi
# Clean up SELinux labels
if [ -n "$SELINUX" ]; then
for file in /etc/mtab /etc/ld.so.cache ; do
[ -r $file ] && restorecon $file >/dev/null 2>&1
done
fi
..............................................................
Finally, restart each of the Oracle RAC nodes in the
cluster - (linux1 and linux2).
OCFS2 - o2cb_ctl: Unable to access cluster service while creating node
While configuring the nodes for OCFS2 using ocfs2console, it is possible to run into the error:
o2cb_ctl: Unable to access cluster service while creating node
This error does not show up when you startup ocfs2console for the first time. This message
comes up when there is a problem with the cluster configuration or if you do not save the
cluster configuration initially while setting it up using ocfs2console. This is a bug!
The work-around is to exit from the ocfs2console, unload the o2cb module and remove the
ocfs2 cluster configuration file /etc/ocfs2/cluster.conf. I also like to remove the
/config directory. After removing the ocfs2 cluster configuration file, restart
the ocfs2console program.
For example:
# /etc/init.d/o2cb offline ocfs2
# /etc/init.d/o2cb unload
Unmounting ocfs2_dlmfs filesystem: OK
Unloading module "ocfs2_dlmfs": OK
Unmounting configfs filesystem: OK
Unloading module "configfs": OK
# rm -f /etc/ocfs2/cluster.conf
# rm -rf /config
# ocfs2console &
This time, it will add the nodes!
OCFS2 - Adjusting the O2CB Heartbeat Threshold
With previous versions of this article, (using FireWire as opposed to
iSCSI for the shared storage), I was able to install and configure OCFS2, format the new
volume, and finally install Oracle Clusterware (with its two required shared files;
the voting disk and OCR file), located on the new OCFS2 volume. While I was able to
install Oracle Clusterware and see the shared drive using FireWire, however, I was
receiving many lock-ups and hanging after about 15 minutes when the Clusterware software
was running on both nodes. It always varied on which node would hang (either
linux1 or linux2 in my example). It also didn't matter whether there
was a high I/O load or none at all for it to crash (hang).
After looking through the trace files for OCFS2, it was apparent that access to the
voting disk was too slow (exceeding the O2CB heartbeat threshold) and causing the
Oracle Clusterware software (and the node) to crash. On the console would be a message
similar to the following:
...
Index 0: took 0 ms to do submit_bio for read
Index 1: took 3 ms to do waiting for read completion
Index 2: took 0 ms to do bio alloc write
Index 3: took 0 ms to do bio add page write
Index 4: took 0 ms to do submit_bio for write
Index 5: took 0 ms to do checking slots
Index 6: took 4 ms to do waiting for write completion
Index 7: took 1993 ms to do msleep
Index 8: took 0 ms to do allocating bios for read
Index 9: took 0 ms to do bio alloc read
Index 10: took 0 ms to do bio add page read
Index 11: took 0 ms to do submit_bio for read
Index 12: took 10006 ms to do waiting for read completion
(13,3):o2hb_stop_all_regions:1888 ERROR: stopping heartbeat on all active regions.
Kernel panic - not syncing: ocfs2 is very sorry to be fencing this system by panicing
The solution I used was to increase the O2CB heartbeat threshold from its
default value of 31 (which used to be 7 in previous versions of OCFS2), to a value of 61.
Some setups may require an even higher setting. This is
a configurable parameter that is used to compute the time it takes for a node
to "fence" itself. During the installation and configuration of OCFS2, we adjusted
this value in the section
"Configure O2CB to Start on Boot and Adjust O2CB Heartbeat Threshold".
If you encounter a kernel panic from OCFS2 and need to increase the heartbeat threshold, use
the same procedures described in the section
"Configure O2CB to Start on Boot and Adjust O2CB Heartbeat Threshold".
The following
describes how to manually adjust the O2CB heartbeat threshold.
First, let's see how to determine what the O2CB heartbeat threshold is currently set to.
This can be done by querying the /proc file system as follows:
# cat /proc/fs/ocfs2_nodemanager/hb_dead_threshold
31
We see that the value is 31, but what does this value represent? Well,
it is used in the formula below to determine the fence time (in seconds):
[fence time in seconds] = (O2CB_HEARTBEAT_THRESHOLD - 1) * 2
So, with an O2CB heartbeat threshold of 31, we would have a fence time of:
(31 - 1) * 2 = 60 seconds
If we want a larger threshold (say 120 seconds), we would need
to adjust O2CB_HEARTBEAT_THRESHOLD to 61 as shown below:
(61 - 1) * 2 = 120 seconds
Let's see now how to manually increase the O2CB heartbeat threshold from
31 to 61. This task will need to be performed on all Oracle RAC nodes in the cluster.
We first need to modify the file /etc/sysconfig/o2cb
and set O2CB_HEARTBEAT_THRESHOLD to 61:
#
# This is a configuration file for automatic startup of the O2CB
# driver. It is generated by running /etc/init.d/o2cb configure.
# Please use that method to modify this file
#
# O2CB_ENABELED: 'true' means to load the driver on boot.
O2CB_ENABLED=true
# O2CB_BOOTCLUSTER: If not empty, the name of a cluster to start.
O2CB_BOOTCLUSTER=ocfs2
# O2CB_HEARTBEAT_THRESHOLD: Iterations before a node is considered dead.
O2CB_HEARTBEAT_THRESHOLD=61
# O2CB_IDLE_TIMEOUT_MS: Time in ms before a network connection is considered dead.
O2CB_IDLE_TIMEOUT_MS=30000
# O2CB_KEEPALIVE_DELAY_MS: Max time in ms before a keepalive packet is sent
O2CB_KEEPALIVE_DELAY_MS=2000
# O2CB_RECONNECT_DELAY_MS: Min time in ms between connection attempts
O2CB_RECONNECT_DELAY_MS=2000
After modifying the file /etc/sysconfig/o2cb, we need to
alter the o2cb configuration. Again, this should be performed
on all Oracle RAC nodes in the cluster.
# umount /u02
# /etc/init.d/o2cb offline ocfs2
# /etc/init.d/o2cb unload
# /etc/init.d/o2cb configure
Configuring the O2CB driver.
This will configure the on-boot properties of the O2CB driver.
The following questions will determine whether the driver is loaded on
boot. The current values will be shown in brackets ('[]'). Hitting
<ENTER> without typing an answer will keep that current value. Ctrl-C
will abort.
Load O2CB driver on boot (y/n) [n]: y
Cluster to start on boot (Enter "none" to clear) [ocfs2]: ocfs2
Specify heartbeat dead threshold (>=7) [31]: 61
Specify network idle timeout in ms (>=5000) [30000]: 30000
Specify network keepalive delay in ms (>=1000) [2000]: 2000
Specify network reconnect delay in ms (>=2000) [2000]: 2000
Writing O2CB configuration: OK
Loading module "configfs": OK
Mounting configfs filesystem at /sys/kernel/config: OK
Loading module "ocfs2_nodemanager": OK
Loading module "ocfs2_dlm": OK
Loading module "ocfs2_dlmfs": OK
Mounting ocfs2_dlmfs filesystem at /dlm: OK
Starting O2CB cluster ocfs2: OK
We can now check again to make sure the settings took place
in for the o2cb cluster stack:
# cat /proc/fs/ocfs2_nodemanager/hb_dead_threshold
61
It is important to note that the value of 61 I used for the
O2CB heartbeat threshold may not work for all configurations.
In some cases, the O2CB heartbeat
threshold value may have to be increased to as high as 601 in order to prevent
OCFS2 from panicking the kernel.
Oracle Clusterware Installation: Running root.sh Fails on the Last Node
After the Oracle Clusterware install process, running
root.sh on the last node will fail while attempting
to configure vipca at the end of the script:
Oracle CRS stack installed and running under init(1M)
Running vipca(silent) for configuring nodeapps
/u01/app/crs/jdk/jre//bin/java: error while loading
shared libraries: libpthread.so.0:
cannot open shared object file: No such file or directory
After receiving this error, please leave the OUI up. Do not hit the OK button on the
"Execute Configuration Scripts" dialog until all of the
issues described in this section have been resolved.
Note that srvctl will produce similar output
until the workaround described below is performed.
This error occurs because these releases of the Linux kernel
fix an old bug in the Linux threading that Oracle worked around
using LD_ASSUME_KERNEL settings in both vipca
and srvctl, this workaround is no longer valid on
OEL5 or RHEL5 or SLES10 hence the failures.
To workaround this issue, edit vipca (in the CRS bin directory
on all nodes) to undo the setting of LD_ASSUME_KERNEL.
After the IF statement around line 120, add an unset
command to ensure LD_ASSUME_KERNEL is not set as follows:
if [ "$arch" = "i686" -o "$arch" = "ia64" ]
then
LD_ASSUME_KERNEL=2.4.19
export LD_ASSUME_KERNEL
fi
unset LD_ASSUME_KERNEL <<== Line to be added
|
Similarly for srvctl (in both the CRS and,
when installed, RDBMS and ASM bin directories on all nodes),
unset LD_ASSUME_KERNEL by adding one line, around
line 168 should look like this:
LD_ASSUME_KERNEL=2.4.19
export LD_ASSUME_KERNEL
unset LD_ASSUME_KERNEL <<== Line to be added
|
Note: Remember to re-edit these files on all nodes:
<ORA_CRS_HOME>/bin/vipca
<ORA_CRS_HOME>/bin/srvctl
<RDBMS_HOME>/bin/srvctl
<ASM_HOME>/bin/srvctl # (If exists)
after applying the 10.2.0.2 or 10.2.0.3 patch sets, as these patchset will
still include those settings unnecessary for OEL5 or RHEL5 or SLES10.
This issue was raised with development and is fixed in the
10.2.0.4 patchsets.
Also note that we are explicitly unsetting LD_ASSUME_KERNEL
and not merely commenting out its setting to handle a case where
the user has it set in their environment (login shell).
After working around the LD_ASSUME_KERNEL issue above, vipca will now fail
to run with the following error if the VIP IP's are in a non-routable
range [10.x.x.x, 172.(16-31).x.x, or 192.168.x.x]:
[root@linux2 ~]# $ORA_CRS_HOME/bin/vipca
Error 0(Native: listNetInterfaces:[3])
[Error 0(Native: listNetInterfaces:[3])]
There are several ways to workaround this issue. The goal to this workaround
is to get the output of "$ORA_CRS_HOME/bin/oifcfg getif" to include both
public and cluster_interconnect interfaces. If you try to run the above
command, you will notice it provides nothing which means we have some work
to do!
The first step is to identify the current interfaces and IP addresses:
[root@linux2 ~]# $ORA_CRS_HOME/bin/oifcfg iflist
eth1 192.168.2.0
eth0 192.168.1.0
Remember during the Oracle Clusterware install that
192.168.1.0 is my public interface while
192.168.2.0 is the cluster_interconnect interface.
Using this information, we can manually set the public / private
interfaces accordingly using the setif option of the
$ORA_CRS_HOME/bin/oifcfg command:
# $ORA_CRS_HOME/bin/oifcfg setif -global eth0/192.168.1.0:public
# $ORA_CRS_HOME/bin/oifcfg setif -global eth1/192.168.2.0:cluster_interconnect
Let's know run the "$ORA_CRS_HOME/bin/oifcfg getif" command again
to verify its output:
[root@linux2 ~]# $ORA_CRS_HOME/bin/oifcfg getif
eth0 192.168.1.0 global public
eth1 192.168.2.0 global cluster_interconnect
After resolving all of the issues above, manually
re-run vipca (GUI) as root from the last node in which the
errors occurred. Please keep in mind that vipca is a GUI
and will need to set your DISPLAY variable accordingly
to your X server:
# $ORA_CRS_HOME/bin/vipca
33.
Conclusion
Oracle10g RAC allows the DBA to configure a database solution with superior
fault tolerance and load balancing. For those DBAs, however, that want to
become more familiar with the features and benefits of Oracle10g RAC will
find the costs of configuring even a small RAC cluster costing in the
range of US$15,000 to US$20,000.
This article has hopefully given you an economical solution to setting up
and configuring an inexpensive Oracle10g Release 2 RAC Cluster using Oracle Enterprise Linux
and iSCSI technology. The RAC solution presented in this article can be
put together for around US$2,700 and will provide the DBA with a fully
functional Oracle10g Release 2 RAC cluster. While the hardware used for this
article should be stable enough for educational purposes, it should never be considered for a
production environment.
34.
Acknowledgements
An article of this magnitude and complexity is generally not the work
of one person alone. Although I was able to author and successfully
demonstrate the validity of the components that make up this configuration,
there are several other individuals that deserve credit in making this
article a success.
First, I would like to thank
Bane Radulovic from
the Server BDE Team at Oracle. Bane not only introduced me to Openfiler,
but shared with me his experience and knowledge of the product and how to best
utilize it for Oracle RAC.
His research and hard work made the task of configuring Openfiler seamless.
Bane was also involved with hardware recommendations and testing.
A special thanks to K Gopalakrishnan for his assistance in delivering
the Oracle RAC 10g Overview section of this article.
In this section, much of the content regarding the history of Oracle RAC
can be found in his very popular book
Oracle Database 10g Real Application Clusters Handbook.
This book comes highly recommended for both DBAs and Developers wanting to
successfully implement Oracle RAC and fully understand how many of the advanced
services like Cache Fusion and Global Resource Directory operate.
I would next like to thank Oracle ACE Werner Puschitz for his outstanding work on
"Installing Oracle Database 10g with Real Application Cluster (RAC) on
Red Hat Enterprise Linux Advanced Server 3". This article,
along with several others of his, provided information on Oracle RAC that could not
be found in any other Oracle documentation. Without his hard work and research
into issues like configuring OCFS2 and ASMLib,
this article may have never come to fruition. If you are interested in examining
technical articles on Linux internals and in-depth Oracle configurations written
by Werner Puschitz, please visit his website at
www.puschitz.com.
Also, thanks to Tzvika Lemel
for his comments and suggestions on using Oracle's Cluster Verification Utility (CVU).
Lastly, I would like to express my appreciation to the following vendors for generously supplying the
hardware for this article; Stallard Technologies, Inc.,
Seagate,
Avocent Corporation,
Intel,
D-Link,
SIIG,
and
LaCie.
Jeffrey Hunter
[www.idevelopment.info]
graduated from Stanislaus State University in Turlock, California, with a Bachelor's degree in
Computer Science and has been a senior DBA and software engineer for over 16 years. He is an
Oracle Certified Professional, Java Development Certified Professional, an Oracle ACE, and author, and
currently works as a Senior Database Administrator for
The DBA Zone, Inc.
Jeff's work includes advanced performance tuning, Java and PL/SQL programming, capacity planning,
database security, and physical / logical database design in a UNIX, Linux, and
Windows NT environment. Jeff's other interests include mathematical encryption theory, programming
language processors (compilers and interpreters) in Java and C, LDAP, writing web-based database
administration tools, and of course Linux.
Send us your comments
Page
1 Page 2 Page
3
|