
Build Your Own Oracle RAC Cluster on Oracle Enterprise Linux and iSCSI (Continued)

The information in this guide is not validated by Oracle, is not supported by Oracle, and should only be used at your own risk; it is for educational purposes only.


21. Install Oracle 10g Clusterware Software

Perform the following installation procedures from only one of the Oracle RAC nodes in the cluster (linux1)! The Oracle Clusterware software will be installed to both of the Oracle RAC nodes in the cluster by the Oracle Universal Installer.

You are now ready to install the "cluster" part of the environment: the Oracle Clusterware. In the previous section, you downloaded and extracted the install files for Oracle Clusterware to linux1 in the directory /home/oracle/orainstall/clusterware. This is the only node from which you need to perform the install.

During the installation of Oracle Clusterware, you will be asked for the nodes to configure in the RAC cluster. Once the actual installation starts, it will copy the required software to all nodes using the remote access we configured in Section 15 ("Configure RAC Nodes for Remote Access using SSH").

So, what exactly is the Oracle Clusterware responsible for? It contains all of the cluster and database configuration metadata along with several system management features for RAC. It allows the DBA to register and invite an Oracle instance (or instances) to the cluster. During normal operation, Oracle Clusterware will send messages (via a special ping operation) to all nodes configured in the cluster, often called the "heartbeat." If the heartbeat fails for any of the nodes, it checks with the Oracle Clusterware configuration files (on the shared disk) to distinguish between a real node failure and a network failure.

After installing Oracle Clusterware, the Oracle Universal Installer (OUI) used to install the Oracle Database software (next section) will automatically recognize these nodes. Like the Oracle Clusterware install you will be performing in this section, the Oracle Database installer only needs to be run from one node. The OUI will copy the software packages to all nodes configured in the RAC cluster.

Oracle Clusterware Shared Files

The two shared files (actually file groups) used by Oracle Clusterware will be stored on the Oracle Cluster File System, Release 2 (OCFS2) we created earlier. The two shared Oracle Clusterware file groups are:

 

  • Oracle Cluster Registry (OCR)

     

    • File 1 : /u02/oradata/racdb/OCRFile
    • File 2 : /u02/oradata/racdb/OCRFile_mirror
    • Size : (2 * 100MB) = 200MB

     

  • CRS Voting Disk

     

    • File 1 : /u02/oradata/racdb/CSSFile
    • File 2 : /u02/oradata/racdb/CSSFile_mirror1
    • File 3 : /u02/oradata/racdb/CSSFile_mirror2
    • Size : (3 * 20MB) = 60MB


Note: It is not possible to use Automatic Storage Management (ASM) for the two shared Oracle Clusterware files: Oracle Cluster Registry (OCR) or the CRS Voting Disk files. The problem is that these files need to be in place and accessible BEFORE any Oracle instances can be started. For ASM to be available, the ASM instance would need to be run first.

Also note that the two shared files could be stored on the OCFS2, shared RAW devices, or another vendor's clustered file system.
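Before launching the installer, it is worth confirming on each node that the OCFS2 mount hosting these shared files is actually mounted and writable. Below is a minimal sketch; the check_shared_mount helper name is invented for this article, and the /u02 mount point follows the article's conventions:

```shell
#!/bin/sh
# check_shared_mount: verify a mount point is an ocfs2 filesystem and
# writable. The /u02 path is this article's convention (an assumption).
check_shared_mount() {
    mnt=$1
    fstype=$(awk -v m="$mnt" '$2 == m {print $3; exit}' /proc/mounts)
    if [ "$fstype" != "ocfs2" ]; then
        echo "$mnt: not ocfs2 (found: ${fstype:-not mounted})"
        return 1
    fi
    probe="$mnt/.cw_probe.$$"
    if touch "$probe" 2>/dev/null; then
        rm -f "$probe"
        echo "$mnt: ocfs2 and writable"
    else
        echo "$mnt: ocfs2 but NOT writable"
        return 1
    fi
}

check_shared_mount /u02 || true    # run this on each node (linux1, linux2)
```

Run the script on both linux1 and linux2; each should report the mount as ocfs2 and writable before continuing.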

Verifying Terminal Shell Environment

Before starting the Oracle Universal Installer, first verify that you are logged onto the server you will be running the installer from (i.e. linux1), then run the xhost command as root from the console to allow X Server connections. Next, log in as the oracle user account. If you are using a remote client to connect to the node performing the installation (SSH or Telnet to linux1 from a workstation configured with an X Server), you will need to set the DISPLAY variable to point to your local workstation. Finally, verify remote access / user equivalence to all nodes in the cluster:

Verify Server and Enable X Server Access

 

# hostname
linux1

# xhost +
access control disabled, clients can connect from any host

Login as the oracle User Account and Set DISPLAY (if necessary)

 

# su - oracle
$ # IF YOU ARE USING A REMOTE CLIENT TO CONNECT TO THE
$ # NODE PERFORMING THE INSTALL
$ DISPLAY=<your local workstation>:0.0
$ export DISPLAY

Verify Remote Access / User Equivalence

Verify you are able to run the Secure Shell commands ( ssh or scp) on the Linux server you will be running the Oracle Universal Installer from against all other Linux servers in the cluster without being prompted for a password. When using the secure shell method, user equivalence will need to be enabled for the terminal shell session before attempting to run the OUI. To enable user equivalence for the current terminal shell session, perform the following steps remembering to enter the pass phrase for the RSA key you generated when prompted:

$ exec /usr/bin/ssh-agent $SHELL
$ /usr/bin/ssh-add
Enter passphrase for /home/oracle/.ssh/id_rsa: xxxxx
Identity added: /home/oracle/.ssh/id_rsa (/home/oracle/.ssh/id_rsa)

$ ssh linux1 "date;hostname"
Fri Jul 31 22:28:43 EDT 2009
linux1

$ ssh linux2 "date;hostname"
Fri Jul 31 22:29:00 EDT 2009
linux2
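The per-node checks above can also be wrapped in a small loop that fails fast instead of prompting for a password; the check_equiv helper name is invented here, and the node names follow this article's conventions:

```shell
#!/bin/sh
# check_equiv: confirm passwordless ssh to a node. BatchMode makes ssh
# fail immediately rather than prompt if user equivalence is not in place.
check_equiv() {
    node=$1
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$node" true >/dev/null 2>&1; then
        echo "$node: user equivalence OK"
    else
        echo "$node: user equivalence FAILED"
        return 1
    fi
}

for node in linux1 linux2; do
    check_equiv "$node" || true
done
```

Both nodes must report OK before you launch the OUI from this shell session.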

Installing Cluster Ready Services

Perform the following tasks to install the Oracle Clusterware:

$ cd ~oracle
$ /home/oracle/orainstall/clusterware/runInstaller -ignoreSysPrereqs

 

Screen Name Response
Welcome Screen Click Next
Specify Inventory directory and credentials Accept the default values:
   Inventory directory: /u01/app/oracle/oraInventory
   Operating System group name: oinstall
Specify Home Details Set the Name and Path for the ORACLE_HOME (actually the $ORA_CRS_HOME that I will be using in this article) as follows:
   Name: OraCrs10g_home
   Path: /u01/app/crs
Product-Specific Prerequisite Checks The installer will run through a series of checks to determine if the node meets the minimum requirements for installing and configuring the Oracle Clusterware software. If any of the checks fail, you will need to manually verify the check that failed by clicking on the checkbox. For my installation, all checks passed with no problems.

Click Next to continue.

Specify Cluster Configuration Cluster Name: crs

   Public Node Name   Private Node Name   Virtual Node Name
   linux1             linux1-priv         linux1-vip
   linux2             linux2-priv         linux2-vip

Specify Network Interface Usage
   Interface Name   Subnet        Interface Type
   eth0             192.168.1.0   Public
   eth1             192.168.2.0   Private
Specify OCR Location Starting with Oracle Database 10g Release 2 (10.2) with RAC, Oracle Clusterware provides for the creation of a mirrored OCR file, enhancing cluster reliability. For the purpose of this example, I chose to mirror the OCR file by keeping the default option of "Normal Redundancy":

Specify OCR Location: /u02/oradata/racdb/OCRFile
Specify OCR Mirror Location: /u02/oradata/racdb/OCRFile_mirror

Specify Voting Disk Location Starting with Oracle Database 10g Release 2 (10.2) with RAC, CSS has been modified to allow you to configure multiple voting disks. In Release 1 (10.1), you could configure only one voting disk. With multiple voting disks, you can configure a RAC database with redundant voting disks on independent shared physical disks. This option facilitates the use of the iSCSI network protocol and other Network Attached Storage (NAS) solutions. Note that to take advantage of the benefits of multiple voting disks, you must configure at least three. For the purpose of this example, I chose to mirror the voting disk by keeping the default option of "Normal Redundancy":

Voting Disk Location: /u02/oradata/racdb/CSSFile
Additional Voting Disk 1 Location: /u02/oradata/racdb/CSSFile_mirror1
Additional Voting Disk 2 Location: /u02/oradata/racdb/CSSFile_mirror2

Summary

Click Install to start the installation!

Execute Configuration Scripts

orainstRoot.sh

After the installation has completed, you will be prompted to run the orainstRoot.sh and root.sh scripts. Open a new console window on each Oracle RAC node in the cluster (starting with the node you are performing the install from) as the "root" user account.

Navigate to the /u01/app/oracle/oraInventory directory and run orainstRoot.sh on all nodes in the RAC cluster.

Note: After executing orainstRoot.sh on both nodes, verify that the permissions on the file /etc/oraInst.loc are 644 (-rw-r--r--) and that it is owned by root. Problems can occur during the installation of Oracle if the oracle user account does not have read permission on this file (the location of the oraInventory directory cannot be determined). For example, during the Oracle Clusterware post-installation process (while running the Oracle Clusterware Verification Utility), the following error will occur: "CRS is not installed on any of the nodes." If the permissions on /etc/oraInst.loc are not set correctly, it is possible you didn't run orainstRoot.sh on both nodes before running root.sh. The umask setting may also be off; it should be 0022. Run the following on both nodes in the RAC cluster to correct this problem:

# chmod 644 /etc/oraInst.loc
# ls -l /etc/oraInst.loc
-rw-r--r-- 1 root root 63 Aug  1 12:31 /etc/oraInst.loc
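If you need to repeat this permission check (for example, after re-running orainstRoot.sh), it can be scripted. The check_orainst helper name is invented for this article, and stat -c assumes GNU coreutils (standard on Enterprise Linux):

```shell
#!/bin/sh
# check_orainst: verify a file exists, is mode 644, and therefore
# readable by the oracle user. Intended for /etc/oraInst.loc; run on
# BOTH RAC nodes.
check_orainst() {
    f=$1
    if [ ! -e "$f" ]; then
        echo "$f: missing"
        return 1
    fi
    mode=$(stat -c '%a' "$f")          # GNU coreutils stat
    if [ "$mode" != "644" ]; then
        echo "$f: mode $mode (expected 644)"
        return 1
    fi
    echo "$f: OK (mode 644)"
}

check_orainst /etc/oraInst.loc || true
```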

 




root.sh

Within the same new console window on both Oracle RAC nodes in the cluster (starting with the node you are performing the install from), stay logged in as the "root" user account.

Navigate to the /u01/app/crs directory and locate the root.sh file for each node in the cluster - (starting with the node you are performing the install from). Run the root.sh file ON ALL NODES in the RAC cluster ONE AT A TIME.

If the Oracle Clusterware home directory is a subdirectory of the ORACLE_BASE directory (which it should never be!), you will receive several warnings regarding permissions while running the root.sh script on both nodes. These warnings can be safely ignored.

The root.sh script may take a while to run.

Running root.sh on the Last Node will Fail!

After receiving any of the errors described in this section, please leave the OUI up. Do not hit the OK button on the "Execute Configuration Scripts" dialog until all of the issues described in this section have been resolved.

Issue #1

At the end of root.sh on the last node, vipca will fail to run with the following error:

                                         
Oracle CRS stack installed and running under init(1M)
Running vipca(silent) for configuring nodeapps
/u01/app/crs/jdk/jre//bin/java: error while loading shared libraries:
libpthread.so.0: cannot open shared object file: No such file or directory

Note that srvctl will produce similar output until the workaround described below is performed.

This error occurs because these releases of the Linux kernel fix an old bug in Linux threading that Oracle worked around by setting LD_ASSUME_KERNEL in both vipca and srvctl. That workaround is no longer valid on OEL5, RHEL5, or SLES10, hence the failures.

To work around Issue #1, edit vipca (in the CRS bin directory on all nodes) to undo the setting of LD_ASSUME_KERNEL. After the IF statement around line 120, add an unset command to ensure LD_ASSUME_KERNEL is not set, as follows:

 

if [ "$arch" = "i686" -o "$arch" = "ia64" ]
then
    LD_ASSUME_KERNEL=2.4.19
    export LD_ASSUME_KERNEL
fi

unset LD_ASSUME_KERNEL    <<== Line to be added


Similarly for srvctl (in both the CRS and, when installed, RDBMS and ASM bin directories on all nodes), unset LD_ASSUME_KERNEL by adding one line; around line 168 it should look like this:

 

LD_ASSUME_KERNEL=2.4.19
export LD_ASSUME_KERNEL

unset LD_ASSUME_KERNEL    <<== Line to be added


Note: Remember to re-edit these files on all nodes:

  <ORA_CRS_HOME>/bin/vipca
  <ORA_CRS_HOME>/bin/srvctl 
  <RDBMS_HOME>/bin/srvctl
  <ASM_HOME>/bin/srvctl  # (If exists)

after applying the 10.2.0.2 or 10.2.0.3 patch sets, as these patch sets still include those settings, which are unnecessary on OEL5, RHEL5, and SLES10. This issue was raised with development and is fixed in the 10.2.0.4 patch set.

Also note that we are explicitly unsetting LD_ASSUME_KERNEL and not merely commenting out its setting to handle a case where the user has it set in their environment (login shell).
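Rather than editing each copy of vipca and srvctl by hand, the one-line change can be scripted with sed. This is only a sketch: the add_unset helper name is invented for this article, the file list assumes the paths used throughout, and you should review each file by hand afterwards:

```shell
#!/bin/sh
# add_unset: insert "unset LD_ASSUME_KERNEL" immediately after each
# "export LD_ASSUME_KERNEL" line in a wrapper script, once per file.
add_unset() {
    f=$1
    grep -q 'unset LD_ASSUME_KERNEL' "$f" && return 0   # already patched
    sed -i '/export LD_ASSUME_KERNEL/a unset LD_ASSUME_KERNEL' "$f"
}

# Paths follow this article's conventions; run as root on EVERY node.
for f in /u01/app/crs/bin/vipca \
         /u01/app/crs/bin/srvctl \
         /u01/app/oracle/product/10.2.0/db_1/bin/srvctl; do
    if [ -w "$f" ]; then
        add_unset "$f"
    fi
done
```

Run it as root on every node, and again after applying the 10.2.0.2 or 10.2.0.3 patch sets, which re-introduce the setting.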

Issue #2

After working around Issue #1 above, vipca will fail with the following error if the VIP addresses are in a non-routable range [10.x.x.x, 172.(16-31).x.x, or 192.168.x.x]:

                                         
[root@linux2 ~]# $ORA_CRS_HOME/bin/vipca
Error 0(Native: listNetInterfaces:[3])
  [Error 0(Native: listNetInterfaces:[3])]

There are several ways to work around Issue #2. The goal of this workaround is to get the output of "$ORA_CRS_HOME/bin/oifcfg getif" to include both the public and cluster_interconnect interfaces. If you run that command now, you will notice it returns nothing, which means we have some work to do!

The first step is to identify the current interfaces and IP addresses:

[root@linux2 ~]# $ORA_CRS_HOME/bin/oifcfg iflist
eth1  192.168.2.0
eth0  192.168.1.0

Remember during the Oracle Clusterware install that 192.168.1.0 is my public interface while 192.168.2.0 is the cluster_interconnect interface.

Using this information, we can manually set the public / private interfaces accordingly using the setif option of the $ORA_CRS_HOME/bin/oifcfg command:

# $ORA_CRS_HOME/bin/oifcfg setif -global eth0/192.168.1.0:public
# $ORA_CRS_HOME/bin/oifcfg setif -global eth1/192.168.2.0:cluster_interconnect

Let's now run the "$ORA_CRS_HOME/bin/oifcfg getif" command again to verify its output:

[root@linux2 ~]# $ORA_CRS_HOME/bin/oifcfg getif
eth0  192.168.1.0  global  public
eth1  192.168.2.0  global  cluster_interconnect

Manually Run vipca from Last Node

After resolving all of the issues above, manually re-run vipca (GUI) as root from the last node, where the errors occurred. Keep in mind that vipca is a GUI, so you will need to set your DISPLAY variable to point to your X server:

# $ORA_CRS_HOME/bin/vipca

When the "VIP Configuration Assistant" appears, this is how I answered the screen prompts:

   Welcome: Click Next
   Network interfaces: Select only the public interface - eth0
   Virtual IPs for cluster nodes:
       Node Name: linux1
       IP Alias Name: linux1-vip
       IP Address: 192.168.1.200
       Subnet Mask: 255.255.255.0

       Node Name: linux2
       IP Alias Name: linux2-vip
       IP Address: 192.168.1.201
       Subnet Mask: 255.255.255.0

   Summary: Click Finish
   Configuration Assistant Progress Dialog: Click OK after configuration is complete.
   Configuration Results: Click Exit

Go back to the OUI and acknowledge the "Execute Configuration scripts" dialog window.

End of installation At the end of the installation, exit from the OUI.

Verify Oracle Clusterware Installation

After the installation of Oracle Clusterware, we can run through several tests to verify the install was successful. Run the following commands on both nodes in the RAC cluster.

Check cluster nodes

$ /u01/app/crs/bin/olsnodes -n
linux1 1
linux2 2

Confirm Oracle Clusterware Function

$ $ORA_CRS_HOME/bin/crs_stat -t -v
Name           Type           R/RA   F/FT   Target    State     Host
----------------------------------------------------------------------
ora.linux1.gsd application    0/5    0/0    ONLINE    ONLINE    linux1
ora.linux1.ons application    0/3    0/0    ONLINE    ONLINE    linux1
ora.linux1.vip application    0/0    0/0    ONLINE    ONLINE    linux1
ora.linux2.gsd application    0/5    0/0    ONLINE    ONLINE    linux2
ora.linux2.ons application    0/3    0/0    ONLINE    ONLINE    linux2
ora.linux2.vip application    0/0    0/0    ONLINE    ONLINE    linux2

Check CRS Status

$ $ORA_CRS_HOME/bin/crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy
Check Oracle Clusterware Auto-Start Scripts

$ ls -l /etc/init.d/init.*
-r-xr-xr-x 1 root root  1951 Aug  1 12:49 /etc/init.d/init.crs
-r-xr-xr-x 1 root root  4699 Aug  1 12:49 /etc/init.d/init.crsd
-r-xr-xr-x 1 root root 35379 Aug  1 12:49 /etc/init.d/init.cssd
-r-xr-xr-x 1 root root  3175 Aug  1 12:49 /etc/init.d/init.evmd

 


22. Install Oracle Database 10g Software

Perform the following installation procedures from only one of the Oracle RAC nodes in the cluster (linux1)! The Oracle Database software will be installed to both of the Oracle RAC nodes in the cluster by the Oracle Universal Installer.

After successfully installing the Oracle Clusterware software, the next step is to install Oracle Database 10g Release 2 (10.2.0.1.0) with RAC.

For the purpose of this example, you will forgo the "Create Database" option when installing the software. You will, instead, create the database using the Database Configuration Assistant (DBCA) after the install.

Like the Oracle Clusterware install (previous section), the Oracle Database 10g installer only needs to be run from one node. The OUI will copy the software packages to all nodes configured in the RAC cluster.

Verifying Terminal Shell Environment

As discussed in the previous section (Install Oracle Clusterware 10g Software), the terminal shell environment needs to be configured for remote access and user equivalence to all nodes in the cluster before running the Oracle Universal Installer. Note that you can use the same terminal shell session as in the previous section, in which case you do not need to perform any of the actions described below for setting up remote access and the DISPLAY variable:

Login as the oracle User Account and Set DISPLAY (if necessary)

 

# su - oracle
$ # IF YOU ARE USING A REMOTE CLIENT TO CONNECT TO THE
$ # NODE PERFORMING THE INSTALL
$ DISPLAY=<your local workstation>:0.0
$ export DISPLAY

Verify Remote Access / User Equivalence

Verify you are able to run the Secure Shell commands ( ssh or scp) on the Linux server you will be running the Oracle Universal Installer from against all other Linux servers in the cluster without being prompted for a password. When using the secure shell method, user equivalence will need to be enabled for the terminal shell session before attempting to run the OUI. To enable user equivalence for the current terminal shell session, perform the following steps remembering to enter the pass phrase for the RSA key you generated when prompted:

$ exec /usr/bin/ssh-agent $SHELL
$ /usr/bin/ssh-add
Enter passphrase for /home/oracle/.ssh/id_rsa: xxxxx
Identity added: /home/oracle/.ssh/id_rsa (/home/oracle/.ssh/id_rsa)

$ ssh linux1 "date;hostname"
Sat Aug  1 14:14:42 EDT 2009
linux1

$ ssh linux2 "date;hostname"
Sat Aug  1 14:15:16 EDT 2009
linux2

Run the Oracle Cluster Verification Utility

Before installing the Oracle Database Software, we should run the following database pre-installation check using the Cluster Verification Utility (CVU).

Note: Instructions for configuring CVU can be found in the section "Prerequisites for Using Cluster Verification Utility" discussed earlier in this article.

$ cd /home/oracle/orainstall/clusterware/cluvfy
$ CV_HOME=/home/oracle/orainstall/clusterware/cluvfy; export CV_HOME
$ CV_JDKHOME=/home/oracle/orainstall/clusterware/cluvfy/jdk14; export CV_JDKHOME
$ ./runcluvfy.sh stage -pre dbinst -n linux1,linux2 -r 10gR2 -verbose

Review the CVU report. Note that this report will contain the same errors we received when checking pre-installation tasks for CRS — failure to find a suitable set of interfaces for VIPs and the failure to find specific RPM packages that are not required with Oracle Enterprise Linux 5. These two errors can be safely ignored.

Install Oracle Database 10g Release 2 Software

Install the Oracle Database 10g Release 2 software with the following:

$ cd ~oracle
$ /home/oracle/orainstall/database/runInstaller -ignoreSysPrereqs

 

Screen Name Response
Welcome Screen Click Next
Select Installation Type Select the type of installation you would like to perform: "Enterprise Edition", "Standard Edition", or "Custom".

If you need advanced components like Oracle Label Security or if you simply want to customize the environment, select "Custom".

For the purpose of this article, I selected the "Custom" option.

Specify Home Details Set the Name and Path for the ORACLE_HOME as follows:
   Name: OraDb10g_home1
   Path: /u01/app/oracle/product/10.2.0/db_1
Specify Hardware Cluster Installation Mode Select the Cluster Installation option then select all nodes available. Click "Select All" to select all servers: linux1 and linux2.

If the installation stops here and the status of any of the RAC nodes is "Node not reachable", perform the following checks:

  • Ensure Oracle Clusterware is running on the node in question.
  • Ensure you are able to reach the node in question from the node you are performing the installation from.
Available Product Components
(Custom Database Installs Only)
Select the components that you plan on using for your database environment.
Product-Specific Prerequisite Checks The installer will run through a series of checks to determine if the node meets the minimum requirements for installing and configuring the Oracle database software. If any of the checks fail, you will need to manually verify the check that failed by clicking on the checkbox.

It is possible to receive an error about the available swap space not meeting its minimum requirements:

                                         
Checking available swap space requirements...
Expected result: 3036MB
Actual Result: 1983MB

In most cases, your available swap space will be close to the minimum requirement (as shown above) and this error can be safely ignored. Simply click the check-box for "Checking available swap space requirements..." and click Next to continue.

Privileged Operating System Groups
(Custom Database Installs Only)
Select the UNIX groups that will be used for each of the Oracle group names as follows:

   Database Administrator (OSDBA) Group: dba
   Database Operator (OSOPER) Group: oper

Create Database Select the option to "Install database Software only."

Remember that we will create the clustered database as a separate step using DBCA.

Summary

Click on Install to start the installation!

Root Script Window - Run root.sh After the installation has completed, you will be prompted to run the root.sh script. It is important to keep in mind that the root.sh script will need to be run on all nodes in the RAC cluster one at a time starting with the node you are running the database installation from.

First, open a new console window on the node you are installing the Oracle 10g database software from as the root user account. For me, this was "linux1".

Navigate to the /u01/app/oracle/product/10.2.0/db_1 directory and run root.sh.

After running the root.sh script on all nodes in the cluster, go back to the OUI and acknowledge the "Execute Configuration scripts" dialog window.

End of installation At the end of the installation, exit from the OUI.

Unset LD_ASSUME_KERNEL in SRVCTL

During the Oracle Clusterware installation in the previous section, we needed to modify both vipca and srvctl in the CRS bin directory. Those same modifications need to be performed with the new srvctl in the Database bin directory:

$ $ORACLE_HOME/bin/srvctl
/u01/app/oracle/product/10.2.0/db_1/jdk/jre/bin/java: error while loading shared libraries:
libpthread.so.0: cannot open shared object file: No such file or directory

This error occurs because these releases of the Linux kernel fix an old bug in Linux threading that Oracle worked around by setting LD_ASSUME_KERNEL in both vipca and srvctl. That workaround is no longer valid on OEL5, RHEL5, or SLES10, hence the failures.

To work around this issue, edit srvctl (in the Database bin directory on all nodes) to undo the setting of LD_ASSUME_KERNEL by adding one line, around line 168:

 

LD_ASSUME_KERNEL=2.4.19
export LD_ASSUME_KERNEL

unset LD_ASSUME_KERNEL    <<== Line to be added


Note: Remember to re-edit these files on all nodes:

  <ORA_CRS_HOME>/bin/vipca
  <ORA_CRS_HOME>/bin/srvctl 
  <RDBMS_HOME>/bin/srvctl
  <ASM_HOME>/bin/srvctl  # (If exists)

after applying the 10.2.0.2 or 10.2.0.3 patch sets, as these patch sets still include those settings, which are unnecessary on OEL5, RHEL5, and SLES10. This issue was raised with development and is fixed in the 10.2.0.4 patch set.

Also note that we are explicitly unsetting LD_ASSUME_KERNEL and not merely commenting out its setting to handle a case where the user has it set in their environment (login shell).

 


23. Install Oracle 10g Companion CD Software

Perform the following installation procedures from only one of the Oracle RAC nodes in the cluster (linux1)! The Oracle10g Companion CD software will be installed to both of the Oracle RAC nodes in the cluster by the Oracle Universal Installer.

After successfully installing the Oracle Database software, the next step is to install the Oracle Database 10g Companion CD Release 2 software (10.2.0.1.0).

Please keep in mind that this is an optional step. For the purpose of this article, my testing database will often make use of the Java Virtual Machine (Java VM) and Oracle interMedia and therefore will require the installation of the Oracle Database 10g Companion CD. The type of installation to perform will be the Oracle Database 10g Products installation type.

This installation type includes the Natively Compiled Java Libraries (NCOMP) files to improve Java performance. If you do not install the NCOMP files, the "ORA-29558: JAccelerator (NCOMP) not installed" error occurs when a database that uses Java VM is upgraded to the patch release.

Verifying Terminal Shell Environment

As discussed in the previous section (Install Oracle Database 10g Software), the terminal shell environment needs to be configured for remote access and user equivalence to all nodes in the cluster before running the Oracle Universal Installer. Note that you can use the same terminal shell session as in the previous section, in which case you do not need to perform any of the actions described below for setting up remote access and the DISPLAY variable:

Login as the oracle User Account and Set DISPLAY (if necessary)

 

# su - oracle
$ # IF YOU ARE USING A REMOTE CLIENT TO CONNECT TO THE
$ # NODE PERFORMING THE INSTALL
$ DISPLAY=<your local workstation>:0.0
$ export DISPLAY

Verify Remote Access / User Equivalence

Verify you are able to run the Secure Shell commands ( ssh or scp) on the Linux server you will be running the Oracle Universal Installer from against all other Linux servers in the cluster without being prompted for a password. When using the secure shell method, user equivalence will need to be enabled for the terminal shell session before attempting to run the OUI. To enable user equivalence for the current terminal shell session, perform the following steps remembering to enter the pass phrase for the RSA key you generated when prompted:

$ exec /usr/bin/ssh-agent $SHELL
$ /usr/bin/ssh-add
Enter passphrase for /home/oracle/.ssh/id_rsa: xxxxx
Identity added: /home/oracle/.ssh/id_rsa (/home/oracle/.ssh/id_rsa)

$ ssh linux1 "date;hostname"
Sat Aug  1 14:14:42 EDT 2009
linux1

$ ssh linux2 "date;hostname"
Sat Aug  1 14:15:16 EDT 2009
linux2

Install Oracle Database 10g Companion CD Software

Install the Oracle Database 10g Companion CD software with the following:

$ cd ~oracle
$ /home/oracle/orainstall/companion/runInstaller -ignoreSysPrereqs

 

Screen Name Response
Welcome Screen Click Next
Select a Product to Install Select the "Oracle Database 10g Products 10.2.0.1.0" option.
Specify Home Details Set the destination for the ORACLE_HOME Name and Path to that of the previous Oracle10g Database software install as follows:
   Name: OraDb10g_home1
   Path: /u01/app/oracle/product/10.2.0/db_1
Specify Hardware Cluster Installation Mode The Cluster Installation option will be selected along with all of the available nodes in the cluster by default. Stay with these default options and click Next to continue.

If the installation stops here and the status of any of the RAC nodes is "Node not reachable", perform the following checks:

  • Ensure Oracle Clusterware is running on the node in question.
  • Ensure you are able to reach the node in question from the node you are performing the installation from.
Product-Specific Prerequisite Checks The installer will run through a series of checks to determine if the node meets the minimum requirements for installing and configuring the Companion CD Software. If any of the checks fail, you will need to manually verify the check that failed by clicking on the checkbox. For my installation, all checks passed with no problems.

Click on Next to continue.

Summary On the Summary screen, click Install to start the installation!
End of installation At the end of the installation, exit from the OUI.

 


24. Create TNS Listener Process

Perform the following configuration procedures from only one of the Oracle RAC nodes in the cluster (linux1)! The Network Configuration Assistant (NETCA) will set up the TNS listener in a clustered configuration on both of the Oracle RAC nodes in the cluster.

The DBCA requires the Oracle TNS Listener process to be configured and running on all nodes in the RAC cluster before it can create the clustered database.

The process of creating the TNS listener only needs to be performed from one node in the cluster; all changes will be replicated to the other nodes. On one of the nodes (I will be using linux1), bring up the NETCA, run through the process of creating a new TNS listener, and configure the node for local access.

Verifying Terminal Shell Environment

As discussed in the previous section (Install Oracle Database 10g Companion CD Software), the terminal shell environment needs to be configured for remote access and user equivalence to all nodes in the cluster before running the Oracle Universal Installer. Note that you can reuse the terminal shell session from the previous section, in which case you do not need to perform any of the actions described below for setting up remote access and the DISPLAY variable:

Login as the oracle User Account and Set DISPLAY (if necessary)

 

# su - oracle
$ # IF YOU ARE USING A REMOTE CLIENT TO CONNECT TO THE
$ # NODE PERFORMING THE INSTALL
$ DISPLAY=<your local workstation>:0.0
$ export DISPLAY

Verify Remote Access / User Equivalence

Verify you are able to run the Secure Shell commands (ssh or scp) on the Linux server you will be running the Oracle Universal Installer from against all other Linux servers in the cluster without being prompted for a password. When using the secure shell method, user equivalence will need to be enabled for the terminal shell session before attempting to run the OUI. To enable user equivalence for the current terminal shell session, perform the following steps, remembering to enter the passphrase for the RSA key you generated when prompted:

$ exec /usr/bin/ssh-agent $SHELL
$ /usr/bin/ssh-add
Enter passphrase for /home/oracle/.ssh/id_rsa: xxxxx
Identity added: /home/oracle/.ssh/id_rsa (/home/oracle/.ssh/id_rsa)

$ ssh linux1 "date;hostname"
Sat Aug  1 14:14:42 EDT 2009
linux1

$ ssh linux2 "date;hostname"
Sat Aug  1 14:15:16 EDT 2009
linux2

Run the Network Configuration Assistant

To start the NETCA, run the following:

$ netca &
The following table walks you through the process of creating a new Oracle listener for our RAC environment.

 

Screen Name Response
Select the Type of Oracle Net Services Configuration Select Cluster Configuration.
Select the nodes to configure Select all of the nodes: linux1 and linux2.
Type of Configuration Select Listener configuration.
Listener Configuration - Next 6 Screens The following screens are now like any other normal listener configuration. You can simply accept the default parameters for the next six screens:
   What do you want to do: Add
   Listener name: LISTENER
   Selected protocols: TCP
   Port number: 1521
   Configure another listener: No
   Listener configuration complete! [ Next ]
You will be returned to this Welcome (Type of Configuration) Screen.
Type of Configuration Select Naming Methods configuration.
Naming Methods Configuration The following screens are:
   Selected Naming Methods: Local Naming
   Naming Methods configuration complete! [ Next ]
You will be returned to this Welcome (Type of Configuration) Screen.
Type of Configuration Click Finish to exit the NETCA.


The Oracle TNS listener process should now be running on all nodes in the RAC cluster:

$ hostname
linux1

$ ps -ef | grep lsnr | grep -v 'grep' | grep -v 'ocfs' | awk '{print $9}'
LISTENER_LINUX1

=====================

$ hostname
linux2

$ ps -ef | grep lsnr | grep -v 'grep' | grep -v 'ocfs' | awk '{print $9}'
LISTENER_LINUX2

 


25. Create the Oracle Cluster Database

The database creation process should only be performed from one of the Oracle RAC nodes in the cluster (linux1)!

We will use the DBCA to create the clustered database.

Before executing the DBCA, make sure that $ORACLE_HOME and $PATH are set appropriately for the $ORACLE_BASE/product/10.2.0/db_1 environment.

You should also verify that all services we have installed up to this point (Oracle TNS listener, Oracle Clusterware processes, etc.) are running before attempting to start the clustered database creation process.
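One way to script this pre-flight check is a small helper that scans the output of `crs_stat -t` (the same listing shown at the end of this section) and flags any resource that is not ONLINE. This is only a sketch: `all_online` is a hypothetical name, and it assumes the five-column `Name Type Target State Host` layout with a two-line header.

```shell
# Sketch: read `crs_stat -t` output on stdin and exit non-zero if any
# resource's State column (field 4) is not ONLINE. Assumes the
# five-column layout with a two-line header (hypothetical helper).
all_online() {
    awk 'NR > 2 && NF >= 4 && $4 != "ONLINE" { bad++ }
         END { exit (bad > 0) }'
}

# Usage on the cluster (ORA_CRS_HOME as set earlier in this guide):
#   $ORA_CRS_HOME/bin/crs_stat -t | all_online && echo "all resources ONLINE"
```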

Verifying Terminal Shell Environment

As discussed in the previous section (Create TNS Listener Process), the terminal shell environment needs to be configured for remote access and user equivalence to all nodes in the cluster before running the Oracle Universal Installer. Note that you can reuse the terminal shell session from the previous section, in which case you do not need to perform any of the actions described below for setting up remote access and the DISPLAY variable:

Login as the oracle User Account and Set DISPLAY (if necessary)

 

# su - oracle
$ # IF YOU ARE USING A REMOTE CLIENT TO CONNECT TO THE
$ # NODE PERFORMING THE INSTALL
$ DISPLAY=<your local workstation>:0.0
$ export DISPLAY

Verify Remote Access / User Equivalence

Verify you are able to run the Secure Shell commands (ssh or scp) on the Linux server you will be running the Oracle Universal Installer from against all other Linux servers in the cluster without being prompted for a password. When using the secure shell method, user equivalence will need to be enabled for the terminal shell session before attempting to run the OUI. To enable user equivalence for the current terminal shell session, perform the following steps, remembering to enter the passphrase for the RSA key you generated when prompted:

$ exec /usr/bin/ssh-agent $SHELL
$ /usr/bin/ssh-add
Enter passphrase for /home/oracle/.ssh/id_rsa: xxxxx
Identity added: /home/oracle/.ssh/id_rsa (/home/oracle/.ssh/id_rsa)

$ ssh linux1 "date;hostname"
Sat Aug  1 14:14:42 EDT 2009
linux1

$ ssh linux2 "date;hostname"
Sat Aug  1 14:15:16 EDT 2009
linux2

Run the Oracle Cluster Verification Utility

Before creating the Oracle clustered database, we should run the following database configuration check using the Cluster Verification Utility (CVU).

Note: Instructions for configuring CVU can be found in the section "Prerequisites for Using Cluster Verification Utility" discussed earlier in this article.

$ cd /home/oracle/orainstall/clusterware/cluvfy
$ CV_HOME=/home/oracle/orainstall/clusterware/cluvfy; export CV_HOME
$ CV_JDKHOME=/home/oracle/orainstall/clusterware/cluvfy/jdk14; export CV_JDKHOME
$ ./runcluvfy.sh stage -pre dbcfg -n linux1,linux2 -d ${ORACLE_HOME} -verbose

Review the CVU report. Note that this report will contain the same error we received when checking the CRS pre-installation tasks: the failure to find a suitable set of interfaces for the VIPs. This error can be safely ignored.

Create the Clustered Database

To start the database creation process, run the following:

$ dbca &
Screen Name Response
Welcome Screen Select "Oracle Real Application Clusters database."
Operations Select Create a Database.
Node Selection Click on the Select All button to select all servers: linux1 and linux2.
Database Templates Select Custom Database.
Database Identification Select:
   Global Database Name: racdb.idevelopment.info
   SID Prefix: racdb

I used idevelopment.info for the database domain. You may use any domain. Keep in mind that this domain does not have to be a valid DNS domain.

Management Option Leave the default option here, which is to "Configure the Database with Enterprise Manager / Use Database Control for Database Management."
Database Credentials I selected to Use the Same Password for All Accounts. Enter the password (twice) and make sure the password does not start with a digit.
Storage Options For this guide, we will select to use Automatic Storage Management (ASM).
Create ASM Instance Supply the SYS password to use for the new ASM instance.

Also, starting with Oracle10g Release 2, the ASM instance server parameter file (SPFILE) needs to be on a shared disk. You will need to modify the default entry for "Create server parameter file (SPFILE)" to reside on the OCFS2 partition as follows: /u02/oradata/racdb/dbs/spfile+ASM.ora. All other options can stay at their defaults.

You will then be prompted with a dialog box asking if you want to create and start the ASM instance. Select the OK button to acknowledge this dialog.

The OUI will now create and start the ASM instance on all nodes in the RAC cluster.

ASM Disk Groups

To start, click the Create New button. This will bring up the "Create Disk Group" window with the four volumes we configured earlier using ASMLib.

 

If the volumes we created earlier in this article (ORCL:VOL1, ORCL:VOL2, ORCL:VOL3, and ORCL:VOL4) do not show up in the "Select Member Disks" window, then click the "Change Disk Discovery Path" button and input "ORCL:VOL*".

For the first "Disk Group Name", I used the string "RACDB_DATA1". Select the first two ASM volumes (ORCL:VOL1 and ORCL:VOL2) in the "Select Member Disks" window. Keep the "Redundancy" setting to "Normal".

After verifying all values in this window are correct, click the [OK] button. This will present the "ASM Disk Group Creation" dialog. When the ASM Disk Group Creation process is finished, you will be returned to the "ASM Disk Groups" window.

Click the Create New button again. For the second "Disk Group Name", I used the string "FLASH_RECOVERY_AREA". Select the last two ASM volumes (ORCL:VOL3 and ORCL:VOL4) in the "Select Member Disks" window. Keep the "Redundancy" setting to "Normal".

After verifying all values in this window are correct, click the [OK] button. This will present the "ASM Disk Group Creation" dialog.

When the ASM Disk Group Creation process is finished, you will be returned to the "ASM Disk Groups" window with two disk groups created and selected.

Select only one of the disk groups by using the checkbox next to the newly created Disk Group Name "RACDB_DATA1" (ensure that the disk group for "FLASH_RECOVERY_AREA" is not selected) and click [Next] to continue.

Database File Locations

I selected to use the default, which is to use Oracle Managed Files:

Database Area: +RACDB_DATA1

Recovery Configuration Check the option for "Specify Flash Recovery Area".

For the "Flash Recovery Area", click the [Browse] button and select the disk group name "+FLASH_RECOVERY_AREA".

My disk group has a size of about 17GB. When defining the Flash Recovery Area size, use the entire volume minus roughly 10% for overhead (about 15 GB of my 17GB disk group). I used a "Flash Recovery Area Size" of 15 GB (15360 MB).

Database Content I left all of the Database Components (and destination tablespaces) set to their default value, although it is perfectly OK to select the Sample Schemas. This option is available since we installed the Oracle Companion CD software.
Database Services For this test configuration, click Add, and enter racdb_taf as the "Service Name." Leave both instances set to Preferred and for the "TAF Policy" select "Basic".
Initialization Parameters Change any parameters for your environment. I left them all at their default settings.
Database Storage Change any parameters for your environment. I left them all at their default settings.
Creation Options Keep the default option Create Database selected. I also always select to "Generate Database Creation Scripts". Click Finish to start the database creation process. After acknowledging the database creation report and script generation dialog, the database creation will start.

Click OK on the "Summary" screen.

End of Database Creation At the end of the database creation, exit from the DBCA.

Note: When exiting the DBCA you will not receive any feedback from the dialog window for around 30-60 seconds. After a while, another dialog will come up indicating that it is starting all Oracle instances and the HA service "racdb_taf". This may take several minutes to complete. When finished, all windows and dialog boxes will disappear.

When the DBCA has completed, you will have a fully functional Oracle RAC cluster running!

$ $ORA_CRS_HOME/bin/crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....SM1.asm application    ONLINE    ONLINE    linux1
ora....X1.lsnr application    ONLINE    ONLINE    linux1
ora.linux1.gsd application    ONLINE    ONLINE    linux1
ora.linux1.ons application    ONLINE    ONLINE    linux1
ora.linux1.vip application    ONLINE    ONLINE    linux1
ora....SM2.asm application    ONLINE    ONLINE    linux2
ora....X2.lsnr application    ONLINE    ONLINE    linux2
ora.linux2.gsd application    ONLINE    ONLINE    linux2
ora.linux2.ons application    ONLINE    ONLINE    linux2
ora.linux2.vip application    ONLINE    ONLINE    linux2
ora.racdb.db   application    ONLINE    ONLINE    linux1
ora....b1.inst application    ONLINE    ONLINE    linux1
ora....b2.inst application    ONLINE    ONLINE    linux2
ora...._taf.cs application    ONLINE    ONLINE    linux1
ora....db1.srv application    ONLINE    ONLINE    linux1
ora....db2.srv application    ONLINE    ONLINE    linux2
                              

 

Verify the racdb_taf Service

During the creation of the Oracle clustered database, you added a service named racdb_taf that will be used to connect to the database with TAF enabled. Use the following to verify the racdb_taf service was successfully added:

SQL> show parameter service

NAME                 TYPE        VALUE
-------------------- ----------- --------------------------------
service_names        string      racdb.idevelopment.info, racdb_taf
                              
If the only service defined was for racdb.idevelopment.info, then you will need to manually add the service to both instances:
SQL> show parameter service

NAME                 TYPE        VALUE
-------------------- ----------- --------------------------
service_names        string      racdb.idevelopment.info

SQL> alter system set service_names =
  2  'racdb.idevelopment.info, racdb_taf' scope=both;
                              

 


26. Post-Installation Tasks - (Optional)

This chapter describes several optional tasks that can be applied to your new Oracle 10g environment in order to enhance availability as well as database management.

Re-compile Invalid Objects

Run the utlrp.sql script to recompile all invalid PL/SQL packages now instead of when the packages are accessed for the first time. This step is optional but recommended.
$ sqlplus / as sysdba
SQL> @?/rdbms/admin/utlrp.sql
                              

Enabling Archive Logs in a RAC Environment

Whether a single instance or clustered database, Oracle tracks and logs all changes to database blocks in online redolog files. In an Oracle RAC environment, each instance will have its own set of online redolog files known as a thread. Each Oracle instance will use its group of online redologs in a circular manner. Once an online redolog fills, Oracle moves to the next one. If the database is in "Archive Log Mode", Oracle will make a copy of the online redo log before it gets reused. A thread must contain at least two online redologs (or online redolog groups). The same holds true for a single instance configuration. The single instance must contain at least two online redologs (or online redolog groups).

The size of an online redolog file is completely independent of another instance's redolog size. Although in most configurations the size is the same, it may differ depending on the workload and backup / recovery considerations for each node. It is also worth mentioning that each instance has exclusive write access to its own online redolog files. In a correctly configured RAC environment, however, each instance can read another instance's current online redolog file to perform instance recovery if that instance was terminated abnormally. It is therefore a requirement that online redo logs be located on a shared storage device (just like the database files).

As already mentioned, Oracle writes to its online redolog files in a circular manner. When the current online redolog fills, Oracle will switch to the next one. To facilitate media recovery, Oracle allows the DBA to put the database into "Archive Log Mode" which makes a copy of the online redolog after it fills (and before it gets reused). This is a process known as archiving.
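To see each instance's redo thread and its log groups from any node, you can query v$log, which lists the groups for all threads. This is only a sketch; `show_redo_threads` is a hypothetical helper name, and it assumes sqlplus is on the PATH for the oracle user as configured earlier in this guide.

```shell
# Sketch: list every redo thread's log groups, sizes, and status.
# Hypothetical helper; assumes sqlplus is on the PATH (oracle user).
show_redo_threads() {
    sqlplus -s / as sysdba <<'EOF'
select thread#, group#, bytes/1024/1024 as mb, status
from v$log
order by thread#, group#;
EOF
}
```

With the racdb database running, calling `show_redo_threads` from either node should show at least two groups per thread, one thread per instance.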

The Database Configuration Assistant (DBCA) allows users to configure a new database to be in archive log mode; however, most DBAs opt to bypass this option during initial database creation. In cases like this, where the database is in no archive log mode, it is a simple task to put the database into archive log mode. Note, however, that this will require a short database outage. From one of the nodes in the Oracle RAC configuration, perform the following steps to put a RAC-enabled database into archive log mode. For the purpose of this article, I will use the node linux1, which runs the racdb1 instance:

 

  1. Login to one of the nodes (i.e. linux1) and disable the cluster instance parameter by setting cluster_database to FALSE from the current instance:
    $ sqlplus / as sysdba
    SQL> alter system set cluster_database=false scope=spfile sid='racdb1';

     

  2. Shutdown all instances accessing the clustered database:
    $ srvctl stop database -d racdb

     

  3. Using the local instance, MOUNT the database:
    $ sqlplus / as sysdba
    SQL> startup mount

     

  4. Enable archiving:
    SQL> alter database archivelog;

     

  5. Re-enable support for clustering by modifying the instance parameter cluster_database to TRUE from the current instance:
    SQL> alter system set cluster_database=true scope=spfile sid='racdb1';

     

  6. Shutdown the local instance:
    SQL> shutdown immediate

     

  7. Bring all instances back up using srvctl:
    $ srvctl start database -d racdb

     

  8. (Optional) Bring any services (i.e. TAF) back up using srvctl:
    $ srvctl start service -d racdb

     

  9. Login to the local instance and verify Archive Log Mode is enabled:
    $ sqlplus / as sysdba
    SQL> archive log list
    Database log mode              Archive Mode
    Automatic archival             Enabled
    Archive destination            USE_DB_RECOVERY_FILE_DEST
    Oldest online log sequence     29
    Next log sequence to archive   30
    Current log sequence           30


After enabling Archive Log Mode, each instance in the RAC configuration can automatically archive redologs!
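For reuse, the nine steps above can be collected into a single script run from linux1 as the oracle user. This is only a sketch: `enable_archivelog` is a hypothetical name, and the `racdb` / `racdb1` names are those used throughout this guide; adapt them for your environment.

```shell
# Sketch: steps 1-9 above as one function (racdb/racdb1 names per this
# guide; hypothetical helper, run from linux1 as the oracle user).
enable_archivelog() {
    # 1. Temporarily disable clustering for the local instance
    sqlplus -s / as sysdba <<'EOF'
alter system set cluster_database=false scope=spfile sid='racdb1';
EOF
    # 2. Shut down all instances accessing the clustered database
    srvctl stop database -d racdb
    # 3-6. Mount locally, enable archiving, restore the cluster
    #      parameter, then shut the local instance back down
    sqlplus -s / as sysdba <<'EOF'
startup mount
alter database archivelog;
alter system set cluster_database=true scope=spfile sid='racdb1';
shutdown immediate
EOF
    # 7-8. Restart all instances and any services (e.g. racdb_taf)
    srvctl start database -d racdb
    srvctl start service -d racdb
}
```

After running it, verify the result with `archive log list` as in step 9.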

Download and Install Custom Oracle Database Scripts

DBAs rely on Oracle's data dictionary views and dynamic performance views in order to support and better manage their databases. Although these views provide a simple and easy mechanism to query critical information regarding the database, it helps to have a collection of accurate and readily available SQL scripts to query these views.

In this section you will download and install a collection of Oracle DBA scripts that can be used to manage many aspects of your database including space management, performance, backups, security, and session management. The Oracle DBA scripts archive can be downloaded using the following link http://www.idevelopment.info/data/Oracle/DBA_scripts/dba_scripts_archive_Oracle.zip. As the oracle user account, download the common.zip archive to the $ORACLE_BASE directory of each node in the cluster. For the purpose of this example, the common.zip archive will be copied to /u01/app/oracle. Next, unzip the archive file to the $ORACLE_BASE directory.

For example, perform the following on both nodes in the Oracle RAC cluster as the oracle user account:

$ mv common.zip /u01/app/oracle
$ cd /u01/app/oracle
$ unzip common.zip


The final step is to verify (or set) the appropriate environment variable for the current UNIX shell to ensure the Oracle SQL scripts can be run from SQL*Plus while in any directory. For UNIX, verify that the following environment variable is set and included in your login shell script:

ORACLE_PATH=$ORACLE_BASE/common/oracle/sql:.:$ORACLE_HOME/rdbms/admin
export ORACLE_PATH

Note: The ORACLE_PATH environment variable should already be set in the .bash_profile login script that was created in the section Create Login Script for oracle User Account.

Now that the Oracle DBA scripts have been unzipped and the UNIX environment variable ($ORACLE_PATH) has been set to the appropriate directory, you should be able to run any of the SQL scripts in $ORACLE_BASE/common/oracle/sql while logged into SQL*Plus, from any directory. For example, to query tablespace information while logged into the Oracle database as a DBA user:

SQL> @dba_tablespaces

Status    Tablespace Name TS Type      Ext. Mgt.  Seg. Mgt.    Tablespace Size    Used (in bytes) Pct. Used
--------- --------------- ------------ ---------- --------- ------------------ ------------------ ---------
ONLINE    UNDOTBS1        UNDO         LOCAL      MANUAL         1,283,457,024          9,043,968         1
ONLINE    SYSAUX          PERMANENT    LOCAL      AUTO             524,288,000        378,732,544        72
ONLINE    USERS           PERMANENT    LOCAL      AUTO           2,147,483,648        321,257,472        15
ONLINE    SYSTEM          PERMANENT    LOCAL      MANUAL           838,860,800        505,544,704        60
ONLINE    INDX            PERMANENT    LOCAL      AUTO           1,073,741,824             65,536         0
ONLINE    UNDOTBS2        UNDO         LOCAL      MANUAL         1,283,457,024         22,282,240         2
ONLINE    TEMP            TEMPORARY    LOCAL      MANUAL         1,073,741,824         92,274,688         9
                                                      ------------------ ------------------ ---------
avg                                                                                                      23
sum                                                              8,225,030,144      1,329,201,152

7 rows selected.
                              

To obtain a list of all available Oracle DBA scripts while logged into SQL*Plus, run the help.sql script:

SQL> @help.sql

========================================
Automatic Shared Memory Management
========================================
asmm_components.sql

========================================
Automatic Storage Management
========================================
asm_alias.sql
asm_clients.sql
asm_diskgroups.sql
asm_disks.sql
asm_disks_perf.sql
asm_drop_files.sql
asm_files.sql
asm_files2.sql
asm_templates.sql

                                 
< --- SNIP --- >

perf_top_sql_by_buffer_gets.sql
perf_top_sql_by_disk_reads.sql

========================================
Workspace Manager
========================================
wm_create_workspace.sql
wm_disable_versioning.sql
wm_enable_versioning.sql
wm_freeze_workspace.sql
wm_get_workspace.sql
wm_goto_workspace.sql
wm_merge_workspace.sql
wm_refresh_workspace.sql
wm_remove_workspace.sql
wm_unfreeze_workspace.sql
wm_workspaces.sql
                              

Create Shared Oracle Password Files

In this section, I present the steps required to configure a shared Oracle password file between all instances in the Oracle clustered database. The password file for the database in UNIX is located at $ORACLE_HOME/dbs/orapw<ORACLE_SID> for each instance and contains a list of all database users that have SYSDBA privileges. When a database user is granted the SYSDBA role, the instance records this in the database password file for the instance you are logged into. But what about the other instances in the cluster? The database password files on the other instances do not get updated and will not contain the user who was just granted the SYSDBA role. Therefore, a program (like RMAN) that tries to log in as this new user with SYSDBA privileges will fail if it tries to use an instance whose password file does not contain that user.

To resolve this problem, a common solution is to place a single database password file on a shared / clustered file system and then create symbolic links from each of the instances to this single version of the database password file. Since the environment described in this article makes use of the Oracle Clustered File System (OCFS2), we will use it to store the single version of the database password file.

In this section, we will also be including the Oracle password file for the ASM instance.

 

  1. Create the database password directory on the clustered file system mounted on /u02/oradata/racdb. Perform the following from only one node in the cluster as the oracle user account - ( linux1):
    $ mkdir -p /u02/oradata/racdb/dbs
  2. From one node in the cluster (linux1), move the database password files to the database password directory on the clustered file system. Choose the node whose database password file has the most recent SYSDBA additions. In most cases this will not matter, since any missing entries can easily be re-added by granting the SYSDBA role (and given that this is a fresh install, it is unlikely you have created any SYSDBA users at this point!). Note that the database server does not need to be shut down while performing the following actions. From linux1 as the oracle user account:

     

    $ mv $ORACLE_HOME/dbs/orapw+ASM1 /u02/oradata/racdb/dbs/orapw+ASM
    $ mv $ORACLE_HOME/dbs/orapwracdb1 /u02/oradata/racdb/dbs/orapwracdb

    $ ln -s /u02/oradata/racdb/dbs/orapw+ASM $ORACLE_HOME/dbs/orapw+ASM1
    $ ln -s /u02/oradata/racdb/dbs/orapwracdb $ORACLE_HOME/dbs/orapwracdb1
  3. From the second node in the cluster (linux2):
    $ rm $ORACLE_HOME/dbs/orapw+ASM2
    $ rm $ORACLE_HOME/dbs/orapwracdb2

    $ ln -s /u02/oradata/racdb/dbs/orapw+ASM $ORACLE_HOME/dbs/orapw+ASM2
    $ ln -s /u02/oradata/racdb/dbs/orapwracdb $ORACLE_HOME/dbs/orapwracdb2
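With the links in place, a quick check on each node can confirm that every password file resolves into the shared directory. This is only a sketch: `is_shared_pwfile` is a hypothetical helper, and the path it tests against is the OCFS2 location used in this guide.

```shell
# Sketch: succeed only if the given file is a symbolic link pointing
# into the shared password-file directory used in this guide.
# Hypothetical helper; adjust the path for your environment.
is_shared_pwfile() {
    target=$(readlink "$1" 2>/dev/null) || return 1
    case "$target" in
        /u02/oradata/racdb/dbs/*) return 0 ;;
        *)                        return 1 ;;
    esac
}

# Usage on each node, e.g.:
#   for f in $ORACLE_HOME/dbs/orapw*; do
#       is_shared_pwfile "$f" || echo "WARNING: $f is not shared"
#   done
```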
                                      


Now, when a user is granted the SYSDBA role, all instances will have access to the same password file:

SQL> GRANT sysdba TO scott;


 


27. Verify TNS Networking Files

Ensure that the TNS networking files are configured on both Oracle RAC nodes in the cluster!

listener.ora

We already covered how to create a TNS listener configuration file ( listener.ora) for a clustered environment in Section 24. The listener.ora file should be properly configured and no modifications should be needed.

For clarity, I have included a copy of the listener.ora file from my node linux1 in this guide's support files. I've also included a copy of my tnsnames.ora file, which was configured by Oracle and can be used for testing Transparent Application Failover (TAF). This file should already be configured on both Oracle RAC nodes in the cluster.

You can include any of these entries on other client machines that need access to the clustered database.

Connecting to Clustered Database From an External Client

This is an optional step, but I like to perform it in order to verify that my TNS files are configured correctly. Use another machine (e.g., a Windows machine connected to the network) that has Oracle installed, and add the TNS entries (in tnsnames.ora) from either of the nodes in the cluster that were created for the clustered database.

Note: Verify that the machine you are connecting from can resolve all host names exactly how they appear in the listener.ora and tnsnames.ora files. For the purpose of this document, the machine you are connecting from should be able to resolve the following host names in the local hosts file or through DNS:

192.168.1.100    linux1
192.168.1.101    linux2
192.168.1.200    linux1-vip
192.168.1.201    linux2-vip

Try to connect to the clustered database using all available service names defined in the tnsnames.ora file:

C:\> sqlplus system/manager@racdb2
C:\> sqlplus system/manager@racdb1
C:\> sqlplus system/manager@racdb_taf
C:\> sqlplus system/manager@racdb

 


28. Create / Alter Tablespaces

When creating the clustered database, we left all tablespaces set to their default size. If you are using a large drive for the shared storage, you may want to make a sizable testing database.

Below are several optional SQL commands for modifying and creating all tablespaces for the test database. Please keep in mind that the database file names (OMF files) used in this example may differ from what the Oracle Database Configuration Assistant (DBCA) creates for your environment. When working through this section, substitute the data file names that were created in your environment where appropriate. The following query can be used to determine the file names for your environment:

 

SQL> select tablespace_name, file_name
  2  from dba_data_files
  3  union
  4  select tablespace_name, file_name
  5  from dba_temp_files;

TABLESPACE_NAME     FILE_NAME
--------------- --------------------------------------------------
EXAMPLE         +RACDB_DATA1/racdb/datafile/example.257.570913311
INDX            +RACDB_DATA1/racdb/datafile/indx.270.570920045
SYSAUX          +RACDB_DATA1/racdb/datafile/sysaux.260.570913287
SYSTEM          +RACDB_DATA1/racdb/datafile/system.262.570913215
TEMP            +RACDB_DATA1/racdb/tempfile/temp.258.570913303
UNDOTBS1        +RACDB_DATA1/racdb/datafile/undotbs1.261.570913263
UNDOTBS2        +RACDB_DATA1/racdb/datafile/undotbs2.265.570913331
USERS           +RACDB_DATA1/racdb/datafile/users.264.570913355
                              

 

$ sqlplus / as sysdba

SQL> create user scott identified by tiger default tablespace users;
SQL> grant dba, resource, connect to scott;

SQL> alter database datafile '+RACDB_DATA1/racdb/datafile/users.264.570913355' resize 1024m;
SQL> alter tablespace users add datafile '+RACDB_DATA1' size 1024m autoextend off;

SQL> create tablespace indx datafile '+RACDB_DATA1' size 1024m
  2  autoextend on next 50m maxsize unlimited
  3  extent management local autoallocate
  4  segment space management auto;

SQL> alter database datafile '+RACDB_DATA1/racdb/datafile/system.262.570913215' resize 800m;

SQL> alter database datafile '+RACDB_DATA1/racdb/datafile/sysaux.260.570913287' resize 500m;

SQL> alter tablespace undotbs1 add datafile '+RACDB_DATA1' size 1024m
  2  autoextend on next 50m maxsize 2048m;

SQL> alter tablespace undotbs2 add datafile '+RACDB_DATA1' size 1024m
  2  autoextend on next 50m maxsize 2048m;

SQL> alter database tempfile '+RACDB_DATA1/racdb/tempfile/temp.258.570913303' resize 1024m;

Here is a snapshot of the tablespaces I have defined for my test database environment:

                                 
Status    Tablespace Name TS Type      Ext. Mgt.  Seg. Mgt.    Tablespace Size    Used (in bytes) Pct. Used
--------- --------------- ------------ ---------- --------- ------------------ ------------------ ---------
ONLINE    UNDOTBS1        UNDO         LOCAL      MANUAL         1,283,457,024         85,065,728         7
ONLINE    SYSAUX          PERMANENT    LOCAL      AUTO             524,288,000        275,906,560        53
ONLINE    USERS           PERMANENT    LOCAL      AUTO           2,147,483,648            131,072         0
ONLINE    SYSTEM          PERMANENT    LOCAL      MANUAL           838,860,800        500,301,824        60
ONLINE    EXAMPLE         PERMANENT    LOCAL      AUTO             157,286,400         83,820,544        53
ONLINE    INDX            PERMANENT    LOCAL      AUTO           1,073,741,824             65,536         0
ONLINE    UNDOTBS2        UNDO         LOCAL      MANUAL         1,283,457,024          3,801,088         0
ONLINE    TEMP            TEMPORARY    LOCAL      MANUAL         1,073,741,824         27,262,976         3
                                                            ------------------ ------------------ ---------
avg                                                                                                      22
sum                                                              8,382,316,544        976,355,328

8 rows selected.
                              

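The "Pct. Used" column in a report like this is simply used bytes over tablespace size, rounded. A quick sketch for sanity-checking the figures (the pct_used helper name is my own):

```shell
# pct_used <used_bytes> <tablespace_size_bytes>
# Prints the percentage used, rounded to the nearest whole number.
pct_used() {
    awk -v u="$1" -v s="$2" 'BEGIN { printf "%d\n", (u / s) * 100 + 0.5 }'
}

pct_used 275906560 524288000    # SYSAUX: prints 53
pct_used 500301824 838860800    # SYSTEM: prints 60
```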
 


29. Verify the RAC Cluster & Database Configuration

The following RAC verification checks should be performed on both Oracle RAC nodes in the cluster! For this article, however, I will only be performing checks from linux1.

This section provides several srvctl commands and SQL queries you can use to validate your Oracle RAC 10g configuration.

There are five node-level tasks defined for SRVCTL:

  • Adding and deleting node-level applications
  • Setting and unsetting the environment for node-level applications
  • Administering node applications
  • Administering ASM instances
  • Starting and stopping a group of programs that includes virtual IP addresses, listeners, Oracle Notification Services, and Oracle Enterprise Manager agents (for maintenance purposes).
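When scripting health checks around srvctl, its status output is easy to parse. Here is a sketch (the down_instances helper name is my own) that flags any instance reported as not running, based on the message format srvctl prints in the examples below:

```shell
# Read "srvctl status database" output on stdin and print the name of
# every instance that is not running.
down_instances() {
    awk '/is not running/ { print $2 }'
}

# Demo against sample output; on a real cluster you would run:
#   srvctl status database -d racdb | down_instances
printf '%s\n' \
    "Instance racdb1 is not running on node linux1" \
    "Instance racdb2 is running on node linux2" | down_instances
```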

Status of all instances and services

$ srvctl status database -d racdb
Instance racdb1 is running on node linux1
Instance racdb2 is running on node linux2

Status of a single instance

$ srvctl status instance -d racdb -i racdb2
Instance racdb2 is running on node linux2

Status of a named service globally across the database

$ srvctl status service -d racdb -s racdb_taf
Service racdb_taf is running on instance(s) racdb1, racdb2

Status of node applications on a particular node

$ srvctl status nodeapps -n linux1
VIP is running on node: linux1
GSD is running on node: linux1
Listener is running on node: linux1
ONS daemon is running on node: linux1

Status of an ASM instance

$ srvctl status asm -n linux1
ASM instance +ASM1 is running on node linux1.

List all configured databases

$ srvctl config database
racdb

Display configuration for our RAC database

$ srvctl config database -d racdb
linux1 racdb1 /u01/app/oracle/product/10.2.0/db_1
linux2 racdb2 /u01/app/oracle/product/10.2.0/db_1

Display all services for the specified cluster database

$ srvctl config service -d racdb
racdb_taf PREF: racdb1 racdb2 AVAIL:

Display the configuration for node applications - (VIP, GSD, ONS, Listener)

$ srvctl config nodeapps -n linux1 -a -g -s -l
VIP exists.: /linux1-vip/192.168.1.200/255.255.255.0/eth0
GSD exists.
ONS daemon exists.
Listener exists.

Display the configuration for the ASM instance(s)

$ srvctl config asm -n linux1
+ASM1 /u01/app/oracle/product/10.2.0/db_1

All running instances in the cluster

SQL> SELECT
  2      inst_id
  3    , instance_number inst_no
  4    , instance_name inst_name
  5    , parallel
  6    , status
  7    , database_status db_status
  8    , active_state state
  9    , host_name host
 10  FROM gv$instance
 11  ORDER BY inst_id;

 INST_ID  INST_NO INST_NAME  PAR STATUS  DB_STATUS    STATE     HOST
-------- -------- ---------- --- ------- ------------ --------- -------
       1        1 racdb1      YES OPEN    ACTIVE       NORMAL    linux1
       2        2 racdb2      YES OPEN    ACTIVE       NORMAL    linux2

All data files which are in the disk group

SQL> select name from v$datafile
  2  union
  3  select member from v$logfile
  4  union
  5  select name from v$controlfile
  6  union
  7  select name from v$tempfile;

NAME
-------------------------------------------
+FLASH_RECOVERY_AREA/racdb/controlfile/current.258.570913191
+FLASH_RECOVERY_AREA/racdb/onlinelog/group_1.257.570913201
+FLASH_RECOVERY_AREA/racdb/onlinelog/group_2.256.570913211
+FLASH_RECOVERY_AREA/racdb/onlinelog/group_3.259.570918285
+FLASH_RECOVERY_AREA/racdb/onlinelog/group_4.260.570918295
+RACDB_DATA1/racdb/controlfile/current.259.570913189
+RACDB_DATA1/racdb/datafile/example.257.570913311
+RACDB_DATA1/racdb/datafile/indx.270.570920045
+RACDB_DATA1/racdb/datafile/sysaux.260.570913287
+RACDB_DATA1/racdb/datafile/system.262.570913215
+RACDB_DATA1/racdb/datafile/undotbs1.261.570913263
+RACDB_DATA1/racdb/datafile/undotbs1.271.570920865
+RACDB_DATA1/racdb/datafile/undotbs2.265.570913331
+RACDB_DATA1/racdb/datafile/undotbs2.272.570921065
+RACDB_DATA1/racdb/datafile/users.264.570913355
+RACDB_DATA1/racdb/datafile/users.269.570919829
+RACDB_DATA1/racdb/onlinelog/group_1.256.570913195
+RACDB_DATA1/racdb/onlinelog/group_2.263.570913205
+RACDB_DATA1/racdb/onlinelog/group_3.266.570918279
+RACDB_DATA1/racdb/onlinelog/group_4.267.570918289
+RACDB_DATA1/racdb/tempfile/temp.258.570913303

21 rows selected.
                              

All ASM disks that belong to the 'RACDB_DATA1' disk group

SQL> SELECT path
  2  FROM   v$asm_disk
  3  WHERE  group_number IN (select group_number
  4                          from v$asm_diskgroup
  5                          where name = 'RACDB_DATA1');

PATH
----------------------------------
ORCL:VOL1
ORCL:VOL2
                              

 


30. Starting / Stopping the Cluster

At this point, we've installed and configured Oracle RAC 10g entirely and have a fully functional clustered database.

After all the work done up to this point, you may well ask, "OK, so how do I start and stop services?". If you have followed the instructions in this guide, all services—including Oracle Clusterware, all Oracle instances, Enterprise Manager Database Console, and so on—should start automatically on each reboot of the Linux nodes.

There are times, however, when you might want to shut down a node and manually start it back up. Or you may find that Enterprise Manager is not running and need to start it. This section provides the commands responsible for starting and stopping the cluster environment.

Ensure that you are logged in as the oracle UNIX user. We will run all commands in this section from linux1:

# su - oracle
$ hostname
linux1

Stopping the Oracle RAC 10g Environment

The first step is to stop the Oracle instance. Once the instance (and related services) is down, bring down the ASM instance. Finally, shut down the node applications (Virtual IP, GSD, TNS Listener, and ONS).

$ export ORACLE_SID=racdb1
$ emctl stop dbconsole
$ srvctl stop instance -d racdb -i racdb1
$ srvctl stop asm -n linux1
$ srvctl stop nodeapps -n linux1

Starting the Oracle RAC 10g Environment

The first step is to start the node applications (Virtual IP, GSD, TNS Listener, and ONS). Once the node applications are successfully started, bring up the ASM instance. Finally, bring up the Oracle instance (and related services) and the Enterprise Manager Database console.

$ export ORACLE_SID=racdb1
$ srvctl start nodeapps -n linux1
$ srvctl start asm -n linux1
$ srvctl start instance -d racdb -i racdb1
$ emctl start dbconsole

Start/Stop All Instances with SRVCTL

Start/stop all the instances and their enabled services. I have included this step just for fun as a way to bring down all instances!

$ srvctl start database -d racdb
$ srvctl stop database -d racdb

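The per-node shutdown sequence above lends itself to a small wrapper. The following sketch defaults to a dry run that only prints the srvctl commands in the documented order (set RUN to empty to execute them for real); the stop_node helper name is my own.

```shell
# Stop one node's pieces in the documented order: instance, then ASM,
# then node applications.  RUN=echo (the default) makes this a dry run.
RUN=${RUN:-echo}

stop_node() {
    node=$1
    inst=$2
    $RUN srvctl stop instance -d racdb -i "$inst"
    $RUN srvctl stop asm -n "$node"
    $RUN srvctl stop nodeapps -n "$node"
}

stop_node linux1 racdb1
```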
 


31. Transparent Application Failover (TAF)

It is not uncommon for businesses to demand 99.99% (or even 99.999%) availability for their enterprise applications. Think about what it would take to ensure a downtime of no more than half an hour, or even no downtime at all, during the year. To answer many of these high-availability requirements, businesses are investing in mechanisms that provide for automatic failover when one participating system fails. When considering the availability of the Oracle database, Oracle RAC 10g provides a superior solution with its advanced failover mechanisms. Oracle RAC 10g includes the required components that all work within a clustered configuration responsible for providing continuous availability; when one of the participating systems fails within the cluster, the users are automatically migrated to the other available systems.

A major component of Oracle RAC 10g that is responsible for failover processing is the Transparent Application Failover (TAF) option. All database connections (and processes) that lose connections are reconnected to another node within the cluster. The failover is completely transparent to the user.

This final section provides a short demonstration on how TAF works in Oracle RAC 10g. Please note that a complete discussion of failover in Oracle RAC 10g would require an article in itself; my intention here is to present only a brief overview.

One important note is that TAF happens automatically within the OCI libraries, so your application (client) code does not need to change in order to take advantage of TAF. Certain configuration steps, however, will need to be done in the Oracle TNS file tnsnames.ora. (Keep in mind that as of this writing, the Java thin client cannot participate in TAF because it never reads tnsnames.ora.)

Setup the tnsnames.ora File

Before demonstrating TAF, we need to verify that a valid entry exists in the tnsnames.ora file on a non-RAC client machine (if you have a Windows machine lying around). Ensure that you have the Oracle RDBMS software installed. (Actually, you only need a client install of the Oracle software.)

During the creation of the clustered database in this guide, we created a new service that will be used for testing TAF named RACDB_TAF. It provides all the necessary configuration parameters for load balancing and failover. You can copy the contents of this entry to the %ORACLE_HOME%\network\admin\tnsnames.ora file on the client machine (my Windows laptop is being used in this example) in order to connect to the new Oracle clustered database:

...
RACDB_TAF =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = linux1-vip)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = linux2-vip)(PORT = 1521))
    (LOAD_BALANCE = yes)
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = racdb_taf.idevelopment.info)
      (FAILOVER_MODE =
        (TYPE = SELECT)
        (METHOD = BASIC)
        (RETRIES = 180)
        (DELAY = 5)
      )
    )
  )
...
SQL Query to Check the Session's Failover Information

The following SQL query can be used to check a session's failover type, failover method, and if a failover has occurred. We will be using this query throughout this example.

COLUMN instance_name    FORMAT a13
COLUMN host_name        FORMAT a9
COLUMN failover_method  FORMAT a15
COLUMN failed_over      FORMAT a11

SELECT
    instance_name
  , host_name
  , NULL AS failover_type
  , NULL AS failover_method
  , NULL AS failed_over
FROM v$instance
UNION
SELECT
    NULL
  , NULL
  , failover_type
  , failover_method
  , failed_over
FROM v$session
WHERE username = 'SYSTEM';

TAF Demo

From a Windows machine (or other non-RAC client machine), login to the clustered database using the racdb_taf service as the SYSTEM user:

C:\> sqlplus system/manager@racdb_taf

COLUMN instance_name    FORMAT a13
COLUMN host_name        FORMAT a9
COLUMN failover_method  FORMAT a15
COLUMN failed_over      FORMAT a11

SELECT
    instance_name
  , host_name
  , NULL AS failover_type
  , NULL AS failover_method
  , NULL AS failed_over
FROM v$instance
UNION
SELECT
    NULL
  , NULL
  , failover_type
  , failover_method
  , failed_over
FROM v$session
WHERE username = 'SYSTEM';

INSTANCE_NAME HOST_NAME FAILOVER_TYPE FAILOVER_METHOD FAILED_OVER
------------- --------- ------------- --------------- -----------
racdb1         linux1
                        SELECT        BASIC           NO

DO NOT log out of the above SQL*Plus session!

Now that we have run the query above, shut down the instance racdb1 on linux1 using the abort option. To perform this operation, use the srvctl command-line utility as follows:

# su - oracle
$ srvctl status database -d racdb
Instance racdb1 is running on node linux1
Instance racdb2 is running on node linux2

$ srvctl stop instance -d racdb -i racdb1 -o abort

$ srvctl status database -d racdb
Instance racdb1 is not running on node linux1
Instance racdb2 is running on node linux2

Now let's go back to our SQL session and rerun the SQL statement in the buffer:

COLUMN instance_name    FORMAT a13
COLUMN host_name        FORMAT a9
COLUMN failover_method  FORMAT a15
COLUMN failed_over      FORMAT a11

SELECT
    instance_name
  , host_name
  , NULL AS failover_type
  , NULL AS failover_method
  , NULL AS failed_over
FROM v$instance
UNION
SELECT
    NULL
  , NULL
  , failover_type
  , failover_method
  , failed_over
FROM v$session
WHERE username = 'SYSTEM';


INSTANCE_NAME HOST_NAME FAILOVER_TYPE FAILOVER_METHOD FAILED_OVER
------------- --------- ------------- --------------- -----------
racdb2         linux2
                        SELECT        BASIC           YES

SQL> exit

From the above demonstration, we can see that the session has now been failed over to instance racdb2 on linux2.

 


32. Troubleshooting

Confirm the RAC Node Name is Not Listed in Loopback Address

Ensure that the node names (linux1 or linux2) are not included in the loopback address entry in the /etc/hosts file. If the machine name is listed in the loopback address entry, as below:

127.0.0.1    linux1 localhost.localdomain localhost

it will need to be removed as shown below:

127.0.0.1    localhost.localdomain localhost

If the RAC node name is listed for the loopback address, you will receive the following error during the RAC installation:

ORA-00603: ORACLE server session terminated by fatal error

or

ORA-29702: error occurred in Cluster Group Service operation

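This check is easy to automate before starting the install. A sketch (the check_loopback helper name is my own) that scans the loopback entry of a hosts file for either RAC node name:

```shell
# Fail if linux1 or linux2 appears on the 127.0.0.1 line of the given
# hosts file.
check_loopback() {
    if grep '^127\.0\.0\.1' "$1" | grep -qE '(^|[[:space:]])(linux1|linux2)([[:space:]]|$)'; then
        echo "BAD: RAC node name found on loopback line in $1"
        return 1
    fi
    echo "OK: $1"
}

# Demo against a copy of the bad entry shown above
printf '127.0.0.1 linux1 localhost.localdomain localhost\n' > /tmp/hosts.test
check_loopback /tmp/hosts.test || echo "(fix /etc/hosts before installing)"
```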
Confirm localhost is defined in the /etc/hosts file for the loopback address

Ensure that the entry for localhost.localdomain and localhost are included for the loopback address in the /etc/hosts file for each of the Oracle RAC nodes:

    127.0.0.1        localhost.localdomain localhost

If an entry does not exist for localhost in the /etc/hosts file, Oracle Clusterware will be unable to start the application resources, notably the ONS process. The error "Failed to get IP for localhost" will be written to the log file for ONS. For example:

CRS-0215 could not start resource 'ora.linux1.ons'. Check log file
"/u01/app/crs/log/linux1/racg/ora.linux1.ons.log"
for more details.
The ONS log file will contain lines similar to the following:

Oracle Database 10g CRS Release 10.2.0.1.0 Production Copyright 1996, 2005 Oracle. All rights reserved.
2007-04-14 13:10:02.729: [ RACG][3086871296][13316][3086871296][ora.linux1.ons]: Failed to get IP for localhost (1)
Failed to get IP for localhost (1)
Failed to get IP for localhost (1)
onsctl: ons failed to start
...

Setting the Correct Date and Time on All Cluster Nodes

During the installation of Oracle Clusterware, the Database, and the Companion CD, the Oracle Universal Installer (OUI) first installs the software to the local node running the installer (i.e. linux1). The software is then copied remotely to all of the remaining nodes in the cluster (i.e. linux2). During the remote copy process, the OUI will execute the UNIX "tar" command on each of the remote nodes to extract the files that were archived and copied over. If the date and time on the node performing the install is ahead of that of the node it is copying to, the OUI will throw an error from the "tar" command indicating it is attempting to extract files stamped with a time in the future:

Error while copying directory /u01/app/crs with exclude file list 'null' to nodes 'linux2'.
[PRKC-1002 : All the submitted commands did not execute successfully]
---------------------------------------------
linux2:
   /bin/tar: ./bin/lsnodes: time stamp 2009-07-28 09:21:34 is 735 s in the future
   /bin/tar: ./bin/olsnodes: time stamp 2009-07-28 09:21:34 is 735 s in the future
   ...(more errors on this node)

Please note that although this would seem like a severe error from the OUI, it can safely be disregarded as a warning. The "tar" command DOES actually extract the files; however, when you perform a listing of the files (using ls -l) on the remote node, they will be missing the time field until the time on the server is greater than the timestamp of the file.

Before starting any of the above noted installations, ensure that each member node of the cluster is set as closely as possible to the same date and time. Oracle strongly recommends using the Network Time Protocol feature of most operating systems for this purpose, with both Oracle RAC nodes using the same reference Network Time Protocol server.

Accessing a Network Time Protocol server, however, may not always be an option. In this case, when manually setting the date and time for the nodes in the cluster, ensure that the date and time of the node you are performing the software installations from (linux1) is behind that of all other nodes in the cluster (linux2). I generally use a 20-second difference as shown in the following example:

Setting the date and time from linux1:

# date -s "7/28/2009 23:00:00"

Setting the date and time from linux2:

# date -s "7/28/2009 23:00:20"

The two-node RAC configuration described in this article does not make use of a Network Time Protocol server.
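If you want to confirm the resulting skew, GNU date (as shipped with Enterprise Linux) can convert both example timestamps to epoch seconds:

```shell
# Convert the two example timestamps to epoch seconds and print the
# skew between the nodes.  Requires GNU date ("-d" string parsing).
t1=$(date -d "7/28/2009 23:00:00" +%s)
t2=$(date -d "7/28/2009 23:00:20" +%s)
echo "skew: $((t2 - t1)) seconds"    # prints: skew: 20 seconds
```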

 

Openfiler - Logical Volumes Not Active on Boot

One issue that I have run into several times occurs when using a USB drive connected to the Openfiler server. When the Openfiler server is rebooted, the system is able to recognize the USB drive; however, it is not able to load the logical volumes, and it writes the following message to /var/log/messages (also available through dmesg):
iSCSI Enterprise Target Software - version 0.4.14
iotype_init(91) register fileio
iotype_init(91) register blockio
iotype_init(91) register nullio
open_path(120) Can't open /dev/rac1/crs -2
fileio_attach(268) -2
open_path(120) Can't open /dev/rac1/asm1 -2
fileio_attach(268) -2
open_path(120) Can't open /dev/rac1/asm2 -2
fileio_attach(268) -2
open_path(120) Can't open /dev/rac1/asm3 -2
fileio_attach(268) -2
open_path(120) Can't open /dev/rac1/asm4 -2
fileio_attach(268) -2

Please note that I am not suggesting that this only occurs with USB drives connected to the Openfiler server. It may occur with other types of drives; however, I have only seen it with USB drives!

If you do receive this error, you should first check the status of all logical volumes using the lvscan command from the Openfiler server:

# lvscan
  inactive          '/dev/rac1/crs' [2.00 GB] inherit
  inactive          '/dev/rac1/asm1' [115.94 GB] inherit
  inactive          '/dev/rac1/asm2' [115.94 GB] inherit
  inactive          '/dev/rac1/asm3' [115.94 GB] inherit
  inactive          '/dev/rac1/asm4' [115.94 GB] inherit

Notice that the status for each of the logical volumes is set to inactive - (the status for each logical volume on a working system would be set to ACTIVE).
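Rather than typing an lvchange per volume, the inactive entries can be pulled straight out of the lvscan output. A sketch (the activate_cmds helper name is my own) that prints the required commands; pipe the result to sh to actually run them:

```shell
# Print an "lvchange -a y" command for every inactive logical volume
# reported by lvscan on stdin.
activate_cmds() {
    sed -n "s/^ *inactive *'\([^']*\)'.*/lvchange -a y \1/p"
}

# Demo against a sample lvscan line; for real use on the Openfiler
# server:  lvscan | activate_cmds | sh
printf "%s\n" "  inactive          '/dev/rac1/crs' [2.00 GB] inherit" | activate_cmds
```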

I currently know of two methods to get Openfiler to automatically load the logical volumes on reboot, both of which are described below.

Method 1

One of the first steps is to shut down both of the Oracle RAC nodes in the cluster (linux1 and linux2). Then, from the Openfiler server, manually set each of the logical volumes to ACTIVE after each reboot:

# lvchange -a y /dev/rac1/crs
# lvchange -a y /dev/rac1/asm1
# lvchange -a y /dev/rac1/asm2
# lvchange -a y /dev/rac1/asm3
# lvchange -a y /dev/rac1/asm4

Another method to set the status to active for all logical volumes is to use the Volume Group change command as follows:

# vgscan
  Reading all physical volumes.  This may take a while...
  Found volume group "rac1" using metadata type lvm2

# vgchange -ay
  5 logical volume(s) in volume group "rac1" now active

After setting each of the logical volumes to active, use the lvscan command again to verify the status:

# lvscan
  ACTIVE            '/dev/rac1/crs' [2.00 GB] inherit
  ACTIVE            '/dev/rac1/asm1' [115.94 GB] inherit
  ACTIVE            '/dev/rac1/asm2' [115.94 GB] inherit
  ACTIVE            '/dev/rac1/asm3' [115.94 GB] inherit
  ACTIVE            '/dev/rac1/asm4' [115.94 GB] inherit

As a final test, reboot the Openfiler server to ensure each of the logical volumes will be set to ACTIVE after the boot process. After you have verified that each of the logical volumes will be active on boot, check that the iSCSI target service is running:

# service iscsi-target status
ietd (pid 2668) is running...

Finally, restart each of the Oracle RAC nodes in the cluster (linux1 and linux2).

Method 2

This method was kindly provided by Martin Jones. His workaround includes amending the /etc/rc.sysinit script to basically wait for the USB disk (/dev/sda in my example) to be detected. After making the changes to the /etc/rc.sysinit script (described below), verify the external drives are powered on and then reboot the Openfiler server.

The following is a small portion of the /etc/rc.sysinit script on the Openfiler server with the changes (highlighted in blue) proposed by Martin:

..............................................................
# LVM2 initialization, take 2
        if [ -c /dev/mapper/control ]; then
                if [ -x /sbin/multipath.static ]; then
                        modprobe dm-multipath >/dev/null 2>&1
                        /sbin/multipath.static -v 0
                        if [ -x /sbin/kpartx ]; then
                                /sbin/dmsetup ls --target multipath --exec "/sbin/kpartx -a"
                        fi
                fi
 

                if [ -x /sbin/dmraid ]; then
                        modprobe dm-mirror > /dev/null 2>&1
                        /sbin/dmraid -i -a y
                fi

                                     
#-----
#-----  MJONES - Customisation Start
#-----

       # Check if /dev/sda is ready
         while [ ! -e /dev/sda ]
         do
             echo "Device /dev/sda for first USB Drive is not yet ready."
             echo "Waiting..."
             sleep 5
         done
         echo "INFO - Device /dev/sda for first USB Drive is ready."

#-----
#-----  MJONES - Customisation END
#-----
                if [ -x /sbin/lvm.static ]; then
                        if /sbin/lvm.static vgscan > /dev/null 2>&1 ; then
                                action $"Setting up Logical Volume
Management:" /sbin/lvm.static vgscan --mknodes --ignorelockingfailure &&
/sbin/lvm.static vgchange -a y --ignorelockingfailure
                        fi
                fi
        fi
 

# Clean up SELinux labels
if [ -n "$SELINUX" ]; then
   for file in /etc/mtab /etc/ld.so.cache ; do
      [ -r $file ] && restorecon $file  >/dev/null 2>&1
   done
fi
..............................................................
                                  

Finally, restart each of the Oracle RAC nodes in the cluster (linux1 and linux2).

OCFS2 - o2cb_ctl: Unable to access cluster service while creating node

While configuring the nodes for OCFS2 using ocfs2console, it is possible to run into the error:

o2cb_ctl: Unable to access cluster service while creating node

This error does not show up when you start up ocfs2console for the first time. It appears when there is a problem with the cluster configuration, or if you do not save the cluster configuration initially while setting it up using ocfs2console. This is a bug!

The work-around is to exit from the ocfs2console, unload the o2cb module and remove the ocfs2 cluster configuration file /etc/ocfs2/cluster.conf. I also like to remove the /config directory. After removing the ocfs2 cluster configuration file, restart the ocfs2console program.

For example:

# /etc/init.d/o2cb offline ocfs2
# /etc/init.d/o2cb unload
Unmounting ocfs2_dlmfs filesystem: OK
Unloading module "ocfs2_dlmfs": OK
Unmounting configfs filesystem: OK
Unloading module "configfs": OK

# rm -f /etc/ocfs2/cluster.conf
# rm -rf /config

# ocfs2console &

This time, it will add the nodes!

OCFS2 - Adjusting the O2CB Heartbeat Threshold

With previous versions of this article (using FireWire rather than iSCSI for the shared storage), I was able to install and configure OCFS2, format the new volume, and finally install Oracle Clusterware, with its two required shared files (the voting disk and OCR file) located on the new OCFS2 volume. Although I was able to install Oracle Clusterware and see the shared drive over FireWire, I was receiving many lock-ups and hangs after about 15 minutes of the Clusterware software running on both nodes. It always varied which node would hang (either linux1 or linux2 in my example). It also didn't matter whether there was a high I/O load or none at all for it to crash (hang).

After looking through the trace files for OCFS2, it was apparent that access to the voting disk was too slow (exceeding the O2CB heartbeat threshold) and causing the Oracle Clusterware software (and the node) to crash. On the console would be a message similar to the following:

...
Index 0: took 0 ms to do submit_bio for read
Index 1: took 3 ms to do waiting for read completion
Index 2: took 0 ms to do bio alloc write
Index 3: took 0 ms to do bio add page write
Index 4: took 0 ms to do submit_bio for write
Index 5: took 0 ms to do checking slots
Index 6: took 4 ms to do waiting for write completion
Index 7: took 1993 ms to do msleep
Index 8: took 0 ms to do allocating bios for read
Index 9: took 0 ms to do bio alloc read
Index 10: took 0 ms to do bio add page read
Index 11: took 0 ms to do submit_bio for read
Index 12: took 10006 ms to do waiting for read completion
(13,3):o2hb_stop_all_regions:1888 ERROR: stopping heartbeat on all active regions.
Kernel panic - not syncing: ocfs2 is very sorry to be fencing this system by panicing

The solution I used was to increase the O2CB heartbeat threshold from its default value of 31 (which was 7 in earlier versions of OCFS2) to a value of 61. Some setups may require an even higher setting. This configurable parameter is used to compute the time it takes for a node to "fence" itself. During the installation and configuration of OCFS2, we adjusted this value in the section "Configure O2CB to Start on Boot and Adjust O2CB Heartbeat Threshold". If you encounter a kernel panic from OCFS2 and need to increase the heartbeat threshold, use the same procedures described in that section.

The following describes how to manually adjust the O2CB heartbeat threshold.

First, let's see how to determine what the O2CB heartbeat threshold is currently set to. This can be done by querying the /proc file system as follows:

# cat /proc/fs/ocfs2_nodemanager/hb_dead_threshold
31

We see that the value is 31, but what does this value represent? Well, it is used in the formula below to determine the fence time (in seconds):

[fence time in seconds] = (O2CB_HEARTBEAT_THRESHOLD - 1) * 2

So, with an O2CB heartbeat threshold of 31, we would have a fence time of:

(31 - 1) * 2 = 60 seconds

If we want a larger threshold (say 120 seconds), we would need to adjust O2CB_HEARTBEAT_THRESHOLD to 61 as shown below:

(61 - 1) * 2 = 120 seconds
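The relationship above can also be inverted to find the threshold needed for a desired fence time. The small shell sketch below just does that arithmetic; the 120-second target is only an example value:

```shell
# Solve fence_time = (O2CB_HEARTBEAT_THRESHOLD - 1) * 2 for the threshold:
#   threshold = fence_time / 2 + 1
desired_fence_time=120                         # target fence time in seconds
threshold=$(( desired_fence_time / 2 + 1 ))
echo "O2CB_HEARTBEAT_THRESHOLD=${threshold}"   # prints O2CB_HEARTBEAT_THRESHOLD=61
```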

Let's now see how to manually increase the O2CB heartbeat threshold from 31 to 61. This task will need to be performed on all Oracle RAC nodes in the cluster. First, modify the file /etc/sysconfig/o2cb and set O2CB_HEARTBEAT_THRESHOLD to 61:

#
# This is a configuration file for automatic startup of the O2CB
# driver.  It is generated by running /etc/init.d/o2cb configure.
# Please use that method to modify this file
#

# O2CB_ENABLED: 'true' means to load the driver on boot.
O2CB_ENABLED=true

# O2CB_BOOTCLUSTER: If not empty, the name of a cluster to start.
O2CB_BOOTCLUSTER=ocfs2

# O2CB_HEARTBEAT_THRESHOLD: Iterations before a node is considered dead.
O2CB_HEARTBEAT_THRESHOLD=61

# O2CB_IDLE_TIMEOUT_MS: Time in ms before a network connection is considered dead.
O2CB_IDLE_TIMEOUT_MS=30000

# O2CB_KEEPALIVE_DELAY_MS: Max time in ms before a keepalive packet is sent
O2CB_KEEPALIVE_DELAY_MS=2000

# O2CB_RECONNECT_DELAY_MS: Min time in ms between connection attempts
O2CB_RECONNECT_DELAY_MS=2000

After modifying the file /etc/sysconfig/o2cb, we need to alter the o2cb configuration. Again, this should be performed on all Oracle RAC nodes in the cluster.

# umount /u02
# /etc/init.d/o2cb offline ocfs2
# /etc/init.d/o2cb unload
# /etc/init.d/o2cb configure
Configuring the O2CB driver.

This will configure the on-boot properties of the O2CB driver.
The following questions will determine whether the driver is loaded on
boot.  The current values will be shown in brackets ('[]').  Hitting
<ENTER> without typing an answer will keep that current value.  Ctrl-C
will abort.

Load O2CB driver on boot (y/n) [n]: y
Cluster to start on boot (Enter "none" to clear) [ocfs2]: ocfs2
Specify heartbeat dead threshold (>=7) [31]: 61
Specify network idle timeout in ms (>=5000) [30000]: 30000
Specify network keepalive delay in ms (>=1000) [2000]: 2000
Specify network reconnect delay in ms (>=2000) [2000]: 2000
Writing O2CB configuration: OK
Loading module "configfs": OK
Mounting configfs filesystem at /sys/kernel/config: OK
Loading module "ocfs2_nodemanager": OK
Loading module "ocfs2_dlm": OK
Loading module "ocfs2_dlmfs": OK
Mounting ocfs2_dlmfs filesystem at /dlm: OK
Starting O2CB cluster ocfs2: OK
                                

We can now check again to make sure the setting took effect for the o2cb cluster stack:

# cat /proc/fs/ocfs2_nodemanager/hb_dead_threshold
61

It is important to note that the value of 61 I used for the O2CB heartbeat threshold may not work for all configurations. In some cases, the O2CB heartbeat threshold value may have to be increased to as high as 601 in order to prevent OCFS2 from panicking the kernel.

Oracle Clusterware Installation: Running root.sh Fails on the Last Node

After the Oracle Clusterware install process, running root.sh on the last node will fail while attempting to configure vipca at the end of the script:

                                   
Oracle CRS stack installed and running under init(1M)
Running vipca(silent) for configuring nodeapps
/u01/app/crs/jdk/jre//bin/java: error while loading shared libraries:
libpthread.so.0: cannot open shared object file: No such file or directory

After receiving this error, leave the OUI up. Do not click the OK button on the "Execute Configuration Scripts" dialog until all of the issues described in this section have been resolved.

Note that srvctl will produce similar output until the workaround described below is performed.

This error occurs because these releases of the Linux kernel fix an old bug in Linux threading that Oracle had worked around by setting LD_ASSUME_KERNEL in both vipca and srvctl. That workaround is no longer valid on OEL5, RHEL5, or SLES10, hence the failures.

To work around this issue, edit vipca (in the CRS bin directory on all nodes) to undo the setting of LD_ASSUME_KERNEL. After the IF statement around line 120, add an unset command to ensure LD_ASSUME_KERNEL is not set, as follows:

 

if [ "$arch" = "i686" -o "$arch" = "ia64" ]
then
    LD_ASSUME_KERNEL=2.4.19
    export LD_ASSUME_KERNEL
fi

unset LD_ASSUME_KERNEL    <<== Line to be added


Similarly for srvctl (in the CRS bin directory and, when installed, the RDBMS and ASM bin directories on all nodes), unset LD_ASSUME_KERNEL by adding one line; around line 168, it should look like this:

 

LD_ASSUME_KERNEL=2.4.19
export LD_ASSUME_KERNEL

unset LD_ASSUME_KERNEL    <<== Line to be added


Note: Remember to re-edit these files on all nodes:

  <ORA_CRS_HOME>/bin/vipca
  <ORA_CRS_HOME>/bin/srvctl
  <RDBMS_HOME>/bin/srvctl
  <ASM_HOME>/bin/srvctl  # (If it exists)

after applying the 10.2.0.2 or 10.2.0.3 patch sets, as these patch sets will still include those settings, which are unnecessary on OEL5, RHEL5, and SLES10. This issue was raised with development and is fixed in the 10.2.0.4 patch set.

Also note that we are explicitly unsetting LD_ASSUME_KERNEL and not merely commenting out its setting to handle a case where the user has it set in their environment (login shell).
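If you prefer to script the edit, the sketch below demonstrates the idea against a throwaway copy of the relevant srvctl lines rather than the real files (the file name and path here are placeholders; the real files live in the CRS, RDBMS, and ASM homes, and you should keep a backup before editing them):

```shell
# Create a throwaway sample containing the two lines srvctl uses to set
# LD_ASSUME_KERNEL (placeholder path, not the real srvctl):
cat > /tmp/srvctl.sample <<'EOF'
LD_ASSUME_KERNEL=2.4.19
export LD_ASSUME_KERNEL
EOF

# Append "unset LD_ASSUME_KERNEL" immediately after the export line
# (GNU sed "a" append command, in-place edit):
sed -i '/^export LD_ASSUME_KERNEL/a unset LD_ASSUME_KERNEL' /tmp/srvctl.sample

cat /tmp/srvctl.sample
```

Note that in vipca the export sits inside an architecture check, so for that file the manual edit after the closing fi remains the safest approach; a sed pattern anchored on the export line would place the unset inside the IF block, where it would not clear a value inherited from the login environment on other architectures.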

After working around the LD_ASSUME_KERNEL issue above, vipca will fail with the following error if the VIP IPs are in a non-routable range (10.x.x.x, 172.(16-31).x.x, or 192.168.x.x):

                                   
[root@linux2 ~]# $ORA_CRS_HOME/bin/vipca
Error 0(Native: listNetInterfaces:[3])
  [Error 0(Native: listNetInterfaces:[3])]

There are several ways to work around this issue. The goal of the workaround is to get the output of "$ORA_CRS_HOME/bin/oifcfg getif" to include both the public and cluster_interconnect interfaces. If you run that command now, you will notice it returns nothing, which means we have some work to do!

The first step is to identify the current interfaces and IP addresses:

[root@linux2 ~]# $ORA_CRS_HOME/bin/oifcfg iflist
eth1  192.168.2.0
eth0  192.168.1.0

Remember during the Oracle Clusterware install that 192.168.1.0 is my public interface while 192.168.2.0 is the cluster_interconnect interface.

Using this information, we can manually set the public / private interfaces accordingly using the setif option of the $ORA_CRS_HOME/bin/oifcfg command:

# $ORA_CRS_HOME/bin/oifcfg setif -global eth0/192.168.1.0:public
# $ORA_CRS_HOME/bin/oifcfg setif -global eth1/192.168.2.0:cluster_interconnect

Let's now run the "$ORA_CRS_HOME/bin/oifcfg getif" command again to verify its output:

[root@linux2 ~]# $ORA_CRS_HOME/bin/oifcfg getif
eth0  192.168.1.0  global  public
eth1  192.168.2.0  global  cluster_interconnect

After resolving all of the issues above, manually re-run vipca (GUI) as root from the last node, the one on which the errors occurred. Keep in mind that vipca is a GUI, so you will need to set your DISPLAY variable according to your X server:

# $ORA_CRS_HOME/bin/vipca

 


33. Conclusion

Oracle10g RAC allows the DBA to configure a database solution with superior fault tolerance and load balancing. Those DBAs who want to become more familiar with the features and benefits of Oracle10g RAC, however, will find that the cost of configuring even a small RAC cluster ranges from US$15,000 to US$20,000.

This article has hopefully given you an economical solution for setting up and configuring an inexpensive Oracle10g Release 2 RAC cluster using Oracle Enterprise Linux and iSCSI technology. The RAC solution presented here can be put together for around US$2,700 and provides the DBA with a fully functional Oracle10g Release 2 RAC cluster. While the hardware used for this article should be stable enough for educational purposes, it should never be considered for a production environment.

 


34. Acknowledgements

An article of this magnitude and complexity is generally not the work of one person alone. Although I was able to author and successfully demonstrate the validity of the components that make up this configuration, there are several other individuals that deserve credit in making this article a success.

First, I would like to thank Bane Radulovic from the Server BDE Team at Oracle. Bane not only introduced me to Openfiler, but shared with me his experience and knowledge of the product and how to best utilize it for Oracle RAC. His research and hard work made the task of configuring Openfiler seamless. Bane was also involved with hardware recommendations and testing.

A special thanks to K Gopalakrishnan for his assistance in delivering the Oracle RAC 10g Overview section of this article. Much of the content in that section regarding the history of Oracle RAC can be found in his very popular book Oracle Database 10g Real Application Clusters Handbook. This book comes highly recommended for both DBAs and developers wanting to successfully implement Oracle RAC and fully understand how many of the advanced services like Cache Fusion and Global Resource Directory operate.

I would next like to thank Oracle ACE Werner Puschitz for his outstanding work on "Installing Oracle Database 10g with Real Application Cluster (RAC) on Red Hat Enterprise Linux Advanced Server 3". This article, along with several others of his, provided information on Oracle RAC that could not be found in any other Oracle documentation. Without his hard work and research into issues like configuring OCFS2 and ASMLib, this article may have never come to fruition. If you are interested in examining technical articles on Linux internals and in-depth Oracle configurations written by Werner Puschitz, please visit his website at www.puschitz.com.

Also, thanks to Tzvika Lemel for his comments and suggestions on using Oracle's Cluster Verification Utility (CVU).

Lastly, I would like to express my appreciation to the following vendors for generously supplying the hardware for this article; Stallard Technologies, Inc., Seagate, Avocent Corporation, Intel, D-Link, SIIG, and LaCie.

 


Jeffrey Hunter [ www.idevelopment.info] graduated from Stanislaus State University in Turlock, California, with a Bachelor's degree in Computer Science and has been a senior DBA and software engineer for over 16 years. He is an Oracle Certified Professional, Java Development Certified Professional, an Oracle ACE, and author, and currently works as a Senior Database Administrator for The DBA Zone, Inc. Jeff's work includes advanced performance tuning, Java and PL/SQL programming, capacity planning, database security, and physical / logical database design in a UNIX, Linux, and Windows NT environment. Jeff's other interests include mathematical encryption theory, programming language processors (compilers and interpreters) in Java and C, LDAP, writing web-based database administration tools, and of course Linux.


 
