
Build Your Own Oracle RAC Cluster on Oracle Enterprise Linux and iSCSI (Continued)

For development and testing only; production deployments will not be supported!


11. Create "oracle" User and Directories

Perform the following tasks on both Oracle RAC nodes in the cluster!

We will be using the Oracle Cluster File System, Release 2 (OCFS2) to store the files required to be shared for the Oracle Clusterware software. When using OCFS2, the UID of the UNIX user "oracle" and GID of the UNIX group "oinstall" must be the same on both of the Oracle RAC nodes in the cluster. If either the UID or GID is different, the files on the OCFS2 file system will show up as "unowned" or may even be owned by a different user. For this article, I will use 175 for the "oracle" UID and 115 for the "oinstall" GID.

Create Group and User for Oracle

Let's start this section by creating the UNIX oinstall and dba groups and the oracle user account:

# groupadd -g 115 oinstall
# groupadd -g 116 dba
# useradd -m -u 175 -g oinstall -G dba -d /home/oracle -s /bin/bash -c "Oracle Software Owner" oracle
# id oracle
uid=175(oracle) gid=115(oinstall) groups=115(oinstall),116(dba)
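
Because OCFS2 depends on the numeric IDs matching, it can help to reduce the id output to just the numbers before comparing the two nodes. The following is a minimal sketch; the uid_gid function name is hypothetical and not part of the original procedure.

```shell
# Hypothetical helper (an illustration, not an Oracle-supplied tool):
# reduce one line of `id` output to "UID GID" for easy comparison.
uid_gid() {
    printf '%s\n' "$1" |
        sed -n 's/^uid=\([0-9]*\)([^)]*) gid=\([0-9]*\)(.*$/\1 \2/p'
}

# Run on each node; both nodes should print the same pair of numbers.
uid_gid "$(id oracle 2>/dev/null)"
```

If the two nodes print different pairs, correct the UID or GID before creating any files on the OCFS2 file system.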

Set the password for the oracle account:

# passwd oracle
Changing password for user oracle.
New UNIX password: xxxxxxxxxxx
Retype new UNIX password: xxxxxxxxxxx
passwd: all authentication tokens updated successfully.

Note that members of the UNIX group oinstall are considered the "owners" of the Oracle software. Members of the dba group can administer Oracle databases, for example starting up and shutting down databases. In this article, we are creating the oracle user account to have both responsibilities!

Verify That the User nobody Exists

Before installing the Oracle software, complete the following procedure to verify that the user nobody exists on the system:

  1. To determine if the user exists, enter the following command:
    # id nobody
    uid=99(nobody) gid=99(nobody) groups=99(nobody)

    If this command displays information about the nobody user, then you do not have to create that user.

  2. If the user nobody does not exist, then enter the following command to create it:
    # /usr/sbin/useradd nobody
  3. Repeat this procedure on all the other Oracle RAC nodes in the cluster.

Create the Oracle Base Directory

The next step is to create a new directory that will be used to store the Oracle Database software. When configuring the oracle user's environment (later in this section) we will be assigning the location of this directory to the $ORACLE_BASE environment variable.

Note that this guide adheres to the Optimal Flexible Architecture (OFA) for naming conventions used in creating the directory structure.

The following assumes that the directories are being created in the root file system. Please note that this is being done for the sake of simplicity and is not recommended as a general practice. Normally, these directories would be created on a separate file system.

After the directory is created, you must then specify the correct owner, group, and permissions for it. Perform the following on both Oracle RAC nodes:

# mkdir -p /u01/app/oracle
# chown -R oracle:oinstall /u01/app/oracle
# chmod -R 775 /u01/app/oracle

At the end of this procedure, you will have the following:

  • /u01 owned by root.
  • /u01/app owned by root.
  • /u01/app/oracle owned by oracle:oinstall with 775 permissions. This ownership and these permissions enable the OUI to create the oraInventory directory, in the path /u01/app/oracle/oraInventory.

Create the Oracle Clusterware Home Directory

Next, create a new directory that will be used to store the Oracle Clusterware software. When configuring the oracle user's environment (later in this section) we will be assigning the location of this directory to the $ORA_CRS_HOME environment variable.

As noted in the previous section, the following assumes that the directories are being created in the root file system. This is being done for the sake of simplicity and is not recommended as a general practice. Normally, these directories would be created on a separate file system.

After the directory is created, you must then specify the correct owner, group, and permissions for it. Perform the following on both Oracle RAC nodes:

# mkdir -p /u01/app/crs
# chown -R oracle:oinstall /u01/app/crs
# chmod -R 775 /u01/app/crs

At the end of this procedure, you will have the following:

  • /u01 owned by root.
  • /u01/app owned by root.
  • /u01/app/crs owned by oracle:oinstall with 775 permissions. These permissions are required for Oracle Clusterware installation and are changed during the installation process.

Create Mount Point for OCFS2 / Clusterware

Let's now create the mount point for the Oracle Cluster File System, Release 2 (OCFS2) that will be used to store the two Oracle Clusterware shared files.

As noted in the previous section, the following assumes that the directories are being created in the root file system. This is being done for the sake of simplicity and is not recommended as a general practice. Normally, these directories would be created on a separate file system. Perform the following on both Oracle RAC nodes:

# mkdir -p /u02/oradata/orcl
# chown -R oracle:oinstall /u02/oradata/orcl
# chmod -R 775 /u02/oradata/orcl
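
To confirm the ownership and permissions of the directories created above, something like the following could be run on each node. This is a sketch only: the dir_summary name is hypothetical, and it assumes GNU stat with the -c format option, as found on Linux.

```shell
# Hypothetical helper: print "owner:group mode" for a directory.
# Assumes GNU stat (the -c option is not portable to every UNIX).
dir_summary() {
    stat -c '%U:%G %a' "$1"
}

# Each directory created above should report "oracle:oinstall 775".
for d in /u01/app/oracle /u01/app/crs /u02/oradata/orcl; do
    if [ -d "$d" ]; then
        echo "$d: $(dir_summary "$d")"
    fi
done
```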

Create Login Script for oracle User Account

After creating the "oracle" UNIX user account on both nodes, make sure that you are logged in as the oracle user and verify that the environment is set up correctly by using the .bash_profile provided in this section.

Note: When you are setting the Oracle environment variables for each Oracle RAC node, be sure to assign each RAC node a unique Oracle SID! For this example, I used:

  • linux1: ORACLE_SID=orcl1
  • linux2: ORACLE_SID=orcl2

Log in to each node as the oracle user account:

# su - oracle

.bash_profile for "oracle" User Account

# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
      . ~/.bashrc
fi

alias ls="ls -FA"

export JAVA_HOME=/usr/local/java

# User specific environment and startup programs
export ORACLE_BASE=/u01/app/oracle
export ORACLE_HOME=$ORACLE_BASE/product/10.2.0/db_1
export ORA_CRS_HOME=/u01/app/crs
export ORACLE_PATH=$ORACLE_BASE/common/oracle/sql:.:$ORACLE_HOME/rdbms/admin
export CV_JDKHOME=/usr/local/java

# Each RAC node must have a unique ORACLE_SID. (i.e. orcl1, orcl2,...)
export ORACLE_SID=orcl1

export PATH=.:${JAVA_HOME}/bin:${PATH}:$HOME/bin:$ORACLE_HOME/bin
export PATH=${PATH}:/usr/bin:/bin:/usr/bin/X11:/usr/local/bin
export PATH=${PATH}:$ORACLE_BASE/common/oracle/bin
export ORACLE_TERM=xterm
export TNS_ADMIN=$ORACLE_HOME/network/admin
export ORA_NLS10=$ORACLE_HOME/nls/data
export LD_LIBRARY_PATH=$ORACLE_HOME/lib
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:$ORACLE_HOME/oracm/lib
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/lib:/usr/lib:/usr/local/lib
export CLASSPATH=$ORACLE_HOME/JRE
export CLASSPATH=${CLASSPATH}:$ORACLE_HOME/jlib
export CLASSPATH=${CLASSPATH}:$ORACLE_HOME/rdbms/jlib
export CLASSPATH=${CLASSPATH}:$ORACLE_HOME/network/jlib
export THREADS_FLAG=native
export TEMP=/tmp
export TMPDIR=/tmp
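
Since ORACLE_SID is the only line that differs between the two nodes, one option is to derive it from the host name so that an identical .bash_profile can be copied to both machines. The sketch below is an assumption layered on top of the profile above; the sid_for_host function is hypothetical and uses the linux1/linux2 to orcl1/orcl2 mapping from this article.

```shell
# Hypothetical sketch: map a node name to the ORACLE_SID used in this
# article. Fully qualified names (e.g. linux1.example.com) also match.
sid_for_host() {
    case "$1" in
        linux1|linux1.*) echo orcl1 ;;
        linux2|linux2.*) echo orcl2 ;;
        *)               return 1 ;;
    esac
}

# Possible use in .bash_profile in place of the hard-coded value.
# ORACLE_SID is left untouched on hosts that are not in the mapping.
if sid=$(sid_for_host "$(uname -n)"); then
    export ORACLE_SID="$sid"
fi
```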

 


12. Configure the Linux Servers for Oracle

Perform the following configuration procedures on both Oracle RAC nodes in the cluster!

The kernel parameters discussed in this section will need to be defined on both Oracle RAC nodes in the cluster every time the machine is booted. This section provides very detailed information about setting those kernel parameters required for Oracle. Instructions for placing them in a startup script (/etc/sysctl.conf) are included in Section 15 ("All Startup Commands for Both Oracle RAC Nodes").

Overview

This section focuses on configuring both Oracle RAC Linux servers - getting each one prepared for the Oracle RAC 10g installation. This includes verifying enough swap space, setting shared memory and semaphores, setting the maximum amount of file handles, setting the IP local port range, setting shell limits for the oracle user, activating all kernel parameters for the system, and finally how to verify the correct date and time for both nodes in the cluster.

Throughout this section you will notice that there are several different ways to configure (set) these parameters. For the purpose of this article, I will be making all changes permanent (through reboots) by placing all commands in the /etc/sysctl.conf file.

Swap Space Considerations

  • Installing Oracle Database 10g Release 2 requires a minimum of 512MB of memory. (Note: An inadequate amount of swap during the installation will cause the Oracle Universal Installer to either "hang" or "die")
  • To check the amount of memory you have, type:
    # cat /proc/meminfo | grep MemTotal
    MemTotal: 1033116 kB
  • To check the amount of swap you have allocated, type:
    # cat /proc/meminfo | grep SwapTotal
    SwapTotal: 2031608 kB
  • If you have less than 512MB of memory (between your RAM and SWAP), you can add temporary swap space by creating a temporary swap file. This way you do not have to use a raw device or, even more drastically, rebuild your system.

    As root, make a file that will act as additional swap space, let's say about 300MB:

    # dd if=/dev/zero of=tempswap bs=1k count=300000

    Now we should change the file permissions:

    # chmod 600 tempswap

    Finally we format the file as swap space and add it to the swap space in use (mkswap alone prepares the file; no file system needs to be created on it first):

    # mkswap tempswap
    # swapon tempswap
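
The two /proc/meminfo checks above can be combined into a single number. This is a minimal sketch; the mem_plus_swap_kb name is hypothetical, and the function simply adds the MemTotal and SwapTotal lines from whatever meminfo-style text it is given.

```shell
# Hypothetical helper: sum MemTotal and SwapTotal (both reported in kB)
# from /proc/meminfo-style text read on standard input.
mem_plus_swap_kb() {
    awk '/^(MemTotal|SwapTotal):/ { sum += $2 } END { printf "%d\n", sum }'
}

# On a live Linux system:
if [ -r /proc/meminfo ]; then
    mem_plus_swap_kb < /proc/meminfo
fi
```

With the sample values shown above (1033116 kB of RAM and 2031608 kB of swap) the function prints 3064724.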

Setting Shared Memory

Shared memory allows processes to access common structures and data by placing them in a shared memory segment. This is the fastest form of inter-process communications (IPC) available, mainly due to the fact that no kernel involvement occurs when data is being passed between the processes. Data does not need to be copied between processes.

Oracle makes use of shared memory for its System Global Area (SGA), which is an area of memory that is shared by all Oracle background and foreground processes. Adequate sizing of the SGA is critical to Oracle performance because it is responsible for holding the database buffer cache, shared SQL, access paths, and so much more.

To determine all shared memory limits, use the following:

# ipcs -lm
------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 32768
max total shared memory (kbytes) = 8388608
min seg size (bytes) = 1

Setting SHMMAX

The SHMMAX parameter defines the maximum size (in bytes) for a shared memory segment. The Oracle SGA is comprised of shared memory and it is possible that incorrectly setting SHMMAX could limit the size of the SGA. When setting SHMMAX, keep in mind that the size of the SGA should fit within one shared memory segment. An inadequate SHMMAX setting could result in the following:

ORA-27123: unable to attach to shared memory segment

You can determine the value of SHMMAX by performing the following:

# cat /proc/sys/kernel/shmmax
33554432

The default value for SHMMAX is 32MB. This size is often too small to configure the Oracle SGA. I generally set the SHMMAX parameter to 2GB using the following methods:
  • You can alter the default setting for SHMMAX without rebooting the machine by making the changes directly to the /proc file system (/proc/sys/kernel/shmmax) by using the following command:
    # sysctl -w kernel.shmmax=2147483648
                
  • You should then make this change permanent by inserting the kernel parameter in the /etc/sysctl.conf startup file:
    # echo "kernel.shmmax=2147483648" >> /etc/sysctl.conf
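
One thing to watch with the echo >> approach is that running it twice leaves two entries for the same parameter in the file. The following is a sketch of an update that can be repeated safely; the set_conf function is hypothetical, and it assumes GNU sed (for the -i option).

```shell
# Hypothetical helper: set "key = value" in a sysctl.conf-style file,
# replacing an existing entry for the key or appending a new one.
# Note: dots in the key are treated as regex wildcards, which is
# harmless for the parameter names used in this article.
set_conf() {
    file="$1" key="$2" value="$3"
    if grep -q "^${key}[[:space:]]*=" "$file" 2>/dev/null; then
        sed -i "s|^${key}[[:space:]]*=.*|${key} = ${value}|" "$file"
    else
        echo "${key} = ${value}" >> "$file"
    fi
}

# Example (as root): set_conf /etc/sysctl.conf kernel.shmmax 2147483648
```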
                

Setting SHMMNI

We now look at the SHMMNI parameter. This kernel parameter is used to set the maximum number of shared memory segments system wide. The default value for this parameter is 4096.

You can determine the value of SHMMNI by performing the following:

# cat /proc/sys/kernel/shmmni
4096

The default setting for SHMMNI should be adequate for your Oracle RAC 10g Release 2 installation.

Setting SHMALL

Finally, we look at the SHMALL shared memory kernel parameter. This parameter controls the total amount of shared memory (in pages) that can be used at one time on the system. In short, the value of this parameter should always be at least:

ceil(SHMMAX/PAGE_SIZE)

The default size of SHMALL is 2097152 and can be queried using the following command:

# cat /proc/sys/kernel/shmall
2097152

The default setting for SHMALL should be adequate for our Oracle RAC 10g Release 2 installation.

(Note: The page size in Red Hat Linux on the i386 platform is 4,096 bytes. You can, however, use bigpages which supports the configuration of larger memory page sizes.)
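
As a worked example of the ceil(SHMMAX/PAGE_SIZE) rule, the arithmetic can be done in the shell. This is only a sketch; the min_shmall function name is hypothetical.

```shell
# Hypothetical helper: smallest SHMALL (in pages) that can hold one
# segment of SHMMAX bytes, i.e. ceil(SHMMAX / PAGE_SIZE).
min_shmall() {
    shmmax="$1" page_size="$2"
    echo $(( (shmmax + page_size - 1) / page_size ))
}

# Use the page size reported by the running system.
min_shmall 2147483648 "$(getconf PAGE_SIZE)"
```

With a SHMMAX of 2147483648 bytes and a 4,096-byte page, the result is 524288 pages, which is well below the default SHMALL of 2097152.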

Setting Semaphores

Now that you have configured your shared memory settings, it is time to configure your semaphores. The best way to describe a "semaphore" is as a counter that is used to provide synchronization between processes (or threads within a process) for shared resources like shared memory. Semaphore sets are supported in UNIX System V where each one is a counting semaphore. When an application requests semaphores, it does so using "sets."

To determine all semaphore limits, use the following:

# ipcs -ls
------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 32
semaphore max value = 32767

You can also use the following command:

# cat /proc/sys/kernel/sem
250 32000 32 128

Setting SEMMSL

The SEMMSL kernel parameter is used to control the maximum number of semaphores per semaphore set.

Oracle recommends setting SEMMSL to the largest PROCESSES instance parameter setting in the init.ora file for all databases on the Linux system plus 10. Also, Oracle recommends setting the SEMMSL to a value of no less than 100.

Setting SEMMNI

The SEMMNI kernel parameter is used to control the maximum number of semaphore sets in the entire Linux system. Oracle recommends setting the SEMMNI to a value of no less than 100.

Setting SEMMNS

The SEMMNS kernel parameter is used to control the maximum number of semaphores (not semaphore sets) in the entire Linux system.

Oracle recommends setting the SEMMNS to the sum of the PROCESSES instance parameter setting for each database on the system, adding the largest PROCESSES twice, and then finally adding 10 for each Oracle database on the system.

Use the following calculation to determine the maximum number of semaphores that can be allocated on a Linux system. It will be the lesser of:

SEMMNS -or- (SEMMSL * SEMMNI)
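
The two calculations above are easy to get wrong by hand, so here is a sketch of both. The function names are hypothetical, and the PROCESSES values in the example are made up for illustration.

```shell
# Hypothetical helper: recommended SEMMNS given the PROCESSES setting of
# each database on the system (sum + 2 * largest + 10 per database).
recommended_semmns() {
    sum=0 max=0 count=0
    for p in "$@"; do
        sum=$(( sum + p ))
        if [ "$p" -gt "$max" ]; then max=$p; fi
        count=$(( count + 1 ))
    done
    echo $(( sum + 2 * max + 10 * count ))
}

# Hypothetical helper: the lesser of SEMMNS and (SEMMSL * SEMMNI).
# Arguments: SEMMNS SEMMSL SEMMNI
max_semaphores() {
    product=$(( $2 * $3 ))
    if [ "$1" -lt "$product" ]; then echo "$1"; else echo "$product"; fi
}

# Two databases with PROCESSES=150 and PROCESSES=200 (example values).
recommended_semmns 150 200
```

For the example values this gives 350 + 400 + 20 = 770.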

Setting SEMOPM

The SEMOPM kernel parameter is used to control the number of semaphore operations that can be performed per semop system call.

The semop system call (function) provides the ability to do operations for multiple semaphores with one semop system call. A semaphore set can contain at most SEMMSL semaphores, so it is recommended to set SEMOPM equal to SEMMSL.

Oracle recommends setting the SEMOPM to a value of no less than 100.

Setting Semaphore Kernel Parameters

Finally, we see how to set all semaphore parameters using several methods. In the following, the only parameter I care about changing (raising) is SEMOPM. All other default settings should be sufficient for our example installation.

  • You can alter the default setting for all semaphore settings without rebooting the machine by making the changes directly to the /proc file system (/proc/sys/kernel/sem) by using the following command:
    # sysctl -w kernel.sem="250 32000 100 128"
  • You should then make this change permanent by inserting the kernel parameter in the /etc/sysctl.conf startup file:
    # echo "kernel.sem=250 32000 100 128" >> /etc/sysctl.conf
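
The four numbers in kernel.sem are positional (SEMMSL, SEMMNS, SEMOPM, SEMMNI, in that order), which makes it easy to change the wrong one. The following sketch pulls out a single field by name; the sem_field function is hypothetical.

```shell
# Hypothetical helper: print one named field from a
# "SEMMSL SEMMNS SEMOPM SEMMNI" line read on standard input.
sem_field() {
    read -r semmsl semmns semopm semmni
    case "$1" in
        SEMMSL) echo "$semmsl" ;;
        SEMMNS) echo "$semmns" ;;
        SEMOPM) echo "$semopm" ;;
        SEMMNI) echo "$semmni" ;;
        *)      return 1 ;;
    esac
}

# On a live Linux system, show the current SEMOPM value:
if [ -r /proc/sys/kernel/sem ]; then
    sem_field SEMOPM < /proc/sys/kernel/sem
fi
```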
                

Setting File Handles

When configuring our Red Hat Linux server, it is critical to ensure that the maximum number of file handles is sufficiently large. The setting for file handles denotes the number of open files that you can have on the Linux system.

Use the following command to determine the maximum number of file handles for the entire system:

# cat /proc/sys/fs/file-max
102563

Oracle recommends that the file handles for the entire system be set to at least 65536.
  • You can alter the default setting for the maximum number of file handles without rebooting the machine by making the changes directly to the /proc file system (/proc/sys/fs/file-max) using the following:
    # sysctl -w fs.file-max=65536
              

  • You should then make this change permanent by inserting the kernel parameter in the /etc/sysctl.conf startup file:
    # echo "fs.file-max=65536" >> /etc/sysctl.conf
              
You can query the current usage of file handles by using the following:

# cat /proc/sys/fs/file-nr
825 0 65536

The file-nr file displays three values: the number of allocated file handles, the number of allocated but unused file handles (2.4 kernels report the number of used file handles here instead), and the maximum number of file handles that can be allocated.

Note: If you need to increase the value in /proc/sys/fs/file-max, then make sure that the ulimit is set properly. Usually for 2.4.20 it is set to unlimited. Verify the ulimit setting by issuing the ulimit command:

# ulimit
unlimited

Setting IP Local Port Range

Configure the system to allow a local port range of 1024 through 65000.

Use the following command to determine the value of ip_local_port_range:

# cat /proc/sys/net/ipv4/ip_local_port_range
32768 61000

The default value for ip_local_port_range is ports 32768 through 61000. Oracle recommends a local port range of 1024 to 65000.

  • You can alter the default setting for the local port range without rebooting the machine by making the changes directly to the /proc file system (/proc/sys/net/ipv4/ip_local_port_range) by using the following command:
    # sysctl -w net.ipv4.ip_local_port_range="1024 65000"
  • You should then make this change permanent by inserting the kernel parameter in the /etc/sysctl.conf startup file:
    # echo "net.ipv4.ip_local_port_range = 1024 65000" >> /etc/sysctl.conf

Setting Shell Limits for the oracle User

To improve the performance of the software on Linux systems, Oracle recommends you increase the following shell limits for the oracle user:

Shell Limit                                              Item in limits.conf   Hard Limit
Maximum number of open file descriptors                  nofile                65536
Maximum number of processes available to a single user   nproc                 16384

To make these changes, run the following as root:
cat >> /etc/security/limits.conf <<EOF
oracle soft nproc 2047
oracle hard nproc 16384
oracle soft nofile 1024
oracle hard nofile 65536
EOF
cat >> /etc/pam.d/login <<EOF
session required /lib/security/pam_limits.so
EOF
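
After appending to limits.conf, it is worth reading the values back. This is a minimal sketch; the limit_for function is hypothetical and only parses the four-column domain/type/item/value layout used in the entries above.

```shell
# Hypothetical helper: print the value of one limits.conf entry, given
# the domain (user), type (soft or hard) and item. Reads standard input
# and reports the last matching line.
limit_for() {
    awk -v d="$1" -v t="$2" -v i="$3" \
        '$1 == d && $2 == t && $3 == i { v = $4 } END { if (v != "") print v }'
}

# On a configured node:
if [ -r /etc/security/limits.conf ]; then
    limit_for oracle hard nofile < /etc/security/limits.conf
fi
```

The limits only take effect for new login sessions, so the final check is to log in as oracle and run ulimit -Hn and ulimit -Hu.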

Update the default shell startup file for the "oracle" UNIX account.

  • For the Bourne, Bash, or Korn shell, add the following lines to the /etc/profile file by running the following command:
    cat >> /etc/profile <<EOF
    if [ \$USER = "oracle" ]; then
    if [ \$SHELL = "/bin/ksh" ]; then
    ulimit -p 16384
    ulimit -n 65536
    else
    ulimit -u 16384 -n 65536
    fi
    umask 022
    fi
    EOF
  • For the C shell (csh or tcsh), add the following lines to the /etc/csh.login file by running the following command:
    cat >> /etc/csh.login <<EOF
    if ( \$USER == "oracle" ) then
    limit maxproc 16384
    limit descriptors 65536
    endif
    EOF

Activating All Kernel Parameters for the System

At this point, we have covered all of the required Linux kernel parameters needed for a successful Oracle installation and configuration. Within each section above, we configured the Linux system to persist each of the kernel parameters on system startup by placing them all in the /etc/sysctl.conf file.

We could reboot at this point to ensure all of these parameters are set in the kernel or we could simply "run" the /etc/sysctl.conf file by running the following command as root. Perform this on each node of the cluster!
# sysctl -p
net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
kernel.sysrq = 0
kernel.core_uses_pid = 1
net.core.rmem_default = 262144
net.core.wmem_default = 262144
net.core.rmem_max = 262144
net.core.wmem_max = 262144
kernel.shmmax = 2147483648
kernel.sem = 250 32000 100 128
fs.file-max = 65536
net.ipv4.ip_local_port_range = 1024 65000
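
Rather than reading the sysctl -p output by eye, individual values can be picked out of it. The sketch below works on the "key = value" lines shown above; the sysctl_value function name is hypothetical.

```shell
# Hypothetical helper: print the value for one key from "key = value"
# lines (the format printed by sysctl -p) read on standard input.
sysctl_value() {
    awk -F' *= *' -v key="$1" '$1 == key { print $2 }'
}

# Example (as root): sysctl -p | sysctl_value kernel.sem
```

Running the same check on both nodes and comparing the results is a quick way to confirm they were configured identically.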

Setting the Correct Date and Time on All Cluster Nodes

During the installation of Oracle Clusterware, the Database, and the Companion CD, the Oracle Universal Installer (OUI) first installs the software to the local node running the installer (i.e. linux1). The software is then copied remotely to all of the remaining nodes in the cluster (i.e. linux2). During the remote copy process, the OUI will execute the UNIX "tar" command on each of the remote nodes to extract the files that were archived and copied over. If the date and time on the node performing the install is greater than that of the node it is copying to, the OUI will throw an error from the "tar" command indicating it is attempting to extract files stamped with a time in the future:

Error while copying directory 
    /u01/app/crs with exclude file list 'null' to nodes 'linux2'.
[PRKC-1002 : All the submitted commands did not execute successfully]
---------------------------------------------
linux2:
   /bin/tar: ./bin/lsnodes: time stamp 2006-09-13 09:21:34 is 735 s in the future
   /bin/tar: ./bin/olsnodes: time stamp 2006-09-13 09:21:34 is 735 s in the future
   ...(more errors on this node)

Please note that although this would seem like a severe error from the OUI, it can safely be disregarded as a warning. The "tar" command DOES actually extract the files; however, when you perform a listing of the files (using ls -l) on the remote node, they will be missing the time field until the time on the server is greater than the timestamp of the file.

Before starting any of the above noted installations, ensure that each member node of the cluster is set as closely as possible to the same date and time. Oracle strongly recommends using the Network Time Protocol feature of most operating systems for this purpose, with both Oracle RAC nodes using the same reference Network Time Protocol server.

Accessing a Network Time Protocol server, however, may not always be an option. In this case, when manually setting the date and time for the nodes in the cluster, ensure that the date and time of the node you are performing the software installations from (linux1) is earlier than that of all other nodes in the cluster (linux2). I generally use a 20 second difference as shown in the following example:

Setting the date and time from linux1:

# date -s "6/25/2007 23:00:00"

Setting the date and time from linux2:

# date -s "6/25/2007 23:00:20"

The two-node RAC configuration described in this article does not make use of a Network Time Protocol server.
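
When setting the clocks by hand, the rule is simply that the install node must not be ahead of the node it copies to. A sketch of that comparison follows; both function names are hypothetical, and the inputs are assumed to be seconds since the epoch as printed by date +%s on each node.

```shell
# Hypothetical helper: succeed when the install node's clock (first
# argument) is not ahead of the remote node's clock (second argument).
install_clock_ok() {
    [ "$1" -le "$2" ]
}

# Hypothetical helper: seconds by which the install node leads the
# remote node (negative when it is behind, which is the safe case).
clock_lead() {
    echo $(( $1 - $2 ))
}

# Example with made-up readings taken on linux1 and linux2:
if install_clock_ok 1000 1020; then
    echo "install node leads by $(clock_lead 1000 1020) s: OK"
fi
```

A positive lead corresponds to the "is N s in the future" figure reported by tar in the error shown earlier.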

 


13. Configure the hangcheck-timer Kernel Module

Perform the following configuration procedures on both Oracle RAC nodes in the cluster!

Oracle9i Release 1 (9.0.1) and Oracle9i Release 2 (9.2.0.1) used a userspace watchdog daemon called watchdogd to monitor the health of the cluster and to restart a RAC node in case of a failure. Starting with Oracle9i Release 2 (9.2.0.2) (and still available in Oracle 10g Release 2), the watchdog daemon has been deprecated in favor of a Linux kernel module named hangcheck-timer, which addresses availability and reliability problems much better. The hangcheck-timer module is loaded into the Linux kernel and checks whether the system hangs: it sets a timer and checks the timer after a certain amount of time. There is a configurable threshold that, if exceeded, will cause the module to reboot the machine. Although the hangcheck-timer module is not required for Oracle Clusterware (Cluster Manager) operation, it is highly recommended by Oracle.

The hangcheck-timer.ko Module

The hangcheck-timer module uses a kernel-based timer that periodically checks the system task scheduler to catch delays in order to determine the health of the system. If the system hangs or pauses, the timer resets the node. The hangcheck-timer module uses the Time Stamp Counter (TSC) CPU register, which is incremented at each clock signal. The TSC offers much more accurate time measurements because this register is updated by the hardware automatically.

Much more information about the hangcheck-timer project can be found here.

Installing the hangcheck-timer.ko Module

The hangcheck-timer was originally shipped only by Oracle; however, this module is now included with Red Hat Linux AS starting with kernel version 2.4.9-e.12. The hangcheck-timer should already be included. Use the following to ensure that you have the module included:

# find /lib/modules -name "hangcheck-timer.ko"
/lib/modules/2.6.9-55.0.0.0.2.ELhugemem/kernel/drivers/char/hangcheck-timer.ko
/lib/modules/2.6.9-55.0.0.0.2.ELsmp/kernel/drivers/char/hangcheck-timer.ko
/lib/modules/2.6.9-55.0.0.0.2.EL/kernel/drivers/char/hangcheck-timer.ko

In the above output, we care about the hangcheck timer object (hangcheck-timer.ko) in the /lib/modules/2.6.9-55.0.0.0.2.ELhugemem/kernel/drivers/char directory since this is the kernel we are running.

Configuring and Loading the hangcheck-timer Module

There are two key parameters to the hangcheck-timer module:

  • hangcheck_tick: This parameter defines the period of time between checks of system health. The default value is 60 seconds; Oracle recommends setting it to 30 seconds.
  • hangcheck_margin: This parameter defines the maximum hang delay that should be tolerated before hangcheck-timer resets the RAC node. It defines the margin of error in seconds. The default value is 180 seconds; Oracle recommends setting it to 180 seconds.

Note: The two hangcheck-timer module parameters indicate how long a RAC node must hang before it will reset the system. A node reset will occur when the following is true:

system hang time > (hangcheck_tick + hangcheck_margin)

Configuring Hangcheck Kernel Module Parameters

Each time the hangcheck-timer kernel module is loaded (manually or by Oracle), it needs to know what value to use for each of the two parameters we just discussed (hangcheck_tick and hangcheck_margin). These values need to be available after each reboot of the Linux server. To do that, make an entry with the correct values to the /etc/modprobe.conf file as follows:

# su -
# echo "options hangcheck-timer hangcheck_tick=30 hangcheck_margin=180" >> /etc/modprobe.conf

Each time the hangcheck-timer kernel module gets loaded, it will use the values defined by the entry I made in the /etc/modprobe.conf file.
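
As a worked example of the reset rule (system hang time greater than tick plus margin), the threshold for a given pair of settings can be computed as follows. The function name is hypothetical.

```shell
# Hypothetical helper: seconds a node must hang before hangcheck-timer
# resets it, i.e. hangcheck_tick + hangcheck_margin.
hangcheck_reset_after() {
    echo $(( $1 + $2 ))
}

# Values configured in /etc/modprobe.conf in this article:
hangcheck_reset_after 30 180
```

With the recommended settings the node is reset after hanging for more than 210 seconds; with the defaults (60 and 180) the threshold would be 240 seconds.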

Manually Loading the Hangcheck Kernel Module for Testing

Oracle is responsible for loading the hangcheck-timer kernel module when required. For that reason, it is not required to perform a modprobe or insmod of the hangcheck-timer kernel module in any of the startup files (i.e. /etc/rc.local).

It is only out of pure habit that I continue to include a modprobe of the hangcheck-timer kernel module in the /etc/rc.local file. Someday I will get over it, but realize that it does not hurt to include a modprobe of the hangcheck-timer kernel module during startup.

So to keep myself sane and able to sleep at night, I always configure the loading of the hangcheck-timer kernel module on each startup as follows:

# echo "/sbin/modprobe hangcheck-timer" >> /etc/rc.local
      

(Note: You don't have to manually load the hangcheck-timer kernel module using modprobe or insmod after each reboot. The hangcheck-timer module will be loaded by Oracle automatically when needed.)

Now, to test the hangcheck-timer kernel module to verify it is picking up the correct parameters we defined in the /etc/modprobe.conf file, use the modprobe command. Although you could load the hangcheck-timer kernel module by passing it the appropriate parameters (e.g. insmod hangcheck-timer hangcheck_tick=30 hangcheck_margin=180), we want to verify that it is picking up the options we set in the /etc/modprobe.conf file.

To manually load the hangcheck-timer kernel module and verify it is using the correct values defined in the /etc/modprobe.conf file, run the following command:

# su -
# modprobe hangcheck-timer
# grep Hangcheck /var/log/messages | tail -2
Jun 25 18:18:31 linux1 kernel: Hangcheck: starting hangcheck timer 0.9.0 (tick is 30 seconds, margin is 180 seconds).
Jun 25 18:18:31 linux1 kernel: Hangcheck: Using monotonic_clock().
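
To compare the loaded values against what was configured without reading the log line by eye, the tick and margin can be extracted from it. This sketch assumes the message format shown above; the function name is hypothetical.

```shell
# Hypothetical helper: extract "tick margin" (in seconds) from a
# hangcheck-timer startup message in the format shown above.
hangcheck_loaded_values() {
    printf '%s\n' "$1" |
        sed -n 's/.*tick is \([0-9]*\) seconds, margin is \([0-9]*\) seconds.*/\1 \2/p'
}

# Example (as root):
#   hangcheck_loaded_values "$(grep 'Hangcheck: starting' /var/log/messages | tail -1)"
```

For the log entry shown above the function prints "30 180", matching the options line in /etc/modprobe.conf.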
      

 


14. Configure RAC Nodes for Remote Access

Perform the following configuration procedures on both Oracle RAC nodes in the cluster!

Before you can install and use Oracle Real Application Clusters, you must configure either secure shell (SSH) or remote shell (RSH) for the "oracle" UNIX user account on all cluster nodes. The goal here is to set up user equivalence for the "oracle" UNIX user account. User equivalence enables the "oracle" UNIX user account to access all other nodes in the cluster (running commands and copying files) without the need for a password. This can be configured using either SSH or RSH where SSH is the preferred method. Oracle added support in 10g Release 1 for using the SSH tool suite for setting up user equivalence. Before Oracle Database 10g, user equivalence had to be configured using remote shell.

Note that if the Oracle Universal Installer in 10g does not detect the presence of the secure shell tools (ssh and scp), it will attempt to use the remote shell tools instead (rsh and rcp).

So, why do we have to set up user equivalence? Installing Oracle Clusterware and the Oracle Database software is only performed from one node in a RAC cluster. When running the Oracle Universal Installer (OUI) on that particular node, it will use the ssh and scp commands (or rsh and rcp commands if using remote shell) to run remote commands on and copy files (the Oracle software) to all other nodes within the RAC cluster. The "oracle" UNIX user account on the node running the OUI (runInstaller) must be trusted by all other nodes in your RAC cluster. This means that you must be able to run the secure shell commands (ssh or scp) or the remote shell commands (rsh and rcp) on the Linux server you will be running the OUI from against all other Linux servers in the cluster without being prompted for a password.

Note that the use of secure shell or remote shell is not required for normal RAC operation. This configuration, however, must be enabled for RAC and patchset installations as well as creating the clustered database.

The first step is to decide which method of remote access to use - secure shell or remote shell. Both of them have their pros and cons. Remote shell, for example, is extremely easy to set up and configure. It takes fewer steps to construct and is always available in the terminal session when logging on to the trusted node (the node you will be performing the install from). The connection to the remote nodes, however, is not secure during the installation and any patching process. Secure shell on the other hand does provide a secure connection when installing and patching but does require a greater number of steps. It also needs to be enabled in the terminal session each time the oracle user logs in to the trusted node. The official Oracle documentation only describes the steps for setting up secure shell, which is considered the preferred method.

Both methods for configuring user equivalence are described in the following two sections:

Using the Secure Shell Method

This section describes how to configure OpenSSH version 3.

To determine if SSH is installed and running, enter the following command:

# pgrep sshd
2808

If SSH is running, then the response to this command is a list of process ID number(s). Please run this command on both Oracle RAC nodes in the cluster to verify the SSH daemons are installed and running!

To find out more about SSH, refer to the man page:

# man ssh

Creating RSA and DSA Keys on Both Oracle RAC Nodes
The first step in configuring SSH is to create RSA and DSA key pairs on both Oracle RAC nodes in the cluster. The command to do this will create a public and private key for both RSA and DSA (for a total of four keys per node). The content of the RSA and DSA public keys will then need to be copied into an authorized key file which is then distributed to both Oracle RAC nodes in the cluster.

Use the following steps to create the RSA and DSA key pairs. Please note that these steps will need to be completed on both Oracle RAC nodes in the cluster:

  1. Log on as the "oracle" UNIX user account.
    # su - oracle

  2. If necessary, create the .ssh directory in the "oracle" user's home directory and set the correct permissions on it:
    $ mkdir -p ~/.ssh
    $ chmod 700 ~/.ssh

  3. Enter the following command to generate an RSA key pair (public and private key) for version 2 of the SSH protocol:
    $ /usr/bin/ssh-keygen -t rsa
    At the prompts:
    • Accept the default location for the key files.
    • Enter and confirm a pass phrase. This should be different from the "oracle" UNIX user account password; however, this is not a requirement.

    This command will write the public key to the ~/.ssh/id_rsa.pub file and the private key to the ~/.ssh/id_rsa file. Note that you should never distribute the private key to anyone!

  4. Enter the following command to generate a DSA key pair (public and private key) for version 2 of the SSH protocol:
    $ /usr/bin/ssh-keygen -t dsa
    At the prompts:
    • Accept the default location for the key files.
    • Enter and confirm a pass phrase. This should be different from the "oracle" UNIX user account password; however, this is not a requirement.

    This command will write the public key to the ~/.ssh/id_dsa.pub file and the private key to the ~/.ssh/id_dsa file. Note that you should never distribute the private key to anyone!

  5. Repeat the above steps for both Oracle RAC nodes in the cluster.
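
As a quick sanity check after generating the keys, the permissions can be verified with a few lines of shell. This is only a minimal sketch and not part of the Oracle procedure: the function name check_ssh_perms is hypothetical, and it assumes GNU stat (the -c option), which is what Linux provides. It expects mode 700 on the .ssh directory (as set in step 2) and mode 600 on the private keys, which is how ssh-keygen creates them.

```shell
# check_ssh_perms DIR: hypothetical helper, not an Oracle or OpenSSH tool.
# Reports whether DIR has mode 700 and any private keys in it have mode 600.
check_ssh_perms() {
    dir=$1
    status=0
    mode=$(stat -c %a "$dir") || return 2
    if [ "$mode" != "700" ]; then
        echo "$dir: mode $mode, expected 700"
        status=1
    fi
    for key in "$dir/id_rsa" "$dir/id_dsa"; do
        [ -f "$key" ] || continue      # skip key types that were not generated
        mode=$(stat -c %a "$key")
        if [ "$mode" != "600" ]; then
            echo "$key: mode $mode, expected 600"
            status=1
        fi
    done
    return $status
}
```

Running check_ssh_perms ~/.ssh as the oracle user on each node should print nothing and return 0 when the permissions are as expected.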

Now that both Oracle RAC nodes contain a public and private key for both RSA and DSA, you will need to create an authorized key file on one of the nodes. An authorized key file is nothing more than a single file that contains a copy of everyone's (every node's) RSA and DSA public key. Once the authorized key file contains all of the public keys, it is then distributed to all other nodes in the cluster.

Complete the following steps on one of the nodes in the cluster to create and then distribute the authorized key file. For the purpose of this article, I am using linux1:

  1. First, determine if an authorized key file already exists on the node (~/.ssh/authorized_keys). In most cases this will not exist since this article assumes you are working with a new install. If the file doesn't exist, create it now:
    $ touch ~/.ssh/authorized_keys
    $ cd ~/.ssh
    $ ls -l *.pub
    -rw-r--r--  1 oracle oinstall 603 Aug 31 23:40 id_dsa.pub
    -rw-r--r--  1 oracle oinstall 223 Aug 31 23:36 id_rsa.pub
    The listing above should show the id_rsa.pub and id_dsa.pub public keys created in the previous section.

  2. In this step, use SSH to copy the content of the ~/.ssh/id_rsa.pub and ~/.ssh/id_dsa.pub public key from both Oracle RAC nodes in the cluster to the authorized key file just created (~/.ssh/authorized_keys). Again, this will be done from linux1. You will be prompted for the "oracle" UNIX user account password for both Oracle RAC nodes accessed. Notice that when using SSH to access the node you are on (linux1), the first time prompts for the "oracle" UNIX user account password. The second attempt at accessing this node will prompt for the pass phrase used to unlock the private key. For any of the remaining nodes, it will always ask for the "oracle" UNIX user account password.

    The following example is being run from linux1 and assumes a two-node cluster, with nodes linux1 and linux2:

    $ ssh linux1 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    The authenticity of host 'linux1 (192.168.1.100)' can't be established.
    RSA key fingerprint is 61:8a:f9:9e:28:a2:b7:d3:70:8d:dc:76:ca:d9:23:43.
    Are you sure you want to continue connecting (yes/no)? yes
    Warning: Permanently added 'linux1,192.168.1.100' (RSA) to the list of known hosts.
    oracle@linux1's password: xxxxx
    
    $ ssh linux1 cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
    Enter passphrase for key '/home/oracle/.ssh/id_rsa': xxxxx
    
    $ ssh linux2 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    The authenticity of host 'linux2 (192.168.1.101)' can't be established.
    RSA key fingerprint is 84:2b:bd:eb:31:2c:23:36:55:c2:ee:54:d2:23:6a:e4.
    Are you sure you want to continue connecting (yes/no)? yes
    Warning: Permanently added 'linux2,192.168.1.101' (RSA) to the list of known hosts.
    oracle@linux2's password: xxxxx
    
    $ ssh linux2 cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
    oracle@linux2's password: xxxxx
    Note: The first time you use SSH to connect to a node from a particular system, you may see a message similar to the following:
    The authenticity of host 'linux1 (192.168.1.100)' can't be established.
    RSA key fingerprint is 61:8a:f9:9e:28:a2:b7:d3:70:8d:dc:76:ca:d9:23:43.
    Are you sure you want to continue connecting (yes/no)? yes
    Enter yes at the prompt to continue. You should not see this message again when you connect from this system to the same node.

  3. At this point, we have the content of the RSA and DSA public keys from every node in the cluster in the authorized key file (~/.ssh/authorized_keys) on linux1. We now need to copy it to the remaining nodes in the cluster. In our two-node cluster example, the only remaining node is linux2. Use the scp command to copy the authorized key file to all remaining nodes in the cluster:
    $ scp ~/.ssh/authorized_keys linux2:.ssh/authorized_keys
    oracle@linux2's password: xxxxx
    authorized_keys                                     100% 1652 1.6KB/s 00:00

  4. Change the permission of the authorized key file for both Oracle RAC nodes in the cluster by logging into the node and running the following:
    $ chmod 600 ~/.ssh/authorized_keys

  5. At this point, if you use ssh to log in to or run a command on another node, you are prompted for the pass phrase that you specified when you created the key pair (the RSA key in the example that follows). For example, test the following from linux1:
    $ ssh linux1 hostname
    Enter passphrase for key '/home/oracle/.ssh/id_rsa': xxxxx
    linux1
    
    $ ssh linux2 hostname
    Enter passphrase for key '/home/oracle/.ssh/id_rsa': xxxxx
    linux2
    Note: If you see any other messages or text, apart from the host name, then the Oracle installation can fail. Make any changes required to ensure that only the host name is displayed when you enter these commands. You should ensure that any part of a login script that generates output, or asks questions, is modified so that it acts only when the shell is an interactive shell.
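
Since the authorized key file is supposed to hold one RSA and one DSA public key per node, a simple count can confirm nothing was missed. The following is a sketch under stated assumptions: the helper names count_keys and check_authorized_keys are hypothetical, the file is assumed to have started out empty (as in a new install), and it relies on protocol 2 public keys starting with "ssh-rsa" or "ssh-dss".

```shell
# count_keys FILE TYPE: hypothetical helper; counts the entries of one key
# type (ssh-rsa or ssh-dss) in an authorized key file.
count_keys() {
    grep -c "^$2 " "$1"
}

# check_authorized_keys FILE NODES: succeeds when FILE holds exactly NODES
# keys of each type, i.e. one RSA and one DSA key per cluster node.
check_authorized_keys() {
    file=$1
    nodes=$2
    for type in ssh-rsa ssh-dss; do
        found=$(count_keys "$file" "$type")
        if [ "$found" -ne "$nodes" ]; then
            echo "$type: found $found entries, expected $nodes"
            return 1
        fi
    done
    return 0
}
```

For the two-node cluster in this article, check_authorized_keys ~/.ssh/authorized_keys 2 would be run on both linux1 and linux2.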

Enabling SSH User Equivalency for the Current Shell Session
When the OUI runs, it needs to run the secure shell tool commands (ssh and scp) without being prompted for a pass phrase. Even though SSH is configured on both Oracle RAC nodes in the cluster, using the secure shell tool commands will still prompt for a pass phrase. Before running the OUI, you need to enable user equivalence for the terminal session you plan to run the OUI from. For the purpose of this article, all Oracle installations will be performed from linux1.

User equivalence will need to be enabled on any new terminal shell session before attempting to run the OUI. If you log out and log back in to the node you will be performing the Oracle installation from, you must enable user equivalence for the terminal shell session as this is not done by default.

To enable user equivalence for the current terminal shell session, perform the following steps:

  1. Log on to the node from which you want to run the OUI (linux1) as the "oracle" UNIX user account.
    # su - oracle

  2. Enter the following commands:
    $ exec /usr/bin/ssh-agent $SHELL
    $ /usr/bin/ssh-add
    Enter passphrase for /home/oracle/.ssh/id_rsa: xxxxx
    Identity added: /home/oracle/.ssh/id_rsa (/home/oracle/.ssh/id_rsa)
    Identity added: /home/oracle/.ssh/id_dsa (/home/oracle/.ssh/id_dsa)
    At the prompts, enter the pass phrase for each key that you generated.

  3. If SSH is configured correctly, you will be able to use the ssh and scp commands without being prompted for a password or pass phrase from this terminal session:
    $ ssh linux1 "date;hostname"
    Mon Jun 25 18:24:23 EDT 2007
    linux1
    
    $ ssh linux2 "date;hostname"
    Mon Jun 25 18:26:15 EDT 2007
    linux2
    Note: The commands above should display the date set on each Oracle RAC node along with its hostname. If any of the nodes prompt for a password or pass phrase, then verify that the ~/.ssh/authorized_keys file on that node contains the correct public keys. Also, if you see any other messages or text, apart from the date and hostname, then the Oracle installation can fail. Make any changes required to ensure that only the date and hostname are displayed when you enter these commands. You should ensure that any part of a login script that generates output, or asks questions, is modified so that it acts only when the shell is an interactive shell.

  4. The Oracle Universal Installer is a GUI application and requires the use of an X Server. From the terminal session enabled for user equivalence (the node you will be performing the Oracle installations from), set the environment variable DISPLAY to a valid X Windows display:

    Bourne, Korn, and Bash shells:

    $ DISPLAY=<Any X-Windows Host>:0
    $ export DISPLAY
    C shell:
    $ setenv DISPLAY <Any X-Windows Host>:0
    After setting the DISPLAY variable to a valid X Windows display, you should perform another test of the current terminal session to ensure that X11 forwarding is not enabled:
    $ ssh linux1 hostname
    linux1
    
    $ ssh linux2 hostname
    linux2
    Note: If you are using a remote client to connect to the node performing the installation, and you see a message similar to: "Warning: No xauth data; using fake authentication data for X11 forwarding." then this means that your authorized keys file is configured correctly; however, your SSH configuration has X11 forwarding enabled. For example:
    $ export DISPLAY=melody:0
    $ ssh linux2 hostname
    Warning: No xauth data; using fake authentication data for X11 forwarding.
    linux2
    Note that having X11 Forwarding enabled will cause the Oracle installation to fail. To correct this problem, create a user-level SSH client configuration file for the "oracle" UNIX user account that disables X11 Forwarding:

    • Using a text editor, edit or create the file ~/.ssh/config
    • Make sure that the ForwardX11 attribute is set to no. For example, insert the following into the ~/.ssh/config file:
      Host *
      ForwardX11 no

  5. You must run the Oracle Universal Installer from this terminal session or remember to repeat the steps to enable user equivalence (steps 2, 3, and 4 from this section) before you start the Oracle Universal Installer from a different terminal session.
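
The manual tests in this section can be wrapped in a small loop so that every node is checked the same way before each OUI run. This is a sketch, not something the OUI requires: the function names are hypothetical, and it assumes that the hostname command on each node prints the same name used to reach it (as it does for linux1 and linux2 in this article). The ssh option BatchMode=yes makes ssh fail rather than prompt for a password or pass phrase, and stderr is captured so that warnings such as the X11 forwarding message count as unexpected output.

```shell
# is_clean_reply EXPECTED ACTUAL: succeeds only when the remote command
# printed exactly the expected text and nothing else.
is_clean_reply() {
    [ "$1" = "$2" ]
}

# check_equivalence NODE...: hypothetical helper; runs "hostname" on each
# node without allowing any prompt and reports anything other than the
# bare node name.
check_equivalence() {
    rc=0
    for node in "$@"; do
        reply=$(ssh -o BatchMode=yes "$node" hostname 2>&1)
        if is_clean_reply "$node" "$reply"; then
            echo "$node: ok"
        else
            echo "$node: unexpected reply: $reply"
            rc=1
        fi
    done
    return $rc
}
```

From the terminal session prepared with ssh-agent and ssh-add, check_equivalence linux1 linux2 should report "ok" for both nodes.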

Remove any stty Commands
When installing the Oracle software, any hidden files on the system (e.g. .bashrc, .cshrc, .profile) will cause the installation process to fail if they contain stty commands.

To avoid this problem, you must modify these files so that stty runs only when standard input is a terminal, which suppresses the resulting error output on STDERR, as in the following examples:

  • Bourne, Bash, or Korn shell:
    if [ -t 0 ]; then
    stty intr ^C
    fi

  • C shell:
    test -t 0
    if ($status == 0) then
    stty intr ^C
    endif

Note: If there are hidden files that contain stty commands that are loaded by the remote shell, then OUI indicates an error and stops the installation.
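
To locate the lines that need this treatment, the startup files can be searched for stty. The helper below is a hypothetical sketch (find_stty is not a standard tool): it lists every non-comment line that mentions stty, including ones already wrapped in a terminal test, so each hit still has to be reviewed by hand.

```shell
# find_stty FILE...: hypothetical helper; prints file:line:text for every
# non-comment line that mentions stty. Missing files are skipped.
find_stty() {
    for f in "$@"; do
        [ -f "$f" ] || continue
        # /dev/null as a second operand forces grep to print the file name
        grep -n '^[^#]*stty' "$f" /dev/null
    done
    return 0
}
```

For the oracle account, this might be run as find_stty ~/.bashrc ~/.bash_profile ~/.profile ~/.cshrc on both nodes.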

Using the Remote Shell Method
The services provided by remote shell are disabled by default on most Linux systems. This section describes the tasks required for enabling and configuring user equivalence for use by the Oracle Universal Installer when commands should be run and files copied to the remote nodes in the cluster using the remote shell tools. The goal is to enable the Oracle Universal Installer to use rsh and rcp to run commands and copy files to a remote node without being prompted for a password. Please note that using the remote shell method for configuring user equivalence is not secure.

The rsh daemon validates users using the /etc/hosts.equiv file or the .rhosts file found in the user's (oracle's) home directory.

First, let's make sure that we have the rsh RPMs installed on both Oracle RAC nodes in the cluster:

# rpm -q rsh rsh-server
rsh-0.17-25.4
rsh-server-0.17-25.4
From the above, we can see that we have the rsh and rsh-server installed. Were rsh not installed, we would run the following command from the CD where the RPM is located:
# su -
# rpm -ivh rsh-0.17-25.4.i386.rpm rsh-server-0.17-25.4.i386.rpm

To enable the "rsh" and "rlogin" services, the "disable" attribute in the /etc/xinetd.d/rsh file must be set to "no" and xinetd must be reloaded. This can be done by running the following commands on all nodes in the cluster:

# su -
# chkconfig rsh on
# chkconfig rlogin on
# service xinetd reload
Reloading configuration: [ OK ]
To allow the "oracle" UNIX user account to be trusted among the RAC nodes, create the /etc/hosts.equiv file on all nodes in the cluster:
# su -
# touch /etc/hosts.equiv
# chmod 600 /etc/hosts.equiv
# chown root.root /etc/hosts.equiv
Now add all RAC nodes to the /etc/hosts.equiv file similar to the following example for both Oracle RAC nodes in the cluster:
# cat /etc/hosts.equiv
+linux1 oracle
+linux2 oracle
+linux1-priv oracle
+linux2-priv oracle
Note: In the above example, the second field permits only the oracle user account to run rsh commands on the specified nodes. For security reasons, the /etc/hosts.equiv file should be owned by root and the permissions should be set to 600. In fact, some systems will only honor the content of this file if the owner of this file is root and the permissions are set to 600.
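
Because the file must be identical on every node, generating its content from a node list avoids typing mistakes. The sketch below uses a hypothetical helper name (gen_hosts_equiv) and simply prints one "+node user" line per node, in the same format as the example above.

```shell
# gen_hosts_equiv USER NODE...: hypothetical helper; prints one
# "+node user" line for each node name given.
gen_hosts_equiv() {
    user=$1
    shift
    for node in "$@"; do
        printf '+%s %s\n' "$node" "$user"
    done
}
```

As root, the output could be redirected into place, for example: gen_hosts_equiv oracle linux1 linux2 linux1-priv linux2-priv > /etc/hosts.equiv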

Before attempting to test your rsh command, ensure that you are using the correct version of rsh. By default, Red Hat Linux puts /usr/kerberos/sbin and /usr/kerberos/bin at the head of the $PATH variable. This will cause the Kerberos version of rsh (/usr/kerberos/bin/rsh) to be executed.

I will typically rename the Kerberos version of rsh so that the normal rsh command will be used. Use the following:

# su -
# which rsh
/usr/kerberos/bin/rsh
# mv /usr/kerberos/bin/rsh /usr/kerberos/bin/rsh.original
# mv /usr/kerberos/bin/rcp /usr/kerberos/bin/rcp.original
# mv /usr/kerberos/bin/rlogin /usr/kerberos/bin/rlogin.original
# which rsh
/usr/bin/rsh

You should now test your connections and run the rsh command from the node that will be performing the Oracle Clusterware and 10g RAC installation. I will be using the node linux1 to perform all installs so this is where I will run the following commands from:

# su - oracle

$ rsh linux1 ls -l /etc/hosts.equiv
-rw-------  1 root root 70 Jun 25 18:29 /etc/hosts.equiv

$ rsh linux1-priv ls -l /etc/hosts.equiv
-rw-------  1 root root 70 Jun 25 18:29 /etc/hosts.equiv

$ rsh linux2 ls -l /etc/hosts.equiv
-rw-------  1 root root 70 Jun 25 18:29 /etc/hosts.equiv

$ rsh linux2-priv ls -l /etc/hosts.equiv
-rw-------  1 root root 70 Jun 25 18:29 /etc/hosts.equiv

Unlike when using secure shell, no other actions or commands are needed to enable user equivalence using the remote shell. User equivalence will be enabled for the "oracle" UNIX user account after successfully logging in to a terminal session.

 


15. All Startup Commands for Both Oracle RAC Nodes

Verify that the following startup commands are included on both of the Oracle RAC nodes in the cluster!

Up to this point, we have talked in great detail about the parameters and resources that need to be configured on both nodes in the Oracle RAC 10g configuration. This section will take a deep breath and recap those parameters, commands, and entries (in previous sections of this document) that need to happen on both Oracle RAC nodes when the machine is booted.

For each of the startup files below, entries in gray should be included in each startup file.

/etc/modprobe.conf

(All parameters and values to be used by kernel modules.)

alias eth0 r8169
alias eth1 e1000
alias scsi_hostadapter ata_piix
alias snd-card-0 snd-intel8x0
options snd-card-0 index=0
install snd-intel8x0 /sbin/modprobe --ignore-install snd-intel8x0 && /usr/sbin/alsactl restore >/dev/null 2>&1 || :
remove snd-intel8x0 { /usr/sbin/alsactl store >/dev/null 2>&1 || : ; }; /sbin/modprobe -r --ignore-remove snd-intel8x0
alias usb-controller ehci-hcd
alias usb-controller1 uhci-hcd
options hangcheck-timer hangcheck_tick=30 hangcheck_margin=180
      

/etc/sysctl.conf

(We wanted to adjust the default and maximum send buffer size as well as the default and maximum receive buffer size for the interconnect. This file also contains those parameters responsible for configuring shared memory, semaphores, file handles, and local IP range for use by the Oracle instance.)

# Kernel sysctl configuration file for Red Hat Linux
#
# For binary values, 0 is disabled, 1 is enabled.  See sysctl(8) and
# sysctl.conf(5) for more details.

# Controls IP packet forwarding
net.ipv4.ip_forward = 0

# Controls source route verification
net.ipv4.conf.default.rp_filter = 1

# Controls the System Request debugging functionality of the kernel
kernel.sysrq = 0

# Controls whether core dumps will append the PID to the core filename.
# Useful for debugging multi-threaded applications.
kernel.core_uses_pid = 1

# +---------------------------------------------------------+
# | ADJUSTING NETWORK SETTINGS                              |
# +---------------------------------------------------------+
# | With Oracle 9.2.0.1 and onwards, Oracle now makes use   |
# | of UDP as the default protocol on Linux for             |
# | inter-process communication (IPC), such as Cache Fusion |
# | and Cluster Manager buffer transfers between instances  |
# | within the RAC cluster. Oracle strongly suggests to     |
# | adjust the default and maximum receive buffer size      |
# | (SO_RCVBUF socket option) to 256 KB, and the default    |
# | and maximum send buffer size (SO_SNDBUF socket option)  |
# | to 256 KB. The receive buffers are used by TCP and UDP  |
# | to hold received data until it is read by the           |
# | application. The receive buffer cannot overflow because |
# | the peer is not allowed to send data beyond the buffer  |
# | size window. This means that datagrams will be          |
# | discarded if they don't fit in the socket receive       |
# | buffer. This could cause the sender to overwhelm the    |
# | receiver.                                               |
# +---------------------------------------------------------+

# +---------------------------------------------------------+
# | Default setting in bytes of the socket "receive" buffer |
# | which may be set by using the SO_RCVBUF socket option.  |
# +---------------------------------------------------------+
net.core.rmem_default=262144

# +---------------------------------------------------------+
# | Maximum setting in bytes of the socket "receive" buffer |
# | which may be set by using the SO_RCVBUF socket option.  |
# +---------------------------------------------------------+
net.core.rmem_max=262144

# +---------------------------------------------------------+
# | Default setting in bytes of the socket "send" buffer    |
# | which may be set by using the SO_SNDBUF socket option.  |
# +---------------------------------------------------------+
net.core.wmem_default=262144

# +---------------------------------------------------------+
# | Maximum setting in bytes of the socket "send" buffer    |
# | which may be set by using the SO_SNDBUF socket option.  |
# +---------------------------------------------------------+
net.core.wmem_max=262144

# +---------------------------------------------------------+
# | ADJUSTING ADDITIONAL KERNEL PARAMETERS FOR ORACLE       |
# +---------------------------------------------------------+
# | Configure the kernel parameters for all Oracle Linux    |
# | servers by setting shared memory and semaphores,        |
# | setting the maximum amount of file handles, and setting |
# | the IP local port range.                                |
# +---------------------------------------------------------+

# +---------------------------------------------------------+
# | SHARED MEMORY                                           |
# +---------------------------------------------------------+
kernel.shmmax=2147483648

# +---------------------------------------------------------+
# | SEMAPHORES                                              |
# | ----------                                              |
# |                                                         |
# | SEMMSL_value  SEMMNS_value  SEMOPM_value  SEMMNI_value  |
# |                                                         |
# +---------------------------------------------------------+
kernel.sem=250 32000 100 128

# +---------------------------------------------------------+
# | FILE HANDLES                                            |
# +---------------------------------------------------------+
fs.file-max=65536

# +---------------------------------------------------------+
# | LOCAL IP RANGE                                          |
# +---------------------------------------------------------+
net.ipv4.ip_local_port_range=1024 65000
Note: Verify that each of the required kernel parameters (above) is configured in the /etc/sysctl.conf file. Then load the settings and confirm that each parameter is in effect by running the following command on both Oracle RAC nodes in the cluster:
# sysctl -p
net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
kernel.sysrq = 0
kernel.core_uses_pid = 1
net.core.rmem_default = 262144
net.core.rmem_max = 262144
net.core.wmem_default = 262144
net.core.wmem_max = 262144
kernel.shmmax = 2147483648
kernel.sem = 250 32000 100 128
fs.file-max = 65536
net.ipv4.ip_local_port_range = 1024 65000
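
To compare what is written in the file with what the kernel is actually using, a value can be pulled out of /etc/sysctl.conf and set beside the output of sysctl -n for the same key. The helper name conf_value is hypothetical; this is a sketch that assumes the simple "key = value" layout shown above. It also spells out the arithmetic behind the buffer sizes: 256 KB is 256 * 1024 = 262144 bytes.

```shell
# conf_value FILE KEY: hypothetical helper; prints the value assigned to KEY
# in a sysctl.conf-style file. Comment lines are ignored and the last
# assignment wins.
conf_value() {
    awk -F= -v key="$2" '
        /^[ \t]*#/ { next }
        {
            name = $1
            gsub(/[ \t]/, "", name)
            if (name == key) {
                val = substr($0, index($0, "=") + 1)
                sub(/^[ \t]+/, "", val)
                sub(/[ \t]+$/, "", val)
                found = val
            }
        }
        END { print found }
    ' "$1"
}

# 256 KB expressed in bytes, the value used for all four net.core settings.
buf_bytes=$((256 * 1024))
echo "256 KB = $buf_bytes bytes"
```

For example, the output of conf_value /etc/sysctl.conf net.core.rmem_max should match the output of sysctl -n net.core.rmem_max on each node.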

/etc/hosts

(All machine/IP entries for nodes in our RAC cluster.)

# Do not remove the following line, or various programs
# that require network functionality will fail.

127.0.0.1        localhost.localdomain   localhost

# Public Network - (eth0)
192.168.1.100    linux1
192.168.1.101    linux2

# Private Interconnect - (eth1)
192.168.2.100    linux1-priv
192.168.2.101    linux2-priv

# Public Virtual IP (VIP) addresses - (eth0)
192.168.1.200    linux1-vip
192.168.1.201    linux2-vip

# Private Storage Network for Openfiler - (eth1)
192.168.1.195    openfiler1
192.168.2.195    openfiler1-priv

192.168.1.106    melody
192.168.1.102    alex
192.168.1.105    bartman
192.168.1.120    cartman
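
A missing or misspelled entry in /etc/hosts on one node is easy to overlook, so it can help to check the required names mechanically. The following sketch uses hypothetical helper names (hosts_has, missing_hosts) and only looks at whole-line comments, so a name that appears in a trailing comment would be counted as present.

```shell
# hosts_has FILE NAME: hypothetical helper; succeeds when NAME appears as a
# host name or alias on a non-comment line of a hosts-format file.
hosts_has() {
    awk -v name="$2" '
        /^[ \t]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# missing_hosts FILE NAME...: prints every NAME that FILE does not define.
missing_hosts() {
    file=$1
    shift
    for name in "$@"; do
        hosts_has "$file" "$name" || echo "$name"
    done
}
```

On each node, missing_hosts /etc/hosts linux1 linux2 linux1-priv linux2-priv linux1-vip linux2-vip openfiler1 openfiler1-priv should print nothing.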

/etc/hosts.equiv

(The /etc/hosts.equiv file is only required when using the remote shell method to establish remote access and user equivalency. Allow logins to both Oracle RAC nodes as the oracle user account without the need for a password when using the remote shell method for enabling user equivalency.)

+linux1 oracle
+linux2 oracle
+linux1-priv oracle
+linux2-priv oracle

/etc/rc.local

(Loading the hangcheck-timer kernel module.)

#!/bin/sh
#
# This script will be executed *after* all the other init scripts.
# You can put your own initialization stuff in here if you don't
# want to do the full Sys V style init stuff.

touch /var/lock/subsys/local

# +---------------------------------------------------------+
# | HANGCHECK TIMER                                         |
# | (I do not believe this is required, but doesn't hurt)   |
# +---------------------------------------------------------+

/sbin/modprobe hangcheck-timer
      

 


16. Install & Configure Oracle Cluster File System (OCFS2)

Most of the configuration procedures in this section should be performed on both Oracle RAC nodes in the cluster! Creating the OCFS2 filesystem, however, should only be executed on one of the nodes in the RAC cluster.

It is now time to configure the Oracle Cluster File System, Release 2 (OCFS2). OCFS2, developed by Oracle Corporation, is a Cluster File System which allows all nodes in a cluster to concurrently access a device via the standard file system interface. This allows for easy management of applications that need to run across a cluster.

OCFS (Release 1) was released in December 2002 to enable Oracle Real Application Clusters (RAC) users to run the clustered database without having to deal with RAW devices. The file system was designed to store database related files, such as data files, control files, redo logs, archive logs, etc. OCFS2 is the next generation of the Oracle Cluster File System. It has been designed to be a general purpose cluster file system. With it, one can store not only database related files on a shared disk, but also store Oracle binaries and configuration files (shared Oracle Home), making management of RAC even easier.

In this guide, you will be using the release of OCFS2 included with Enterprise Linux Release 4 Update 5 (OCFS2 Release 1.2.5-1) to store the two files that are required to be shared by the Oracle Clusterware software. Along with these two files, you will also be using this space to store the shared SPFILE for all Oracle RAC ASM instances.

See this page for more information on OCFS2 (including Installation Notes) for Linux.

Install OCFS2

In previous editions of this article, this would be the time where you would need to download the OCFS2 software from http://oss.oracle.com/. The OCFS2 software includes the following packages:

  • OCFS2 Kernel Driver
    • ocfs2-x.x.x-xx.EL-x.x.x-x.i686.rpm - (for single processor)
    • ocfs2-x.x.x-xx.ELsmp-x.x.x-x.i686.rpm - (for multiple processors)
    • ocfs2-x.x.x-xx.ELhugemem-x.x.x-x.i686.rpm - (for hugemem)
  • OCFS2 tools
    • ocfs2-tools-x.x.x-x.i386.rpm
  • OCFS2 console
    • ocfs2console-x.x.x-x.i386.rpm

This, however, is no longer necessary since the OCFS2 software is included with Enterprise Linux. If you followed the instructions I used for installing Enterprise Linux, you would have installed Everything, in which case you will have all of the required RPM packages for OCFS2. If you performed another installation type (e.g. "Advanced Server"), you may be missing the OCFS2 packages and will need to install them. All of the required RPMs for OCFS2 are included on Disk 3 of Enterprise Linux. To determine if OCFS2 is installed on your system, run the following from both nodes in the Oracle RAC cluster:

# rpm -qa | grep ocfs2 | sort
ocfs2-2.6.9-55.0.0.0.2.EL-1.2.5-1
ocfs2-2.6.9-55.0.0.0.2.ELhugemem-1.2.5-1
ocfs2-2.6.9-55.0.0.0.2.ELsmp-1.2.5-1
ocfs2console-1.2.4-1
ocfs2-tools-1.2.4-1
ocfs2-tools-devel-1.2.4-1

Note that the above listing includes the OCFS2 kernel driver for all three architecture types — single processor, hugemem, and multiple processors. By default, Enterprise Linux installs with the hugemem kernel which means that only ocfs2-2.6.9-55.0.0.0.2.ELhugemem-1.2.5-1 would be required. Having the other two OCFS2 kernel drivers installed, however, does not hurt the configuration.
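
To see which of the installed drivers matches the kernel a node is actually running, the package list can be filtered by kernel release. This is a sketch: driver_for_kernel is a hypothetical helper, and it assumes that the output of uname -r is the same release string that is embedded in the driver package name (for example 2.6.9-55.0.0.0.2.ELhugemem).

```shell
# driver_for_kernel RELEASE: hypothetical helper; reads package names on
# standard input (for example from "rpm -qa") and prints the OCFS2 kernel
# driver package whose name embeds the given kernel release. The trailing
# hyphen keeps "EL" from also matching "ELsmp" and "ELhugemem".
driver_for_kernel() {
    grep -F "ocfs2-$1-"
}
```

On each node this could be used as: rpm -qa | driver_for_kernel "$(uname -r)". An empty result would suggest that the driver for the running kernel is not installed.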

If you are missing the OCFS2 packages and need to install them, load Disk 3 of Enterprise Linux and run the following as the root user account. Make sure to perform this on both Oracle RAC nodes in the cluster.

$ su -
# mount -r /media/cdrom
# cd /media/cdrom/Enterprise/RPMS

# rpm -Uvh ocfs2*
warning: ocfs2-2.6.9-55.0.0.0.2.EL-1.2.5-1.i686.rpm: V3 DSA signature: NOKEY, key ID b38a8516
Preparing...                ########################################### [100%]
   1:ocfs2-tools            ########################################### [ 20%]
   2:ocfs2-2.6.9-55.0.0.0.2.########################################### [ 40%]
   3:ocfs2-2.6.9-55.0.0.0.2.########################################### [ 60%]
   4:ocfs2-2.6.9-55.0.0.0.2.########################################### [ 80%]
   5:ocfs2console           ########################################### [100%]

# rpm -qa | grep ocfs2 | sort
ocfs2-2.6.9-55.0.0.0.2.EL-1.2.5-1
ocfs2-2.6.9-55.0.0.0.2.ELhugemem-1.2.5-1
ocfs2-2.6.9-55.0.0.0.2.ELsmp-1.2.5-1
ocfs2console-1.2.4-1
ocfs2-tools-1.2.4-1

Disable SELinux (RHEL4 U2 and higher)

Users of RHEL4 U2 and higher (Enterprise Linux 4.5 is based on RHEL4 U5) are advised that OCFS2 currently does not work with SELinux enabled. If you are using RHEL4 U2 or higher (which includes us since we are using Enterprise Linux 4.5) you will need to verify SELinux is disabled in order to get the O2CB service to execute.

During the installation of Enterprise Linux, we disabled SELinux on the Firewall screen. If, however, you did not disable SELinux during the installation phase, you can use the tool system-config-securitylevel to disable SELinux.

To disable SELinux (or verify SELinux is disabled), run the "Security Level Configuration" GUI utility:

# /usr/bin/system-config-securitylevel &

This will bring up the following screen:


Figure 13 Security Level Configuration Opening Screen

Now, click the SELinux tab and clear (uncheck) the "Enabled" checkbox. After clicking on [OK], you will be presented with a warning dialog. Simply acknowledge this warning by clicking "Yes". Your screen should now look like the following after disabling the SELinux option:


Figure 14 SELinux Disabled

If you needed to disable SELinux in this section on any of the nodes, those nodes will need to be rebooted to implement the change. SELinux must be disabled before you can continue with configuring OCFS2!

# init 6
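
After the reboot, the setting can also be confirmed from the command line instead of reopening the GUI. The sketch below reads the persistent setting from a file in the /etc/selinux/config format; selinux_config_state is a hypothetical helper name, and I am assuming the file uses the usual SELINUX=disabled style of entry.

```shell
# selinux_config_state FILE: hypothetical helper; prints the SELINUX= setting
# (enforcing, permissive, or disabled) from a file in /etc/selinux/config
# format. The SELINUXTYPE= line is not matched.
selinux_config_state() {
    sed -n 's/^SELINUX=//p' "$1" | tail -n 1
}
```

Running selinux_config_state /etc/selinux/config should print "disabled" on both nodes. The getenforce command reports the mode of the running system and can be used as a cross-check.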

Configure OCFS2

The next step is to generate and configure the /etc/ocfs2/cluster.conf file on both Oracle RAC nodes in the cluster. The easiest way to accomplish this is to run the GUI tool ocfs2console. In this section, we will not only create and configure the /etc/ocfs2/cluster.conf file using ocfs2console, but will also create and start the cluster stack O2CB. When the /etc/ocfs2/cluster.conf file is not present (as will be the case in our example), the ocfs2console tool will create this file along with a new cluster stack service (O2CB) with a default cluster name of ocfs2. This will need to be done on both Oracle RAC nodes in the cluster as the root user account:

$ su -
# ocfs2console &
This will bring up the GUI as shown below:


Figure 15 ocfs2console GUI

Using the ocfs2console GUI tool, perform the following steps:

  1. Select [Cluster] -> [Configure Nodes...]. This will start the OCFS2 Cluster Stack (Figure 16) and bring up the "Node Configuration" dialog.
  2. On the "Node Configuration" dialog, click the [Add] button.
    • This will bring up the "Add Node" dialog.
    • In the "Add Node" dialog, enter the Host name and IP address for the first node in the cluster. Leave the IP Port set to its default value of 7777. In my example, I added both nodes using linux1 / 192.168.1.100 for the first node and linux2 / 192.168.1.101 for the second node.
    • Click [Apply] on the "Node Configuration" dialog - All nodes should now be "Active" as shown in Figure 17.
    • Click [Close] on the "Node Configuration" dialog.
  3. After verifying all values are correct, exit the application using [File] -> [Quit]. This needs to be performed on both Oracle RAC nodes in the cluster.


Figure 16. Starting the OCFS2 Cluster Stack

The following dialog shows the OCFS2 settings I used for the nodes linux1 and linux2:


Figure 17 Configuring Nodes for OCFS2

Note: See the Troubleshooting section if you get the error:

o2cb_ctl: Unable to access cluster service while creating node

After exiting the ocfs2console, you will have a /et