I. INTRODUCTION

Oracle Cluster Health Monitor - OS Tool (CHM/OS)
--------------------------------------------------------------------------------

CHM/OS is designed to detect and analyze operating system (OS) and cluster
resource related degradation and failures in order to improve the
diagnosability of Oracle Clusterware and Oracle RAC issues such as a node
eviction. It continuously tracks OS resource use at the node, process, and
device level, and it collects and analyzes cluster-wide data. When run in
real-time mode, an alert is shown to the operator when built-in thresholds are
exceeded. For root cause analysis, historical data can be replayed to examine
what was occurring at the time of a failure.

II. LINUX INSTALLATION

NOTE: If CHM/OS v1.x is already installed on your cluster, you will have to
remove the existing version before installing the new one.

The installation requires setting the CRFPERLBIN environment variable to the
location of a perl version greater than or equal to 5.6.0.

Bash Example:
  export CRFPERLBIN=/usr/bin

CHM/OS requires a Linux kernel version greater than or equal to 2.6.9 and x86
architecture. It will also work on x86_64 if the kernel is configured to run
32-bit binaries.

The CHM/OS v1.x standalone installation can only be performed on systems which
do not have the v11.2+ integrated version of CHM/OS that is part of the Oracle
Grid Infrastructure installation beginning in version 11.2.0.2.

NOTE: References to the install home in this README assume /usr/lib/oracrf for
the standalone install.

The following are the steps for the CHM/OS v1 standalone install.

1. Create a user '<user>:<group>' (e.g. crfuser:oinstall) on all the nodes
   where CHM/OS is being installed. Make sure the user's home directory is the
   same on all nodes.

   Example command while logged in as root:
   # useradd -d /opt/crfuser -s /bin/sh -g oinstall crfuser

2. Set up passwordless ssh for the user created in step 1 between each pair of
   nodes. Confirm from each node that this user can ssh to all other nodes,
   including the local node, using only the hostname (without the domain). No
   password or other user intervention, such as acknowledging prompts, should
   be required. This can be done by generating the key on one node for
   '<user>', adding the public key to ~/.ssh/authorized_keys, and copying the
   whole ~/.ssh directory to the '<user>' home directory on all nodes. Then
   confirm by ssh'ing into all nodes from all nodes (see the sketch after this
   list). For an ssh-key reference see http://fedoranews.org/dowen/sshkeys/

3. If there is a previous installation of CHM/OS v1 (or IPD/OS), delete it
   from all nodes prior to installation. Perform the following steps:

   a. Log in as root and disable the daemon start script:
      # /etc/init.d/init.crfd disable
   b. Run the uninstall script:
      # $CRFPERLBIN /usr/lib/oracrf/install/crfinst.pl -d
   c. Delete all BDB databases from all nodes.
   d. Manually delete the install home if it still exists.

4. Log in as '<user>' and unzip crfpack.zip into a new subdirectory of the
   home directory. This subdirectory (Ex: /tmp/crfinstall) must be easily
   recreatable by ssh on all nodes. This subdirectory will be deleted when the
   installation is completed.

5. Run the ./install/crfinst.pl script on one node with the desired node list,
   specified as a comma-separated list, for a cluster-wide install.

   Example:
   # $CRFPERLBIN crfinst.pl -i node1,node2,node3 -b /opt/oracrfdb -m node1

6. Once step 5 completes, you will be prompted to run the crfinst.pl script
   again with -f and optionally -b on each node, while logged in as root, to
   finalize the install on that node.

7. Once the finalize operation is complete, run the following while logged in
   as root to enable CHM/OS on all nodes.

   Example:
   # /etc/init.d/init.crfd enable
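The following is a minimal sketch of the passwordless ssh setup from step 2,
assuming OpenSSH and the example names used above (user 'crfuser' from step 1
and nodes node1, node2, node3 from step 5); substitute your own user and node
names.

   On node1, while logged in as crfuser:
   $ ssh-keygen -t rsa                    (accept the defaults, empty passphrase)
   $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
   $ chmod 700 ~/.ssh
   $ chmod 600 ~/.ssh/authorized_keys
   $ scp -r ~/.ssh node2:~/               (repeat for node3)

   Then, from every node, verify that commands such as 'ssh node1 hostname',
   'ssh node2 hostname' and 'ssh node3 hostname' complete without a password
   prompt. Accept each host key once during this verification so that no
   further prompts remain.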
DO NOT bypass any of the above steps or try other ways to install, because the
daemons will not work correctly. Also remember that only root can de-install.
DO NOT attempt a re-install without first de-installing with
'$CRFPERLBIN crfinst.pl -d'.

All node names in the nodelist should be specified without the domain name to
maintain uniformity and ease of use. Entering node names with domain names may
lead to a failed installation.

Usage: crfinst.pl -a <nodelist> [<bdbpath>]
                  -c <nodelist> [<bdbpath>]
                  -d
                  -f [-b <bdbpath>]
                  -h
                  -i <nodelist> -b <bdbpath> [-m <master>] -N ClusterName

 -a : Add nodes for OS resource metrics collection from a configured node. The
      proper mechanism for adding a node is to stop CHM/OS on all the nodes,
      add the new node, and then restart on all nodes.

 -b : Specify the path where a Berkeley DB can be created to store OS metrics.
      This location MUST be outside of the location where the ZIP file was
      unpacked, because all of the directories under that location which were
      created by unzip will be removed. BDB files can be kept as-is for later
      usage. The location should be a path on a volume with at least 2GB of
      space available per node and be writable only by the root user. The BDB
      should not be on the root filesystem. Its location is required to be the
      same on all nodes. If that cannot be done, specify a different location
      during the finalize (-f) operation on each node, following the above
      size requirements. The path MUST NOT be on a shared disk. If a shared
      BDB path is provided to multiple hosts, BDB corruption will occur.

 -c : Perform the user equivalence and other pre-install checks for the nodes.
      The installation steps will be prompted for, but the actual installation
      will not occur.

 -d : De-install the software on the node where the script is run (only the
      root user may perform this operation).

 -f : Finalize the install on a node (only the root user may perform this
      operation).

 -h : Show this help.

 -i : Install the software on the nodes listed in the <nodelist>.

 -N : Specify the name of the cluster.

 <bdbpath>  : Path for the Berkeley DB used to store OS metrics.
 <master>   : Name of the node for the master logger daemon.
 <nodelist> : Comma-separated list of nodes on which to install CHM/OS.

III. THE DAEMONS

CHM/OS consists of three daemons: ologgerd, oproxyd and osysmond. There is one
ologgerd master daemon on only one node in the installed set of nodes, and
there is one osysmond on every node. If there is more than one node in the
installed set of nodes, an additional node is chosen as the standby for the
master ologgerd. If the master daemon terminates because it is not able to
restart after a fixed number of retries, or if the node where the master was
running goes down, the standby becomes the master and selects a new standby.
The master manages the OS metric repository in a Berkeley DB and interacts
with the standby to maintain a replica of the master OS metrics database on
the standby node.

The osysmond daemon is the monitoring and OS metric collection daemon that
sends the data to ologgerd for storing in the master and standby repositories.

The oproxyd daemon is the proxy that handles connections on the public
network. If CHM/OS is configured with private node names, only oproxyd listens
on the public network for external clients such as oclumon and chmosg. It runs
on all the nodes and is highly available.

NOTE: If the oclumon or chmosg clients are run remotely, all data is sent in
clear text, including process names. Additionally, clients do not authenticate
to ologgerd. Therefore, if the security of this data is a concern, it is
recommended to run the clients from a local node.
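As an illustration of how the collected and historical data can be examined
with the oclumon client from a local node, the following is a hedged sketch;
the exact oclumon options shipped with your CHM/OS version may differ, so
check 'oclumon -h' on your installation. The node name and timestamps below
are placeholders.

   Show the most recent node views for node1 (here, the last five minutes):
   $ oclumon dumpnodeview -n node1 -last "00:05:00"

   Replay a past interval on all nodes for root cause analysis:
   $ oclumon dumpnodeview -allnodes -s "2012-01-15 09:00:00" -e "2012-01-15 09:30:00"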
Copyright 2008, 2012 Oracle and/or its affiliates. All rights reserved.