Oracle Database 11g: by Arup Nanda
Learn how the Health Monitor monitors the health of some database components automatically and then records its findings in Automatic Diagnostic Repository, how to create a package to be sent to Oracle Support in case of problems, and how to read the new version of the alert log, listener log, and other logs.
How do you know if your database is humming along smoothly? Well, one way is to check "everything"—a rather time-consuming and error-prone process. In some shops dedicated DBAs perform the same task over and over to assess and report the health of the database, but most can't afford to hire full time staff for this effort. The alternative is to have the regular DBA staff perform health checks, but the result is usually not very encouraging. Dividing one's attention across too many things can lead to missing something potentially dangerous.
In Oracle Database 11g the effort has become somewhat simpler with the introduction of the Automatic Health Monitor. Similar to the Advisors introduced in Oracle Database 10g, the Automatic Health Monitor "checkers" monitor (automatically after a failure or on demand) various components, such as the datafiles and dictionary, to make sure they are not corrupted physically or logically. When the checkers find something, the information is reported and then may be fed to various recovery advisors. At a minimum, the new Incident Packaging Service (described later) allows you to make a bundle of all issues and supporting files for easier reporting to Oracle Support.
Like many other features of Oracle Database 11g, this process can be managed either from the command line or via the Oracle Enterprise Manager GUI. Here you'll see how it's done with the latter.
On the main Database page, scroll all the way down to the section marked Related Links as shown below.
First, let's focus on the multiple checkers available.
DB Structure Integrity Check. This checker is best explained through an example. First click on DB Structure Integrity Checks, which brings up a small screen where you can name the run as "DB_Struct_Int1", as an example. The other input you make here is to time-limit the run, which you may ignore to effectively state that there is no limit.
After the health check runs successfully, the confirmation comes up as shown in the top portion of the screen, shown below:
In this case, a datafile was somehow corrupted and this checker will identify that. The information is then fed to the Data Recovery Advisor (discussed in the installment on RMAN) to take appropriate action. You can invoke this checker any time to check the datafile's integrity.
Your most popular checker will most likely be this one. On the screen that shows the past runs, choose any run of that type and click on the Details button, and you will see the screen as shown below:
Data Block Integrity Checker. Data Block Integrity Checker is similar to DB Structure Integrity Check but checks only specific blocks rather than the entire file. As previously, you give it a name and other relevant details. Here is what the screen looks like:
Redo Integrity Check. This checker scans the contents of the redo and archive logs for accessibility and corruption.
Undo Segment Integrity Check. This check finds logical undo corruptions, which are sometimes identified during rollback operations. After locating an undo corruption, this check uses PMON and SMON to try to recover the corrupted transaction. If this recovery fails, then Automatic Health Monitor stores information about the corruption in V$CORRUPT_XID_LIST. Most undo corruptions can be resolved by forcing a commit.
Transaction Integrity Check. Transaction Integrity Check is almost identical to Undo Segment Check except that it checks only one specific transaction, which is passed to the check as an input parameter. After locating an undo corruption, this check uses PMON and SMON to try to recover the corrupted transaction. If this recovery fails, then Automatic Health Monitor stores information about the corruption in V$CORRUPT_XID_LIST. Most undo corruptions can be resolved by forcing a commit.
Dictionary Integrity Check. This check examines the integrity of core dictionary objects, such as tab$ and col$. It verifies the contents of dictionary entries for each dictionary object, that logical constraints on rows in the dictionary are enforced, and that parent-child relationships between dictionary objects are enforced.
begin /*Clear the Log Group*/ execute immediate 'ALTER DATABASE CLEAR LOGFILE GROUP 3'; end;
From the Enterprise Manager main Database page, click on the tab named Software and Support and then click on Support Workbench, which brings up a screen similar to the one shown below.
You can combine the incidents into one "envelope" to be sent to Oracle Support and that's what you should do next: package them. To do this, click inside the check box under Select and click on the button labeled Package. This brings up a screen similar to one shown below:
During Oracle Database 11g installation, you were asked if you wanted to install and configure that tool. If you answered "yes", then you have already done it, but if you answered "no", you can configure it now.
Go to Oracle Home and cd to ccr/bin. Then run the file setupCCR to set up Configuration Manager for the very first time. It displays a license message, and in the end you will be asked to accept or reject it. Later you will be asked some questions such as CSI#, MetaLink userid, and so on. The following text shows the interactive session. The user input is shown in bold.
*** Do you accept this license agreement? (Y/N) [Y]:Once Configuration Manager is set up, you can change the parameters using the script configCCR found in the same directory.
Y Configuration requires the following piece(s) of information. Customer Support Identifier (CSI):
XXXXXXX Oracle MetaLink User Name:
firstname.lastname@example.org The two character country code:
US ** Installing base package ** Deploying core - Version 10.2.6.0.0 ** Registering installation with Oracle Configuration Manager server(s) ** Deploying engines - Version 10.2.2.0.3 Deploying metricdata - Version 10.2.4.0.2 Deploying scripts - Version 10.2.6.0.0 ** Getting package updates from ContentServer ** ** Starting the Oracle Configuration Manager Scheduler ** Oracle Configuration Manager - Release: 10.2.6.0.0 - Production Copyright (c) 2005, 2007, Oracle. All rights reserved. ------------------------------------------------------------------ Starting Oracle Configuration Manager... Waiting for status from Oracle Configuration Manager.... Start Date 16-Oct-2007 09:28:46 Last Collection Time - Next Collection Time 17-Oct-2007 09:28:00 Collection Frequency Daily at 09:28 Collection Status scheduled collection running Log Directory /home/oracle/app/oracle/product/11.1/db_1/ccr/log Registered At 16-Oct-2007 09:28:17 Automatic Update On Collector Mode Connected Oracle Configuration Manager successfully started.
The ADR houses logs and traces of all components—ASM, CRS, listener, and so on—in addition to those of the database itself. This makes it convenient for you to look for a specific log at a single location.
Inside the ADR Base, there could be several ADR Homes, one for each component and instance. For example, if the server has two Oracle instances, there will be two ADR Homes. Here is the directory structure of the ADR Homes for the database instances.
<Directory mentioned in the DIAGNOSTIC_DEST parameter>
→<Name of the Database>
→<Name of the Instance>
The alert log in XML format is stored here.
Core dumps are stored here, the equivalent of the core_dump_dest in earlier versions.
The Health Monitor runs checks on many components, and it stores some files here.
All incidents dumps are stored here.
→<all incident directories exist here>
Each incident is stored in a different directory, which are all stored here.
When you package incidents (learn about packaging in this article), certain supporting files are stored here.
Metadata about problems, incidents, packages and so on is kept here.
User traces and background traces are kept here, along with the text version of the alert log.
$ ls alert cdump hm incident incpkg ir lck metadata stage sweep trace
The ADR directory structure for other components is similar. For instance, for ASM instance, the directory under "diag" is named asm, instead of rdbms. The rest of the directory structure remains the same. The name of the target in case of asm is +asm. For instance, here is how my ADR Home for ASM looks:
$ pwd /home/oracle/diag/asm/+asm/+ASM $ ls alert cdump hm incident incpkg ir lck metadata stage sweep trace
<Directory mentioned in the DIAGNOSTIC_DEST parameter>
→ <hostname of the server>
→ <name of the listener>
→ trace ...
For instance, for a host named oradba3, and a listener named "listener" (the default name), the directory will look like /home/oracle/diag/tnslsnr/oradba3/listener. Under this directory all the others (alert, trace, metadata, and so on) are created. Like the alert log, the listener log file is also stored as XML entries, under the subdirectory alert. The usual text listener log file is still produced, under the directory trace.
A new view V$DIAG_INFO shows all the details about the ADR Homes. In my RDBMS home, it appears like this:
SQL> select * from v$diag_info; INST_ID NAME VALUE -------- ------------------------------ ----------------------------------------------------------------- 1 Diag Enabled TRUE 1 ADR Base /home/oracle 1 ADR Home /home/oracle/diag/rdbms/odel11/ODEL11 1 Diag Trace /home/oracle/diag/rdbms/odel11/ODEL11/trace 1 Diag Alert /home/oracle/diag/rdbms/odel11/ODEL11/alert 1 Diag Incident /home/oracle/diag/rdbms/odel11/ODEL11/incident 1 Diag Cdump /home/oracle/diag/rdbms/odel11/ODEL11/cdump 1 Health Monitor /home/oracle/diag/rdbms/odel11/ODEL11/hm 1 Default Trace File /home/oracle/diag/rdbms/odel11/ODEL11/trace/ODEL11_ora_3908.trc 1 Active Problem Count 3 1 Active Incident Count 37 11 rows selected.
You can access the files and perform other operations on the ADR in two ways. The easiest way is via Enterprise Manager as you saw earlier. The other option is to use a command line tool called asrci. Let's see how you can use the tool. From the UNIX (or Windows) command prompt, type "adrci":
$ adrci ADRCI: Release 184.108.40.206.0 - Beta on Sun Sep 23 23:22:24 2007 Copyright (c) 1982, 2007, Oracle. All rights reserved. ADR base = ”/home/oracle”
adrci> show homes ADR Homes: diag/rdbms/odel11/ODEL11 diag/rdbms/dbeng1/DBENG1 diag/clients/user_unknown/host_411310321_11 diag/tnslsnr/oradba3/listener
adrci> set homepath diag/rdbms/odel11/ODEL11
adrci> help HELP [topic] Available Topics: CREATE REPORT ECHO EXIT HELP HOST IPS ...If you want to know more about a specific command, issue help <command>. For instance, if you want to get help on the usage of show incident commands, you will issue:
adrci> help show incident Usage: SHOW INCIDENT [-p <predicate_string>] [-mode BASIC|BRIEF|DETAIL] [-last <num> | -all] [-orderby (field1, field2, ...) [ASC|DSC]] Purpose: Show the incident information. By default, this command will only show the last 50 incidents which are not flood controlled. Options: [-p <predicate_string>]: The predicate string must be double-quoted. [-mode BASIC|BRIEF|DETAIL]: The different modes of showing incidents. [... and so on ...]
From the output you know the usage. Now to know how many incidents have been recorded, you can issue:
adrci> show incident -mode basic
ADR Home = /home/oracle/diag/rdbms/odel11/ODEL11:
INCIDENT_ID PROBLEM_KEY CREATE_TIME
-------------------- - --------------------------------------------------- ----------------------------------------
14556 ORA 600 [KSSRMP1] 2007-10-17 04:01:57.725620 -04:00
14555 ORA 600 [KSSRMP1] 2007-10-16 18:45:03.970884 -04:00
14435 ORA 603 2007-10-16 06:06:46.705430 -04:00
14427 ORA 603 2007-10-16 06:06:42.007937 -04:00
14419 ORA 603 2007-10-16 06:06:30.069050 -04:00
6001 ORA 4031 2007-08-28 14:50:01.355783 -04:00
5169 ORA 4031 2007-09-04 19:09:36.310123 -04:00
5121 ORA 4031 2007-09-03 14:40:14.575457 -04:00
5017 ORA 4031 2007-09-04 19:09:30.969226 -04:00
4993 ORA 4031 2007-09-04 19:09:33.179857 -04:00
4945 ORA 4031 2007-09-04 19:09:30.955524 -04:00
4913 ORA 4031 2007-09-04 19:09:31.641990 -04:00
adrci> show incident -mode detail -p "incident_id=14556"
ADR Home = /home/oracle/diag/rdbms/odel11/ODEL11:
INCIDENT INFO RECORD 1
CREATE_TIME 2007-10-17 04:01:57.725620 -04:00
[... and so on ...]
1 rows fetched
$ pwd /home/oracle/diag/rdbms/odel11/ODEL11/alert $ ls -ltr total 60136 -rw-r----- 1 oracle oinstall 10485977 Sep 13 17:44 log_1.xml -rw-r----- 1 oracle oinstall 10486008 Oct 16 06:35 log_2.xml -rw-r----- 1 oracle oinstall 10485901 Oct 16 07:27 log_3.xml -rw-r----- 1 oracle oinstall 10485866 Oct 16 08:12 log_4.xml -rw-r----- 1 oracle oinstall 10486010 Oct 17 23:56 log_5.xml -rw-r----- 1 oracle oinstall 9028631 Oct 21 20:07 log.xml
The new alert log is accessed via the adrci utility: the ADR command line tool, which you learned about in the previous section. From the adrci tool, issue:
adrci> show alert Choose the alert log from the following homes to view: 1: diag/rdbms/odel11/ODEL11 2: diag/clients/user_oracle/host_1967384410_11 3: diag/clients/user_unknown/host_411310321_11 4: diag/tnslsnr/oradba3/listener Q: to quit Please select option:
adrci> set homepath diag/rdbms/odel11/ODEL11 adrci> show alert ADR Home = /home/oracle/diag/rdbms/odel11/ODEL11: [... and the whole alert log show up here ...]
adrci> show alert -tail 10 2007-09-23 19:57:44.502000 -04:00 Errors in file /home/oracle/diag/rdbms/odel11/ODEL11/trace/ODEL11_arc1_20810.trc: [... the rest of the 10 lines ...]
adrci> show alert -tail -f
C:\>type show_alert_10lines.cmd set homepath diag\rdbms\lapdb11\lapdb11 show alert -tail 10
adrci exec=”show homes; show catalog”
adrci>>describe alert_ext Name Type NULL? ----------------------------- --------------- ----------- ORIGINATING_TIMESTAMP timestamp NORMALIZED_TIMESTAMP timestamp ORGANIZATION_ID text(65) COMPONENT_ID text(65) HOST_ID text(65) HOST_ADDRESS text(17) MESSAGE_TYPE number MESSAGE_LEVEL number MESSAGE_ID text(65) MESSAGE_GROUP text(65) CLIENT_ID text(65) MODULE_ID text(65) PROCESS_ID text(33) THREAD_ID text(65) USER_ID text(65) INSTANCE_ID text(65) DETAILED_LOCATION text(161) UPSTREAM_COMP_ID text(101) DOWNSTREAM_COMP_ID text(101) EXECUTION_CONTEXT_ID text(101) EXECUTION_CONTEXT_SEQUENCE number ERROR_INSTANCE_ID number ERROR_INSTANCE_SEQUENCE number MESSAGE_TEXT text(2049) MESSAGE_ARGUMENTS text(129) SUPPLEMENTAL_ATTRIBUTES text(129) SUPPLEMENTAL_DETAILS text(129) PARTITION number RECORD_ID number FILENAME text(513) PROBLEM_KEY text(65)
adrci>> show alert -p "module_id='DBMS_SCHEDULER'"
adrci>>show alert -p "module_id != 'DBMS_SCHEDULER'"
adrci>>show alert -p "module_id like '%SCHEDULER'"
adrci>> spool a adrci>> show alert -tail 50 adrci>> spool off
adrci> show alert -p "message_text like '%STREAM%'"
adrci>> show tracefile
adrci>>show tracefile %reco% -rt 18-JUL-07 22:59:50 diag\rdbms\lapdb11\lapdb11\trace\lapdb11_reco_4604.trc 12-JUL-07 09:48:23 diag\rdbms\lapdb11\lapdb11\trace\lapdb11_reco_4236.trc 11-JUL-07 10:30:22 diag\rdbms\lapdb11\lapdb11\trace\lapdb11_reco_3256.trcadrci offers many more options to view the alert log and related files in the most efficient manner. For a complete description of adrci commands, see the documentation.