Automatic Fault Recovery
Oracle performs recovery automatically on two occasions:
- At the first database open after the crash of a single-instance database
or all instances of an Oracle Real Application Clusters database (crash recovery).
- When some but not all instances of an Oracle Real Application Clusters configuration
fail (instance recovery). The recovery is performed automatically by a surviving
instance in the configuration.
The important point is that in both crash and instance recovery, Oracle will
automatically recover data to a transactionally consistent state. This
means the datafiles will contain all committed changes, and will not contain
any uncommitted changes. Oracle returns to the transactionally consistent
state by rolling forward changes captured in the log files but not the datafiles,
and rolling back changes that had not been committed. This roll forward
and roll back process is called crash recovery. In a Real Application
Clusters environment where at least one instance survives, the same process
is performed by a surviving instance and is called instance recovery.

Why is recovery necessary?
To improve performance, Oracle keeps many changes in memory, even after they are
committed. It may also write data to the datafiles to free up memory, even though
the changes have not been committed. At the time of a failure, all data in memory is
lost. To ensure that no committed changes are lost, Oracle records all
operations in an online redo log file. The information in the log file allows Oracle
to redo any operations that may be lost in a failure. Writing to the log file costs
relatively little, because these writes are sequential and fast. Writing to
datafiles, on the other hand, is random and can be much slower, because the disk block
to be modified must be located and the disk head positioned for every write.
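The idea above can be illustrated with a toy model. This is only a sketch under stated assumptions: the dictionaries, the list, and the `update` and `commit` helpers are hypothetical stand-ins, not Oracle structures or APIs. It shows a committed change that exists in memory and in the log, but not in the datafile, at the moment of failure.

```python
# Toy model only: plain dicts and a list stand in for Oracle's structures.
datafiles = {"balance": 100}      # blocks on disk (survive a crash)
buffer_cache = {}                 # modified blocks in memory (lost in a crash)
redo_log = []                     # sequential, append-only log on disk


def update(txn, key, value):
    """Change a value in memory and append a redo record describing it."""
    redo_log.append(("change", txn, key, value))
    buffer_cache[key] = value


def commit(txn):
    """Commit appends a marker to the log; the datafile is not touched."""
    redo_log.append(("commit", txn))


update("T1", "balance", 150)
commit("T1")

buffer_cache.clear()              # simulate the failure: memory is gone

print(datafiles["balance"])       # 100: the committed change never reached disk
print(redo_log)                   # the log still describes the change and the commit
```

In this sketch the datafile is stale after the failure, yet the log holds enough information to redo the committed change.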

Cache Recovery (Roll Forward)
During cache recovery, Oracle replays transactions in the online redo log beginning
with the checkpoint position. The checkpoint position is the place in
the redo log where changes associated with previous redo entries had been saved
to the datafiles before the failure. As Oracle replays the redo operations,
it applies both committed and uncommitted changes to the datafiles. At
the conclusion of the roll forward phase, the datafiles contain all committed
changes, as well as new uncommitted changes (applied during roll forward) and
old uncommitted changes (saved to the datafiles to free up space in the buffer cache
prior to the failure).
The database cannot open until the roll forward phase is complete.
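As a rough illustration of the roll forward phase, the following sketch replays a toy redo log from a checkpoint position. The record layout, the `roll_forward` function, and the sample data are assumptions made for this example; Oracle's actual redo format and recovery code are far more involved.

```python
def roll_forward(datafiles, redo_log, checkpoint_pos):
    """Replay every change record from the checkpoint position onward.

    Toy model: redo_log is a list of ("change", txn, key, new_value) or
    ("commit", txn) tuples. Returns the set of transactions seen committing.
    """
    committed = set()
    for record in redo_log[checkpoint_pos:]:
        if record[0] == "change":
            _, txn, key, new_value = record
            datafiles[key] = new_value      # applied whether or not txn committed
        elif record[0] == "commit":
            committed.add(record[1])
    return committed


redo_log = [
    ("change", "T1", "a", 1),    # already on disk before the checkpoint
    ("commit", "T1"),
    ("change", "T2", "b", 20),   # committed, but not yet written to disk
    ("commit", "T2"),
    ("change", "T3", "c", 30),   # never committed
]
datafiles = {"a": 1, "b": 2, "c": 3}

committed = roll_forward(datafiles, redo_log, checkpoint_pos=2)
print(datafiles)    # {'a': 1, 'b': 20, 'c': 30}
print(committed)    # {'T2'}
```

Note that the uncommitted change made by T3 is applied along with the committed change made by T2; removing it is left to the roll back phase.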

Transaction Recovery (Roll Back)
During transaction recovery, Oracle searches out changes associated with dead
transactions that had not committed before the failure occurred. Undo
blocks (whether in rollback segments or automatic undo tablespaces) record database
actions that should be undone during certain database operations. In database
recovery, Oracle applies undo blocks to roll back uncommitted changes in data
blocks, whether those changes were written to the datafiles before the crash or
introduced by redo application during cache recovery. This process
is called rolling back or transaction recovery. Oracle can roll back multiple
transactions simultaneously as needed. All transactions systemwide that were
active at the time of failure are marked as dead. Instead of waiting for the SMON
background process to roll back dead transactions, new transactions can recover
blocking transactions themselves to get the row locks they need.
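The roll back phase can be sketched in the same toy style. The `roll_back` function, the `(txn, key, old_value)` undo record layout, and the sample data are hypothetical and only illustrate the principle of applying before-images for dead transactions; they do not reflect how Oracle stores or applies undo.

```python
def roll_back(datafiles, undo_records, committed):
    """Undo changes of transactions that never committed (dead transactions).

    Toy model: undo_records is a list of (txn, key, old_value) tuples in the
    order the changes were made. Returns the set of dead transactions.
    """
    dead = {txn for txn, _, _ in undo_records if txn not in committed}
    # Newest first, so the oldest saved value is the one left in place.
    for txn, key, old_value in reversed(undo_records):
        if txn in dead:
            datafiles[key] = old_value
    return dead


# State after roll forward: T2 committed, T3 did not.
datafiles = {"a": 1, "b": 20, "c": 31}
undo_records = [
    ("T2", "b", 2),     # before-image of T2's change
    ("T3", "c", 3),     # before-image of T3's first change
    ("T3", "c", 30),    # before-image of T3's second change
]

dead = roll_back(datafiles, undo_records, committed={"T2"})
print(datafiles)    # {'a': 1, 'b': 20, 'c': 3}
print(dead)         # {'T3'}
```

Applying the undo records newest first matters when a dead transaction changed the same value more than once: the value left in place is the one that existed before the transaction began.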
