Fast-Start Fault Recovery

Fast-start checkpointing refers to the periodic writes performed by the database writer (DBWn) processes to write changed data blocks from the Oracle buffer cache to disk and advance the thread checkpoint. Setting the database parameter FAST_START_MTTR_TARGET to a value greater than zero enables the fast-start checkpointing feature. Fast-start checkpointing should always be enabled for the following reasons:

  • It reduces the time required for cache recovery, and makes instance recovery time-bounded and predictable. This is accomplished by limiting the number of dirty buffers (data blocks which have changes in memory that still need to be written to disk) and the number of redo records (changes in the database) generated between the most recent redo record and the last checkpoint.
  • Fast-start checkpointing eliminates the bulk writes and corresponding I/O spikes that traditionally occur with interval-based checkpoints, providing a smoother, more consistent I/O pattern that is more predictable and easier to manage.

If the system is not already at or near its maximum I/O capacity, fast-start checkpointing will have a negligible impact on performance. Although fast-start checkpointing results in increased write activity, there is little reduction in database throughput, provided the system has sufficient I/O capacity.

If bounding the time to recover from an instance failure is necessary to meet required service levels, then FAST_START_MTTR_TARGET should be set to the desired mean time to recovery (MTTR), in seconds. For example, if service levels dictate that instance recovery after a node failure can take no more than 3 minutes, FAST_START_MTTR_TARGET should be set to 180.

When enabling fast-start checkpointing, the following initialization parameters should be removed or disabled (set to 0):

  • LOG_CHECKPOINT_INTERVAL
  • LOG_CHECKPOINT_TIMEOUT
  • FAST_START_IO_TARGET

Although LOG_CHECKPOINT_INTERVAL and LOG_CHECKPOINT_TIMEOUT can also be used to control checkpointing to some degree (and hence to influence MTTR), neither takes into consideration the number of data blocks that correspond to the amount of redo generated since the last checkpoint, nor the time it actually takes to read redo blocks or process data blocks during recovery. This results in checkpoint behavior that remains static as the environment changes, making it infeasible to target a specific MTTR.

When fast-start checkpointing is enabled, Oracle automatically maintains the speed of checkpointing so that the requested MTTR is achieved. Setting a non-zero value for LOG_CHECKPOINT_INTERVAL or LOG_CHECKPOINT_TIMEOUT interferes with fast-start checkpointing, resulting in a different MTTR than expected.
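The parameter rules above can be expressed as a small consistency check. This is only an illustrative sketch: the helper name check_fast_start_config and the idea of passing parameter values as a dictionary are assumptions made for the example, not part of any Oracle tool.

```python
# Hypothetical helper: flags settings that, per the text above, interfere with
# fast-start checkpointing. `params` is an assumed dict of parameter values.
CONFLICTING = (
    "LOG_CHECKPOINT_INTERVAL",
    "LOG_CHECKPOINT_TIMEOUT",
    "FAST_START_IO_TARGET",
)


def check_fast_start_config(params):
    """Return a list of problems; an empty list means the settings agree."""
    problems = []
    # Fast-start checkpointing is enabled only by a value greater than zero.
    if int(params.get("FAST_START_MTTR_TARGET", 0)) <= 0:
        problems.append("FAST_START_MTTR_TARGET must be greater than zero")
    # The older checkpoint parameters should be absent or set to 0.
    for name in CONFLICTING:
        if int(params.get(name, 0)) != 0:
            problems.append(f"{name} should be removed or set to 0")
    return problems
```

For example, passing {"FAST_START_MTTR_TARGET": 180, "LOG_CHECKPOINT_TIMEOUT": 1800} reports one problem, for LOG_CHECKPOINT_TIMEOUT.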

Fast-Start Checkpointing and Performance

When using fast-start checkpointing, performance may be influenced in the following ways (as compared to typical interval-based checkpointing):

  • Instance and crash recovery times are reduced significantly
  • Database throughput is not affected unless using an aggressive FAST_START_MTTR_TARGET setting
  • The number of physical writes to the disk increases
  • The I/O profile is smoother, more consistent, and more predictable

Overview of MTTR and Crash Recovery Components

FAST_START_MTTR_TARGET is designed to control the time it takes the database to do crash recovery for a single instance. Crash recovery implies that there are no running instances and that an instance restart is a necessary step. The main components of crash recovery are:

  • Instance startup
  • Opening of all database files
  • Cache recovery (rolling forward)
  • Transaction recovery (rolling back)

There is a timing associated with each component. However, since transaction recovery is done by SMON in the background along with normal user activity after the database is opened, it is not factored into fast-start checkpointing calculations. The timings for instance startup and the opening of all datafiles are taken from the actual values for the currently running instance. For example, if it takes 50 seconds to start up the instance and 10 seconds to open and set up all the datafiles, those are the values used in calculating how aggressively the checkpoint needs to advance in order to ensure MTTR is within the FAST_START_MTTR_TARGET setting.

The time required to do cache recovery is determined by two factors:

  • The amount of redo (changes in the database) that needs to be processed during recovery. This is determined by the number of redo log blocks between the last thread checkpoint and the end of the redo log.
  • The number of data blocks that need to be processed during recovery, which is determined by the number of dirty blocks (changes to blocks in memory which have not yet been written to disk) in the cache at the time of the crash.

With fast-start checkpointing, Oracle automatically advances the thread checkpoint to control the amount of redo needed for recovery and to limit the number of dirty buffers remaining in the cache, so that recovery time is bounded. For example, in a large environment it may take 50 seconds to start up the instance and 10 seconds to open and set up all the datafiles. The remaining component is cache recovery, or the roll-forward phase. With FAST_START_MTTR_TARGET set to 180, the Oracle server uses fast-start checkpointing to limit cache recovery time to the remaining 120 seconds (180 - 50 - 10). Oracle maintains statistics about previous recoveries to estimate how long a future recovery would take. The statistics are weighted so that the estimates become more accurate as more, and larger, recoveries are completed. This information is stored in the control file.
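The arithmetic in this example can be written out as a short sketch. It only models the subtraction described in the text. The function name is hypothetical, and clamping a negative result to zero is an assumption of the sketch, since the text does not say what happens when startup and file-open times alone exceed the target.

```python
def cache_recovery_budget(mttr_target_s, startup_s, file_open_s):
    """Seconds left for cache recovery (roll forward) within the MTTR target.

    Transaction recovery is excluded because, per the text, it runs in the
    background after the database is opened.
    """
    budget = mttr_target_s - startup_s - file_open_s
    # Assumption of this sketch: a negative budget is reported as zero.
    return max(budget, 0)


# Values from the example: 180 s target, 50 s startup, 10 s to open datafiles.
print(cache_recovery_budget(180, 50, 10))  # 120
```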

Fast-Start Advisory

Starting in Oracle9i Release 2 (9.2), MTTR advisory is available to help you evaluate the effect of different MTTR settings on system performance in terms of extra physical writes.

How MTTR Advisory Works

When MTTR advisory is enabled, after the system runs a typical workload for a while, you can query V$MTTR_TARGET_ADVICE, which tells you the ratio of estimated number of cache writes under other MTTR settings to the number of cache writes under the current MTTR. For instance, a ratio of 1.2 indicates 20% more cache writes. By looking at the different MTTR settings and their corresponding cache write ratio, you can decide which MTTR value fits your recovery and performance needs. V$MTTR_TARGET_ADVICE also gives the ratio on total physical writes (including direct writes), and the ratio on total I/Os (including reads).

Enabling MTTR Advisory

Enabling MTTR Advisory involves setting two parameters:

  • STATISTICS_LEVEL
  • FAST_START_MTTR_TARGET

Make sure that STATISTICS_LEVEL is set to TYPICAL or ALL. To enable MTTR advisory, set the initialization parameter FAST_START_MTTR_TARGET to a nonzero value. If FAST_START_MTTR_TARGET is not specified, then MTTR advisory will be OFF. When MTTR advisory is ON, it simulates checkpoint queue behavior under five different MTTR settings: the current MTTR, and 0.1, 0.5, 1.5, and 2 times the current MTTR.
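The two conditions can be summarized as a single predicate. This is a minimal sketch: the function name is hypothetical, and it assumes the two parameter values have already been read from the instance by some other means.

```python
def mttr_advisory_on(statistics_level, fast_start_mttr_target):
    """True when both conditions described above hold.

    Hypothetical helper: it only restates the rule in the text and does not
    query a database.
    """
    level_ok = statistics_level.upper() in ("TYPICAL", "ALL")
    target_ok = fast_start_mttr_target != 0
    return level_ok and target_ok
```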

Viewing MTTR Advisory

Starting in Oracle9i Release 2 (9.2), a dynamic performance view is provided for viewing statistics or advisories collected by MTTR advisory. Enterprise Manager also provides a graph to view MTTR statistics, which maps MTTR target values against changes in total I/O. The MTTR graph from Oracle Database 10g Enterprise Manager is shown on the right.

If MTTR advisory has been turned on, V$MTTR_TARGET_ADVICE shows the advisory information collected. Usually this view shows five rows, corresponding to the current MTTR, 0.1 times the current MTTR, 0.5 times the current MTTR, 1.5 times the current MTTR, and 2 times the current MTTR. However, if one or more of the five values are less than the smallest MTTR target the system can sustain, their corresponding rows are replaced with a single row corresponding to the smallest MTTR target the system can have. Similarly, if one or more of the five values are larger than the worst-case MTTR target the system can have, their corresponding rows are replaced with a single row corresponding to the worst-case MTTR target the system can have. If MTTR advisory is currently OFF, the view shows information collected the last time MTTR advisory was on.
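The row-selection rule just described can be modeled as follows. This is a sketch of the rule as stated in the text, not of Oracle's internal implementation: the function name, the min_mttr and max_mttr inputs (the smallest sustainable and worst-case targets), and rounding to whole seconds are all assumptions made for illustration.

```python
# Multipliers of the current MTTR that the advisory simulates.
MULTIPLIERS = (0.1, 0.5, 1.0, 1.5, 2.0)


def advisory_targets(current_mttr, min_mttr, max_mttr):
    """Return the distinct MTTR targets (seconds) the view would list.

    Candidates below min_mttr collapse into one row at min_mttr, and
    candidates above max_mttr collapse into one row at max_mttr.
    """
    targets = set()
    for m in MULTIPLIERS:
        candidate = round(current_mttr * m)  # assumed whole-second rounding
        targets.add(min(max(candidate, min_mttr), max_mttr))
    return sorted(targets)


# Hypothetical system: current target 100 s, sustainable range 60 s to 150 s.
print(advisory_targets(100, 60, 150))  # [60, 100, 150]
```

In this hypothetical case the candidates 10 and 50 collapse to the single row for 60, and 200 is replaced by the worst-case row for 150, so the view would show three rows rather than five.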
