Guide to Oracle Data Guard Fast-Start Failover

Architect: High Availability

by John Smiley Published March 2009

This document guides you through configuring Oracle Data Guard Fast-Start Failover (FSFO) using a physical standby database. FSFO can provide substantial gains in high availability and disaster recovery preparedness for all environments, from inexpensive cloud-based systems to globally distributed data centers.

The information in this guide is based on practical experience gained from deploying FSFO in a global corporate production environment. The guide makes few assumptions about your existing environment and includes examples for creating a physical standby database and Data Guard Broker configuration. To get started, all you'll need is Oracle Database Enterprise Edition Release 10.2 or later, a database, and three hosts: two for the databases and a small host for the FSFO observer. The guide attempts to be operating-system agnostic; however, some examples may contain platform-specific elements such as path and file naming conventions.

Major Components of an FSFO Environment

FSFO builds upon a number of other Oracle technologies and features such as Data Guard, Flashback Database, and Data Guard Broker.

Data Guard

The foundation of FSFO is Data Guard - a primary and at least one standby. The standby can be physical or logical and there can be multiple standbys, but only one of the standbys can be the failover target at any given time. The following paragraphs describe the supported availability modes.

Maximum Availability Mode (Oracle Database 10g Rel 2 and later)

In Maximum Availability mode, FSFO guarantees that no transaction that has received a commit acknowledgment will be lost during a failover. The price for this guarantee is increased commit latency (log file sync waits). Maximum Availability mode uses synchronous redo transfer, and FSFO imposes the additional requirement that the redo be recorded in the standby redo log (SRL) of the target standby (the AFFIRM option of log_archive_dest_n). Overall commit latency increases by the round-trip network latency. With increased latency comes decreased throughput; however, in some cases the difference in throughput may be made up by increasing parallelism.
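As a rough illustration of the latency and throughput trade-off described above, the following sketch adds the network round trip to a baseline commit time and shows how additional concurrent sessions can recover the lost commit rate. The function names and the millisecond figures are hypothetical, and the model deliberately ignores everything except the round trip (for example, the standby's own SRL write time).

```python
def commit_latency_ms(baseline_ms: float, round_trip_ms: float) -> float:
    """Simplified model: synchronous transfer adds one network round trip per commit."""
    return baseline_ms + round_trip_ms


def commits_per_second(latency_ms: float, sessions: int = 1) -> float:
    """Commit rate when each session commits serially, one commit at a time."""
    return sessions * 1000.0 / latency_ms


# Hypothetical figures: 1 ms local commit, 4 ms round trip to the standby.
async_rate = commits_per_second(commit_latency_ms(1.0, 0.0))

sync_latency = commit_latency_ms(1.0, 4.0)
sync_rate = commits_per_second(sync_latency)

# Five concurrent sessions make up the difference in this simplified model.
sync_rate_parallel = commits_per_second(sync_latency, sessions=5)

print(async_rate, sync_latency, sync_rate, sync_rate_parallel)
```

In practice the achievable gain from parallelism depends on whether the application can actually run more concurrent sessions, so treat the numbers as an intuition aid rather than a sizing rule.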

Although redo transfer is synchronous, Maximum Availability mode allows the primary to remain available if the standby database becomes unavailable for any reason (e.g. standby database, host, or network failure). If the primary is unable to contact the standby after a user-specified period of time (the NET_TIMEOUT option of log_archive_dest_n), it drops out of synchronous transfer mode and begins operating as though it were in Maximum Performance mode. When the standby becomes available again, the primary and standby re-synchronize and resume synchronous redo transfer.

Maximum Performance Mode (Oracle Database 11g Rel 1 and later)

Oracle Database 11g FSFO adds support for Maximum Performance mode (asynchronous redo transfer), providing the flexibility to trade durability for performance. Commit latency is not affected by redo transfer, but committed transactions whose redo has not been received by the standby will be lost during failover. FSFO configurations in Maximum Performance mode can limit potential data loss by specifying the maximum allowable age of transactions that may be lost during a failover. For example, if the limit is 30 seconds (the default), FSFO guarantees that all transactions that committed more than 30 seconds before the failure are preserved during failover. The minimum allowable limit is 10 seconds.
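The lag limit can be pictured as a simple gate on failover. The sketch below is only a model of the rule described above, not Broker code: the function names are hypothetical, and the standby lag is passed in as a plain number of seconds.

```python
DEFAULT_LAG_LIMIT_S = 30  # default limit stated in the text
MIN_LAG_LIMIT_S = 10      # minimum allowable limit stated in the text


def validate_lag_limit(limit_s: int) -> int:
    """Reject limits below the documented minimum."""
    if limit_s < MIN_LAG_LIMIT_S:
        raise ValueError(f"lag limit must be at least {MIN_LAG_LIMIT_S} seconds")
    return limit_s


def failover_within_lag_limit(standby_lag_s: float,
                              lag_limit_s: int = DEFAULT_LAG_LIMIT_S) -> bool:
    """True if the standby is close enough to the primary to fail over.

    In Broker the limit is held in the FastStartFailoverLagLimit property;
    here it is just a function argument.
    """
    return standby_lag_s <= validate_lag_limit(lag_limit_s)


print(failover_within_lag_limit(12))   # standby 12 s behind, default limit
print(failover_within_lag_limit(45))   # standby 45 s behind, default limit
```

A standby that is further behind than the limit is not an eligible failover target until it catches up, which is how the limit bounds data loss.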

Data Guard Broker

Broker is a Data Guard management utility that maintains state information about a primary and its standby databases. It automatically sets Data Guard-related database initialization parameters on instance start and role transitions, starts apply services for standbys, and automates many of the administrative tasks associated with maintaining a Data Guard configuration. FSFO is a feature of Broker, which records the failover target, how long to wait after a failure before triggering a failover, and other FSFO-specific properties.

Flashback Database

Flashback Database is a continuous data protection (CDP) solution integrated with the Oracle Database. It provides a way to quickly restore a database to a previous point in time or SCN using on-disk data structures called flashback logs. Flashing back a database is much faster and more seamless (one simple DDL statement) than traditional point-in-time or SCN-based recovery. FSFO uses Flashback Database as part of the process of reinstating a failed primary as a standby.

Problems with automatic reinstatement are frequently due to misconfiguration, so let's look at this in a bit more detail.

Flashback Database records the before-image of changed blocks. To avoid the overhead of recording every change to every block, Flashback Database takes a "fuzzy" snapshot every 30 minutes and records the before-image of a block only upon its first change since the last snapshot. Subsequent changes to the same block during the same snapshot interval are not recorded.

Flashing back a database occurs in two stages:

Restore - Flashback Database restores the datafiles to the closest snapshot prior to the specified SCN. This can be compared to performing an RMAN restore of the datafiles from a backup taken prior to the specified SCN, but is much faster.
Media Recovery - Once the restore is complete, recovery proceeds as a typical media recovery, applying redo from archived and online redo logs and rolling back uncommitted changes with undo. This means that for a flashback operation to succeed, Flashback Database requires all archived redo logs generated between the snapshot time and the restore SCN (typically the past 30 minutes of redo). Use the V$RECOVERY_PROGRESS view to monitor recovery status.

For FSFO environments, set db_flashback_retention_target to 60 (minutes) or higher to provide sufficient Flashback Database history for automatic standby reinstatement. Metadata for the fuzzy snapshot is stored in the flashback log itself. If that metadata is aged out, Oracle can no longer find a fuzzy snapshot and will not be able to flash back. To avoid problems due to timing variations, values less than 60 minutes are not recommended, and values of 30 minutes or less virtually guarantee Flashback Database failure.

Flashback Database stores its logs in the Flash Recovery Area (FRA), so the FRA must be large enough to store at least 60 minutes of Flashback Database history. The total storage requirement is proportional to the number of distinct blocks changed during snapshots - e.g. 1,000,000 block changes on a small set of blocks generates less Flashback Database history than 1,000,000 changes on a larger set of blocks. A good method to determine Flashback Database storage requirements is to enable Flashback Database and observe the amount of storage it uses during several peak loads. There is little risk in enabling Flashback Database to determine its storage requirements - it can be disabled while the primary is open if necessary. However, re-enabling Flashback Database will require a bounce since the database must be mounted and not open.
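To make the storage behavior concrete, here is a toy model of the "first change per snapshot" rule described earlier. It is a sketch for intuition only: the class and function names are hypothetical, a snapshot interval is modeled as an explicit call, and one before-image is counted per block regardless of block size.

```python
class FlashbackLogModel:
    """Toy model: at most one before-image per block per snapshot interval."""

    def __init__(self):
        self._logged_this_interval = set()
        self.before_images_written = 0

    def start_snapshot(self):
        # A new fuzzy snapshot: every block's next change is logged again.
        self._logged_this_interval.clear()

    def change_block(self, block_id):
        # Only the first change to a block since the last snapshot is logged.
        if block_id not in self._logged_this_interval:
            self._logged_this_interval.add(block_id)
            self.before_images_written += 1


def before_images_for(changes, distinct_blocks):
    """Apply `changes` block changes spread evenly over `distinct_blocks` blocks."""
    model = FlashbackLogModel()
    model.start_snapshot()
    for i in range(changes):
        model.change_block(i % distinct_blocks)
    return model.before_images_written


# Same number of changes, very different flashback log volume.
small_set = before_images_for(1_000_000, 100)
large_set = before_images_for(1_000_000, 50_000)
print(small_set, large_set)
```

The same 1,000,000 changes produce far fewer before-images when they are concentrated on a small set of blocks, which is why observing real peak loads is a better sizing method than counting changes.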

FSFO observer

The observer is the third party in an otherwise typical primary/standby Data Guard configuration. It is actually a low-footprint OCI client built into the DGMGRL CLI (Data Guard Broker Command Line Interface) and, like any other client, may be run on a different hardware platform than the database servers. Its primary job is to perform a failover when conditions permit it to do so without violating the data durability constraints set by the DBA. Only the observer can initiate FSFO failover. Its secondary job is to automatically reinstate a failed primary as a standby if that feature is enabled (the default). The observer is the key element that elevates Data Guard failover from its pre-FSFO role as the plan of last resort to a leading role in a robust high availability solution.

Note: the FSFO observer version must match the database version. Oracle Database 11g observers are incompatible with 10g databases and vice-versa.

Conditions for FSFO Failover

By default, the observer will initiate failover to the target standby if and only if ALL of the following are true:

observer is running
observer and the standby both lose contact with the primary
- Note: if the observer loses contact with the primary, but the standby does not, the observer can determine that the primary is still up via the standby.
observer is still in contact with the standby
durability constraints are met
failover threshold timeout has elapsed
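The default conditions listed above amount to a single boolean test. The sketch below expresses that test in Python purely as an illustration; the data class, its field names, and the function are hypothetical and do not correspond to any observer interface.

```python
from dataclasses import dataclass


@dataclass
class ObserverView:
    """Hypothetical snapshot of what the observer knows at one moment."""
    observer_running: bool
    observer_sees_primary: bool
    standby_sees_primary: bool
    observer_sees_standby: bool
    durability_constraints_met: bool
    seconds_since_contact_lost: float


def should_initiate_failover(view: ObserverView, threshold_s: float = 30) -> bool:
    """All default conditions must hold at once.

    threshold_s stands in for the failover threshold timeout (Broker's
    FastStartFailoverThreshold property, 30 seconds by default).
    """
    return (
        view.observer_running
        and not view.observer_sees_primary
        and not view.standby_sees_primary   # otherwise the primary is known to be up
        and view.observer_sees_standby
        and view.durability_constraints_met
        and view.seconds_since_contact_lost >= threshold_s
    )


primary_down = ObserverView(True, False, False, True, True, 45)
observer_isolated = ObserverView(True, False, True, True, True, 45)
print(should_initiate_failover(primary_down))
print(should_initiate_failover(observer_isolated))
```

The second case shows why a network problem affecting only the observer does not trigger failover: the standby still reports contact with the primary.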

User configurable failover conditions (11g and later)

Oracle Database 11g Rel 1 introduced user configurable failover conditions that can trigger the observer to initiate failover immediately.

Health conditions

Broker can be configured to initiate failover on any of the following conditions. The default setting is noted for each.

Datafile Offline (due to IO errors) - enabled by default
Corrupted Controlfile - enabled by default
Corrupted Dictionary - enabled by default
Inaccessible Logfile (due to IO errors) - disabled by default
Stuck Archiver - disabled by default

Oracle errors (ORA-NNNNN)

You can also specify a list of ORA- errors that will initiate FSFO failover. The list is empty by default.

Application initiated

Applications can initiate FSFO failover directly by calling the DBMS_DG.INITIATE_FS_FAILOVER function with an optional message text, which is displayed in the observer log and the primary's alert log.

Walkthrough Overview

The walkthrough begins with a single database that will become the primary of a Data Guard configuration. For this build, we will use a single physical standby database. FSFO can also be used with logical standbys and an FSFO-enabled configuration may have multiple standbys with a mix of physical and logical, but only one standby can be the failover target at any given time.

The major steps in the walkthrough are:

Configure Oracle Net (aka SQL*Net)
Prepare the primary database
Create a physical standby
Enable Flashback Database
Create a Broker configuration
Configure the observer
Enable and test FSFO

Conventions used

Database hosts are referred to as the "a" and "b" hosts, and the databases themselves are referred to as the "a" and "b" databases. The observer host is "observer.demo.org".

Names used in the examples:

Database name	db1
Database unique names	db1_a, db1_b
Domain name	demo.org
Hostnames	dbhost-a, dbhost-b, observer
Data Guard listener name	LISTENER_DG
TNS aliases	db1_a, db1_b

Input commands are shown in shaded boxes in normal text. Expected output is shown in blue text.

Configure Oracle Net

Data Guard uses Oracle Net (SQL*Net) for communication between the primary and standby databases and the FSFO observer. Getting the Oracle Net configuration right is one of the key factors in a successful FSFO deployment. Improper Oracle Net configuration is a leading cause of reported FSFO issues.

Note: Data Guard requires dedicated server connections for proper operation. Do not use Shared Server (formerly MTS) for Data Guard.

Configure listeners

It's good practice to use separate listeners for application connections and Data Guard connections. This allows Data Guard to remain functional during maintenance periods when the application listeners are down. Be sure to include the Data Guard listener in the local_listener database parameter.

Most of the network services used in an FSFO environment may use dynamic registration, but to enable Broker to restart instances during role transitions or during reinstatement after a failover, you must define a static service named db_unique_name_dgmgrl.db_domain. (Note: 11.1.0.7 adds the StaticConnectIdentifier Broker database property to allow you to specify a different service name.) If you will be using RMAN to create the standby database, it also needs a static service to restart the database being created. To maintain separation of Broker and non-Broker activity, a second static service is recommended.
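As a quick check of the naming rule, the sketch below builds the static Broker service name from the names used in this guide's examples. The helper function is hypothetical and exists only to show how the pieces combine.

```python
def broker_static_service_name(db_unique_name: str, db_domain: str = "") -> str:
    """Build db_unique_name_dgmgrl.db_domain; the domain is omitted if empty."""
    name = f"{db_unique_name}_dgmgrl"
    return f"{name}.{db_domain}" if db_domain else name


# Names from the "Conventions used" section of this guide.
for unique_name in ("db1_a", "db1_b"):
    print(broker_static_service_name(unique_name, "demo.org"))
```

Each database in the configuration needs its own static service, so the "a" and "b" hosts register different names.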

listener.ora configuration for host "a":
