Oracle Backup and Recovery
Backup and recovery is one of the most important aspects of database administration.
If a database crashed and there was no way to recover it, the devastating results
to a business could include lost data, lost revenue and customer dissatisfaction.
Whether companies operate a single database or multiple databases storing hundreds
of gigabytes or even terabytes of data, they share one common factor - the need
to back up important data and protect themselves from disaster by developing
a backup and recovery plan.
BASICS OF BACKUP AND RECOVERY
A backup is a representative copy of data. This copy can include important parts
of a database such as the control file, redo logs, and datafiles. A backup protects
data from application error and acts as a safeguard against unexpected data
loss, by providing a way to restore original data. Backups are divided into
physical backups and logical backups. Physical backups are copies of physical
database files. The phrase "backup and recovery" usually refers to the transfer
of copied files from one location to another, along with the various operations
performed on these files.
In contrast, logical backups contain data that is exported using SQL commands
and stored in a binary file. Oracle records both committed and uncommitted changes
in redo log buffers. Logical backups are used to supplement physical backups.
Restoring a physical backup means reconstructing it and making it available
to the Oracle server. To recover a restored backup, data is updated using redo
records from the transaction log. The transaction log records changes made to
the database after the backup was taken.
Oracle performs crash recovery and instance recovery automatically after an
instance failure. In the case of media failure, a database administrator (DBA)
must initiate a recovery operation. Recovering a backup involves two distinct
operations: rolling the backup forward to a more recent time by applying redo
data, and rolling back all changes made in uncommitted transactions to their
original state. In general, recovery refers to the various operations involved
in restoring, rolling forward, and rolling back a backup. Backup and recovery
refers to the various strategies and operations involved in protecting the database
against data loss and reconstructing the database should a loss occur.
BACKUP AND RECOVERY OPERATIONS
A backup is a snapshot of a datafile, tablespace, or database at a certain time.
If periodic backups of the database have been made and data is lost, users can
apply the stored redo information to their latest backup to make the database
current again. Oracle enables users to restore an older backup and apply only
some redo data, thereby recovering the database to an earlier point in time.
This type of recovery is called incomplete media recovery. If the backup was
consistent, then users are not required to apply any redo data, at all.
A simple example of media recovery illustrates the concept. Suppose a user
makes a backup of the database at noon. Starting at noon, one change to the
database is made every minute. At 1 p.m. one of the disk drives fails, causing
the loss of all data on that disk. Fortunately, Oracle records all changes in
the redo log. The user can then restore the noon backup onto a functional disk
drive and use redo data to recover the database to 1 p.m., reconstructing the
lost changes.
ELEMENTS OF A BACKUP AND RECOVERY STRATEGY
Although backup and recovery operations can be intricate and vary from one business
to another, the basic principles follow these four simple steps:
- Multiplex the online redo logs
- Run the database in ARCHIVELOG mode and archive redo logs to multiple locations
- Maintain multiple concurrent backups of the control file
- Take frequent backups of physical datafiles and store them in a safe place,
making multiple copies if possible
As long as users have backups of the database and archive redo logs in safe
storage, the original database can be recreated.
KEY DATA STRUCTURES FOR BACKUP AND RECOVERY
Before users begin to think seriously about backup and recovery strategy, the
physical data structures relevant for backup and recovery operations must be
identified. This section discusses the following physical data structures:
DATAFILES
Every Oracle database has one or more physical datafiles that belong to logical
structures called tablespaces. The datafile is divided into smaller units called
data blocks. The data of logical database structures, such as tables and indexes,
is physically located in the blocks of the datafiles allocated for a database.
Datafiles hold the following characteristics:
- User-defined characteristics allow datafiles to automatically extend when
the database runs out of space.
- One or more physical datafiles form a logical database storage unit called
a tablespace.
The first block of every datafile is the header. The header includes important
information such as file size, block size, tablespace, and creation timestamp.
Whenever the database is opened, Oracle checks to see that the datafile header
information matches the information stored in the control file. If it does not,
then recovery is necessary. Oracle reads the data in a datafile during normal
operation and stores it in the buffer cache. For example, assume that a user
wants to access some data in a table. If the requested information is not already
in the buffer cache, Oracle reads it from the appropriate datafiles and stores
it in memory.
CONTROL FILES
Every Oracle database has a control file containing the operating system filenames
of all other files that constitute the database. This important file also contains
consistency information that is used during recovery, such as the:
- Database name
- Timestamp of database creation
- Names of the database's datafiles and online and archived redo log files
- Checkpoint, a record indicating the point in the redo log where all database
changes prior to this point have been saved in the datafiles
- Recovery Manager(RMAN) backup meta-data
Users can multiplex the control file, allowing Oracle to write multiple copies
of the control file to protect it against disaster. If the operating system
supports disk mirroring, the control file can also be mirrored, allowing the
O/S to write a copy of the control file to multiple disks. Every time a user
mounts an Oracle database, its control file is used to identify the datafiles
and online redo log files that must be opened for database operation. If the
physical makeup of the database changes, such as a new datafile or redo log
file is created, Oracle then modifies the database's control file to reflect
the change. The control file should be backed up whenever the structure of the
database changes. Structural changes can include adding, dropping, or altering
datafiles or tablespaces and adding or dropping online redo logs.
ONLINE REDO LOG FILES
Redo logs are absolutely crucial for recovery. For example, imagine that a power
outage prevents Oracle from permanently writing modified data to the datafiles.
In this situation, an old version of the data in the datafiles can be combined
with the recent changes recorded in the online redo log to reconstruct what
was lost. Every Oracle database contains a set of two or more online redo log
files. Oracle assigns every redo log file a log sequence number to uniquely
identify it. The set of redo log files for a database is collectively known
as the database's redo log.
Oracle uses the redo log to record all changes made to the database. Oracle
records every change in a redo record, an entry in the redo buffer describing
what has changed. For example, assume a user updates a column value in a payroll
table from 5 to 7. Oracle records the old value in undo and the new value in
a redo record. Since the redo log stores every change to the database, the redo
record for this transaction actually contains three parts:
- The change to the transaction table of the undo
- The change to the undo data block
- The change to the payroll table data block
If the user then commits the update to the payroll table - to make permanent
the changes executed by SQL statements - Oracle generates another redo record.
In this way, the system maintains a careful watch over everything that occurs
in the database.
Circular Use of Redo Log Files
Log Writer (LGWR) writes redo log entries to disk. Redo log data is generated
in the redo log buffer of the system global area. As transactions commit and
the log buffer fills, LGWR writes redo log entries into an online redo log file.
LGWR writes to online redo log files in a circular fashion: when it fills the
current online redo log file, called the active file, LGWR writes to the next
available inactive redo log file. LGWR cycles through the online redo log files
in the database, writing over old redo data. Filled redo log files are available
for reuse depending on whether archiving is enabled:
- If archiving is disabled, a filled online redo log is available once the
changes recorded in the log have been saved to the datafiles.
- If archiving is enabled, a filled online redo log is available once the
changes have been saved to the datafiles and the file has been archived.
ARCHIVED REDO LOG FILES
Archived log files are redo logs that Oracle has filled with redo entries, rendered
inactive, and copied to one or more log archive destinations. Oracle can be
run in either of two modes:
- ARCHIVELOG - Oracle archives the filled online redo log files before reusing
them in the cycle.
- NOARCHIVELOG - Oracle does not archive the filled online redo log files
before reusing them in the cycle.
Running the database in ARCHIVELOG mode has the following benefits:
- The database can be completely recovered from both instance and media failure.
- The user can perform online backups, i.e., back up tablespaces while the
database is open and available for use.
- Archived redo logs can be transmitted and applied to the standby database
- Oracle supports multiplexed archive logs to avoid any possible single point
of failure on the archive logs.
- The user has more recovery options, such as the ability to perform tablespace-point-in-time
recovery (TSPITR)
Running the database in NOARCHIVELOG mode has the following consequences:
- The user can only back up the database while it is completely closed after
a clean shutdown.
- Typically, the only media recovery option is to restore the whole database,
which causes the loss of all transactions issued since the last backup.
Automatic Managed Undo
Every Oracle database must have a method of maintaining information that is
used to roll back, or undo, changes to the database. Such information consists
of records of the actions of transactions, primarily before they are committed.
Oracle refers to these records collectively as undo. Historically, Oracle has
used rollback segments to store undo. Space management for these rollback segments
has proven to be quite complex. Oracle now offers another method of storing
undo that eliminates the complexities of managing rollback segment space, and
allows DBAs to exert control over how long undo is retained before being overwritten.
This method uses an undo tablespace. Undo records are used to:
- Roll back transactions when a ROLLBACK statement is issued
- Recover the database
- Provide read consistency
When a rollback statement is issued, undo records are used to undo changes
that were made to the database by the uncommitted transaction. During database
recovery, undo records are used to undo any uncommitted changes applied from
the redo log to the datafiles. Undo records provide read consistency by maintaining
the before image of the data for users who are accessing the data at the same
time that another user is changing it.
Understanding Basic Backup
A backup strategy provides a safeguard against data loss. Answering the following
questions can help database administrators develop a strong backup strategy:
- What types of failures can occur?
- What information should be backed up?
- Which backup method should be used?
- Should backups be made online or offline?
- How often should backups be made?
- How can dangerous backup techniques be avoided?
WHAT TYPES OF FAILURES CAN OCCUR?
Data loss can occur for various reasons. Here are some of the most common types
of failures that can lead to data loss.
A statement failure is a logical failure in the handling statement in
an Oracle program. For example, a user issues a statement that is not a valid
SQL construction. When statement failure occurs, Oracle automatically undoes
any effects of the statement and returns control to the user.
A process failure is a failure in a user process accessing Oracle, i.e.,
an abnormal disconnection or process termination. The failed user process cannot
continue work, although Oracle and other user processes can. If the user process
fails while modifying the database, Oracle background processes undo the effects
of uncommitted transactions.
An instance failure is a problem that prevents an Oracle instance, i.e.,
the SGA and background processes, from continuing to function. Instance failure
can result from a hardware problem such as a power outage, or a software problem
such as an operating system crash. When an instance fails, Oracle does not write
the data in the buffers of the SGA to the datafiles.
A user or application error is a user mistake that results in the loss
of data. For example, a user can accidentally delete data from a payroll table.
Such user errors can require the database or object to be recovered to a point
in time before the error occurred. To allow recovery from user error and accommodate
other unique recovery requirements, Oracle provides Flashback Technology. For
more information on Flashback
Technology.
A media failure is a physical problem that arises when Oracle tries
to write or read a file that is required to operate the database. A common example
is a disk head crash that causes the loss of all data on a disk drive. Disk
failure can affect a variety of files, including datafiles, redo log files,
and control files. Because the database instance cannot continue to function
properly, it cannot write the data in the database buffers of the SGA to the
datafiles.
WHAT INFORMATION SHOULD BE BACKED UP?
A database contains a wide variety of types of data. When developing backup
strategy, DBAs must decide what information they want to copy. The basic backup
types include:
- Online Database Backup
- Offline Database Backup
- Whole Database
- Tablespace
- Datafile
- Control File
- Archived Redo Log
- Configuration Files
In deciding what to back up, the basic principle is to prioritize data depending
on its importance and the degree to which it changes. Archive logs do not change,
for example, but they are crucial for recovering the database, so multiple copies
should be maintained, if possible. Expense account tables, however, are constantly
updated by users. Therefore, this tablespace should be backed up frequently
to prevent having to apply as much redo data during recovery.
Backups can be combined in a variety of ways. For example, a DBA can decide
to take weekly whole database backups, to ensure a relatively current copy of
original database information, but take daily backups of the most accessed tablespaces.
The DBA can also multiplex the all important control file and archived redo
log as an additional safeguard.
Online Database Backup
An online backup or also known as an open backup, is a backup in which all read-write
datafiles and control files have not been checkpointed with respect to the same
SCN. For example, one read-write datafile header may contain an SCN of 100 while
other read-write datafile headers contain an SCN of 95 or 90. Oracle cannot
open the database until all of these header SCNs are consistent, that is, until
all changes recorded in the online redo logs have been saved to the datafiles
on disk. If the database must be up and running 24 hours a day, 7 days a week,
then you have no choice but to perform online backups of a whole database which
is in ARCHIVELOG mode.
Offline Database Backup
In this backup, all datafiles and control files are consistent to the same point
in time - consistent with respect to the same SCN, for example. The only tablespaces
in a consistent backup that are allowed to have older SCNs are read-only and
offline-normal tablespaces, which are consistent with the other datafiles in
the backup. This type of backup allows the user to open the set of files created
by the backup without applying redo logs, since the data is already consistent.
The only way to perform this type of backup is to shut down the database cleanly
and make the backup while the database is closed. A consistent whole database
backup is the only valid backup option for databases running in NOARCHIVELOG
mode.
Whole Database Backup
The most common type of backup, a whole database backup contains the control
file along with all database files that belong to a database. If operating in
ARCHIVELOG mode, the DBA also has the option of backing up different parts of
the database over a period of time, thereby constructing a whole database backup
piece by piece.
Tablespace Backups
A tablespace backup is a subset of the database. Tablespace backups are only
valid if the database is operating in ARCHIVELOG mode. The only time a tablespace
backup is valid for a database running in NOARCHIVELOG mode is when that tablespace
is read-only or offline-normal.
Datafile Backups
A datafile backup is a backup of a single datafile. Datafile backups, which
are not as common as tablespace backups and are only valid if the database is
run in ARCHIVELOG mode. The only time a datafile backup is valid for a database
running in NOARCHIVELOG mode is if that datafile is the only file in a tablespace.
For example, the backup is a tablespace backup, but the tablespace only contains
one file and is read-only or offline-normal.
Control File Backups
A control file backup is a backup of a database's control file. If a database
is open, the user can create a valid backup by issuing the following SQL statement:
ALTER DATABASE BACKUP CONTROLFILE to 'location'; or use Recovery Manager (RMAN).
Archived Redo Log Backups
Archived redo logs are the key to successful media recovery. Depending on the
disk space available and the number of transactions executed on the database,
you want to keep as many days of archive logs on disk and you want to back them
up regularly to ensure a more complete recovery.
Configuration Files
Configuration files may consist of spfile or init.ora, password file, tnsnames.ora,
and sqlnet.ora. Since these files do not change often, then they require a less
frequent backup schedule. If you lost a configuration file it can be easily
recreated manually. When restore time is a premium, it will be faster to restore
a backup of the configuration file then manually creating a file with a specific
format.
For more information on RMAN
Oracle Enterprise Manager
Although Recovery Manager is commonly used as a command-line utility, the Backup
Wizards in Oracle Enterprise Manager is the GUI interface that enables backup
and recovery via a point-and-click method. Oracle Enterprise Manager (EM) supports
Backup and Recovery features commonly used by users.
- Backup Configurations to customize and save commonly used configurations
for repeated use
- Backup and Recovery wizards to walk the user through the steps of creating
a backup script and submitting it as a scheduled job
- Backup Job Library to save commonly used Backup jobs that can be retrieved
and applied to multiple targets
- Backup Management to view and perform maintenace on RMAN backups.
For more information on Oracle
Enterprise Manager
Using the Data Pump for Supplemental Backup Protection
Physical backups can be supplemented by using the Data Pump utility to make
logical backups of data. Logical backups store information about the schema
objects created for a database. Data Pump writes data from a database into Oracle
files in a proprietary format, which can then be imported into a database using
the Import utility.
For more information on Data
Pump
User Managed Backups
Operating system commands can be used such as the UNIX dd or tar command to
make backups. Backup operations can also be automated by writing scripts. The
user can make a backup of the whole database at once or back up individual tablespaces,
datafiles, control files, or archived logs. A whole database backup can be supplemented
with backups of individual tablespaces, datafiles, control files, and archived
logs. O/S commands can also be used to perform these backups if the database
is down or if the database is placed into hot backup mode to take an online
backup.
UNDERSTANDING BASIC RECOVERY STRATEGY
Basic recovery involves two parts: restoring a physical backup and then updating
it with the changes made to the database since the last backup. The most important
aspect of recovery is making sure all data files are consistent with respect
to the same point in time. Oracle has integrity checks that prevent the user
from opening the database until all data files are consistent with one another.
When preparing a recovery strategy, it is critical to understand the answers
to these questions:
- How does recovery work?
- What are the types of recovery?
- Which recovery method should be used?
HOW DOES RECOVERY WORK?
In every type of recovery, Oracle sequentially applies redo data to data blocks.
Oracle uses information in the control file and datafile headers to ascertain
whether recovery is necessary. Recovery has two parts: rolling forward and rolling
back. When Oracle rolls forward, it applies redo records to the corresponding
data blocks. Oracle systematically goes through the redo log to determine which
changes it needs to apply to which blocks, and then changes the blocks. For
example, if a user adds a row to a table, but the server crashes before it can
save the change to disk, Oracle can use the redo record for this transaction
to update the data block to reflect the new row.
Once Oracle has completed the rolling forward stage, the Oracle database can
be opened. The rollback phase begins after the database is open. The rollback
information is stored in transaction tables. Oracle searches through the table
for uncommitted transactions, undoing any that it finds. For example, if the
user never committed the SQL statement that added the row, then Oracle will
discover this fact in a transaction table and undo the change.
WHAT ARE THE TYPES OF RECOVERY?
There are three basic types of recovery: instance recovery, crash recovery,
and media recovery. Oracle performs the first two types of recovery automatically
at instance startup. Only media recovery requires the user to issue commands.
An instance recovery, which is only possible in an Oracle Real Applications
Cluster configuration, occurs in an open database when one instance discovers
that another instance has crashed. A surviving instance automatically uses the
redo log to recover the committed data in the database buffers that was lost
when the instance failed.Oracle also undoes any transactions that were in progress
on the failed instance when it crashed, then clears any locks held by the crashed
instance after recovery is complete.
A crash recovery occurs when either a single-instance database crashes or all
instances of a multi-instance database crash. In crash recovery, an instance
must first open the database and then execute recovery operations. In general,
the first instance to open the database after a crash or SHUTDOWN ABORT automatically
performs crash recovery.
Unlike crash and instance recovery, a media recovery is executed on the user's
command, usually in response to media failure. In media recovery, online or
archived redo logs can be used to make a restored backup current or to update
it to a specific point in time. Media recovery can restore the whole database,
a tablespace or a datafile and recover them to a specified time. Whenever redo
logs are used or a database is recovered to some non-current time, media recovery
is being performed.
A restored backup can always be used to perform the recovery. The principal
division in media recovery is between complete and incomplete recovery. Complete
recovery involves using redo data combined with a backup of a database, tablespace,
or datafile to update it to the most current point in time. It is called complete
because Oracle applies all of the redo changes to the backup. Typically, media
recovery is performed after a media failure damages datafiles or the control
file.
For more information on Automatic
Fault Recovery
Recovery Options
If the user does not completely recover the database to the most current time,
Oracle must be instructed how far to recover. The user can perform:
- Tablespace point-in-time recovery (TSPITR), which enables users to recover
one or more tablespaces to a point-in-time that is different from the rest
of the database.
- Time-based recovery, also called point-in-time recovery (PITR), which recovers
the data up to a specified point in time.
- Cancel-based recovery, which recovers until the CANCEL command is issued.
- Change-based recovery or log sequence recovery. If O/S commands are used,
change-based recovery recovers up to a specified SCN in the redo record.
- Flashback from human error
If Recovery Manager is used, log sequence recovery recovers up to a specified
log sequence number. When performing an incomplete recovery, the user must reset
the online redo logs when opening the database. The new version of the reset
database is called a new incarnation. Opening the database with the RESETLOGS
option tells Oracle to discard some redo. In Oracle Database 10g and following releases, the control
file added new structures that provides the ability to recover through a
RESETLOGS operation using backups from a previous incarnation.
Recovering From Human Errors
The Oracle Database 11g architecture leverages the unique technological advances
in the area of database recovery due to human errors. Flashback Technology provides
a set of new features to view and rewind data back and forth in time. The Flashback
features offer the capability to query past versions of schema objects, query
historical data, perform change analysis or perform self-service repair to recover
from logical corruptions while the database is online. With the Oracle Database
11g Flashback Technology, you can indeed undo the past!
For more information on Flashback
Technology
WHICH RECOVERY METHOD SHOULD BE USED?
Users have a choice between two basic methods for recovering physical files.
They can:
- Use Recovery Manager to automate recovery.
- Execute SQL commands.
Recovering with Recovery Manager
The basic RMAN commands are RESTORE and RECOVER. RMAN can be used to restore
datafiles from backup sets or image copes, either to their current location
or to a new location. If any archived redo logs are required to complete the
recovery operation, RMAN automatically restores and applies them. In a recovery
catalog, RMAN keeps a record containing all the essential information concerning
every backup ever taken. If a recovery catalog is not used, RMAN uses the control
file for necessary information. The RMAN RECOVER command can be used to perform
complete media recovery and apply incremental backups, and to perform incomplete
media recovery.
For more information on RMAN
Recovering with SQL*Plus
Administrators can use the SQL*Plus utility at the command line to restore and
perform media recovery on your files. Users can recover a database, tablespace,
or datafile. Before performing recovery, users need to:
- Determine which files to recover. Often the table V$RECOVER_FILE can be
used.
- Restore backups of files permanently damaged by media failure. If the user
does not have a backup, recovery can still be performed if the user has the
necessary redo log files and the control file contains the name of the damaged
file.
- If a file cannot be restored to its original location, then the user must
relocate the restored file and inform the control file of the new location.
- Restore necessary archived redo log files.
Conclusion
Backup and recovery of your Oracle database is important to protecting data
from corruptions, hardware failures, and data failures. While Oracle provides
many features to protect your data, nothing can replace a backup.