High Availability
OVERVIEW
Enterprises have used their information technology (IT) infrastructure
to provide competitive advantage, increase productivity, and empower
users to make faster and more informed decisions. However, with these
benefits has come an increasing dependence on that infrastructure.
Should a critical application, server or data become unavailable, the
entire business can be placed in jeopardy. Revenue and customers can be
lost, penalties can be owed, and bad press can have a lasting effect on
customers and a company's reputation. Building a high availability IT
infrastructure is critical to the success and well-being of all
enterprises in today's fast-moving economy.
Trends in computing technology are also enabling the deployment of a
new IT architecture, referred to as Grid computing, which pools
large numbers of servers and storage into a flexible, on-demand
computing resource for all enterprise computing needs. Technology
innovations like low-cost blade servers, small and inexpensive
multiprocessor servers, modular storage technologies, and open source
operating systems such as Linux provide the raw material for the Grid. By
harnessing these technologies, and leveraging the Grid technology
available in the Oracle Database, enterprises can deliver extremely
high quality of service to their users while vastly reducing their
expenditures on IT. The Oracle Database enables you to capture the
cost advantages of Grid enterprise computing without sacrificing
performance, scalability, security, manageability, functionality, or
system availability.
CAUSES OF DOWNTIME
One of the challenges in designing a highly available IT Grid
infrastructure is examining and addressing all the possible causes of
downtime. Figure 1 is a taxonomy of downtime that classifies it into two
primary categories: unplanned and planned downtime. It is important to
consider the causes of both when designing a fault-tolerant and resilient
IT infrastructure.
Figure 1: Causes of Downtime
Unplanned downtime is primarily the result of computer failures or data
failures. Planned downtime is primarily due to data changes or system
changes that must be applied to the production system.
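The taxonomy above can be pictured as a simple lookup table. The following is a minimal sketch in Python, not part of any Oracle product: the names `Category`, `TAXONOMY`, `classify`, and `minutes_by_category` are hypothetical, and the four cause labels are taken directly from the paragraph above.

```python
from enum import Enum


class Category(Enum):
    UNPLANNED = "unplanned"
    PLANNED = "planned"


# The four causes named in the text, mapped to their primary category.
TAXONOMY = {
    "computer failure": Category.UNPLANNED,
    "data failure": Category.UNPLANNED,
    "data change": Category.PLANNED,
    "system change": Category.PLANNED,
}


def classify(cause):
    """Return the downtime category for a cause (KeyError if unknown)."""
    return TAXONOMY[cause.strip().lower()]


def minutes_by_category(events):
    """Sum outage minutes per category from (cause, minutes) pairs."""
    totals = {category: 0 for category in Category}
    for cause, minutes in events:
        totals[classify(cause)] += minutes
    return totals
```

Feeding an outage log through `minutes_by_category` shows how much downtime falls into each category, which is a reasonable first step when deciding which causes to address first.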
HIGH AVAILABILITY IN ORACLE DATABASE 11g
Oracle Database has been widely acknowledged for its technical and
thought leadership in the area of high availability (HA), with a broad
suite of capabilities that help businesses maintain continuous operations
during both unexpected failures and scheduled maintenance activities.
With Oracle Database 11g, Oracle has extended its leadership in HA with a
suite of new features that provide significant value for the customer,
as described in the following outline:
- Minimize Downtime - While solutions
such as RAC and Data Guard have addressed this area in prior releases,
Oracle Database 11g introduces new capabilities that reduce downtime
even further. One such capability is Online Patching. Initially available
for Linux, it allows certain diagnostic patches to be installed completely
online, that is, without bringing the database down or disconnecting
applications. This further minimizes downtime, speeds up diagnostic
analysis, and improves customer satisfaction.
- Offload Processing & Utilize Resources -
Traditional methods of implementing HA involve idle servers and offline storage
that cannot be used for productive work. In contrast, Oracle Database 11g
provides various capabilities that allow customers to offload processing from the
production server to standby servers, improving both the performance of the
production server and the utilization of the standby servers. For example, the
Oracle Active Data Guard Option enables real-time read-only access to a
physical standby database to offload queries, sorting, reporting,
web-based access, etc. from the production database, while continuously
applying changes received from the production database. Similarly, the
new Snapshot Standby capability allows a physical standby to be opened read/write
for testing or reporting while it continues to receive redo data from the primary,
so it provides DR protection at the same time.
- Scale for Growth - Oracle Database's scale-out
architecture supports dynamic addition and removal of servers (through RAC) and
storage disks (through ASM) in a grid model, giving customers an easy way to expand
their architecture as their business grows. The new capabilities in Oracle Database
11g can be used in the same way. For example, the previously discussed Active Data
Guard now allows physical standbys to be used in a Reader Farm configuration, where
multiple physical standbys offer real-time read access in a highly scalable manner.
For instance, an online music catalog provider can now have multiple physical
standbys that scale website read access (e.g. catalog browsing). At peak holiday
periods when website traffic is expected to increase, the customer simply adds more
physical standbys to support the additional workload, without incurring any
downtime of the production database.
- Integrate Smartly - Because Oracle Database's HA
technologies are all integrated and Oracle-aware, they can provide considerable
added value. For example, by optimizing the direct integration
between Oracle Secure Backup and RMAN, Oracle Secure Backup 10.2 is considerably
faster than competing products. Another solution in this area is
Data Recovery Advisor, a tool (accessible
through Enterprise Manager or the RMAN CLI) that uses Oracle kernel intelligence to
automatically diagnose data failures in the database, present possible
repair options, and execute the desired repairs at the user's request. With such
smart integration throughout the Oracle Database HA solution set, many
error-prone manual operations are removed from day-to-day database
administration tasks, thereby improving the overall availability of the database.
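The Reader Farm configuration described under Scale for Growth can be pictured with a small routing sketch. This is an illustration only, written in Python under stated assumptions: `ReaderFarm`, `add_standby`, and `route` are hypothetical names, the database names are placeholders, and plain round-robin is an assumed policy rather than how Oracle itself distributes read connections.

```python
from itertools import count


class ReaderFarm:
    """Toy router: writes go to the primary, reads rotate over standbys."""

    def __init__(self, primary, standbys):
        self.primary = primary
        self.standbys = list(standbys)
        self._counter = count()  # advances once per routed read

    def add_standby(self, name):
        # Adding read capacity changes only the standby list; the
        # primary is not touched.
        self.standbys.append(name)

    def route(self, is_read_only):
        # Read/write work, or any work when no standby exists,
        # goes to the primary.
        if not is_read_only or not self.standbys:
            return self.primary
        # Round-robin (an assumed policy) over the current standbys.
        index = next(self._counter) % len(self.standbys)
        return self.standbys[index]
```

In the music catalog example, catalog browsing would be routed with `route(True)` and purchases with `route(False)`; calling `add_standby` before a holiday peak adds read capacity while the primary keeps running.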
For further details of the HA capabilities of Oracle Database 11g,
please refer to the technical white paper,
Oracle Database 11g High Availability. If you are interested in the HA
capabilities of Oracle Database 10g, please refer to
the technical white paper,
Oracle Database 10g Release 2 High Availability.
MAXIMUM AVAILABILITY ARCHITECTURE (MAA)
Operational best practices are key to the successful implementation of
IT infrastructure. Technology alone is not enough.
Oracle Maximum Availability Architecture (MAA) is a fully integrated and
proven blueprint for building highly available systems. Enterprises that have
based their system architecture on MAA find they can quickly and
efficiently design and deploy applications that meet their business
requirements for system availability. MAA encompasses specific design
and configuration recommendations, which have been extensively reviewed
and tested to ensure optimum system availability and reliability. The
MAA blueprints examine and detail the combined use of key Oracle
Database features for high availability, including Real Application
Clusters, Data Guard, Streams, Recovery Manager, and Enterprise Manager.
They also address the configuration and integration of other critical
components of highly available systems including servers, storage,
networking, and the application server.