Data Guard vs. HADR

Technical Comparison of Oracle Data Guard vs. IBM DB2 HADR

Ashish Ray, Server Technologies, Oracle Corporation



[Note: This article is excerpted from the comprehensive white paper: Technical Comparison of Oracle Database 10g vs. IBM DB2 v8.2: Focus on High Availability, available at the OTN HA Collateral site.]


OVERVIEW

Oracle Data Guard

Oracle Data Guard is the disaster recovery (DR) solution for the Oracle database. Available as an integrated feature of Oracle Database Enterprise Edition, it provides fault isolation, high availability and disaster recovery through one or more standby databases that are transactionally consistent copies of the production (primary) database. In the event of a planned or unplanned outage at the production site, Data Guard allows a chosen standby database to be switched to the primary role so that it can continue serving the enterprise's data needs.

DB2 HADR

The new high availability feature in DB2 version 8.2 is called "High Availability Disaster Recovery", or HADR [1]. HADR is based on a similar feature, High Availability Data Replication (HDR for short), from IBM's acquisition of Informix Dynamic Server. It is similar to Oracle Data Guard in the sense that it replicates data changes from a source database, called the primary, to a target database, called the standby. The idea is that in the event of a partial or complete site failure, the standby database can take over for the primary database.

DB2 – What Existed in Previous Releases

What existed in version 8.1 was simply called DB2 log shipping, which involves manual administration and custom scripts. This log-shipping mechanism requires setting up a user-exit program (specifically called “db2uxt2”) at a specified location, and setting the database configuration parameter “userexit” to “Yes”. Once these settings are in place, the DB2 server calls the user-exit program every five minutes to check for log files that can be archived to the directory or location specified by the program. The user-exit program can be written to copy the archived logs to a directory accessible by the standby server, or simply to FTP the logs to the standby server.

This mechanism also requires setting up a scheduled job on the standby system to periodically issue a db2 rollforward command (e.g. “db2 rollforward db <dbname> to end of logs”). When the rollforward db command is invoked on the standby server, the DB2 logger automatically attempts to retrieve the next consecutive log file from the archive target path. The roll forward operation continues to retrieve log files until there are no more left to process. The frequency at which this job runs determines how quickly archived logs can be picked up and applied by the standby database.
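As a rough illustration of why the job frequency matters, the following Python sketch estimates the worst-case time between a log file becoming archivable on the primary and its being applied on the standby. It is only a back-of-the-envelope model: the five-minute user-exit interval comes from the description above, while the function name, the one-minute copy time and the ten-minute job interval are assumptions made purely for illustration.

```python
def worst_case_apply_delay_minutes(archive_check_interval_min,
                                   copy_time_min,
                                   rollforward_job_interval_min):
    """Upper bound on the delay between a log file becoming archivable
    and the standby applying it, under a simple polling model.

    Worst case: the file becomes archivable just after a user-exit call,
    is then copied, and arrives just after a rollforward job has run.
    Log data still sitting in the active (unarchived) log file is not
    counted, so the real exposure is larger than this figure.
    """
    return (archive_check_interval_min
            + copy_time_min
            + rollforward_job_interval_min)


# Five-minute user-exit interval from the text; the other values are assumed.
delay = worst_case_apply_delay_minutes(5, 1, 10)
print(f"worst-case delay: {delay} minutes")
```

Under these assumed values the standby can trail the archived logs by a quarter of an hour or more, before even counting changes that have not yet been archived.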

Obviously, this script-driven approach is very cumbersome and error-prone. With HADR in Version 8.2, IBM has attempted to remove some of the complexities associated with this approach.

How is v8.2 HADR Different from v8.1 Log Shipping?

The DB2 version 8.1 log-shipping feature still exists in version 8.2. The way HADR is different from log-shipping is that IBM has built some more automation around this concept, using technology acquired from Informix. The dependence on manual user-exit programs and explicit roll-forward commands has been reduced through the “START HADR” and “STOP HADR” commands. However, the caveat here is that the existing “rollforward database” command cannot be used in a HADR configuration, because it may produce some inconsistencies [1]. IBM advises using “start hadr on db <dbname> as standby” instead.

The data protection is more granular in HADR compared to the earlier log-shipping. Instead of waiting for complete archived logs to be generated and then sending those, log pages are sent to the standby database as they are generated on the primary database. The state of replication is also controlled in a more granular manner through the HADR_SYNCMODE configuration parameter, with its SYNC, NEARSYNC and ASYNC values, which are somewhat similar to the protection modes of Data Guard. HADR also provides automation around the role transition, through the TAKEOVER HADR command.


DATA GUARD: COMPARATIVE STRENGTHS

HADR is a new feature. In contrast, Data Guard has been around for several years, has evolved and been enhanced over several Oracle database releases, and is deployed for mission-critical applications at major customer sites all over the world. The following table provides a quick summary of the comparative strengths of Data Guard over HADR.

Table 1: Addressing Disaster Recovery – Oracle vs. DB2

Addressing Disaster Recovery | Oracle | DB2
Continuously open standby for read/write & reporting | Yes | No
Integrated automatic failover | Yes | No
Integrated automatic reinstatement of old primary | Yes | No
Ability to open a standby database read-only | Yes | No
Seamless back-and-forth conversion between a read-write database and a standby database | Yes | No
Ability to do backups on the standby | Yes | No
Preventing data corruptions due to human errors (through delayed apply, or flashback database) | Yes | No
Support for clustering | Yes | No
Asynchronous log data transport has no impact on the primary | Yes | No
Multiple standbys in the same configuration | Yes | No
Flexible log data transport | Yes | No
Transparent out-of-the-box log gap resolution | Yes | No
Log gap resolution through incremental backups | Yes | No
Rolling upgrades across major database releases | Yes | No
Dynamic reconfiguration of relevant parameters | Yes | No
Primary does not need to be recreated after a failover | Yes | No
Replication of stored procedures | Yes | No
Built-in authentication / encryption | Yes | No
Support for raw devices | Yes | No
Cascaded standbys | Yes | No
Unrestricted CLOB/BLOB replication | Yes | No



The following sections provide further details on Data Guard's comparative strengths.


HADR does not offer any functionality equivalent to Data Guard SQL Apply

Oracle Data Guard supports two kinds of standby databases – physical standby databases that use Redo Apply technology, and logical standby databases that use SQL Apply technology. These two types of standby databases are well integrated with each other – for example, a single Data Guard configuration can be created to contain a mix of physical and logical standby databases. Data Guard uses the same redo transport mechanism to keep these standby databases transactionally consistent with the primary. No extra integration is required to maintain these two types of standby database – they are part of the same feature. The same management interface – whether it is SQL*Plus, or DGMGRL, or Enterprise Manager Grid Control, can be used to manage these two types of standby databases.

The kind of standby database supported by HADR is similar to a physical standby. SQL Apply, however, gives Data Guard a very powerful capability: a logical standby database can be open for read/write access and used as a reporting database while SQL is being applied to it. With real-time apply, a new feature in Oracle Database 10g Release 1, logical standby databases can also be used as a real-time reporting solution. This means that the logical standby server can serve other valuable business purposes besides disaster recovery. This is critical because effective use of system resources is a very important criterion for any disaster recovery solution: to be cost effective, one simply can’t afford to have system resources idling away waiting for the next disaster to happen.

A standby database in a HADR configuration cannot be opened, whether read-only or read/write, while log data is being applied to it. Applications cannot access this standby database in any state, which means HADR customers cannot extract as much value from their DR investment as Data Guard customers can.


HADR does not provide any integrated automatic failover

HADR does not provide any integrated capability to automatically perform a failover after a severe outage at the production site. The failover operation has to be manually initiated through the TAKEOVER … BY FORCE command.

In contrast, Data Guard in Oracle Database 10g Release 2 offers the Fast-Start Failover feature that allows Data Guard to automatically fail over to a previously chosen, synchronized standby database in the event of loss of the primary database, without requiring any manual steps to invoke the failover. Not only that, following a failover, once connection to the old primary database is established, it is automatically reinstated as a new standby in the configuration, restoring high availability and data protection capabilities for the configuration.

To implement automatic failover, the DB2 documentation [1] suggests integration with a cluster manager, which manages the primary-standby pair. There are several flaws with this design. For one thing, it requires a separate integration with a third-party clustering product for the operating system in question. For AIX, IBM at least offers its own clustering product, HACMP, although that still requires a separate integration. For other operating systems, such as Linux, IBM depends on whatever clustering product the OS vendor provides, and the integration complexities are likely to be more serious. In addition, this approach may be workable only where the primary and standby databases are separated by a short distance. For any distance that is recommended for a practical level of disaster protection, DB2 would require a geo-cluster implementation for automatic failover, which increases the integration and operational complexities even more.

Data Guard’s integrated automatic failover capability is an excellent fit for mission-critical business applications, which must tolerate server failures transparently while also being protected from data failures. Without such integrated support for automatic failover, DB2 is not a fit for the lights-out high availability that is increasingly a critical need for today’s 24x7 global business applications.


A HADR standby cannot be open read-only like Redo Apply

In Data Guard, the physical standby database can be opened read-only to satisfy read-only reporting requirements. While it is in this state, redo is still sent to the standby server; it is just not applied. After the reporting is complete, redo apply can be restarted with a simple command or a mouse click. In this manner, a physical standby database can be switched back and forth between redo apply mode and read-only mode as many times as needed to suit specific business requirements.

A HADR standby does not have any such capability. To allow any client or application access, the HADR standby database either has to switch roles to become a primary database, or be activated as a standard database.

Note that the SQL Apply continuously-open or Redo Apply read-only capabilities offer an excellent way to test out or validate the DR configuration, without causing any disruption to the primary database. In contrast, since applications cannot access the HADR standby database at any time, the only way such a validation can be done in a HADR configuration is to do a role change to the standby, which is disruptive to the primary database.


A HADR standby cannot be seamlessly activated to be a read/write database, and back

Once a HADR standby database is activated to be a standard database, it cannot just go back to be a standby database. If it needs to go back to the standby state, it needs to be completely reinstantiated as a standby database from a backup of the primary database [1]. In contrast, a physical standby database can be activated to support read/write reporting capabilities, and then – using the Flashback Across Resetlogs feature in Oracle Database 10g Release 2, it can be converted back to a standby database with a single command. This enhances the reporting as well as testing/cloning capabilities of the Data Guard physical standby database.


A HADR standby cannot be used for backups

A Data Guard physical standby database can be used for backups, which can be used to restore primary databases. This offloads the backup operation from the production database, reduces resource contention on the production server, boosts performance, and enables no-downtime backup windows. Furthermore, using RMAN, the backups can occur while redo is being applied to the physical standby database. A HADR standby database cannot be used for such backups. This is yet another example where customers investing in a DB2 HADR configuration will not be able to extract value out of their investment and will instead waste money on system resources that are essentially sitting idle.


HADR does not have any built-in mechanisms to prevent/undo data corruptions related to human errors

Human errors are one of the leading causes of downtime, yet HADR, in contrast with Data Guard, does not have any built-in mechanisms to prevent data corruptions related to human errors.

One way Data Guard prevents such data corruptions is delayed apply. The redo is still sent to the standby as fast as possible, but the apply (Redo Apply or SQL Apply) can be delayed on the standby by a configurable amount of time. This gives administrators a safety window in which to fail over to the standby if the primary has been corrupted, for example by a bad batch job run on the primary database. The delay is very flexible in that it can be configured on a per-standby basis: a primary database with two standbys may have one standby configured with a delay of 4 hours and the other with a delay of 12 hours, for varied protection from such corruptions.
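The safety window can be made concrete with a small Python sketch. It is an illustrative model only, not Data Guard code: the function name is hypothetical, and it assumes that a standby with an apply delay of d hours applies a given piece of redo exactly d hours after it was generated.

```python
def standbys_still_clean(apply_delays_hours, hours_since_error):
    """Return the apply delays of standbys that have not yet applied the bad redo.

    A standby with apply delay d applies redo generated at time t at
    time t + d, so it is still uncorrupted while hours_since_error < d.
    """
    return [d for d in apply_delays_hours if hours_since_error < d]


# Two standbys with delays of 4 and 12 hours, as in the example above.
print(standbys_still_clean([4, 12], 2))   # error noticed after 2 hours
print(standbys_still_clean([4, 12], 6))   # error noticed after 6 hours
print(standbys_still_clean([4, 12], 13))  # error noticed after 13 hours
```

In this model both standbys are usable failover targets if the error is caught within 4 hours, only the 12-hour standby remains usable after that, and neither is usable once 12 hours have passed.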

In Oracle Database 10g, administrators may choose not to use delayed apply but use real-time apply instead (e.g. to get the benefits of real-time reporting in logical standby databases). If a human error were to occur in such cases, the primary and standby databases may simply be flashed back to a safe point in time in the past, using the Flashback Database feature, providing yet another flexibility for Data Guard in preventing corruptions due to human errors. HADR completely lacks this capability.


HADR does not support DB2’s own clustering feature

HADR does not support DB2’s own partitioning feature (Database Partitioning Feature, or DPF), which is its clustering feature [1]. This is a critical deficit, since partitioning is IBM’s premier HA feature. This effectively means that the HA part of HADR is really missing! Data Guard, on the other hand, is completely integrated with Oracle's clustering solution, RAC. Either or both of the primary and standby databases can be RAC clusters. All protection modes are supported in these configurations. Automated transmission of redo data and recovery are available for all configurations. With a well-integrated RAC and Data Guard offering, Oracle offers an end-to-end High Availability solution that is simply unparalleled in the industry.


The HADR asynchronous mode is not really asynchronous, since it can stall the primary

HADR uses synchronization modes (the SYNC, NEARSYNC and ASYNC values of the HADR_SYNCMODE configuration parameter) to manage the transmission of log data between the primary and standby databases. The asynchronous (ASYNC) value is meant to minimize the impact on the primary database, but even in this mode the primary can stall under high traffic, as pointed out in the DB2 documentation [1], [2]:

For example, when the HADR synchronization mode is asynchronous and the primary and standby databases are in peer state, if the primary database is experiencing a high transaction load, the log receive buffer on the standby database might fill to capacity and the log shipping operation from the primary database might stall.

or


If HADR synchronization mode (the HADR_SYNCMODE database configuration parameter) is set to ASYNC, during peer state, a slow standby may cause the send operation on the primary to stall and therefore block transaction processing on the primary.


To manage these temporary peaks, the documentation suggests tweaking the DB2_HADR_BUF_SIZE registry variable. This tweaking seems to be purely arbitrary, without any advisories provided to assist the customer.
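To see what enlarging the buffer can and cannot achieve, consider a simplified Python model of a receive buffer that fills whenever log data arrives faster than the standby replays it. This is a sketch under stated assumptions, not a description of DB2 internals: the function name is hypothetical, and the sizes and rates are made-up numbers in megabytes rather than the units DB2 uses for DB2_HADR_BUF_SIZE.

```python
def seconds_until_buffer_full(buffer_mb, arrival_mb_per_s, replay_mb_per_s):
    """Time a receive buffer can absorb a load peak before it fills.

    Returns None when replay keeps up with arrival, in which case the
    buffer never fills in this model.
    """
    surplus = arrival_mb_per_s - replay_mb_per_s
    if surplus <= 0:
        return None
    return buffer_mb / surplus


# Assumed peak: log data arrives at 10 MB/s while the standby replays 8 MB/s.
print(seconds_until_buffer_full(64, 10, 8))    # 64 MB buffer
print(seconds_until_buffer_full(128, 10, 8))   # doubling the buffer
print(seconds_until_buffer_full(64, 8, 8))     # standby keeps up
```

In this model a larger buffer only lengthens the peak that can be absorbed; it does not remove the stall if the peak outlasts the buffer, which is why choosing a value without guidance is difficult.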

In contrast, Data Guard in Oracle Database 10g has been architected such that the Maximum Performance mode will not block the primary.


A HADR configuration does not support multiple standbys

A Data Guard configuration supports multiple standbys, in any mix of physical and logical standbys, located on a LAN or across a WAN. Customers find this option quite flexible to meet their unique business needs. For example, with Data Guard, a multiple standby configuration is possible in which there is a logical standby in a LAN serving as a local reporting database, and a physical standby in a WAN serving as a remote DR database. A HADR configuration, which allows only a primary-standby pair, does not offer this flexibility.


Log data transport / log gap resolution in HADR configuration is not well-architected

The fact that the primary database can be blocked under a high transaction load, even with asynchronous log data transport, points to an inefficient log data transport architecture in HADR. There are other related issues worth pointing out.

IBM collateral indicates that there is an HADR process that takes the log buffer and passes it over to the HADR process on the standby machine. The fact that there is one process implies a potential bottleneck in transmission of the data from the log buffers to the standby database, which indeed explains why the primary database may stall for high transaction loads. In contrast, for Data Guard, it is possible to set up multiple Archiver (ARCH) processes to transmit redo data to the standby database (even using parallel streaming in Oracle Database 10g Release 2) without impacting the primary database even for high transaction loads.

A related issue is network disconnects, a particular concern when the disaster recovery solution is deployed over a long distance. In the case of HADR, if the network connection is lost, the standby database enters the Remote Catchup Pending state and remains in that state until the connection is restored. When the connection is restored, it enters the Remote Catchup state and expects the primary database's archival mechanism to send archived log data to the standby database, presumably by using specialized user-exit programs and configuring the database configuration parameters logarchmeth1 and logarchmeth2. When all of the log files on disk on the primary have been replayed by the standby database, the primary and standby enter Peer state, at which time log pages can be sent to the standby directly from the log buffer of the primary.

The serious architectural limitation here is that for a busy system, following a network disconnect problem and subsequent restoring of the connectivity, the standby may perpetually be in the Remote Catchup state – because there is always more archived data to catch up to. Consequently, it may never be able to move to the Peer state, implying a significant data loss exposure in such cases.
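The argument reduces to simple arithmetic: the standby catches up only if archived log data is shipped and replayed faster than the primary generates new log data. The Python sketch below illustrates this under assumed numbers; the function name, backlog size and rates are hypothetical and are not taken from the DB2 documentation.

```python
def catchup_seconds(backlog_mb, ship_mb_per_s, generate_mb_per_s):
    """Time for the standby to work off the backlog of archived log data.

    Returns None when the primary generates log data at least as fast as
    it can be shipped, in which case the backlog never shrinks.
    """
    net_rate = ship_mb_per_s - generate_mb_per_s
    if net_rate <= 0:
        return None
    return backlog_mb / net_rate


# Assumed 6000 MB backlog accumulated during the network outage.
print(catchup_seconds(6000, 12, 10))  # shipping slightly faster than generation
print(catchup_seconds(6000, 10, 10))  # busy system: never catches up
```

With only a small margin between the two rates the catchup takes a long time, and with no margin it never completes, which is the exposure described above.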

Data Guard, in contrast, allows simultaneous redo transmission using both the Log Writer (LGWR) and Archiver (ARCH) processes. In combination with multiple network connections to the same standby server, this allows a Data Guard standby to catch up much more quickly once network connectivity is restored.

For resolution of large archive-log gaps, in order to minimize the delay in transmitting several large log files over the network, it may be prudent to apply incremental backups on the standby database and bring it up-to-date with the primary, or at least get missing log files from another local standby. HADR does not support applying incremental backups on the standby database to bring it up to date with the primary. Also, since HADR does not support multiple standbys, it is solely dependent on the primary server for missing log data. Not only that – if a HADR standby enters the Remote Catchup Pending state, and more log files become available that can be manually copied to the standby server, the standby database must be restarted to ensure that it can recognize those logs [1]. For Data Guard – if missing log files are registered with the standby control file – Redo Apply or SQL Apply will automatically recognize them and start applying, without needing any database restart.


HADR does not support rolling upgrades across major database releases

Both HADR and Data Guard support rolling upgrades of databases, but HADR's rolling upgrade support is much more restrictive because, unlike Data Guard, it does not support rolling upgrades across major database releases. It supports rolling upgrades only across fixpack releases (equivalent to Oracle patchsets).

A related issue regarding system requirements is that the OS on the primary and standby databases in a HADR configuration should be the same version, including patches. In contrast, in a Data Guard configuration, the OS on the primary and standby databases should be the same, but they can be of different versions. For example, a Solaris 8 – Solaris 9 Data Guard configuration is supported.


Dynamic reconfiguration of HADR configuration parameters is not supported

Any changes made to any HADR configuration parameter are not effective until the database has been shut down and restarted [1].

Examples of these parameters are HADR_TIMEOUT and HADR_SYNCMODE. This problem is compounded by the fact that most of these parameters require identical values on the primary and standby databases. This means that to change any of these values, both the primary and the standby databases have to be shut down and restarted. This certainly reduces the availability and flexibility of HADR.

In contrast, almost all the parameters that are relevant to a Data Guard configuration can be dynamically altered without requiring the restart of the database.


A failover in a HADR configuration requires full re-instantiation of the old primary database

A failover in a HADR configuration, done through the TAKEOVER HADR command with the BY FORCE option, typically requires the old primary database to be recreated from a backup of the new primary database before it can rejoin the configuration. Only in one specific case [1], when HADR is running in SYNC mode and the primary was in peer state when it failed, can the old primary be resynchronized as a new standby instead of being recreated. However, even in SYNC mode it is possible that the primary was not in peer state when it failed; in that case, any attempt to start it as a standby database (through the START HADR … AS STANDBY command) will fail, and it has to be recreated from a full backup of the new primary.

DB2 in general requires this full database re-instantiation because with HADR, it is not possible to revert the primary database to the point in time when the failure occurred [1]. This may consume significant time and resources considering today’s multi-TB databases. Besides, while the old primary is being recreated from a backup – not only does this leave the HADR configuration unprotected for this duration, it may also create logistics problems if the HADR configuration is a WAN, because that may involve shipment of backup tapes from a remote site.

In contrast, since Data Guard is integrated with the Oracle Flashback Database feature, a primary database can, after a failover, simply be flashed back to the point in time when the failover occurred, and can then rejoin the Data Guard configuration as a new standby and automatically catch up. This capability has been available since Oracle Database 10g Release 1. As discussed previously, the Fast-Start Failover feature in Oracle Database 10g Release 2 automates this even further by automatically reinstating the old primary database as a new standby database, without the manual backups, cumbersome shipping of tapes, and manual restore operation required with DB2.

This means that compared to DB2 HADR, complete data protection can be restored much more quickly in a Data Guard configuration after a failover, and a Data Guard configuration is much better protected from possible subsequent failures.


HADR does not replicate stored procedures

Stored procedures are not replicated in a HADR configuration – they must be manually recreated. In contrast, for Data Guard, this is not an issue for Redo Apply, which replicates all stored procedures. SQL Apply also replicates all stored procedures, except Oracle PL/SQL supplied packages that modify system metadata.


HADR does not offer built-in authentication/encryption

HADR does not offer built-in security mechanisms such as encrypting log data while in transit, or authenticating every new primary-to-standby connection [3]. Data Guard in Oracle Database 10g offers a built-in security feature such that every new connection between a primary and standby database is authenticated based on an administrator-supplied password. Furthermore, with Oracle Advanced Security Option (ASO), all redo traffic between the primary and standby databases can be encrypted. Support for such security mechanisms is very important considering that highly sensitive, business-critical data may be transmitted between primary and standby databases.


HADR does not support raw devices

HADR does not support a configuration in which the primary/standby database is based on raw devices (as opposed to a file system). No such restriction exists for Data Guard.


HADR does not support cascaded standbys

HADR does not support cascaded standby configurations, in which a standby retransmits redo to a second layer of standbys. Cascading helps save CPU and network resources around the primary data center. Data Guard supports this capability.


HADR does not properly replicate all BLOBs and CLOBs

An IBM support note indicates that BLOBs and CLOBs larger than 1 GB cannot be logged, so they also cannot be replicated. Data Guard does not have this restriction – whether that is for physical standby, or logical standby.



CONCLUSION

Recognizing the high availability challenges every business faces, Oracle provides comprehensive, unique, powerful, and simple-to-use capabilities that protect businesses against all forms of unplanned downtime, including system faults, data corruption, disasters, and human errors. Oracle achieves this in an environment where the downtime that occurs during planned maintenance activities is also minimized.

Unlike DB2, Oracle’s high availability solutions are not isolated, disjointed solutions. Oracle offers a well-integrated high availability solution stack, comprising components such as RAC, Data Guard, RMAN and Flashback, that does not need consultants to stitch it together. This saves customers time, money and system/people resources, factors that are extremely critical in today’s economy. Oracle has gone one step further by publishing best practice guidelines for configuring a High Availability solution through its Maximum Availability Architecture framework, and making them available to its customers. The long list of Oracle customers who have embraced its High Availability solutions is a testimonial to Oracle’s unparalleled technical leadership and vision in this area.

In contrast to Oracle, DB2 offers a basic set of backup and recovery features and lacks the completeness and depth of High Availability functionality required by most businesses today. DB2 continues to lag several releases behind Oracle in this regard and is not an appropriate choice for today’s business applications demanding high levels of uptime.



REFERENCES
  1. IBM DB2 Universal Database – Data Recovery and High Availability Guide and Reference, Version 8.2, Chapter 7: High Availability Disaster Recovery (HADR)
  2. IBM DB2 Universal Database – Administration Guide: Performance, Version 8.2, Appendix A: DB2 Registry and Environment Variables
  3. IBM DB2 Universal Database – Introduction to Replication and Event Publishing, Version 8.2, Chapter 7: Comparison of Q replication to high availability disaster recovery (HADR)



Ashish Ray (Ashish.Ray at oracle.com) is a Group Product Manager with Oracle's Database High Availability Group. He has 12+ years of combined experience in software architecture design, software development and product management, focusing largely on the reliability, availability and scalability issues of enterprise and e-business computing.

