As Published In

Oracle Magazine
November/December 2003
Technology BUSINESS INTELLIGENCE

Capturing Change

By Sanjay Mishra

Use Change Data Capture to simplify ETL in data warehouse applications.

Data extraction is an integral part of all data warehousing. Data is often extracted on a nightly basis from the transac tional systems and transported to the data warehouse. Typically, all the data in the data warehouse is refreshed with data extracted from the source system. But this involves the extraction and transportation of huge volumes of data and is very expensive in both resources and time.

Since the data extraction takes place daily, it would be much more efficient to extract and transport only the data that has changed since the last extraction. However, in most source systems, it is extremely difficult, if not impossible, to identify and extract only the recently changed data.

Beyond the challenge of identifying the recently changed data, many extraction, transformation, and loading (ETL) environments involve one source system feeding data into multiple target systems. It is also a challenge to sync up change data from one source to many targets.

The Oracle Change Data Capture (CDC) feature simplifies the process of identifying the change data since the last extraction. CDC provides easy-to-use APIs to set up a CDC environment. If you are a DBA or a developer involved in a data integration project that involves the regular extraction and transportation of a large amount of data from one system to another system or systems, consider CDC to help reduce your ETL time.

This article gives you a step-by-step approach for setting up a CDC environment, outlines the prerequisites for the setup, and discusses the APIs you need to know to set up CDC for your application.

Overview

The CDC feature was introduced in Oracle9i Database. CDC helps identify the data in the source system that has changed since the last extraction. With CDC, data extraction takes place at the same time the INSERT, UPDATE, or DELETE operations occur in the source tables, and the change data is stored inside the database in change tables. The change data, thus captured, is then made available to the target systems in a controlled manner, using database views.

Figure 1 shows the basic architecture of a CDC environment.

CDC creates the subscriber view to control accessibility of the change table data to the target systems. A target system is any application that wants to use the data captured from the source system.

The Publisher and the Subscriber

The CDC architecture is based on the publisher/subscriber model. The publisher captures the change data and makes it available to the subscribers. The subscribers utilize the change data obtained from the publisher. Usually, a CDC system has one publisher and many subscribers.

The publisher first identifies the source tables from which the change data needs to be captured. It then captures the change data and stores it in specially created change tables. Then it allows the subscribers controlled access to the change data.

The subscriber needs to know what change data it is interested in. A subscriber may not be interested in all the data that is published by the publisher. The subscriber creates a subscriber view to access the change data to which it has been granted access by the publisher.

Change Data Capture architecture

Figure 1: Change Data Capture architecture

Setting Up a CDC Environment

The CDC feature is part of the Oracle9i (and later) Database software; you don't need to install any software other than Oracle9i Database to use the CDC feature. To make use of Oracle9i Database's CDC feature, do the following:

  • Enable Java (the Java virtual machine—JVM) in the database. Refer to the Oracle9i Database Java Developer's Guide for detailed instructions on configuring Oracle JVM. When creating a database using Database Configuration Assistant (DBCA), you can choose to install Oracle JVM.
  • Enable system triggers. When you create a database, system triggers are enabled by default.

Once your database is ready for CDC, you can follow this step-by-step approach to set up a CDC environment:

1. Identify the source tables.
2. Set up a publisher.
3. Create change tables.
4. Set up a subscriber.
5. Subscribe to the source tables, and activate the subscription.
6. Set up the CDC window.
7. Prepare a subscriber view.
8. Access data from the change tables.
9. Drop the subscriber view, and purge the CDC window.
10. Repeat steps 6 through 9 to see new data.

For the purposes of this article, I will use a sales application example. The source application is a transactional system that records the information on each sale. The tables for the transactional application are owned by schema owner SALES_OP.

I am setting up a data warehouse based on the sales operational data. The user SALES_DW is the owner of the data warehouse tables. The purpose of my CDC system is to transfer change data from the transactional system (source tables owned by SALES_OP) to the data warehouse system (target tables owned by SALES_DW) efficiently. The change data consists of

  • Data inserted (INSERT) in the source tables since the last extraction
  • Data modified (UPDATE) in the source tables since the last extraction
  • Data deleted (DELETE) from the source tables since the last extraction

In the following few sections, I will show how the change data is captured in the source system and made available to the target system.

1. Identify the Source Tables

The database user SALES_OP owns two tables, PRICE_LIST and SALES_TRAN, which contain the price list of the items sold and the transaction details of each sale, respectively.

SQL> DESC PRICE_LIST

 Name          Null?     Type
 ----------    -------   -------
 PRODUCT_ID    NOT NULL  VARCHAR2(20)
 UOM           NOT NULL  VARCHAR2(20)
 PRICE_PER_UOM NOT NULL  NUMBER(10,2)

SQL> DESC SALES_TRAN

 Name           Null?    Type
 ----------     -------  -------
 PRODUCT_ID     NOT NULL VARCHAR2(20)
 SALE_DATE_TIME NOT NULL TIMESTAMP(6)
 QUANTITY       NOT NULL NUMBER(10,2)

Every morning, the changes made in these two tables must be extracted and made available to the data warehouse application.

2. Set Up a Publisher

A publisher is a database user who creates and maintains the change tables. As mentioned earlier, the responsibility of the publisher is to identify and extract the change data and make it available to the subscribers. To be able to do that, the following two database roles must be granted to the publisher:

  • SELECT_CATALOG_ROLE

  • EXECUTE_CATALOG_ROLE

For simplicity, let the user SALES_OP assume the role of the publisher. Since SALES_OP owns the source table, it is easier to designate this user as the publisher. However, you can designate any other database user as the publisher. In situations where tables owned by multiple database users constitute the source tables, it is better to designate another user as the publisher. In such a situation, the publisher needs the SELECT privilege on the source tables.

3. Create Change Tables

A change table is required for each source table. The publisher uses the procedure DBMS_LOGMNR_CDC_PUBLISH .CREATE_CHANGE_TABLE to create change tables, as shown in Listing 1. In this example, the change tables corresponding to PRICE_LIST and SALES_TRAN are named CDC_PRICE_LIST and CDC_SALES_TRAN respectively.

The CREATE_CHANGE_TABLE procedure passes the name of the change set as an argument. A change set is a collection of change tables. In Listing 1, I have used the system-generated change set: SYNC_SET.

4. Set Up a Subscriber

In my example, the target system, SALES_DW, is the subscriber. To be able to subscribe to the change data, the subscriber needs the SELECT privilege on the source tables and the change tables, as shown:

connect sales_op/sales_op
grant select on PRICE_LIST to sales_dw;
grant select on SALES_TRAN to sales_dw;
grant select on CDC_PRICE_LIST 
  to sales_dw;
grant select on CDC_SALES_TRAN 
  to sales_dw;

5. Subscribe to the Source Tables, and Activate the Subscription

In a typical CDC environment there are several subscribers, and each subscriber needs to subscribe only to the tables he or she requires. Moreover, a subscriber may not want to see all the columns of the table to which he or she subscribes.

Before you can subscribe to the published data, you need to get a subscription handle. The subscription handle lets the subscriber manage the change tables and columns of interest. Within Oracle's CDC environment, a subscription is uniquely identified by the subscription handle, and no matter how many tables you are subscribing to, you need to create just one subscription handle. Use the DBMS_LOGMNR_ CDC_SUBSCRIBE.GET_SUBSCRIPTION HANDLE procedure to get the subscription handle. This procedure returns the value of the handle in a variable. Define a variable within your session to receive the value from the procedure, as follows:

variable sh_sales number;
EXECUTE - 
DBMS_LOGMNR_CDC_SUBSCRIBE
  .GET_SUBSCRIPTION_HANDLE( -
    CHANGE_SET=>'SYNC_SET', -
    DESCRIPTION=>'Change data from 
Sales Appln.', -
    SUBSCRIPTION_HANDLE=>:sh_sales);

Once you create the handle, you can subscribe to the change data you want. While subscribing to the change data, specify which tables and columns you want to subscribe to, as follows:

EXECUTE DBMS_LOGMNR_CDC_SUBSCRIBE
  .SUBSCRIBE( -
    SUBSCRIPTION_HANDLE=>:sh_sales, -
    SOURCE_SCHEMA=>'SALES_OP', -
    SOURCE_TABLE=>'PRICE_LIST', -

COLUMN_LIST=>'PRODUCT_ID,UOM,
PRICE_PER_UOM');

EXECUTE DBMS_LOGMNR_CDC_SUBSCRIBE
  .SUBSCRIBE( -
    SUBSCRIPTION_HANDLE=>:sh_sales, -
    SOURCE_SCHEMA=>'SALES_OP', -
    SOURCE_TABLE=>'SALES_TRAN', -
    COLUMN_LIST=>'PRODUCT_ID,SALE_DATE_TIME,
QUANTITY');

After you specify which tables and columns you want to subscribe to, you can activate your subscription. No matter how many tables you are subscribing to, you need to activate the subscription just once. Once a subscription is activated, you can no longer add any additional tables or columns to your subscription. To activate the subscription, call the DBMS_LOGMNR_CDC_SUBSCRIBE.ACTIVATE_ SUBSCRIPTION procedure and pass the subscription handle to it:

EXECUTE -
DBMS_LOGMNR_CDC_SUBSCRIBE
  .ACTIVATE_SUBSCRIPTION( -
    SUBSCRIPTION_HANDLE=>:sh_sales);

6. Set Up the CDC Window

Data keeps changing (inserted, updated, or deleted) in the source tables, regardless of the subscriptions on the change data. Therefore, the subscription is responsible for setting up (extending) a window—the CDC window—during which it wants to see the changes. The DBMS_LOGMNR_CDC_SUBSCRIBE.EXTEND_WINDOW procedure is used to set up the CDC window, as follows:

EXECUTE -
DBMS_LOGMNR_CDC_SUBSCRIBE.EXTEND_WINDOW( -
    SUBSCRIPTION_HANDLE=>:sh_sales); 

The changes done to the source table after you execute the EXTEND_WINDOW procedure will not be visible to the subscription until you purge the window and extend it again (more about this later).

7. Prepare a Subscriber View

Once you have set up the CDC window, you need to prepare a view for the subscriber so that the subscriber can see the change data. You need to prepare a view for each source table the subscriber is subscribing to. The procedure DBMS_LOGMNR_CDC_SUBSCRIBE.PREPARE_SUBSCRIBER_VIEW is used to prepare a subscriber view. You need to pass the subscription handle, source schema name, and source table name to this procedure, and it will return the name of the subscriber view in a variable. In this example, the subscriber view names for the source tables PRICE_LIST and SALES_TRAN are returned in variables sv_price_list and sv_sales_tran, respectively:

variable sv_price_list varchar2(30);

EXECUTE -
DBMS_LOGMNR_CDC_SUBSCRIBE
  .PREPARE_SUBSCRIBER_VIEW( -
    SUBSCRIPTION_HANDLE =>:sh_sales, -
    SOURCE_SCHEMA=>'SALES_OP', -
    SOURCE_TABLE=>'PRICE_LIST', -
    VIEW_NAME => :sv_price_list); 

variable sv_sales_tran varchar2(30);

EXECUTE -
DBMS_LOGMNR_CDC_SUBSCRIBE
  .PREPARE_SUBSCRIBER_VIEW( -
    SUBSCRIPTION_HANDLE =>:sh_sales, -
    SOURCE_SCHEMA=>'SALES_OP', -
    SOURCE_TABLE=>'SALES_TRAN', -
    VIEW_NAME => :sv_sales_tran); 

You can get the view names from these two variables. The view names are system-generated and generally contain a prefix of CDC and a number. For example:

SQL> select :sv_price_list from dual;

:SV_PRICE_LIST
----------------
CDC#CV$1430621

SQL> select :sv_sales_tran from dual;

:SV_SALES_TRAN
-----------------
CDC#CV$1430623

8. Access Data from the Change Tables

The subscriber view contains the change data you need. The subscriber view also contains some additional information to facilitate usage of the change data:

SQL> desc CDC#CV$1430621

 Name         Null?     Type
----------    -------   --------
 OPERATION$	        CHAR(2)	
 CSCN$                  NUMBER
 COMMIT_TIMESTAMP$      DATE
 TIMESTAMP$             DATE
 USERNAME$              VARCHAR2(30)
 PRICE_PER_UOM          NUMBER(10,2)
 PRODUCT_ID             VARCHAR2(20)
 UOM                    VARCHAR2(20)

For example, the column OPERATION$ indicates whether the change is an INSERT, an UPDATE, or a DELETE. The columns CSCN$, COMMIT_TIMESTAMP$, TIMESTAMP$, and USERNAME$ indicate who made the change in the source table and when the change was made. The other columns are the subscribed columns from the source table. You can use a SELECT statement to see the data from the subscriber view. For example:

SQL> select * from CDC#CV$1430621;

OP PRICE_PER_UOM   PRODUCT_ID   UOM
-- ------------    ------------ ----
I  2.99            P101         pound
I  4.99            P102         pound

9. Drop the Subscriber View, and Purge the CDC Window
Next Steps

READ more on Change Data Capture
Oracle9i Data Warehousing Guide Release 2 (9.2)
Oracle9i Supplied PL/SQL Packages and Types Reference Release 2 (9.2)
otn.oracle.com/documentation

When you have finished reading and extracting the change data and you no longer need the change data that is visible in the subscriber view, you need to drop the subscriber view and purge the CDC window so that you can set up (extend) a new window and see new change data. To drop an existing subscriber view, use the procedure DBMS_LOGMNR_CDC_SUBSCRIBE.DROP_SUBSCRIBER_VIEW. You need to drop each subscriber view for your sub-scription individually, before you can purge the old CDC window and set up a new window:

EXECUTE -
DBMS_LOGMNR_CDC_SUBSCRIBE
  .DROP_SUBSCRIBER_VIEW( -
    SUBSCRIPTION_HANDLE =>:sh_sales, -
    SOURCE_SCHEMA=>'SALES_OP', -
    SOURCE_TABLE=>'PRICE_LIST');

EXECUTE -
DBMS_LOGMNR_CDC_SUBSCRIBE
  .DROP_SUBSCRIBER_VIEW( -
    SUBSCRIPTION_HANDLE =>:sh_sales, -
    SOURCE_SCHEMA=>'SALES_OP', -
    SOURCE_TABLE=>'SALES_TRAN');

After you drop the subscriber views, you can purge the window. Use the DBMS_LOGMNR_CDC_SUBSCRIBE. PURGE_WINDOW procedure to purge the window:

EXECUTE -
DBMS_LOGMNR_CDC_SUBSCRIBE.PURGE_WINDOW( -
    SUBSCRIPTION_HANDLE=>:sh_sales);

10. Repeat Steps 6 through 9 to See New Data

To extract change data periodically, you need to repeat steps 6 through 9. If you extract change data on a daily basis, your daily (or, preferably, nightly) data-extraction job will include setting up the subscription window (EXTEND_WINDOW), preparing subscriber views (PREPARE_SUBSCRIBER_ VIEW), accessing data from the subscriber views, dropping subscriber views (DROP_SUBSCRIBER_VIEW), and purging the window (PURGE_WINDOW).

Summary of Steps

Steps 1 through 5 are configuration steps that need to be set up just once for a given publisher/subscriber scenario. Execute steps 6 through 9 periodically to incrementally extract the change data from the source system.

Sanjay Mishra (smishra_tech@yahoo.com) has coauthored three Oracle books: Oracle Parallel Processing, Oracle SQL Loader: The Definitive Guide, and Mastering Oracle SQL—all published by O'Reilly and Associates.

E-mail this page
Printer View Printer View
Oracle Is The Information Company About Oracle | Oracle RSS Feeds | Careers | Contact Us | Site Maps | Legal Notices | Terms of Use | Privacy