Using the ETL Infrastructure Inside Oracle Database 10g

This tutorial covers the Extraction, Transformation, and Loading (ETL) infrastructure of Oracle Database 10g.

Approximately 1 hour

Topics

This tutorial covers the following topics:

Overview
Prerequisites
Implement Schema Changes for the Sales History Schema
Reviewing the Multitable Insert
Using the Upsert Functionality (SQL MERGE Keyword)
Learn DML Error Logging Capabilities
Experience the Basics of the Table Function
Change Data Capture

Overview

What Happens During the ETL Process?

ETL stands for extraction, transformation, and loading. During extraction, the desired data has to be identified and extracted from many different sources, including database systems and applications. Often it is not possible to identify the specific subset of interest up front, so more data than necessary has to be extracted and the relevant data is identified at a later point in time. Depending on the source system’s capabilities (for example, OS resources), some transformations may take place during this extraction process. The size of the extracted data varies from hundreds of kilobytes to hundreds of gigabytes, depending on the source system and the business situation. The same is true for the time delta between two (logically) identical extractions: the interval may range from days or hours down to minutes or near real time. For example, Web server log files can easily grow to hundreds of megabytes in a very short period of time.

After extraction, the data must be physically transported to the target system or to an intermediate system for further processing. Depending on the chosen means of transportation, some transformations can be done during this process as well. For example, a SQL statement that directly accesses a remote target through a gateway can concatenate two columns as part of the SELECT statement.
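A minimal sketch of such a transformation during transport (this is not one of the tutorial scripts; the database link source_db and the staging table customers_stage are hypothetical names used only for illustration):

Rem pull rows through a database link and concatenate two columns
Rem while the data is being transported
INSERT INTO customers_stage (cust_id, cust_name)
SELECT cust_id, cust_first_name || ' ' || cust_last_name
FROM customers@source_db;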

After extracting and transporting the data, the most challenging (and time-consuming) part of ETL follows: transforming the data and loading it into the target system. This may include:

Application of complex filters

Data has to be validated against information already existing in target database tables

Data extracted without knowing whether it is new or changed has to be checked against the target objects to determine whether it must be updated or inserted

The same data has to be inserted several times, as detail-level and as aggregated information

All of this should be done as quickly as possible and in a scalable manner, without affecting concurrent access to the existing target for information retrieval.

Oracle offers a wide variety of capabilities to address all the issues and tasks relevant in an ETL scenario. Oracle Database 10g is the ETL transformation engine.

Back to Topic List

Prerequisites

Before starting this tutorial, you should:

1.

Perform the Installing Oracle Database 10g on Windows tutorial.

2.

Download and unzip etl2.zip into your working directory (for example, c:\wkdir).

Back to Topic List

Implement Schema Changes for the Sales History Schema

Before starting the tasks for this tutorial, you need to implement some changes on the existing Sales History schema. Additional objects are necessary, and additional system privileges must be granted to the user SH. The SQL file for applying those changes is modifySH_10gR2.sql .

1.

Start a SQL*Plus session. Select Start > Programs > Oracle-OraDB10g_home > Application Development > SQL Plus.

(Note: This tutorial assumes you have a c:\wkdir folder. If you do not, create one and unzip the contents of etl2.zip into it. The scripts in this tutorial are invoked with this path specified.)

 

2.

Log in as the SH user. Enter SH as the User Name and SH as the Password. Then click OK.

 

3.

Run the modifySH_10gR2.sql script from your SQL*Plus session.

@c:\wkdir\modifySH_10gR2.sql

The bottom of your output should match the image below.

 

 

Back to Topic List

Reviewing the Multitable Insert

MyCompany receives some nonrelational data structures from one of its partner companies, which then sells the products for a special advertisement campaign. The data structure is a denormalized, nonrelational record structure from proprietary mainframe systems consisting of one record per customer and product per week. Those data structures have to be inserted into the data warehouse. Because sales record data is stored per customer and product per day, you need to transform the incoming data.

As part of the transformation, the nonrelational denormalized data structure must be transformed from one record per week into seven records, each containing the information of one business day (the first example business transformation). In addition, the data warehouse keeps track of all new customers with a credit limit above a certain threshold. Those customers are tracked separately.

In this section, you will implement those business transformations by leveraging Oracle’s new multitable insert capabilities. To do this, perform the following steps:

Back to Topic List

Using the Multitable Insert for Pivoting

1.

Show the execution plan for the new multitable insert. From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@c:\wkdir\explain_mti_new.sql

DELETE
FROM PLAN_TABLE;
EXPLAIN PLAN FOR
INSERT ALL
INTO sales
VALUES(product_id, customer_id,weekly_start_date,2,9999, q_sun,sales_sun)
INTO sales
VALUES(product_id, customer_id,weekly_start_date+1,2,9999, q_mon,sales_mon)
INTO sales
VALUES(product_id, customer_id,weekly_start_date+2,2,9999, q_tue,sales_tue)
INTO sales
VALUES(product_id, customer_id,weekly_start_date+3,2,9999, q_wed,sales_wed)
INTO sales
VALUES(product_id, customer_id,weekly_start_date+4,2,9999, q_thu,sales_thu)
INTO sales
VALUES(product_id, customer_id,weekly_start_date+5,2,9999, q_fri,sales_fri)
INTO sales
VALUES(product_id, customer_id,weekly_start_date+6,2,9999, q_sat,sales_sat)
SELECT * FROM sales_input_table;
SET linesize 140
SELECT * from table(dbms_xplan.display);

Note: The input source table is scanned only once! The complexity of the denormalization is handled within the several INSERT INTO branches, thus avoiding multiple scans.

 

2.

Now show the execution plan for the equivalent conventional insert based on a UNION ALL set operation. From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@c:\wkdir\explain_mti_old.sql

DELETE FROM PLAN_TABLE;
COMMIT;
EXPLAIN PLAN FOR
INSERT INTO sales
(prod_id, cust_id, time_id, channel_id,promo_id,amount_sold,quantity_sold)
SELECT product_id, customer_id,weekly_start_date,2,9999,sales_sun,q_sun
FROM sales_input_table
UNION ALL
SELECT product_id, customer_id,weekly_start_date+1,2,9999,sales_mon,q_mon
FROM sales_input_table
UNION ALL
SELECT product_id, customer_id,weekly_start_date+2,2,9999,sales_tue,q_tue
FROM sales_input_table
UNION ALL
SELECT product_id, customer_id,weekly_start_date+3,2,9999,sales_wed,q_wed
FROM sales_input_table
UNION ALL
SELECT product_id, customer_id,weekly_start_date+4,2,9999,sales_thu,q_thu
FROM sales_input_table
UNION ALL
SELECT product_id, customer_id,weekly_start_date+5,2,9999,sales_fri,q_fri
FROM sales_input_table
UNION ALL
SELECT product_id, customer_id,weekly_start_date+6,2,9999,sales_sat,q_sat
FROM sales_input_table;
SET linesize 140
SELECT * from table(dbms_xplan.display);
COMMIT;

Note: The input source table is scanned seven times! The complexity of the denormalization is handled within the several SELECT operations.

As the number of input records increases, the performance advantage of the multitable insert statement, which reduces the work to a single scan of the source, becomes more and more obvious.


Back to Topic

Using the Multitable Insert for Conditional Insertion

1.

Create an intermediate table consisting of new information. From a SQL*Plus session logged on to the SH schema, execute the following SQL script, which contains the SQL statements from steps 1-4:

@c:\wkdir\mti2_prepare.sql

Rem create intermediate table with some records


CREATE TABLE customers_new AS 
SELECT * 
FROM customers
WHERE cust_id BETWEEN 2000 AND 5000;

 

2.

Disable constraints on the SALES table; this is necessary for step 3.

ALTER TABLE sales DISABLE CONSTRAINT sales_customer_fk;
           
3. Delete some data from the CUSTOMERS table.
DELETE FROM customers WHERE cust_id BETWEEN 2000 AND 5000;

4.

Create an empty table for the special promotion information.

CREATE TABLE customers_special AS
SELECT cust_id, cust_credit_limit 
FROM customers 
WHERE rownum > 1;

 

5. Issue the multitable insert into several tables with different table structures. From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@c:\wkdir\do_mti2.sql

INSERT /*+ APPEND NOLOGGING */ FIRST
  WHEN cust_credit_limit >= 4500 THEN
    INTO customers
    INTO customers_special VALUES (cust_id, cust_credit_limit)
  ELSE
    INTO customers
SELECT * FROM customers_new; 

 

6. You can see what was inserted. Execute the following SQL script:

@c:\wkdir\control_mti2.sql

SELECT COUNT(*) FROM customers;
SELECT COUNT(*) FROM customers_special;
SELECT MIN(cust_credit_limit) FROM customers_special;


7.

Before continuing, reset your environment by executing the following SQL script:

@c:\wkdir\reset_mti2.sql

set echo on
REM cleanup and reset
ALTER TABLE sales
MODIFY CONSTRAINT sales_customer_fk RELY
ENABLE NOVALIDATE;

DROP TABLE customers_special;
DROP TABLE customers_new;

COMMIT;

Back to Topic

Back to Topic List

Using the Upsert Functionality (SQL MERGE Keyword)

Overview

MyCompany has to update its product information in the data warehouse periodically with data from its products database. Unfortunately, you receive the product delta information without any way to distinguish between new and updated information. Therefore, you have to figure this out on the data warehouse side.

To use the SQL MERGE command to either update or insert the data conditionally, you perform the following steps:

1. Creating an External Table (and Directories) for the External Products Information
2. Performing an Upsert Using the SQL MERGE Command
3. Showing the Execution plan of the MERGE Command
4. Performing an Upsert Using Two Separate SQL Commands

Back to Topic List

1. Creating an External Table (and Directories) for the External Products Information

Before you can create the external table, you need to create a directory object in the database that points to the directory on the file system where the data files will reside. Optionally, you can separate the location of the log, bad, and discard files from the location of the data files, as shown in the following example.

1.

To create the directory, perform the following. If you have already completed High-Speed Data Load and Rolling Window Operations, the directories will already exist and you can skip this step.

The scripts are set up for a Windows system and assume that the Hands-On workshop was extracted to the c:\wkdir folder.

From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@c:\wkdir\create_directory.sql

DROP DIRECTORY data_dir;
DROP DIRECTORY log_dir;
Rem *****
Rem CREATE DIRECTORIES
Rem note that security is controlled
Rem (a) with privileges on directory (read/write)
Rem (b) with privileges on external table
Rem *****
CREATE DIRECTORY data_dir AS 'c:\wkdir';
CREATE DIRECTORY log_dir AS 'c:\wkdir';
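
The comments in the script note that access is controlled through privileges on the directory objects and on the external table. As a sketch (not part of the tutorial scripts, and assuming the directories were created by a privileged user other than SH), read and write access on the directories would be granted to SH as follows:

GRANT READ, WRITE ON DIRECTORY data_dir TO sh;
GRANT READ, WRITE ON DIRECTORY log_dir TO sh;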





2.

When creating an external table, you define two kinds of information:

  1. The metadata for the table representation inside the database
  2. The access parameters that specify how to extract the data from the external file

After this metadata is created, the external data can be accessed from within the database without the need for an initial load. To create the external table, from a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@c:\wkdir\create_external_table2.sql

Rem *****
Rem preparation for upsert
Rem file prodDelta.dat, 1.2 MB must
Rem reuse of directory STAGE_DIR
Rem access via external table
Rem *****
DROP TABLE products_delta;
CREATE TABLE products_delta
(
PROD_ID NUMBER(6),
PROD_NAME VARCHAR2(50),
PROD_DESC VARCHAR2(4000),
PROD_SUBCATEGORY VARCHAR2(50),
PROD_SUBCATEGORY_ID NUMBER,
PROD_SUBCATEGORY_DESC VARCHAR2(2000),
PROD_CATEGORY VARCHAR2(50),
PROD_CATEGORY_ID NUMBER,
PROD_CATEGORY_DESC VARCHAR2(2000),
PROD_WEIGHT_CLASS NUMBER(2),
PROD_UNIT_OF_MEASURE VARCHAR2(20),
PROD_PACK_SIZE VARCHAR2(30),
SUPPLIER_ID NUMBER(6),
PROD_STATUS VARCHAR2(20),
PROD_LIST_PRICE NUMBER(8,2),
PROD_MIN_PRICE NUMBER(8,2),
PROD_TOTAL VARCHAR2(13),
PROD_TOTAL_ID NUMBER,
PROD_SRC_ID NUMBER,
PROD_EFF_FROM DATE,
PROD_EFF_TO DATE,
PROD_VALID CHAR(1)
)
ORGANIZATION external
(
TYPE oracle_loader
DEFAULT DIRECTORY data_dir
ACCESS PARAMETERS
(
RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII
NOBADFILE
NOLOGFILE
FIELDS TERMINATED BY "|" LDRTRIM
)
location
('prodDelta.dat')
REJECT LIMIT UNLIMITED NOPARALLEL;
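
Once the external table exists, its data can be queried like any other table. As a quick sanity check (not one of the tutorial scripts), you could simply count the rows exposed through the delta file:

SELECT COUNT(*) FROM products_delta;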


Back to Topic

2. Performing an Upsert Using the SQL MERGE Command

To perform an upsert using the SQL MERGE command, execute the following step:

1.

From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@c:\wkdir\do_merge_new.sql

Rem *****
Rem MERGE
Rem new and modified product information has arrived
Rem use of external table again
Rem - seamlessly within the MERGE command !!!
Rem *****
MERGE INTO products t
USING products_delta s
ON ( t.prod_id=s.prod_id )
WHEN MATCHED THEN
UPDATE SET t.prod_name=s.prod_name, t.prod_list_price=s.prod_list_price,
           t.prod_min_price=s.prod_min_price
WHEN NOT MATCHED THEN
INSERT (prod_id, prod_name, prod_desc, prod_subcategory, prod_subcategory_id,
        prod_subcategory_desc, prod_category, prod_category_id, prod_category_desc,
        prod_status, prod_list_price, prod_min_price, prod_total, prod_total_id,
        prod_weight_class, prod_pack_size, supplier_id)
VALUES (s.prod_id, s.prod_name, s.prod_desc, s.prod_subcategory, s.prod_subcategory_id,
        s.prod_subcategory_desc, s.prod_category, s.prod_category_id,
        s.prod_category_desc,
        s.prod_status, s.prod_list_price, s.prod_min_price, s.prod_total,
        s.prod_total_id,
        s.prod_weight_class, s.prod_pack_size, s.supplier_id)
;
ROLLBACK;

Note: You will ROLLBACK so that you can reissue the same upsert with two SQL statements in a subsequent operation.

 

Back to Topic

3. Showing the Execution plan of the MERGE Command

To examine the execution plan of the SQL MERGE statement, perform the following steps:

1.

From a SQL*Plus session logged on to the SH schema, run the explain_merge_new.sql script, or copy the following SQL statement into your SQL*Plus session:

@c:\wkdir\explain_merge_new.sql

DELETE FROM plan_table;
COMMIT;
EXPLAIN PLAN FOR
MERGE INTO products t
USING products_delta s
ON ( t.prod_id=s.prod_id )
WHEN MATCHED THEN
UPDATE SET t.prod_name=s.prod_name, t.prod_list_price=s.prod_list_price,
           t.prod_min_price=s.prod_min_price
WHEN NOT MATCHED THEN
INSERT    (prod_id, prod_name, prod_desc, prod_subcategory, prod_subcategory_id,
           prod_subcategory_desc, prod_category, prod_category_id, prod_category_desc,
           prod_status, prod_list_price, prod_min_price, prod_total, prod_total_id,
           prod_weight_class, prod_pack_size, supplier_id)
VALUES    (s.prod_id, s.prod_name, s.prod_desc, s.prod_subcategory, s.prod_subcategory_id,
           s.prod_subcategory_desc, s.prod_category, s.prod_category_id,
           s.prod_category_desc,
           s.prod_status, s.prod_list_price, s.prod_min_price, s.prod_total,
           s.prod_total_id,
           s.prod_weight_class, s.prod_pack_size, s.supplier_id)
;
set linesize 140
select * from table(dbms_xplan.display);

 

Back to Topic

4. Performing an Upsert Using Two Separate SQL Commands

Without the MERGE statement, such upsert functionality could be implemented as a combination of two SQL statements (insert plus update) or in a procedural manner. A possible upsert implementation of this business problem as two SQL commands is shown below.

To demonstrate how this was done, perform the following steps:

1.

To leverage an updatable join view, a primary or unique key must exist on the join column. Therefore, you cannot use the external table directly; you need to create an intermediate table in the database with the appropriate unique key constraint. Run the following script to create the necessary structures and perform the upsert:

@c:\wkdir\do_merge_816.sql

CREATE TABLE prod_delta NOLOGGING AS SELECT * FROM products_delta;
CREATE UNIQUE INDEX p_uk ON prod_delta(prod_id);
UPDATE
           (SELECT s.prod_id, s.prod_name sprod_name, s.prod_desc sprod_desc,
           s.prod_subcategory sprod_subcategory,
           s.prod_subcategory_desc sprod_subcategory_desc, s.prod_category sprod_category,
           s.prod_category_desc sprod_category_desc, s.prod_status sprod_status,
           s.prod_list_price sprod_list_price, s.prod_min_price sprod_min_price,
           t.prod_id, t.prod_name tprod_name, t.prod_desc tprod_desc, t.prod_subcategory
           tprod_subcategory,
           t.prod_subcategory_desc tprod_subcategory_desc, t.prod_category tprod_category,
           t.prod_category_desc tprod_category_desc, t.prod_status tprod_status,
           t.prod_list_price tprod_list_price, t.prod_min_price tprod_min_price
           FROM products t, prod_delta s WHERE s.prod_id=t.prod_id) JV
           SET
           tprod_list_price =sprod_list_price,
           tprod_min_price =sprod_min_price;
INSERT INTO products
           (prod_id, prod_name, prod_desc, prod_subcategory, prod_subcategory_id,
           prod_subcategory_desc, prod_category, prod_category_id, prod_category_desc,
           prod_status, prod_list_price, prod_min_price, prod_total, prod_total_id,
           prod_weight_class, prod_pack_size, supplier_id)
           SELECT prod_id, prod_name, prod_desc, prod_subcategory, prod_subcategory_id,
           prod_subcategory_desc, prod_category, prod_category_id, prod_category_desc,
           prod_status, prod_list_price, prod_min_price, prod_total, prod_total_id,
           prod_weight_class, prod_pack_size, supplier_id
           FROM prod_delta s WHERE NOT EXISTS (SELECT 1 FROM products t WHERE t.prod_id =
           s.prod_id);
ROLLBACK;

Note that you have to issue two separate SQL statements to accomplish the functionality of a single MERGE statement.

 

2.

From a SQL*Plus session logged on to the SH schema, view the execution plans by running the view_explain_merge.sql script, or copy the following SQL statement into your SQL*Plus session:

@c:\wkdir\view_explain_merge.sql

PROMPT    show the plans
DELETE FROM plan_table;
COMMIT;
EXPLAIN PLAN FOR
UPDATE
(SELECT s.prod_id, s.prod_name sprod_name, s.prod_desc sprod_desc, s.prod_subcategory sprod_subcategory,
s.prod_subcategory_desc sprod_subcategory_desc, s.prod_category sprod_category,
s.prod_category_desc sprod_category_desc, s.prod_status sprod_status, s.prod_list_price sprod_list_price, s.prod_min_price sprod_min_price,
t.prod_id, t.prod_name tprod_name, t.prod_desc tprod_desc, t.prod_subcategory tprod_subcategory,
t.prod_subcategory_desc tprod_subcategory_desc, t.prod_category tprod_category,
t.prod_category_desc tprod_category_desc, t.prod_status tprod_status, t.prod_list_price tprod_list_price, t.prod_min_price tprod_min_price
FROM products t, prod_delta s WHERE s.prod_id=t.prod_id) JV
SET
tprod_list_price =sprod_list_price,
tprod_min_price =sprod_min_price;
set linesize 180
select * from table(dbms_xplan.display);
DELETE FROM plan_table;

COMMIT;


EXPLAIN PLAN FOR
  INSERT INTO products
  (prod_id, prod_name, prod_desc, prod_subcategory, prod_subcategory_id,
   prod_subcategory_desc, prod_category, prod_category_id, prod_category_desc,
   prod_status, prod_list_price, prod_min_price, prod_total, prod_total_id,
   prod_weight_class, prod_pack_size, supplier_id)
   SELECT prod_id, prod_name, prod_desc, prod_subcategory, prod_subcategory_id,
   prod_subcategory_desc, prod_category, prod_category_id, prod_category_desc,
   prod_status, prod_list_price, prod_min_price, prod_total, prod_total_id,
   prod_weight_class, prod_pack_size, supplier_id
   FROM prod_delta s WHERE NOT EXISTS (SELECT 1 FROM products t WHERE t.prod_id = s.prod_id);
set linesize 180
select * from table(dbms_xplan.display);
ROLLBACK;
    
...


To leverage the updatable join view functionality, the external table needs to be copied into a real database table and a unique index needs to be created on the join column. In short, this approach requires more operations, more space, and more processing time.

Back to Topic

Back to Topic List

 

Learn DML Error Logging Capabilities

Oracle Database 10g release 2 introduces a new and exciting functionality to enable record-level error handling for bulk SQL operations. The DML error logging functionality extends existing DML operations by enabling you to specify the name of an error logging table into which Oracle should record errors encountered during DML operations. This enables you to complete the DML operation in spite of any errors, and to take corrective action on the erroneous rows at a later time.

A DML error logging table consists of several mandatory control columns and a set of user-defined columns that represent either all or a subset of the columns of the target table of the DML operation, using a data type that is capable of storing the values that caused the errors. For example, you need a VARCHAR2 column in the error logging table to store values that fail TO_NUMBER conversion for a NUMBER column in the target table. You should use the DBMS_ERRLOG package to create the DML error logging tables.
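
As a brief sketch of the DBMS_ERRLOG interface (the table names MY_TARGET, MY_SOURCE, and MY_TARGET_ERRORS are hypothetical and not part of the tutorial scripts), you create an error logging table with an explicit name and then reference it in the LOG ERRORS clause of a DML statement:

Rem create an error logging table with an explicit name ...
exec dbms_errlog.create_error_log('MY_TARGET','MY_TARGET_ERRORS');
Rem ... and reference it from a DML statement against the target
INSERT INTO my_target
SELECT * FROM my_source
LOG ERRORS INTO my_target_errors ('first_load') REJECT LIMIT UNLIMITED;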

1.

Create an external table representing data from a source system with obviously poor data quality. Run the script cr_ext_tab_for_elt.sql.

@c:\wkdir\cr_ext_tab_for_elt.sql

PROMPT Create an external table pointing to a data set with poor quality
Rem
DROP TABLE sales_activity_direct;
CREATE TABLE sales_activity_direct (
PROD_ID NUMBER,
CUST_ID NUMBER,
TIME_ID CHAR(20),
CHANNEL_ID CHAR(2),
PROMO_ID NUMBER,
QUANTITY_SOLD NUMBER(3),
AMOUNT_SOLD CHAR(50)
)
ORGANIZATION external (
TYPE oracle_loader
DEFAULT DIRECTORY data_dir
ACCESS PARAMETERS
(
RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII
BADFILE log_dir:'sh_sales2.bad'
LOGFILE log_dir:'sh_sales2.log_xt'
FIELDS TERMINATED BY "|" LDRTRIM
)
location
(
'sales_activity_direct.dat'
)
)REJECT LIMIT UNLIMITED NOPARALLEL;

 

2. Create a dummy table to avoid interfering with existing tables. Run the script cr_tab_for_elt.sql.

@c:\wkdir\cr_tab_for_elt.sql

PROMPT create a second sales table to being used by the insert
DROP TABLE sales_overall;
CREATE TABLE sales_overall as select * from sales where 1=0;

3. To track errors with the new DML error logging functionality, you need an error logging table. It is highly recommended that you use the provided package DBMS_ERRLOG. Run the script cr_elt.sql.

@c:\wkdir\cr_elt.sql

PROMPT Create the DML error logging table with DEFAULT name
PROMPT It is highly advised to use the ORCL-provided pkg to create the
PROMPT DML Error Logging table
Rem
DROP TABLE err$_sales_overall;
exec dbms_errlog.create_error_log('sales_overall');

Note the mandatory control columns and the different data types of the columns being tracked. The data types have to be a superset of the target data types to enable proper tracking of errors, for example, a non-numeric value for a NUMBER target column.

SET LINESIZE 60
DESCRIBE err$_sales_overall

 

4.

Try to load the data residing in the external file into the target table. The default behavior of the error logging functionality is a REJECT LIMIT of zero: in the case of an error, the DML operation fails and the first record that raises an error is stored in the error logging table. Run the script ins_elt_1.sql to see this behavior.

@c:\wkdir\ins_elt_1.sql

PROMPT First insert attempt , DEFAULT reject limit 0
PROMPT also, the error message that comes back is the one of error #1
INSERT /*+ APPEND NOLOGGING PARALLEL */ INTO sales_overall
  SELECT * FROM sales_activity_direct
  LOG ERRORS INTO err$_sales_overall ( 'load_test1' );
commit;


PROMPT As you can see, nothing is inserted in the target table, but ONE row in the error logging table
select count(*) from sales_overall;
select count(*) from err$_sales_overall;
delete from err$_sales_overall;
commit;



From a DML perspective, a transaction either succeeds or fails as a whole. Hence, generically, you either want the DML operation to succeed no matter what (which translates into REJECT LIMIT UNLIMITED) or to fail as soon as an error occurs (REJECT LIMIT 0). The default reject limit is zero because any other arbitrary number is somewhat meaningless; if you decide to tolerate a specific number of errors, it is purely a business decision how many errors are tolerable in a given situation.

 

5.

Try the insert again with a REJECT LIMIT of 10 records. If more than 10 errors occur, the DML operation will fail, and you will find 11 records in the error logging table. Run the script ins_elt_2.sql to see this behavior.

@c:\wkdir\ins_elt_2.sql

SET ECHO OFF

PROMPT First insert attempt , DEFAULT reject limit 10
PROMPT Note that the error message that comes back is the one of error #11


INSERT /*+ APPEND NOLOGGING PARALLEL */ INTO sales_overall
  SELECT * FROM sales_activity_direct
    LOG ERRORS INTO err$_sales_overall ( 'load_test2' ) REJECT LIMIT 10;
commit;


PROMPT no rows in target; error count+1 in DML error logging table
select count(*) from sales_overall;
select count(*) from err$_sales_overall;
delete from err$_sales_overall;
commit;

There are more than 10 errors for this insert, meaning the quality of this data is poor.

 

6.

Put the data into the table and figure out what errors you are encountering. Run the script ins_elt_3.sql.

@c:\wkdir\ins_elt_3.sql

PROMPT Reject limit unlimited will succeed
Rem ... as long as you do not run into one of the current limitations ...
INSERT /*+ APPEND NOLOGGING PARALLEL */ INTO sales_overall
  SELECT * FROM sales_activity_direct
  LOG ERRORS INTO err$_sales_overall ( 'load_20040802' ) REJECT LIMIT UNLIMITED;
commit;


PROMPT finally ...
select count(*) from sales_overall;
select count(*) from err$_sales_overall;

 

7.

There are quite a large number of errors! Look into the error logging table to get a better understanding of the errors that have occurred. Run the script sel_elt_1.sql.

@c:\wkdir\sel_elt_1.sql

PROMPT Please recognize the subtle difference between ERROR MESSAGE ONLY and -
ERROR MESSAGE TEXT


Rem Therefore we enforce to store both
set linesize 80

select distinct ora_err_number$ from err$_sales_overall;
select distinct ora_err_number$, ora_err_mesg$ from err$_sales_overall;



Note the subtle difference between the Oracle error number and the Oracle error message text. In many cases, the message text provides additional information that helps in analyzing the problem, which is why both are included as mandatory control columns of an error logging table.

 

8.

As of today, there are some limitations of the error logging capability. All of them involve situations where index maintenance is deferred as an optimization. These limitations will be addressed in a future release of Oracle. Run the script elt_limit.sql.

@c:\wkdir\elt_limit.sql
PROMPT discuss a case with limitation
truncate table sales_overall;
alter table sales_overall add constraint pk_1 primary key (prod_id, cust_id,time_id,
channel_id, promo_id);

PROMPT works fine as before, errors get re-routed
INSERT /*+ APPEND NOLOGGING PARALLEL */ INTO sales_overall
  SELECT * FROM sales_activity_direct
  LOG ERRORS INTO err$_sales_overall ( 'load_20040802' ) REJECT LIMIT UNLIMITED;
commit;

select count(*) from sales_overall;
select count(*) from err$_sales_overall;
delete from err$_sales_overall;
commit;
PROMPT case when deferred constraint check (UNIQUE INDEX) leads to error
Rem unique index maintenance is a delayed operation that cannot be caught
Rem on a per record  base as of today. Planned for a future release.
INSERT /*+ APPEND NOLOGGING PARALLEL */ INTO sales_overall
  SELECT * FROM sales_activity_direct
  LOG ERRORS INTO err$_sales_overall ( 'load_20040802' ) REJECT LIMIT UNLIMITED;
commit;

select count(*) from sales_overall;
select count(*) from err$_sales_overall;
commit;





9.

DML error logging takes care of any error that happens during the DML operation itself. However, errors that happen in the SQL portion of the execution plan, before the DML operation, cannot be caught. Consider the following DML operation in the script ins_elt_4.sql.

@c:\wkdir\ins_elt_4.sql

PROMPT INTERESTING CASE.
PROMPT The DML Error Logging can only catch errors that happen at DML TIME, but not errors
PROMPT that happen as part of the SQL statement
Rem the TO_DATE conversion is in the insert portion, so that we catch the error
Rem this is default view merging of ORCL 
alter table sales_overall drop constraint pk_1;
truncate table sales_overall;
INSERT /*+ APPEND NOLOGGING PARALLEL */
  INTO sales_overall
    SELECT prod_id, cust_id,
           TO_DATE(time_id,'DD-mon-yyyy'),
           channel_id, promo_id, quantity_sold, amount_sold
  FROM sales_activity_direct
    LOG ERRORS INTO err$_sales_overall ( 'load_20040802' )
    REJECT LIMIT UNLIMITED;
commit;

Note that the SELECT statement applies a TO_DATE() function to the column sales_activity_direct.time_id. Oracle internally optimizes the error handling by pushing this conversion function up to the insert operation to ensure that any potential error will be caught. As you experienced earlier, the DML operation succeeds.

When you look into the plan, you will realize what kind of optimization for the error handling is taking place. Run the script xins_elt_4.sql.

@c:\wkdir\xins_elt_4.sql

SET LINESIZE 140

explain plan for
 INSERT /*+ APPEND NOLOGGING PARALLEL */
   INTO sales_overall
   SELECT * FROM
     ( SELECT prod_id, cust_id,
              TO_DATE(time_id,'DD-mon-yyyy'),
              channel_id, promo_id, quantity_sold, amount_sold
       FROM sales_activity_direct )
   LOG ERRORS INTO err$_sales_overall ( 'load_20040802' )
   REJECT LIMIT UNLIMITED;

select * from table(dbms_xplan.display(null,null,'ALL'));         

The plan shows that the TO_DATE() function is pushed up to the insert operation, represented as a 'LOAD AS SELECT' row source (it is a direct-path insert).

 

10.

Run the same DML operation again, but prevent the TO_DATE() conversion from being pushed up to the DML operation. You can accomplish this by using a view construct with a NO_MERGE hint. The equivalent SQL SELECT statement itself fails, as you can see by running the script sel_ins_elt_5.sql.

@c:\wkdir\sel_ins_elt_5.sql

PROMPT The equivalent SQL statement will fail if the predicate is artificially kept -
within a lower query block

SELECT prod_id, cust_id,
       TO_DATE(time_id,'DD-mon-yyyy'),
       channel_id, promo_id, quantity_sold, amount_sold
FROM sales_activity_direct;

The DML operation will fail with the same error. Run the script ins_elt_5.sql.

@c:\wkdir\ins_elt_5.sql
PROMPT The same is true when errors are happening on a row source level inside the -
SELECT block
PROMPT We cannot catch such an error 


truncate table sales_overall;
INSERT /*+ APPEND NOLOGGING PARALLEL */ INTO sales_overall
  SELECT * FROM
   ( SELECT /*+ NO_MERGE */ prod_id, cust_id,
            TO_DATE(time_id,'DD-mon-yyyy'),
            channel_id, promo_id, quantity_sold, amount_sold
     FROM sales_activity_direct )
  LOG ERRORS INTO err$_sales_overall ( 'load_20040802' ) 
  REJECT LIMIT UNLIMITED;
commit;

The plan shows that the conversion takes place inside the view, which is why the error cannot be caught. Run the script xins_elt_5.sql to view this behavior:

@c:\wkdir\xins_elt_5.sql

explain plan for
   INSERT /*+ APPEND NOLOGGING PARALLEL */ INTO sales_overall
     SELECT * FROM
            ( SELECT /*+ NO_MERGE */ prod_id, cust_id,
            TO_DATE(time_id,'DD-mon-yyyy'),
            channel_id, promo_id, quantity_sold, amount_sold
     FROM sales_activity_direct )
  LOG ERRORS INTO err$_sales_overall ( 'load_20040802' ) 
  REJECT LIMIT UNLIMITED;

select * from table(dbms_xplan.display(null,null,'ALL'));

Back to Topic

Back to Topic List

Experience the Basics of the Table Function

In the ETL process, the data extracted from a source system passes through a sequence of transformations before it is loaded into a data warehouse. Complex transformations are implemented in a procedural manner, either outside the database or inside the database in PL/SQL. When the results of a transformation become too large to fit into memory, they must be staged by materializing them either into database tables or flat files. This data is then read and processed as input to the next transformation in the sequence. Oracle table functions provide support for pipelined and parallel execution of such transformations implemented in PL/SQL, C, or Java.

A table function is defined as a function that can produce a set of rows as output; additionally, Oracle table functions can take a set of rows as input. These sets of rows are processed iteratively in subsets, thus enabling a pipelined mechanism to stream these subset results from one transformation to the next before the first operation has finished. Furthermore, table functions can be processed transparently in parallel, which is equivalent to SQL operations such as a table scan or a sort. Staging tables are no longer necessary.

Because table functions can be used in powerful and multifaceted ways within complex transformations, you examine only basic implementation issues for table functions here. To do this, perform the following activities:

1. Setting up the Basic Objects for a Table Function
2. Performing Nonpipelined Table Function, Returning an Array of Records
3. Performing Pipelined Table Functions
4. Performing Transparent Parallel Execution of Table Functions
5. Performing Table Functions with Autonomous DML
6. Performing Seamless Streaming Through Several Table Functions

 

1. Setting up the Basic Objects for a Table Function

A table function can produce a set of rows as output and can take a set of rows as input. Therefore, you must define a record type and a collection type for the input and output; those types can be either strongly typed, that is, well defined before execution, or weakly typed, that is, without a fixed record format before run time. You will examine both variants.

To set up the table function objects, perform the following steps:

1.

Define the object (record) type for the examples. From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@c:\wkdir\setup_tf_record.sql

SET ECHO ON
PROMPT object_types
CREATE TYPE product_t AS OBJECT (
prod_id NUMBER(6)
, prod_name VARCHAR2(50)
, prod_desc VARCHAR2(4000)
, prod_subcategory VARCHAR2(50)
, prod_subcategory_desc VARCHAR2(2000)
, prod_category VARCHAR2(50)
, prod_category_desc VARCHAR2(2000)
, prod_weight_class NUMBER(2)
, prod_unit_of_measure VARCHAR2(20)
, prod_pack_size VARCHAR2(30)
, supplier_id NUMBER(6)
, prod_status VARCHAR2(20)
, prod_list_price NUMBER(8,2)
, prod_min_price NUMBER(8,2)
);

This is the basic object record type you will use in the examples.

 

2. Define the object (collection) type for the examples. From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@c:\wkdir\setup_tf_collection.sql

CREATE TYPE product_t_table AS TABLE OF product_t;
/

This type represents the structure of the set of records that will be delivered back by your table functions.

3. Define a package for the REF CURSOR types. From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@c:\wkdir\setup_tf_package.sql

Rem package of all cursor types
Rem handle the input cursor type and the output cursor collection type


CREATE OR REPLACE PACKAGE cursor_PKG as
TYPE product_t_rec IS RECORD (prod_id NUMBER(6) 
, prod_name VARCHAR2(50)
, prod_desc VARCHAR2(4000)
, prod_subcategory VARCHAR2(50)
, prod_subcategory_desc VARCHAR2(2000)
, prod_category VARCHAR2(50)
, prod_category_desc VARCHAR2(2000)
, prod_weight_class NUMBER(2)
, prod_unit_of_measure VARCHAR2(20)
, prod_pack_size VARCHAR2(30)
, supplier_id NUMBER(6)
, prod_status VARCHAR2(20)
, prod_list_price NUMBER(8,2)
, prod_min_price NUMBER(8,2));


TYPE product_t_rectab IS TABLE OF product_t_rec;
TYPE strong_refcur_t IS REF CURSOR RETURN product_t_rec;
 TYPE refcur_t IS REF CURSOR;
END;

 

4.

Create a log table for the table function example with autonomous DML. From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@c:\wkdir\cre_ope_table.sql

CREATE TABLE obsolete_products_errors 
(prod_id NUMBER, msg VARCHAR2(2000));

This table will be used for the table function example that fans out some data as part of its execution.

Due to time constraints, only the PL/SQL implementation of table functions is covered here. For details of interface implementations in C or Java, see the Oracle Data Cartridge Developer's Guide.

Table functions return a set of records to the subsequent operation. The result set can be delivered either en bloc (the default PL/SQL behavior) or pipelined, where increments of the result set are streamed to the subsequent operation as soon as they are produced.

For table function implementations in PL/SQL, the PL/SQL engine controls the size of the incrementally returned array. For interface implementations of table functions in C or Java, it is up to the programmer to define the incremental return set.

Back to Topic

2. Performing Nonpipelined Table Function, Returning an Array of Records

The following example shows a simple table function that returns its result set nonpipelined, as a complete array. The table function returns only obsolete products and suppresses those in the product category "Electronics". Note that the input REF CURSOR is weakly typed, meaning it could be any valid SQL statement. It is up to the table function code to handle its input, possibly based on other input variables.

To perform the nonpipelined table function, you perform the following steps:

1.

From a SQL*Plus session logged on to the SH schema, run create_tf_1.sql, or copy the following SQL statements into your SQL*Plus session:

@c:\wkdir\create_tf_1.sql

PROMPT SIMPLE PASS-THROUGH, FILTERING OUT obsolete products without product_category 'Electronics'
Rem uses weakly typed cursor as input
CREATE OR REPLACE FUNCTION obsolete_products(cur cursor_pkg.refcur_t) RETURN
product_t_table
IS
prod_id NUMBER(6);
prod_name VARCHAR2(50);
prod_desc VARCHAR2(4000);
prod_subcategory VARCHAR2(50);
prod_subcategory_desc VARCHAR2(2000);
prod_category VARCHAR2(50);
prod_category_desc VARCHAR2(2000);
prod_weight_class NUMBER(2);
prod_unit_of_measure VARCHAR2(20);
prod_pack_size VARCHAR2(30);
supplier_id NUMBER(6);
prod_status VARCHAR2(20);
prod_list_price NUMBER(8,2);
prod_min_price NUMBER(8,2);
sales NUMBER:=0;
objset product_t_table := product_t_table();
i NUMBER := 0;
BEGIN
LOOP
-- Fetch from cursor variable
FETCH cur INTO prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price;
EXIT WHEN cur%NOTFOUND; -- exit when last row is fetched
-- Category Electronics is not meant to be obsolete and will be suppressed
IF prod_status='obsolete' AND prod_category != 'Electronics' THEN
-- append to collection
i:=i+1;
objset.extend;
objset(i):=product_t( prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price);
END IF;
END LOOP;
CLOSE cur;
RETURN objset;
END;
/


 
2.

You can now transparently SELECT from the defined table function obsolete_products. From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@c:\wkdir\use_tf_1.sql

SELECT DISTINCT UPPER(prod_category), prod_status
FROM TABLE(obsolete_products(
CURSOR(SELECT prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price
FROM products)));

Note that the SELECT list of the input cursor variable must match the definition of the table function's REF CURSOR parameter, in this case cursor_pkg.refcur_t. If you can ensure that the table (or view) definition matches this input cursor, you can simplify the statement by selecting * in the cursor expression, as shown in the sketch below.
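
A minimal sketch of this simplification (the view products_v is hypothetical and not part of the tutorial scripts; it projects exactly the columns of the REF CURSOR definition, because the full PRODUCTS table contains additional columns):

CREATE OR REPLACE VIEW products_v AS
SELECT prod_id, prod_name, prod_desc, prod_subcategory, prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class, prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price, prod_min_price
FROM products;

SELECT DISTINCT UPPER(prod_category), prod_status
FROM TABLE(obsolete_products(CURSOR(SELECT * FROM products_v)));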

Back to Topic

3. Performing Pipelined Table Functions

The following example performs the same filtering as the previous one. Unlike the first example, this table function returns its result set pipelined, that is, incrementally. Note that the input REF CURSOR is strongly typed here, meaning the input can only be a SQL statement that returns records of type product_t_rec.

To perform the pipelined table function, perform the following steps:

1.

From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@c:\wkdir\create_tf_2.sql

Rem same example, pipelined implementation
Rem strong ref cursor (input type is defined)
Rem a table without a strong typed input ref cursor 
Rem cannot be parallelized
CREATE OR REPLACE FUNCTION obsolete_products_pipe(cur cursor_pkg.strong_refcur_t)
RETURN product_t_table
PIPELINED
PARALLEL_ENABLE (PARTITION cur BY ANY) IS
prod_id NUMBER(6);
prod_name VARCHAR2(50);
prod_desc VARCHAR2(4000);
prod_subcategory VARCHAR2(50);
prod_subcategory_desc VARCHAR2(2000);
prod_category VARCHAR2(50);
prod_category_desc VARCHAR2(2000);
prod_weight_class NUMBER(2);
prod_unit_of_measure VARCHAR2(20);
prod_pack_size VARCHAR2(30);
supplier_id NUMBER(6);
prod_status VARCHAR2(20);
prod_list_price NUMBER(8,2);
prod_min_price NUMBER(8,2);
sales NUMBER:=0;
BEGIN
LOOP
-- Fetch from cursor variable
FETCH cur INTO prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price;
EXIT WHEN cur%NOTFOUND; -- exit when last row is fetched
IF prod_status='obsolete' AND prod_category !='Electronics' THEN
PIPE ROW (product_t( prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price));
END IF;
END LOOP;
CLOSE cur;
RETURN;
END;
/

2.

You can now transparently SELECT from the defined table function obsolete_products_pipe. From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@c:\wkdir\use_tf_2.sql

SELECT DISTINCT prod_category, DECODE(prod_status,'obsolete','NO LONGER AVAILABLE','N/A')
FROM TABLE(obsolete_products_pipe(
CURSOR(SELECT prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price
FROM products)));

 

Back to Topic

4. Performing Transparent Parallel Execution of Table Functions

Table functions can be parallelized transparently without any special intervention or setup. The only necessary declaration, which has to be done as part of the table function creation, is to tell the SQL engine if there are any restrictions or rules for parallelizing its execution.

Think of a table function that performs a complex aggregation over some product category attributes. If you want to process this operation in parallel, you have to guarantee that all records with the same product category attributes are processed by the same parallel execution slave, so that the aggregation covers all records that belong to the same group. This implies that a distribution rule must be applied during parallel processing to guarantee the correct distribution of records, as described above.

This parallel distribution rule is defined as part of the table function header, for example:

PARALLEL_ENABLE (PARTITION cur BY ANY) IS

All you need to do to enable parallel execution of a table function is to use the syntax shown.
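
As a sketch of such a distribution rule (the function category_passthrough is hypothetical and not one of the tutorial scripts; it simply pipes every input row back out, but its header guarantees that all rows of one product category are processed by the same slave):

CREATE OR REPLACE FUNCTION category_passthrough(cur cursor_pkg.strong_refcur_t)
RETURN product_t_table
PIPELINED
PARALLEL_ENABLE (PARTITION cur BY HASH (prod_category)) IS
rec cursor_pkg.product_t_rec;
BEGIN
LOOP
-- rows with the same prod_category are always fetched by the same slave
FETCH cur INTO rec;
EXIT WHEN cur%NOTFOUND;
PIPE ROW (product_t(rec.prod_id, rec.prod_name, rec.prod_desc, rec.prod_subcategory,
rec.prod_subcategory_desc, rec.prod_category, rec.prod_category_desc,
rec.prod_weight_class, rec.prod_unit_of_measure, rec.prod_pack_size,
rec.supplier_id, rec.prod_status, rec.prod_list_price, rec.prod_min_price));
END LOOP;
CLOSE cur;
RETURN;
END;
/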

To perform one of the table functions you previously defined in parallel, perform the following steps:

1.

From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@c:\wkdir\pq_session.sql

SELECT * 
FROM v$pq_sesstat
WHERE statistic in ('Queries Parallelized','Allocation Height');
2.

Now you can use one of the table functions in parallel. You enforce the parallel execution with a PARALLEL hint on the REF CURSOR. From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@c:\wkdir\parallel_tf.sql

SELECT DISTINCT prod_category,
DECODE(prod_status,'obsolete','NO LONGER AVAILABLE','N/A')
FROM TABLE(obsolete_products_pipe(
CURSOR(SELECT /*+ PARALLEL(a,4)*/
prod_id, prod_name, prod_desc, prod_subcategory, prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class, prod_unit_of_measure,
prod_pack_size, supplier_id,prod_status, prod_list_price,prod_min_price
FROM products a)));

 

3.

Rerun the SQL script to check the status of the parallelized queries:

@c:\wkdir\pq_session.sql

SELECT * 
FROM v$pq_sesstat
WHERE statistic in ('Queries Parallelized','Allocation Height');

 

Back to Topic

5. Performing Table Functions with Autonomous DML

Although a table function is part of a single atomic transaction, it provides the additional capability to fan out data to other tables within the scope of an autonomous transaction. This can be used in a variety of ways, such as for exception or progress logging or to fan out subsets of data to be used by other independent transformations. The following example logs some "exception" information into the OBSOLETE_PRODUCTS_ERRORS table:

1.

From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@c:\wkdir\create_tf_3.sql

CREATE OR REPLACE FUNCTION obsolete_products_dml(cur cursor_pkg.strong_refcur_t,
prod_cat varchar2 DEFAULT 'Electronics') RETURN product_t_table
PIPELINED
PARALLEL_ENABLE (PARTITION cur BY ANY) IS
PRAGMA AUTONOMOUS_TRANSACTION;
prod_id NUMBER(6);
prod_name VARCHAR2(50);
prod_desc VARCHAR2(4000);
prod_subcategory VARCHAR2(50);
prod_subcategory_desc VARCHAR2(2000);
prod_category VARCHAR2(50);
prod_category_desc VARCHAR2(2000);
prod_weight_class NUMBER(2);
prod_unit_of_measure VARCHAR2(20);
prod_pack_size VARCHAR2(30);
supplier_id NUMBER(6);
prod_status VARCHAR2(20);
prod_list_price NUMBER(8,2);
prod_min_price NUMBER(8,2);
sales NUMBER:=0;
BEGIN
LOOP
-- Fetch from cursor variable
FETCH cur INTO prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price;
EXIT WHEN cur%NOTFOUND; -- exit when last row is fetched
IF prod_status='obsolete' THEN
IF prod_category=prod_cat THEN
INSERT INTO obsolete_products_errors VALUES
(prod_id, 'correction: category '||UPPER(prod_cat)||' still available');
COMMIT;
ELSE
PIPE ROW (product_t( prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price));
END IF;
END IF;
END LOOP;
CLOSE cur;
RETURN;
END;
/
2.

Now you can select from this table function and check the log table. From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@c:\wkdir\use_tf_3a.sql

TRUNCATE TABLE obsolete_products_errors;
SELECT DISTINCT prod_category, prod_status FROM TABLE(obsolete_products_dml(
CURSOR(SELECT prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price
FROM products)));
SELECT DISTINCT msg FROM obsolete_products_errors;

 

3.

Select from this table function again with a different input argument and check the log table.

@c:\wkdir\use_tf_3b.sql

TRUNCATE TABLE obsolete_products_errors;
SELECT DISTINCT prod_category, prod_status FROM TABLE(obsolete_products_dml(
CURSOR(SELECT prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price
FROM products),'Photo'));
SELECT DISTINCT msg FROM obsolete_products_errors;

You can see that, depending on the input variable, the table function processes the input rows differently, thus delivering a different result set and inserting different information into OBSOLETE_PRODUCTS_ERRORS.

Back to Topic

6. Performing Seamless Streaming Through Several Table Functions

Besides the transparent usage of table functions within SQL statements and their capability of being processed in parallel, one of their biggest advantages is that table functions can be nested within each other. Furthermore, table functions can be used in any SQL statement and can become the input for all kinds of DML statements.

To see how this is done, perform the following steps:

1.

This SQL statement uses two table functions, nested in each other. From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@c:\wkdir\use_tf_stream.sql

SELECT DISTINCT prod_category, prod_status
FROM TABLE(obsolete_products_dml(CURSOR(SELECT *
FROM TABLE(obsolete_products_pipe(
CURSOR(SELECT prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price
FROM products))))));
SELECT COUNT(*) FROM obsolete_products_errors;
SELECT DISTINCT msg FROM obsolete_products_errors;
set echo off


        
2.

Now you can use this table function as INPUT for a CREATE TABLE AS SELECT command. From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@c:\wkdir\CTAS_tf.sql

CREATE TABLE PIPE_THROUGH AS
SELECT DISTINCT prod_category, prod_status
FROM TABLE(obsolete_products_dml(CURSOR(SELECT *
FROM TABLE(obsolete_products_pipe(
CURSOR(SELECT prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price
FROM products))))));

 

3.

Before you move to the next section of this tutorial, you need to clean up your environment by executing the following script:

@c:\wkdir\cleanup_tf.sql

DROP TABLE obsolete_products_errors;
DROP TABLE pipe_through;
DROP FUNCTION obsolete_products;
DROP FUNCTION obsolete_products_pipe;
DROP FUNCTION obsolete_products_dml;

 

Back to Topic

Back to Topic List

Change Data Capture

Data warehousing involves the extraction and transportation of relational data from one or more source databases into the data warehouse for analysis. Change Data Capture quickly identifies and processes only the data that has changed, not entire tables, and makes the change data available for further use.

Without Change Data Capture, database extraction is a cumbersome process in which you move the entire contents of tables into flat files, and then load the files into the data warehouse. This ad hoc approach is expensive in a number of ways.

Change Data Capture does not depend on intermediate flat files to stage the data outside of the relational database. It captures the change data resulting from INSERT, UPDATE, and DELETE operations made to user tables. The change data is then stored in a database object called a change table, and the change data is made available to applications in a controlled way.

Publish and Subscribe Model

Most Change Data Capture systems have one publisher that captures and publishes change data for any number of Oracle source tables. There can be multiple subscribers accessing the change data. Change Data Capture provides PL/SQL packages to accomplish the publish and subscribe tasks.

Publisher

The publisher is usually a database administrator (DBA) who is in charge of creating and maintaining schema objects that make up the Change Data Capture system. The publisher performs the following tasks:

Determines the relational tables (called source tables) from which the data warehouse application is interested in capturing change data.

Uses the Oracle supplied package, DBMS_LOGMNR_CDC_PUBLISH, to set up the system to capture data from one or more source tables.

Publishes the change data in the form of change tables.
Allows controlled access to subscribers by using the SQL GRANT and REVOKE statements to grant and revoke the SELECT privilege on change tables for users and roles.

Subscriber

The subscribers, usually applications, are consumers of the published change data. Subscribers subscribe to one or more sets of columns in source tables. Subscribers perform the following tasks:

Use the Oracle supplied package, DBMS_LOGMNR_CDC_SUBSCRIBE, to subscribe to source tables for controlled access to the published change data for analysis.

Extend the subscription window and create a new subscriber view when the subscriber is ready to receive a set of change data.

Use SELECT statements to retrieve change data from the subscriber views.
Drop the subscriber view and purge the subscription window when finished processing a block of changes.
Drop the subscription when the subscriber no longer needs its change data.

Please consult the ‘Oracle Data Warehousing Guide’ for a more detailed discussion about the Change Data Capture Architecture.

To learn more about this topic, perform the following steps:

1.
2.
3.
4.
5.
6.
7.
8.

Back to Topic

Back to Topic List

1. Using Synchronous CDC to Track all the Incremental Changes

You use synchronous CDC to track all incremental changes to the table PRODUCTS. (For demonstration purposes, the publisher and the subscriber will be the same database user.)

To see how this is done, perform the following steps:

1.

First create a new intermediate table where you apply all changes. From a SQL*Plus session logged on to the SH schema, run cr_cdc_target.sql, or copy the following SQL statements into your SQL*Plus session:

@c:\wkdir\cr_cdc_target.sql

CREATE TABLE my_price_change_Electronics
(prod_id number, prod_min_price number, prod_list_price number, when date);


2.

If you have never used Change Data Capture before, you will not see any CDC-related objects other than the synchronous change set SYNC_SET and the synchronous change source SYNC_SOURCE. Both of these objects are created as part of database creation. Run the script show_cdc_env1.sql to verify this.

@c:\wkdir\show_cdc_env1.sql

SELECT * FROM change_sources;
PROMPT see the change tables
Rem shouldn't show anything
SELECT * FROM change_tables;
PROMPT see the change sets SYNC_SET
SELECT decode(to_char(end_date,'dd-mon-yyyy hh24:mi:ss'),null,'No end date set.') end_date,
decode(to_char(freshness_date,'dd-mon-yyyy hh24:mi:ss'),null,'No freshness date set.') freshness_date
FROM change_sets WHERE set_name='SYNC_SET';


Back to Topic

2. Creating a Change Table

All changes are stored in change tables. Change tables have a one-to-one dependency to a source table; they consist of several fixed metadata columns and a dynamic set of columns equivalent to the identified source columns of interest.

1.

You create a change table by using the provided package DBMS_CDC_PUBLISH.

Note that the old package name DBMS_LOGMNR_CDC_PUBLISH is a synonym pointing to DBMS_CDC_PUBLISH and is maintained as-is for backward compatibility.

@c:\wkdir\cr_cdc_ct.sql

rem create a change table within the change set SYNC_SET.
Rem details on parameters see doc
PROMPT *** 10g ***
PROMPT NOTE THAT DBMS_LOGMNR_CDC_* are synonyms for the DBMS_CDC_* packages and are only around
PROMPT for backwards compatibility
begin
DBMS_CDC_PUBLISH.CREATE_CHANGE_TABLE (OWNER => 'SH', -
CHANGE_TABLE_NAME => 'PROD_price_CT',
CHANGE_SET_NAME => 'SYNC_SET',
SOURCE_SCHEMA => 'SH',
SOURCE_TABLE => 'PRODUCTS',
COLUMN_TYPE_LIST => 'prod_id number(6), prod_min_price number(8,2),
prod_list_price number(8,2)',
CAPTURE_VALUES => 'both',
RS_ID => 'y',
ROW_ID => 'n',
USER_ID => 'n',
TIMESTAMP => 'n',
OBJECT_ID => 'n',
SOURCE_COLMAP => 'y',
TARGET_COLMAP => 'y',
OPTIONS_STRING => null);
end;
/

This script creates a change table called PROD_PRICE_CT and the necessary trigger framework to track all subsequent changes on PRODUCTS. Note that the tracking is done as part of the atomic DML operation against PRODUCTS.
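
If you are curious about that trigger framework, you can optionally list the triggers that synchronous CDC generated on the source table. The exact trigger names are system-generated and will differ between installations:

Rem optional: show the system-generated CDC triggers on PRODUCTS
SELECT trigger_name, triggering_event, status
FROM user_triggers
WHERE table_name = 'PRODUCTS';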


2. Describe the change table PROD_PRICE_CT:
set linesize 120
desc prod_price_ct

See the static metadata columns as well as the columns PROD_ID, PROD_MIN_PRICE, and PROD_LIST_PRICE (equivalent to source table PRODUCTS).

 

3.

To see the metadata of the published source table and the change tables in the system, issue the following SQL commands:

@c:\wkdir\show_cdc_env2.sql

prompt see the published source tables
SELECT * FROM dba_source_tables;
prompt see published source columns
SELECT source_schema_name,source_table_name, column_name
FROM dba_source_tab_columns;
prompt see the change tables
SELECT * FROM change_tables;


Back to Topic

3. Subscribing to a Change Set and to All Source Table Columns of Interest

You can subscribe to a change set for controlled access to the published change data for analysis. Note that the logical entity for a subscription is a change set, not an individual change table. A change set can consist of several change tables and guarantees logical consistency for all of its change tables. After subscribing to a change set, you subscribe to all source table columns of interest.

1.

You must create a uniquely named subscription that is used throughout this exercise; in a later step you tell the system which source columns you are interested in. You are using the existing change set SYNC_SET. This functionality is provided by the package DBMS_CDC_SUBSCRIBE.

@c:\wkdir\subs_cdc_ct.sql

variable subname varchar2(30)
begin
:subname := 'my_subscription_no_1';
DBMS_CDC_SUBSCRIBE.CREATE_SUBSCRIPTION (
CHANGE_SET_NAME => 'SYNC_SET',
DESCRIPTION => 'Change data PRODUCTS for ELECTRONICS',
SUBSCRIPTION_name => :subname);
END;
/
PRINT subname


2. You can query the meta information about your subscription.

@c:\wkdir\show_cdc_env3.sql

SELECT handle, description,
decode(status,'N','Not activated.','Activated.') status,
earliest_scn, latest_scn, decode(last_purged,null,'Never purged.',
to_char(last_purged,'dd-mon-yyyy hh24:mi:ss')) last_purged,
decode(last_extended, null,'Never extended.',
to_char(last_extended,'dd-mon-yyyy hh24:mi:ss')) last_extended
FROM
user_subscriptions;
SELECT * FROM user_subscribed_tables;
SELECT source_table_name, column_name 
FROM dba_subscribed_columns;

 

3.

After creating a subscription, you set up a subscriber view for this subscription. The handling of subscriber views was enhanced in Oracle Database 10g: you can name the subscriber view explicitly, and you no longer have to drop and re-create it to extend or purge the subscription window.

@c:\wkdir\cr_cdc_rv.sql

rem now subscribe to a single source table and columns of interest.
rem A subscription can contain one or more tables from the same change set.
PROMPT ***10g***
Rem use new interface to name subscriber view explicitly !!
variable view_name varchar2(30);
BEGIN
:view_name := 'my_prod_price_change_view';
DBMS_CDC_SUBSCRIBE.SUBSCRIBE (
SUBSCRIPTION_NAME => :subname,
SOURCE_SCHEMA => 'sh',
SOURCE_TABLE => 'products',
COLUMN_LIST => 'prod_id, prod_min_price, prod_list_price',
subscriber_view => :view_name );
END;
/
PROMPT ***10g***
Rem The view is created automatically
desc MY_PROD_PRICE_CHANGE_VIEW


So far, you have accomplished the necessary steps to set up a CDC environment:

On the publisher side, you prepared a source system for CDC.

On the subscriber side, you identified and subscribed to all source tables (and columns) of interest and created your subscriber view.

Now you are ready to use CDC.

Back to Topic

4. Activating a Subscription and Extending the Subscription Window

The first step after setting up the publish and subscribe framework is to activate a subscription and to extend the subscription window to see all changes that have taken place since the last subscription window extension or since the activation of a subscription.

1.

Activate the subscription and take a look at the subscription metadata. (The subscription window is extended in a later step.)

@c:\wkdir\sub_cdc1.sql

rem now activate the subscription since we are ready to receive 
rem change data the ACTIVATE_SUBSCRIPTION procedure sets
rem subscription window to empty initially
rem At this point, no additional source tables can be added to the
rem subscription
EXEC DBMS_LOGMNR_CDC_SUBSCRIBE.ACTIVATE_SUBSCRIPTION -
(SUBSCRIPTION_name => 'my_subscription_no_1')
rem now recheck the subscriptions and see that it is active
rem the subscription window is still empty
SELECT handle, description,
decode(status,'N','Not activated.','Activated.') status, earliest_scn, latest_scn,
decode(last_purged,null,'Never purged.',
to_char(last_purged,'dd-mon-yyyy hh24:mi:ss')) last_purged,
decode(last_extended, null,'Never extended.',
to_char(last_extended,'dd-mon-yyyy hh24:mi:ss')) last_extended
from user_subscriptions;

2. Any changes you apply now to the source table PRODUCTS are reflected in the change table. The changes are transparently maintained with triggers on the source table.

@c:\wkdir\dml_cdc1.sql

Rem now do some changes

UPDATE products
SET prod_list_price=prod_list_price*1.1
WHERE prod_min_price > 100;
COMMIT;
Rem you will see entries in the change table
Rem note that we have entries for the old and the new values
SELECT Count(*) FROM prod_price_ct;

Note that you have two records for each changed source row: one representing the old values and the other the new values for the subscribed columns.
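
For a quick check of this pairing, you can optionally break the count down by the OPERATION$ control column of the change table (this query is for illustration only; as noted below, applications should always consume changes through the subscriber view). For an update, rows marked 'UO' typically carry the old values and rows marked 'UN' the new values:

Rem illustration only -- count change records per operation type
SELECT operation$, COUNT(*)
FROM prod_price_ct
GROUP BY operation$;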

Again, please never use the change table itself for identifying the changes on a source table. Use the subscriber view that will be created for you. This is the only supported way to guarantee that every change is delivered only once to a subscribing application.

3.

Unlike the change table, the subscriber view does not show any records yet. This is because the subscription window was empty when the subscription was activated and has not yet been extended to include the changes you just made. Consequently, you do not see anything yet.

Rem the view does not show anything
select count(*) from MY_PROD_PRICE_CHANGE_VIEW;

You now prepare to consume the changes by extending the subscription window.

@c:\wkdir\ext_cdc_sub1.sql

rem now set upper boundary (high-water mark) for the subscription window
rem At this point, the subscriber has created a new window that begins
rem where the previous window ends.
rem The new window contains any data that was added to the change table.
EXEC DBMS_CDC_SUBSCRIBE.EXTEND_WINDOW -
(SUBSCRIPTION_NAME => 'my_subscription_no_1');
rem now recheck the subscriptions and see that it is active
SELECT handle, description,
decode(status,'N','Not activated.','Activated.') status,
earliest_scn, latest_scn,
decode(last_purged,null,'Never purged.',
to_char(last_purged,'dd-mon-yyyy hh24:mi:ss')) last_purged,
decode(last_extended, null,'Never extended.',
to_char(last_extended,'dd-mon-yyyy hh24:mi:ss')) last_extended
from user_subscriptions;
Rem ... and you will see data
SELECT count(*) FROM my_prod_price_change_view;


4.

You can now select from the subscriber view, and you will see only the changes that fall within your subscription window.

@c:\wkdir\sel_cdc_cv1.sql

Rem changes classified for specific product groups
SELECT p1.prod_id, p2.prod_category, p1.prod_min_price,
p1.prod_list_price, commit_timestamp$
FROM my_prod_price_change_view p1, products p2
WHERE p1.prod_id=p2.prod_id
AND operation$='UN';
PROMPT and especially the Electronics' ones - 3 records only
SELECT p1.prod_id, p2.prod_category, p1.prod_min_price,
p1.prod_list_price, commit_timestamp$
FROM my_prod_price_change_view p1, products p2
WHERE p1.prod_id=p2.prod_id
AND operation$='UN'
AND p2.prod_category='Electronics';

In the example shown above, you are joining back to the source table PRODUCTS. When you set up a CDC environment, try to ensure that such an operation is not necessary; if possible, a change table should be usable as a stand-alone source for synchronizing your target environment. Joining back to the source table may reduce the data stored on disk, but it creates additional workload on the source site, and it can only be performed on the source system itself, not on a different system.
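
One way to avoid the join back in this example would be to publish the PROD_CATEGORY column as well when creating the change table. The following variant is a sketch only and is not part of this exercise; the change table name is hypothetical, and it assumes PROD_CATEGORY is a VARCHAR2(50) column in PRODUCTS (check the actual datatype and length in your schema):

Rem sketch only -- publish prod_category too, so no join back to PRODUCTS is needed
begin
DBMS_CDC_PUBLISH.CREATE_CHANGE_TABLE (OWNER => 'SH',
CHANGE_TABLE_NAME => 'PROD_price_cat_CT',
CHANGE_SET_NAME => 'SYNC_SET',
SOURCE_SCHEMA => 'SH',
SOURCE_TABLE => 'PRODUCTS',
COLUMN_TYPE_LIST => 'prod_id number(6), prod_category varchar2(50),
prod_min_price number(8,2), prod_list_price number(8,2)',
CAPTURE_VALUES => 'both',
RS_ID => 'y',
ROW_ID => 'n',
USER_ID => 'n',
TIMESTAMP => 'n',
OBJECT_ID => 'n',
SOURCE_COLMAP => 'y',
TARGET_COLMAP => 'y',
OPTIONS_STRING => null);
end;
/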


5.

You can now consume the changes on your target system.

@c:\wkdir\consume_cdc1.sql

Prompt ... you can now consume the changes
rem arbitrary example, in which we only track ELECTRONICS changes. This demonstrates
rem the flexibility, as well as the responsibility of the client side (consumer), to deal
rem appropriately with the changes
INSERT into my_price_change_electronics
SELECT p1.prod_id, p1.prod_min_price, p1.prod_list_price, commit_timestamp$
FROM my_prod_price_change_view p1, products p2
WHERE p1.prod_id=p2.prod_id
AND p2.prod_category='Electronics' AND operation$='UN';
COMMIT;
SELECT prod_id, prod_min_price, prod_list_price,
to_char(when,'dd-mon-yyyy hh24:mi:ss')
FROM my_price_change_electronics;


Back to Topic

5. Purging and Extending the Subscription Window to Consume New Changes

So far, you have performed your first consumption of change data with the synchronous CDC framework. Now investigate how to handle such an environment over time, as additional changes occur and older changes have already been consumed by all subscribers.

1.

Other DML operations take place on the source table PRODUCTS.

@c:\wkdir\dml_cdc2.sql

PROMPT other changes will happen
UPDATE products
SET prod_min_price=prod_min_price*1.1
WHERE prod_min_price < 10;
COMMIT;

2. The synchronous CDC framework tracks these DML operations transparently in the change table.

@c:\wkdir\show_cdc_ct2.sql

SELECT count(*) FROM prod_price_ct ;
PROMPT and especially the ELECTRONICS' ones
SELECT COUNT(*) FROM prod_price_ct p1, products p2
WHERE p1.prod_id=p2.prod_id AND p2.prod_category='Electronics';

 

3.

Run the following script to view the change data again. Note that the subscriber view still shows only the changes from the current (already consumed) subscription window; the new changes become visible only after you purge and extend the window in the following steps.

@c:\wkdir\sel_cdc_cv1.sql

Rem changes classified for specific product groups
SELECT p1.prod_id, p2.prod_category, p1.prod_min_price,
p1.prod_list_price, commit_timestamp$
FROM my_prod_price_change_view p1, products p2
WHERE p1.prod_id=p2.prod_id
AND operation$='UN';
PROMPT and especially the Electronics' ones - 3 records only
SELECT p1.prod_id, p2.prod_category, p1.prod_min_price,
p1.prod_list_price, commit_timestamp$
FROM my_prod_price_change_view p1, products p2
WHERE p1.prod_id=p2.prod_id
AND operation$='UN'
AND p2.prod_category='Electronics';


4.

To tell the CDC framework that you are done with a set of changes, you purge the subscription window. (In the next step, you extend the window again to pick up the new changes.)

@c:\wkdir\purge_cdc_sub_window1.sql

PROMPT purge old data from the subscription window
Rem this will NOT delete any records from the change table. It just tells the CDC system
Rem that the change data consumed so far is no longer needed
EXEC DBMS_CDC_SUBSCRIBE.PURGE_WINDOW -
(SUBSCRIPTION_name => 'my_subscription_no_1');
PROMPT still the same number of records in the change table
PROMPT REMEMBER THE NUMBER OF ROWS !!!
SELECT COUNT(*) FROM prod_price_ct;
PROMPT ... change view is empty
SELECT COUNT(*) FROM my_prod_price_change_view p1, products p2
WHERE p1.prod_id=p2.prod_id AND p2.prod_category='Electronics';


5.

To get the new changes reflected in the subscriber view, you have to extend the subscription window.

@c:\wkdir\ext_cdc_sub_window1.sql

PROMPT let's get the new change
Rem 'do it again Sam'
Rem first extend the window you want to see
EXEC DBMS_CDC_SUBSCRIBE.EXTEND_WINDOW -
(SUBSCRIPTION_name => 'my_subscription_no_1');

Rem ... now you will see exactly the new changed data since the last consumption
SELECT COUNT(*) FROM my_prod_price_change_view p1, products p2
WHERE p1.prod_id=p2.prod_id AND p2.prod_category='Electronics';

The subscriber view now shows exactly the changes that have not yet been consumed. This is different from the content of the change table, which still contains all tracked changes.


6.

Now consume the new changes. Because the incremental changes you are interested in are stored in a change table (a "normal" Oracle table), you can use any language or SQL construct the database supports.

@c:\wkdir\consume_cdc2.sql

PROMPT the new changes will be used
MERGE INTO my_price_change_Electronics t
USING (
SELECT p1.prod_id, p1.prod_min_price, p1.prod_list_price, commit_timestamp$
FROM my_prod_price_change_view p1, products p2
WHERE p1.prod_id=p2.prod_id AND p2.prod_category='Electronics' AND
operation$='UN') cv
ON (cv.prod_id = t.prod_id)
WHEN MATCHED THEN
UPDATE SET t.prod_min_price=cv.prod_min_price,
t.prod_list_price=cv.prod_list_price
WHEN NOT MATCHED THEN
INSERT VALUES (cv.prod_id, cv.prod_min_price, cv.prod_list_price,
commit_timestamp$);
COMMIT;
rem look at them
SELECT prod_id, prod_min_price, prod_list_price,
to_char(when,'dd-mon-yyyy hh24:mi:ss')
FROM my_price_change_electronics;


Back to Topic

6. Running the Publisher

The Publisher is responsible for maintaining the CDC framework and guaranteeing that the change tables are purged regularly as soon as all subscribers have consumed a specific set of change information.
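
In a production environment, the publisher would typically automate this housekeeping. As a sketch only (the job name and schedule shown here are arbitrary and not part of this exercise), a nightly purge of the change table could be scheduled with DBMS_SCHEDULER:

Rem sketch only -- schedule a nightly purge of the change table
BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'purge_prod_price_ct_job',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN DBMS_CDC_PUBLISH.PURGE_CHANGE_TABLE(''sh'',''prod_price_ct''); END;',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=DAILY;BYHOUR=2',
    enabled         => TRUE);
END;
/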

1.

From a SQL*Plus session logged on to the SH schema, run purge_cdc_ct.sql, or copy the following SQL statements into your SQL*Plus session:

@c:\wkdir\purge_cdc_ct.sql

PROMPT what to do to avoid overflow of change table ? PURGE IT !


Rem only the rows where we have no potential subscribers will be purged !!!
Rem we guarantee to keep all rows as long as we have subscribers which haven't
rem purged their window ...
exec DBMS_CDC_PUBLISH.purge_change_table('sh','prod_price_ct')
PROMPT REMEMBER THE NUMBER OF ROWS !!!
Rem you will have 18 entries for the last updated 9 records
SELECT COUNT(*) FROM prod_price_ct;
PROMPT this is exactly twice the number of changes we made with the second DML operation
SELECT COUNT(*) FROM prod_price_ct p1, products p2
WHERE p1.prod_id=p2.prod_id AND p2.prod_category='Electronics';

Although you have not specified any particular window for the purge operation on the change table, the CDC framework ensures that only those changes are purged that are no longer needed by any subscriber.


Back to Topic

7. Dropping the Used Change View and Purging the Subscription Window

1.

Because you have already consumed the second set of changes, you can purge your subscription window; with the Oracle Database 10g interface, you no longer need to drop the subscriber view first. You will also purge the change table again.

@c:\wkdir\purge_cdc_sub_window2.sql

PROMPT ... purge the newly consumed data from the subscription window (2nd DML operation)
EXEC DBMS_CDC_SUBSCRIBE.PURGE_WINDOW -
(SUBSCRIPTION_name =>'my_subscription_no_1');
PROMPT purge all change tables again
EXEC DBMS_CDC_PUBLISH.purge_change_table('sh','prod_price_ct')
PROMPT ... and you will see an empty change table
SELECT * FROM prod_price_ct;


The change table is now empty: all changes have been consumed and could therefore be purged.

 

Back to Topic

8. Cleaning Up the CDC Environment

Clean up the CDC environment. To do this, perform the following step:

1.

From a SQL*Plus session logged on to the SH schema, run cleanup_cdc.sql, or copy the following SQL statements into your SQL*Plus session:

@c:\wkdir\cleanup_cdc.sql

PROMPT CLEAN UP
exec DBMS_CDC_SUBSCRIBE.drop_subscription -
(subscription_name=> 'my_subscription_no_1');
exec DBMS_CDC_PUBLISH.DROP_CHANGE_TABLE (OWNER => 'sh', -
CHANGE_TABLE_NAME => 'prod_price_CT', -
FORCE_FLAG => 'Y');
DROP TABLE my_price_change_electronics;
UPDATE products SET prod_list_price=prod_list_price/1.1
WHERE prod_min_price > 100;
Rem the second update raised prod_min_price by 10 percent; reverse it (approximately)
UPDATE products SET prod_min_price=prod_min_price/1.1
WHERE prod_min_price < 11;
COMMIT;


Back to Topic

Back to Topic List

Besides its central data warehouse, MyCompany is running several small divisional data marts. For example, the products department wants to receive all transactional SALES data, partitioned by its main product categories, for marketing campaign analysis; only the SALES data of fiscal year 2000 is relevant. You will address this business problem by using Oracle's Transportable Tablespace capabilities and Oracle's LIST partitioning. Furthermore, to guarantee successful completion of the creation of this new table, you run the statement in RESUMABLE mode, thus ensuring that a space problem will not cause the creation process to fail.

Note: To demonstrate the benefit of resumable statements, you need TWO sessions and therefore TWO windows. Please read the following section CAREFULLY before taking the exercise.

To propagate from a data warehouse to a data mart, you perform the following steps:

1. Enabling a RESUMABLE Session
2. Creating a New Tablespace as Potential Transportable Tablespace
3. Creating a LIST Partitioned Table in the New Tablespace
4. Leveraging the New RESUMABLE Statement Capabilities for Efficient Error Detection and Handling
5. Creating a Range-List Partitioned Table in the Transfer Tablespace
6. Preparing the Metadata Export for a Transportable Tablespace

Back to Topic List

1. Enabling a RESUMABLE Session

With resumable statements, Oracle provides the ability to suspend and resume the execution of large database operations in the event of a repairable failure. Currently, the types of errors from which a statement can resume are space-limit and out-of-space errors. When an operation suspends, you have the opportunity to take corrective steps to resolve the error condition. Alternatively, a procedure can be registered to automate the error correction, as sketched below. After the error condition is resolved, the suspended statement automatically resumes and continues operation. If the error is not corrected within an optionally specified suspension time limit, the statement finally fails. You will use manual error correction for this exercise.
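
One common way to automate the correction is a database-level AFTER SUSPEND trigger combined with the DBMS_RESUMABLE package. The following is a minimal sketch only; it requires appropriate DBA privileges and is not part of this exercise:

Rem sketch only -- fires whenever a resumable statement suspends
CREATE OR REPLACE TRIGGER resumable_default_timeout
AFTER SUSPEND ON DATABASE
BEGIN
  -- here you could query DBA_RESUMABLE, notify an operator, or add space;
  -- this sketch simply gives the suspended statement one hour to be repaired
  DBMS_RESUMABLE.SET_TIMEOUT(3600);
END;
/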

To see how this is done, perform the following step:

1.

Enable resumable mode for your session. From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@c:\wkdir\set_resumable.sql

PROMPT enable resumable mode for this session
ALTER SESSION ENABLE RESUMABLE TIMEOUT 1200 NAME 
'create list partitioning';

This puts your session into resumable mode, gives the resumable operation a name, and sets a maximum suspension time of 1200 seconds.


Back to Topic

2. Creating a New Tablespace as Potential Transportable Tablespace

Now create an additional tablespace for storing your new LIST partitioned fact table. To do this, perform the following steps:

1.

First, make sure that the tablespace MY_OBE_TRANSFER does not already exist. From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@c:\wkdir\drop_TS.sql

DROP TABLESPACE my_obe_transfer INCLUDING CONTENTS AND DATAFILES;

The tablespace is dropped. If you get an ORA-00959 message, you can ignore it; it simply means that the tablespace did not exist.

2. From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

NOTE: If you are using a different drive or path for your data files, the script will need to be edited accordingly.

@c:\wkdir\create_ts.sql

CREATE TABLESPACE my_obe_transfer DATAFILE 'c:\obetemp' SIZE 2M REUSE autoextend off;

The tablespace is intentionally defined too small for the target table, in order to force a space error during table creation.

 

Back to Topic

 

3. Creating a LIST Partitioned Table in the New Tablespace

In Window 1, you create a LIST partitioned table with a CREATE TABLE AS SELECT command in the newly created tablespace MY_OBE_TRANSFER.

1.

From a SQL*Plus session logged on to the SH schema, run create_list_part_table.sql.

NOTE: This statement will appear to hang because of space problems in the MY_OBE_TRANSFER tablespace, and the suspension will show up in the alert log. When it hangs, move to the next step.

@c:\wkdir\create_list_part_table.sql

DROP TABLE sales_prod_dept;
PROMPT create table in new TS that is too small
CREATE TABLE sales_prod_dept
(prod_category, prod_subcategory,cust_id, 
time_id,channel_id,promo_id, quantity_sold, amount_sold
) NOLOGGING TABLESPACE my_obe_transfer
PARTITION BY LIST (prod_category)
(PARTITION electronic_sales values ('Electronics'),
PARTITION hardware_sales values ('Hardware'),
PARTITION sw_other_sales values ('Software/Other'),
PARTITION p_and_a values ('Peripherals and Accessories'),
PARTITION photo_sales values ('Photo')
)
AS
SELECT p.prod_category, p.prod_subcategory, s.cust_id, s.time_id,s.channel_id,
s.promo_id, SUM(s.quantity_sold) quantity_sold, SUM(s.amount_sold) amount_sold
FROM sales s, products p, times t
WHERE p.prod_id=s.prod_id
AND s.time_id = t.time_id
AND t.fiscal_year=2000
GROUP BY prod_category, prod_subcategory,cust_id, s.time_id,channel_id, promo_id
;


Back to Topic

4. Leveraging the New RESUMABLE Statement Capabilities for Efficient Error Detection and Handling

While the CREATE TABLE AS SELECT is running in Window 1, you can monitor the status of all sessions running in RESUMABLE mode from Window 2.

1.

From a SQL*Plus session logged on to the SH schema, execute the following SQL statement:

@c:\wkdir\run_sel_w2.sql

SELECT NAME, STATUS, ERROR_MSG FROM dba_resumable;

As long as no error occurs, the session stays in a normal status. As soon as an error occurs, the session status changes and the ORA- error causing the problem is displayed. The current statement in the RESUMABLE session is suspended until either (a) the problem is fixed or (b) the timeout limit is reached. The error also shows up in the alert.log file of the running instance.
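
If you want more detail while the statement is suspended, DBA_RESUMABLE exposes additional columns. For example, you could run the following query in Window 2 (optional):

Rem optional: more detail about the suspended statement
SELECT name, status, error_number, suspend_time, sql_text
FROM dba_resumable;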


2. Now fix the problem manually by allowing the datafile to autoextend. From a SQL*Plus session logged on as SYSTEM, execute the following SQL script:

@c:\wkdir\fix_ts.sql

ALTER DATABASE DATAFILE 'c:\obetemp' AUTOEXTEND ON NEXT 5M;

The datafile can now be autoextended by the database itself without any further interaction.

 

3.

As soon as the error is fixed, the suspended session automatically resumes and continues processing. Check the data dictionary view for change of status. From a SQL*Plus session logged on to the SH schema, check the status again:

@c:\wkdir\run_sel_w2.sql

SELECT NAME, STATUS, ERROR_MSG FROM dba_resumable;

The suspended statement resumes in the previously hung window (Window 1).


Back to Topic

5. Creating a Range-List Partitioned Table in the Transfer Tablespace

With the additional RANGE-LIST partitioning strategy, you can use a composite partitioning method to subdivide a partitioned table based on two logical attributes. The main partitioning strategy is the well-known and most widely used range partitioning strategy, which you find in nearly every data warehouse implementation dealing with rolling window operations. In addition, every range partition is subpartitioned based on a list partitioning strategy, enabling a finer partitioning granularity for addressing your business needs.

A typical example leveraging range-list partitioning is a global retail environment, with time-based range partitions (the rolling window) and region-oriented list subpartitions underneath, so that each time window can be maintained for a specific region independently of the others, as sketched below.
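
To make the retail example concrete, such a table could look like the following sketch. The table and its region values are purely hypothetical and are not used anywhere else in this tutorial:

Rem sketch only -- hypothetical range-list table: time ranges with region list subpartitions
CREATE TABLE retail_sales_example
(sale_date date, region varchar2(20), amount_sold number)
PARTITION BY RANGE (sale_date)
SUBPARTITION BY LIST (region)
SUBPARTITION TEMPLATE
( SUBPARTITION europe values ('EUROPE'),
SUBPARTITION americas values ('AMERICAS'),
SUBPARTITION asia values ('ASIA'),
SUBPARTITION other_regions values (DEFAULT)
)
(PARTITION sales_q1_2000 VALUES LESS THAN (TO_DATE('01-APR-2000','DD-MON-YYYY')),
PARTITION sales_q2_2000 VALUES LESS THAN (TO_DATE('01-JUL-2000','DD-MON-YYYY'))
);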

List partitioned tables enable you to group sets of distinct, unrelated values together in one partition. Consequently, any value not covered by an existing partition definition raises an Oracle error. Although this enforces a business constraint (you should not see a non-existent region in your data warehouse), you cannot always be sure that no such violations will occur. Oracle therefore introduced the capability to create a DEFAULT list partition, a kind of catch-all partition for all undefined partition key values.

Oracle has also introduced the so-called subpartition template technology, a common optional element of both range-hash and range-list composite partitioning. The template provides an easy and convenient way to define default subpartitions for each table partition. Oracle will create these default subpartitions in any partition for which you do not explicitly define subpartitions. This clause is useful for creating symmetric partitions.

In this section, you create a Range-List partitioned table, leveraging all of the functionality mentioned above.

1.

From a SQL*Plus session logged on to the SH schema, run create_new_range_list.sql, or copy the following SQL statements into your SQL*Plus session:

@c:\wkdir\create_new_range_list.sql

CREATE TABLE sales_rlp
COMPRESS
TABLESPACE MY_OBE_TRANSFER
PARTITION BY RANGE (time_id)
SUBPARTITION BY LIST (channel_id)
SUBPARTITION TEMPLATE
( SUBPARTITION direct values (3),
SUBPARTITION internet values (4),
SUBPARTITION partner values (2),
SUBPARTITION other values (DEFAULT)
)
(PARTITION SALES_before_1999 VALUES LESS THAN (TO_DATE('01-JAN-1999','DD-MON-YYYY')),
PARTITION SALES_Q1_1999 VALUES LESS THAN (TO_DATE('01-APR-1999','DD-MON-YYYY')),
PARTITION SALES_Q2_1999 VALUES LESS THAN (TO_DATE('01-JUL-1999','DD-MON-YYYY')),
PARTITION SALES_Q3_1999 VALUES LESS THAN (TO_DATE('01-OCT-1999','DD-MON-YYYY')),
PARTITION SALES_Q4_1999 VALUES LESS THAN (TO_DATE('01-JAN-2000','DD-MON-YYYY')),
PARTITION SALES_Q1_2000 VALUES LESS THAN (TO_DATE('01-APR-2000','DD-MON-YYYY')),
PARTITION SALES_Q2_2000 VALUES LESS THAN (TO_DATE('01-JUL-2000','DD-MON-YYYY')),
PARTITION SALES_Q3_2000 VALUES LESS THAN (TO_DATE('01-OCT-2000','DD-MON-YYYY')),
PARTITION SALES_Q4_2000 VALUES LESS THAN (MAXVALUE) NOCOMPRESS)
AS
SELECT * FROM sales sample(10);

This example also shows how you can easily create compressed and uncompressed partitions as part of an initial table creation. The partition and template names are inherited by all the subpartitions, thus making the naming and identification of subpartitions much easier and more convenient.
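
You can optionally verify which partitions were created compressed and which were not by querying the data dictionary:

Rem optional: check the compression attribute of each partition
SELECT partition_name, compression
FROM user_tab_partitions
WHERE table_name='SALES_RLP';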

 

2. From a SQL*Plus session logged on to the SH schema, run show_range_list_names.sql, or copy the following SQL statements into your SQL*Plus session:

@c:\wkdir\show_range_list_names.sql

PROMPT as you can see, the partition and template name is inherited to all the subpartitions,
PROMPT thus making the naming - and identification - of subpartitions much easier ...
select partition_name, subpartition_name, high_value from user_tab_subpartitions
where table_name='SALES_RLP';

List and range-list partitioned tables group sets of distinct, unrelated values together in one partition. Consequently, any value not covered by an existing partition definition raises an Oracle error. You will now add a DEFAULT list partition, a kind of catch-all partition for all undefined partition key values, to the previously created list partitioned table sales_prod_dept.

 

3.

From a SQL*Plus session logged on to the SH schema, run cr_default_list_part.sql, or copy the following SQL statements into your SQL*Plus session:

@c:\wkdir\cr_default_list_part.sql

PROMPT add a new partition to sales_prod_dept that does NOT have a DEFAULT partition
Rem it is added just like for a range partitioned table
ALTER TABLE sales_prod_dept ADD PARTITION gameboy_sales VALUES ('Gameboy');
PROMPT now add another one, covering the DEFAULT value
ALTER TABLE sales_prod_dept ADD PARTITION other_sales VALUES (DEFAULT);
PROMPT view the data dictionary
select partition_name, high_value from user_tab_partitions where table_name='SALES_PROD_DEPT';

All records having an undefined partitioning key will be stored in this new partition OTHER_SALES. However, having a DEFAULT partition changes the way you add new partitions to this table. Conceptually, as soon as you have created a default partition, you cover all possible values of the partitioning key: the explicitly defined ones and "all others".

To ADD a new partition to this table, you logically have to SPLIT the default partition into a new partition with a set of defined keys and a new default partition that still covers "all other values" (now reduced by the keys you have specified for the new partition). Any attempt to simply add a new partition, as you did before, raises an ORA error.


4.

From a SQL*Plus session logged on to the SH schema, run split_default_list_part.sql, or copy the following SQL statements into your SQL*Plus session:

@c:\wkdir\split_default_list_part.sql

PROMPT Unlike the first time, we cannot simply add a new partition
PROMPT raises ORA-14323: cannot add partition when DEFAULT partition exists
Rem we cannot be sure whether the new value already exists in the DEFAULT partition
ALTER TABLE sales_prod_dept ADD PARTITION undefined_sales VALUES ('Undefined');

This raises an Oracle error. Now run the following commands, or run the script split_default_list_part_b.sql:

@c:\wkdir\split_default_list_part_b.sql

PROMPT so we have to SPLIT the default partition to ensure that 
PROMPT all potential values of 'Undefined' are in the new partition

ALTER TABLE sales_prod_dept SPLIT
PARTITION other_sales VALUES ('Undefined') INTO
(PARTITION undefined_sales, PARTITION other_sales);

PROMPT check the data dictionary
PROMPT Note that without specifying any tablespace, the default
PROMPT tablespace of the partitioned table is used
select partition_name, tablespace_name, high_value
from user_tab_partitions
where table_name='SALES_PROD_DEPT';


Back to Topic

6. Preparing the Metadata Export for a Transportable Tablespace

To prepare a tablespace for transportation, you simply export the metadata of the tablespace. This export, together with a physical copy of the tablespace's datafiles, can then be used to import the tablespace into another database instance. Prior to Oracle9i, the target database had to run on the same operating system and use the same database block size. Oracle9i lifted the restriction of the same block size, and Oracle Database 10g goes one step further by introducing cross-platform (heterogeneous) transportable tablespaces. Before exporting, you can optionally verify that the tablespace set is self-contained, as sketched below.
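
The following optional checks are a sketch only and are not required for this exercise. They verify that the tablespace set is self-contained and, for a cross-platform transport, list the endian format of the supported platforms; depending on your privileges, you may need to run them as a more privileged user than SH:

Rem optional: verify that the tablespace set is self-contained
EXEC DBMS_TTS.TRANSPORT_SET_CHECK('MY_OBE_TRANSFER', TRUE);
SELECT * FROM transport_set_violations;
Rem optional: list the platforms supported for cross-platform transport
SELECT platform_name, endian_format FROM v$transportable_platform;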

To show this capability, you perform the following steps:

1.

From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@c:\wkdir\make_ts_ro.sql

ALTER TABLESPACE my_obe_transfer READ ONLY;

This guarantees that no further changes can happen on the data in the tablespace.


2. You can now export the data dictionary information of tablespace my_obe_transfer as follows:

@c:\wkdir\export_metadata.sql

Optionally, you can create a specific directory where you want to store the export dump file:

CREATE DIRECTORY my_obe_dump_dir as 'c:\wkdir';

Now you can export the metadata only. Compare the size of the dump file with the tablespace size to get a feeling for what it would have meant to extract all the data rather than only the metadata.

host expdp \'/ as sysdba\' DIRECTORY=my_obe_dump_dir DUMPFILE=meta_MY_OBE_TRANSFER.dmp TRANSPORT_TABLESPACES=MY_OBE_TRANSFER

Note: This script may take several minutes to complete running.

Note: The same export syntax can be used at the DOS prompt by dropping the SQL*Plus "host" command.
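
For completeness, the import on a target database would be the mirror image of this export. The following is a sketch only; it assumes that the dump file and a copy of the datafile c:\obetemp have been made available to the target instance and that a directory object named my_obe_dump_dir exists there as well:

host impdp \'/ as sysdba\' DIRECTORY=my_obe_dump_dir DUMPFILE=meta_MY_OBE_TRANSFER.dmp TRANSPORT_DATAFILES='c:\obetemp'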

3.

To clean up and reset the session status before the tablespace would be imported into another database, execute the following SQL script:

@c:\wkdir\cleanup_tts.sql

DROP TABLESPACE my_obe_transfer INCLUDING CONTENTS AND DATAFILES;
ALTER SESSION DISABLE RESUMABLE;


4.

Trying to access the table sales_prod_dept now raises an error: the table no longer exists because it was dropped together with its tablespace.

@c:\wkdir\sel_spd.sql

SELECT count(*) 
from sales_prod_dept;

 

Back to Topic

Back to Topic List

To clean up your environment, you will need to perform the following step:

1.

From a SQL*Plus session logged on to the SH schema, execute the following commands:

SET SERVEROUTPUT ON
EXEC dw_handsOn.cleanup_modules


Back to Topic List

In this lesson, you learned how to:

Back to Topic List
