Using the ETL Infrastructure Inside Oracle Database 10g

This tutorial covers the Extraction, Transformation, and Loading (ETL) infrastructure of Oracle Database 10g.

Approximately 1 hour

Topics

This tutorial covers the following topics:

Overview
Prerequisites
Implement Schema Changes for the Sales History Schema
Reviewing the Multitable Insert
Using the Upsert Functionality (SQL MERGE Keyword)
Learn DML Error Logging Capabilities
Experience the Basics of the Table Function
Change Data Capture

Overview

What Happens During the ETL Process?

ETL stands for extraction, transformation, and loading. During extraction, the desired data has to be identified and extracted from many different sources, including database systems and applications. Often it is not possible to identify the specific subset of interest up front, so more data than necessary has to be extracted and the relevant data is identified at a later point in time. Depending on the source system’s capabilities (for example, OS resources), some transformations may take place during this extraction process. The size of the extracted data varies from hundreds of kilobytes to hundreds of gigabytes, depending on the source system and the business situation. The same is true for the time delta between two (logically) identical extractions: the interval may range from days or hours down to minutes or near real time. For example, Web server log files can easily grow to hundreds of megabytes in a very short period of time.

After extraction, the data must be physically transported to the target system or to an intermediate system for further processing. Depending on the chosen means of transportation, some transformations can be done during this process as well. For example, a SQL statement that directly accesses a remote target through a gateway can concatenate two columns as part of the SELECT statement.
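A minimal sketch of such a transformation during transport (this is not one of the tutorial scripts; the database link source_db and the staging table customers_stage are hypothetical names used only for illustration):

Rem pull rows through a database link and concatenate two columns
Rem while the data is being transported
INSERT INTO customers_stage (cust_id, cust_name)
SELECT cust_id, cust_first_name || ' ' || cust_last_name
FROM customers@source_db;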

After extracting and transporting the data, the most challenging (and time-consuming) part of ETL follows: transforming the data and loading it into the target system. This may include:

Application of complex filters

Data has to be validated against information already existing in target database tables

Data extracted without knowing whether it is new or changed has to be checked against the target objects to determine whether it must be updated or inserted

The same data has to be inserted several times, as detail-level and as aggregated information

All of this should be done as quickly as possible and in a scalable manner, without affecting concurrent access to the existing target for information retrieval.

Oracle offers a wide variety of capabilities to address all the issues and tasks relevant in an ETL scenario. Oracle Database 10g is the ETL transformation engine.

Back to Topic List

Prerequisites

Before starting this tutorial, you should:

1.

Perform the Installing Oracle Database 10g on Windows tutorial.

2.

Download and unzip etl2.zip into your working directory (for example, c:\wkdir).

Back to Topic List

Implement Schema Changes for the Sales History Schema

Before starting the tasks for this tutorial, you need to implement some changes on the existing Sales History schema. Additional objects are necessary, and additional system privileges must be granted to the user SH. The SQL file for applying those changes is modifySH_10gR2.sql .

1.

Start a SQL*Plus session. Select Start > Programs > Oracle-OraDB10g_home > Application Development > SQL Plus.

(Note: This tutorial assumes you have a c:\wkdir folder. If you do not, create one and unzip the contents of etl2.zip into it. The scripts in this tutorial are invoked with this path specified.)

 

2.

Log in as the SH user. Enter SH as the User Name and SH as the Password. Then click OK.

 

3.

Run the modifySH_10gR2.sql script from your SQL*Plus session.

@c:\wkdir\modifySH_10gR2.sql

The bottom of your output should match the image below.

 

 

Back to Topic List

Reviewing the Multitable Insert

MyCompany receives some nonrelational data structures from one of its partner companies, which then sells the products for a special advertisement campaign. The data structure is a denormalized, nonrelational record structure from proprietary mainframe systems consisting of one record per customer and product per week. Those data structures have to be inserted into the data warehouse. Because sales record data is stored per customer and product per day, you need to transform the incoming data.

As part of the transformation, the nonrelational denormalized data structure must be transformed from one record per week into seven records, each containing the information of one business day (the first example business transformation). In addition, the data warehouse keeps track of all new customers with a credit limit above a certain threshold. Those customers are tracked separately.

In this section, you will implement those business transformations by leveraging Oracle’s new multitable insert capabilities. To do this, perform the following steps:

Back to Topic List

Using the Multitable Insert for Pivoting

1.

Show the execution plan for the new multitable insert. From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@c:\wkdir\explain_mti_new.sql

DELETE
FROM PLAN_TABLE;
EXPLAIN PLAN FOR
INSERT ALL
INTO sales
VALUES(product_id, customer_id,weekly_start_date,2,9999, q_sun,sales_sun)
INTO sales
VALUES(product_id, customer_id,weekly_start_date+1,2,9999, q_mon,sales_mon)
INTO sales
VALUES(product_id, customer_id,weekly_start_date+2,2,9999, q_tue,sales_tue)
INTO sales
VALUES(product_id, customer_id,weekly_start_date+3,2,9999, q_wed,sales_wed)
INTO sales
VALUES(product_id, customer_id,weekly_start_date+4,2,9999, q_thu,sales_thu)
INTO sales
VALUES(product_id, customer_id,weekly_start_date+5,2,9999, q_fri,sales_fri)
INTO sales
VALUES(product_id, customer_id,weekly_start_date+6,2,9999, q_sat,sales_sat)
SELECT * FROM sales_input_table;
SET linesize 140
SELECT * from table(dbms_xplan.display);

Note: The input source table is scanned only once! The complexity of the denormalization is handled within the several INSERT INTO branches, thus avoiding multiple scans.

 

2.

Now show the execution plan for the equivalent conventional insert based on a UNION ALL set operation. From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@c:\wkdir\explain_mti_old.sql

DELETE FROM PLAN_TABLE;
COMMIT;
EXPLAIN PLAN FOR
INSERT INTO sales
(prod_id, cust_id, time_id, channel_id,promo_id,amount_sold,quantity_sold)
SELECT product_id, customer_id,weekly_start_date,2,9999,sales_sun,q_sun
FROM sales_input_table
UNION ALL
SELECT product_id, customer_id,weekly_start_date+1,2,9999,sales_mon,q_mon
FROM sales_input_table
UNION ALL
SELECT product_id, customer_id,weekly_start_date+2,2,9999,sales_tue,q_tue
FROM sales_input_table
UNION ALL
SELECT product_id, customer_id,weekly_start_date+3,2,9999,sales_wed,q_wed
FROM sales_input_table
UNION ALL
SELECT product_id, customer_id,weekly_start_date+4,2,9999,sales_thu,q_thu
FROM sales_input_table
UNION ALL
SELECT product_id, customer_id,weekly_start_date+5,2,9999,sales_fri,q_fri
FROM sales_input_table
UNION ALL
SELECT product_id, customer_id,weekly_start_date+6,2,9999,sales_sat,q_sat
FROM sales_input_table;
SET linesize 140
SELECT * from table(dbms_xplan.display);
COMMIT;

Note: The input source table is scanned seven times! The complexity of the denormalization is handled within the several SELECT operations.

As the number of input records increases, the performance advantage of the multitable insert statement, which reduces the work to a single scan of the source, becomes more and more obvious.


Back to Topic

Using the Multitable Insert for Conditional Insertion

1.

Create an intermediate table consisting of new information. From a SQL*Plus session logged on to the SH schema, execute the following SQL script, which contains the SQL statements from steps 1-4:

@c:\wkdir\mti2_prepare.sql

Rem create intermediate table with some records


CREATE TABLE customers_new AS 
SELECT * 
FROM customers
WHERE cust_id BETWEEN 2000 AND 5000;

 

2.

Disable constraints on the SALES table; this is necessary for step 3.

ALTER TABLE sales DISABLE CONSTRAINT sales_customer_fk;
           
3. Delete some data from the CUSTOMERS table.
DELETE FROM customers WHERE cust_id BETWEEN 2000 AND 5000;

4.

Create an empty table for the special promotion information.

CREATE TABLE customers_special AS
SELECT cust_id, cust_credit_limit 
FROM customers 
WHERE rownum > 1;

 

5. Issue the multitable insert into several tables with different table structures. From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@c:\wkdir\do_mti2.sql

INSERT /*+ APPEND NOLOGGING */ FIRST
  WHEN cust_credit_limit >= 4500 THEN
    INTO customers
    INTO customers_special VALUES (cust_id, cust_credit_limit)
  ELSE
    INTO customers
SELECT * FROM customers_new; 

 

6. You can see what was inserted. Execute the following SQL script:

@c:\wkdir\control_mti2.sql

SELECT COUNT(*) FROM customers;
SELECT COUNT(*) FROM customers_special;
SELECT MIN(cust_credit_limit) FROM customers_special;


7.

Before continuing, reset your environment by executing the following SQL script:

@c:\wkdir\reset_mti2.sql

set echo on
REM cleanup and reset
ALTER TABLE sales
MODIFY CONSTRAINT sales_customer_fk RELY
ENABLE NOVALIDATE;

DROP TABLE customers_special;
DROP TABLE customers_new;

COMMIT;

Back to Topic

Back to Topic List

Using the Upsert Functionality (SQL MERGE Keyword)

Overview

MyCompany has to update its product information in the data warehouse periodically with data from its products database. Unfortunately, you receive the product delta information without any way to distinguish between new and updated information. Therefore, you have to figure this out on the data warehouse side.

To use the SQL MERGE command to either update or insert the data conditionally, you perform the following steps:

1. Creating an External Table (and Directories) for the External Products Information
2. Performing an Upsert Using the SQL MERGE Command
3. Showing the Execution plan of the MERGE Command
4. Performing an Upsert Using Two Separate SQL Commands

Back to Topic List

1. Creating an External Table (and Directories) for the External Products Information

Before you can create the external table, you need to create a directory object in the database that points to the directory on the file system where the data files will reside. Optionally, you can separate the location of the log, bad, and discard files from the location of the data files, as shown in the following example.

1.

To create the directory, perform the following. If you have already completed High-Speed Data Load and Rolling Window Operations, the directories will already exist and you can skip this step.

The scripts are set up for a Windows system and assume that the Hands-On workshop was extracted to the c:\wkdir folder.

From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@c:\wkdir\create_directory.sql

DROP DIRECTORY data_dir;
DROP DIRECTORY log_dir;
Rem *****
Rem CREATE DIRECTORIES
Rem note that security is controlled
Rem (a) with privileges on directory (read/write)
Rem (b) with privileges on external table
Rem *****
CREATE DIRECTORY data_dir AS 'c:\wkdir';
CREATE DIRECTORY log_dir AS 'c:\wkdir';
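
The comments in the script note that access is controlled through privileges on the directory objects and on the external table. As a sketch (not part of the tutorial scripts, and assuming the directories were created by a privileged user other than SH), read and write access on the directories would be granted to SH as follows:

GRANT READ, WRITE ON DIRECTORY data_dir TO sh;
GRANT READ, WRITE ON DIRECTORY log_dir TO sh;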





2.

When creating an external table, you define two kinds of information:

  1. The metadata for the table representation inside the database
  2. The access parameters that specify how to extract the data from the external file

After this metadata is created, the external data can be accessed from within the database without the need for an initial load. To create the external table, from a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@c:\wkdir\create_external_table2.sql

Rem *****
Rem preparation for upsert
Rem file prodDelta.dat, 1.2 MB must
Rem reuse of directory STAGE_DIR
Rem access via external table
Rem *****
DROP TABLE products_delta;
CREATE TABLE products_delta
(
PROD_ID NUMBER(6),
PROD_NAME VARCHAR2(50),
PROD_DESC VARCHAR2(4000),
PROD_SUBCATEGORY VARCHAR2(50),
PROD_SUBCATEGORY_ID NUMBER,
PROD_SUBCATEGORY_DESC VARCHAR2(2000),
PROD_CATEGORY VARCHAR2(50),
PROD_CATEGORY_ID NUMBER,
PROD_CATEGORY_DESC VARCHAR2(2000),
PROD_WEIGHT_CLASS NUMBER(2),
PROD_UNIT_OF_MEASURE VARCHAR2(20),
PROD_PACK_SIZE VARCHAR2(30),
SUPPLIER_ID NUMBER(6),
PROD_STATUS VARCHAR2(20),
PROD_LIST_PRICE NUMBER(8,2),
PROD_MIN_PRICE NUMBER(8,2),
PROD_TOTAL VARCHAR2(13),
PROD_TOTAL_ID NUMBER,
PROD_SRC_ID NUMBER,
PROD_EFF_FROM DATE,
PROD_EFF_TO DATE,
PROD_VALID CHAR(1)
)
ORGANIZATION external
(
TYPE oracle_loader
DEFAULT DIRECTORY data_dir
ACCESS PARAMETERS
(
RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII
NOBADFILE
NOLOGFILE
FIELDS TERMINATED BY "|" LDRTRIM
)
location
('prodDelta.dat')
REJECT LIMIT UNLIMITED NOPARALLEL;
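
Once the external table exists, its data can be queried like any other table. As a quick sanity check (not one of the tutorial scripts), you could simply count the rows exposed through the delta file:

SELECT COUNT(*) FROM products_delta;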


Back to Topic

2. Performing an Upsert Using the SQL MERGE Command

To perform an upsert using the SQL MERGE command, execute the following step:

1.

From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@c:\wkdir\do_merge_new.sql

Rem *****
Rem MERGE
Rem new and modified product information has arrived
Rem use of external table again
Rem - seamlessly within the MERGE command !!!
Rem *****
MERGE INTO products t
USING products_delta s
ON ( t.prod_id=s.prod_id )
WHEN MATCHED THEN
UPDATE SET t.prod_name=s.prod_name, t.prod_list_price=s.prod_list_price,
           t.prod_min_price=s.prod_min_price
WHEN NOT MATCHED THEN
INSERT (prod_id, prod_name, prod_desc, prod_subcategory, prod_subcategory_id,
        prod_subcategory_desc, prod_category, prod_category_id, prod_category_desc,
        prod_status, prod_list_price, prod_min_price, prod_total, prod_total_id,
        prod_weight_class, prod_pack_size, supplier_id)
VALUES (s.prod_id, s.prod_name, s.prod_desc, s.prod_subcategory, s.prod_subcategory_id,
        s.prod_subcategory_desc, s.prod_category, s.prod_category_id,
        s.prod_category_desc,
        s.prod_status, s.prod_list_price, s.prod_min_price, s.prod_total,
        s.prod_total_id,
        s.prod_weight_class, s.prod_pack_size, s.supplier_id)
;
ROLLBACK;

Note: You will ROLLBACK so that you can reissue the same upsert with two SQL statements in a subsequent operation.

 

Back to Topic

3. Showing the Execution plan of the MERGE Command

To examine the execution plan of the SQL MERGE statement, perform the following steps:

1.

From a SQL*Plus session logged on to the SH schema, run the explain_merge_new.sql script, or copy the following SQL statement into your SQL*Plus session:

@c:\wkdir\explain_merge_new.sql

DELETE FROM plan_table;
COMMIT;
EXPLAIN PLAN FOR
MERGE INTO products t
USING products_delta s
ON ( t.prod_id=s.prod_id )
WHEN MATCHED THEN
UPDATE SET t.prod_name=s.prod_name, t.prod_list_price=s.prod_list_price,
           t.prod_min_price=s.prod_min_price
WHEN NOT MATCHED THEN
INSERT    (prod_id, prod_name, prod_desc, prod_subcategory, prod_subcategory_id,
           prod_subcategory_desc, prod_category, prod_category_id, prod_category_desc,
           prod_status, prod_list_price, prod_min_price, prod_total, prod_total_id,
           prod_weight_class, prod_pack_size, supplier_id)
VALUES    (s.prod_id, s.prod_name, s.prod_desc, s.prod_subcategory, s.prod_subcategory_id,
           s.prod_subcategory_desc, s.prod_category, s.prod_category_id,
           s.prod_category_desc,
           s.prod_status, s.prod_list_price, s.prod_min_price, s.prod_total,
           s.prod_total_id,
           s.prod_weight_class, s.prod_pack_size, s.supplier_id)
;
set linesize 140
select * from table(dbms_xplan.display);

 

Back to Topic

4. Performing an Upsert Using Two Separate SQL Commands

Without the MERGE statement, such upsert functionality could be implemented as a combination of two SQL statements (insert plus update) or in a procedural manner. A possible upsert implementation of this business problem as two SQL commands is shown below.

To demonstrate how this was done, perform the following steps:

1.

To leverage an updatable join view, a primary or unique key must exist on the join column. Therefore, you cannot use the external table directly; you need to create an intermediate table in the database with the appropriate unique key constraint. Run the following script to create the necessary structures and perform the upsert:

@c:\wkdir\do_merge_816.sql

CREATE TABLE prod_delta NOLOGGING AS SELECT * FROM products_delta;
CREATE UNIQUE INDEX p_uk ON prod_delta(prod_id);
UPDATE
           (SELECT s.prod_id, s.prod_name sprod_name, s.prod_desc sprod_desc,
           s.prod_subcategory sprod_subcategory,
           s.prod_subcategory_desc sprod_subcategory_desc, s.prod_category sprod_category,
           s.prod_category_desc sprod_category_desc, s.prod_status sprod_status,
           s.prod_list_price sprod_list_price, s.prod_min_price sprod_min_price,
           t.prod_id, t.prod_name tprod_name, t.prod_desc tprod_desc, t.prod_subcategory
           tprod_subcategory,
           t.prod_subcategory_desc tprod_subcategory_desc, t.prod_category tprod_category,
           t.prod_category_desc tprod_category_desc, t.prod_status tprod_status,
           t.prod_list_price tprod_list_price, t.prod_min_price tprod_min_price
           FROM products t, prod_delta s WHERE s.prod_id=t.prod_id) JV
           SET
           tprod_list_price =sprod_list_price,
           tprod_min_price =sprod_min_price;
INSERT INTO products
           (prod_id, prod_name, prod_desc, prod_subcategory, prod_subcategory_id,
           prod_subcategory_desc, prod_category, prod_category_id, prod_category_desc,
           prod_status, prod_list_price, prod_min_price, prod_total, prod_total_id,
           prod_weight_class, prod_pack_size, supplier_id)
           SELECT prod_id, prod_name, prod_desc, prod_subcategory, prod_subcategory_id,
           prod_subcategory_desc, prod_category, prod_category_id, prod_category_desc,
           prod_status, prod_list_price, prod_min_price, prod_total, prod_total_id,
           prod_weight_class, prod_pack_size, supplier_id
           FROM prod_delta s WHERE NOT EXISTS (SELECT 1 FROM products t WHERE t.prod_id =
           s.prod_id);
ROLLBACK;

Note that you have to issue two separate SQL statements to accomplish the functionality of a single MERGE statement.

 

2.

From a SQL*Plus session logged on to the SH schema, view the execution plans by running the view_explain_merge.sql script, or copy the following SQL statement into your SQL*Plus session:

@c:\wkdir\view_explain_merge.sql

PROMPT    show the plans
DELETE FROM plan_table;
COMMIT;
EXPLAIN PLAN FOR
UPDATE
(SELECT s.prod_id, s.prod_name sprod_name, s.prod_desc sprod_desc, s.prod_subcategory sprod_subcategory,
s.prod_subcategory_desc sprod_subcategory_desc, s.prod_category sprod_category,
s.prod_category_desc sprod_category_desc, s.prod_status sprod_status, s.prod_list_price sprod_list_price, s.prod_min_price sprod_min_price,
t.prod_id, t.prod_name tprod_name, t.prod_desc tprod_desc, t.prod_subcategory tprod_subcategory,
t.prod_subcategory_desc tprod_subcategory_desc, t.prod_category tprod_category,
t.prod_category_desc tprod_category_desc, t.prod_status tprod_status, t.prod_list_price tprod_list_price, t.prod_min_price tprod_min_price
FROM products t, prod_delta s WHERE s.prod_id=t.prod_id) JV
SET
tprod_list_price =sprod_list_price,
tprod_min_price =sprod_min_price;
set linesize 180
select * from table(dbms_xplan.display);
DELETE FROM plan_table;

COMMIT;


EXPLAIN PLAN FOR
  INSERT INTO products
  (prod_id, prod_name, prod_desc, prod_subcategory, prod_subcategory_id,
   prod_subcategory_desc, prod_category, prod_category_id, prod_category_desc,
   prod_status, prod_list_price, prod_min_price, prod_total, prod_total_id,
   prod_weight_class, prod_pack_size, supplier_id)
   SELECT prod_id, prod_name, prod_desc, prod_subcategory, prod_subcategory_id,
   prod_subcategory_desc, prod_category, prod_category_id, prod_category_desc,
   prod_status, prod_list_price, prod_min_price, prod_total, prod_total_id,
   prod_weight_class, prod_pack_size, supplier_id
   FROM prod_delta s WHERE NOT EXISTS (SELECT 1 FROM products t WHERE t.prod_id = s.prod_id);
set linesize 180
select * from table(dbms_xplan.display);
ROLLBACK;
    
...


To leverage the updatable join view functionality, the external table needs to be copied into a real database table and a unique index needs to be created on the join column. In short, this approach requires more operations, more space, and more processing time.

Back to Topic

Back to Topic List

 

Learn DML Error Logging Capabilities

Oracle Database 10g release 2 introduces a new and exciting functionality to enable record-level error handling for bulk SQL operations. The DML error logging functionality extends existing DML operations by enabling you to specify the name of an error logging table into which Oracle should record errors encountered during DML operations. This enables you to complete the DML operation in spite of any errors, and to take corrective action on the erroneous rows at a later time.

A DML error logging table consists of several mandatory control columns and a set of user-defined columns that represent either all or a subset of the columns of the target table of the DML operation, using a data type that is capable of storing the values that caused the errors. For example, you need a VARCHAR2 column in the error logging table to store values that fail TO_NUMBER conversion for a NUMBER column in the target table. You should use the DBMS_ERRLOG package to create the DML error logging tables.
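
As a brief sketch of the DBMS_ERRLOG interface (the table names MY_TARGET, MY_SOURCE, and MY_TARGET_ERRORS are hypothetical and not part of the tutorial scripts), you create an error logging table with an explicit name and then reference it in the LOG ERRORS clause of a DML statement:

Rem create an error logging table with an explicit name ...
exec dbms_errlog.create_error_log('MY_TARGET','MY_TARGET_ERRORS');
Rem ... and reference it from a DML statement against the target
INSERT INTO my_target
SELECT * FROM my_source
LOG ERRORS INTO my_target_errors ('first_load') REJECT LIMIT UNLIMITED;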

1.

Create an external table representing data from a source system with obviously poor data quality. Run the script cr_ext_tab_for_elt.sql.

@c:\wkdir\cr_ext_tab_for_elt.sql

PROMPT Create an external table pointing to a data set with poor quality
Rem
DROP TABLE sales_activity_direct;
CREATE TABLE sales_activity_direct (
PROD_ID NUMBER,
CUST_ID NUMBER,
TIME_ID CHAR(20),
CHANNEL_ID CHAR(2),
PROMO_ID NUMBER,
QUANTITY_SOLD NUMBER(3),
AMOUNT_SOLD CHAR(50)
)
ORGANIZATION external (
TYPE oracle_loader
DEFAULT DIRECTORY data_dir
ACCESS PARAMETERS
(
RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII
BADFILE log_dir:'sh_sales2.bad'
LOGFILE log_dir:'sh_sales2.log_xt'
FIELDS TERMINATED BY "|" LDRTRIM
)
location
(
'sales_activity_direct.dat'
)
)REJECT LIMIT UNLIMITED NOPARALLEL;

 

2. Create a dummy table to avoid interfering with existing tables. Run the script cr_tab_for_elt.sql.

@c:\wkdir\cr_tab_for_elt.sql

PROMPT create a second sales table to being used by the insert
DROP TABLE sales_overall;
CREATE TABLE sales_overall as select * from sales where 1=0;

3. To track errors with the new DML error logging functionality, you need an error logging table. It is highly recommended that you use the provided package DBMS_ERRLOG. Run the script cr_elt.sql.

@c:\wkdir\cr_elt.sql

PROMPT Create the DML error logging table with DEFAULT name
PROMPT It is highly advised to use the ORCL-provided pkg to create the
PROMPT DML Error Logging table
Rem
DROP TABLE err$_sales_overall;
exec dbms_errlog.create_error_log('sales_overall');

Note the mandatory control columns and the different data types of the columns being tracked. The data types have to be a superset of the target data types to enable proper tracking of errors, for example, a non-numeric value for a NUMBER target column.

SET LINESIZE 60
DESCRIBE err$_sales_overall

 

4.

Try to load the data residing in the external file into the target table. The default behavior of the error logging functionality is a REJECT LIMIT of zero: in the case of an error, the DML operation fails and the first record that raises an error is stored in the error logging table. Run the script ins_elt_1.sql to see this behavior.

@c:\wkdir\ins_elt_1.sql

PROMPT First insert attempt , DEFAULT reject limit 0
PROMPT also, the error message that comes back is the one of error #1
INSERT /*+ APPEND NOLOGGING PARALLEL */ INTO sales_overall
  SELECT * FROM sales_activity_direct
  LOG ERRORS INTO err$_sales_overall ( 'load_test1' );
commit;


PROMPT As you can see, nothing is inserted in the target table, but ONE row in the error logging table
select count(*) from sales_overall;
select count(*) from err$_sales_overall;
delete from err$_sales_overall;
commit;



From a DML perspective, a transaction either succeeds or fails as a whole. Hence, generically, you either want the DML operation to succeed no matter what (which translates into REJECT LIMIT UNLIMITED) or to fail as soon as an error occurs (REJECT LIMIT 0). The default reject limit is zero because any other arbitrary number is somewhat meaningless; if you decide to tolerate a specific number of errors, it is purely a business decision how many errors are tolerable in a given situation.

 

5.

Try the insert again with a REJECT LIMIT of 10 records. If more than 10 errors occur, the DML operation will fail, and you will find 11 records in the error logging table. Run the script ins_elt_2.sql to see this behavior.

@c:\wkdir\ins_elt_2.sql

SET ECHO OFF

PROMPT First insert attempt , DEFAULT reject limit 10
PROMPT Note that the error message that comes back is the one of error #11


INSERT /*+ APPEND NOLOGGING PARALLEL */ INTO sales_overall
  SELECT * FROM sales_activity_direct
    LOG ERRORS INTO err$_sales_overall ( 'load_test2' ) REJECT LIMIT 10;
commit;


PROMPT no rows in target; error count+1 in DML error logging table
select count(*) from sales_overall;
select count(*) from err$_sales_overall;
delete from err$_sales_overall;
commit;

There are more than 10 errors for this insert, meaning the quality of this data is poor.

 

6.

Put the data into the table and figure out what errors you are encountering. Run the script ins_elt_3.sql.

@c:\wkdir\ins_elt_3.sql

PROMPT Reject limit unlimited will succeed
Rem ... as long as you do not run into one of the current limitations ...
INSERT /*+ APPEND NOLOGGING PARALLEL */ INTO sales_overall
  SELECT * FROM sales_activity_direct
  LOG ERRORS INTO err$_sales_overall ( 'load_20040802' ) REJECT LIMIT UNLIMITED;
commit;


PROMPT finally ...
select count(*) from sales_overall;
select count(*) from err$_sales_overall;

 

7.

There are quite a large number of errors! Look into the error logging table to get a better understanding of the errors that have occurred. Run the script sel_elt_1.sql.

@c:\wkdir\sel_elt_1.sql

PROMPT Please recognize the subtle difference between ERROR MESSAGE ONLY and -
ERROR MESSAGE TEXT


Rem Therefore we enforce to store both
set linesize 80

select distinct ora_err_number$ from err$_sales_overall;
select distinct ora_err_number$, ora_err_mesg$ from err$_sales_overall;



Note the subtle difference between the Oracle error number and the Oracle error message text. In many cases, the message text provides additional information that helps in analyzing the problem, which is why both are included as mandatory control columns of an error logging table.

 

8.

As of today, there are some limitations of the error logging capability. All of them involve situations where index maintenance is deferred as an optimization. These limitations will be addressed in a future release of Oracle. Run the script elt_limit.sql.

@c:\wkdir\elt_limit.sql
PROMPT discuss a case with limitation
truncate table sales_overall;
alter table sales_overall add constraint pk_1 primary key (prod_id, cust_id,time_id,
channel_id, promo_id);

PROMPT works fine as before, errors get re-routed
INSERT /*+ APPEND NOLOGGING PARALLEL */ INTO sales_overall
  SELECT * FROM sales_activity_direct
  LOG ERRORS INTO err$_sales_overall ( 'load_20040802' ) REJECT LIMIT UNLIMITED;
commit;

select count(*) from sales_overall;
select count(*) from err$_sales_overall;
delete from err$_sales_overall;
commit;
PROMPT case when deferred constraint check (UNIQUE INDEX) leads to error
Rem unique index maintenance is a delayed operation that cannot be caught
Rem on a per record  base as of today. Planned for a future release.
INSERT /*+ APPEND NOLOGGING PARALLEL */ INTO sales_overall
  SELECT * FROM sales_activity_direct
  LOG ERRORS INTO err$_sales_overall ( 'load_20040802' ) REJECT LIMIT UNLIMITED;
commit;

select count(*) from sales_overall;
select count(*) from err$_sales_overall;
commit;





9.

DML error logging takes care of any error that happens during the DML operation itself. However, errors that happen in the SQL portion of the execution plan, before the DML operation, cannot be caught. Consider the following DML operation in the script ins_elt_4.sql.

@c:\wkdir\ins_elt_4.sql

PROMPT INTERESTING CASE.
PROMPT The DML Error Logging can only catch errors that happen at DML TIME, but not errors
PROMPT that happen as part of the SQL statement
Rem the TO_DATE conversion is in the insert portion, so that we catch the error
Rem this is default view merging of ORCL 
alter table sales_overall drop constraint pk_1;
truncate table sales_overall;
INSERT /*+ APPEND NOLOGGING PARALLEL */
  INTO sales_overall
    SELECT prod_id, cust_id,
           TO_DATE(time_id,'DD-mon-yyyy'),
           channel_id, promo_id, quantity_sold, amount_sold
  FROM sales_activity_direct
    LOG ERRORS INTO err$_sales_overall ( 'load_20040802' )
    REJECT LIMIT UNLIMITED;
commit;

Note that the SELECT statement applies a TO_DATE() function to the column sales_activity_direct.time_id. Oracle internally optimizes the error handling by pushing this conversion function up to the insert operation to ensure that any potential error will be caught. As you experienced earlier, the DML operation succeeds.

When you look into the plan, you will realize what kind of optimization for the error handling is taking place. Run the script xins_elt_4.sql.

@c:\wkdir\xins_elt_4.sql

SET LINESIZE 140

explain plan for
 INSERT /*+ APPEND NOLOGGING PARALLEL */
   INTO sales_overall
   SELECT * FROM
     ( SELECT prod_id, cust_id,
              TO_DATE(time_id,'DD-mon-yyyy'),
              channel_id, promo_id, quantity_sold, amount_sold
       FROM sales_activity_direct )
   LOG ERRORS INTO err$_sales_overall ( 'load_20040802' )
   REJECT LIMIT UNLIMITED;

select * from table(dbms_xplan.display(null,null,'ALL'));         

The plan shows that the TO_DATE() function is pushed up to the insert operation, represented as a 'LOAD AS SELECT' row source (it is a direct-path insert).

 

10.

Run the same DML operation again, but prevent the TO_DATE() conversion from being pushed up to the DML operation. You can accomplish this by using a view construct with a NO_MERGE hint. The equivalent SQL SELECT statement itself fails, as you can see by running the script sel_ins_elt_5.sql.

@c:\wkdir\sel_ins_elt_5.sql

PROMPT The equivalent SQL statement will fail if the predicate is artificially kept -
within a lower query block

SELECT prod_id, cust_id,
       TO_DATE(time_id,'DD-mon-yyyy'),
       channel_id, promo_id, quantity_sold, amount_sold
FROM sales_activity_direct;

The DML operation will fail with the same error. Run the script ins_elt_5.sql.

@c:\wkdir\ins_elt_5.sql
PROMPT The same is true when errors are happening on a row source level inside the -
SELECT block
PROMPT We cannot catch such an error 


truncate table sales_overall;
INSERT /*+ APPEND NOLOGGING PARALLEL */ INTO sales_overall
  SELECT * FROM
   ( SELECT /*+ NO_MERGE */ prod_id, cust_id,
            TO_DATE(time_id,'DD-mon-yyyy'),
            channel_id, promo_id, quantity_sold, amount_sold
     FROM sales_activity_direct )
  LOG ERRORS INTO err$_sales_overall ( 'load_20040802' ) 
  REJECT LIMIT UNLIMITED;
commit;

The plan shows that the conversion takes place inside the view, which is why the error cannot be caught. Run the script xins_elt_5.sql to view this behavior:

@c:\wkdir\xins_elt_5.sql

explain plan for
   INSERT /*+ APPEND NOLOGGING PARALLEL */ INTO sales_overall
     SELECT * FROM
            ( SELECT /*+ NO_MERGE */ prod_id, cust_id,
            TO_DATE(time_id,'DD-mon-yyyy'),
            channel_id, promo_id, quantity_sold, amount_sold
     FROM sales_activity_direct )
  LOG ERRORS INTO err$_sales_overall ( 'load_20040802' ) 
  REJECT LIMIT UNLIMITED;

select * from table(dbms_xplan.display(null,null,'ALL'));

Back to Topic

Back to Topic List

Experience the Basics of the Table Function

In the ETL process, the data extracted from a source system passes through a sequence of transformations before it is loaded into a data warehouse. Complex transformations are implemented in a procedural manner, either outside the database or inside the database in PL/SQL. When the results of a transformation become too large to fit into memory, they must be staged by materializing them either into database tables or flat files. This data is then read and processed as input to the next transformation in the sequence. Oracle table functions provide support for pipelined and parallel execution of such transformations implemented in PL/SQL, C, or Java.

A table function is defined as a function that can produce a set of rows as output; additionally, Oracle table functions can take a set of rows as input. These sets of rows are processed iteratively in subsets, thus enabling a pipelined mechanism to stream these subset results from one transformation to the next before the first operation has finished. Furthermore, table functions can be processed transparently in parallel, which is equivalent to SQL operations such as a table scan or a sort. Staging tables are no longer necessary.

Because table functions can be used in powerful and multifaceted ways within complex transformations, you examine only basic implementation issues for table functions here. To do this, perform the following activities:

1. Setting up the Basic Objects for a Table Function
2. Performing Nonpipelined Table Function, Returning an Array of Records
3. Performing Pipelined Table Functions
4. Performing Transparent Parallel Execution of Table Functions
5. Performing Table Functions with Autonomous DML
6. Performing Seamless Streaming Through Several Table Functions

 

1. Setting up the Basic Objects for a Table Function

A table function can produce a set of rows as output and can take a set of rows as input. Therefore, you must define a record type and a collection type for the input and output; those types can be either strongly typed, that is, well defined before execution, or weakly typed, that is, without a fixed record format before run time. You will examine both variants.

To set up the table function objects, perform the following steps:

1.

Define the object (record) type for the examples. From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@c:\wkdir\setup_tf_record.sql

SET ECHO ON
PROMPT object_types
CREATE TYPE product_t AS OBJECT (
prod_id NUMBER(6)
, prod_name VARCHAR2(50)
, prod_desc VARCHAR2(4000)
, prod_subcategory VARCHAR2(50)
, prod_subcategory_desc VARCHAR2(2000)
, prod_category VARCHAR2(50)
, prod_category_desc VARCHAR2(2000)
, prod_weight_class NUMBER(2)
, prod_unit_of_measure VARCHAR2(20)
, prod_pack_size VARCHAR2(30)
, supplier_id NUMBER(6)
, prod_status VARCHAR2(20)
, prod_list_price NUMBER(8,2)
, prod_min_price NUMBER(8,2)
);

This is the basic object record type you will use in the examples.

 

2. Define the object (collection) type for the examples. From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@c:\wkdir\setup_tf_collection.sql

CREATE TYPE product_t_table AS TABLE OF product_t;
/

This type represents the structure of the set of records that will be delivered back by your table functions.

3. Define a package for the REF CURSOR types. From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@c:\wkdir\setup_tf_package.sql

Rem package of all cursor types
Rem handle the input cursor type and the output cursor collection type


CREATE OR REPLACE PACKAGE cursor_PKG as
TYPE product_t_rec IS RECORD (prod_id NUMBER(6) 
, prod_name VARCHAR2(50)
, prod_desc VARCHAR2(4000)
, prod_subcategory VARCHAR2(50)
, prod_subcategory_desc VARCHAR2(2000)
, prod_category VARCHAR2(50)
, prod_category_desc VARCHAR2(2000)
, prod_weight_class NUMBER(2)
, prod_unit_of_measure VARCHAR2(20)
, prod_pack_size VARCHAR2(30)
, supplier_id NUMBER(6)
, prod_status VARCHAR2(20)
, prod_list_price NUMBER(8,2)
, prod_min_price NUMBER(8,2));


TYPE product_t_rectab IS TABLE OF product_t_rec;
TYPE strong_refcur_t IS REF CURSOR RETURN product_t_rec;
 TYPE refcur_t IS REF CURSOR;
END;

 

4.

Create a log table for the table function example with autonomous DML. From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@c:\wkdir\cre_ope_table.sql

CREATE TABLE obsolete_products_errors 
(prod_id NUMBER, msg VARCHAR2(2000));

This table will be used for the table function example that fans out some data as part of its execution.

Due to time constraints, only the PL/SQL implementation of table functions is covered here. For details of interface implementations in C or Java, see the Oracle Data Cartridge Developer's Guide.

Table functions return a set of records to the subsequent operation. The result set can be delivered either en bloc (the default PL/SQL behavior) or pipelined, where increments of the result set are streamed to the subsequent operation as soon as they are produced.

For table function implementations in PL/SQL, the PL/SQL engine controls the size of the incrementally returned array. For interface implementations of table functions in C or Java, it is up to the programmer to define the incremental return set.

Back to Topic

2. Performing Nonpipelined Table Function, Returning an Array of Records

The following example shows a simple table function that returns its result set nonpipelined, as a complete array. The table function returns only obsolete products and suppresses those in the product category "Electronics". Note that the input REF CURSOR is weakly typed, meaning it could be any valid SQL statement. It is up to the table function code to handle its input, possibly based on other input variables.

To perform the nonpipelined table function, you perform the following steps:

1.

From a SQL*Plus session logged on to the SH schema, run create_tf_1.sql, or copy the following SQL statements into your SQL*Plus session:

@c:\wkdir\create_tf_1.sql

PROMPT SIMPLE PASS-THROUGH, FILTERING OUT obsolete products without product_category 'Electronics'
Rem uses weakly typed cursor as input
CREATE OR REPLACE FUNCTION obsolete_products(cur cursor_pkg.refcur_t) RETURN
product_t_table
IS
prod_id NUMBER(6);
prod_name VARCHAR2(50);
prod_desc VARCHAR2(4000);
prod_subcategory VARCHAR2(50);
prod_subcategory_desc VARCHAR2(2000);
prod_category VARCHAR2(50);
prod_category_desc VARCHAR2(2000);
prod_weight_class NUMBER(2);
prod_unit_of_measure VARCHAR2(20);
prod_pack_size VARCHAR2(30);
supplier_id NUMBER(6);
prod_status VARCHAR2(20);
prod_list_price NUMBER(8,2);
prod_min_price NUMBER(8,2);
sales NUMBER:=0;
objset product_t_table := product_t_table();
i NUMBER := 0;
BEGIN
LOOP
-- Fetch from cursor variable
FETCH cur INTO prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price;
EXIT WHEN cur%NOTFOUND; -- exit when last row is fetched
-- Category Electronics is not meant to be obsolete and will be suppressed
IF prod_status='obsolete' AND prod_category != 'Electronics' THEN
-- append to collection
i:=i+1;
objset.extend;
objset(i):=product_t( prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price);
END IF;
END LOOP;
CLOSE cur;
RETURN objset;
END;
/


 
2.

You can now transparently SELECT from the defined table function obsolete_products. From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@c:\wkdir\use_tf_1.sql

SELECT DISTINCT UPPER(prod_category), prod_status
FROM TABLE(obsolete_products(
CURSOR(SELECT prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price
FROM products)));

Note that the SELECT list of the input cursor variable must match the definition of the table function's REF CURSOR parameter, in this case cursor_pkg.refcur_t. If you can ensure that the table (or view) definition matches this input cursor, you can simplify the statement by selecting * in the cursor expression, as shown in the sketch below.
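
A minimal sketch of this simplification (the view products_v is hypothetical and not part of the tutorial scripts; it projects exactly the columns of the REF CURSOR definition, because the full PRODUCTS table contains additional columns):

CREATE OR REPLACE VIEW products_v AS
SELECT prod_id, prod_name, prod_desc, prod_subcategory, prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class, prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price, prod_min_price
FROM products;

SELECT DISTINCT UPPER(prod_category), prod_status
FROM TABLE(obsolete_products(CURSOR(SELECT * FROM products_v)));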

Back to Topic

3. Performing Pipelined Table Functions

The following example performs the same filtering as the previous one. Unlike the first example, this table function returns its result set pipelined, that is, incrementally. Note that the input REF CURSOR is strongly typed here, meaning the input can only be a SQL statement that returns records of type product_t_rec.

To perform the pipelined table function, perform the following steps:

1.

From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@c:\wkdir\create_tf_2.sql

Rem same example, pipelined implementation
Rem strong ref cursor (input type is defined)
Rem a table without a strong typed input ref cursor 
Rem cannot be parallelized
CREATE OR REPLACE FUNCTION obsolete_products_pipe(cur cursor_pkg.strong_refcur_t)
RETURN product_t_table
PIPELINED
PARALLEL_ENABLE (PARTITION cur BY ANY) IS
prod_id NUMBER(6);
prod_name VARCHAR2(50);
prod_desc VARCHAR2(4000);
prod_subcategory VARCHAR2(50);
prod_subcategory_desc VARCHAR2(2000);
prod_category VARCHAR2(50);
prod_category_desc VARCHAR2(2000);
prod_weight_class NUMBER(2);
prod_unit_of_measure VARCHAR2(20);
prod_pack_size VARCHAR2(30);
supplier_id NUMBER(6);
prod_status VARCHAR2(20);
prod_list_price NUMBER(8,2);
prod_min_price NUMBER(8,2);
sales NUMBER:=0;
BEGIN
LOOP
-- Fetch from cursor variable
FETCH cur INTO prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price;
EXIT WHEN cur%NOTFOUND; -- exit when last row is fetched
IF prod_status='obsolete' AND prod_category !='Electronics' THEN
PIPE ROW (product_t( prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price));
END IF;
END LOOP;
CLOSE cur;
RETURN;
END;
/

2.

You can now transparently SELECT from the defined table function obsolete_products_pipe. From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@c:\wkdir\use_tf_2.sql

SELECT DISTINCT prod_category, DECODE(prod_status,'obsolete','NO LONGER AVAILABLE','N/A')
FROM TABLE(obsolete_products_pipe(
CURSOR(SELECT prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price
FROM products)));

 

Back to Topic

4. Performing Transparent Parallel Execution of Table Functions

Table functions can be parallelized transparently without any special intervention or setup. The only necessary declaration, which has to be done as part of the table function creation, is to tell the SQL engine if there are any restrictions or rules for parallelizing its execution.

Think of a table function that performs a complex aggregation over some product category attributes. If you want to process this operation in parallel, you have to guarantee that all records with the same product category attributes are processed by the same parallel execution slave, so that the aggregation covers all records that belong to the same group. This implies that a distribution rule must be applied during parallel processing to guarantee the correct distribution of records, as described above.

This parallel distribution rule is defined as part of the table function header, for example:

PARALLEL_ENABLE (PARTITION cur BY ANY) IS

All you need to do to enable parallel execution of a table function is to use the syntax shown.
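
As a sketch of such a distribution rule (the function category_passthrough is hypothetical and not one of the tutorial scripts; it simply pipes every input row back out, but its header guarantees that all rows of one product category are processed by the same slave):

CREATE OR REPLACE FUNCTION category_passthrough(cur cursor_pkg.strong_refcur_t)
RETURN product_t_table
PIPELINED
PARALLEL_ENABLE (PARTITION cur BY HASH (prod_category)) IS
rec cursor_pkg.product_t_rec;
BEGIN
LOOP
-- rows with the same prod_category are always fetched by the same slave
FETCH cur INTO rec;
EXIT WHEN cur%NOTFOUND;
PIPE ROW (product_t(rec.prod_id, rec.prod_name, rec.prod_desc, rec.prod_subcategory,
rec.prod_subcategory_desc, rec.prod_category, rec.prod_category_desc,
rec.prod_weight_class, rec.prod_unit_of_measure, rec.prod_pack_size,
rec.supplier_id, rec.prod_status, rec.prod_list_price, rec.prod_min_price));
END LOOP;
CLOSE cur;
RETURN;
END;
/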

To perform one of the table functions you previously defined in parallel, perform the following steps:

1.

From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@c:\wkdir\pq_session.sql

SELECT * 
FROM v$pq_sesstat
WHERE statistic in ('Queries Parallelized','Allocation Height');
2.

Now you can use one of the table functions in parallel. You enforce the parallel execution with a PARALLEL hint on the REF CURSOR. From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@c:\wkdir\parallel_tf.sql

SELECT DISTINCT prod_category,
DECODE(prod_status,'obsolete','NO LONGER AVAILABLE','N/A')
FROM TABLE(obsolete_products_pipe(
CURSOR(SELECT /*+ PARALLEL(a,4)*/
prod_id, prod_name, prod_desc, prod_subcategory, prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class, prod_unit_of_measure,
prod_pack_size, supplier_id,prod_status, prod_list_price,prod_min_price
FROM products a)));

 

3.

Rerun the SQL script to check the status of the parallelized queries:

@c:\wkdir\pq_session.sql

SELECT * 
FROM v$pq_sesstat
WHERE statistic in ('Queries Parallelized','Allocation Height');

 

Back to Topic

5. Performing Table Functions with Autonomous DML

Although a table function is part of a single atomic transaction, it provides the additional capability to fan out data to other tables within the scope of an autonomous transaction. This can be used in a variety of ways, such as for exception or progress logging or to fan out subsets of data to be used by other independent transformations. The following example logs some "exception" information into the OBSOLETE_PRODUCTS_ERRORS table:

1.

From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@c:\wkdir\create_tf_3.sql

CREATE OR REPLACE FUNCTION obsolete_products_dml(cur cursor_pkg.strong_refcur_t,
prod_cat varchar2 DEFAULT 'Electronics') RETURN product_t_table
PIPELINED
PARALLEL_ENABLE (PARTITION cur BY ANY) IS
PRAGMA AUTONOMOUS_TRANSACTION;
prod_id NUMBER(6);
prod_name VARCHAR2(50);
prod_desc VARCHAR2(4000);
prod_subcategory VARCHAR2(50);
prod_subcategory_desc VARCHAR2(2000);
prod_category VARCHAR2(50);
prod_category_desc VARCHAR2(2000);
prod_weight_class NUMBER(2);
prod_unit_of_measure VARCHAR2(20);
prod_pack_size VARCHAR2(30);
supplier_id NUMBER(6);
prod_status VARCHAR2(20);
prod_list_price NUMBER(8,2);
prod_min_price NUMBER(8,2);
sales NUMBER:=0;
BEGIN
LOOP
-- Fetch from cursor variable
FETCH cur INTO prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price;
EXIT WHEN cur%NOTFOUND; -- exit when last row is fetched
IF prod_status='obsolete' THEN
IF prod_category=prod_cat THEN
INSERT INTO obsolete_products_errors VALUES
(prod_id, 'correction: category '||UPPER(prod_cat)||' still available');
COMMIT;
ELSE
PIPE ROW (product_t( prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price));
END IF;
END IF;
END LOOP;
CLOSE cur;
RETURN;
END;
/
2.

Now you can select from this table function and check the log table. From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@c:\wkdir\use_tf_3a.sql

TRUNCATE TABLE obsolete_products_errors;
SELECT DISTINCT prod_category, prod_status FROM TABLE(obsolete_products_dml(
CURSOR(SELECT prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price
FROM products)));
SELECT DISTINCT msg FROM obsolete_products_errors;

 

3.

Select from this table function again with a different input argument and check the log table.

@c:\wkdir\use_tf_3b.sql

TRUNCATE TABLE obsolete_products_errors;
SELECT DISTINCT prod_category, prod_status FROM TABLE(obsolete_products_dml(
CURSOR(SELECT prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price
FROM products),'Photo'));
SELECT DISTINCT msg FROM obsolete_products_errors;

You can see that, depending on the input variable, the table function processes the input rows differently, thus delivering a different result set and inserting different information into OBSOLETE_PRODUCTS_ERRORS.

Back to Topic

6. Performing Seamless Streaming Through Several Table Functions

Besides the transparent usage of table functions within SQL statements and their capability of being processed in parallel, one of their biggest advantages is that table functions can be nested within each other. Furthermore, table functions can be used in any SQL statement and can become the input for all kinds of DML statements.

To see how this is done, perform the following steps:

1.

This SQL statement uses two table functions, nested in each other. From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@c:\wkdir\use_tf_stream.sql

SELECT DISTINCT prod_category, prod_status
FROM TABLE(obsolete_products_dml(CURSOR(SELECT *
FROM TABLE(obsolete_products_pipe(
CURSOR(SELECT prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price
FROM products))))));
SELECT COUNT(*) FROM obsolete_products_errors;
SELECT DISTINCT msg FROM obsolete_products_errors;
set echo off


        
2.

Now you can use this table function as INPUT for a CREATE TABLE AS SELECT command. From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@c:\wkdir\CTAS_tf.sql

CREATE TABLE PIPE_THROUGH AS
SELECT DISTINCT prod_category, prod_status
FROM TABLE(obsolete_products_dml(CURSOR(SELECT *
FROM TABLE(obsolete_products_pipe(
CURSOR(SELECT prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price
FROM products))))));

 

3.

Before you move to the next section of this tutorial, you need to clean up your environment by executing the following script:

@c:\wkdir\cleanup_tf.sql

DROP TABLE obsolete_products_errors;
DROP TABLE pipe_through;
DROP FUNCTION obsolete_products;
DROP FUNCTION obsolete_products_pipe;
DROP FUNCTION obsolete_products_dml;

 

Back to Topic

Back to Topic List

Change Data Capture

Data warehousing involves the extraction and transportation of relational data from one or more source databases into the data warehouse for analysis. Change Data Capture quickly identifies and processes only the data that has changed, not entire tables, and makes the change data available for further use.

Without Change Data Capture, database extraction is a cumbersome process in which you move the entire contents of tables into flat files, and then load the files into the data warehouse. This ad hoc approach is expensive in a number of ways.

Change Data Capture does not depend on intermediate flat files to stage the data outside of the relational database. It captures the change data resulting from INSERT, UPDATE, and DELETE operations made to user tables. The change data is then stored in a database object called a change table, and the change data is made available to applications in a controlled way.

Publish and Subscribe Model

Most Change Data Capture systems have one publisher that captures and publishes change data for any number of Oracle source tables. There can be multiple subscribers accessing the change data. Change Data Capture provides PL/SQL packages to accomplish the publish and subscribe tasks.

Publisher

The publisher is usually a database administrator (DBA) who is in charge of creating and maintaining schema objects that make up the Change Data Capture system. The publisher performs the following tasks:

Determines the relational tables (called source tables) from which the data warehouse application is interested in capturing change data.

Uses the Oracle supplied package, DBMS_LOGMNR_CDC_PUBLISH, to set up the system to capture data from one or more source tables.

Publishes the change data in the form of change tables.
Allows controlled access to subscribers by using the SQL GRANT and REVOKE statements to grant and revoke the SELECT privilege on change tables for users and roles.

Subscriber

The subscribers, usually applications, are consumers of the published change data. Subscribers subscribe to one or more sets of columns in source tables. Subscribers perform the following tasks:

Use the Oracle supplied package, DBMS_LOGMNR_CDC_SUBSCRIBE, to subscribe to source tables for controlled access to the published change data for analysis.

Extend the subscription window and create a new subscriber view when the subscriber is ready to receive a set of change data.

Use SELECT statements to retrieve change data from the subscriber views.
Drop the subscriber view and purge the subscription window when finished processing a block of changes.
Drop the subscription when the subscriber no longer needs its change data.

Please consult the ‘Oracle Data Warehousing Guide’ for a more detailed discussion about the Change Data Capture Architecture.

To learn more about this topic, perform the following steps:

1.
2.
3.
4.
5.
6.
7.
8.

Back to Topic

Back to Topic List

1. Using Synchronous CDC to Track all the Incremental Changes

You use synchronous CDC to track all incremental changes to the table PRODUCTS. (For demonstration purposes, the publisher and the subscriber will be the same database user.)

To see how this is done, perform the following steps:

1.

First create a new intermediate table where you apply all changes. From a SQL*Plus session logged on to the SH schema, run cr_cdc_target.sql, or copy the following SQL statements into your SQL*Plus session:

@c:\wkdir\cr_cdc_target.sql

CREATE TABLE my_price_change_Electronics
(prod_id number, prod_min_price number, prod_list_price number, when date);


2.

If you have never used Change Data Capture before, you will not see any CDC-related objects other than the synchronous change set SYNC_SET and the synchronous change source SYNC_SOURCE. Both of these objects are created as part of database creation. Run the script show_cdc_env1.sql to verify this.

@c:\wkdir\show_cdc_env1.sql

SELECT * FROM change_sources;
PROMPT see the change tables
Rem shouldn't show anything
SELECT * FROM change_tables;
PROMPT see the change sets SYNC_SET
SELECT decode(to_char(end_date,'dd-mon-yyyy hh24:mi:ss'),null,'No end date set.') end_date,
decode(to_char(freshness_date,'dd-mon-yyyy hh24:mi:ss'),null,'No freshness date set.') freshness_date
FROM change_sets WHERE set_name='SYNC_SET';


Back to Topic

2. Creating a Change Table

All changes are stored in change tables. Change tables have a one-to-one dependency to a source table; they consist of several fixed metadata columns and a dynamic set of columns equivalent to the identified source columns of interest.

1.

You create a change table by using the provided package DBMS_CDC_PUBLISH.

Note that the old package name DBMS_LOGMNR_CDC_PUBLISH is a synonym pointing to DBMS_CDC_PUBLISH and is maintained as-is for backward compatibility.

@c:\wkdir\cr_cdc_ct.sql

rem create a change table within the change set SYNC_SET.
Rem details on parameters see doc
PROMPT *** 10g ***
PROMPT NOTE THAT DBMS_LOGMNR_CDC_* are synonyms for the DBMS_CDC_* packages and are only around
PROMPT for backwards compatibility
begin
DBMS_CDC_PUBLISH.CREATE_CHANGE_TABLE (OWNER => 'SH', -
CHANGE_TABLE_NAME => 'PROD_price_CT',
CHANGE_SET_NAME => 'SYNC_SET',
SOURCE_SCHEMA => 'SH',
SOURCE_TABLE => 'PRODUCTS',
COLUMN_TYPE_LIST => 'prod_id number(6), prod_min_price number(8,2),
prod_list_price number(8,2)',
CAPTURE_VALUES => 'both',
RS_ID => 'y',
ROW_ID => 'n',
USER_ID => 'n',
TIMESTAMP => 'n',
OBJECT_ID => 'n',
SOURCE_COLMAP => 'y',
TARGET_COLMAP => 'y',
OPTIONS_STRING => null);
end;
/

This script creates a change table called PROD_PRICE_CT and the necessary trigger framework to track all subsequent changes on PRODUCTS. Note that the tracking is done as part of the atomic DML operation against PRODUCTS.
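
If you are curious about that trigger framework, you can optionally list the triggers that synchronous CDC generated on the source table. The exact trigger names are system-generated and will differ between installations:

Rem optional: show the system-generated CDC triggers on PRODUCTS
SELECT trigger_name, triggering_event, status
FROM user_triggers
WHERE table_name = 'PRODUCTS';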


2. Describe the change table PROD_PRICE_CT:
set linesize 120
desc prod_price_ct

See the static metadata columns as well as the columns PROD_ID, PROD_MIN_PRICE, and PROD_LIST_PRICE (equivalent to source table PRODUCTS).

 

3.

To see the metadata of the published source table and the change tables in the system, issue the following SQL commands:

@c:\wkdir\show_cdc_env2.sql

prompt see the published source tables
SELECT * FROM dba_source_tables;
prompt see published source columns
SELECT source_schema_name,source_table_name, column_name
FROM dba_source_tab_columns;
prompt see the change tables
SELECT * FROM change_tables;


Back to Topic

3. Subscribing to a Change Set and to All Source Table Columns of Interest

You can subscribe to a change set for controlled access to the published change data for analysis. Note that the logical entity for a subscription is a change set, not an individual change table. A change set can consist of several change tables and guarantees logical consistency for all of its change tables. After subscribing to a change set, you subscribe to all source table columns of interest.

1.

You must create a uniquely named subscription that is used throughout this exercise; in a later step you tell the system which source columns you are interested in. You are using the existing change set SYNC_SET. This functionality is provided by the package DBMS_CDC_SUBSCRIBE.

@c:\wkdir\subs_cdc_ct.sql

variable subname varchar2(30)
begin
:subname := 'my_subscription_no_1';
DBMS_CDC_SUBSCRIBE.CREATE_SUBSCRIPTION (
CHANGE_SET_NAME => 'SYNC_SET',
DESCRIPTION => 'Change data PRODUCTS for ELECTRONICS',
SUBSCRIPTION_name => :subname);
END;
/
PRINT subname


2. You can query the meta information about your subscription.

@c:\wkdir\show_cdc_env3.sql

SELECT handle, description,
decode(status,'N','Not activated.','Activated.') status,
earliest_scn, latest_scn, decode(last_purged,null,'Never purged.',
to_char(last_purged,'dd-mon-yyyy hh24:mi:ss')) last_purged,
decode(last_extended, null,'Never extended.',
to_char(last_extended,'dd-mon-yyyy hh24:mi:ss')) last_extended
FROM
user_subscriptions;
SELECT * FROM user_subscribed_tables;
SELECT source_table_name, column_name 
FROM dba_subscribed_columns;

 

3.

After creating a subscription, you set up a subscriber view for this subscription. The handling of subscriber views was enhanced in Oracle Database 10g: you can name the subscriber view explicitly, and you no longer have to drop and re-create it to extend or purge the subscription window.

@c:\wkdir\cr_cdc_rv.sql

rem now subscribe to a single source table and columns of interest.
rem A subscription can contain one or more tables from the same change set.
PROMPT ***10g***
Rem use new interface to name subscriber view explicitly !!
variable view_name varchar2(30);
BEGIN
:view_name := 'my_prod_price_change_view';
DBMS_CDC_SUBSCRIBE.SUBSCRIBE (
SUBSCRIPTION_NAME => :subname,
SOURCE_SCHEMA => 'sh',
SOURCE_TABLE => 'products',
COLUMN_LIST => 'prod_id, prod_min_price, prod_list_price',
subscriber_view => :view_name );
END;
/
PROMPT ***10g***
Rem The view is created automatically
desc MY_PROD_PRICE_CHANGE_VIEW


So far, you have accomplished the necessary steps to set up a CDC environment:

On the publisher side, you prepared a source system for CDC.

On the subscriber side, you identified and subscribed to all source tables (and columns) of interest and created your subscriber view.

Now you are ready to use CDC.

Back to Topic

4. Activating a Subscription and Extending the Subscription Window

The first step after setting up the publish and subscribe framework is to activate a subscription and to extend the subscription window to see all changes that have taken place since the last subscription window extension or since the activation of a subscription.

1.

Activate the subscription and take a look at the subscription metadata. (The subscription window is extended in a later step.)

@c:\wkdir\sub_cdc1.sql

rem now activate the subscription since we are ready to receive 
rem change data the ACTIVATE_SUBSCRIPTION procedure sets
rem subscription window to empty initially
rem At this point, no additional source tables can be added to the
rem subscription
EXEC DBMS_LOGMNR_CDC_SUBSCRIBE.ACTIVATE_SUBSCRIPTION -
(SUBSCRIPTION_name => 'my_subscription_no_1')
rem now recheck the subscriptions and see that it is active
rem the subscription window is still empty
SELECT handle, description,
decode(status,'N','Not activated.','Activated.') status, earliest_scn, latest_scn,
decode(last_purged,null,'Never purged.',
to_char(last_purged,'dd-mon-yyyy hh24:mi:ss')) last_purged,
decode(last_extended, null,'Never extended.',
to_char(last_extended,'dd-mon-yyyy hh24:mi:ss')) last_extended
from user_subscriptions;

2. Any changes you apply now to the source table PRODUCTS are reflected in the change table. The changes are transparently maintained with triggers on the source table.

@c:\wkdir\dml_cdc1.sql

Rem now do some changes

UPDATE products
SET prod_list_price=prod_list_price*1.1
WHERE prod_min_price > 100;
COMMIT;
Rem you will see entries in the change table
Rem note that we have entries for the old and the new values
SELECT Count(*) FROM prod_price_ct;

Note that you have two records for each changed source row: one representing the old values and the other the new values for the subscribed columns.
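
For a quick check of this pairing, you can optionally break the count down by the OPERATION$ control column of the change table (this query is for illustration only; as noted below, applications should always consume changes through the subscriber view). For an update, rows marked 'UO' typically carry the old values and rows marked 'UN' the new values:

Rem illustration only -- count change records per operation type
SELECT operation$, COUNT(*)
FROM prod_price_ct
GROUP BY operation$;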

Again, please never use the change table itself for identifying the changes on a source table. Use the subscriber view that will be created for you. This is the only supported way to guarantee that every change is delivered only once to a subscribing application.

3.

Unlike the change table, the subscriber view does not show any records yet. This is because the subscription window was empty when the subscription was activated and has not yet been extended to include the changes you just made. Consequently, you do not see anything yet.

Rem the view does not show anything
select count(*) from MY_PROD_PRICE_CHANGE_VIEW;

You now prepare to consume the changes by extending the subscription window.

@c:\wkdir\ext_cdc_sub1.sql

rem now set upper boundary (high-water mark) for the subscription window
rem At this point, the subscriber has created a new window that begins
rem where the previous window ends.
rem The new window contains any data that was added to the change table.
EXEC DBMS_CDC_SUBSCRIBE.EXTEND_WINDOW -
(SUBSCRIPTION_NAME => 'my_subscription_no_1');
rem now recheck the subscriptions and see that it is active
SELECT handle, description,
decode(status,'N','Not activated.','Activated.') status,
earliest_scn, latest_scn,
decode(last_purged,null,'Never purged.',
to_char(last_purged,'dd-mon-yyyy hh24:mi:ss')) last_purged,
decode(last_extended, null,'Never extended.',
to_char(last_extended,'dd-mon-yyyy hh24:mi:ss')) last_extended
from user_subscriptions;
Rem ... and you will see data
SELECT count(*) FROM my_prod_price_change_view;


4.

You can now select from the subscriber view, and you will see only the changes that fall within your subscription window.

@c:\wkdir\sel_cdc_cv1.sql

Rem changes classified for specific product groups
SELECT p1.prod_id, p2.prod_category, p1.prod_min_price,
p1.prod_list_price, commit_timestamp$
FROM my_prod_price_change_view p1, products p2
WHERE p1.prod_id=p2.prod_id
AND operation$='UN';
PROMPT and especially the Electronics' ones - 3 records only
SELECT p1.prod_id, p2.prod_category, p1.prod_min_price,
p1.prod_list_price, commit_timestamp$
FROM my_prod_price_change_view p1, products p2
WHERE p1.prod_id=p2.prod_id
AND operation$='UN'
AND p2.prod_category='Electronics';

In the example shown above, you are joining back to the source table PRODUCTS. When you set up a CDC environment, try to ensure that such an operation is not necessary; if possible, a change table should be usable as a stand-alone source for synchronizing your target environment. Joining back to the source table may reduce the data stored on disk, but it creates additional workload on the source site, and it can only be performed on the source system itself, not on a different system.
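
One way to avoid the join back in this example would be to publish the PROD_CATEGORY column as well when creating the change table. The following variant is a sketch only and is not part of this exercise; the change table name is hypothetical, and it assumes PROD_CATEGORY is a VARCHAR2(50) column in PRODUCTS (check the actual datatype and length in your schema):

Rem sketch only -- publish prod_category too, so no join back to PRODUCTS is needed
begin
DBMS_CDC_PUBLISH.CREATE_CHANGE_TABLE (OWNER => 'SH',
CHANGE_TABLE_NAME => 'PROD_price_cat_CT',
CHANGE_SET_NAME => 'SYNC_SET',
SOURCE_SCHEMA => 'SH',
SOURCE_TABLE => 'PRODUCTS',
COLUMN_TYPE_LIST => 'prod_id number(6), prod_category varchar2(50),
prod_min_price number(8,2), prod_list_price number(8,2)',
CAPTURE_VALUES => 'both',
RS_ID => 'y',
ROW_ID => 'n',
USER_ID => 'n',
TIMESTAMP => 'n',
OBJECT_ID => 'n',
SOURCE_COLMAP => 'y',
TARGET_COLMAP => 'y',
OPTIONS_STRING => null);
end;
/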


5.

You can now consume the changes on your target system.

@c:\wkdir\consume_cdc1.sql

Prompt ... you can now consume the changes
rem arbitrary example, in which we only track ELECTRONICS changes. This demonstrates
rem the flexibility, as well as the responsibility of the client side (consumer), to deal
rem appropriately with the changes
INSERT into my_price_change_electronics
SELECT p1.prod_id, p1.prod_min_price, p1.prod_list_price, commit_timestamp$
FROM my_prod_price_change_view p1, products p2
WHERE p1.prod_id=p2.prod_id
AND p2.prod_category='Electronics' AND operation$='UN';
COMMIT;
SELECT prod_id, prod_min_price, prod_list_price,
to_char(when,'dd-mon-yyyy hh24:mi:ss')
FROM my_price_change_electronics;


Back to Topic

5. Purging and Extending the Subscription Window to Consume New Changes

So far, you have performed your first consumption of change data with the synchronous CDC framework. Now investigate how to handle such an environment over time, as additional changes occur and older changes have already been consumed by all subscribers.

1.

Other DML operations take place on the source table PRODUCTS.

@c:\wkdir\dml_cdc2.sql

PROMPT other changes will happen
UPDATE products
SET prod_min_price=prod_min_price*1.1
WHERE prod_min_price < 10;
COMMIT;

2. The synchronous CDC framework tracks these DML operations transparently in the change table.

@c:\wkdir\show_cdc_ct2.sql

SELECT count(*) FROM prod_price_ct ;
PROMPT and especially the ELECTRONICS' ones
SELECT COUNT(*) FROM prod_price_ct p1, products p2
WHERE p1.prod_id=p2.prod_id AND p2.prod_category='Electronics';

 

3.

Run the following script to view the change data again. Note that the subscriber view still shows only the changes from the current (already consumed) subscription window; the new changes become visible only after you purge and extend the window in the following steps.

@c:\wkdir\sel_cdc_cv1.sql

Rem changes classified for specific product groups
SELECT p1.prod_id, p2.prod_category, p1.prod_min_price,
p1.prod_list_price, commit_timestamp$
FROM my_prod_price_change_view p1, products p2
WHERE p1.prod_id=p2.prod_id
AND operation$='UN';
PROMPT and especially the Electronics' ones - 3 records only
SELECT p1.prod_id, p2.prod_category, p1.prod_min_price,
p1.prod_list_price, commit_timestamp$
FROM my_prod_price_change_view p1, products p2
WHERE p1.prod_id=p2.prod_id
AND operation$='UN'
AND p2.prod_category='Electronics';


4.

To tell the CDC framework that you are done with a set of changes, you purge the subscription window. (In the next step, you extend the window again to pick up the new changes.)

@c:\wkdir\purge_cdc_sub_window1.sql

PROMPT purge old data from the subscription window
Rem this will NOT delete any records from the change table. It just tells the CDC system
Rem that the change data consumed so far is no longer needed
EXEC DBMS_CDC_SUBSCRIBE.PURGE_WINDOW -
(SUBSCRIPTION_name => 'my_subscription_no_1');
PROMPT still the same number of records in the change table
PROMPT REMEMBER THE NUMBER OF ROWS !!!
SELECT COUNT(*) FROM prod_price_ct;
PROMPT ... change view is empty
SELECT COUNT(*) FROM my_prod_price_change_view p1, products p2
WHERE p1.prod_id=p2.prod_id AND p2.prod_category='Electronics';


5.

To get the new changes reflected in the subscriber view, you have to extend the subscription window.

@c:\wkdir\ext_cdc_sub_window1.sql

PROMPT let's get the new change
Rem 'do it again Sam'
Rem first extend the window you want to see
EXEC DBMS_CDC_SUBSCRIBE.EXTEND_WINDOW -
(SUBSCRIPTION_name => 'my_subscription_no_1');

Rem ... now you will see exactly the new changed data since the last consumption
SELECT COUNT(*) FROM my_prod_price_change_view p1, products p2
WHERE p1.prod_id=p2.prod_id AND p2.prod_category='Electronics';

The subscriber view now shows exactly the changes that have not yet been consumed. This is different from the content of the change table, which still contains all tracked changes.


6.

Now consume the new changes. Because the incremental changes you are interested in are stored in a change table (a "normal" Oracle table), you can use any language or SQL construct the database supports.

@c:\wkdir\consume_cdc2.sql

PROMPT the new changes will be used
MERGE INTO my_price_change_Electronics t
USING (
SELECT p1.prod_id, p1.prod_min_price, p1.prod_list_price, commit_timestamp$
FROM my_prod_price_change_view p1, products p2
WHERE p1.prod_id=p2.prod_id AND p2.prod_category='Electronics' AND
operation$='UN') cv
ON (cv.prod_id = t.prod_id)
WHEN MATCHED THEN
UPDATE SET t.prod_min_price=cv.prod_min_price,
t.prod_list_price=cv.prod_list_price
WHEN NOT MATCHED THEN
INSERT VALUES (cv.prod_id, cv.prod_min_price, cv.prod_list_price,
commit_timestamp$);
COMMIT;
rem look at them
SELECT prod_id, prod_min_price, prod_list_price,
to_char(when,'dd-mon-yyyy hh24:mi:ss')
FROM my_price_change_electronics;


Back to Topic

6. Running the Publisher

The Publisher is responsible for maintaining the CDC framework and guaranteeing that the change tables are purged regularly as soon as all subscribers have consumed a specific set of change information.
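
In a production environment, the publisher would typically automate this housekeeping. As a sketch only (the job name and schedule shown here are arbitrary and not part of this exercise), a nightly purge of the change table could be scheduled with DBMS_SCHEDULER:

Rem sketch only -- schedule a nightly purge of the change table
BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'purge_prod_price_ct_job',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN DBMS_CDC_PUBLISH.PURGE_CHANGE_TABLE(''sh'',''prod_price_ct''); END;',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=DAILY;BYHOUR=2',
    enabled         => TRUE);
END;
/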

1.

From a SQL*Plus session logged on to the SH schema, run purge_cdc_ct.sql, or copy the following SQL statements into your SQL*Plus session:

@c:\wkdir\purge_cdc_ct.sql

PROMPT what to do to avoid overflow of change table ? PURGE IT !


Rem only the rows where we have no potential subscribers will be purged !!!
Rem we guarantee to keep all rows as long as we have subscribers which haven't
rem purged their window ...
exec DBMS_CDC_PUBLISH.purge_change_table('sh','prod_price_ct')
PROMPT REMEMBER THE NUMBER OF ROWS !!!
Rem you will have 18 entries for the last updated 9 records
SELECT COUNT(*) FROM prod_price_ct;
PROMPT this is exactly twice the number of changes we made with the second DML operation
SELECT COUNT(*) FROM prod_price_ct p1, products p2
WHERE p1.prod_id=p2.prod_id AND p2.prod_category='Electronics';

Although you have not specified any particular window for the purge operation on the change table, the CDC framework ensures that only those changes are purged that are no longer needed by any subscriber.


Back to Topic

7. Dropping the Used Change View and Purging the Subscription Window

1.

Because you have already consumed the second set of changes, you can purge your subscription window; with the Oracle Database 10g interface, you no longer need to drop the subscriber view first. You will also purge the change table again.

@c:\wkdir\purge_cdc_sub_window2.sql

PROMPT ... purge the newly consumed data from the subscription window (2nd DML operation)
EXEC DBMS_CDC_SUBSCRIBE.PURGE_WINDOW -
(SUBSCRIPTION_name =>'my_subscription_no_1');
PROMPT purge all change tables again
EXEC DBMS_CDC_PUBLISH.purge_change_table('sh','prod_price_ct')
PROMPT ... and you will see an empty change table
SELECT * FROM prod_price_ct;


The change table is now empty: all changes have been consumed and could therefore be purged.

 

Back to Topic

8. Cleaning Up the CDC Environment

Clean up the CDC environment. To do this, perform the following step:

1.

From a SQL*Plus session logged on to the SH schema, run cleanup_cdc.sql, or copy the following SQL statements into your SQL*Plus session:

@c:\wkdir\cleanup_cdc.sql

PROMPT CLEAN UP
exec DBMS_CDC_SUBSCRIBE.drop_subscription -
(subscription_name=> 'my_subscription_no_1');
exec DBMS_CDC_PUBLISH.DROP_CHANGE_TABLE (OWNER => 'sh', -
CHANGE_TABLE_NAME => 'prod_price_CT', -
FORCE_FLAG => 'Y');
DROP TABLE my_price_change_electronics;
UPDATE products SET prod_list_price=prod_list_price/1.1
WHERE prod_min_price > 100;
Rem the second update raised prod_min_price by 10 percent; reverse it (approximately)
UPDATE products SET prod_min_price=prod_min_price/1.1
WHERE prod_min_price < 11;
COMMIT;


Back to Topic

Back to Topic List

Besides its central data warehouse, MyCompany is running several small divisional data marts. For example, the products department wants to receive all transactional SALES data, partitioned by its main product categories, for marketing campaign analysis; only the SALES data of fiscal year 2000 is relevant. You will address this business problem by using Oracle's Transportable Tablespace capabilities and Oracle's LIST partitioning. Furthermore, to guarantee successful completion of the creation of this new table, you run the statement in RESUMABLE mode, thus ensuring that a space problem will not cause the creation process to fail.

Note: To demonstrate the benefit of resumable statements, you need TWO sessions and therefore TWO windows. Please read the following section CAREFULLY before taking the exercise.

To propagate from a data warehouse to a data mart, you perform the following steps:

1. Enabling a RESUMABLE Session
2. Creating a New Tablespace as Potential Transportable Tablespace
3. Creating a LIST Partitioned Table in the New Tablespace
4. Leveraging the New RESUMABLE Statement Capabilities for Efficient Error Detection and Handling
5. Creating a Range-List Partitioned Table in the Transfer Tablespace
6. Preparing the Metadata Export for a Transportable Tablespace

Back to Topic List

1. Enabling a RESUMABLE Session

With resumable statements, Oracle provides the ability to suspend and resume the execution of large database operations in the event of a repairable failure. Currently, the types of errors from which a statement can resume are space-limit and out-of-space errors. When an operation suspends, you have the opportunity to take corrective steps to resolve the error condition. Alternatively, a procedure can be registered to automate the error correction, as sketched below. After the error condition is resolved, the suspended statement automatically resumes and continues operation. If the error is not corrected within an optionally specified suspension time limit, the statement finally fails. You will use manual error correction for this exercise.
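
One common way to automate the correction is a database-level AFTER SUSPEND trigger combined with the DBMS_RESUMABLE package. The following is a minimal sketch only; it requires appropriate DBA privileges and is not part of this exercise:

Rem sketch only -- fires whenever a resumable statement suspends
CREATE OR REPLACE TRIGGER resumable_default_timeout
AFTER SUSPEND ON DATABASE
BEGIN
  -- here you could query DBA_RESUMABLE, notify an operator, or add space;
  -- this sketch simply gives the suspended statement one hour to be repaired
  DBMS_RESUMABLE.SET_TIMEOUT(3600);
END;
/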

To see how this is done, perform the following step:

1.

Enable resumable mode for your session. From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@c:\wkdir\set_resumable.sql

PROMPT enable resumable mode for this session
ALTER SESSION ENABLE RESUMABLE TIMEOUT 1200 NAME 
'create list partitioning';

This puts your session into resumable mode, gives the resumable operation a name, and sets a maximum suspension time of 1200 seconds.


Back to Topic

2. Creating a New Tablespace as Potential Transportable Tablespace

Now create an additional tablespace for storing your new LIST partitioned fact table. To do this, perform the following steps:

1.

First, make sure that the tablespace MY_OBE_TRANSFER does not already exist. From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@c:\wkdir\drop_TS.sql

DROP TABLESPACE my_obe_transfer INCLUDING CONTENTS AND DATAFILES;

The tablespace is dropped. If you get an ORA-00959 message, you can ignore it; it simply means that the tablespace did not exist.

2. From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

NOTE: If you are using a different drive or path for your data files, the script will need to be edited accordingly.

@c:\wkdir\create_ts.sql

CREATE TABLESPACE my_obe_transfer DATAFILE 'c:\obetemp' SIZE 2M REUSE autoextend off;

The tablespace is intentionally defined too small for the target table, in order to force a space error during table creation.

 

Back to Topic

 

3. Creating a LIST Partitioned Table in the New Tablespace

In Window 1, you create a LIST partitioned table with a CREATE TABLE AS SELECT command in the newly created tablespace MY_OBE_TRANSFER.

1.

From a SQL*Plus session logged on to the SH schema, run create_list_part_table.sql.

NOTE: This statement will appear to hang because of space problems in the MY_OBE_TRANSFER tablespace, and the suspension will show up in the alert log. When it hangs, move to the next step.

@c:\wkdir\create_list_part_table.sql

DROP TABLE sales_prod_dept;
PROMPT create table in new TS that is too small
CREATE TABLE sales_prod_dept
(prod_category, prod_subcategory,cust_id, 
time_id,channel_id,promo_id, quantity_sold, amount_sold
) NOLOGGING TABLESPACE my_obe_transfer
PARTITION BY LIST (prod_category)
(PARTITION electronic_sales values ('Electronics'),
PARTITION hardware_sales values ('Hardware'),
PARTITION sw_other_sales values ('Software/Other'),
PARTITION p_and_a values ('Peripherals and Accessories'),
PARTITION photo_sales values ('Photo')
)
AS
SELECT p.prod_category, p.prod_subcategory, s.cust_id, s.time_id,s.channel_id,
s.promo_id, SUM(s.quantity_sold) quantity_sold, SUM(s.amount_sold) amount_sold
FROM sales s, products p, times t
WHERE p.prod_id=s.prod_id
AND s.time_id = t.time_id
AND t.fiscal_year=2000
GROUP BY prod_category, prod_subcategory,cust_id, s.time_id,channel_id, promo_id
;


Back to Topic

4. Leveraging the New RESUMABLE Statement Capabilities for Efficient Error Detection and Handling

While the CREATE TABLE AS SELECT is running in Window 1, you can monitor the status of all sessions running in RESUMABLE mode from Window 2.

1.

From a SQL*Plus session logged on to the SH schema, execute the following SQL statement:

@c:\wkdir\run_sel_w2.sql

SELECT NAME, STATUS, ERROR_MSG FROM dba_resumable;

As long as no error occurs, the session stays in a normal status. As soon as an error occurs, the session status changes and the ORA- error causing the problem is displayed. The current statement in the RESUMABLE session is suspended until either (a) the problem is fixed or (b) the timeout limit is reached. The error also shows up in the alert.log file of the running instance.
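
If you want more detail while the statement is suspended, DBA_RESUMABLE exposes additional columns. For example, you could run the following query in Window 2 (optional):

Rem optional: more detail about the suspended statement
SELECT name, status, error_number, suspend_time, sql_text
FROM dba_resumable;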


2. Now fix the problem manually by allowing the datafile to autoextend. From a SQL*Plus session logged on as SYSTEM, execute the following SQL script:

@c:\wkdir\fix_ts.sql

ALTER DATABASE DATAFILE 'c:\obetemp' AUTOEXTEND ON NEXT 5M;

The datafile can now be autoextended by the database itself without any further interaction.

 

3.

As soon as the error is fixed, the suspended session automatically resumes and continues processing. Check the data dictionary view for change of status. From a SQL*Plus session logged on to the SH schema, check the status again:

@c:\wkdir\run_sel_w2.sql

SELECT NAME, STATUS, ERROR_MSG FROM dba_resumable;

The suspended statement resumes in the previously hung window (Window 1).


Back to Topic

5. Creating a Range-List Partitioned Table in the Transfer Tablespace

With the additional RANGE-LIST partitioning strategy, you can use a composite partitioning method to subdivide a partitioned table based on two logical attributes. The main partitioning strategy is the well-known and most widely used range partitioning strategy, which you find in nearly every data warehouse implementation dealing with rolling window operations. In addition, every range partition is subpartitioned based on a list partitioning strategy, enabling a finer partitioning granularity for addressing your business needs.

A typical example leveraging range-list partitioning is a global retail environment, with time-based range partitions (the rolling window) and region-oriented list subpartitions underneath, so that each time window can be maintained for a specific region independently of the others, as sketched below.
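
To make the retail example concrete, such a table could look like the following sketch. The table and its region values are purely hypothetical and are not used anywhere else in this tutorial:

Rem sketch only -- hypothetical range-list table: time ranges with region list subpartitions
CREATE TABLE retail_sales_example
(sale_date date, region varchar2(20), amount_sold number)
PARTITION BY RANGE (sale_date)
SUBPARTITION BY LIST (region)
SUBPARTITION TEMPLATE
( SUBPARTITION europe values ('EUROPE'),
SUBPARTITION americas values ('AMERICAS'),
SUBPARTITION asia values ('ASIA'),
SUBPARTITION other_regions values (DEFAULT)
)
(PARTITION sales_q1_2000 VALUES LESS THAN (TO_DATE('01-APR-2000','DD-MON-YYYY')),
PARTITION sales_q2_2000 VALUES LESS THAN (TO_DATE('01-JUL-2000','DD-MON-YYYY'))
);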

List partitioned tables enable you to group sets of distinct, unrelated values together in one partition. Consequently, any value not covered by an existing partition definition raises an Oracle error. Although this enforces a business constraint (you should not see a non-existent region in your data warehouse), you cannot always be sure that no such violations will occur. Oracle therefore introduced the capability to create a DEFAULT list partition, a kind of catch-all partition for all undefined partition key values.

Oracle has also introduced the so-called subpartition template technology, a common optional element of both range-hash and range-list composite partitioning. The template provides an easy and convenient way to define default subpartitions for each table partition. Oracle will create these default subpartitions in any partition for which you do not explicitly define subpartitions. This clause is useful for creating symmetric partitions.

In this section, you create a Range-List partitioned table, leveraging all of the functionality mentioned above.

1.

From a SQL*Plus session logged on to the SH schema, run create_new_range_list.sql, or copy the following SQL statements into your SQL*Plus session:

@c:\wkdir\create_new_range_list.sql

CREATE TABLE sales_rlp
COMPRESS
TABLESPACE MY_OBE_TRANSFER
PARTITION BY RANGE (time_id)
SUBPARTITION BY LIST (channel_id)
SUBPARTITION TEMPLATE
( SUBPARTITION direct values (3),
SUBPARTITION internet values (4),
SUBPARTITION partner values (2),
SUBPARTITION other values (DEFAULT)
)
(PARTITION SALES_before_1999 VALUES LESS THAN (TO_DATE('01-JAN-1999','DD-MON-YYYY')),
PARTITION SALES_Q1_1999 VALUES LESS THAN (TO_DATE('01-APR-1999','DD-MON-YYYY')),
PARTITION SALES_Q2_1999 VALUES LESS THAN (TO_DATE('01-JUL-1999','DD-MON-YYYY')),
PARTITION SALES_Q3_1999 VALUES LESS THAN (TO_DATE('01-OCT-1999','DD-MON-YYYY')),
PARTITION SALES_Q4_1999 VALUES LESS THAN (TO_DATE('01-JAN-2000','DD-MON-YYYY')),
PARTITION SALES_Q1_2000 VALUES LESS THAN (TO_DATE('01-APR-2000','DD-MON-YYYY')),
PARTITION SALES_Q2_2000 VALUES LESS THAN (TO_DATE('01-JUL-2000','DD-MON-YYYY')),
PARTITION SALES_Q3_2000 VALUES LESS THAN (TO_DATE('01-OCT-2000','DD-MON-YYYY')),
PARTITION SALES_Q4_2000 VALUES LESS THAN (MAXVALUE) NOCOMPRESS)
AS
SELECT * FROM sales sample(10);

This example also shows how you can easily create compressed and uncompressed partitions as part of an initial table creation. The partition and template names are inherited by all the subpartitions, thus making the naming and identification of subpartitions much easier and more convenient.
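
You can optionally verify which partitions were created compressed and which were not by querying the data dictionary:

Rem optional: check the compression attribute of each partition
SELECT partition_name, compression
FROM user_tab_partitions
WHERE table_name='SALES_RLP';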

 

2. From a SQL*Plus session logged on to the SH schema, run show_range_list_names.sql, or copy the following SQL statements into your SQL*Plus session:

@c:\wkdir\show_range_list_names.sql

PROMPT as you can see, the partition and template name is inherited to all the subpartitions,
PROMPT thus making the naming - and identification - of subpartitions much easier ...
select partition_name, subpartition_name, high_value from user_tab_subpartitions
where table_name='SALES_RLP';

List and range-list partitioned tables group sets of distinct, unrelated values together in one partition. Consequently, any value not covered by an existing partition definition raises an Oracle error. You will now add a DEFAULT list partition, a kind of catch-all partition for all undefined partition key values, to the previously created list partitioned table sales_prod_dept.

 

3.

From a SQL*Plus session logged on to the SH schema, run cr_default_list_part.sql, or copy the following SQL statements into your SQL*Plus session:

@c:\wkdir\cr_default_list_part.sql

PROMPT add a new partition to sales_prod_dept that does NOT have a DEFAULT partition
Rem it is added just like for a range partitioned table
ALTER TABLE sales_prod_dept ADD PARTITION gameboy_sales VALUES ('Gameboy');
PROMPT now add another one, covering the DEFAULT value
ALTER TABLE sales_prod_dept ADD PARTITION other_sales VALUES (DEFAULT);
PROMPT view the data dictionary
select partition_name, high_value from user_tab_partitions where table_name='SALES_PROD_DEPT';

All records having an undefined partitioning key will be stored in this new partition OTHER_SALES. However, having a DEFAULT partition changes the way you add new partitions to this table. Conceptually, as soon as you have created a default partition, you cover all possible values of the partitioning key: the explicitly defined ones and "all others".

To ADD a new partition to this table, you logically have to SPLIT the default partition into a new partition with a set of defined keys and a new default partition that still covers "all other values" (now reduced by the keys you have specified for the new partition). Any attempt to simply add a new partition, as you did before, raises an ORA error.


4.

From a SQL*Plus session logged on to the SH schema, run split_default_list_part.sql, or copy the following SQL statements into your SQL*Plus session:

@c:\wkdir\split_default_list_part.sql

PROMPT Unlike the first time, we cannot simply add a new partition
PROMPT raises ORA-14323: cannot add partition when DEFAULT partition exists
Rem we cannot be sure whether the new value already exists in the DEFAULT partition
ALTER TABLE sales_prod_dept ADD PARTITION undefined_sales VALUES ('Undefined');

This raises an Oracle error. Now run the following commands, or run the script split_default_list_part_b.sql:

@c:\wkdir\split_default_list_part_b.sql

PROMPT so we have to SPLIT the default partition to ensure that 
PROMPT all potential values of 'Undefined' are in the new partition

ALTER TABLE sales_prod_dept SPLIT
PARTITION other_sales VALUES ('Undefined') INTO
(PARTITION undefined_sales, PARTITION other_sales);

PROMPT check the data dictionary
PROMPT Note that without specifying any tablespace, the default
PROMPT tablespace of the partitioned table is used
select partition_name, tablespace_name, high_value
from user_tab_partitions
where table_name='SALES_PROD_DEPT';


Back to Topic

6. Preparing the Metadata Export for a Transportable Tablespace

To prepare a tablespace for transportation, you simply export the metadata of the tablespace. This export, together with a physical copy of the tablespace's datafiles, can then be used to import the tablespace into another database instance. Prior to Oracle9i, the target database had to run on the same operating system and use the same database block size. Oracle9i lifted the restriction of the same block size, and Oracle Database 10g goes one step further by introducing cross-platform (heterogeneous) transportable tablespaces. Before exporting, you can optionally verify that the tablespace set is self-contained, as sketched below.
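
The following optional checks are a sketch only and are not required for this exercise. They verify that the tablespace set is self-contained and, for a cross-platform transport, list the endian format of the supported platforms; depending on your privileges, you may need to run them as a more privileged user than SH:

Rem optional: verify that the tablespace set is self-contained
EXEC DBMS_TTS.TRANSPORT_SET_CHECK('MY_OBE_TRANSFER', TRUE);
SELECT * FROM transport_set_violations;
Rem optional: list the platforms supported for cross-platform transport
SELECT platform_name, endian_format FROM v$transportable_platform;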

To show this capability, you perform the following steps:

1.

From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@c:\wkdir\make_ts_ro.sql

ALTER TABLESPACE my_obe_transfer READ ONLY;

This guarantees that no further changes can happen on the data in the tablespace.


2. You can now export the data dictionary information of tablespace my_obe_transfer as follows:

@c:\wkdir\export_metadata.sql

Optionally, you can create a specific directory where you want to store the export dump file:

CREATE DIRECTORY my_obe_dump_dir as 'c:\wkdir';

Now you can export the metadata only. Compare the size of the dump file with the tablespace size to get a feeling for what it would have meant to extract all the data rather than only the metadata.

host expdp \'/ as sysdba\' DIRECTORY=my_obe_dump_dir DUMPFILE=meta_MY_OBE_TRANSFER.dmp TRANSPORT_TABLESPACES=MY_OBE_TRANSFER

Note: This script may take several minutes to complete running.

Note: The same export syntax can be used at the DOS prompt by dropping the SQL*Plus "host" command.
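
For completeness, the import on a target database would be the mirror image of this export. The following is a sketch only; it assumes that the dump file and a copy of the datafile c:\obetemp have been made available to the target instance and that a directory object named my_obe_dump_dir exists there as well:

host impdp \'/ as sysdba\' DIRECTORY=my_obe_dump_dir DUMPFILE=meta_MY_OBE_TRANSFER.dmp TRANSPORT_DATAFILES='c:\obetemp'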

3.

To clean up and reset the session status before the tablespace would be imported into another database, execute the following SQL script:

@c:\wkdir\cleanup_tts.sql

DROP TABLESPACE my_obe_transfer INCLUDING CONTENTS AND DATAFILES;
ALTER SESSION DISABLE RESUMABLE;


4.

Trying to access the table sales_prod_dept now raises an error: the table no longer exists because it was dropped together with its tablespace.

@c:\wkdir\sel_spd.sql

SELECT count(*) 
from sales_prod_dept;

 

Back to Topic

Back to Topic List

To clean up your environment, you will need to perform the following step:

1.

From a SQL*Plus session logged on to the SH schema, execute the following commands:

SET SERVEROUTPUT ON
EXEC dw_handsOn.cleanup_modules


Back to Topic List

In this lesson, you learned how to:

Back to Topic List
