Legal | Privacy
Using the ETL Infrastructure Inside Oracle Database 10g

Using the ETL Infrastructure Inside Oracle Database 10g

Purpose

This tutorial covers the Extraction, Transformation, and Loading (ETL) infrastructure of Oracle Database 10g.

Approximately 1 hour

Topics

This tutorial covers the following topics:

Viewing Screenshots

Place the cursor over this icon to load and view all the screenshots for this tutorial. (Caution: Because this action loads all screenshots simultaneously, response time may be slow depending on your Internet connection.)

Note: Alternatively, you can place the cursor over each individual icon in the following steps to load and view only the screenshot associated with that step.

Overview

What Happens During the ETL Process?

What happens during the ETL process? During Extraction, the desired data has to be identified and extracted from many different sources, including database systems and applications. Very often, it’s not possible to identify the specific subset of interest, therefore more than necessary data has to be extracted, and the identification of the relevant data is done at a later point in time. Depending on the source system’s capabilities (for example, OS resources), some transformations may take place during this extraction process. The size of the extracted data varies from hundreds of kilobytes to hundreds of gigabytes, depending on the source system and the business situation. The same is true for the time delta between two (logically) identical extractions: the time span may vary between days/hours and minutes to near real-time. For example, Web server log files can easily become hundreds of megabytes in a very short period of time.

After extracting data, it has to be physically transported to the target system or an intermediate system for further processing. Depending on the chosen way of transportation, some transformations can be done during this process, too. For example, a SQL statement that directly accesses a remote target through a gateway, can concatenate two columns as part of the SELECT statement .

After extracting and transporting the data, the most challenging (and time consuming) part of ETL follows: Transformation and Loading into the target system. This may include:

Application of complex filters

Data has to be validated against information already existing in target database tables

Data extracted without the knowledge of new versus changed information has to be checked against the target objects to determine whether it must be updated or inserted

The same data has to be inserted several times as detail level and aggregated information

This should be done as quickly as possible in a scaleable manner and must not affect the existing target with respect to concurrent access for information retrieval.

Oracle offers a wide variety of capabilities to address all the issues and tasks relevant in an ETL scenario. The Oracle Database 10g is the ETL transformation engine.

Back to Topic List

Before starting this tutorial, you should have:

1.

Completed the Installing Oracle Database 10g on Windows Using Real Application Clusters (RAC) and Automated Storage Management (ASM) tutorial

OR

Completed the Installing Oracle Database 10g on Linux tutorial

 

2.

Downloaded and unzipped etl2.zip into your working directory (/home/oracle/wkdir)

 

3.

Downloaded and unzipped setup_dwh.zip into your working directory (/home/oracle/wkdir)

 

Some changes on the existing Sales History schema objects are necessary, and some additional system privileges must be granted to the user SH. The SQL file for applying those changes is modifySH_10g.sql. This file is provided to you in the setup_dwh.zip file. To use the setup files for the Data Warehousing tutorials, perform the following steps:

1.

Start a SQL*Plus session and Run the modifySH_10g.sqlscript from your SQL*Plus session.

Move your mouse over this icon to see the image

 

Back to Topic List

Becoming Familiar with the New Multi-Table Insert

MyCompany gets some nonrelational data structures from one of its partner companies, which sells our products for a special advertisement campaign. The data structure is a denormalized, nonrelational record structure from proprietary mainframe systems consisting of one record per customer and product per week. Those data structures have to be inserted into our data warehouse. Because sales record data is stored per customer and product per day, you have to transform the incoming data.

As part of the transformation, the nonrelational denormalized data structure must be transformed from one record per week into seven records, each consisting the information of a business day (first example business transformation). In addition, the data warehouse keeps track of all new customers with a credit limit above a certain limit. Those customers should be tracked in addition separately.

In this section, you will implement those business transformations by leveraging Oracle’s new multi-table insert capabilities. To do this, perform the following steps:

Use the Multi-Table Insert for Pivoting

Use the Multi-Table Insert for Conditional Insertion

Back to Topic List

Using the Multi-Table Insert for Pivoting

Before issuing the multi-table insert, you want to investigate plans for the multi-table insert and how what looks like without having this functionality. To do this, you perform the following steps:

1.

Show the execution plan for the new multi-table insert. From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@explain_mti_new.sql

DELETE
FROM PLAN_TABLE;

EXPLAIN PLAN FOR
INSERT ALL
INTO sales
VALUES(product_id, customer_id,weekly_start_date,2,9999, q_sun,sales_sun)
INTO sales
VALUES(product_id, customer_id,weekly_start_date+1,2,9999, q_mon,sales_mon)
INTO sales
VALUES(product_id, customer_id,weekly_start_date+2,2,9999, q_tue,sales_tue)
INTO sales
VALUES(product_id, customer_id,weekly_start_date+3,2,9999, q_wed,sales_wed)
INTO sales
VALUES(product_id, customer_id,weekly_start_date+4,2,9999, q_thu,sales_thu)
INTO sales
VALUES(product_id, customer_id,weekly_start_date+5,2,9999, q_fri,sales_fri)
INTO sales
VALUES(product_id, customer_id,weekly_start_date+6,2,9999, q_sat,sales_sat)
SELECT * FROM sales_input_table;

SET linesize 140
SELECT * from table(dbms_xplan.display);

Place the cursor over this icon to see the image

Note: The input source table is scanned only once! The complexity of the denormalization is handled within the several INSERT INTO branches, thus avoiding multiple scans.

 

2.

Now show the execution plan for the multi-table insert based on a UNION ALL set operation. From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@explain_mti_old.sql

DELETE FROM PLAN_TABLE;
COMMIT;

EXPLAIN PLAN FOR
INSERT INTO sales
(prod_id, cust_id, time_id, channel_id,promo_id,amount_sold,quantity_sold)
SELECT product_id, customer_id,weekly_start_date,2,9999,sales_sun,q_sun
FROM sales_input_table
UNION ALL
SELECT product_id, customer_id,weekly_start_date+1,2,9999,sales_mon,q_mon
FROM sales_input_table
UNION ALL
SELECT product_id, customer_id,weekly_start_date+2,2,9999,sales_tue,q_tue
FROM sales_input_table
UNION ALL
SELECT product_id, customer_id,weekly_start_date+3,2,9999,sales_wed,q_wed
FROM sales_input_table
UNION ALL
SELECT product_id, customer_id,weekly_start_date+4,2,9999,sales_thu,q_thu
FROM sales_input_table
UNION ALL
SELECT product_id, customer_id,weekly_start_date+5,2,9999,sales_fri,q_fri
FROM sales_input_table
UNION ALL
SELECT product_id, customer_id,weekly_start_date+6,2,9999,sales_sat,q_sat
FROM sales_input_table;

SET linesize 140
SELECT * from table(dbms_xplan.display);

COMMIT;

Place the cursor over this icon to see the image

Note: The input source table is scanned seven times! The complexity of the denormalization is handled within the several SELECT operations.

With an increasing number of input records, the superiority and the performance improvement of the new multi-table insert statementby reducing the statement to only one SCANwill become more and more obvious.

 

3.

Now, you are ready to execute the following SQL script to perform the new multi-table insert:

@do_mti.sql

set timing on
alter system flush buffer_cache;

INSERT ALL
INTO sales
VALUES(product_id, customer_id,weekly_start_date,2,9999,q_sun,sales_sun)
INTO sales
VALUES(product_id, customer_id,weekly_start_date+1,2,9999, q_mon,sales_mon)
INTO sales
VALUES(product_id, customer_id,weekly_start_date+2,2,9999, q_tue,sales_tue)
INTO sales
VALUES(product_id, customer_id,weekly_start_date+3,2,9999, q_wed,sales_wed)
INTO sales
VALUES(product_id, customer_id,weekly_start_date+4,2,9999, q_thu,sales_thu)
INTO sales
VALUES(product_id, customer_id,weekly_start_date+5,2,9999, q_fri,sales_fri)
INTO sales
VALUES(product_id, customer_id,weekly_start_date+6,2,9999, q_sat,sales_sat)
SELECT * FROM sales_input_table;
Rollback;

Place the cursor over this icon to see the image

 

4.

You can now compare the run-time statistics to the old way, using a UNION ALL pivoting approach:

@do_mti1_old.sql

set timing on
alter system flush buffer_cache;

INSERT INTO sales
(prod_id, cust_id, time_id, channel_id,promo_id,amount_sold,quantity_sold)
SELECT product_id, customer_id,weekly_start_date,2,9999,sales_sun,q_sun
FROM sales_input_table
UNION ALL
SELECT product_id, customer_id,weekly_start_date+1,2,9999,sales_mon,q_mon
FROM sales_input_table
UNION ALL
SELECT product_id, customer_id,weekly_start_date+2,2,9999,sales_tue,q_tue
FROM sales_input_table
UNION ALL
SELECT product_id, customer_id,weekly_start_date+3,2,9999,sales_wed,q_wed
FROM sales_input_table
UNION ALL
SELECT product_id, customer_id,weekly_start_date+4,2,9999,sales_thu,q_thu
FROM sales_input_table
UNION ALL
SELECT product_id, customer_id,weekly_start_date+5,2,9999,sales_fri,q_fri
FROM sales_input_table
UNION ALL
SELECT product_id, customer_id,weekly_start_date+6,2,9999,sales_sat,q_sat
FROM sales_input_table;

ROLLBACK;

Place the cursor over this icon to see the image

 

Back to Topic

Using the Multi-Table Insert for Conditional Insertion

In this multi-table insert, you insert data into several tables with different table structures based on some conditions. To do this, you perform the following steps:

1.

Create an intermediate table consisting of new information. From a SQL*Plus session logged on to the SH schema, execute the following SQL script, which contains the SQL statements from steps 1-4:

@mti2_prepare.sql

Rem create intermediate table with some records

CREATE
TABLE customers_new AS

SELECT *
FROM customers
WHERE cust_id BETWEEN 2000 AND 5000;

Place the cursor over this icon to see the image

 

2.

Disable constraints on the SALES table; necessary for step 3.

ALTER TABLE sales DISABLE CONSTRAINT sales_customer_fk;

 

3.

Delete some data from the PRODUCTS table.

DELETE FROM customers WHERE cust_id BETWEEN 2000 AND 5000;

 

4.

Create an empty table for our special promotion information.

CREATE TABLE customers_special AS
SELECT cust_id, cust_credit_limit
FROM customers
WHERE rownum > 1;

 

5.

Issue the multi-table insert into several tables with different table structures. From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@do_mti2.sql

INSERT /*+ APPEND NOLOGGING */FIRST
WHEN cust_credit_limit >= 4500 THEN
INTO customers
INTO customers_special VALUES (cust_id, cust_credit_limit)
ELSE
INTO customers
SELECT * FROM customers_new;

Place the cursor over this icon to see the image

 

6.

Now, you can see what was inserted. Execute the following SQL script:

@control_mti2.sql

SELECT
COUNT(*) FROM customers;
SELECT
COUNT(*) FROM customers_special;
SELECT
MIN(cust_credit_limit) FROM customers_special;

Place the cursor over this icon to see the image

 

7.

Before continuing you will want to reset your environment by executing the following SQL script:

@reset_mti2.sql

ALTER
TABLE sales
MODIFY
CONSTRAINT sales_customer_fk
RELY
ENABLE NOVALIDATE;

DROP TABLE customers_special;
DROP TABLE customers_new;

COMMIT;

Place the cursor over this icon to see the image

 

Back to Topic

Back to Topic List

Using the New Upsert Functionality, SQL MERGE Keyword Overview

MyCompany has to update its product information in the data warehouse periodically with data from its products database. Unfortunately, you are getting product delta information without the possibility to distinguish between new and updated information. Therefore, you have to figure this out on the data warehouse site.

To use the new SQL MERGE command to either update or insert the data conditionally, you perform the following steps:

1.

Create an external table (and directories) for the external products information.

2.

Perform an Upsert using the new SQL MERGE command.

3.

Show the execution plan of the new MERGE command.

4.

Perform an Upsert using two separate SQL commands (prior to Oracle9i functionality).

Back to Topic List

1. Creating an External Table (and Directories) for the External Products Information

1.

To create the directory, perform the following. If you have already completed High-Speed Data Load and Rolling Window Operations, the directories will already exist and you can skip this step. From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@create_directory.sql

DROP DIRECTORY data_dir;
DROP DIRECTORY log_dir;

Rem *****
Rem CREATE DIRECTORIES
Rem note that security is controlled
Rem (a) with privileges on directory (read/write)
Rem (b) with privileges on external table
Rem *****

CREATE
DIRECTORY data_dir AS '/home/oracle/wkdir/';
CREATE
DIRECTORY log_dir AS '/home/oracle/wkdir/';

 

2.

When creating an external table, you are defining two parts of information:

1.

The metadata information for the table representation inside the database

2.

The access parameter definition of how to extract the data from the external file

After creation of this meta information, the external data can be accessed from within the database, without the necessity of an initial load. To create the external table, from a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@create_external_table2.sql

Rem *****
Rem preparation for upsert
Rem file prodDelta.dat, 1.2 MB must
Rem reuse of directory STAGE_DIR
Rem access via external table
Rem *****

DROP TABLE products_delta;
CREATE TABLE products_delta
(
PROD_ID NUMBER(6),
PROD_NAME VARCHAR2(50),
PROD_DESC VARCHAR2(4000),
PROD_SUBCATEGORY VARCHAR2(50),
PROD_SUBCATEGORY_ID NUMBER,
PROD_SUBCATEGORY_DESC VARCHAR2(2000),
PROD_CATEGORY VARCHAR2(50),
PROD_CATEGORY_ID NUMBER,
PROD_CATEGORY_DESC VARCHAR2(2000),
PROD_WEIGHT_CLASS NUMBER(2),
PROD_UNIT_OF_MEASURE VARCHAR2(20),
PROD_PACK_SIZE VARCHAR2(30),
SUPPLIER_ID NUMBER(6),
PROD_STATUS VARCHAR2(20),
PROD_LIST_PRICE NUMBER(8,2),
PROD_MIN_PRICE NUMBER(8,2),
PROD_TOTAL VARCHAR2(13),
PROD_TOTAL_ID NUMBER,
PROD_SRC_ID NUMBER,
PROD_EFF_FROM DATE,
PROD_EFF_TO DATE,
PROD_VALID CHAR(1)
)
ORGANIZATION external
(
TYPE oracle_loader
DEFAULT DIRECTORY data_dir
ACCESS PARAMETERS
(
RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII
NOBADFILE
NOLOGFILE
FIELDS TERMINATED BY "|" LDRTRIM
)
location
('prodDelta.dat')
REJECT LIMIT UNLIMITED NOPARALLEL;

Place the cursor over this icon to see the image

 

Back to Topic

2. Performing an Upsert by Using the New SQL MERGE Command

To perform an upsert using the new SQL MERGE command, perform the following step:

1.

From a SQL*Plus session logged on to the SH schema, execute the following SQL script:

@do_merge_new.sql

Rem *****
Rem MERGE
Rem new and modified product information has arrived
Rem use of external table again
Rem - seemlessly within the MERGE command !!!
Rem *****

MERGE INTO products t
USING products_delta s
ON ( t.prod_id=s.prod_id )
WHEN MATCHED THEN
UPDATE SET t.prod_name=s.prod_name, t.prod_list_price=s.prod_list_price,
t.prod_min_price=s.prod_min_price
WHEN NOT MATCHED THEN
INSERT
(prod_id, prod_name, prod_desc, prod_subcategory, prod_subcategory_id,
prod_subcategory_desc, prod_category, prod_category_id, prod_category_desc,
prod_status, prod_list_price, prod_min_price, prod_total, prod_total_id,
prod_weight_class, prod_pack_size, supplier_id)
VALUES
(s.prod_id, s.prod_name, s.prod_desc, s.prod_subcategory, s.prod_subcategory_id,
s.prod_subcategory_desc, s.prod_category, s.prod_category_id,
s.prod_category_desc,
s.prod_status, s.prod_list_price, s.prod_min_price, s.prod_total,
s.prod_total_id,
s.prod_weight_class, s.prod_pack_size, s.supplier_id)
;

ROLLBACK;

Place the cursor over this icon to see the image

Note: You will ROLLBACK so that you can reissue the same upsert with two SQL statements in a subsequent operation.

 

Back to Topic

3. Showing the Execution Plan of the New MERGE Command

To look at the execution plan of the MERGE statement, you perform the following steps:

1.

From a SQL*Plus session logged on to the SH schema, run explain_merge_new.sql, or copy the following SQL statement into your SQL*Plus session:

@explain_merge_new.sql

DELETE FROM plan_table;

COMMIT;

EXPLAIN PLAN FOR
MERGE INTO products t
USING products_delta s
ON ( t.prod_id=s.prod_id )
WHEN MATCHED THEN
UPDATE SET t.prod_name=s.prod_name, t.prod_list_price=s.prod_list_price,
t.prod_min_price=s.prod_min_price
WHEN NOT MATCHED THEN
INSERT
(prod_id, prod_name, prod_desc, prod_subcategory, prod_subcategory_id,
prod_subcategory_desc, prod_category, prod_category_id, prod_category_desc,
prod_status, prod_list_price, prod_min_price, prod_total, prod_total_id,
prod_weight_class, prod_pack_size, supplier_id)
VALUES
(s.prod_id, s.prod_name, s.prod_desc, s.prod_subcategory, s.prod_subcategory_id,
s.prod_subcategory_desc, s.prod_category, s.prod_category_id,
s.prod_category_desc,
s.prod_status, s.prod_list_price, s.prod_min_price, s.prod_total,
s.prod_total_id,
s.prod_weight_class, s.prod_pack_size, s.supplier_id)
;

set linesize 140

select * from table(dbms_xplan.display);

Place the cursor over this icon to see the image

 

Back to Topic

4. Performing an Upsert by Using Two Separate SQL Commands (Before Oracle9i)

Before Oracle9i, such upsert functionality could be implemented as a combination of two SQL statements (insert plus update) or in a procedural manner. To demonstrate how this was done, perform the following steps:

1.

To leverage an updatable join view, a primary or unique key must exist on the join column. Therefore, you cannot use the external table directly. As a result, you need to create an intermediate table in the database with the appropriate unique key constraint. Run the following script:

do_merge_816.sql

CREATE TABLE prod_delta NOLOGGING AS SELECT * FROM products_delta;

CREATE UNIQUE INDEX p_uk ON prod_delta(prod_id);

UPDATE
(SELECT s.prod_id, s.prod_name sprod_name, s.prod_desc sprod_desc,
s.prod_subcategory sprod_subcategory,
s.prod_subcategory_desc sprod_subcategory_desc, s.prod_category sprod_category,
s.prod_category_desc sprod_category_desc, s.prod_status sprod_status,
s.prod_list_price sprod_list_price, s.prod_min_price sprod_min_price,
t.prod_id, t.prod_name tprod_name, t.prod_desc tprod_desc, t.prod_subcategory
tprod_subcategory,
t.prod_subcategory_desc tprod_subcategory_desc, t.prod_category tprod_category,
t.prod_category_desc tprod_category_desc, t.prod_status tprod_status,
t.prod_list_price tprod_list_price, t.prod_min_price tprod_min_price
FROM products t, prod_delta s WHERE s.prod_id=t.prod_id) JV
SET
tprod_list_price =sprod_list_price,
tprod_min_price =sprod_min_price;

INSERT INTO products
(prod_id, prod_name, prod_desc, prod_subcategory, prod_subcategory_id,
prod_subcategory_desc, prod_category, prod_category_id, prod_category_desc,
prod_status, prod_list_price, prod_min_price, prod_total, prod_total_id,
prod_weight_class, prod_pack_size, supplier_id)
SELECT prod_id, prod_name, prod_desc, prod_subcategory, prod_subcategory_id,
prod_subcategory_desc, prod_category, prod_category_id, prod_category_desc,
prod_status, prod_list_price, prod_min_price, prod_total, prod_total_id,
prod_weight_class, prod_pack_size, supplier_id
FROM prod_delta s WHERE NOT EXISTS (SELECT 1 FROM products t WHERE t.prod_id =
s.prod_id);

ROLLBACK;

You have to issue two separate SQL statements to accomplish the same functionality than the new MERGE command. To leverage the updatable join view functionality of 8.1.6, we also had to copy the external table into a real database table and create a unique index on the join column. In short, more operations, more space requirements, and more processing time.

Place the cursor over this icon to see the image

 

Back to Topic

Back to Topic List

Table Function Overview

In the ETL process, the data extracted from a source system passes through a sequence of transformations before it is loaded into a data warehouse. Complex transformations are implemented in a procedural manner, either outside the database or inside the database in PL/SQL. When the results of a transformation become too large to fit into memory, they must be staged by materializing them either into database tables or flat files. This data is then read and processed as input to the next transformation in the sequence. Oracle table functions provide support for pipelined and parallel execution of such transformations implemented in PL/SQL, C, or Java.

A table function is defined as a function that can produce a set of rows as output; additionally, Oracle table functions can take a set of rows as input. These sets of rows are processed iteratively in subsets, thus enabling a pipelined mechanism to stream these subset results from one transformation to the next before the first operation has finished. Furthermore, table functions can be processed transparently in parallel, which is equivalent to SQL operations such as a table scan or a sort. Staging tables are no longer necessary.


Examining Basic Implementation Issues for Table Functions