Place the cursor
over this icon to load and view all the screenshots for this tutorial. (Caution:
Because this action loads all screenshots simultaneously, response time may
be slow depending on your Internet connection.)
Note:
Alternatively, you can place the cursor over each individual icon in the following
steps to load and view only the screenshot associated with that step.
Overview
What Happens During the ETL Process?
What
happens during the ETL process? During Extraction,
the desired data has to be identified and extracted from many different sources,
including database systems and applications. Very often, it’s not possible
to identify the specific subset of interest, therefore more than necessary
data has to be extracted, and the identification of the relevant data is done
at a later point in time. Depending on the source system’s capabilities
(for example, OS resources), some transformations may take place during this extraction
process. The size of the extracted data varies from hundreds of kilobytes to hundreds
of gigabytes, depending on the source system and the business situation. The same
is true for the time delta between two (logically) identical extractions: the
time span may vary between days/hours and minutes to near real-time. For example, Web server
log files can easily become hundreds of megabytes in a very short
period of time.
After extracting data, it has to be physically transported to the target
system or an intermediate system for further processing. Depending on the chosen
way of transportation, some transformations can be done during this process, too.
For example, a SQL statement that directly accesses a remote target through
a gateway, can concatenate two columns as part of the SELECT statement .
After extracting and transporting the data, the most challenging (and
time consuming) part of ETL follows: Transformation
and Loading into the target system.
This may include:
Application of
complex filters
Data
has to be validated against information already existing in target database tables
Data extracted
without the knowledge of new versus changed information has to be checked against
the target objects to determine whether it must be updated or inserted
The same data
has to be inserted several times as detail level and aggregated information
This should be done as quickly as possible in a scaleable manner and must
not affect the existing target with respect to concurrent access for information
retrieval.
Oracle offers a wide variety of capabilities
to address all the issues and tasks relevant in an ETL scenario.The Oracle Database 10g is the ETL transformation engine.
Implement Schema Changes for the Sales History
Schema
Some changes
on the existing Sales History schema objects are necessary, and some additional
system privileges must be granted to the user SH. The SQL file for applying those
changes is modifySH_10g.sql. This
file is provided to you in the setup_dwh.zip file.
To use the setup files for the Data Warehousing tutorials, perform the following
steps:
1.
Start
a SQL*Plus session and Run the modifySH_10g.sqlscript from your SQL*Plus session.
MyCompany gets some nonrelational data structures from one of its partner
companies, which sells our products for a special advertisement campaign. The
data structure is a denormalized, nonrelational record structure from proprietary
mainframe systems consisting of one record per customer and product per week.
Those data structures have to be inserted into our data warehouse. Because sales
record data is stored per customer and product per day, you have to transform
the incoming data.
As part of the transformation, the nonrelational denormalized data structure
must be transformed from one record per week into seven records, each consisting
the information of a business day (first example business transformation). In
addition, the data warehouse keeps track of all new customers with a creditlimit
above a certain limit. Those customers should be tracked in addition separately.
In this section, you will implement those business transformations by leveraging
Oracle’s new multi-table insert capabilities. To do this, perform
the following steps:
Before issuing the multi-table insert, you want to investigate plans for
the multi-table insert and how what looks like without having this functionality.
To do this, you perform the following steps:
1.
Show the execution plan for the new
multi-table insert. From a SQL*Plus session logged on to the SH schema, execute
the following SQL script:
@explain_mti_new.sql
DELETE FROM PLAN_TABLE;
EXPLAIN PLAN FOR INSERT ALL INTO sales VALUES(product_id, customer_id,weekly_start_date,2,9999,
q_sun,sales_sun) INTO sales VALUES(product_id, customer_id,weekly_start_date+1,2,9999,
q_mon,sales_mon) INTO sales VALUES(product_id, customer_id,weekly_start_date+2,2,9999,
q_tue,sales_tue) INTO sales VALUES(product_id, customer_id,weekly_start_date+3,2,9999,
q_wed,sales_wed) INTO sales VALUES(product_id, customer_id,weekly_start_date+4,2,9999,
q_thu,sales_thu) INTO sales VALUES(product_id, customer_id,weekly_start_date+5,2,9999,
q_fri,sales_fri) INTO sales VALUES(product_id, customer_id,weekly_start_date+6,2,9999,
q_sat,sales_sat) SELECT * FROM sales_input_table;
SET linesize 140 SELECT * from table(dbms_xplan.display);
Note: The input
source table is scanned only once! The complexity of the denormalization is handled
within the several INSERT INTO branches, thus avoiding multiple scans.
2.
Now show the execution plan for the
multi-table insert based on a UNION ALL set operation. From a SQL*Plus session
logged on to the SH schema, execute the following SQL script:
@explain_mti_old.sql
DELETE FROM PLAN_TABLE;
COMMIT;
EXPLAIN PLAN FOR INSERT
INTO sales (prod_id, cust_id, time_id, channel_id,promo_id,amount_sold,quantity_sold)
SELECT product_id, customer_id,weekly_start_date,2,9999,sales_sun,q_sun FROM
sales_input_table UNION ALL SELECT product_id, customer_id,weekly_start_date+1,2,9999,sales_mon,q_mon
FROM sales_input_table UNION ALL SELECT product_id, customer_id,weekly_start_date+2,2,9999,sales_tue,q_tue
FROM sales_input_table UNION ALL SELECT product_id, customer_id,weekly_start_date+3,2,9999,sales_wed,q_wed
FROM sales_input_table UNION ALL SELECT product_id, customer_id,weekly_start_date+4,2,9999,sales_thu,q_thu
FROM sales_input_table UNION ALL SELECT product_id, customer_id,weekly_start_date+5,2,9999,sales_fri,q_fri
FROM sales_input_table UNION ALL SELECT product_id, customer_id,weekly_start_date+6,2,9999,sales_sat,q_sat
FROM sales_input_table;
SET linesize 140 SELECT
* from table(dbms_xplan.display);
COMMIT;
Note: The input
source table is scanned seven times! The complexity of the denormalization is
handled within the several SELECT operations.
With an increasing number of input
records, the superiority and the performance improvement of the new multi-table
insert statement—by reducing the statement to only one SCAN—will become more
and more obvious.
3.
Now, you are ready to execute the
following SQL script to perform the new multi-table insert:
@do_mti.sql
set timing on alter system
flush buffer_cache;
INSERT ALL INTO sales
VALUES(product_id, customer_id,weekly_start_date,2,9999,q_sun,sales_sun) INTO
sales VALUES(product_id, customer_id,weekly_start_date+1,2,9999, q_mon,sales_mon)
INTO sales VALUES(product_id, customer_id,weekly_start_date+2,2,9999, q_tue,sales_tue)
INTO sales VALUES(product_id, customer_id,weekly_start_date+3,2,9999, q_wed,sales_wed)
INTO sales VALUES(product_id, customer_id,weekly_start_date+4,2,9999, q_thu,sales_thu)
INTO sales VALUES(product_id, customer_id,weekly_start_date+5,2,9999, q_fri,sales_fri)
INTO sales VALUES(product_id, customer_id,weekly_start_date+6,2,9999, q_sat,sales_sat)
SELECT * FROM sales_input_table; Rollback;
4.
You can now compare the run-time statistics
to the old way, using a UNION ALL pivoting approach:
@do_mti1_old.sql
set timing on alter system
flush buffer_cache;
INSERT INTO sales (prod_id,
cust_id, time_id, channel_id,promo_id,amount_sold,quantity_sold) SELECT product_id,
customer_id,weekly_start_date,2,9999,sales_sun,q_sun FROM sales_input_table
UNION ALL SELECT product_id, customer_id,weekly_start_date+1,2,9999,sales_mon,q_mon
FROM sales_input_table UNION ALL SELECT product_id, customer_id,weekly_start_date+2,2,9999,sales_tue,q_tue
FROM sales_input_table UNION ALL SELECT product_id, customer_id,weekly_start_date+3,2,9999,sales_wed,q_wed
FROM sales_input_table UNION ALL SELECT product_id, customer_id,weekly_start_date+4,2,9999,sales_thu,q_thu
FROM sales_input_table UNION ALL SELECT product_id, customer_id,weekly_start_date+5,2,9999,sales_fri,q_fri
FROM sales_input_table UNION ALL SELECT product_id, customer_id,weekly_start_date+6,2,9999,sales_sat,q_sat
FROM sales_input_table;
Using
the Multi-Table Insert for Conditional Insertion
In this multi-table insert, you insert data into several tables with different
table structures based on some conditions. To do this, you perform the following
steps:
1.
Create an intermediate table consisting
of new information. From a SQL*Plus session logged on to the SH schema, execute
the following SQL script, which contains the SQL statements from steps 1-4:
@mti2_prepare.sql
Rem create intermediate table
with some records
CREATE TABLE customers_new
AS
SELECT * FROM customers
WHERE cust_id BETWEEN 2000 AND 5000;
2.
Disable constraints on the SALES table;
necessary for step 3.
ALTER TABLE
sales DISABLE CONSTRAINT sales_customer_fk;
3.
Delete some data from the PRODUCTS
table.
DELETE FROM
customers WHERE cust_id BETWEEN 2000 AND 5000;
4.
Create an empty table for our special
promotion information.
CREATE TABLE
customers_special AS SELECT cust_id, cust_credit_limit FROM customers
WHERE rownum > 1;
5.
Issue the multi-table insert into
several tables with different table structures. From a SQL*Plus session logged
on to the SH schema, execute the following SQL script:
@do_mti2.sql
INSERT /*+ APPEND NOLOGGING
*/FIRST WHEN cust_credit_limit >= 4500 THEN INTO customers INTO
customers_special VALUES (cust_id, cust_credit_limit) ELSE INTO customers
SELECT * FROM customers_new;
6.
Now, you can see what was inserted.
Execute the following SQL script:
@control_mti2.sql
SELECT COUNT(*) FROM customers; SELECT COUNT(*) FROM customers_special;
SELECT MIN(cust_credit_limit) FROM customers_special;
7.
Before continuing you will want to
reset your environment by executing the following SQL script:
@reset_mti2.sql
ALTER TABLE sales
MODIFY CONSTRAINT sales_customer_fk RELY ENABLE NOVALIDATE;
DROP TABLE customers_special;
DROP TABLE customers_new;
Using the New Upsert
Functionality, SQL MERGE Keyword Overview
MyCompany has to update its product information in the data warehouse periodically
with data from its products database. Unfortunately, you are getting product delta
information without the possibility to distinguish between new and updated information. Therefore, you have to figure this out on the data warehouse site.
To use the new SQL MERGE command to either update
or insert the data conditionally, you perform the following steps:
1.
Creating an External Table (and Directories) for the External Products Information
Before you can create the external table, you need to create a directory
object in the database that will point to the directory on the file system where
the data files will reside. Optionally, you can separate the location for the
logged, bad and discarded files from the location of the data files, as shown in the
following example.
1.
To create the directory, perform the
following. If you have already completed High-Speed Data Load and Rolling Window
Operations, the directories will already exist and you can skip this step. From
a SQL*Plus session logged on to the SH schema, execute the following SQL script:
@create_directory.sql
DROP DIRECTORY data_dir;
DROP DIRECTORY log_dir;
Rem ***** Rem CREATE DIRECTORIES
Rem note that security is controlled Rem (a) with privileges on directory
(read/write) Rem (b) with privileges on external table Rem *****
CREATE DIRECTORY data_dir
AS '/home/oracle/wkdir/'; CREATE DIRECTORY log_dir AS '/home/oracle/wkdir/';
2.
When creating an external table, you
are defining two parts of information:
1.
The metadata information for the table
representation inside the database
2.
The access parameter definition of
how to extract the data from the external file
After creation of this meta information,
the external data can be accessed from within the database, without the necessity
of an initial load. To create the external table, from a SQL*Plus session logged
on to the SH schema, execute the following SQL script:
@create_external_table2.sql
Rem ***** Rem preparation
for upsert Rem file prodDelta.dat, 1.2 MB must Rem reuse of directory
STAGE_DIR Rem access via external table Rem *****
2.
Performing an Upsert by Using the New SQL MERGECommand
To perform an upsert using the new SQL MERGE
command, perform the following step:
1.
From a SQL*Plus session logged on
to the SH schema, execute the following SQL script:
@do_merge_new.sql
Rem ***** Rem MERGE
Rem new and modified product information has arrived Rem use of external table
again Rem - seemlessly within the MERGE command !!! Rem *****
MERGE INTO products t
USING products_delta s ON ( t.prod_id=s.prod_id ) WHEN MATCHED THEN
UPDATE SET t.prod_name=s.prod_name, t.prod_list_price=s.prod_list_price, t.prod_min_price=s.prod_min_price
WHEN NOT MATCHED THEN INSERT (prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_id, prod_subcategory_desc, prod_category, prod_category_id,
prod_category_desc, prod_status, prod_list_price, prod_min_price, prod_total,
prod_total_id, prod_weight_class, prod_pack_size, supplier_id) VALUES
(s.prod_id, s.prod_name, s.prod_desc, s.prod_subcategory, s.prod_subcategory_id,
s.prod_subcategory_desc, s.prod_category, s.prod_category_id, s.prod_category_desc,
s.prod_status, s.prod_list_price, s.prod_min_price, s.prod_total, s.prod_total_id,
s.prod_weight_class, s.prod_pack_size, s.supplier_id) ;
ROLLBACK;
Note: You will
ROLLBACK so that you can reissue
the same upsert with two SQL statements in a subsequent operation.
3.
Showing the Execution Plan of the New MERGECommand
To look at the execution plan of the MERGE
statement, you perform the following steps:
1.
From a SQL*Plus session logged on
to the SH schema, run explain_merge_new.sql, or copy the following SQL statement
into your SQL*Plus session:
@explain_merge_new.sql
DELETE FROM plan_table;
COMMIT;
EXPLAIN PLAN FOR MERGE
INTO products t USING products_delta s ON ( t.prod_id=s.prod_id )
WHEN MATCHED THEN UPDATE SET t.prod_name=s.prod_name, t.prod_list_price=s.prod_list_price,
t.prod_min_price=s.prod_min_price WHEN NOT MATCHED THEN INSERT (prod_id,
prod_name, prod_desc, prod_subcategory, prod_subcategory_id, prod_subcategory_desc,
prod_category, prod_category_id, prod_category_desc, prod_status, prod_list_price,
prod_min_price, prod_total, prod_total_id, prod_weight_class, prod_pack_size,
supplier_id) VALUES (s.prod_id, s.prod_name, s.prod_desc, s.prod_subcategory,
s.prod_subcategory_id, s.prod_subcategory_desc, s.prod_category, s.prod_category_id,
s.prod_category_desc, s.prod_status, s.prod_list_price, s.prod_min_price,
s.prod_total, s.prod_total_id, s.prod_weight_class, s.prod_pack_size,
s.supplier_id) ;
4.
Performing an Upsert by Using Two Separate SQL Commands (Before Oracle9i)
Before Oracle9i, such upsert
functionality could be implemented as a combination of two SQL statements (insert
plus update) or in a procedural manner. To demonstrate how this was done, perform
the following steps:
1.
To leverage an updatable join view,
a primary or unique key must exist on the join column. Therefore, you cannot
use the external table directly. As a result, you need to create an intermediate
table in the database with the appropriate unique key constraint. Run
the following script:
do_merge_816.sql
CREATE TABLE prod_delta NOLOGGING
AS SELECT * FROM products_delta;
INSERT INTO products
(prod_id, prod_name, prod_desc, prod_subcategory, prod_subcategory_id,
prod_subcategory_desc, prod_category, prod_category_id, prod_category_desc,
prod_status, prod_list_price, prod_min_price, prod_total, prod_total_id,
prod_weight_class, prod_pack_size, supplier_id)
SELECT prod_id, prod_name, prod_desc, prod_subcategory, prod_subcategory_id,
prod_subcategory_desc, prod_category, prod_category_id, prod_category_desc,
prod_status, prod_list_price, prod_min_price, prod_total, prod_total_id,
prod_weight_class, prod_pack_size, supplier_id
FROM prod_delta s WHERE NOT EXISTS (SELECT 1 FROM products t WHERE t.prod_id
=
s.prod_id);
ROLLBACK;
You have to issue two separate SQL
statements to accomplish the same functionality than the new MERGE command.
To leverage the updatable join view functionality of 8.1.6, we also had
to copy the external table into a real database table and create a
unique index on the join column. In short, more operations, more space
requirements, and more processing time.
In the ETL process, the data extracted from a source system passes through
a sequence of transformations before it is loaded into a data warehouse. Complex
transformations are implemented in a procedural manner, either outside the database
or inside the database in PL/SQL. When the results of a transformation become
too large to fit into memory, they must be staged by materializing them either
into database tables or flat files. This data is then read and processed as input
to the next transformation in the sequence. Oracle table functions provide
support for pipelined and parallel execution of such transformations implemented
in PL/SQL, C, or Java.
A table function is defined as a function that can produce a set of rows
as output; additionally, Oracle table functions can take a set of rows as input.
These sets of rows are processed iteratively in subsets, thus enabling a pipelined
mechanism to stream these subset results from one transformation to the next before
the first operation has finished. Furthermore, table functions can be processed
transparently in parallel, which is equivalent to SQL operations such as a table
scan or a sort. Staging tables are no longer necessary.
Examining Basic Implementation Issues
for Table Functions
Because of the powerful and multifaceted usage of table functions as part
of complex transformations, you will examine basic implementation issues for table
functions only. To do this, perform the following steps:
1.
Setting Up the Basic Objects for a Table Function
A table function can produce a set of rows as output and can take a set
of rows as input. Therefore, it is imperative that you have to define a record
type and a collection of records to define the input and output; those types can
either be strongly typed, that is, well-defined prior to their execution, or weakly
typed, which means without a fixed record format before run time. You will examine
both variants.
To set up the table function objects, perform the following steps:
1.
Define the object (record) type for
the examples. From a SQL*Plus session logged on to the SH schema, execute the
following SQL script:
This is the basic object record type
you will use in the examples.
2.
Define the object (collection) type
for the examples. From a SQL*Plus session logged on to the SH schema, execute
the following SQL script:
@setup_tf_collection.sql
CREATE TYPE product_t_table
AS TABLE OF product_t; /
This type represents the structure
of the set of records, which will be delivered back by your table functions.
3.
Define a package for the REF CURSOR
types. From a SQL*Plus session logged on to the SH schema, execute the following
SQL script:
@setup_tf_package.sql
Rem package of all cursor
types Rem handle the input cursor type and the output cursor collection type
CREATE OR REPLACE PACKAGE cursor_PKG as TYPE product_t_rec IS RECORD (
prod_id NUMBER(6) , prod_name VARCHAR2(50) , prod_desc VARCHAR2(4000)
, prod_subcategory VARCHAR2(50) , prod_subcategory_desc VARCHAR2(2000)
, prod_category VARCHAR2(50) , prod_category_desc VARCHAR2(2000) , prod_weight_class
NUMBER(2) , prod_unit_of_measure VARCHAR2(20) , prod_pack_size VARCHAR2(30)
, supplier_id NUMBER(6) , prod_status VARCHAR2(20) , prod_list_price NUMBER(8,2)
, prod_min_price NUMBER(8,2)); TYPE product_t_rectab IS TABLE OF product_t_rec;
TYPE strong_refcur_t IS REF CURSOR RETURN product_t_rec; TYPE refcur_t IS
REF CURSOR; END;
4.
Create a log table for the table function
example with autonomous DML. From a SQL*Plus session logged on to the SH schema,
execute the following SQL script:
This table will be used for the table
function example that fans out some data as part of its execution.
Table functions return a set of records to the subsequent operation. The
result set can be either delivered en-block (the default PL/SQL behavior) or pipelined – increments of the result set are streamed to the subsequent operation as
soon as they are produced.
For table function implementations in PL/SQL, the PL/SQL engine controls
the size of the incrementally returned array. For interface implementations of
table functions, in C or Java, it is up to the programmer to define the incremental
return set.
2.
Performing Nonpipelined Table Function, Returning an Array of Records
The following example represents a simple table function, returning its
result set nonpipelined as a complete array. The table function filters out
all records of the product category "Boys" and returns all other records. Note that
the input REF CURSOR is weakly typed, meaning it could be any valid SQL statement.
It is then up to the table function code to handle its input, probably based on
other input variables. To perform the nonpipelined table function, you perform
the following steps:
1.
From a SQL*Plus session logged on
to the SH schema, run create_tf_1.sql, or copy the following SQL statements into
your SQL*Plus session:
@create_tf_1.sql
PROMPT
SIMPLE PASS-THROUGH, FILTERING OUT obsolete products without product_category
'Electronics'
Rem uses weakly typed cursor as input
CREATE OR REPLACE FUNCTION
obsolete_products(cur cursor_pkg.refcur_t) RETURN product_t_table IS
prod_id NUMBER(6); prod_name VARCHAR2(50); prod_desc VARCHAR2(4000);
prod_subcategory VARCHAR2(50); prod_subcategory_desc VARCHAR2(2000); prod_category
VARCHAR2(50); prod_category_desc VARCHAR2(2000); prod_weight_class NUMBER(2);
prod_unit_of_measure VARCHAR2(20); prod_pack_size VARCHAR2(30); supplier_id
NUMBER(6); prod_status VARCHAR2(20); prod_list_price NUMBER(8,2);
prod_min_price NUMBER(8,2); sales NUMBER:=0; objset product_t_table :=
product_t_table(); i NUMBER := 0; BEGIN LOOP
-- Fetch from cursor variable
FETCH cur INTO prod_id, prod_name, prod_desc, prod_subcategory, prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class, prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price, prod_min_price;
EXIT WHEN cur%NOTFOUND; -- exit when last row is fetched
-- Category Electronics is
not meant to be obsolete and will be suppressed IF prod_status='obsolete'
AND prod_category != 'Electronics' THEN -- append to collection i:=i+1;
objset.extend; objset(i):=product_t( prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc, prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure, prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price); END IF; END LOOP; CLOSE cur; RETURN objset;
END; /
2.
You cannot transparently SELECT from
the defined table function obsolete_products. From a SQL*Plus session logged on
to the SH schema, execute the following SQL script:
Note that the input SELECT list for the cursor variable must match the definition
of the table function, in our case pkg.refcur_t.
If you can ensure that the table definition (in this case products) matches this
input cursor, you can simplify the statement by using the SELECT * FROM TABLE
notation.
The following example represents the same filtering-out than the previous
example. Unlike the first one, this table function returns its result set pipelined,
which means incrementally. Note that the input REF CURSOR is strongly typed here,
meaning it could be only a SQL statement returning a record type of product_type_record.
To perform the pipelined table function, perform the following steps:
1.
From a SQL*Plus session logged on
to the SH schema, execute the following SQL script:
@create_tf_2.sql
Rem same example, pipelined
implementation Rem strong ref cursor (input type is defined) Rem a table
without a strong typed input ref cursor Rem cannot be parallelized
CREATE OR REPLACE FUNCTION
obsolete_products_pipe(cur cursor_pkg.strong_refcur_t) RETURN product_t_table
PIPELINED PARALLEL_ENABLE (PARTITION cur BY ANY) IS prod_id NUMBER(6);
prod_name VARCHAR2(50); prod_desc VARCHAR2(4000); prod_subcategory VARCHAR2(50);
prod_subcategory_desc VARCHAR2(2000); prod_category VARCHAR2(50); prod_category_desc
VARCHAR2(2000); prod_weight_class NUMBER(2); prod_unit_of_measure VARCHAR2(20);
prod_pack_size VARCHAR2(30); supplier_id NUMBER(6); prod_status VARCHAR2(20);
prod_list_price NUMBER(8,2); prod_min_price NUMBER(8,2); sales NUMBER:=0;
BEGIN LOOP
-- Fetch from cursor variable
FETCH cur INTO prod_id, prod_name, prod_desc, prod_subcategory, prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class, prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price, prod_min_price;
EXIT WHEN cur%NOTFOUND; -- exit when last row is fetched IF prod_status='obsolete'
AND prod_category !='Electronics' THEN PIPE ROW (product_t( prod_id, prod_name,
prod_desc, prod_subcategory, prod_subcategory_desc, prod_category, prod_category_desc,
prod_weight_class, prod_unit_of_measure, prod_pack_size, supplier_id,
prod_status, prod_list_price, prod_min_price)); END IF; END LOOP;
CLOSE cur; RETURN; END; /
2.
You can now transparently SELECT from
the defined table function obsolete_products. From a SQL*Plus session logged on
to the SH schema, execute the following SQL script:
4.
Performing Transparent Parallel Execution of Table Functions
Table functions can be parallelized transparently without any special intervention
or setup. The only necessary declaration, which has to be done as part of the
table function creation, is to tell the SQL engine if there are any restrictions
or rules for parallelizing its execution.
Think of a table function that processes a complex aggregation operation
over some product_category attributes. If you want to process this operation in
parallel, you have to guarantee that all records with the same product category
attributes are processed from the same parallel execution slave, so that your
aggregation operation covers all records that belong to the same grouping. This
implies that, during parallel processing, a distribution rule must be used for
parallel processing, which would guarantee correct distribution as described above.
This parallel distribution rule is defined as part of the table function
header, for example:
PARALLEL_ENABLE (PARTITION cur
BY ANY) IS
That's all a developer has to do to enable parallel execution of a table
function. To perform one of the table functions you previously defined in parallel,
perform the following steps:
1.
From a SQL*Plus session logged on
to the SH schema, execute the following SQL script:
@pq_session.sql
SELECT * FROM v$pq_sesstat
WHERE statistic in ('Queries Parallelized','Allocation Height');
2.
Now you can use one of the table functions
in parallel. You enforce the parallel execution with a PARALLEL hint on the REF
CURSOR. From a SQL*Plus session logged on to the SH schema, execute the following
SQL script:
Although a table function is part of a single atomic transaction, it provides
the additional capability to fan out data to other tables within the scope of
an autonomous transaction. This can be used in a variety of ways, such as for
exception or progress logging or to fan out subsets of data to be used by other
independent transformations. The following example logs some "exception" information into the OBSOLETE_PRODUCTS_ERRORS
table:
1.
From a SQL*Plus session logged on
to the SH schema, execute the following SQL script:
@create_tf_3.sql
CREATE OR REPLACE FUNCTION
obsolete_products_dml(cur cursor_pkg.strong_refcur_t, prod_cat varchar2 DEFAULT
'Electronics') RETURN product_t_table PIPELINED PARALLEL_ENABLE (PARTITION
cur BY ANY) IS PRAGMA AUTONOMOUS_TRANSACTION; prod_id NUMBER(6); prod_name
VARCHAR2(50); prod_desc VARCHAR2(4000); prod_subcategory VARCHAR2(50);
prod_subcategory_desc VARCHAR2(2000); prod_category VARCHAR2(50); prod_category_desc
VARCHAR2(2000); prod_weight_class NUMBER(2); prod_unit_of_measure VARCHAR2(20);
prod_pack_size VARCHAR2(30); supplier_id NUMBER(6); prod_status VARCHAR2(20);
prod_list_price NUMBER(8,2); prod_min_price NUMBER(8,2); sales NUMBER:=0;
BEGIN LOOP -- Fetch from cursor variable FETCH cur INTO prod_id, prod_name,
prod_desc, prod_subcategory, prod_subcategory_desc, prod_category, prod_category_desc,
prod_weight_class, prod_unit_of_measure, prod_pack_size, supplier_id,
prod_status, prod_list_price, prod_min_price; EXIT WHEN cur%NOTFOUND;
-- exit when last row is fetched IF prod_status='obsolete' THEN IF prod_category=prod_cat
THEN INSERT INTO obsolete_products_errors VALUES (prod_id, 'correction:
category '||UPPER(prod_cat)||' still available'); COMMIT; ELSE PIPE
ROW (product_t( prod_id, prod_name, prod_desc, prod_subcategory, prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class, prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price, prod_min_price));
END IF; END IF; END LOOP; CLOSE cur; RETURN; END; /
2.
Now you can select from this table
function and check the log table. From a SQL*Plus session logged on to the SH
schema, execute the following SQL script:
SELECT DISTINCT msg FROM obsolete_products_errors;
You see that, dependent on the input variable,
the table function processes the input rows differently, thus delivering a different
result set and inserting different information into obsolete_products_error.
6.
Performing Seamless Streaming Through Several Table Functions
Besides the transparent usage of table functions within SQL statements
and its capability of being processed in parallel, one of the biggest advantages
is that table functions can be called from within each other. Furthermore, table
functions can be used in any SQL
statement and become the input for all kinds of DML statements. To see how this is done, perform the following steps:
1.
This SQL statement uses two table
functions, nested in each other. From a SQL*Plus session logged on to the SH schema,
execute the following SQL script:
@use_tf_stream.sql
SELECT DISTINCT prod_category,
prod_status FROM TABLE(obsolete_products_dml(CURSOR(SELECT * FROM TABLE(obsolete_products_pipe(
CURSOR(SELECT prod_id, prod_name, prod_desc, prod_subcategory, prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class, prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price, prod_min_price
FROM products))))));
SELECT COUNT(*) FROM obsolete_products_errors;
SELECT DISTINCT msg FROM obsolete_products_errors;
set echo off
2.
Now you can use this table function
as INPUT for a CREATE
TABLE AS SELECTcommand. From a SQL*Plus session logged on to the
SH schema, execute the following SQL script:
@CTAS_tf.sql
CREATE TABLE PIPE_THROUGH
AS SELECT DISTINCT prod_category, prod_status FROM TABLE(obsolete_products_dml(CURSOR(SELECT
* FROM TABLE(obsolete_products_pipe( CURSOR(SELECT prod_id, prod_name,
prod_desc, prod_subcategory, prod_subcategory_desc, prod_category, prod_category_desc,
prod_weight_class, prod_unit_of_measure, prod_pack_size, supplier_id,
prod_status, prod_list_price, prod_min_price FROM products))))));
3.
Before you move to the next section
of this tutorial, you need to clean up your environment by executing the following
script:
@cleanup_tf.sql
DROP TABLE obsolete_products_errors;
DROP TABLE pipe_through; DROP FUNCTION obsolete_products; DROP FUNCTION
obsolete_products_pipe; DROP FUNCTION obsolete_products_dml;
Using Synchronous
Change Data Capture (CDC) to Capture and Consume Incremental Source Changes
Data warehousing involves the extraction and transportation of relational
data from one or more source databases, into the data warehouse for analysis.
Change Data Capture quickly identifies and processes only the data that has changed,
not entire tables, and makes the change data available for further use.
Without Change Data Capture, database extraction is a cumbersome process
in which you move the entire contents of tables into flat files, and then load
the files into the data warehouse. This ad hoc approach is expensive in a number
of ways.
Change Data Capture does not depend on intermediate flat files to stage
the data outside of the relational database. It captures the change data resulting
from INSERT, UPDATE, and DELETE operations made to user tables. The change data
is then stored in a database object called a change table, and the change data
is made available to applications in a controlled way.
PUBLISH AND SUBSCRIBE MODEL
Most Change Data Capture systems have one publisher that captures and publishes
change data for any number of Oracle source tables. There can be multiple subscribers
accessing the change data. Change Data Capture provides PL/SQL packages to accomplish
the publish and subscribe tasks.
Publisher
The publisher is usually a database administrator (DBA) who is in charge
of creating and maintaining schema objects that make up the Change Data Capture
system. The publisher performs the following tasks:
Determines the relational tables (called source tables) from which the
data warehouse application is interested in capturing change data.
Uses the Oracle supplied
package, DBMS_LOGMNR_CDC_PUBLISH, to set up the system to capture data from one
or more source tables.
Publishes the change
data in the form of change tables.
Allows controlled access to subscribers by using the SQL GRANT and REVOKE
statements to grant and revoke the SELECT privilege on change tables for users
and roles.
Subscriber
The subscribers, usually applications, are consumers of the published change
data. Subscribers subscribe to one or more sets of columns in source tables. Subscribers
perform the following tasks:
Use the Oracle supplied
package, DBMS_LOGMNR_CDC_SUBSCRIBE, to subscribe to source tables for controlled
access to the published change data for analysis.
Extend the subscription
window and create a new subscriber view when the subscriber is ready to receive
a set of change data.
Use SELECT statements
to retrieve change data from the subscriber views.
Drop the subscriber view and purge the subscription window when finished
processing a block of changes.
Drop the subscription
when the subscriber no longer needs its change data.
To learn more about this topic, perform the following steps:
1.
Using Synchronous CDC to Track all the Incremental Changes
You will use synchronous CDC to track all the incremental changes for the
table PRODUCTS by performing the following steps ( For demonstration purposes
the publisher and subscriber will be the same database user.):
1.
First create a new intermediate table
where you apply all changes. From a SQL*Plus session logged on to the SH schema,
run cr_cdc_target.sql, or copy the following SQL statements into your SQL*Plus
session:
If you never used Change Data Capture
before, you won't see any objects related to CDC other than the synchronous change
set SYNC_SET and the synchronous change source SYNC_SOURCE. Both of these objects
are created as part of the database creation.
@show_cdc_env1.sql
SELECT * FROM change_sources;
PROMPT see the change tables
Rem shouldn't show anything
SELECT * FROM change_tables;
PROMPT see the change sets
SYNC_SET
SELECT decode(to_char(end_date,'dd-mon-yyyy
hh24:mi:ss'),null,'No end date set.') end_date, decode(to_char(freshness_date,'dd-mon-yyyy
hh24:mi:ss'),null,'No freshness date set.') freshness_date FROM change_sets
WHERE set_name='SYNC_SET';
All changes are stored in change tables. Change tables have a one-to-one
dependency to a source table; they consist of several fixed metadata columns and
a dynamic set of columns equivalent to the identified source columns of interest.
1.
You create a change table by using
the provided package DBMS_CDC_PUBLISH.
Note that the old package name DBMS_CDC_LOGMNR_PUBLISH
is a synonym pointing to DBMS_CDC_PUBLISH and is maintainted as-is for backward
compatibility.
@cr_cdc_ct.sql
rem create a change table
within the change set SYNC_SET. Rem details on parameters see doc
PROMPT *** 10g *** PROMPT
NOTE THAT DBMS_LOGMNR_CDC_* are synonyms for the DBMS_CDC_* packages and are only
around PROMPT for backwards compatibility
This script creates a change table
called PROD_PRICE_CT and the necessary trigger framework to track all subsequent
changes on PRODUCTS. Note that the tracking is done as part of the atomic DML
operation against PRODUCTS.
2.
Describe the change table PROD_PRICE_CT:
Desc
prod_price_ct
See the static metadata columns as
well as the columns PROD_ID, PROD_MIN_PRICE, and PROD_LIST_PRICE (equivalent to
source table PRODUCTS).
3.
To see the metadata of the published
source table and the change tables in the system, issue the following SQL commands:
@show_cdc_env2.sql
prompt see the published source
tables SELECT * FROM dba_source_tables;
prompt see published source
columns SELECT source_schema_name,source_table_name, column_name FROM
dba_source_tab_columns;
prompt see the change tables
SELECT * FROM change_tables;
3.
Subscribing to a Change Set and to All Source Table Columns of Interest
Anybody who's interested in the change information can subscribe to a change
set for controlled access to the published change data for analysis. Note that
the logical entity for a subscription handle is a change set and not a change
table. A change set can consist of several change tables and guarantees logical
consistency for all of its change tables. After subscribing to a change set, you
have to subscribe to all source table columns of interest.
1.
You must get a unique subscription
handle that is used throughout the session and tell the system what columns you
are interested in. You are using the existing change set SYNC_SET. This functionality
is provided by the package DBMS_CDC_SUBSCRIBE.
@subs_cdc_ct.sql
variable subname varchar2(30)
begin :subname := 'my_subscription_no_1'; DBMS_CDC_SUBSCRIBE.CREATE_SUBSCRIPTION
( CHANGE_SET_NAME => 'SYNC_SET', DESCRIPTION => 'Change data PRODUCTS
for ELECTRONICS', SUBSCRIPTION_name => :subname); END; /
PRINT subname
2.
You can query the meta information
about your subscription.
SELECT * FROM user_subscribed_tables;
SELECT source_table_name, column_name FROM dba_subscribed_columns;
3.
After creating a subscription, you
set up a change view for this subscription. The complete handling of change views
was enhanced with Oracle Database 10g: you can name a change table or you do
not have to drop change views to extend and/or purge subscriber windows.
@cr_cdc_rv.sql
rem now subscribe to a single
source table and columns of interest. rem A subscription can contain one or
more tables from the same change set.
PROMPT ***10g*** Rem use
new interface to name subscriber view explictly !! variable view_name varchar2(30);
4.
Activating a Subscription and Extending the Subscription Window
The first step after setting up the publish and subscribe framework is
to activate a subscription and to extend the subscription window to see all changes
that have taken place since the last subscription window extension or since the
activation of a subscription.
1.
Activate and extend a subscription
window. You will also have a look into the metadata.
@sub_cdc1.sql
rem now activate the subscription
since we are ready to receive rem change data the ACTIVATE_SUBSCRIPTION procedure
sets rem subscription window to empty initially rem At this point, no
additional source tables can be added to the rem subscription
Any changes you apply now to the source
table PRODUCTS is reflected in the change table. The changes are transparently
maintained with triggers on the source table.
@dml_cdc1.sql
Rem now do some changes
UPDATE products SET prod_list_price=prod_list_price*1.1
WHERE prod_min_price > 100; COMMIT;
Rem you will see entries in
the change table Rem note that we have entries for the old and the new values
SELECT Count(*) FROM prod_price_ct;
Note that you have two records for
each source row, one representing the old, and the other the new value set for
the subscribed columns.
Again, please never use the change table itself for identifying
the changes on a source table. Use the subscriber view that will be
created for you. This is the only supported way to guarantee that every
change is delivered only once to a subscribing application.
3.
Unlike the change table, the change
view does not show any records yet. This is because you applied the DML operation
AFTER you extended the subscription window. Consequently you must not see anything
yet.
Rem the view does not show
anything select count(*) from MY_PROD_PRICE_CHANGE_VIEW;
You are now preparing for consuming
the changes by extending the subscription window.
@ext_cdc_sub1.sql
rem now set upper boundary
(high-water mark) for the subscription window rem At this point, the subscriber
has created a new window that begins rem where the previous window ends.
rem The new window contains any data that was added to the change table.
Rem ... and you will see data
SELECT count(*) FROM my_prod_price_change_view;
4.
You can now select from this system-generated
change view, and you will see only the changes that have happened for your time
window.
@sel_cdc_cv1.sql
Rem changes classified for
specific product groups
SELECT p1.prod_id, p2.prod_category,
p1.prod_min_price, p1.prod_list_price, commit_timestamp$ FROM my_prod_price_change_view
p1, products p2 WHERE p1.prod_id=p2.prod_id AND operation$='UN';
PROMPT and especially the
Electronics' ones - 3 records only
SELECT p1.prod_id, p2.prod_category,
p1.prod_min_price, p1.prod_list_price, commit_timestamp$ FROM my_prod_price_change_view
p1, products p2 WHERE p1.prod_id=p2.prod_id AND operation$='UN' AND
p2.prod_category='Electronics';
In the example shown above, you are
joining back to the source table PRODUCTS. When you set up a CDC environment,
please ensure that such an operation should not be necessary; if possible, a change
table should be usable as stand-alone source for synchronizing your target environment.
Joining back to the source table may reduce the data stored on disk, but it
creates additional workload on the source site. Join-back cannot be accomplished on a different system
than the source system.
5.
You can now consume the changes on
your target system.
@consume_cdc1.sql
Prompt ... you can now consume
the changes rem arbitrary example, where we only track ELECTRONICS changes.
This shall demonstrate rem the flexibility as well as the responsibility of
the client site (consumer) to deal rem appropriately with the changes
INSERT into my_price_change_electronics
SELECT p1.prod_id, p1.prod_min_price, p1.prod_list_price, commit_timestamp$
FROM my_prod_price_change_view p1, products p2 WHERE p1.prod_id=p2.prod_id
AND p2.prod_category='Electronics' AND operation$='UN';
COMMIT;
SELECT prod_id, prod_min_price,
prod_list_price, to_char(when,'dd-mon-yyyy hh24:mi:ss') FROM my_price_change_electronics;
5.
Investigating How to Handle the New Environment Over Time
So far you exercised your first consumption with the synchronous CDC framework.
Now investigate how to handle such an environment over time when additional changes
will happen and old changes are consumed by all subscribers.
1.
Other DML operations take place on
the source table PRODUCTS.
@dml_cdc2.sql
PROMPT other changes will
happen UPDATE products SET prod_min_price=prod_min_price*1.1 WHERE
prod_min_price < 10;
COMMIT;
2.
The synchronous CDC framework tracks
these DML operations transparently in the change table.
@show_cdc_ct2.sql
SELECT count(*) FROM prod_price_ct
;
PROMPT and especially the
ELECTRONICS' ones
SELECT COUNT(*) FROM prod_price_ct
p1, products p2 WHERE p1.prod_id=p2.prod_id AND p2.prod_category='Electronics';
3.
@sel_cdc_cv1.sql
Rem changes classified for
specific product groups
SELECT p1.prod_id, p2.prod_category,
p1.prod_min_price, p1.prod_list_price, commit_timestamp$ FROM my_prod_price_change_view
p1, products p2 WHERE p1.prod_id=p2.prod_id AND operation$='UN';
PROMPT and especially the
Electronics' ones - 3 records only
SELECT p1.prod_id, p2.prod_category,
p1.prod_min_price, p1.prod_list_price, commit_timestamp$ FROM my_prod_price_change_view
p1, products p2 WHERE p1.prod_id=p2.prod_id AND operation$='UN' AND
p2.prod_category='Electronics';
You haven't extended the time window
for the change view yet.
4.
To tell the CDC framework that you
are done with a set of changes, you simply have to purge and extend the subscrtiption.
@purge_cdc_sub_window1.sql
PROMPT purge old data from the subscription window
Rem this will NOT delete any
records from the change table. It just tells the CDC system Rem that the changed
data used so fasr is no longer needed
EXEC DBMS_CDC_SUBSCRIBE.PURGE_WINDOW
- (SUBSCRIPTION_name => 'my_subscription_no_1'); PROMPT still the same
number of records in the change table PROMPT REMEMBER THE NUMBER OF ROWS !!!
SELECT COUNT(*) FROM prod_price_ct; PROMPT ... change view is empty
SELECT COUNT(*) FROM my_prod_price_change_view
p1, products p2 WHERE p1.prod_id=p2.prod_id AND p2.prod_category='Electronics';
5.
To get the new changes reflected in
the change view, you have to extend the time window for the change view.
@ext_cdc_sub_window1.sql
PROMPT let's get the new change
Rem 'do it again Sam' Rem first extend the window you want to see EXEC
DBMS_CDC_SUBSCRIBE.EXTEND_WINDOW - (SUBSCRIPTION_name => 'my_subscription_no_1');
Rem ... now you will see exactly the new changed data since the last consumption
SELECT COUNT(*) FROM my_prod_price_change_view p1, products p2 WHERE p1.prod_id=p2.prod_id
AND p2.prod_category='Electronics';
The new change view shows exactly
the changes that were not consumed yet. This is different than the content in
the change table.
6.
Now consume the new changes. Because
the incremental changes you are interested in are stored in a change table (a
"normal" Oracle table), you can use any language or SQL construct
the database supports.
@consume_cdc2.sql
PROMPT the new changes will
be used
MERGE INTO my_price_change_Electronics
t USING ( SELECT p1.prod_id, p1.prod_min_price, p1.prod_list_price, commit_timestamp$
FROM my_prod_price_change_view p1, products p2 WHERE p1.prod_id=p2.prod_id
AND p2.prod_category='Electronics' AND operation$='UN') cv ON (cv.prod_id
= t.prod_id) WHEN MATCHED THEN UPDATE SET t.prod_min_price=cv.prod_min_price,
t.prod_list_price=cv.prod_list_price WHEN NOT MATCHED THEN INSERT VALUES
(cv.prod_id, cv.prod_min_price, cv.prod_list_price, commit_timestamp$);
COMMIT;
rem look at them
SELECT prod_id, prod_min_price,
prod_list_price, to_char(when,'dd-mon-yyyy hh24:mi:ss') FROM my_price_change_electronics;
The Publisher is responsible for maintaining the CDC framework and to guarantee
that the change tables are purged regularly as soon as all subscribers have consumed
a specific set of change information.
1.
From a SQL*Plus session logged on
to the SH schema, run purge_ct.sql, or copy the following SQL statements into
your SQL*Plus session:
@purge_cdc_ct.sql
PROMPT what to do to avoid
overflow of change table ? PURGE IT !
Rem only the rows where we
have no potential subscribers will be purged !!!
Rem we guarantee to keep all
rows as long as we have subscribers which haven't rem purged their window
...
exec DBMS_CDC_PUBLISH.purge_change_table('sh','prod_price_ct')
PROMPT REMEMBER THE NUMBER OF ROWS !!!
Rem you will have 18 entries
for the last updated 9 records
SELECT COUNT(*) FROM prod_price_ct;
PROMPT this is exactly twice
the number of changes we made with the second DML operation
SELECT COUNT(*) FROM prod_price_ct
p1, products p2 WHERE p1.prod_id=p2.prod_id AND p2.prod_category='Electronics';
Although we haven’t specified
any specific time window for the purge operation of the change table, the CDC
framework ensures that only those changes will be purged that are no longer needed
by a subscriber.
7.
Dropping the Used Change View and Purging the Subscription Window
1.
Because you consumed the second set
of change already, you can drop your used change view and purge your subscription
window. You will also purge the change table.
@purge_cdc_sub_window2.sql
PROMPT ... purge the newly
consumed data from the subscription window (2nd DML operation))
Propagation from a
Data Warehouse to a Data Mart Overview
Besides its central data warehouse, MyCompany is running several small
divisional data marts. For example, the products department wants to get all transactional
SALES data partitioned by its main product
categories for marketing campaign analysis; only the SALES data of 2000 is relevant.
You will address the business problem by using Oracle’s Transportable Tablespace
capabilities and Oracle’s LIST partitioning. Furthermore, to guarantee a
successful completion of the generation process of this new table, you will run
this statement in RESUMABLE mode, thus ensuring that any
space problem will not cause the creation process to fail.
Note: To demonstrate
the benefit of resumable statements, you need TWO sessions and therefore TWO windows.
Please read the following section CAREFULLY before taking the exercise.
To propagate from a data warehouse to a data mart, you perform the following
steps:
With resumable statements, Oracle provides the ability to suspend and resume
the execution of large database operations in the event of repairable failure. Currently
the types of errors from which a statement can resume are space limit and out-of-space
errors. When an operation suspends, you have the opportunity to take the corrective
steps to resolve the error condition. Alternatively, a procedure can be registered
to automate the error correction. After the error condition is solved, the suspended
statement will automatically resume and continues operation. If the error is not
corrected within an optionally specified suspension time limit, the statement
will finally fail. You will choose a manual error correction for this exercise.
To see how this is done, perform the following step:
1.
This SQL statement uses two table
functions, nested in each other. From a SQL*Plus session logged on to the SH schema,
execute the following SQL script:
@set_resumable.sql
ALTER SESSION ENABLE RESUMABLE
TIMEOUT 1200 NAME 'create list partitioning';
This brings your session in the so-called
resumable mode, names it, and enables a maximum suspension time of 1200 seconds.
2.
Creating a New Tablespace as Potential Transportable Tablespace
Now create an additional tablespace for storing our new LIST partitioned
fact table. To do this, perform the following steps:
1.
First, make sure that the potential
tablespace TRANSFER does not
exist. From a SQL*Plus session logged on to the SH schema, execute the following
SQL script:
@drop_TS.sql
DROP TABLESPACE my_obe_transfer INCLUDING
CONTENTS AND DATAFILES;
The tablespace is dropped. If you
get an ORA-959 message, you can ignore it; this means that the tablespace does
not exist.
2.
From a SQL*Plus session logged on
to the SH schema, execute the following SQL script:
NOTE: If you are
using a different drive or path for your data files, the script will need to be
edited accordingly.
AS SELECT p.prod_category,
p.prod_subcategory, s.cust_id, s.time_id,s.channel_id, s.promo_id, SUM(s.amount_sold)
amount_sold, SUM(s.quantity_sold) quantity_sold FROM sales s, products p,
times t WHERE p.prod_id=s.prod_id AND s.time_id = t.time_id AND t.fiscal_year=2000
GROUP BY prod_category, prod_subcategory,cust_id, s.time_id,channel_id, promo_id ;
4.
Leveraging the New RESUMABLE Statement Capabilities
for Efficient Error Detection and Handling
While the CREATE TABLE AS SELECT is running in Window
1, you can control the status of all sessions running in RESUMABLE mode in Window 2.
1.
From a SQL*Plus session logged on
to the SH schema, execute the following SQL statement:
SELECT NAME, STATUS, ERROR_MSG
FROM dba_resumable;
As long as no error occurs, the session
stays in a normal status. As soon as an error occurs, your session status changes,
and the ORA-error causing this problem is displayed. The current statement
in the RESUMABLE session will
be suspended until either (a) the problem is fixed, or (b) the timeout limit is
reached. The error will also show up in the alert.log file of the
running instance.
2.
Now fix the problem manually. Issue
the individual tablespace/datafile definitions of the following commands. From a SQL*Plus
session logged on to the SYSTEM Schema, execute the following SQL script:
@fix_ts.sql
ALTER DATABASE DATAFILE '/tmp/tt'
AUTOEXTEND ON NEXT 5M;
The tablespace can now be autoextended
by the database itself without any further interaction.
3.
As soon as the error is fixed, the
suspended session automatically resumes and continues processing. Check the data
dictionary view for change of status. From a SQL*Plus session logged on to the
SH schema, check the status again:
SELECT NAME, STATUS, ERROR_MSG
FROM dba_resumable;
The resumption of the statement also
shows up in the alert.log file of the running instance and in the previously hung
window.
5.
Creating a Range-List Partitioned Table in the Transfer Tablespace
With the additional partitioning strategy RANGE-LIST, Oracle offers a composite
partitioning method to subdivide a partitioned table based on two logical attributes.
The main partitioning strategy is the well-known and most used Range partitioning
strategy. You'll find this implementation in nearly every data warehouse implementation
dealing with Rolling Window Operations. Furthermore, every Range partition is
partitioned based on a List Partitioning strategy, enabling a finer partition
granule for addressing your business needs.
A typical example leveraging Range-List Partitioning is a global Retail
environment, having time range partitions (Rolling Window) and a region-oriented
list partition underneath, so that you can maintain every time window for a specific
region independent of each other.
List partitioned tables enable you to group sets of distinct unrelated
values together in one partition. Consequently, all values not being covered with
an existing partition definition will raise an Oracle error message. Although
this enforces a business constraint (you should not see a non-existing region
in your data warehouse), you cannot always be sure to not having such violations.
Oracle therefore introduces the capability to create a DEFAULT List Partition,
a kind of catch-all partition for all undefined partition key values.
Oracle has also introduced the so-called subpartition template technology,
a common optional element of both range-hash and range-list composite partitioning.
The template provides an easy and convenient way to define default subpartitions
for each table partition. Oracle will create these default subpartitions in any
partition for which you do not explicitly define subpartitions. This clause is
useful for creating symmetric partitions.
In this section, you will create a Range-List partitioned table, leveraging
all of the functionality mentioned above.
1.
From a SQL*Plus session logged on
to the SH schema, run the following SQL statements into your SQL*Plus session:
@create_new_range_list.sql
CREATE TABLE sales_rlp
COMPRESS TABLESPACE MY_OBE_TRANSFER PARTITION BY RANGE (time_id) SUBPARTITION
BY LIST (channel_id) SUBPARTITION TEMPLATE ( SUBPARTITION direct values
(3), SUBPARTITION internet values (4), SUBPARTITION partner values (2),
SUBPARTITION other values (DEFAULT) ) (PARTITION SALES_before_1999 VALUES
LESS THAN (TO_DATE('01-JAN-1999','DD-MON-YYYY')), PARTITION SALES_Q1_1999
VALUES LESS THAN (TO_DATE('01-APR-1999','DD-MON-YYYY')), PARTITION SALES_Q2_1999
VALUES LESS THAN (TO_DATE('01-JUL-1999','DD-MON-YYYY')), PARTITION SALES_Q3_1999
VALUES LESS THAN (TO_DATE('01-OCT-1999','DD-MON-YYYY')), PARTITION SALES_Q4_1999
VALUES LESS THAN (TO_DATE('01-JAN-2000','DD-MON-YYYY')), PARTITION SALES_Q1_2000
VALUES LESS THAN (TO_DATE('01-APR-2000','DD-MON-YYYY')), PARTITION SALES_Q2_2000
VALUES LESS THAN (TO_DATE('01-JUL-2000','DD-MON-YYYY')), PARTITION SALES_Q3_2000
VALUES LESS THAN (TO_DATE('01-OCT-2000','DD-MON-YYYY')), PARTITION SALES_Q4_2000
VALUES LESS THAN (MAXVALUE) NOCOMPRESS) AS SELECT * FROM sales sample(10);
PROMPT as you can see, the
partition and template name is inherited to all the subpartitions, PROMPT
thus making the naming - and identification - of subpartitions much easier ...
SELECT partition_name, subpartition_name,
high_value from user_tab_subpartitions WHERE table_name='SALES_RLP';
This example also shows how you can
easily create compressed and uncompressed partitions as part of an initial table
creation. The partition and template names are inherited by all the subpartitions,
thus making the naming and identification of subpartitions much easier and more
convenient.
2.
From a SQL*Plus session logged on
to the SH schema, run the following SQL statements into your SQL*Plus session:
@show_range_list_names.sql
PROMPT as you can see, the
partition and template name is inherited to all the subpartitions, PROMPT
thus making the naming - and identification - of subpartitions much easier ...
select partition_name, subpartition_name,
high_value from user_tab_subpartitions where table_name='SALES_RLP';
A List and Range-List partitioned
table group sets distinct unrelated values together in one partition. Consequently,
all values not being covered with an existing partition definition will raise
an Oracle error message. We will now create a DEFAULT List Partition for the previously
created List partitioned table sales_prod_dept, a kind of catch-all partition
for all undefined partition key values.
3.
From a SQL*Plus session logged on
to the SH schema, run the following SQL statements into your SQL*Plus session:
@cr_default_list_part.sql
PROMPT add a new partition
to sales_prod_dept that does NOT have a DEFAULT partition Rem it is added
just like for a range partitioned table ALTER TABLE sales_prod_dept ADD PARTITION
gameboy_sales VALUES ('Gameboy'); PROMPT now add another one, covering the
DEFAULT value ALTER TABLE sales_prod_dept ADD PARTITION all_other_sales VALUES
(DEFAULT); PROMPT control the data dictionary
select partition_name, high_value
from user_tab_partitions where table_name='SALES_PROD_DEPT';
All records having an undefined partitioning
key will be stored in this new partition other_sales. However, having a DEFAULT
partition changes the way you’re adding new partitions to this table. Conceptually,
you cover all possible values for the partitioning key—the defined ones
and "all others"—as soon as you have created a default partition.
So, to ADD a new partition to this
table, you logically have to SPLIT the default partition into a new partition
with a set of defined keys and a new default partition, still covering "all
other values" (now reduced by the keys we have specified for the new partition).
Any attempt to add a new partition like you did before will raise an ORA error
message.
4.
From a SQL*Plus session logged on
to the SH schema, run the following SQL statements into your SQL*Plus session:
@split_default_list_part.sql
PROMPT Unlike the first time,
we cannot simply add a new partition PROMPT raises ORA-14323: cannot add partition
when DEFAULT partition exists Rem we cannot be sure whether the new value
already exists in the DEFAULT partition
ALTER TABLE sales_prod_dept
ADD PARTITION undefined_sales VALUES ('Undefined');
This will raise an Oracle error. Run
the following commmands or the script split_default_list_part_b.sql
PROMPT so we have to SPLIT
the default partition to ensure that PROMPT all potential values of 'Undefined'
are in the new partition
ALTER TABLE sales_prod_dept
SPLIT PARTITION other_sales VALUES ('Undefined') INTO (PARTITION undefined_sales,
PARTITION other_sales);
PROMPT control the data dictionary
PROMPT Note that without specifying any tablespace, the default PROMPT tablespace
of the partitioned table is used
select partition_name, tablespace_name,
high_value from user_tab_partitions where table_name='SALES_PROD_DEPT';
6.
Preparing the Metadata Export for a Transportable Tablespace
To prepare a tablespace for transportation, you simply have to export the
metadata information of the tablespace. This export, together with a physical
copy of the tablespace, can be used for importing the tablespace into another database
instance. Before Oracle9i, the target database must be running on the same operating
system with the same database block size. Oracle9i lifts the restricition of the same block
size. Oracle Database 10g lifts
the restriction even more by introducing heterogeneous transportable tablespaces.
To show this capability, you perform the following steps:
1.
From a SQL*Plus session logged on
to the SH schema, execute the following SQL script:
@make_ts_ro.sql
ALTER TABLESPACE my_obe_transfer
READ ONLY;
This guarantees that no further changes
can happen on the data in the tablespace.
2.
You can now export the data dictionary
information of tablespace my_obe_ transfer as follows:
@export_metadata.sql
Optionally, you can create a specific
directory where you want to store the export dump file:
CREATE DIRECTORY
my_obe_dump_dir as '/home/oracle/wkdir/';
Now you can export the metadata only.
Please compare the file size with the tablespace size to get a feeling what it
would have meant to extract all the data and not only the metadata.
expdp \'/
as sysdba\' DIRECTORY=my_obe_dump_dir DUMPFILE= meta_MY_OBE_TRANSFER.dmp
Note: The same
export syntax can be used at the DOS prompt by dropping the SQL*Plus "host" command.
3.
Before you import the tablespace information
into another database, you will want to clean up and reset the session status
by executing the SQL script:
@cleanup_tts.sql
DROP TABLESPACE transfer INCLUDING
CONTENTS AND DATAFILES; ALTER SESSION DISABLE RESUMABLE;
4.
Trying to access table sales_prod_dept
raises an error. The table does not exist any longer.