| |
Using Basic Database Functionality for Data Warehousing
Module Objectives
Purpose
In this module, you will learn how to use basic Oracle9i
database functionality to query and improve performance in a data warehouse.
Objectives
After completing this module, you should be able to:
 |
Compare star query transformation with other, different
access plans |
 |
Use the new bitmap join indexes |
 |
Examine the differences between B-tree and bitmap indexes
for data warehousing |
 |
Become familiar with Oracle advanced partitioning mechanisms
for improved query performance |
 |
Use the new automatic gathering of statistics |
Prerequisites
Before starting this module, you should have:
Reference Material
The following is a list of useful reference material if you want additional
information about the topics in this module:
 |
Documentation: Data Warehousing Guide
|
Overview
Oracle8i is
the leading relational database for data warehousing. Oracle has achieved
this success by focusing on basic, core requirements for data warehousing:
performance, scalability, and manageability. Oracle7 (Release 7.3), Oracle8,
and Oracle8i
each provided significant new capabilities to meet these core requirements.
Data warehouses will store larger volumes of data, support more users,
and require faster performance, so these core requirements remain key
factors in the successful implementation of data warehouses. Oracle9i
continues to focus on these core requirements, with significant enhancements
to every facet of Oracle8is
data-warehouse capabilities.
Comparing Oracle's Star Query Transformations with Other, Different
Access Plans
In this topic, you will examine Oracles superior query execution
capabilities for data warehousing, namely Oracles star transformation,
which was first introduced in Oracle8.0. You will compare the execution
plans for one distinct query using star transformation with other plans
less suitable for handling large volumes of data, in order of decreasing
performance.
Note: Because of the small amount of data used in the hands-on
activity, you wont see the great benefit of this transformation
(or even, a detrimental behavior). This is because the database or file
system cache is hardly exceeded, so that I/O disadvantages of the other
mechanisms are mostly eliminated.
| 1. |
From a SQL*Plus session logged on to the SH schema, run show_star1.sql,
or copy the following SQL statements into your SQL*Plus session:
The following ALTER SESSION
command enables Oracles star query transformation without
the use of temporary tables.
Beginning with Oracle8i,
temporary tables might be used by the optimizer to further improve
a star transformation. The next example will illustrate the difference
between a true star transformation (introduced in Oracle8i)
and the behavior shown here.
Note: The NOREWRITE hint is used to avoid any interference
with possibly existing materialized views. Alternatively, you could
disable query_rewrite for this particular session, or the complete
instance. This is true for all subsequent statements.
The STAR_TRANSFORMATION hint is to enforce the ability within Oracle
to use star transformation even with a small set of data, where
another plan might be better.
@show_star1.sql
ALTER SESSION SET star_transformation_enabled=TEMP_DISABLE;
show parameters star_transformation
set sqlprompt "STAR - NO TEMP - SQL> "
TRUNCATE TABLE plan_table;
EXPLAIN PLAN FOR
SELECT /*+ norewrite */
t.calendar_month_desc
, p.prod_subcategory
, c.channel_desc
, sum(s.amount_sold) AS dollars
FROM sales s
, times t
, products p
, channels c
WHERE s.time_id = t.time_id
AND s.prod_id = p.prod_id
AND s.channel_id = c.channel_id
AND c.channel_desc in ('Internet','Catalog')
AND p.prod_category in ('Men')
AND t.fiscal_year in (2000,1999)
GROUP BY p.prod_subcategory, calendar_month_desc
, c.channel_desc;
set linesize 132
select * from table(dbms_xplan.display);

Note: Disregard the PSTART
and PSTOP
columns in the output for the moment. They will be dicussed later.
The plan shown above represents a typical star query transformation.
The records in the fact table, satisfying the query WHERE
condition, are found by scanning only the bitmap index structures
rather than the whole large sales
fact table.
In a first, internal recursive step, the Oracle database selects
all records of the three dimension tables (products,
times, and channels),
which satisfy the WHERE
condition. You can see this table access of the dimension tables
underneath the key iteration.
The Oracle database then uses the key values of these records as
predicates for probing against the bitmap index structures of the
sales fact table
itself. You see that the predicates of the query - on the channels,
products, and times
dimension tables - are used for a high-selective preselection of
only the relevant records of the table sales, using the bitmap index
structures sales_prod_bix,
sales_channel_bix, and
sales_time_bix.
In a second step, the result set is joined back with all the dimension
tables to get the final query result. These are all operations after
PARTITION RANGE ITERATION.
|
| 2. |
From a SQL*Plus session logged on to the SH schema, run
show_star2.sql, or copy the following SQL statements into
your SQL*Plus session:
@show_star2.sql
ALTER SESSION SET star_transformation_enabled=TRUE;
show parameters star_transformation
set sqlprompt "STAR - SQL> "
TRUNCATE TABLE plan_table;
EXPLAIN PLAN FOR
SELECT /*+ norewrite */
t.calendar_month_desc
, p.prod_subcategory
, c.channel_desc
, sum(s.amount_sold) AS dollars
FROM sales s
, times t
, products p
, channels c
WHERE s.time_id = t.time_id
AND s.prod_id = p.prod_id
AND s.channel_id = c.channel_id
AND c.channel_desc in ('Internet','Catalog')
AND p.prod_category in ('Men')
AND t.fiscal_year in (2000,1999)
GROUP BY p.prod_subcategory, calendar_month_desc
, c.channel_desc;
Rem show plan set linesize 132 select * from table(dbms_xplan.display);
Rem explain usage of TEMP TABLES set long 400 select other from plan_table where other is not null;
The following plan looks similar to the one shown before, although
it is not identical. The difference between the plans is that the
Oracle database now uses a table named ora_temp_1_4e
(which is not part of the query) for satisfying the SQL statement.
Note that this name is system-generated and will vary.


Note: The SELECT
DISTINCT statement in the output should be disregarded
and will be discussed later.
What Does This Mean?
The second output gives us an answer: The optimizer evaluated the
selectivity of the WHERE
conditions on the products,
times, and
channels dimension tables and the
size of those dimension tables. It detected that there is a high
selectivity on the products
table and that this table is "large" enough. The costs
for creating a temporary table, consisting of the result set for
the predicate on the PRODUCTS
table, is cheaper than accessing the PRODUCTS
table twice, as in the first plan without temporary table usage.
The temporary table is then used instead of the products
table itself.
The creation of this temporary table and the data insertion are
shown in the OTHER
column of the plan table.
|
| 3. |
From a SQL*Plus session logged on to the SH schema, run show_star3.sql,
or copy the following SQL statements into your SQL*Plus session.
The STAR hint is
used to enforce Oracle7 behavior.
@show_star3.sql
set sqlprompt "STAR - 7.x - SQL> "
TRUNCATE TABLE plan_table;
EXPLAIN PLAN FOR
SELECT /*+ norewrite star */
t.calendar_month_desc
, p.prod_subcategory
, c.channel_desc
, sum(s.amount_sold) AS dollars
FROM sales s, times t, products p, channels c
WHERE s.time_id = t.time_id
AND s.prod_id = p.prod_id
AND s.channel_id = c.channel_id
AND c.channel_desc in ('Internet','Catalog')
AND p.prod_category in ('Men')
AND t.fiscal_year in (2000,1999)
GROUP BY p.prod_subcategory, calendar_month_desc
, c.channel_desc;
Rem show plan
set linesize 132
select * from table(dbms_xplan.display);

Here you see the so-called star query optimization, which was introduced
in Oracle7. To avoid several joins with the large fact table, the
optimizer builds a Cartesian product of the times,
products, and
channels dimension tables and joins
the Cartesian result once with the sales
fact table.
Note that this star optimization technique has nothing to do with
the current star query transformation, and its usage is not dependent
on the setting of the star_transformation_enabled
parameter.
|
| 4. |
From a SQL*Plus session logged on to the SH schema, run show_star4.sql,
or copy the following SQL statements into your SQL*Plus session:
@show_star4.sql
ALTER SESSION SET star_transformation_enabled=false;
set sqlprompt "no STAR - SQL> "
TRUNCATE TABLE plan_table;
EXPLAIN PLAN FOR
SELECT /*+ norewrite */
t.calendar_month_desc
, p.prod_subcategory
, c.channel_desc
, sum(s.amount_sold) AS dollars
FROM sales s, times t, products p, channels c
WHERE s.time_id = t.time_id
AND s.prod_id = p.prod_id
AND s.channel_id = c.channel_id
AND c.channel_desc in ('Internet','Catalog')
AND p.prod_category in ('Men')
AND t.fiscal_year in (2000,1999)
GROUP BY p.prod_subcategory, calendar_month_desc
, c.channel_desc;
Rem show plan
set linesize 132
select * from table(dbms_xplan.display);

From a join perspective, this plan has no optimizations for data
warehousing. Subsequently, it joins the four tables.
|
| 5. |
From a SQL*Plus session logged on to the SH schema, run reset_star_test.sql,
or copy the following SQL statements into your SQL*Plus session:
This will reset the session environment to our initial setting
with an enabled star query
transformation, the recommended setting for a data warehousing environment.
@reset_star_test.sql
Rem BACK TO NORMALITY
alter session set star_transformation_enabled=TRUE;
set sqlprompt "SQL> "

|
Using New Bitmap Join Indexes
Bitmap join indexes are introduced in Oracle9i.
A join index is an index on one table that uses columns of one or more
different tables through a join.
The volume of data that must be joined can be reduced if join indexes
are used as joins have already been precalculated. In addition, join indexes
which contain multiple dimension tables can eliminate bitwise operations
which are necessary in the star transformation with existing bitmap indexes.
| 1. |
Create a bitmap join index on fact table sales
, for a joined attribute from the channels
dimension table. From a SQL*Plus session logged on to the SH schema,
run cr_bj_idx.sql,
or copy the following SQL statements into your SQL*Plus session:
@cr_bj_idx.sql
CREATE BITMAP INDEX bji_sales_prod_cat
ON sales(p.prod_category)
FROM sales s, products p
WHERE s.prod_id = p.prod_id
LOCAL NOLOGGING COMPUTE STATISTICS;

|
| 2. |
Look at the execution plan of your query again and compare it with
the original star transformation plan in the previous section. For
the sake of simplicity, the usage of temp tables for star transformation
is disabled.
@plan_bj_idx.sql
AALTER SESSION SET star_transformation_enabled=TEMP_DISABLE;
TRUNCATE TABLE plan_table;
EXPLAIN PLAN FOR
SELECT /*+ norewrite */
p.prod_subcategory, c.channel_desc,
sum(s.amount_sold) AS dollars
FROM sales s, times t, products p, channels c
WHERE s.time_id = t.time_id
AND s.prod_id = p.prod_id
AND s.channel_id = c.channel_id
AND c.channel_desc in ('Internet','Catalog')
AND p.prod_category in ('Men')
AND t.fiscal_year in (2000,1999)
GROUP BY p.prod_subcategory, c.channel_desc;
Rem show plan
set linesize 132
select * from table(dbms_xplan.display);
ALTER SESSION SET star_transformation_enabled=TRUE;

When comparing this plan with the equivalent star transformation
plan, you can see that now you dont have to query the dimension
table channels for probing the bitmap index on the sales fact table.
The predicate in the query is on the channel_desc
column, which is stored in the bitmap join index, thus making the
join no longer necessary.
Another benefit of the bitmap join index is the CPU (and IO) savings
gained by the removal of the bitmap key iteration work (which has
to merge all of the bitmaps together).
|
Examining the Differences Between B-tree and Bitmap Indexes for Data
Warehousing
Real (persistent) indexes were introduced with Oracle 7.3. A B-tree index
stores a list of row IDs for each key corresponding to the rows with that
key value; a bitmap index, however, stores a bitmap for each key value
instead of a list of row IDs.
Bitmap indexes are stored compressed; if the number of different key
values is small, then bitmap indexes are very space efficient and on average
20 to 30 times smaller than the equivalent B-tree index structure.
This provides the following benefits for data warehousing:
- Reduced response time for large classes of ad hoc queries
- Substantial reduction of space usage compared to other indexing techniques
- Dramatic performance gains even on very low-end hardware
- Very efficient parallel DML and loads
Now create a new table with the structure of a typical fact table in
a data warehouse environment, representing a subset of our sales transaction
fact data. To investigate the time for creation and the space usage of
bitmap indexes versus B-tree indexes, you are going to create an index
for each dimension join column, one time as a B-tree and then another
time as a bitmap index.
| 1. |
From a SQL*Plus session logged on to the SH schema, run create_stage_table.sql,
or copy the following SQL statements into your SQL*Plus session:
@create_stage_table.sql
DROP TABLE sales_delta;
CREATE TABLE sales_delta
NOLOGGING AS
SELECT *
FROM sales
WHERE 1=0;
|
| 2. |
Fill the table with some data from the sales
fact table.
@load_stage_table3.sql
INSERT /*+ APPEND */ INTO sales_delta
SELECT PROD_ID, CUST_ID, TIME_ID, CHANNEL_ID,
PROMO_ID, QUANTITY_SOLD, AMOUNT_SOLD
FROM sales PARTITION ( sales_q1_2000)
;
COMMIT;

|
| 3. |
Create B-tree indexes on all dimension join columns. Note the time
needed for creation.
@cr_btree_idx.sql
set timing on
CREATE INDEX sales_prod_local_bix
ON sales_delta (prod_id)
NOLOGGING COMPUTE STATISTICS ;
CREATE INDEX sales_cust_local_bix
ON sales_delta (cust_id)
NOLOGGING COMPUTE STATISTICS ;
CREATE INDEX sales_time_local_bix
ON sales_delta (time_id)
NOLOGGING COMPUTE STATISTICS ;
CREATE INDEX sales_channel_local_bix
ON sales_delta (channel_id)
NOLOGGING COMPUTE STATISTICS ;
CREATE INDEX sales_promo_local_bix
ON sales_delta (promo_id)
NOLOGGING COMPUTE STATISTICS ;

On your system, you can see that you needed 3-5 second to create
the B-tree indexes.
|
| 4. |
Use SQL to store the SIZES
in a table for a direct comparison. To do this, select the actual
segment sizes of the B-tree indexes from the data dictionary.
@cr_compare_tab.sql
DROP TABLE compare_idx_size;
CREATE TABLE compare_idx_size
AS
SELECT segment_name index_name,'STANDARD BTREE' index_type,
sum(bytes)/(1024*1024) index_size
FROM user_segments us, user_indexes ui
WHERE us.segment_name=ui.index_name
AND ui.table_name='SALES_DELTA'
GROUP BY segment_name, index_type;
|
| 5. |
Determine what the size of the indexes would be if you had used
static bitmap indexes. First you have to clean up the environment
by running cleanup_idx_comp.sql.
@cleanup_idx_comp.sql
DROP INDEX sales_prod_local_bix;
DROP INDEX sales_cust_local_bix;
DROP INDEX sales_time_local_bix;
DROP INDEX sales_channel_local_bix;
DROP INDEX sales_promo_local_bix;

|
| 6. |
Create static bitmap indexes on all dimension join columns. Note
the time needed for creation; you will recognize the faster creation,
compared to that of B-tree indexes.
@cr_bitmap_idx.sql
Set timing on
CREATE BITMAP INDEX sales_prod_local_bix
ON sales_delta (prod_id)
NOLOGGING COMPUTE STATISTICS ;
CREATE BITMAP INDEX sales_cust_local_bix
ON sales_delta (cust_id)
NOLOGGING COMPUTE STATISTICS ;
CREATE BITMAP INDEX sales_time_local_bix
ON sales_delta (time_id)
NOLOGGING COMPUTE STATISTICS ;
CREATE BITMAP INDEX sales_channel_local_bix
ON sales_delta (channel_id)
NOLOGGING COMPUTE STATISTICS ;
CREATE BITMAP INDEX sales_promo_local_bix
ON sales_delta (promo_id)
NOLOGGING COMPUTE STATISTICS ;

|
| 7. |
Complete the comparison table, so that you can use SQL to investigate
the differences between B-tree and bitmap indexes.
@fill_comp_table.sql
INSERT INTO compare_idx_size
SELECT segment_name index_name, 'BITMAP',sum(bytes)/(1024*1024)
FROM user_segments us, user_indexes ui
WHERE us.segment_name=ui.index_name
AND ui.table_name='SALES_DELTA'
GROUP BY segment_name;
COMMIT;

|
| 8. |
Compare the different sizes of those indexes. To show all the facts,
you can run comp_idx1.sql
as user SH.
@comp_idx1.sql
select substr(a.index_name,1,23) "Index Name",
a.index_type "Index Type",
a.index_size "Size [MB]",
b.index_type "Index Type",
b.index_size "Size [MB]",
'btree ' || trunc((b.index_size/a.index_size),2) ||
' times bigger than bitmap' "btree X times larger"
FROM compare_idx_size a, compare_idx_size b
WHERE a.index_name=b.index_name
AND a.index_type='BITMAP'
AND b.index_type='STANDARD BTREE'
ORDER BY 6 asc;

For a shorter version, run the following:
@comp_idx2.sql
SELECT substr(a.index_name,1,23) "Index Name",
'btree ' || trunc((b.index_size/a.index_size),2) ||
' times bigger than bitmap' "btree X times larger"
FROM compare_idx_size a, compare_idx_size b
WHERE a.index_name=b.index_name
AND a.index_type='BITMAP'
AND b.index_type='STANDARD BTREE'
ORDER BY 2 asc;

|
| 9. |
Clean up the environment.
@cleanup_idx_test.sql
DROP TABLE compare_idx_size;
DROP TABLE sales_delta;

|
Becoming Familiar with Oracle Advanced Partitioning Mechanisms for Improved
Query Performance
Oracle Partitioning not only has benefits from a table maintenance perspective,
but also has a lot of mechanisms available to speed up query performance
transparently up to an order of magnitudes.
However, query performance should never be the main design criteria for
designing the partitioned objects in your system. Oracle recommends to
design your partitioned objects to satisfy your system maintenance requirements,
such as load windows, backup and recovery time and volume constraints,
or common data warehousing tasks, such as rolling window operations.
Note: For more information about common rolling window operations
in data warehousing, see the "High-Speed Data Load and Rolling
Window Operations" module.
 |
Partition Pruning |
 |
Partitionwise Joins
|
Partition Pruning
Partition pruning, also called partition elimination, is a very important
for the optimization of big data warehouse projects. Large tables are
partitioned into smaller fragments called partitions. Only partitions
that are relevant to the user's statement are accessed and processed.
Partition pruning is the process you use to filter out only the necessary
partitions of a partitioned object to satisfy a query. It can be either
static or dynamic; dynamic partition pruning was introduced with Oracle8.1.6.
|
|
Static partition pruning takes place,
when the optimizer can eliminate specific partitions at parse time,
for example a query predicates on the partitioning key column. |
|
|
Dynamic partition pruning takes place at query run time,
when the following criteria are satisfied:
- A predicate condition on a table that is joined to the partitioned
object on the partition key column(s)
- A sufficient selectivity for the predicate - and therefore for
the partition pruning - to make the additional recursive SQL cheaper
than processing the query without dynamic partition pruning
|
Dynamic partition pruning also takes place when the predicate on the
join column cannot be determined at parse time and needs additional recursive
SQL. This happens for example when you’re using an ‘incomplete’ DATE value,
such as TO_DATE(‘01-JAN-00’,’DD-MON-RR’). Here we need a recursive statement
to determine the century to complete the DATE value.
Static Partition Pruning
The following example demonstrates static partition pruning with a predicate
condition on the partition key column.
| 1. |
From a SQL*Plus session logged on to the SH schema, run plan_static_pruning.sql,
or copy the following SQL statements into your SQL*Plus session:
@plan_static_pruning.sql
truncate table plan_table;
EXAPLIN PLAN FOR
SELECT c.channel_desc, sum(amount_sold)
FROM sales s, channels c
WHERE s.channel_id = c.channel_id
AND s.time_id >= to_date('01-JAN-2000','DD-MON-YYYY')
AND s.time_id <= to_date('31-MAR-2000','DD-MON-YYYY')
GROUP BY channel_desc;
Rem show plan
set linesize 132
select * from table(dbms_xplan.display);
Look at the PSTART
and PSTOP
column to see information about partition pruning. You see that
the query has to access only one partition for getting the result,
namely partition # 13.
|
| 2. |
To get more information about the partition that must be accessed,
you can query the data dictionary with the appropriate partition
number.
@select_part_name.sql
SELECT partition_name
FROM user_tab_partitions
WHERE table_name='SALES'
AND partition_position=13;
|
Dynamic Partition Pruning
The sales fact table of
the sample schema is range partitioned on the time_id
column. Joining the sales
fact table with the times
dimension table over time_id
satisfies the first criteria, so that dynamic partition pruning can take
place.
| 1. |
From a SQL*Plus session logged on to the SH schema, run plan_dyn_pruning.sql,
or copy the following SQL statements into your SQL*Plus session:
@plan_dyn_pruning.sql
TRUNCATE TABLE plan_table;
EXPLAIN PLAN FOR
SELECT t.day_number_in_month, sum(s.amount_sold)
FROM sales s, times t
WHERE s.time_id = t.time_id
AND t.calendar_month_desc='2000-12'
GROUP BY t.day_number_in_month;
set linesize 132
Rem show the plan with dynamic partition pruning
select * from table(dbms_xplan.display);
Unlike with static partition pruning, you will not see any absolute
numbers in the PSTART
and PSTOP
column, but the word KEY. This indicates that dynamic partition
pruning will take place for this query at run time.
|
| 2. |
To show the recursive SQL statement that is issued for dynamic
partition pruning, run the following query against the plan_table:
@select_other
set long 400
SELECT other
FROM plan_table
WHERE other IS NOT NULL;
You can see how the Oracle database translates the original predicate
condition into an inline view for getting the appropriate partition
information. The output is shown in the screenshot above
|
Partitionwise Joins
Partitionwise joins reduce query response time by minimizing the amount
of data exchanged among parallel execution servers when joins execute
in parallel. This significantly reduces response time and improves the
use of both CPU and memory resources. In Oracle Real Application Cluster
environments, partitionwise joins also avoid or at least limit the data
traffic over the interconnect, which is the key to achieving good scalability
for massive join operations. Partitionwise joins can be full or partial.
Oracle decides which type of join to use.
Note: This module does not cover all possibilities of partitionwise
operations in detail. It should give you a basic understanding of partitionwise
operations, and an understanding of how to read an execution plan appropriately.
For detailed information about partitionwise joins, see the Data Warehousing
Guide.
Steps
| 1. |
Create two additional hash-partitioned tables with the
appropriate index structures. |
| 2. |
Import statistics for those two tables.
|
| 3. |
Experience serial partitionwise joins:
- Non-partitionwise join
- Full partitionwise join
|
| 4. |
Experience parallel partitionwise joins:
- Non-partitionwise join
- Full partitionwise join
- Partial partitionwise join
|
1. Creating Additional Hash-Partitioned Tables and Indexes
One of the driving factors (besides parallel execution) that the Oracle
database uses to decide whether or not to perform partitionwise joins
is the physical structure of the objects that will be joined. Two additional
tables will be used to demonstrate partitionwise joins.
| 1. |
From a SQL*Plus session logged on to the SH schema, run the following
SQL statements in your SQL*Plus session to create the additional
structures:
Create a hash-partitioned table with the same structure as the
customers dimension
table.
@create_cust_hash.sql
DROP TABLE customers_hash;
CREATE TABLE customers_hash
( cust_id NUMBER
, cust_first_name VARCHAR2(20)
, cust_last_name VARCHAR2(40)
, cust_gender CHAR(1)
, cust_year_of_birth NUMBER(4)
, cust_marital_status VARCHAR2(20)
, cust_street_address VARCHAR2(40)
, cust_postal_code VARCHAR2(10)
, cust_city VARCHAR2(30)
, cust_state_province VARCHAR2(40)
, country_id CHAR(2)
, cust_main_phone_number VARCHAR2(25)
, cust_income_level VARCHAR2(30)
, cust_credit_limit NUMBER
, cust_email VARCHAR2(30)
)
PCTFREE 5
PARTITION BY HASH (cust_id)
(PARTITION h1_cust, PARTITION h2_cust, PARTITION h3_cust,
PARTITION h4_cust, PARTITION h5_cust, PARTITION h6_cust,
PARTITION h7_cust, PARTITION h8_cust, PARTITION h9_cust,
PARTITION h10_cust,PARTITION h11_cust,PARTITION h12_cust,
PARTITION h13_cust,PARTITION h14_cust,PARTITION h15_cust,
PARTITION h16_cust);
CREATE UNIQUE INDEX customers_hash_pk
ON customers_hash (cust_id) ;
ALTER TABLE customers_hash
ADD ( CONSTRAINT customers_hash_pk
PRIMARY KEY (cust_id) RELY ENABLE VALIDATE
) ;
Create a composite range-hash-partitioned table with the same structure
as the sales fact
table.
@create_sales_hash.sql
DROP TABLE sales_hash;
CREATE TABLE sales_hash
( prod_id NUMBER(6)
, cust_id NUMBER
, time_id DATE
, channel_id CHAR(1)
, promo_id NUMBER(6)
, quantity_sold NUMBER(3)
, amount_sold NUMBER(10,2)
) PCTFREE 5 NOLOGGING
PARTITION BY RANGE (time_id)
SUBPARTITION BY HASH (cust_id)
SUBPARTITIONS 16
( PARTITION SALES_HASH_1995
VALUES LESS THAN (TO_DATE('01-JAN-1996','DD-MON-YYYY')),
PARTITION SALES_HASH_1996
VALUES LESS THAN (TO_DATE('01-JAN-1997','DD-MON-YYYY')),
PARTITION SALES_HASH_1_1997
VALUES LESS THAN (TO_DATE('01-JUL-1997','DD-MON-YYYY')),
PARTITION SALES_HASH_2_1997
VALUES LESS THAN (TO_DATE('01-JAN-1998','DD-MON-YYYY')),
PARTITION SALES_HASH_Q1_1998
VALUES LESS THAN (TO_DATE('01-APR-1998','DD-MON-YYYY')),
PARTITION SALES_HASH_Q2_1998
VALUES LESS THAN (TO_DATE('01-JUL-1998','DD-MON-YYYY')),
PARTITION SALES_HASH_Q3_1998
VALUES LESS THAN (TO_DATE('01-OCT-1998','DD-MON-YYYY')),
PARTITION SALES_HASH_Q4_1998
VALUES LESS THAN (TO_DATE('01-JAN-1999','DD-MON-YYYY')),
PARTITION SALES_HASH_Q1_1999
VALUES LESS THAN (TO_DATE('01-APR-1999','DD-MON-YYYY')),
PARTITION SALES_HASH_Q2_1999
VALUES LESS THAN (TO_DATE('01-JUL-1999','DD-MON-YYYY')),
PARTITION SALES_HASH_Q3_1999
VALUES LESS THAN (TO_DATE('01-OCT-1999','DD-MON-YYYY')),
PARTITION SALES_HASH_Q4_1999
VALUES LESS THAN (TO_DATE('01-JAN-2000','DD-MON-YYYY')),
PARTITION SALES_HASH_Q1_2000
VALUES LESS THAN (TO_DATE('01-APR-2000','DD-MON-YYYY')),
PARTITION SALES_HASH_Q2_2000
VALUES LESS THAN (TO_DATE('01-JUL-2000','DD-MON-YYYY')),
PARTITION SALES_HASH_Q3_2000
VALUES LESS THAN (TO_DATE('01-OCT-2000','DD-MON-YYYY')),
PARTITION SALES_HASH_Q4_2000
VALUES LESS THAN (TO_DATE('01-JAN-2001','DD-MON-YYYY'))
);
CREATE BITMAP INDEX sales_cust_hash_bix
ON sales_hash (cust_id)
LOCAL NOLOGGING;
|
2. Importing Statistics for Those Tables
You have set up two additional tables, customers_hash
and sales_hash,
but they do not contain any data. Instead of inserting the same data into
those tables that you have in the appropriate tables, customers
and sales, youre
need to use Oracles capabilities to export and import table statistics
without any data. This functionality was introduced with Oracle8i.
Object statistics are used by the Oracle optimizer to evaluate execution
plans. Being able to export and import statistics without the appropriate
data enables you to get identical optimizer behavior for a test or development
environment as in the large production system without the necessity of
having the same size and data.
Note: Another means of getting identical optimizer behavior
is to use Oracles plan stability capabilities. See the Performance
Guide and Reference.
| 1. |
You havent gathered any statistics for the two new objects
yet:
@show_tab_stats.sql
SELECT table_name, num_rows
FROM user_tables
WHERE table_name in ('SALES_HASH','CUSTOMERS_HASH');
|
| 2. |
Because you dont have any data in those objects,
and you dont plan to insert any, you need to import existing
statistics for those two objects.
@imp_tab_stats.sql
Rem import those statistics from STAT_TABLE
Rem file stat_table.dmp, must be imported at the beginning
exec dbms_stats.import_table_stats('sh','sales_hash',
stattab =>'stat_table', statid =>'HANDS_ON');
exec dbms_stats.import_table_stats('sh','customers_hash',
stattab =>'stat_table', statid =>'HANDS_ON')
|
| 3. |
Examine the statistics for those tables as if they contain
a lot of data.
@show_tab_stats.sql
SELECT table_name, num_rows
FROM user_tables
WHERE table_name IN ('SALES_HASH','CUSTOMERS_HASH');
|
3. Experiencing Serial Partitionwise Joins
Note: The NOREWRITE
hint is used to avoid any interference with possibly existing materialized
views. Alternatively, you can disable query_rewrite
for this particular session, or for the complete instance. This is true
for all subsequent statements.
| 1. |
From a SQL*Plus session logged on to the SH schema, run set_noparallel.sql,
or copy the following SQL statements into your SQL*Plus session
to ensure that you get serial execution plans:
@set_noparallel
ALTER TABLE sales NOPARALLEL;
ALTER TABLE sales_hash NOPARALLEL;
ALTER TABLE customers NOPARALLEL;
ALTER TABLE customers_hash NOPARALLEL;
|
Non-Partitionwise Joins
Now join your range partitioned table sales
with the non-partitioned table customers.
| 1. |
From a SQL*Plus session logged on to the SH schema, run serial_nopwj.sql,
or copy the following SQL statements into your SQL*Plus session:
@serial_nopwj.sql
TRUNCATE TABLE plan_table;
EXPLAIN PLAN FOR
SELECT /*+ norewrite */ cust_last_name, sum(amount_sold)
FROM sales s, customers c
WHERE s.cust_id = c.cust_id
GROUP BY cust_last_name;
set linesize 132
Rem show the plan with non partition-wise join
select * from table(dbms_xplan.display);
The plan shows full table access for both tables, sales
and customers,
and a hash join.
|
Full Partitionwise Joins
A full partitionwise join divides a large join into smaller joins between
a pair of partitions from the two joined tables. To use this feature,
you must equipartition both tables on their join keys. Joining your composite
(range-hash) partitioned table, sales_hash,
with the hash-partitioned table, customers_hash.
Note that those two tables are equipartitioned on the cust_id
join column.
| 1. |
From a SQL*Plus session logged on to the SH schema, run serial_pwj.sql,
or copy the following SQL statements into your SQL*Plus session:
@serial_pwj.sql
TRUNCATE TABLE plan_table;
EXPLAIN PLAN FOR
SELECT /*+ norewrite */ cust_last_name, sum(amount_sold)
FROM sales_hash s, customers_hash c
WHERE s.cust_id = c.cust_id
GROUP BY cust_last_name;
set linesize 132
Rem show the plan with full partition-wise join
select * from table(dbms_xplan.display);
The plan looks slightly different than the non-partitioned plan.
You see an additional operation, ‘PARTITION
RANGE ALL’, in the plan. You can read this operation as specifying
how to process the hash-join, or, in pseudocode:
FOR LOOP over all partitions of customers_hash
DO
hash-join equivalent partitions
DONE
This full partitionwise join is done in serial.
|
4. Experiencing Parallel Partitionwise Joins
The NOREWRITE hint is
used to avoid any interference with possibly existing materialized views.
Alternatively, you could disable query_rewrite
for this particular session, or for the complete instance. This is true
for all subsequent statements.
| 1. |
From a SQL*Plus session logged on to the SH schema, run set_parallel4.sql,
or copy the following SQL statements into your SQL*Plus session
to ensure that you get serial execution plans:
@set_parallel4.sql
ALTER TABLE sales PARALLEL 4;
ALTER TABLE sales_hash PARALLEL 4;
ALTER TABLE customers PARALLEL 4;
ALTER TABLE customers_hash PARALLEL 4;
|
Non-Partitionwise Joins
Now join your range-partitioned table sales
with the non-partitioned table customers.
| 1. |
From a SQL*Plus session logged on to the SH schema, run par_nopwj.sql,
or copy the following SQL statements into your SQL*Plus session:
@par_nopwj.sql
TRUNCATE TABLE plan_table;
EXPLAIN PLAN FOR
SELECT /*+ norewrite */ cust_last_name, sum(amount_sold)
FROM sales s, customers c
WHERE s.cust_id = c.cust_id
GROUP BY cust_last_name;
set linesize 132
Rem show the plan with non partition-wise join
select * from table(dbms_xplan.display);
You can see that the statement is executed in parallel; because
there is no existing physical partitioning of one of the tables
which could be used for this query, no partitionwise join takes
place.
Both tables are scanned in parallel and redistributed to the next
set of slaves based on a hash distribution; those slaves are executing
the HASH JOIN and
the first SORT GROUP BY
(TQ 0,02), before they redistribute their result sets to the next
set of slaves for the final SORT
GROUP BY operation.
If the query contains an additional ORDER
BY, you could see a range-based redistribution for feeding
the last SORT GROUP BY
(PQ Distribution method of TQ 0,02).
|
Full Partitionwise Joins
Parallel execution of a full partitionwise join is a straightforward
parallelization of the serial execution. Instead of joining one partition
pair at a time, the partition pairs are joined in parallel by n
query servers.

The above picture shows a parallel partitionwise join for a hash-hash
partitionwise join. In our example, you are joining the hash-partitioned
table, customers_hash,
with the composite partitioned table, sales_hash.
The hash partitions for the sales_hash
table are composed of a set of 16 subpartitions, one from each range partition.
|
1.
|
From a SQL*Plus session logged on to the SH schema, run
par_fullpwj.sql, or copy the following SQL statements into
your SQL*Plus session:
@par_fullpwj.sql
TRUNCATE TABLE plan_table;
EXPLAIN PLAN FOR
SELECT /*+ norewrite */ cust_last_name, sum(amount_sold)
FROM sales_hash s, customers_hash c
WHERE s.cust_id = c.cust_id
GROUP BY cust_last_name;
set linesize 132
Rem show the plan with full partition-wise join
select * from table(dbms_xplan.display);
You see the same PARTITION
HASH ALL operation you know from a serial full partitionwise
join.
Furthermore, this plan shows one of the benefits of partitionwise
joins: When executed in parallel, partitionwise joins reduce communications
overhead. The Oracle database can avoid redistributing the partitions
because the two tables are already partitioned on the join column.
This enables each parallel execution server to join a pair of matching
partitions.
You can see that the same slave set (TQ 1,00), which is doing the
first SORT GROUP BY
operation, is also doing the table scans and the hash-join operation
for a partition pair (PQ Distribution method Parallel Combined
With Parent, PCWP). There is no data redistribution
up to this point in the execution plan.
|
Partial Partitionwise Joins
Unlike full partitionwise joins, partial partitionwise joins require
you to partition only one table on the join key, not both tables. The
partitioned table is referred to as the reference table. The other table
may or may not be partitioned. Partial partitionwise joins are more common
than full partitionwise joins.

To execute a partial partitionwise join, the Oracle database dynamically
repartitions the other table based on the partitioning of the reference
table. Once the other table is repartitioned, the execution is similar
to a full partitionwise join.
Partial partitionwise joins are only executed in parallel. A partitionwise
join is more a distribution method prior to the join to improve and speed
up the efficiency of the join operation. Because you only have one process
executing the join in serial, there is no need - and no benefit - to redistributing
the nonpartitioned table dynamically before the join.
|
1.
|
From a SQL*Plus session logged on to the SH schema, run
par_partpwj.sql, or copy the following
SQL statements into your SQL*Plus session:
@par_partpwj.sql
TRUNCATE TABLE plan_table;
EXPLAIN PLAN FOR
SELECT /*+ norewrite */ cust_last_name, sum(amount_sold)
FROM sales_hash s, customers c
WHERE s.cust_id = c.cust_id
GROUP BY cust_last_name;
set linesize 132
Rem show the plan with partial partition-wise join
select * from table(dbms_xplan.display);
You can see that customers
table is dynamically redistributed based on the join key column,
cust_id, in the same
way that the sales_hash
table is subpartitioned on this column.
This is shown in the fact that the result sets of the parallel
table scan of the customers
table are fed to the HASH
JOIN operation (TQ 2,01); the same slave set that does the
parallel (partitionwise) scan of the composite partitioned table,
sales_hash.
The hash partitions for the sales_hash
table are composed of a set of 16 subpartitions, one from each range
partition.
|
Cleaning Up
You now need to clean up the environment.
|
1.
|
From a SQL*Plus session logged on to the SH schema, run cleanup_mod3.sql,
or copy the following SQL statements into your SQL*Plus session:
@cleanup_mod3.sql
ALTER TABLE sales NOPARALLEL;
ALTER TABLE sales_hash NOPARALLEL;
ALTER TABLE customers NOPARALLEL;
ALTER TABLE customers_hash NOPARALLEL;
DROP TABLE customers_hash;
DROP TABLE sales_hash;
SET SERVEROUTPUT ON EXEC dw_handsOn.cleanup_modules
|
Benefits of Partitionwise Joins
After examining partitionwise joins, you can summarize their benefits:
Reduction of Communications Overhead
When executed in parallel, partitionwise joins reduce communications
overhead. This is because, in the default case, parallel execution of
a join operation by a set of parallel execution servers requires the
redistribution of each table on the join column into disjoint subsets
of rows. These disjoint subsets of rows are then joined pairwise by
a single parallel execution server.
The Oracle database can avoid redistributing the partitions because
the two tables are already partitioned on the join column. This enables
each parallel execution server to join a pair of matching partitions.
This improved performance from using parallel execution is even more
noticeable in Oracle Real Application Cluster configurations with internode
parallel execution. Partitionwise joins dramatically reduce interconnect
traffic. This feature is useful for large DSS configurations that use
Oracle Real Application Clusters.
Reduction of Memory Requirements
Partitionwise joins require less memory than the equivalent join operation
of the complete data set of the tables being joined.
In the case of serial joins, the join is performed at the same time
on a pair of matching partitions. If data is evenly distributed across
partitions, the memory requirement is divided by the number of partitions.
There is no skew.
In the parallel case, memory requirements depend on the number of partition
pairs that are joined in parallel. For example, if the degree of parallelism
is 20 and the number of partitions is 100, 5 times less memory is required
because only 20 joins of two partitions are performed at the same time.
The fact that partitionwise joins require less memory has a direct effect
on performance. For example, the join probably does not need to write
blocks to disk during the build phase of a hash join.
Using the New Automatic Gathering of Statistics
The dbms_stats package
was introduced with Oracle8i.
This package simplifies the gathering of statistics for complete databases,
schemas, or specific objects; most of those operations can be done in
parallel. Satstistics could be gathered not only for complete database,
schemas, or tables; Oracle8i
also introduced the terms STALE and EMPTY objects to specify
additional granularities for the objects of interest, where you want to
get new statistics.
- An EMPTY table is a table without any statistics at all. Using options=GATHER
EMPTY gathers statistics of all these tables and its dependent
indexes.
- A table and its dependent index structures are considered STALE as
soon as more than 10% of the total data volume has been changed. Invoking
dbms_stats with the option=GATHER
STALE collects statistics for all objects that have been changed
by more than 10%.
To take advantage of the Oracle databases internal mechanism to
track the staleness of a table, you must enable monitoring of this particular
table during CREATE TABLE or
with an ALTER TABLE command.
So, you could use the dbms_stats
package to gather statistics only for those objects that are regarded
stale, thus making the gathering of statistics for only the changed objects
easier.
The threshold value of 10% for tables to be considered stale is not changeable;
it is an empirical value derived by internal testings, and works well
for most customer situations. The threshold value can be regarded as conservative,
which means that the Oracle database then tends to regard an object as
stale sometimes earlier than necessary. If a specific application environment
needs more appropriate statistics, you can leverage the internally tracked
information about the changes of an object and implement your own procedure
for gathering statistics for your need.
Besides the high sophisticated customer environments that want to control
everything by themselves, theres a high customer need for more simplicity.
With Oracle9i,
you now keep all the statistics of your system up-to-date with one single
command, thus reducing your statistics maintenance efforts dramatically.
Its no longer difficult to provide the optimizer appropriate statistics.
Oracle9i provides
a new option setting for using the dbms_stats
package. The option is GATHER AUTO. Apart from the schema
name, thats the only parameter you have to pass to Oracle, and it
determines all other settings such as estimate_percentage
or the need of histograms automatically for you. It will collect statistics
of all table without statistics and all objects considered stale.
This reduces the customer interaction to one single command and to the
decision about when to run it. Thats all. To use this option, you
perform the following steps:
| 1. |
Do you currently have any table with monitoring enabled?
From a SQL*Plus session logged on to the SH schema, run tab_status_mon.sql,
or copy the following SQL statements into your SQL*Plus session:
@tab_status_mon.sql
Rem actual tables and monitoring status SELECT table_name,
to_char(last_analyzed,'dd.mm.yy hh24:mi:ss') la,
num_rows, monitoring
FROM user_tables;
|
| 2. |
Oracle9i
provides a procedural interface that you use to enable or disable
monitoring for a schema or a complete database.
@enable_monitoring.sql
Rem enable or disable monitoring for a complete schema
exec dbms_stats.alter_schema_tab_monitoring('SH',FALSE);
exec dbms_stats.alter_schema_tab_monitoring('SH',TRUE);
|
| 3. |
The user_tab_modifications
data dictionary view lists all changes to your table with monitoring
enabled.
@show_tab_modifications.sql
SELECT *
FROM user_tab_modifications;
|
You are now ready to use the new single command for keeping your statistics
up-to-date.
| 1. |
From a SQL*Plus session logged on to the SH schema, run gather_auto.sql,
or copy the following SQL statements into your SQL*Plus session:
@gather_auto.sql
set serveroutput on
declare
list_of_objects dbms_stats.objectTab := dbms_stats.objectTab();
begin
dbms_output.enable(200000);
dbms_stats.gather_schema_stats('SH',options=>'GATHER AUTO',
objlist=>list_of_objects);
for i in 1 ..list_of_objects.count loop
dbms_output.put_line('updated:'||list_of_objects(i).objtype||' '||
list_of_objects(i).objname||' '||list_of_objects(i).partname);
end loop;
end;
/

Depending on the number of objects without any statistics you might
have left over from previous modules, this might take up to a couple
of minutes.
Note that we’re only passing the schema name and the request for
automatic gathering of statistics to Oracle. All other parameters
are derived internally. You might be astonished to see what statistics
you’re missing …
|
| 2. |
Perform DML on one of the tables.
@stat_dml1.sql
Rem some DML
Rem 499 rows
update products
set prod_min_price=prod_min_price-1
where prod_id < 2500;
commit;

You are modifying less than 10% of the data, so that the threshold
for gathering new statistics has not been reached. We will experience
this in a minute.
|
| 3. |
Take a look at what in user_tab_modifications now. The information
about table modifications is flushed periodically from the SGA unless
manually requested. You won’t see any changes yet (unless you’re
| |