Place the cursor over this icon to load and view all the screenshots for this
tutorial. (Caution: Because this action loads all screenshots simultaneously,
response time may be slow depending on your Internet connection.)
Note: Alternatively, you can place the cursor
over each individual icon in the following steps to load and view only the screenshot
associated with that step.
Overview
Oracle9i
is the leading relational database for data warehousing and the first complete
Business Intelligence platform. It not only addresses the basic core requirements
of performance, scalability, and manageability, but also other data-relevant,
back-end functionality around ETL processing, data analysis (OLAP), and data
mining.
Oracle Database 10g has significant enhancements
to every facet of Oracle9i's
relational capabilities, including a complete new manageability infrastructure,
extending Oracle's lead in providing the most complete Business Intelligence
platform.
Implement Schema Changes
for the Sales History Schema
Some changes on the existing Sales History schema objects
are necessary, and some additional system privileges must be granted to the
user SH. The SQL file for applying those changes is modifySH_10g.sql. This file
is provided to you in the setup_dwh.zip file. To
use the setup files for the Data Warehousing tutorials, perform the following
steps:
1.
Start a SQL*Plus session and run the modifySH_10g.sql
script from your SQL*Plus session.
2.
From your SQL*Plus session, execute the following commands:
PROMPT "Fix" customers table and validate PK constraint
ALTER TABLE customers MODIFY CONSTRAINT customers_pk VALIDATE
Comparing Oracle's Star Query Transformations
with Other, Different Access Plans
In this topic, you examine Oracles superior query execution
capabilities for data warehousing, namely Oracles star transformation,
which was first introduced in Oracle8.0. You compare the execution plans for
one distinct query using star transformation with other plans less suitable
for handling large volumes of data, in order of decreasing performance.
Note: Because of the small amount of data used in the
hands-on activity, the great benefit of this transformation is not apparent.
This is because the database or file system cache is hardly exceeded, so that
I/O disadvantages of the other mechanisms are mostly eliminated. In addition,
due to potentially different init.ora settings and statistics, you may encounter
slightly different execution plans and costs.
1.
From a SQL*Plus session logged on to the SH schema,
run show_star1.sql, or copy SQL statements below into your SQL*Plus
session.
The following ALTER SESSION command enables Oracles
star query transformation without the use of temporary tables. Beginning
with Oracle8i,
temporary tables might be used by the optimizer to further improve a star
transformation. The next example illustrates the difference between a
true star transformation (introduced in Oracle8i)
and the behavior shown here.
Note: The NOREWRITE hint is used to avoid any
interference with possibly existing materialized views. Alternatively,
you could disable query_rewrite for this particular session, or the complete
instance. This is true for all subsequent statements.
The STAR_TRANSFORMATION hint is to enforce the ability
within Oracle to use star transformation even with a small set of data,
where another plan might be better.
@show_star1.sql
PROMPT let's disable the usage of TEMP TABLES to show simple star first
ALTER SESSION SET star_transformation_enabled=TEMP_DISABLE; show parameters star_transformation
DELETE FROM plan_table; COMMIT; EXPLAIN PLAN FOR SELECT /*+ norewrite */ t.calendar_month_desc , p.prod_subcategory , c.cust_city , sum(s.amount_sold) AS dollars FROM sales s , times t , products p , customers c WHERE s.time_id = t.time_id AND s.prod_id = p.prod_id AND s.cust_id = c.cust_id AND c.cust_city='Ravensburg' AND p.prod_category in ('Hardware','Photo') AND t.fiscal_year in (2000,1999) GROUP BY p.prod_subcategory, calendar_month_desc , c.cust_city;
PROMPT show plan set linesize 140 select * from table(dbms_xplan.display);
Note: Disregard
the PSTART and
PSTOP columns in the output for the moment. They are dicussed later.
The plan shown above represents a typical star query
transformation. The records in the fact table, satisfying the query WHERE
condition, are found by scanning only the bitmap index structures rather
than the whole large sales fact table.
In a first, internal recursive step, the Oracle database
selects all records of the three dimension tables (products, times, and
channels), which satisfy the WHERE condition. You can see this table access
of the dimension tables underneath the key iteration.
The Oracle database then uses the key values of these
records as predicates for probing against the bitmap index structures
of the sales fact table itself. You see that the predicates of the queryon
the customers, products, and times dimension tablesare used for
a high-selective preselection of only the relevant records of the sales
table, using the bitmap index structures sales_prod_bix, sales_cust_bix,
and sales_time_bix.
In a second step, the result set is joined back with
all the dimension tables to get the final query result. These are all
operations after PARTITION RANGE ITERATION.
2.
From a SQL*Plus session logged on to the SH schema,
run show_star2.sql, or copy the following SQL statements into your SQL*Plus
session:
@show_star2.sql
PROMPT now use star_transformation ALTER SESSION SET star_transformation_enabled=TRUE; show parameters star_transformation
DELETE FROM plan_table; COMMIT; EXPLAIN PLAN FOR SELECT /*+ norewrite */ t.calendar_month_desc , p.prod_subcategory , c.cust_city , sum(s.amount_sold) AS dollars FROM sales s , times t , products p , customers c WHERE s.time_id = t.time_id AND s.prod_id = p.prod_id AND s.cust_id = c.cust_id AND c.cust_city='Ravensburg' AND p.prod_category in ('Hardware','Photo') --AND p.prod_category in ('Hardware') AND t.fiscal_year in (2000,1999) GROUP BY p.prod_subcategory, calendar_month_desc , c.cust_city;
PROMPT show plan set linesize 140 select * from table(dbms_xplan.display);
This plan looks similar to the one shown before, although
it is not identical. The difference between the plans is that the Oracle
database now uses a table named sys_temp_xxx (which is not part of the
query) for satisfying the SQL statement. Note that this name is system-generated
and will vary.
What Does This Mean?
The optimizer evaluated the selectivity of the WHERE
conditions on the products, times,
and customers dimension tables and the size of
those dimension tables. It detected that there is a high selectivity on
the customers table and that this table is "large" enough. The
costs for creating a temporary table, consisting of the result set for
the predicate on the customers
table, is cheaper than accessing the products
table twice, as in the first plan without temporary table usage. The temp
table is then used instead of the customers table itself.
3.
From a SQL*Plus session logged on to the SH schema,
run show_star3.sql, or copy the following SQL statements into your SQL*Plus
session. The STAR hint is used to enforce Oracle7 behavior.
@show_star3.sql
PROMPT STAR JOIN TRANSFORMATION - 7.3 BEHAVIOR Rem show plan with star join transformation and discuss it
DELETE FROM plan_table; COMMIT; EXPLAIN PLAN FOR SELECT /*+ norewrite STAR */ t.calendar_month_desc , p.prod_subcategory , c.cust_city , sum(s.amount_sold) AS dollars FROM sales s , times t , products p , customers c WHERE s.time_id = t.time_id AND s.prod_id = p.prod_id AND s.cust_id = c.cust_id AND c.cust_city='Ravensburg' --AND p.prod_category in ('Hardware') AND p.prod_category in ('Hardware','Photo') AND t.fiscal_year in (2000,1999) GROUP BY p.prod_subcategory, calendar_month_desc , c.cust_city;
Rem show plan set linesize 140 select * from table(dbms_xplan.display);
Here you see the so-called star query optimization,
which was introduced in Oracle7. To avoid several joins with the large
fact table, the optimizer builds a Cartesian product of the times,
products, and customers
dimension tables and joins the Cartesian result once with the sales fact
table.
Note: This star optimization technique has nothing
to do with the current star query transformation, and its usage is not
dependent on the setting of the star_transformation_enabled
parameter.
4.
From a SQL*Plus session logged on to the SH schema,
run show_star4.sql, or copy the following SQL statements into your SQL*Plus
session:
@show_star4.sql
PROMPT NO STAR TRANSFORMATION - WORST CASE Rem show plan without star transformation and discuss it alter session set star_transformation_enabled=false; set sqlprompt "no STAR - SQL> "
DELETE FROM plan_table; COMMIT; EXPLAIN PLAN FOR SELECT /*+ norewrite */ t.calendar_month_desc , p.prod_subcategory , c.cust_city , sum(s.amount_sold) AS dollars FROM sales s , times t , products p , customers c WHERE s.time_id = t.time_id AND s.prod_id = p.prod_id AND s.cust_id = c.cust_id AND c.cust_city='Ravensburg' --AND p.prod_category in ('Hardware') AND p.prod_category in ('Hardware','Photo') AND t.fiscal_year in (2000,1999) GROUP BY p.prod_subcategory, calendar_month_desc , c.cust_city;
PROMPT show plan set linesize 140 select * from table(dbms_xplan.display);
From a join perspective, this plan has no optimizations
for data warehousing. Subsequently, it joins the four tables. Please note
that the complete sales fact table needs to be processed because there
are no predicates defined on the table.
5.
From a SQL*Plus session logged on to the SH schema,
run reset_star_test.sql, or copy
SQL statements below into your SQL*Plus session.
This script resets the session environment to your initial
settings with an enabled star query
transformation, the recommended setting for a data warehousing environment.
@reset_star_test.sql
PROMPT BACK TO NORMALITY alter session set star_transformation_enabled=TRUE; set sqlprompt "SQL> "
Bitmap join indexes were introduced in Oracle9i.
A join index is an index on one table that uses columns of one or more different
tables through a join.
The volume of data that must be joined can be reduced if join
indexes are used as joins have already been precalculated. In addition, join
indexes that contain multiple dimension tables can eliminate bitwise operations
that are necessary in the star transformation with existing bitmap indexes.
1.
Create a bitmap join index on fact table sales,
for a joined attribute from the products dimension table. From a SQL*Plus
session logged on to the SH schema, run cr_bj_idx.sql,
or copy the following SQL statements into your SQL*Plus session:
@cr_bj_idx.sql
DROP INDEX bji_sales_cust_city;
CREATE BITMAP INDEX bji_sales_cust_city on sales(c.cust_city) FROM sales s, customers c WHERE s.cust_id = c.cust_id LOCAL NOLOGGING COMPUTE STATISTICS;
2.
Look at the execution plan of your query again and compare
it with the original star transformation plan in the previous section.
For the sake of simplicity, the usage of temp tables for star transformation
is disabled.
@plan_bj_idx.sql
ALTER SESSION SET star_transformation_enabled=TEMP_DISABLE;
DELETE FROM plan_table; COMMIT; EXPLAIN PLAN FOR SELECT /*+ norewrite */ t.calendar_month_desc , p.prod_subcategory , c.cust_city , sum(s.amount_sold) AS dollars FROM sales s , times t , products p , customers c WHERE s.time_id = t.time_id AND s.prod_id = p.prod_id AND s.cust_id = c.cust_id AND c.cust_city='Ravensburg' AND p.prod_category in ('Hardware','Photo') --AND p.prod_category in ('Hardware') AND t.fiscal_year in (2000,1999) GROUP BY p.prod_subcategory, calendar_month_desc , c.cust_city;
Rem show plan set linesize 140 select * from table(dbms_xplan.display); ALTER SESSION SET star_transformation_enabled=TRUE;
When comparing this plan with the equivalent star transformation
plan, you see that now you dont have to query the products dimension table
for probing the
bitmap index on the sales
fact table. The predicate in the query is on the prod_category
column, which is stored in the bitmap join index, thus making the join
unnecessary.
Another benefit of the bitmap join index is the CPU
(and IO) savings gained by the removal of the bitmap key iteration work
(which has to merge all of the bitmaps together).
Examining the Differences Between B-tree
and Bitmap Indexes for Data Warehousing
Real (persistent) bitmap indexes were introduced with Oracle
7.3. A B-tree index stores a list of row IDs for each key corresponding to the
rows with that key value; a bitmap index, however, stores a bitmap for each
key value instead of a list of row IDs.
Bitmap indexes are stored compressed; if the number of different
key values is small, then bitmap indexes are very space efficient and on average
20 to 30 times smaller than the equivalent B-tree index structure.
This provides the following benefits for data warehousing:
Reduced response time for large classes of
ad hoc queries
Substantial reduction of space usage compared
to other indexing techniques
Dramatic performance gains even on very low-end
hardware
Very efficient parallel DML and loads
Now create a new table with the structure of a typical fact
table in a data warehouse environment, representing a subset of our sales transaction
fact data. To investigate the time for creation and the space usage of bitmap
indexes versus B-tree indexes, you will create an index for each dimension
join column, once as a B-tree and again as a bitmap index.
1.
From a SQL*Plus session logged on to the SH schema,
run create_stage_table.sql,
or copy the following SQL statements into your SQL*Plus session:
@create_stage_table.sql
DROP TABLE sales_delta;
CREATE TABLE sales_delta
NOLOGGING AS
SELECT *
FROM sales
WHERE 1=0
2.
In case you haven't completed the High Speed Data
Loading and Rolling Window Operations tutorial yet, you have to create
an external table to be able to load some test data. From a SQL*Plus session
logged on to the SH schema, run create_ext_tab_for_bdf.sql,
or copy the following SQL statements into your SQL*Plus session:
@create_ext_tab_for_bdf.sql
CREATE OR REPLACE DIRECTORY data_dir AS '/home/oracle/wkdir'; CREATE OR REPLACE DIRECTORY log_dir AS '/tmp'; CREATE TABLE sales_delta_XT ( PROD_ID NUMBER, CUST_ID NUMBER, TIME_ID DATE, CHANNEL_ID CHAR(2), PROMO_ID NUMBER, QUANTITY_SOLD NUMBER(3), AMOUNT_SOLD NUMBER(10,2) ) ORGANIZATION external ( TYPE oracle_loader DEFAULT DIRECTORY data_dir ACCESS PARAMETERS ( RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII BADFILE log_dir:'sh_sales.bad' LOGFILE log_dir:'sh_sales.log_xt' FIELDS TERMINATED BY "|" LDRTRIM (prod_id, cust_id, time_id CHAR(11) DATE_FORMAT DATE MASK "DD-MON-YYYY", channel_id, promo_id, quantity_sold, amount_sold ) ) location ( 'salesDec01.dat' ) )REJECT LIMIT UNLIMITED NOPARALLEL;
ALTER TABLE sales_delta_xt location ( 'salesQ1.dat' );
3.
Fill the table with some data from the
sales fact table.
@load_stage_table3.sql
INSERT /*+ APPEND */ INTO sales_delta SELECT PROD_ID, CUST_ID, TIME_ID, case CHANNEL_ID when 'S' then 3 when 'T' then 9 when 'C' then 5 when 'I' then 4 when 'P' then 2 else 99 end, PROMO_ID, sum(QUANTITY_SOLD) quantity_sold, sum(AMOUNT_SOLD) amount_sold FROM SALES_DELTA_XT GROUP BY prod_id,time_id,cust_id,channel_id,promo_id;
4.
Create B-tree indexes on all dimension join columns.
Note the time needed for creation.
@cr_btree_idx.sql
set timing on CREATE INDEX sales_prod_local_bix ON sales_delta (prod_id) NOLOGGING COMPUTE STATISTICS ; CREATE INDEX sales_cust_local_bix ON sales_delta (cust_id) NOLOGGING COMPUTE STATISTICS ; CREATE INDEX sales_time_local_bix ON sales_delta (time_id) NOLOGGING COMPUTE STATISTICS ; CREATE INDEX sales_channel_local_bix ON sales_delta (channel_id) NOLOGGING COMPUTE STATISTICS ; CREATE INDEX sales_promo_local_bix ON sales_delta (promo_id) NOLOGGING COMPUTE STATISTICS ;
On your system, you can see that you needed 3 to 5 seconds
to create the B-tree indexes.
5.
Use SQL to store the SIZES
in a table for a direct comparison. To do this, select the actual segment
sizes of the B-tree indexes from the data dictionary.
@cr_compare_tab.sql
DROP TABLE compare_idx_size;
CREATE TABLE compare_idx_size
AS
SELECT segment_name index_name,'STANDARD BTREE' index_type,
sum(bytes)/(1024*1024) index_size
FROM user_segments us, user_indexes ui
WHERE us.segment_name=ui.index_name
AND ui.table_name='SALES_DELTA'
GROUP BY segment_name, index_type;
6.
Determine what the size of the indexes would be if you
had used static bitmap indexes. First you have to clean up the environment
by running cleanup_idx_comp.sql.
@cleanup_idx_comp.sql
DROP INDEX sales_prod_local_bix;
DROP INDEX sales_cust_local_bix;
DROP INDEX sales_time_local_bix;
DROP INDEX sales_channel_local_bix;
DROP INDEX sales_promo_local_bix;
7.
Create static bitmap indexes on all dimension join columns.
Note that the time required for creation is considerably less than that of B-tree indexes
@cr_bitmap_idx.sql
Set timing on
CREATE BITMAP INDEX sales_prod_local_bix
ON sales_delta (prod_id)
NOLOGGING COMPUTE STATISTICS ;
CREATE BITMAP INDEX sales_cust_local_bix
ON sales_delta (cust_id)
NOLOGGING COMPUTE STATISTICS ;
CREATE BITMAP INDEX sales_time_local_bix
ON sales_delta (time_id)
NOLOGGING COMPUTE STATISTICS ;
CREATE BITMAP INDEX sales_channel_local_bix
ON sales_delta (channel_id)
NOLOGGING COMPUTE STATISTICS ;
CREATE BITMAP INDEX sales_promo_local_bix
ON sales_delta (promo_id)
NOLOGGING COMPUTE STATISTICS ;
Note: The creation of the bitmap indexes is much
faster than the creation of the equivalent B-tree indexes.
8.
Complete the comparison table, so that you can use SQL
to investigate the differences between B-tree and bitmap indexes.
@fill_comp_table.sql
INSERT INTO compare_idx_size
SELECT segment_name index_name, 'BITMAP',sum(bytes)/(1024*1024)
FROM user_segments us, user_indexes ui
WHERE us.segment_name=ui.index_name
AND ui.table_name='SALES_DELTA'
GROUP BY segment_name;
COMMIT;
9.
Compare the different sizes of those indexes. To show
all the facts, you can run comp_idx1.sql
as user SH.
@comp_idx1.sql
select substr(a.index_name,1,23) "Index Name",
a.index_type "Index Type",
a.index_size "Size [MB]",
b.index_type "Index Type",
b.index_size "Size [MB]",
'btree ' || trunc((b.index_size/a.index_size),2) ||
' times bigger than bitmap' "btree X times larger"
FROM compare_idx_size a, compare_idx_size b
WHERE a.index_name=b.index_name
AND a.index_type='BITMAP'
AND b.index_type='STANDARD BTREE'
ORDER BY 6 asc;
For a shorter version, run the following:
@comp_idx2.sql
SELECT substr(a.index_name,1,23) "Index Name",
'btree ' || trunc((b.index_size/a.index_size),2) ||
' times bigger than bitmap' "btree X times larger"
FROM compare_idx_size a, compare_idx_size b
WHERE a.index_name=b.index_name
AND a.index_type='BITMAP'
AND b.index_type='STANDARD BTREE'
ORDER BY 2 asc;
Depending on the cardinality of the indexed column,
a bitmap index is normally up to 30 times smaller than the equivalent
B-tree index; it can go even up to a factor of 50 to 60. Thinking about
high-end data warehousing system in Terabyte Ranges, even an average factor
of 5 saves a tremendous amount of disk space. Consider scanning a 250 GB
index versus a 50 GB index.
Furthermore, the creation time is much less than that of
B-tree indexes.
Providing real bitmap indexes is a crucial component
for any data warehousing environment where the data model has start or
snowflake schemata.
10.
Clean up the environment.
@cleanup_idx_test.sql
DROP TABLE compare_idx_size;
DROP TABLE sales_delta;
Improving Query Performance Using Oracle
Advanced Partitioning
Oracle Partitioning not only has benefits from a table maintenance
perspective, but also has a lot of mechanisms available to speed up query performance
transparently up to an order of magnitudes.
Query performance should never be the main design criteria
for designing the partitioned objects in your system. Oracle recommends to design
your partitioned objects to satisfy your system maintenance requirements, such
as load windows, backup and recovery time and volume constraints, or common
data warehousing tasks, such as rolling window operations.
Note: For more
information about common rolling window operations in data warehousing, see
the High-Speed Data Load and Rolling Window Operations tutorial.
Partition pruning, also called partition elimination, is a
very important optimization in big data warehouse projects. Large tables are
partitioned into smaller fragments called partitions. Only partitions that are
relevant to the user's statement are accessed and processed.
Partition pruning is the process used to filter only the necessary
partitions of a partitioned object to satisfy a query. It can be either static
or dynamic; dynamic partition pruning was introduced with Oracle8.1.6.
Static partition pruning takes place when
the optimizer can eliminate specific partitions at parse time, for example
a query predicate on the partitioning key column.
A predicate condition on
a table that is joined to the partitioned object on the partition
key column(s)
A sufficient selectivity for the predicate—and
therefore for the partition pruning—to make the additional recursive
SQL cheaper than processing the query without dynamic partition
pruning
Dynamic partition pruning also takes place when the predicate
on the join column cannot be determined at parse time and needs additional recursive
SQL. This happens, for example, when you’re using an "incomplete" DATE value,
such as TO_DATE(‘01-JAN-00’,’DD-MON-RR’). In this example, you need a recursive
statement to determine the century to complete the DATE value.
Static Partition Pruning
The following example demonstrates static partition pruning
with a predicate condition on the partition key column.
1.
From a SQL*Plus session logged on to the SH schema,
run plan_static_pruning.sql,
or copy the following SQL statements into your SQL*Plus session:
@plan_static_pruning.sql
truncate table plan_table;
DELETE FROM plan_table; COMMIT; explain plan for SELECT c.channel_desc, sum(amount_sold) FROM sales s, channels c WHERE s.channel_id = c.channel_id AND s.time_id >= to_date('04-JAN-2000','DD-MON-YYYY') AND s.time_id <= to_date('22-FEB-2000','DD-MON-YYYY') GROUP BY channel_desc;
Rem show plan set linesize 140 select * from table(dbms_xplan.display);
Look at the PSTART and PSTOP
columns to see information about partition pruning. You see that the query
has to access only one partition for getting the result, namely partition
# 13.
2.
To get more information about the partition that must
be accessed, you can query the data dictionary with the appropriate partition
number.
@select_part_name.sql
SELECT partition_name
FROM user_tab_partitions
WHERE table_name='SALES'
AND partition_position=13
The sales fact
table of the sample schema is range partitioned on the time_id
column. Joining the sales fact
table with the times dimension
table over time_id satisfies
the first criteria, so that dynamic partition pruning can take place.
1.
From a SQL*Plus session logged on to the SH schema,
run plan_dyn_pruning.sql,
or copy the following SQL statements into your SQL*Plus session:
@plan_dyn_pruning.sql
DELETE FROM plan_table; COMMIT; EXPLAIN PLAN FOR SELECT /*+ norewrite */ t.calendar_month_desc , p.prod_subcategory , c.cust_city , sum(s.amount_sold) AS dollars FROM sales s , times t , products p , customers c WHERE s.time_id = t.time_id AND s.prod_id = p.prod_id AND s.cust_id = c.cust_id AND c.cust_city='Ravensburg' AND p.prod_category in ('Hardware') AND t.fiscal_year in (2000,1999) GROUP BY p.prod_subcategory, calendar_month_desc , c.cust_city;
set linesize 140 Rem show the plan with dynamic partition pruning select * from table(dbms_xplan.display);
Unlike with static partition pruning, you will not see
any absolute numbers in the PSTART and PSTOP
columns, but the word KEY. This indicates that dynamic partition pruning
will take place for this query at run time.
2.
To show the recursive SQL statement that is issued for
dynamic partition pruning, run the following query against the plan_table:
@select_other
set long 400
SELECT other
FROM plan_table
WHERE other IS NOT NULL;
You can see how the Oracle database translates the original
predicate condition into an inline view for getting the appropriate partition
information.
Partitionwise joins reduce query response time by minimizing
the amount of data exchanged among parallel execution servers when joins execute
in parallel. This significantly reduces response time and improves the use of
both CPU and memory resources. In Oracle Real Application Cluster environments,
partitionwise joins also avoid or at least limit the data traffic over the interconnect,
which is the key to achieving good scalability for massive join operations.
Partitionwise joins can be full or partial. Oracle decides which type of join
to use.
Note: This tutorial does not cover all possibilities
of partitionwise operations in detail. It should give you a basic understanding
of partitionwise operations, and an understanding of how to read an execution
plan appropriately. For detailed information about partitionwise joins, see
the Data Warehousing Guide.
When executed in parallel, partitionwise joins reduce communications
overhead. This is because, in the default case, parallel execution of a join
operation by a set of parallel execution servers requires the redistribution
of each table on the join column into disjointed subsets of rows. These disjointed
subsets of rows are then joined pairwise by a single parallel execution server.
The Oracle database can avoid redistributing the partitions
because the two tables are already partitioned on the join column. This enables
each parallel execution server to join a pair of matching partitions.
This improved performance from using parallel execution
is even more noticeable in Oracle Real Application Cluster configurations
with internode parallel execution. Partitionwise joins dramatically reduce
interconnect traffic. This feature is useful for large DSS configurations
that use Oracle Real Application Clusters.
Reduction of Memory Requirements
Partitionwise joins require less memory than the equivalent
join operation of the complete data set of the tables being joined.
In the case of serial joins, the join is performed at the
same time on a pair of matching partitions. If data is evenly distributed
across partitions, the memory requirement is divided by the number of partitions.
There is no skew.
In the parallel case, memory requirements depend on the
number of partition pairs that are joined in parallel. For example, if the
degree of parallelism is 20 and the number of partitions is 100, five times less
memory is required because only 20 joins of two partitions are performed at
the same time. The fact that partitionwise joins require less memory has a
direct effect on performance. For example, the join probably does not need
to write blocks to disk during the build phase of a hash join.
1. Create Two Additional Hash-Partitioned
Tables and Indexes
One of the driving factors (besides parallel execution) that
the Oracle database uses to decide whether or not to perform partitionwise joins
is the physical structure of the objects that are joined. Two additional tables
are needed to demonstrate partitionwise joins.
1.
Create a hash-partitioned table with the same structure
as the customers dimension
table. From a SQL*Plus session logged on to the SH schema, run the following
SQL statements in your SQL*Plus session to create the additional structures:
@create_cust_hash.sql
DROP TABLE customers_hash; CREATE TABLE customers_hash ( cust_id NUMBER , cust_first_name VARCHAR2(20) , cust_last_name VARCHAR2(40) , cust_gender CHAR(1) , cust_year_of_birth NUMBER(4) , cust_marital_status VARCHAR2(20) , cust_street_address VARCHAR2(40) , cust_postal_code VARCHAR2(10) , cust_city VARCHAR2(30) , cust_city_id number , cust_state_province VARCHAR2(40) , cust_state_province_id number , country_id number , cust_main_phone_number VARCHAR2(25) , cust_income_level VARCHAR2(30) , cust_credit_limit NUMBER , cust_email VARCHAR2(30) , cust_total varchar2(14) , cust_total_id number , cust_src_id number , cust_eff_from date , cust_eff_to date , cust_valid varchar2(1) ) PCTFREE 5 PARTITION BY HASH (cust_id) (PARTITION h1_cust, PARTITION h2_cust,PARTITION h3_cust,PARTITION h4_cust, PARTITION h5_cust,PARTITION h6_cust,PARTITION h7_cust,PARTITION h8_cust, PARTITION h9_cust,PARTITION h10_cust,PARTITION h11_cust,PARTITION h12_cust, PARTITION h13_cust,PARTITION h14_cust,PARTITION h15_cust,PARTITION h16_cust);
CREATE UNIQUE INDEX customers_hash_pk ON customers_hash (cust_id) ;
ALTER TABLE customers_hash ADD ( CONSTRAINT customers_hash_pk PRIMARY KEY (cust_id) RELY ENABLE VALIDATE ) ;
2.
Create a composite range-hash-partitioned table with
the same structure as the sales
fact table.
@create_sales_hash.sql
DROP TABLE sales_hash; CREATE TABLE sales_hash ( prod_id NUMBER , cust_id NUMBER , time_id DATE , channel_id NUMBER , promo_id NUMBER , quantity_sold NUMBER(10,2) , amount_sold NUMBER(10,2) )PCTFREE 5 NOLOGGING PARTITION BY RANGE (time_id) SUBPARTITION BY HASH (cust_id) SUBPARTITIONS 16 (PARTITION SALES_HASH_1995 VALUES LESS THAN (TO_DATE('01-JAN-1996','DD-MON-YYYY')), PARTITION SALES_HASH_1996 VALUES LESS THAN (TO_DATE('01-JAN-1997','DD-MON-YYYY')), PARTITION SALES_HASH_1_1997 VALUES LESS THAN (TO_DATE('01-JUL-1997','DD-MON-YYYY')), PARTITION SALES_HASH_2_1997 VALUES LESS THAN (TO_DATE('01-JAN-1998','DD-MON-YYYY')), PARTITION SALES_HASH_Q1_1998 VALUES LESS THAN (TO_DATE('01-APR-1998','DD-MON-YYYY')), PARTITION SALES_HASH_Q2_1998 VALUES LESS THAN (TO_DATE('01-JUL-1998','DD-MON-YYYY')), PARTITION SALES_HASH_Q3_1998 VALUES LESS THAN (TO_DATE('01-OCT-1998','DD-MON-YYYY')), PARTITION SALES_HASH_Q4_1998 VALUES LESS THAN (TO_DATE('01-JAN-1999','DD-MON-YYYY')), PARTITION SALES_HASH_Q1_1999 VALUES LESS THAN (TO_DATE('01-APR-1999','DD-MON-YYYY')), PARTITION SALES_HASH_Q2_1999 VALUES LESS THAN (TO_DATE('01-JUL-1999','DD-MON-YYYY')), PARTITION SALES_HASH_Q3_1999 VALUES LESS THAN (TO_DATE('01-OCT-1999','DD-MON-YYYY')), PARTITION SALES_HASH_Q4_1999 VALUES LESS THAN (TO_DATE('01-JAN-2000','DD-MON-YYYY')), PARTITION SALES_HASH_Q1_2000 VALUES LESS THAN (TO_DATE('01-APR-2000','DD-MON-YYYY')), PARTITION SALES_HASH_Q2_2000 VALUES LESS THAN (TO_DATE('01-JUL-2000','DD-MON-YYYY')), PARTITION SALES_HASH_Q3_2000 VALUES LESS THAN (TO_DATE('01-OCT-2000','DD-MON-YYYY')), PARTITION SALES_HASH_Q4_2000 VALUES LESS THAN (TO_DATE('01-JAN-2001','DD-MON-YYYY')), PARTITION SALES_HASH_Q1_2001 VALUES LESS THAN (TO_DATE('01-APR-2001','DD-MON-YYYY')), PARTITION SALES_HASH_Q2_2001 VALUES LESS THAN (TO_DATE('01-JUL-2001','DD-MON-YYYY')), PARTITION SALES_HASH_Q3_2001 VALUES LESS THAN (TO_DATE('01-OCT-2001','DD-MON-YYYY')), PARTITION SALES_HASH_Q4_2001 VALUES LESS THAN (TO_DATE('01-JAN-2002','DD-MON-YYYY'))) ;
CREATE BITMAP INDEX sales_cust_hash_bix ON sales_hash (cust_id) LOCAL NOLOGGING;
You have set up two additional tables, customers_hash
and sales_hash, but they
do not contain any data. Instead of duplicating data from the source tables,
customers and sales, you will use export and import to create table statistics
without any data. This functionality was introduced with Oracle8i.
Object statistics are used by the Oracle optimizer to evaluate
execution plans. Being able to export and import statistics without the appropriate
data enables you to get identical optimizer behavior for a test or development
environment as in the large production system without the necessity of having
the same size and data.
Note: Another means of getting identical optimizer
behavior is to use Oracles plan stability capabilities. See the Performance
Guide and Reference.
1.
You havent gathered any statistics for the two
new objects yet:
@show_tab_stats.sql
SELECT table_name, num_rows
FROM user_tables
WHERE table_name in ('SALES_HASH','CUSTOMERS_HASH');
2.
Because you dont have any data
in those objects, and you dont plan to insert any, you need to import
existing statistics for those two objects.
@imp_tab_stats.sql
Rem import those statistics from STAT_TABLE
Rem file stat_table.dmp, must be imported at the beginning
exec dbms_stats.import_table_stats('sh','sales_hash',
stattab =>'stat_table', statid =>'HANDS_ON');
exec dbms_stats.import_table_stats('sh','customers_hash',
stattab =>'stat_table', statid =>'HANDS_ON')
3.
Examine the statistics for those tables
as if they contain a lot of data.
@show_tab_stats.sql
SELECT table_name, num_rows
FROM user_tables
WHERE table_name IN ('SALES_HASH','CUSTOMERS_HASH');
Note: The NOREWRITE hint is used to avoid any interference with possibly existing materialized views.
Alternatively, you can disable query_rewrite for this particular session, or for the complete instance. This is true for
all subsequent statements.
From a SQL*Plus session logged on to the SH schema,
run set_noparallel.sql,
or copy the following SQL statements into your SQL*Plus session to ensure
that you get serial execution plans:
@set_noparallel
ALTER TABLE sales NOPARALLEL;
ALTER TABLE sales_hash NOPARALLEL;
ALTER TABLE customers NOPARALLEL;
ALTER TABLE customers_hash NOPARALLEL;
Serial Non-Partitionwise Joins
Now join your sales range partitioned table sales
with the non-partitioned table customers.
1.
From a SQL*Plus session logged on to the SH schema,
run serial_nopwj.sql,
or copy the following SQL statements into your SQL*Plus session:
@serial_nopwj.sql
DELETE FROM plan_table; COMMIT;
explain plan for select /*+ use_hash(c,s) norewrite */ cust_last_name, sum(amount_sold) from sales s, customers c where s.cust_id = c.cust_id -- and s.cust_id in (10005,10004,10003) group by cust_last_name;
set linesize 140 Rem show the plan with non partition-wise join select * from table(dbms_xplan.display);
The plan shows full table access for both tables, sales and customers, and
a hash join. The row source PARTITION RANGE ALL is inside the HASH JOIN,
which means that the customers
table is joined with all partitions.
A full partitionwise join divides a large join into smaller
joins between a pair of partitions from the two joined tables. To use this feature,
you must equipartition both tables on their join keys by joining your composite
(range-hash) partitioned table, sales_hash,
with the hash-partitioned table, customers_hash.
Note that those two tables are equi-partitioned on the cust_id
join column.
1.
From a SQL*Plus session logged on to the SH schema,
run serial_pwj.sql, or copy the following SQL statements into your SQL*Plus session:
@serial_pwj.sql
DELETE FROM plan_table; COMMIT;
explain plan for select /*+ norewrite */ cust_last_name, sum(amount_sold) from sales_hash s, customers_hash c where s.cust_id = c.cust_id -- and s.cust_id in (10005,10004,10003) group by cust_last_name;
set linesize 140 Rem show the plan with full partition-wise join select * from table(dbms_xplan.display);
The plan looks slightly different than the nonpartitioned
plan. You see an additional operation, ‘PARTITION
HASH ALL’, in the plan. The row source is outside the join. You
can read this operation as specifying how to process the hash-join, or,
in pseudocode:
FOR LOOP over all partitions of customers_hash
DO
hash-join equivalent partitions
DONE
The NOREWRITE
hint is used to avoid any interference with possibly existing materialized views.
Alternatively, you could disable query_rewrite for this particular session, or for the complete instance. This is true for
all subsequent statements.
From a SQL*Plus session logged on to the SH schema,
run set_parallel4.sql,
or copy the following SQL statements into your SQL*Plus session to ensure
that you get serial execution plans:
@set_parallel4.sql
ALTER TABLE sales PARALLEL 4;
ALTER TABLE sales_hash PARALLEL 4;
ALTER TABLE customers PARALLEL 4;
ALTER TABLE customers_hash PARALLEL 4
Parallel Nonpartitionwise Joins
Now join your range-partitioned table "sales"
with the non-partitioned table "customers."
1.
From a SQL*Plus session logged on to the SH schema,
run par_nopwj.sql, or copy
the following SQL statements into your SQL*Plus session:
@par_nopwj.sql
DELETE FROM plan_table; COMMIT;
explain plan for select /*+ norewrite */ cust_last_name, sum(amount_sold) from sales s, customers c where s.cust_id = c.cust_id -- and s.cust_id in (10005,10004,10003) group by cust_last_name;
set linesize 140 Rem show the plan with non partition-wise join select * from table(dbms_xplan.display);
You can see that the statement is executed in parallel;
because there is no existing physical partitioning of one of the tables
which could be used for this query, no partitionwise join takes place.
Both tables are scanned in parallel. The smaller table
customers is broadcasted to all slaves working on the next part of the
parallel plan (ID 9). The next part of the parallel plan is performing
the HASH JOIN followed by a first SORT GROUP BY. Because all the slaves
are working on the complete result set, you have to redistribute
it (HASH based, ID 5) for the final SORT GROUP BY operation (ID 3).
If the query contains an additional ORDER
BY, you could see a range-based redistribution for feeding the
last SORT GROUP BY (PQ
Distribution method of ID 5).
Parallel execution of a full partitionwise join is a straightforward
parallelization of the serial execution. Instead of joining one partition pair
at a time, the partition pairs are joined in parallel by n query servers.
The picture above shows a parallel partitionwise join for
a hash-hash partitionwise join. In this tutorial, you are joining the hash-partitioned
table, customers_hash, with the
composite partitioned table, sales_hash.
The hash partitions for the sales_hash
table are composed of a set of 16 subpartitions, one from each range partition.
1.
From a SQL*Plus session logged on to the SH schema,
run par_fullpwj.sql, or
copy the following SQL statements into your SQL*Plus session:
@par_fullpwj.sql
alter session set "_parallel_broadcast_enabled"=FALSE; DELETE FROM plan_table; COMMIT;
explain plan for select /*+ norewrite */ cust_last_name, sum(amount_sold) from sales_hash s, customers_hash c where s.cust_id = c.cust_id -- and s.cust_id in (10005,10004,10003) group by cust_last_name;
set linesize 160 Rem show the plan with full partition-wise join select * from table(dbms_xplan.display);
Due to the small size of the objects, you have to disable
an optimization for parallel execution of small objects, the broadcasting
of results. In situations with small objects, a broadcasting to all slaves
might be the cheaper execution plan; the benefit of avoiding a redistribution
is higher than having larger result sets
for the subsequent parallel operations.
You see the same
PARTITION HASH ALL operation you know from a serial
full partitionwise join.
Furthermore, this plan shows one of the benefits of
partitionwise joins: When executed in parallel, partitionwise joins reduce
communications overhead. The Oracle database can avoid redistributing
the partitions because the two tables are already partitioned on the join
column. This enables each parallel execution server to join a pair of
matching partitions.
You see that the same slave set (Q 1,00), which is doing
the first SORT GROUP BY
(ID 6) operation, is also doing the table scans and the hash-join operation
for a partition pair (PQ Distribution method Parallel Combined
With Parent, PCWP). There is no data redistribution up to
this point in the execution plan.
Unlike full partitionwise joins, partial partitionwise joins
require you to partition only one table on the join key, not both tables. The
partitioned table is referred to as the reference table. The other table may
or may not be partitioned. Partial partitionwise joins are more common than
full partitionwise joins.
To execute a partial partitionwise join, the Oracle database
dynamically repartitions the other table based on the partitioning of the reference
table. Once the other table is repartitioned, the execution is similar to a
full partitionwise join.
Partial partitionwise joins are executed only in parallel.
A partitionwise join is more a distribution method prior to the join to improve
and speed up the efficiency of the join operation. Because you only have one
process executing the join in serial, there is no need—and no benefit—to
redistribute the nonpartitioned table dynamically before the join.
1.
From a SQL*Plus session logged on to the SH schema,
run par_partpwj.sql,
or copy the following SQL statements into your SQL*Plus session:
@par_partpwj.sql
DELETE FROM plan_table; COMMIT;
explain plan for select /*+ norewrite */ cust_last_name, sum(amount_sold) from sales_hash s, customers c where s.cust_id = c.cust_id -- and s.cust_id in (10005,10004,10003) group by cust_last_name;
set linesize 140 Rem show the plan with partial partition-wise join select * from table(dbms_xplan.display);
You see that customers
table is dynamically redistributed based on the join key column, cust_id,
in the same way that the sales_hash
table is subpartitioned on this column.
This is shown in the fact that the result sets of the
parallel table scan of the customers
table are fed to the HASH JOIN
operation (Q 1,01 - ID 8), the same slave set that does the parallel (partitionwise)
scan of the composite partitioned table, sales_hash.
The hash partitions for the sales_hash
table are composed of a set of 16 subpartitions, one from each range partition.
The DBMS_STATS
package was introduced with Oracle8i.
This package simplifies the gathering of statistics for complete databases,
schemas, or specific objects; most of those operations can be done in parallel.
DBMS_STATS is the recommended way of gathering statistics beginning with Oracle8.0.
Statistics could be gathered not only for complete database,
schemas, or tables; Oracle8i
introduced the terms STALE and EMPTY objects to specify additional granularities
for the objects of interest, where you want to get new statistics.
An EMPTY table is a table without any statistics
at all. Using options=GATHER EMPTY gathers statistics of all
these tables and its dependent indexes.
A table and its dependent index structures are considered
STALE as soon as more than 10% of the total data volume has been changed.
Invoking DBMS_STATS with
the option=GATHER STALE collects statistics for all objects
that have been changed by more than 10%.
Beginning with Oracle Database 10g, Oracle automatically monitors
the staleness of a table. In older releases you had to actively enable table
monitoring, either by specifying it as part of the CREATE_TABLE command or later
with an ALTER TABLE command.
The threshold value of 10% for tables to be considered stale
is not changeable; it is an empirical value derived by internal testings, and
works well for most customer situations. The threshold value can be regarded
as conservative, which means that the Oracle database then tends to regard an
object as stale sometimes earlier than necessary. If a specific application
environment needs more appropriate statistics, you can leverage the internally
tracked information about the changes of an object and implement your own procedure
for gathering statistics for your need.
Besides the highly sophisticated customer environments that
want to control everything by themselves, theres a high customer need
for more simplicity. With Oracle Database 10g,
you keep all the statistics of your system up-to-date with one single command,
thus reducing your statistics maintenance efforts dramatically. Its not
difficult to provide the optimizer appropriate statistics. By default, a scheduled
job is set up with every Oracle Database 10g installation that gathers statistics
regularly in predefined maintenance windows (WEEKNIGH_WINDOW, WEEKEND_WINDOW).
Beginning with Oracle9i,an option setting for using the DBMS_STATS package was implemented. The option is GATHER AUTO. Apart from the
schema name, thats the only parameter you have to pass to Oracle, and
it determines all other settings such as estimate_percentage
or the need of histograms automatically for you. It will collect statistics
of all tables without statistics and all objects considered stale.
To use this option, perform the following steps:
1.
From a SQL*Plus session logged on to the SH schema,
run tab_status_mon.sql, or copy the following SQL statements into your SQL*Plus session:
@tab_status_mon.sql
Rem actual tables and monitoring status SELECT table_name,
to_char(last_analyzed,'dd.mm.yy hh24:mi:ss') la,
num_rows, monitoring
FROM user_tables;
You see that with some exceptions (i.e. External Tables),
all tables are monitored by default. This is a new behavior in Oracle Database
10g. The monitoring is completely done in memory and does not impose any
overhead on the tables; you cannot switch off table monitoring.
3.
The user_tab_modifications data dictionary view lists all changes to your table with monitoring enabled.
@show_tab_modifications.sql SELECT * FROM user_tab_modifications;
Depending on what happened before on your system, you
see either more entries or less entries. Now you can use the single command for keeping
statistics up-to-date and see how the timing and number of objects, where
statistics are gathered, is changing.
4.
From a SQL*Plus session logged on to the SH schema,
run gather_auto.sql, or copy the following SQL statements into your SQL*Plus
session:
@gather_auto.sql
set serveroutput on
declare
list_of_objects dbms_stats.objectTab := dbms_stats.objectTab();
begin
dbms_output.enable(200000);
dbms_stats.gather_schema_stats('SH',options=>'GATHER AUTO',
objlist=>list_of_objects);
for i in 1 ..list_of_objects.count loop
dbms_output.put_line('updated:'||list_of_objects(i).objtype||' '||
list_of_objects(i).objname||' '||list_of_objects(i).partname);
end loop;
end;
/
Depending on the number of objects without any statistics
that you might have left over from previous tutorials, this may take up to
a couple of minutes.
Note that the schema name and the request for automatic
gathering of statistics to Oracle is passed. All other parameters are
derived internally.
5.
Perform DML on one of the tables.
@stat_dml1.sql
Rem some DML
Rem 499 rows
update products
set prod_min_price=prod_min_price-1
where prod_id < 2500;
commit;
You are modifying less than 10% of the data, so that
the threshold for gathering new statistics has not been reached.
6.
Take a look at what is in user_tab_modifications now. The
information about table modifications is flushed periodically from the
SGA unless manually requested. You won’t see any changes yet (unless you
did the last operations shortly before a periodic flush).
@show_tab_modifications.sql SELECT * FROM user_tab_modifications;
7.
Although the database periodically flushes the information
about modified objects from the SGA (and also internally every time when
DBMS_STATS is invoked), there’s a customer requirement to get the most
actual information for customer-specific usage. You can flush this information
manually "on demand" by issuing the following procedure:
@flush_monitoring.sql
Rem enforce flush of the modification info
Rem is called internally every time before GATHER statistics
exec dbms_stats.flush_database_monitoring_info
8.
The information about table modifications was now flushed
manually. You will now see the changes reflected.
@show_tab_modifications.sql
SELECT * FROM user_tab_modifications;
9.
Gather the statistics again.
@gather_auto.sql
set serveroutput on
declare
list_of_objects dbms_stats.objectTab := dbms_stats.objectTab();
begin
dbms_output.enable(200000);
dbms_stats.gather_schema_stats('SH',options=>'GATHER AUTO',
objlist=>list_of_objects);
for i in 1 ..list_of_objects.count loop
dbms_output.put_line('updated:'||list_of_objects(i).objtype||
' '||list_of_objects(i).objname||' '||list_of_objects(i).partname);
end loop;
end;
/
10.
Because the ratio of changed rows is below the threshold
of 10%, it runs very quickly and does not invoke any statistics gathering.
The modifications are still there.
@show_tab_modifications.sql
SELECT * FROM user_tab_modifications;
If you want to set up your own statistics gathering
mechanism based on application specific rules, you can do so and use
the information in user_tab_modifications. As soon as the shown objects
get new statistics, their entries will be deleted from the list of potentially
stale tables.
11.
Perform some more DML to pass the threshold of 10%.
@stat_dml2.sql
Rem note that the monitoring mechanism is agnostic about the content
- the following statement reverts the subsequent on update customers
set cust_credit_limit=cust_credit_limit-1
where country_id =
(select country_id from countries where country_name='France'); commit;
12.
Flush the information about monitored tables from the
SGA
The information about table modifications was now flushed
manually. You will see the changes reflected.
@show_tab_modifications.sql
SELECT * FROM user_tab_modifications;
The threshold of 10% modified data is now exceeded.
As soon as the automatic gathering of the statistics is run again, Oracle
automatically gathers the appropriate new set of statistics for the table
customers.
Before the statistics are gathered, you update the customers table
again and use Oracle9i Release 2 functionality of parallel DML capabilities
for nonpartitioned tables.
Oracle enhanced its parallel DML capabilities with every release
since its introduction in Oracle 8.0. The only restriction "left over" is that
an arbitrary DML operation won’t work in the presence of bitmap.
1.
Create a temporary table. From a SQL*Plus session, execute
the following script:.
@cr_cust_dml.sql
DROP TABLE cust_dml; CREATE TABLE cust_dml PARALLEL AS SELECT /*+ PARALLEL(c) */ * FROM customers c;
2.
Now issue a parallel UPDATE command against the nonpartitioned
table products.
@xpdml_on_cust.sql
ALTER SESSION ENABLE PARALLEL DML; COMMIT; EXPLAIN PLAN FOR update cust_dml set cust_credit_limit=cust_credit_limit-1;
SELECT * FROM TABLE(dbms_xplan.display);
Note that the UPDATE command is part of the parallel
operation (ID 3).
3.
Perform a parallel DML and control it with V$PO_SESSTAT.
@pdml_on_cust.sql
PROMPT Parallel DML against new table update cust_dml set cust_credit_limit=cust_credit_limit-1; COMMIT;
SELECT * FROM v$pq_sesstat
WHERE statistic in ('DML Parallelized','Allocation Height');
Finally, you use the GATHER AUTO functionality.
1.
Invoke the automated gathering of statistics again.
See that it takes longer, and new statistics for table customers is gathered.
@gather_auto.sql
set serveroutput on
declare
list_of_objects dbms_stats.objectTab := dbms_stats.objectTab();
begin
dbms_output.enable(200000);
dbms_stats.gather_schema_stats('SH',options=>'GATHER
AUTO',objlist=>list_of_objects);
for i in 1 ..list_of_objects.count loop
dbms_output.put_line('updated:'||list_of_objects(i).objtype||'
'||list_of_objects(i).objname||' '||list_of_objects(i).partname);
end loop;
end;
/
2.
The customers
table does not show up as a table with outstanding modifications anymore.
@show_tab_modifications.sql
SELECT *
FROM user_tab_modifications