
Using Basic Database Functionality for Data Warehousing

Purpose

In this tutorial, you use basic Oracle Database 10g functionality to query and improve performance in a data warehouse.

Time to Complete

Approximately 1 hour

Topics

This tutorial covers the following topics:

 Overview
 Prerequisites
 Implementing Schema Changes for the Sales History Schema
 Comparing Oracle's Star Query Transformations with Other Access Plans
 Using Bitmap Join Indexes
 Examining the Differences Between B-tree and Bitmap Indexes for Data Warehousing
 Improving Query Performance with Oracle Advanced Partitioning
 Using Partition Pruning
 Using Partitionwise Joins
 Using Automatic Gathering of Statistics
 Cleanup
 Summary


Overview

Oracle Database 10g is the leading relational database for data warehousing and the first complete Business Intelligence platform. It addresses not only the core requirements of performance, scalability, and manageability, but also the surrounding back-end functionality for ETL processing, data analysis (OLAP), and data mining.

Oracle Database 10g Release 2 is the robust, enhanced successor, providing significant enhancements to every facet of Oracle's relational capabilities and extending Oracle's lead in providing the most complete Business Intelligence platform.


Prerequisites

Before starting this tutorial, you should:

1. Perform the Installing Oracle Database 10g on Windows tutorial.

2. Download and unzip bdf.zip into your working directory (for example, c:\wkdir).


Implementing Schema Changes for the Sales History Schema

Before starting the tasks for this tutorial, you need to implement some changes on the existing Sales History schema. Additional objects are necessary, and additional system privileges must be granted to the user SH. The SQL file for applying those changes is modifySH_10gR2.sql.

1. Start a SQL*Plus session. Select Start > Programs > Oracle-OraDB10g_home > Application Development > SQL Plus.

(Note: This tutorial assumes you have a c:\wkdir folder. If you do not, create one and unzip the contents of bdf.zip into it. The scripts reference this path explicitly.)

 

2. Log in as the SH user. Enter SH as the User Name and SH as the Password. Then click OK.


3. Run the modifySH_10gR2.sql script from your SQL*Plus session.

@c:\wkdir\modifySH_10gR2.sql

The bottom of your output should match the image below.

 


Comparing Oracle's Star Query Transformations with Other Access Plans

In this topic, you examine Oracle's superior query execution capabilities for data warehousing, namely the star transformation, which was first introduced in Oracle 8.0. You compare the execution plans for a typical query in a star/snowflake-like environment and discuss the advantages of the star transformation over other plans, presented in order of decreasing performance, that are less suitable for handling large volumes of data.

Note: Because of the small amount of data used in this hands-on activity, the full benefit of this transformation is not apparent: the database or file system cache is rarely exceeded, so the I/O disadvantages of the other mechanisms are mostly hidden. In addition, due to potentially different init.ora settings and statistics, you may encounter slightly different execution plans and costs in the plan output.

Understanding the Basic Mechanism of Oracle's Star Query Transformation

1. From a SQL*Plus session logged on to the SH schema, run show_star1.sql, or copy the SQL statements below into your SQL*Plus session.

The ALTER SESSION command shown below enables Oracle's basic star query transformation capabilities without the use of temporary tables. Beginning with Oracle8i, the optimizer may use temporary tables to further improve a star transformation. The next example illustrates the difference between a true star transformation (introduced in Oracle8i) and the behavior shown here.

Note: The NOREWRITE hint is used to avoid any interference with possibly existing materialized views. Alternatively, you could disable query rewrite for this particular session or for the entire instance. This applies to all subsequent statements.

The STAR_TRANSFORMATION hint forces Oracle to use a star transformation even with a small data set, where another plan might otherwise be chosen.

@c:\wkdir\show_star1.sql

PROMPT   let's disable the usage of TEMP TABLES to show simple star first
ALTER SESSION SET star_transformation_enabled=TEMP_DISABLE;
                                
show parameters star_transformation
DELETE FROM plan_table;
COMMIT;
EXPLAIN PLAN FOR
SELECT /*+ norewrite */
t.calendar_month_desc
, p.prod_subcategory
, c.cust_city
, sum(s.amount_sold) AS dollars
FROM sales s
, times t
, products p
, customers c
WHERE s.time_id = t.time_id
AND s.prod_id = p.prod_id
AND s.cust_id = c.cust_id
AND c.cust_city='Ravensburg'
AND p.prod_category in ('Hardware','Photo')
AND t.fiscal_year in (2000,1999)
GROUP BY p.prod_subcategory, calendar_month_desc
, c.cust_city;
PROMPT show plan
set linesize 150
set pagesize 100
select * from table(dbms_xplan.display);

Note: You can disregard the PSTART and PSTOP columns in the output for the moment. (These are located to the right of the Time column.) They are discussed later.

The plan shown above represents a typical star query transformation. The records in the fact table, those satisfying the query WHERE condition, are found by scanning only the bitmap index structures rather than the whole large sales fact table.

In a first, internal recursive step, the Oracle database selects all records of the three dimension tables (products, times, and customers) that satisfy the WHERE condition. You can see this access of the dimension tables underneath the BITMAP KEY ITERATION row sources in the "Operation" column.

The Oracle database then uses the key values of these records as predicates for probing the bitmap index structures of the sales fact table itself. The query predicates on the customers, products, and times dimension tables are used for a highly selective preselection of only the relevant records of the sales table, using the bitmap index structures sales_prod_bix, sales_cust_bix, and sales_time_bix.

In a second step, the result set is joined back with all the dimension tables to get the final query result. These are all the operations after PARTITION RANGE SUBQUERY shown in the "Operation" column.

 

2. In this step, you examine the basic mechanism of Oracle's star query transformation with TEMP table transformation.

From a SQL*Plus session logged on to the SH schema, run show_star2.sql, or copy the following SQL statements into your SQL*Plus session:

@c:\wkdir\show_star2.sql

ALTER SESSION SET star_transformation_enabled=TRUE;
show parameters star_transformation
DELETE FROM plan_table;
COMMIT;
EXPLAIN PLAN FOR
SELECT /*+ norewrite */
t.calendar_month_desc
, p.prod_subcategory
, c.cust_city
, sum(s.amount_sold) AS dollars
FROM sales s
, times t
, products p
, customers c
WHERE s.time_id = t.time_id
AND s.prod_id = p.prod_id
AND s.cust_id = c.cust_id
AND c.cust_city='Ravensburg'
AND p.prod_category in ('Hardware','Photo')
--AND p.prod_category in ('Hardware')
AND t.fiscal_year in (2000,1999)
GROUP BY p.prod_subcategory, calendar_month_desc
, c.cust_city;

set linesize 140
select * from table(dbms_xplan.display);

This plan looks similar to the one shown before, although it is not identical. The difference between the plans is that the Oracle database now uses a table named sys_temp_xxx (which is not part of the query) for satisfying the SQL statement. Note that this temporary table name is system-generated and will vary.

The optimizer evaluated the selectivity of the WHERE conditions and the size of the dimension tables. It detected a high selectivity on the customers table and that, because this table is "large enough," the cost of creating a temporary table containing the result set for the predicate on the customers table is lower than accessing the customers table twice, as in the first plan without temporary table usage. The temporary table is then used in place of the customers table itself.

The creation of this temporary table and the data insertion are automatic and shown in the plan itself.

 

3. Next, you use the STAR hint to enforce Oracle7 behavior.

From a SQL*Plus session logged on to the SH schema, run show_star3.sql, or copy the following SQL statements into your SQL*Plus session.

@c:\wkdir\show_star3.sql

PROMPT   STAR JOIN TRANSFORMATION - 7.3 BEHAVIOR
                                
Rem show plan with star join transformation and discuss it
DELETE FROM plan_table;
COMMIT;
EXPLAIN PLAN FOR
SELECT /*+ norewrite STAR */
t.calendar_month_desc
, p.prod_subcategory
, c.cust_city
, sum(s.amount_sold) AS dollars
FROM sales s
, times t
, products p
, customers c
WHERE s.time_id = t.time_id
AND s.prod_id = p.prod_id
AND s.cust_id = c.cust_id
AND c.cust_city='Ravensburg'
--AND p.prod_category in ('Hardware')
AND p.prod_category in ('Hardware','Photo')
AND t.fiscal_year in (2000,1999)
GROUP BY p.prod_subcategory, calendar_month_desc
, c.cust_city;
Rem show plan
set linesize 140
select * from table(dbms_xplan.display);

The above plan output shows the so-called star query optimization, which was introduced in Oracle7. To avoid several joins with the large fact table, the optimizer builds a Cartesian product of the times, products, and customers dimension tables and joins the Cartesian result once with the sales fact table.

Note: This star optimization technique has nothing to do with the current star query transformation, and its usage is not dependent on the setting of the star_transformation_enabled parameter.

 

4. Next, you issue a typical star query without any SQL processing optimization.

From a SQL*Plus session logged on to the SH schema, run show_star4.sql, or copy the following SQL statements into your SQL*Plus session:

@c:\wkdir\show_star4.sql

PROMPT   NO STAR TRANSFORMATION - WORST CASE
                                
Rem show plan without star transformation and discuss it
alter session set star_transformation_enabled=false;
set sqlprompt "no STAR - SQL> "
DELETE FROM plan_table;
COMMIT;
EXPLAIN PLAN FOR
SELECT /*+ norewrite */
t.calendar_month_desc
, p.prod_subcategory
, c.cust_city
, sum(s.amount_sold) AS dollars
FROM sales s
, times t
, products p
, customers c
WHERE s.time_id = t.time_id
AND s.prod_id = p.prod_id
AND s.cust_id = c.cust_id
AND c.cust_city='Ravensburg'
--AND p.prod_category in ('Hardware')
AND p.prod_category in ('Hardware','Photo')
AND t.fiscal_year in (2000,1999)
GROUP BY p.prod_subcategory, calendar_month_desc
, c.cust_city;
PROMPT show plan
set linesize 140
select * from table(dbms_xplan.display);

From a join perspective, this plan has no optimizations for data warehousing: it simply joins the four tables one after another. Note that the complete sales fact table must be processed because no predicates are defined on it.

 

5. From a SQL*Plus session logged on to the SH schema, run reset_star_test.sql, or copy the SQL statements below into your SQL*Plus session.

This script resets the session environment to your initial settings, with star query transformation enabled, which is the recommended setting for a data warehousing environment.

@c:\wkdir\reset_star_test.sql

PROMPT   BACK TO NORMALITY
                                
alter session set star_transformation_enabled=TRUE;
set sqlprompt "SQL> "

 


Using Bitmap Join Indexes

Bitmap join indexes were introduced in Oracle9i. A join index is an index on one table that uses columns of one or more different tables through a join.

The volume of data that must be joined can be reduced if join indexes are used because the joins have already been precalculated. In addition, join indexes that contain multiple dimension tables can eliminate bitwise operations that are necessary in the star transformation with existing bitmap indexes.

For more information about bitmap join indexes, refer to the Oracle Data Warehousing Guide.

1. Create a bitmap join index on the sales fact table for a joined attribute from the customers dimension table. From a SQL*Plus session logged on to the SH schema, run cr_bj_idx.sql, or copy the following SQL statements into your SQL*Plus session:

(Note: This script may take a few minutes to run.)

@c:\wkdir\cr_bj_idx.sql

DROP INDEX bji_sales_cust_city;
                                 
CREATE BITMAP INDEX bji_sales_cust_city on sales(c.cust_city)
FROM sales s, customers c
WHERE s.cust_id = c.cust_id
LOCAL NOLOGGING COMPUTE STATISTICS;

 

2. Look at the execution plan of your query again and compare it with the original star transformation plan in the previous section. For the sake of simplicity, the usage of temporary tables for star transformation is disabled.

@c:\wkdir\plan_bj_idx.sql

ALTER SESSION SET star_transformation_enabled=TEMP_DISABLE;
                                
DELETE FROM plan_table;
COMMIT;
EXPLAIN PLAN FOR
SELECT /*+ norewrite */
t.calendar_month_desc
, p.prod_subcategory
, c.cust_city
, sum(s.amount_sold) AS dollars
FROM sales s
, times t
, products p
, customers c
WHERE s.time_id = t.time_id
AND s.prod_id = p.prod_id
AND s.cust_id = c.cust_id
AND c.cust_city='Ravensburg'
AND p.prod_category in ('Hardware','Photo')
--AND p.prod_category in ('Hardware')
AND t.fiscal_year in (2000,1999)
GROUP BY p.prod_subcategory, calendar_month_desc
, c.cust_city;
Rem show plan
set linesize 140
select * from table(dbms_xplan.display);
ALTER SESSION SET star_transformation_enabled=TRUE;

When comparing this plan with the equivalent star transformation plan, notice that you no longer have to query the customers dimension table for probing the bitmap index on the sales fact table. The predicate in the query is on the cust_city column, which is stored in the bitmap join index, so the join for the inner part of the star transformation is already precomputed.

Another benefit of the bitmap join index is the CPU (and I/O) savings gained by removing the bitmap key iteration work (which has to merge all of the bitmaps together).

 


Examining the Differences Between B-tree and Bitmap Indexes for Data Warehousing

Real (persistent) bitmap indexes were introduced with Oracle 7.3 and represent the core foundation for Oracle's star query transformation, being optimized for set operations. A B-tree index stores a list of row IDs for each key corresponding to the rows with that key value; a bitmap index, however, stores a bitmap for each key value instead of a list of row IDs.

Bitmap indexes are stored in a compressed format; if the number of different key values (the cardinality) is small, bitmap indexes are very space efficient, on average 20 to 30 times smaller than the equivalent B-tree index structure.

Bitmap Indexes provide the following benefits for data warehousing:

 Reduced response time for large classes of ad hoc queries
 Substantial reduction of space usage compared to other indexing techniques
 Dramatic performance gains even on very low-end hardware
 Very efficient parallel DML and loads

In this section, you create a new table with the structure of a typical fact table in a data warehouse environment, representing a subset of the sales transaction fact data. To experience the advantages of bitmap indexes, you measure the time for index creation as well as the space usage of bitmap indexes versus B-tree indexes. You create an index for each dimension join column, once as a B-tree index and again as a bitmap index.

1. From a SQL*Plus session logged on to the SH schema, run create_stage_table.sql, or copy the following SQL statements into your SQL*Plus session:

@c:\wkdir\create_stage_table.sql

DROP TABLE sales_delta;

CREATE TABLE sales_delta 
   NOLOGGING AS 
   SELECT * 
   FROM   sales 
   WHERE  1=0;
                              
                            

 

2. If you have not yet completed the High-Speed Data Load and Rolling Window Operations tutorial, you need to create an external table to load some test data. From a SQL*Plus session logged on to the SH schema, run create_ext_tab_for_bdf.sql, or copy the following SQL statements into your SQL*Plus session:

@c:\wkdir\create_ext_tab_for_bdf.sql

CREATE OR REPLACE DIRECTORY log_dir AS '/tmp';
                                
CREATE TABLE sales_delta_XT
(
PROD_ID NUMBER,
CUST_ID NUMBER,
TIME_ID DATE,
CHANNEL_ID CHAR(2),
PROMO_ID NUMBER,
QUANTITY_SOLD NUMBER(3),
AMOUNT_SOLD NUMBER(10,2)
)
ORGANIZATION external
(
TYPE oracle_loader
DEFAULT DIRECTORY data_dir
ACCESS PARAMETERS
(
RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII
BADFILE log_dir:'sh_sales.bad'
LOGFILE log_dir:'sh_sales.log_xt'
FIELDS TERMINATED BY "|" LDRTRIM
(prod_id, cust_id,
time_id CHAR(11) DATE_FORMAT DATE MASK "DD-MON-YYYY",
channel_id, promo_id, quantity_sold, amount_sold
)
)
location
(
'salesDec01.dat'
)
)REJECT LIMIT UNLIMITED NOPARALLEL;
ALTER TABLE sales_delta_xt location ( 'salesQ1.dat' );

 

3. Fill the table with some data from the sales fact table.

@c:\wkdir\load_stage_table3.sql

INSERT /*+ APPEND */ INTO sales_delta
                                
SELECT
PROD_ID,
CUST_ID,
TIME_ID,
case CHANNEL_ID
when 'S' then 3
when 'T' then 9
when 'C' then 5
when 'I' then 4
when 'P' then 2
else 99
end,
PROMO_ID,
sum(QUANTITY_SOLD) quantity_sold,
sum(AMOUNT_SOLD) amount_sold
FROM SALES_DELTA_XT
GROUP BY prod_id,time_id,cust_id,channel_id,promo_id;

 

4. Create B-tree indexes on all dimension join columns. Note the time needed for creation.

@c:\wkdir\cr_btree_idx.sql

set timing on
                                
CREATE INDEX sales_prod_local_bix
ON sales_delta (prod_id)
NOLOGGING COMPUTE STATISTICS ;
CREATE INDEX sales_cust_local_bix
ON sales_delta (cust_id)
NOLOGGING COMPUTE STATISTICS ;
CREATE INDEX sales_time_local_bix
ON sales_delta (time_id)
NOLOGGING COMPUTE STATISTICS ;
CREATE INDEX sales_channel_local_bix
ON sales_delta (channel_id)
NOLOGGING COMPUTE STATISTICS ;
CREATE INDEX sales_promo_local_bix
ON sales_delta (promo_id)
NOLOGGING COMPUTE STATISTICS ;

Depending on your system, creating the B-tree indexes takes roughly 3 to 7 seconds.

 

5. Use SQL to store the index sizes in a table for a direct comparison. To do this, select the actual segment sizes of the B-tree indexes from the data dictionary.

@c:\wkdir\cr_compare_tab.sql

DROP TABLE compare_idx_size;

CREATE TABLE compare_idx_size
   AS
   SELECT   segment_name index_name,'STANDARD BTREE' index_type,
            sum(bytes)/(1024*1024) index_size
   FROM     user_segments us, user_indexes ui
   WHERE    us.segment_name=ui.index_name
   AND      ui.table_name='SALES_DELTA'
   GROUP BY segment_name, index_type;
                            

6. Determine what the size of the indexes would be if you had used static bitmap indexes. First, clean up the environment by running cleanup_idx_comp.sql.

@c:\wkdir\cleanup_idx_comp.sql

DROP INDEX sales_prod_local_bix;
DROP INDEX sales_cust_local_bix;
DROP INDEX sales_time_local_bix;
DROP INDEX sales_channel_local_bix;
DROP INDEX sales_promo_local_bix;
                            

 

7. Create static bitmap indexes on all dimension join columns. Note that the time required for creation is considerably less than for the B-tree indexes.

@c:\wkdir\cr_bitmap_idx.sql

set timing on
CREATE BITMAP INDEX sales_prod_local_bix
   ON sales_delta (prod_id)
   NOLOGGING COMPUTE STATISTICS ;
CREATE BITMAP INDEX sales_cust_local_bix
   ON sales_delta (cust_id)
   NOLOGGING COMPUTE STATISTICS ;
CREATE BITMAP INDEX sales_time_local_bix
   ON sales_delta (time_id)
   NOLOGGING COMPUTE STATISTICS ;
CREATE BITMAP INDEX sales_channel_local_bix
   ON sales_delta (channel_id)
   NOLOGGING COMPUTE STATISTICS ;
CREATE BITMAP INDEX sales_promo_local_bix
   ON sales_delta (promo_id)
NOLOGGING COMPUTE STATISTICS ;
                            

Note: The creation of the bitmap indexes is much faster than the creation of the equivalent B-tree indexes.

 

8. Fill the previously created comparison table so that you can use SQL to investigate the differences between B-tree and bitmap indexes.

@c:\wkdir\fill_comp_table.sql

INSERT INTO compare_idx_size
   SELECT   segment_name index_name, 'BITMAP',sum(bytes)/(1024*1024)
   FROM     user_segments us, user_indexes ui
   WHERE    us.segment_name=ui.index_name 
   AND      ui.table_name='SALES_DELTA'
   GROUP BY segment_name;

COMMIT;
                            

 

9. Compare the different sizes of those indexes. To show all the facts, run the script comp_idx1.sql from your SQL*Plus session.

@c:\wkdir\comp_idx1.sql

COLUMN "Index Name" format a23
COLUMN "Index Type" format a14
COLUMN "btree X times larger" format a36

SELECT substr(a.index_name,1,23) "Index Name",
       a.index_type "Index Type",
       a.index_size "Size [MB]",
       b.index_type "Index Type",
       b.index_size "Size [MB]",
       'btree ' || trunc((b.index_size/a.index_size),2) ||
       ' times bigger than bitmap' "btree X times larger"
FROM   compare_idx_size a, compare_idx_size b
WHERE  a.index_name=b.index_name
AND    a.index_type='BITMAP'
AND    b.index_type='STANDARD BTREE'
ORDER BY 6 asc;
                            

For a shorter version, run the following:

@c:\wkdir\comp_idx2.sql

SELECT substr(a.index_name,1,23) "Index Name",
       'btree ' || trunc((b.index_size/a.index_size),2) || 
       ' times bigger than bitmap' "btree X times larger"
FROM   compare_idx_size a, compare_idx_size b
WHERE  a.index_name=b.index_name 
AND    a.index_type='BITMAP' 
AND    b.index_type='STANDARD BTREE' 
ORDER BY 2 asc;
                              
                            

Depending on the cardinality of the indexed column, a bitmap index is normally up to 30 times smaller than the equivalent B-tree index; the factor can even reach 50 to 60. In a data warehousing system in the terabyte range, even an average factor of five saves a tremendous amount of disk space and workload: consider scanning a 250 GB index versus a 50 GB index.

Furthermore, the creation time is much shorter than for B-tree indexes.

Providing real bitmap indexes is a crucial component for any data warehousing environment where the data model has star or snowflake schemata.

 

10. Clean up the environment.

@c:\wkdir\cleanup_idx_test.sql

DROP TABLE compare_idx_size;
DROP TABLE sales_delta;
                            

 


Improving Query Performance Using Oracle Advanced Partitioning

Oracle Partitioning not only has benefits from a table maintenance perspective, but also offers many mechanisms that transparently speed up query performance, sometimes by orders of magnitude.

Query performance is normally not the main decision criterion for designing the partitioned objects in your system. Oracle recommends designing your partitioned objects to satisfy your system maintenance requirements, such as load windows, backup and recovery time and volume constraints, or common data warehousing tasks, such as rolling window operations. In many cases, such a partitioning approach is very close or even identical to the one that would have been chosen for a performance-driven strategy.

Note: For more information about common rolling window operations in data warehousing, see the High-Speed Data Load and Rolling Window Operations tutorial.


Using Partition Pruning

Partition pruning, also called partition elimination, is a very important optimization method used in big data warehouse projects. Large tables are partitioned into smaller fragments called partitions. Oracle ensures that only partitions that are relevant to the user's statement are accessed and processed whenever possible.

Partition pruning is the process of identifying only those partitions of a partitioned object that are needed to satisfy a query, either at query compilation time (static pruning) or dynamically at run time.

Static partition pruning takes place when the optimizer can eliminate specific partitions at parse time, for example, when there is a query predicate on the partitioning key column.

Dynamic partition pruning can be as simple as a bind variable replacement at runtime or as complex as spawning additional recursive SQL to identify the appropriate partitions. Dynamic partition pruning takes place at query run time.

Generally, advanced dynamic pruning takes place when the following criteria are satisfied:

 A predicate condition on a table that is joined to the partitioned object on the partition key column(s)
 Sufficient selectivity for the predicate (and therefore for the partition pruning) to make the additional recursive SQL cheaper than processing the query without dynamic partition pruning

Dynamic partition pruning also takes place when the predicate on the join column cannot be determined at parse time and needs simple additional recursive SQL. This happens, for example, when you use an 'incomplete' DATE value, such as TO_DATE('01-JAN-00','DD-MON-RR'), or when you rely on implicit data type conversion for the DATE data type. In this situation, a recursive statement is needed to determine the century to complete the DATE value.
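As a minimal illustration of this case (this query is not part of the tutorial's bdf.zip scripts and is only a sketch against the SH sample schema), a two-digit year in the predicate can be enough to move pruning from parse time to run time:

```sql
-- Hypothetical sketch: the RR date format cannot be fully resolved at
-- parse time, so the plan tends to show KEY in the PSTART/PSTOP columns
-- instead of a fixed partition number.
EXPLAIN PLAN FOR
SELECT SUM(s.amount_sold)
FROM   sales s
WHERE  s.time_id = TO_DATE('01-JAN-00','DD-MON-RR');

SELECT * FROM table(dbms_xplan.display);
```

With the four-digit form TO_DATE('01-JAN-2000','DD-MON-YYYY'), the same predicate can typically be pruned statically, as in the next example.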

Static Partition Pruning

The following example demonstrates static partition pruning with a predicate condition on the partition key column.

1. From a SQL*Plus session logged on to the SH schema, run plan_static_pruning.sql, or copy the following SQL statements into your SQL*Plus session:

@c:\wkdir\plan_static_pruning.sql

truncate table plan_table;
DELETE FROM plan_table;
                                
COMMIT;
explain plan for
SELECT c.channel_desc, sum(amount_sold)
FROM sales s, channels c
WHERE s.channel_id = c.channel_id
AND s.time_id >= to_date('04-JAN-2000','DD-MON-YYYY')
AND s.time_id <= to_date('22-FEB-2000','DD-MON-YYYY')
GROUP BY channel_desc;
Rem show plan
set linesize 140
select * from table(dbms_xplan.display);

Look at the PSTART and PSTOP columns for information about partition pruning. You can see that the query has to access only one partition, partition #13, to get the result.

 

2. To obtain more information about the partition that must be accessed, query the data dictionary with the appropriate partition number.

@c:\wkdir\select_part_name.sql

SELECT partition_name
FROM   user_tab_partitions 
WHERE  table_name='SALES' 
AND    partition_position=13;
                              
                            

 


Dynamic Partition Pruning

The sales fact table of the sample schema is range partitioned on the time_id column. Joining the sales fact table with the times dimension table over time_id satisfies the first criterion, so dynamic partition pruning can take place.

1. From a SQL*Plus session logged on to the SH schema, run plan_dyn_pruning.sql, or copy the following SQL statements into your SQL*Plus session:

@c:\wkdir\plan_dyn_pruning.sql

DELETE FROM plan_table;
                                
COMMIT;
EXPLAIN PLAN FOR
SELECT /*+ norewrite */
t.calendar_month_desc
, p.prod_subcategory
, c.cust_city
, sum(s.amount_sold) AS dollars
FROM sales s
, times t
, products p
, customers c
WHERE s.time_id = t.time_id
AND s.prod_id = p.prod_id
AND s.cust_id = c.cust_id
AND c.cust_city='Ravensburg'
AND p.prod_category in ('Hardware')
AND t.fiscal_year in (2000,1999)
GROUP BY p.prod_subcategory, calendar_month_desc
, c.cust_city;
set linesize 140
Rem show the plan with dynamic partition pruning
select * from table(dbms_xplan.display);

Unlike with static partition pruning, you do not see absolute numbers in the PSTART and PSTOP columns, but the word KEY(SQ). This indicates that dynamic partition pruning takes place for this query at run time. The plan output shows not only that dynamic pruning happens, but also how it takes place.

 

2. To show the recursive SQL statement that is issued for dynamic partition pruning, run the following query against the plan_table:

@c:\wkdir\select_other.sql

set long 400

SELECT other 
FROM   plan_table 
WHERE  other IS NOT NULL;
                            

You can see how the Oracle database translates the original predicate condition into an inline view to obtain the appropriate partition information.

The results tell you not only what kind of dynamic pruning has happened but also how it has taken place. In this example, the dynamic pruning is based on a SUBQUERY, which can be identified either in the plan (Id 7) or in the PSTART/PSTOP columns; KEY(SQ) is the abbreviation for KEY(SUBQUERY).
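For orientation, the recursive pruning statement stored in the OTHER column generally resembles the following sketch. This is illustrative only and not meant to be run by hand; the exact text, including the internal TBL$OR$IDX$PART$NUM function that maps join-key values to partition numbers, varies by release:

```sql
-- Illustrative shape of the optimizer-generated pruning subquery
-- (release-dependent; shown here only to explain the mechanism).
SELECT DISTINCT TBL$OR$IDX$PART$NUM("SH"."SALES", 0, d#, p#, "TIME_ID")
FROM   "SH"."TIMES" "T"
WHERE  "T"."FISCAL_YEAR" IN (2000, 1999);
```

The distinct partition numbers returned by this subquery are exactly the partitions the main query then scans.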

 

Back to Topic

Back to Topic List

Using Partitionwise Joins

Partitionwise joins reduce query response time by minimizing the amount of data exchanged among parallel execution servers when joins execute in parallel. This significantly reduces response time and improves the use of both CPU and memory resources. In Oracle Real Application Cluster environments, partitionwise joins also avoid or at least limit the data traffic over the interconnect, which is the key to achieving good scalability for massive join operations. Partitionwise joins can be full or partial. The Oracle server decides which type of join to use.

Note: This tutorial does not cover all possibilities of partitionwise operations in detail. It should give you a basic understanding of partitionwise operations, and an understanding of how to read an execution plan appropriately. For detailed information about partitionwise joins, see the Data Warehousing Guide.

Steps

1. Create two additional hash-partitioned tables and indexes.

2. Import statistics for those two tables.

3. Examine serial partitionwise joins.

4. Examine parallel partitionwise joins.

5. Clean up.

Benefits of Partitionwise Joins

Partitionwise joins have the following benefits:

Reduction of Communications Overhead

When executed in parallel, partitionwise joins reduce communications overhead. This is because, in the default case, parallel execution of a join operation by a set of parallel execution servers requires the redistribution of each table on the join column into disjoint subsets of rows. These disjoint subsets of rows are then joined pairwise, each pair by a single parallel execution server.

The Oracle database can avoid redistributing the partitions because the two tables are already partitioned on the join column. This enables each parallel execution server to join a pair of matching partitions.

This improved performance from using parallel execution is even more noticeable in Oracle Real Application Cluster configurations with internode parallel execution. Partitionwise joins dramatically reduce interconnect traffic. This feature is useful for large DSS configurations that use Oracle Real Application Clusters.

Reduction of Memory Requirements

Partitionwise joins require less memory than the equivalent join operation of the complete data set of the tables being joined.

In the case of a serial join, the join is performed on one pair of matching partitions at a time. If data is evenly distributed across partitions (that is, there is no skew), the memory requirement is divided by the number of partitions.

In the parallel case, memory requirements depend on the number of partition pairs that are joined in parallel. For example, if the degree of parallelism is 20 and the number of partitions is 100, five times less memory is required because only 20 joins of two partitions are performed at the same time. The fact that partitionwise joins require less memory has a direct effect on performance. For example, the join probably does not need to write blocks to disk during the build phase of a hash join.
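The arithmetic behind the 100-partition, DOP-20 example above can be sketched as follows (a minimal illustration; the uniform-distribution assumption comes from the text):

```python
# Memory-reduction estimate for a parallel partitionwise join.
total_partitions = 100   # hash partition pairs in the two tables
dop = 20                 # degree of parallelism

# Only `dop` partition pairs are joined concurrently, instead of the
# whole data set at once, so the concurrent memory footprint shrinks
# by total_partitions / dop.
memory_reduction_factor = total_partitions / dop
print(memory_reduction_factor)  # -> 5.0, i.e. "five times less memory"
```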

Back to Topic List

1. Create Two Additional Hash-Partitioned Tables and Indexes

One of the driving factors (besides parallel execution) that the Oracle database uses to decide whether or not to perform partitionwise joins is the physical structure of the objects that are joined. Two additional tables are needed to demonstrate partitionwise joins.

1.

Create a hash-partitioned table with the same structure as the customers dimension table. From a SQL*Plus session logged on to the SH schema, run the following SQL statements in your SQL*Plus session to create the additional structures:

                               
@c:\wkdir\create_cust_hash.sql

DROP TABLE customers_hash;
                                
CREATE TABLE customers_hash
( cust_id NUMBER
, cust_first_name VARCHAR2(20)
, cust_last_name VARCHAR2(40)
, cust_gender CHAR(1)
, cust_year_of_birth NUMBER(4)
, cust_marital_status VARCHAR2(20)
, cust_street_address VARCHAR2(40)
, cust_postal_code VARCHAR2(10)
, cust_city VARCHAR2(30)
, cust_city_id number
, cust_state_province VARCHAR2(40)
, cust_state_province_id number
, country_id number
, cust_main_phone_number VARCHAR2(25)
, cust_income_level VARCHAR2(30)
, cust_credit_limit NUMBER
, cust_email VARCHAR2(30)
, cust_total varchar2(14)
, cust_total_id number
, cust_src_id number
, cust_eff_from date
, cust_eff_to date
, cust_valid varchar2(1)
)
PCTFREE 5
PARTITION BY HASH (cust_id)
(PARTITION h1_cust, PARTITION h2_cust,PARTITION h3_cust,PARTITION h4_cust,
PARTITION h5_cust,PARTITION h6_cust,PARTITION h7_cust,PARTITION h8_cust,
PARTITION h9_cust,PARTITION h10_cust,PARTITION h11_cust,PARTITION h12_cust,
PARTITION h13_cust,PARTITION h14_cust,PARTITION h15_cust,PARTITION h16_cust);
CREATE UNIQUE INDEX customers_hash_pk
ON customers_hash (cust_id) ;
ALTER TABLE customers_hash
ADD ( CONSTRAINT customers_hash_pk
PRIMARY KEY (cust_id) RELY ENABLE VALIDATE
) ;

 

2.

Create a composite range-hash-partitioned table with the same structure as the sales fact table.

                               
                                 
@c:\wkdir\create_sales_hash.sql
                              
                              

DROP TABLE sales_hash;
                                
CREATE TABLE sales_hash
( prod_id NUMBER
, cust_id NUMBER
, time_id DATE
, channel_id NUMBER
, promo_id NUMBER
, quantity_sold NUMBER(10,2)
, amount_sold NUMBER(10,2)
)PCTFREE 5 NOLOGGING
PARTITION BY RANGE (time_id)
SUBPARTITION BY HASH (cust_id)
SUBPARTITIONS 16
(PARTITION SALES_HASH_1995 VALUES LESS THAN (TO_DATE('01-JAN-1996','DD-MON-YYYY')),
PARTITION SALES_HASH_1996 VALUES LESS THAN (TO_DATE('01-JAN-1997','DD-MON-YYYY')),
PARTITION SALES_HASH_1_1997 VALUES LESS THAN (TO_DATE('01-JUL-1997','DD-MON-YYYY')),
PARTITION SALES_HASH_2_1997 VALUES LESS THAN (TO_DATE('01-JAN-1998','DD-MON-YYYY')),
PARTITION SALES_HASH_Q1_1998 VALUES LESS THAN (TO_DATE('01-APR-1998','DD-MON-YYYY')),
PARTITION SALES_HASH_Q2_1998 VALUES LESS THAN (TO_DATE('01-JUL-1998','DD-MON-YYYY')),
PARTITION SALES_HASH_Q3_1998 VALUES LESS THAN (TO_DATE('01-OCT-1998','DD-MON-YYYY')),
PARTITION SALES_HASH_Q4_1998 VALUES LESS THAN (TO_DATE('01-JAN-1999','DD-MON-YYYY')),
PARTITION SALES_HASH_Q1_1999 VALUES LESS THAN (TO_DATE('01-APR-1999','DD-MON-YYYY')),
PARTITION SALES_HASH_Q2_1999 VALUES LESS THAN (TO_DATE('01-JUL-1999','DD-MON-YYYY')),
PARTITION SALES_HASH_Q3_1999 VALUES LESS THAN (TO_DATE('01-OCT-1999','DD-MON-YYYY')),
PARTITION SALES_HASH_Q4_1999 VALUES LESS THAN (TO_DATE('01-JAN-2000','DD-MON-YYYY')),
PARTITION SALES_HASH_Q1_2000 VALUES LESS THAN (TO_DATE('01-APR-2000','DD-MON-YYYY')),
PARTITION SALES_HASH_Q2_2000 VALUES LESS THAN (TO_DATE('01-JUL-2000','DD-MON-YYYY')),
PARTITION SALES_HASH_Q3_2000 VALUES LESS THAN (TO_DATE('01-OCT-2000','DD-MON-YYYY')),
PARTITION SALES_HASH_Q4_2000 VALUES LESS THAN (TO_DATE('01-JAN-2001','DD-MON-YYYY')),
PARTITION SALES_HASH_Q1_2001 VALUES LESS THAN (TO_DATE('01-APR-2001','DD-MON-YYYY')),
PARTITION SALES_HASH_Q2_2001 VALUES LESS THAN (TO_DATE('01-JUL-2001','DD-MON-YYYY')),
PARTITION SALES_HASH_Q3_2001 VALUES LESS THAN (TO_DATE('01-OCT-2001','DD-MON-YYYY')),
PARTITION SALES_HASH_Q4_2001 VALUES LESS THAN (TO_DATE('01-JAN-2002','DD-MON-YYYY')))
;
CREATE BITMAP INDEX sales_cust_hash_bix
ON sales_hash (cust_id)
LOCAL NOLOGGING;

 

Back to Topic

2. Import Statistics for Those Two Tables

You have set up two additional tables, customers_hash and sales_hash, but they do not contain any data. Instead of duplicating data from the source tables, customers and sales, you use export and import to create table statistics without any data. This functionality was introduced with Oracle8i.

Object statistics are used by the Oracle optimizer to evaluate execution plans. Being able to export and import statistics independently of the data enables a test or development environment to reproduce the optimizer behavior of a large production system without having the same volume of data.

Note: Another means of getting identical optimizer behavior is to use Oracle’s plan stability capabilities. See the Performance Guide and Reference.
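For reference, a statistics table such as stat_table is itself typically produced with DBMS_STATS. A hedged sketch follows; the tutorial ships stat_table pre-built in stat_table.dmp, so these exact calls are illustrative, not part of the exercise:

```sql
Rem Illustrative only: how a statistics table could be created and filled
Rem (the tutorial provides stat_table ready-made)
exec dbms_stats.create_stat_table('sh','stat_table');
exec dbms_stats.export_table_stats('sh','sales', -
        stattab => 'stat_table', statid => 'HANDS_ON');
```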

1.

Verify that no statistics have been gathered for the two new objects yet:

                               
                                 
@c:\wkdir\show_tab_stats.sql
                              
                              

SELECT table_name, num_rows 
FROM user_tables
WHERE  table_name in ('SALES_HASH','CUSTOMERS_HASH');
                            

 

2. Because you do not have any data in those objects, and you do not plan to insert any, import existing statistics for those two objects.
                               
                                 
@c:\wkdir\imp_tab_stats.sql
                              
                              

Rem import those statistics from STAT_TABLE
Rem file stat_table.dmp, must be imported at the beginning
exec dbms_stats.import_table_stats('sh','sales_hash', -
            stattab =>'stat_table', statid =>'HANDS_ON');

exec dbms_stats.import_table_stats('sh','customers_hash', -
            stattab =>'stat_table', statid =>'HANDS_ON')
                            

 

3. Examine the statistics for those tables; to the optimizer, they now appear as if they contain a lot of data.
                               
                                 
@c:\wkdir\show_tab_stats.sql
                              
                              


SELECT table_name, num_rows 
FROM   user_tables
WHERE  table_name IN ('SALES_HASH','CUSTOMERS_HASH');
                            

 

Back to Topic

3. Examine Serial Partitionwise Joins

This section contrasts two serial join scenarios:

 Serial Non-Partitionwise Joins

 Serial Full Partitionwise Joins

Note: The NOREWRITE hint is used to avoid any interference with possibly existing materialized views. Alternatively, you can disable query_rewrite for this particular session, or for the complete instance. This is true for all subsequent statements.

From a SQL*Plus session logged on to the SH schema, run set_noparallel.sql, or copy the following SQL statements into your SQL*Plus session to ensure that you get serial execution plans:

                               
                                 
@c:\wkdir\set_noparallel.sql

ALTER TABLE sales NOPARALLEL;
ALTER TABLE sales_hash NOPARALLEL;
ALTER TABLE customers NOPARALLEL;
ALTER TABLE customers_hash NOPARALLEL;
                              
                            

 

Serial Non-Partitionwise Joins

Now join your range-partitioned table sales with the nonpartitioned table customers.

1.

From a SQL*Plus session logged on to the SH schema, run serial_nopwj.sql, or copy the following SQL statements into your SQL*Plus session:

                               
                                 
@c:\wkdir\serial_nopwj.sql
                              
                              

DELETE FROM plan_table;
                                
COMMIT;
explain plan for
select /*+ use_hash(c,s) norewrite */ cust_last_name, sum(amount_sold)
from sales s, customers c
where s.cust_id = c.cust_id
-- and s.cust_id in (10005,10004,10003)
group by cust_last_name;
set linesize 140
Rem show the plan with non partition-wise join
select * from table(dbms_xplan.display);

The plan shows full table access for both tables, sales and customers, and a hash join. The row source PARTITION RANGE ALL is inside the HASH JOIN, which means that the customers table is joined with all partitions.

Back to Subtopic

Serial Full Partitionwise Joins

A full partitionwise join divides a large join into smaller joins between pairs of partitions from the two joined tables. To use this feature, both tables must be equipartitioned on their join keys. Here you join your composite (range-hash) partitioned table, sales_hash, with the hash-partitioned table, customers_hash; these two tables are equipartitioned on the cust_id join column.

1.

From a SQL*Plus session logged on to the SH schema, run serial_pwj.sql, or copy the following SQL statements into your SQL*Plus session:

                               
                                 
@c:\wkdir\serial_pwj.sql
                              
                              

DELETE FROM plan_table;
                                
COMMIT;
explain plan for
select /*+ norewrite */ cust_last_name, sum(amount_sold)
from sales_hash s, customers_hash c
where s.cust_id = c.cust_id
-- and s.cust_id in (10005,10004,10003)
group by cust_last_name;
set linesize 140
Rem show the plan with full partition-wise join
select * from table(dbms_xplan.display);

The plan looks slightly different from the nonpartitioned plan. You see an additional operation, 'PARTITION HASH ALL', in the plan. This row source is outside the HASH JOIN. You can read this operation as specifying how to process the hash join, or, in pseudocode:

                               
                                 
FOR LOOP over all partitions of customers_hash
DO 
  hash-join equivalent partitions 
DONE
                              
                            

This full partitionwise join is done in serial.

Back to Subtopic

Back to Topic

4. Examine Parallel Partitionwise Joins

This section contrasts three parallel join scenarios:

 Non-Partitionwise Joins

 Full Partitionwise Joins

 Partial Partitionwise Joins

The NOREWRITE hint is used to avoid any interference with possibly existing materialized views. Alternatively, you could disable query_rewrite for this particular session, or for the complete instance. This is true for all subsequent statements.

From a SQL*Plus session logged on to the SH schema, run set_parallel4.sql, or copy the following SQL statements into your SQL*Plus session to ensure that you get parallel execution plans:

                               
                                 
@c:\wkdir\set_parallel4.sql
                              
                              

ALTER TABLE sales PARALLEL 4;
ALTER TABLE sales_hash PARALLEL 4;
ALTER TABLE customers PARALLEL 4;
ALTER TABLE customers_hash PARALLEL 4;
                            

Nonpartitionwise Joins

Now join your range-partitioned table sales with the nonpartitioned table customers.

1.

From a SQL*Plus session logged on to the SH schema, run par_nopwj.sql, or copy the following SQL statements into your SQL*Plus session:

                               
                                 
@c:\wkdir\par_nopwj.sql
                              
                              
                                 
explain plan for
select /*+ norewrite */ cust_last_name, sum(amount_sold)
from sales s, customers c
where s.cust_id = c.cust_id
-- and s.cust_id in (10005,10004,10003)
group by cust_last_name;
set linesize 140
Rem show the plan with non partition-wise join
select * from table(dbms_xplan.display);

You can see that the statement is executed in parallel; because neither table is partitioned on the join column cust_id, no partitionwise join takes place.

Both tables are scanned in parallel. The smaller table, customers, is broadcast to all slaves working on the next part of the parallel plan (Id 9), which performs the HASH JOIN followed by a first HASH GROUP BY. Because all the slaves work on the complete result set, it has to be redistributed (HASH based, Id 5) for the final HASH GROUP BY operation (Id 3).

If the query contained an additional ORDER BY, you would see a RANGE-based redistribution feeding a final SORT GROUP BY (PQ Distribution method of Id 5). Redistributing the data RANGE based optimizes the plan by eliminating a final sort operation; the order is guaranteed by concatenating the results of the parallel slaves in a specific order (Id 2).
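To observe this yourself, you could add an ORDER BY to the same statement (an illustrative variant, not one of the provided scripts):

```sql
Rem Illustrative variant: same join with an additional ORDER BY
explain plan for
select /*+ norewrite */ cust_last_name, sum(amount_sold)
from sales s, customers c
where s.cust_id = c.cust_id
group by cust_last_name
order by cust_last_name;
select * from table(dbms_xplan.display);
```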

 

Back to Subtopic

Full Partitionwise Joins

Parallel execution of a full partitionwise join is a straightforward parallelization of the serial execution. Instead of joining one partition pair at a time, the partition pairs are joined in parallel by n query servers.

In a parallel full partitionwise join, the matching partition pairs are joined concurrently by the query servers. In this tutorial, you join the hash-partitioned table, customers_hash, with the composite partitioned table, sales_hash. Each of the 16 implicit hash "partitions" of the sales_hash table is composed of a set of subpartitions, one from each range partition.

1.

From a SQL*Plus session logged on to the SH schema, run par_fullpwj.sql, or copy the following SQL statements into your SQL*Plus session:

                               
                                 
@c:\wkdir\par_fullpwj.sql
                              
                              

alter session set "_parallel_broadcast_enabled"=FALSE;
                                
DELETE FROM plan_table;
COMMIT;
explain plan for
select /*+ norewrite */ cust_last_name, sum(amount_sold)
from sales_hash s, customers_hash c
where s.cust_id = c.cust_id
-- and s.cust_id in (10005,10004,10003)
group by cust_last_name;
set linesize 160
Rem show the plan with full partition-wise join
select * from table(dbms_xplan.display);

Due to the small size of the objects, you have to disable an optimization for parallel execution of small objects: the broadcasting of results. With small objects, broadcasting to all slaves might be the cheaper execution plan, because the benefit of avoiding a redistribution outweighs the cost of larger result sets for the subsequent parallel operations.

You see the same 'PARTITION HASH ALL' operation you know from a serial full partitionwise join.

Furthermore, this plan shows one of the benefits of partitionwise joins: When executed in parallel, partitionwise joins reduce communications overhead. The Oracle database can avoid redistributing the partitions because the two tables are already partitioned on the join column. This enables each parallel execution server to join a pair of matching partitions.

You see that the same slave set (Q1,00) that performs the first HASH GROUP BY operation (Id 6) also performs the table scans and the hash-join operation for each partition pair (PQ Distribution method Parallel Combined With Parent/Child, PCWP or PCWC). There is no data redistribution up to this point in the execution plan.

 

Back to Subtopic

Partial Partitionwise Joins

Unlike full partitionwise joins, partial partitionwise joins require you to partition only one table on the join key, not both tables. The partitioned table is referred to as the reference table. The other table may or may not be partitioned. Partial partitionwise joins are more common than full partitionwise joins and do not require an equipartitioning of the tables.

To execute a partial partitionwise join, the Oracle database dynamically repartitions the other table based on the partitioning of the reference table. Once the other table is repartitioned, the execution is similar to a full partitionwise join.

Partial partitionwise joins are executed only in parallel. The partitionwise aspect is essentially a distribution method applied before the join to improve its efficiency. In a serial join there is only one process executing the join, so there is no need, and no benefit, to redistribute the nonpartitioned table dynamically before the join.
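In the same pseudocode style as the full partitionwise join above, a parallel partial partitionwise join can be sketched as (illustrative):

```
FOR EACH hash partition of sales_hash, in parallel
DO
  redistribute the matching customers rows (by HASH(cust_id))
    to the server scanning that partition
  hash-join the pair
DONE
```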

1.

From a SQL*Plus session logged on to the SH schema, run par_partpwj.sql, or copy the following SQL statements into your SQL*Plus session:

                               
                                 
@c:\wkdir\par_partpwj.sql
                              
                              


explain plan for
                                
select /*+ norewrite */ cust_last_name, sum(amount_sold)
from sales_hash s, customers c
where s.cust_id = c.cust_id
-- and s.cust_id in (10005,10004,10003)
group by cust_last_name;
set linesize 140
Rem show the plan with partial partition-wise join
select * from table(dbms_xplan.display);

You see that the customers table is dynamically redistributed on the join key column, cust_id, in the same way that the sales_hash table is subpartitioned on this column.

This is shown by the fact that the result sets of the parallel table scan of the customers table are fed to the HASH JOIN operation (Q1,01, Id 7), which runs in the same slave set that performs the parallel (partitionwise) scan of the composite partitioned table, sales_hash.

Each of the 16 implicit hash "partitions" of the sales_hash table is composed of a set of subpartitions, one from each range partition.

 

Back to Subtopic

Back to Topic

5. Clean up

Before you move to the next topic, you need to clean up the environment. Perform the following step:

From a SQL*Plus session logged on to the SH schema, run cleanup_mod3.sql , or copy the following SQL statements into your SQL*Plus session:

                               
                                 
@c:\wkdir\cleanup_mod3.sql
                              
                              

ALTER TABLE sales NOPARALLEL;
                                
DROP TABLE sales_hash;
ALTER TABLE customers NOPARALLEL;
DROP TABLE customers_hash;

 

Back to Topic List

Using Automatic Gathering of Statistics

The DBMS_STATS package was introduced with Oracle8i. This package simplifies the gathering of statistics for complete databases, schemas, or specific objects; most of those operations can be done in parallel. DBMS_STATS is the recommended way of gathering statistics and is preferred over the older ANALYZE command.

Statistics can be gathered not only for complete databases, schemas, or tables; Oracle8i introduced the terms STALE and EMPTY to specify additional granularities for the objects for which you want to gather new statistics.

 An EMPTY table is a table without any statistics at all. Using options=>'GATHER EMPTY' gathers statistics for all such tables and their dependent indexes.

A table and its dependent index structures are considered STALE as soon as more than 10% of the total data volume has changed. Invoking DBMS_STATS with options=>'GATHER STALE' collects statistics for all objects that have changed by more than 10%.

Beginning with Oracle Database 10g, Oracle automatically monitors the staleness of a table. In older releases you had to enable table monitoring explicitly, either as part of the CREATE TABLE command or later with an ALTER TABLE command.

The threshold value of 10% for considering a table stale is not changeable; it is an empirical value derived from internal testing that works well for most customer situations. The threshold can be regarded as conservative, meaning that the Oracle database tends to regard an object as stale somewhat earlier than strictly necessary. If a specific application environment needs more tailored statistics, you can leverage the internally tracked information about the changes to an object and implement your own procedure for gathering statistics.
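The staleness rule described above can be mirrored in a few lines (a hedged sketch: the real check happens inside the database, and the row counts used here are hypothetical):

```python
# Sketch of the fixed 10% staleness rule; only the arithmetic is modeled.
def is_stale(num_rows, inserts=0, updates=0, deletes=0, threshold=0.10):
    """True once tracked changes exceed `threshold` of the row count."""
    if num_rows == 0:
        return True  # no statistics baseline: treat like an empty table
    return (inserts + updates + deletes) / num_rows > threshold

# Hypothetical numbers, similar in spirit to the customers example below:
print(is_stale(50000, updates=3833))      # ~7.7% changed -> False
print(is_stale(50000, updates=3833 * 2))  # ~15.3% changed -> True
```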

Beyond the highly sophisticated environments that want to control everything themselves, many customers need more simplicity. With Oracle Database 10g, you can keep all the statistics of your system up-to-date with one single command, dramatically reducing your statistics maintenance effort. By default, a scheduled job is set up with every Oracle Database 10g installation that gathers statistics regularly in predefined maintenance windows (WEEKNIGHT_WINDOW, WEEKEND_WINDOW).

Beginning with Oracle9i, the DBMS_STATS package offers the option 'GATHER AUTO'. Apart from the schema name, that is the only parameter you have to pass to Oracle; it determines all other settings, such as the estimate percentage or the need for histograms, automatically for you. It collects statistics for all tables without statistics and all objects considered stale.

To use this option, perform the following steps:

1.

From a SQL*Plus session logged on to the SH schema, run tab_status_mon.sql, or copy the following SQL statements into your SQL*Plus session:

                               
                                 
@c:\wkdir\tab_status_mon.sql
                              
                              

Rem actual tables and monitoring status
                                
SELECT table_name,
       to_char(last_analyzed,'dd.mm.yy hh24:mi:ss') la,
       num_rows, monitoring
FROM   user_tables;

You see that, with some exceptions (for example, external tables), all tables are monitored by default. This is new behavior in Oracle Database 10g. The monitoring is done completely in memory and does not impose any overhead on the tables; you cannot switch off table monitoring.

 

2.

The user_tab_modifications data dictionary view lists all changes to your tables with monitoring enabled.

                               
                                 
@c:\wkdir\show_tab_modifications.sql
                              
                              
                                 
COLUMN table_name FORMAT a20
COLUMN subpartition_name FORMAT a20

SELECT * FROM user_tab_modifications;

Depending on what happened earlier on your system, you see more or fewer entries. Now you can use the single command for keeping statistics up-to-date, and observe how the timing and the number of objects for which statistics are gathered change.

 

3.

From a SQL*Plus session logged on to the SH schema, run gather_auto.sql, or copy the following SQL statements into your SQL*Plus session:

                               
                                 
@c:\wkdir\gather_auto.sql
                              
                              

set serveroutput on
declare
   list_of_objects dbms_stats.objectTab := dbms_stats.objectTab();
begin
   dbms_output.enable(200000);
   dbms_stats.gather_schema_stats('SH',options=>'GATHER AUTO',
      objlist=>list_of_objects);
   for i in 1 ..list_of_objects.count loop
   dbms_output.put_line('updated:'||list_of_objects(i).objtype||' '||
   list_of_objects(i).objname||' '||list_of_objects(i).partname);
   end loop;
end;
/
                            

Depending on the number of objects without statistics left over from previous tutorials, this may take up to a couple of minutes.

Note: Only the schema name and the request for automatic gathering of statistics are passed to Oracle. All other parameters are derived internally.

 

4.

Perform DML on one of the tables.

                               
                                 
@c:\wkdir\stat_dml1.sql
                              
                              

                                 
Rem 3833 rows
                                  
UPDATE customers
SET cust_credit_limit=cust_credit_limit+1
WHERE country_id =
(SELECT country_id
FROM countries
WHERE country_name='France');
COMMIT;

You are modifying fewer than 10% of the rows, so the threshold for gathering new statistics has not been reached.

 

5.

Take a look at what is in user_tab_modifications now. The information about table modifications is flushed periodically from the SGA unless a flush is requested manually. You won't see any changes yet (unless you performed the last operations shortly before a periodic flush).

                               
                                 
@c:\wkdir\show_tab_modifications.sql
                              
                              
                                 
COLUMN table_name FORMAT a20
COLUMN subpartition_name FORMAT a20

SELECT * FROM user_tab_modifications;

 

6.

Although the database periodically flushes the information about modified objects from the SGA (and also internally every time DBMS_STATS is invoked), you may need the most current information for customer-specific usage. You can flush this information manually, on demand, by issuing the following procedure:

                               
                                 
@c:\wkdir\flush_monitoring.sql
                              
                              

Rem enforce flush of the modification info
Rem is called internally every time before GATHER statistics
exec dbms_stats.flush_database_monitoring_info
                            

 

7.

The information about table modifications was now flushed manually. You will now see the changes reflected.

                               
                                 
@c:\wkdir\show_tab_modifications.sql
                              
                              

                                 
COLUMN table_name FORMAT a20
COLUMN subpartition_name FORMAT a20

SELECT * FROM user_tab_modifications;
                                

 

8.

Gather the statistics again.

                               
                                 
@c:\wkdir\gather_auto.sql
                              
                              

set serveroutput on
declare
   list_of_objects dbms_stats.objectTab := dbms_stats.objectTab();
begin
   dbms_output.enable(200000);
   dbms_stats.gather_schema_stats('SH',options=>'GATHER AUTO',
   objlist=>list_of_objects);
   for i in 1 ..list_of_objects.count loop
   dbms_output.put_line('updated:'||list_of_objects(i).objtype||
   ' '||list_of_objects(i).objname||' '||list_of_objects(i).partname);
   end loop;
end;
/
                            

 

9.

Because the ratio of changed rows is below the threshold of 10%, the run completes very quickly and does not invoke any statistics gathering. The recorded modifications are still there.

                               
                                 
@c:\wkdir\show_tab_modifications.sql
                              
                              

                                 
                                   
COLUMN table_name FORMAT a20
COLUMN subpartition_name FORMAT a20
                                

SELECT * FROM user_tab_modifications;
                              
                            

If you want to set up your own statistics-gathering mechanism based on application-specific rules, you can use the information in user_tab_modifications. As soon as the listed objects get new statistics, their entries are deleted from the list of potentially stale tables.
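Such a custom mechanism might, for example, compare the tracked modifications against the current row counts. A hedged sketch follows; the 5% ratio is purely illustrative, and dbms_stats.flush_database_monitoring_info should be called first so the view is current:

```sql
Rem Illustrative only: tables whose tracked changes exceed a custom ratio
SELECT m.table_name,
       (m.inserts + m.updates + m.deletes) / t.num_rows AS change_ratio
FROM   user_tab_modifications m
JOIN   user_tables t ON t.table_name = m.table_name
WHERE  t.num_rows > 0
AND    (m.inserts + m.updates + m.deletes) / t.num_rows > 0.05;
```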

 

10.

Perform some more DML to pass the threshold of 10%.

                               
                                 
@c:\wkdir\stat_dml2.sql
                              
                              

Rem note that the monitoring mechanism is agnostic about the content;
Rem the following statement reverts the previous update
update customers
set cust_credit_limit=cust_credit_limit-1
where country_id =
(select country_id
from countries
where country_name='France');
commit;

11.

Flush the information about monitored tables from the SGA.

                               
                                 
@c:\wkdir\flush_monitoring.sql
                              
                              

exec dbms_stats.flush_database_monitoring_info
                            

Note that the last DML reverted the previous one.

 

12.

The information about table modifications was now flushed manually. You see the changes reflected.

                               
@c:\wkdir\show_tab_modifications.sql
                              
                              

                                 
                                   
                                     
COLUMN table_name FORMAT a20
COLUMN subpartition_name FORMAT a20
                                  
                                

SELECT * FROM user_tab_modifications;
                              
                            

The threshold of 10% modified data is now exceeded. As soon as the automatic gathering of the statistics is run again, Oracle automatically gathers the appropriate new set of statistics for the table customers.

 

Before the statistics are gathered, you update the customers table again, this time using Oracle's parallel DML capabilities for nonpartitioned tables. With 10g Release 2, the restriction that limited the degree of parallelism (DOP) on a per-segment basis in the presence of bitmap indexes is lifted.

1.

Create a work table as a copy of customers. From a SQL*Plus session, execute the following script:

                               
                                 
@c:\wkdir\cr_cust_dml.sql
                                
DROP TABLE cust_dml;
CREATE TABLE cust_dml PARALLEL AS SELECT /*+ PARALLEL(c) */ * FROM customers c;

 

2.

Now issue a parallel UPDATE command against the nonpartitioned table cust_dml.

                               
                                 
@c:\wkdir\xpdml_on_cust.sql
                              
                              

ALTER SESSION ENABLE PARALLEL DML;
                                
COMMIT;
EXPLAIN PLAN FOR update cust_dml set cust_credit_limit=cust_credit_limit-1;
SELECT * FROM TABLE(dbms_xplan.display);

Note that the UPDATE command is part of the parallel operation (Id 3). The plan also shows that the index maintenance is done in parallel as part of the DML operation.

 

3.

Perform a parallel DML operation and monitor it with V$PQ_SESSTAT.

                               
@c:\wkdir\pdml_on_cust.sql
                              

PROMPT Parallel DML against new table
update cust_dml set cust_credit_limit=cust_credit_limit-1;
COMMIT;
SELECT * FROM v$pq_sesstat WHERE statistic in ('DML Parallelized','Allocation Height');

 

4.

Finally, you use the GATHER AUTO functionality.

Invoke the automated gathering of statistics again. Note that it takes longer, and that new statistics for the customers table are gathered.

                               
                                 
@c:\wkdir\gather_auto.sql
                              
                              

set serveroutput on
declare
   list_of_objects dbms_stats.objectTab := dbms_stats.objectTab();
begin
   dbms_output.enable(200000);
   dbms_stats.gather_schema_stats('SH', options => 'GATHER AUTO', objlist => list_of_objects);
   for i in 1 .. list_of_objects.count loop
      dbms_output.put_line('updated: '||list_of_objects(i).objtype||' '||list_of_objects(i).objname||' '||list_of_objects(i).partname);
   end loop;
end;
/
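As an optional cross-check (not part of the original scripts), you can confirm that the customers table now carries fresh statistics by looking at the LAST_ANALYZED column in USER_TABLES:

```sql
Rem Optional verification after the GATHER AUTO run
SELECT table_name, num_rows,
       TO_CHAR(last_analyzed,'DD-MON-YYYY HH24:MI:SS') AS last_analyzed
FROM   user_tables
WHERE  table_name = 'CUSTOMERS';
```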
                            

 

5.

The customers table no longer shows up as a table with outstanding modifications.

                               
                                 
@c:\wkdir\show_tab_modifications.sql
                              
                              

                                 
                                   
                                     
                                       
COLUMN table_name FORMAT a20
COLUMN subpartition_name FORMAT a20
                                    
                                  
                                

SELECT * FROM user_tab_modifications;
                              
                            

 

 

Back to Topic List

Cleanup

To clean up your environment, you need to perform the following step:

From a SQL*Plus session logged on to the SH schema, execute the following commands:

SET SERVEROUTPUT ON

EXEC dw_handsOn.cleanup_modules

 

Back to Topic List

Summary

In this tutorial, you learned how to:

 Compare Oracle's Star Query Transformations with other access plans
 Use bitmap join indexes
 Examine differences between B-tree and bitmap indexes for data warehousing
 Become familiar with Oracle advanced partitioning mechanisms for improved query performance
 Use automatic gathering of statistics

Back to Topic List


 


