Legal | Privacy
High-speed Data Loading and Rolling Window Operations
 
 

High-speed Data Loading and Rolling Window Operations

Module Objectives

Purpose

In this module, you will learn how to use Oracle9i for high-speed data loading and leveraging Oracle Partitioning for a rolling window operation.

Objectives

After completing this module, you should be able to:

Load and transform data in parallel using External Tables, introduced in Oracle9i Release 1
Compare the usage of SQL*Loader to external tables for a simple load and transformation process
Perform a Rolling Window Operation leveraging Oracle Partitioning
As part of the Rolling Window Operation we will compress the oldest partitions to save space in our data warehouse environment. This is new Oracle9i Release 2 functionality

Prerequisites

Before starting this lesson, you should have:

Completed the Preinstallation module

Completed the Install Oracle9i Database module

Completed the Postinstallation module

Completed the Review the Sample Schema module
Completed the Setup Data Warehousing lesson
Downloaded etl1.zip and unzipped it into your working directory

Reference Material

The following is a list of useful reference materials if you want additional information about the topics in this module:

Documentation: Data Warehousing Guide


Overview

Most of the time, OLTP (Source) systems, which feed the Data Warehouse, are not directly connected to the data warehousing system for extracting new data. Commonly, those OLTP systems send data feeds in the form of external files. This data must be loaded into the Data Warehouse, preferable in parallel, thus leveraging the existing resources.

Furthermore, due to the business needs of our sample company MyCompany, only the data of the last 3 years are relevant for the analysis needs. This means that with the insertion of new data, old data has to be purged. The maintenance of this so-called rolling window operation is done with Oracle Partitioning.

External Tables

To load the external files into their data warehouse, MyCompany uses the Oracle9i external table feature, which allows external data, such as flat files, to be exposed within the database just like a regular database table. External tables can be accessed that SQL, so that external files can be queried directly and in parallel using the full power of SQL, PL/SQL, and Java. External tables are often used in the Extraction, Transformation, and Loading (ETL) process to combine data-transformations (through SQL) with data-loading into a single step. External tables are a very powerful feature with many possible applications in ETL and other database environments where flat files are processed; it is an alternative for using SQL*Loader.

Parallel Execution

Parallel execution dramatically reduces response time for data-intensive operations on large databases typically associated with decision support systems (DSS) and data warehouses. You can also implement parallel execution on certain types of online transaction processing (OLTP) and hybrid systems. Simply expressed, parallelism is the idea of breaking down a task so that, instead of one process doing all of the work in a query, many processes do part of the work at the same time. An example of this is when four processes handle four different quarters in a year instead of one process handling all four quarters by itself.

Rolling Window Operations

A very important task in the backoffice of a data warehouse is to keep it synchronized with the various changes which are taking place in the OLTP (source) systems. In addition, the life span of the data from an analysis perspective is very often limited, so that older data must be purged from the target system while new data is loaded; this operation is often called a rolling window operation. This should be done as fast as possible without any implication for the concurrent online access of the data warehousing system.


Loading Data by Using External Tables

In this section of the lesson, you will load data into the Data Warehouse using External Tables, introduced in Oracle9i Release 1.

Loading data using external tables

To show you how external tables can be created and used, perform the following steps:

1.

Create the necessary directory objects.

2.

Create the external table.

3. Change external file.
4. Select from the external table.
5. Parallel access of external tables.
6. Oracle9i’s new parallel insert capabilities.
7. Perform parallel insert.

1. Create the Necessary Directory Objects

Before you can create the external table, you will need to create a directory object in the database that will point to the directory on the file system where the data files will reside. Optionally, you can separate the location for the log, bad and discard files from the location of the data files. To create the directory, perform the following:

1.

From a SQL*Plus session logged on to the SH schema, run create_directory.sql, or copy the following SQL statement into your SQL*Plus session:

@create_directory.sql

DROP DIRECTORY data_dir;
DROP DIRECTORY log_dir;
CREATE DIRECTORY data_dir AS 'D:\HandsOn_STUDENTS\RDBMS\FILES';
CREATE DIRECTORY log_dir AS 'D:\TEMP';


2. Create the External Table

When creating an external table, you are defining two parts of information:

1.

The metadata information for the table representation inside the database

2.

The HOW access parameter definition to extract the data from the external file

After the creation of this meta information, the external data can be accessed from within the database, without the necessity of an initial load.

To create the external table, you perform the following steps:

1.

From a SQL*Plus session logged on to the SH schema, run @create_external_table.sql. The results are as follows:

@create_external_table
set echo on
Rem *****
Rem CREATE EXTERNAL TABLE
Rem *****

CREATE TABLE sales_delta_xt 
  (
   prod_id NUMBER(6),
   cust_id NUMBER,
   time_id DATE,
   channel_id CHAR(1),
   promo_id NUMBER(6),
   quantity_sold NUMBER(3),
   amount_sold NUMBER(10,2)
  )
  ORGANIZATION external 
  (
   TYPE oracle_loader
   DEFAULT DIRECTORY data_dir
   ACCESS PARAMETERS 
    (
     RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII
     BADFILE log_dir:'sh_sales_%p.bad'
     LOGFILE log_dir:'sh_sales_%p.log_xt'
     FIELDS TERMINATED BY "|" LDRTRIM 
    )
   location 
    (
     'salesDelta.dat'
    )
   )REJECT LIMIT UNLIMITED NOPARALLEL;


3. Change External Files

Because an external table points to external data files without storing any actual data, the files can be changed without affecting the external table. In our lesson, you will use the same external table definition twice. You therefore must map the external file containing the data you want to use with the file name used in the external table definition.

To copy the actual files in your working directory, perform the following steps:

1.

Make sure the target file does not exist. Open a DOS prompt and navigate to the directory where your files are located and perform the following:

dir salesDelta.dat

If it exists, delete it.

 

2.

Copy the source file by performing the following:

copy salesDec00.dat salesDelta.dat

 

3.

You should now see the file:

dir salesDelta.dat


4. Select from the External Table

You can now access the data in the external file without any further action, as shown with the following SQL command:

1.

From a SQL*Plus session logged on to the SH schema, execute the following queries:

SELECT COUNT(*) FROM sales_delta_xt; 
SELECT MAX(time_id) FROM sales_delta_xt;        

If you copied the files correctly, the maximum TIME_ID is ‘30-DEC-00’.


5. Parallel Access of External Tables

Unlike SQL*Loader, the access of external tables can be done in parallel independent of the number of external files. SQL*Loader can only operate on a per-file bases; so that you have to split large source files manually if you want to parallelize. With external tables, the degree of parallelism is controlled exactly the same way than for a normal table. In our case, you defined the external table NOPARALLEL by default. The following section will show you how to control the degree of parallelism on statement level by using a hint.

1. The script parallel_select_from_ET.sql contains the SQL statements for the next three steps.

Run the following query, from a SQL*Plus session logged on to the SH schema, to see the current parallel session statistics:

SELECT * 
FROM   v$pq_sesstat 
WHERE  statistic in ('Queries Parallelized', 
                     'Allocation Height');
2.

Run the same query you did before to access the External Table with a parallel degree of 4, controlled with a hint. The select statement is:

select /*+ parallel(a,4) */ count(*) 
from sales_delta_XT a;

We’re selecting from our external table in parallel, although the external table points only to one input source file.

 

3. Now rerun the session stats again to see the differences. You will see the parallel session statistics have changed; it shows that the last query was parallelized, and it also show the degree of parallelism.
SELECT * 
FROM   v$pq_sesstat 
WHERE  statistic in 
       ('Queries Parallelized', 'Allocation Height');


6. Oracle9i’s New Parallel Insert Capabilities

Prior to Oracle9i, the direct path INSERT command for partitioned tables was parallelized on a per-partition base (Direct path insertion into non-partitioned tables or a single partition could be parallelized unrestricted since Oracle8i). Oracle9i lifts this restriction, providing unlimited parallel direct path INSERT capabilities within each partition.

Let’s first look how a serial plan looks like. Since none of the objects is defined in parallel, we will get serial execution unless we (a) either change the default degree of parallelism of one of the objects or (b) use a hint.

1.

To show the execution plan for SERIAL INSERT behavior, run show_serial_exec_plan.sql, or copy the following SQL statement into your SQL*Plus session:

@show_serial_exec_plan.sql

EXPLAIN PLAN FOR INSERT /*+ APPEND */ INTO sales
       (
       PROD_ID, CUST_ID, TIME_ID, CHANNEL_ID, PROMO_ID,
       QUANTITY_SOLD, AMOUNT_SOLD
       )
       SELECT PROD_ID, CUST_ID, TIME_ID, CHANNEL_ID, PROMO_ID,
              sum(QUANTITY_SOLD), sum(AMOUNT_SOLD)
       FROM   SALES_DELTA_XT
       GROUP BY prod_id,time_id,cust_id,channel_id,promo_id;

set linesize 120
select * from table(dbms_xplan.display); 

You can see that the 'LOAD AS SELECT' operation (shaded), representing the INSERT is done in serial.

 

2.

To show the PARALLEL INSERT execution plan you will need to run show_parallel_exec_plan.sql logged in to the SH schema. A parallel DML must always be the first statement of a transaction. Furthermore, a DML operation cannot run in the presence of primary keyforeign key constraints. We therefore have to disable the constraints prior to the parallel DML operation:

@show_parallel_exec_plan.sql

ALTER TABLE sales DISABLE CONSTRAINT sales_product_fk;
ALTER TABLE sales DISABLE CONSTRAINT sales_customer_fk;
ALTER TABLE sales DISABLE CONSTRAINT sales_time_fk;
ALTER TABLE sales DISABLE CONSTRAINT sales_channel_fk;
ALTER TABLE sales DISABLE CONSTRAINT sales_promo_fk;

COMMIT;

alter session enable parallel dml;

EXPLAIN PLAN FOR INSERT /*+ APPEND PARALLEL(SALES,4) */ INTO sales
   (
     PROD_ID, CUST_ID, TIME_ID, CHANNEL_ID, PROMO_ID,
     QUANTITY_SOLD, AMOUNT_SOLD
   )
   SELECT /*+ parallel (sales_delta_XT,4) */
          PROD_ID, CUST_ID, TIME_ID, CHANNEL_ID, PROMO_ID,
          sum(QUANTITY_SOLD), sum(AMOUNT_SOLD)
   FROM   SALES_DELTA_XT
   GROUP BY prod_id,time_id,cust_id,channel_id,promo_id;

set linesize 120
select * from table(dbms_xplan.display);

Unlike in the previous plan, the 'LOAD AS SELECT' operation is done in parallel.


7. Perform a Parallel Insert

In this step of the lesson you will do the parallel insert discussed before. Note that you not only select the data from the external table but also do an aggregate as part of the select prior to the insertion; something that is impossible with SQL*Loader only.

1.

You will now run the sql to perform a parallel insert. Run parallel_insert_file.sql or copy the following statement into your SQL*Plus session set timing on

@parallel_insert_file.sql

connect sh/sh@orcl.world

set timing on 
set echo on 

COLUMN statistic FORMAT a20

Rem ***** 
Rem PARALLEL INSERT 
Rem ***** 

INSERT /*+ APPEND PARALLEL(SALES,4) */ 
   INTO sales ( PROD_ID, CUST_ID, TIME_ID, CHANNEL_ID, 
                PROMO_ID, QUANTITY_SOLD, AMOUNT_SOLD ) 
   SELECT /*+ PARALLEL (sales_delta_XT,4) */ PROD_ID, CUST_ID, 
          TIME_ID, CHANNEL_ID, PROMO_ID, sum(QUANTITY_SOLD), 
          sum(AMOUNT_SOLD) 
   FROM   SALES_DELTA_XT 
   GROUP BY prod_id,time_id,cust_id,channel_id,promo_id;  

set timing off  

Rem   we can control the successful parallel DML 

SELECT * 
FROM   v$pq_sesstat 
WHERE statistic in ('DML Parallelized','Allocation Height'); 

Write down the execution time of this statement and compare it to the total amount of time you will need with SQL*Loader and a subsequent insertion. Note that we won’t see the full benefit of parallelizing the external table access and combining the transformation with the loading, since we’re accessing a very small amount of data in parallel on a single CPU machine with one disk.

2.

Row rollback. In the next lesson the same data is inserted using SQL*Loader. If you do not rollback now there will be duplicate rows in the table and other lessons will not work. You can also choose not to rollback and skip the SQL*Loader lesson.

Rollback;

3.

if you plan to SKIP the SQL*Loader section, you should COMMIT your transaction now:

COMMIT;

You can now continue with the section ROLLING WINDOW OPERATIONS

4. It is important that you either commit or rollback the operation before you can access the table again; otherwise you will get an Oracle error message.
5.

After issuing a commit or rollback, we will enable the constraints again.

@enable_cons.sql
          
ALTER TABLE sales 
   MODIFY CONSTRAINT sales_product_fk ENABLE NOVALIDATE; 
ALTER TABLE sales 
   MODIFY CONSTRAINT sales_customer_fk ENABLE NOVALIDATE; 
ALTER TABLE sales 
   MODIFY CONSTRAINT sales_time_fk ENABLE NOVALIDATE; 
ALTER TABLE sales 
   MODIFY CONSTRAINT sales_channel_fk ENABLE NOVALIDATE; 
ALTER TABLE sales 
   MODIFY CONSTRAINT sales_promo_fk ENABLE NOVALIDATE;

You have just loaded and transformed the data as one step. To accomplish the same functionality with SQL*Loader, you will need two process steps, which means more work effort and slower performance for the complete process.


Compare SQL*Loader to the Simple Loading and Transformation Process with External Tables

Before Oracle9i, you did the same operation by using SQL*Loader. The external table method you previously performed is the preferred method of data loading and transformation. However, to demonstrate the benefit of using external tables, you will perform the tasks to load and transform the data by using SQL*Loader.

To show how to load and transform data by using SQL*Loader, you will perform the following steps:

1.

Create a staging table.

2.

Load the data into the staging table by using SQL*Loader.

3. Load the staging table into the target database.
4. Drop the staging table.

1. Create the Staging Table

You need a staging table to load the data into so that you can transform it within the database in a second step.

1.

You will now run the sql to create a staging table:

create table sales_dec00 as
   select * 
   from   sales 
   where 1=0;

2. Load Data into the Staging Table by Using SQL*Loader

You will now load the data file using sales_dec00.ctl into the staging table by performing the following steps:

1.

You perform this operation from the OS command line, in the D:\wkdir directory..

sqlldr sh/sh@orcl.world control=sales_dec00.ctl direct=true



If your database alias is not orcl.world change the alias accordingly.
2.

Note that you cannot parallelize this task. Check the SQL*Loader log file, D:\wkdir\sales_dec00.log. Write down the execution time for the loading process.

Unlike with an External Table, space is consumed in the database for making the data accessible from within the database. The space consumed by the staging table is linearly dependent on the amount of data to be loaded for further transformation.

Also note that it is not reasonably possible to parallelize the loading with SQL*Loader without having several external files. You can use the SKIP option for several Loader processes, accessing the same file. However, this enforces every SQL*Loader process to scan the whole external file; this is detrimental to the overall performance.


3. Load the Staging Table into the Target Table

After loading the external data - making it accessible for the database - you can perform your transformation.

1.

You will now run the SQL to transform and insert the external data, which is already staged in the database, into the sales fact table by issuing @load_stage_table.sql

@load_stage_table.sql

Rem Load from staging table into target 

set timing on

INSERT /*+ APPEND PARALLEL(SALES,4) */ INTO sales
   ( PROD_ID, CUST_ID, TIME_ID, CHANNEL_ID, PROMO_ID,
   QUANTITY_SOLD, AMOUNT_SOLD
   )
   SELECT /*+ parallel (sales_dec00,4) */
          PROD_ID, CUST_ID, TIME_ID, CHANNEL_ID, PROMO_ID,
          sum(QUANTITY_SOLD), sum(AMOUNT_SOLD)
   FROM   sales_dec00
   GROUP BY prod_id, time_id, cust_id, channel_id, promo_id;
COMMIT;

Write down the execution time of this statement and add the time for the loading with SQL*Loader. Compare the total amount of time you will need with the time for the external table insertion. Note that we won’t see the full benefit of parallelizing the external table access and combining the transformation with the loading, since we’re accessing a very small amount of data in parallel on a single CPU machine with one disk.


Rolling Window Operations

Many data warehouses maintain a rolling window of data. For example, the data warehouse stores the most recent 12 months of sales data. Just as a new partition can be added to the sales table, an old partition can be quickly (and independently) removed from the sales table. Partitioning provides the ideal framework for those operations. The above two benefits (reduced resources utilization and minimal end user impact) are just as pertinent to removing a partition as they are to adding a partition.

Perform the Steps of the Rolling Windows Operation

To perform the steps of the rolling windows operation and learn about enhancements in Oracle9i, perform the following steps:

1.1 Rolling Window Part 1
Prepare a standalone table with the new data
   
1.2 Rolling Window Part 2
Add the new data to the fact table
   
1.3 Rolling Window Part 3a
Delete old data from the fact table
   
1.4 Rolling Window Part 3b
Instead of purging old data from the fact table you will use the table compression feature or Oracle9i Release 2 Enterprise Edition to reduce the space consumption of the old data
   
1.5 Oracle9i Enhancements for Global Index Maintenance
  Utilize the Oracle9i enhancement for Global Index Maintenance.

1.1 Prepare a standalone table with the new data

To perform the rolling window operation, you need to create and load a standalone table with the new data by performing the following steps. Note that you are going to use the external table you already defined, but pointing to a different external file:

1.

Create a link to the sales Q1 data.

2.

Create the table for the new sales Q1 data.

3. Load this table.
4. Create bitmap indexes for this table.
5. Create constraints for this table.

1. Change External File to the Sales Q1 Data

In this section, you use the external table you already defined. However, this time you will use a different external file, the sales Q1 data.

You rename the actual files by performing the following steps:

1.

Make sure the target file does not exist. Open a DOS prompt and navigate to the directory where your files are located (D:\wkdir) and perform the following:

dir salesDelta.dat

If it exists, delete it.

2.

Copy the source file by performing the following:

copy salesQ1_01.dat salesDelta.dat
3.

You should now see the file:

dir salesDelta.dat

4.

Let’s query our external table again and compare the number of records and MAX(time_id):

SELECT COUNT(*) FROM sales_delta_xt;
SELECT MAX(time_id) FROM sales_delta_xt;

Note that the number os rows as well as the maximum TIME_ID has changed. If your external table file is correct, the maximum TIME_ID is ‘30-MAR-2001’.


2. Create the Table for the New Sales Q1 Data

You will create an empty table for the new sales Q1 data. This table will be added to the already existing partitioned sales table later.

1.

Run the SQL to create the table:

create table sales_delta nologging as 
   select * 
   from   sales 
   where  1=0;


3. Load This Table

To load this table, you perform the following step:

1.

From a SQL*Plus session logged on to the SH schema, execute the following statement:

@load_stage_table2.sql

INSERT /*+ APPEND */ INTO sales_delta
   SELECT /*+ PARALLEL (SALES_DELTA_XT,4) */
          PROD_ID, CUST_ID, TIME_ID, CHANNEL_ID, PROMO_ID,
          sum(QUANTITY_SOLD) quantity_sold,
          sum(AMOUNT_SOLD) amount_sold
   FROM   SALES_DELTA_XT
   GROUP BY prod_id,time_id,cust_id,channel_id,promo_id;

After loading the table sales_delta, we should gather statistics for this newly created table

 

.From a SQL*Plus session logged on to the SH schema, execute the following statement:

@gather_stat_stage_table.sql

exec dbms_stats.gather_table_stats('SH', 'sales_delta', 
   estimate_percent=>20);


4. Create Bitmap Indexes for This Table

Because you are going to exchange this standalone table with an empty partition of the sales table at a later point in time, you have to build exactly the same index structure as the existing SALES table to keep the local index structures of this particular table in a usable state after the exchange.

1.

From a SQL*Plus session logged on to the SH schema, execute the following statements:

@create_static_bitmap_index.sql

CREATE BITMAP INDEX sales_prod_local_bix
  ON sales_delta (prod_id)
  NOLOGGING COMPUTE STATISTICS ;
CREATE BITMAP INDEX sales_cust_local_bix
  ON sales_delta (cust_id)
  NOLOGGING COMPUTE STATISTICS ;
CREATE BITMAP INDEX sales_time_local_bix
  ON sales_delta (time_id)
  NOLOGGING COMPUTE STATISTICS ;
CREATE BITMAP INDEX sales_channel_local_bix
   ON sales_delta (channel_id)
   NOLOGGING COMPUTE STATISTICS ;
CREATE BITMAP INDEX sales_promo_local_bix
   ON sales_delta (promo_id)
   NOLOGGING COMPUTE STATISTICS ;

Note that the statistics for those indexes will be created as part of the index creation.


5. Create Constraints for This Table

The same is true for the existing constraints of the sales table.

1.

From a SQL*Plus session logged on to the SH schema, execute the following query:

@create_constraints.sql
set echo on
Rem *****
Rem CREATE CONSTRAINTS
Rem *****

ALTER TABLE sales_delta
  ADD ( CONSTRAINT sales_product_delta_fk
    FOREIGN KEY (prod_id) 
      REFERENCES products RELY ENABLE NOVALIDATE
    , CONSTRAINT sales_customer_delta_fk 
    FOREIGN KEY (cust_id)
      REFERENCES customers RELY ENABLE NOVALIDATE 
    , CONSTRAINT sales_time_delta_fk
    FOREIGN KEY (time_id)
      REFERENCES times RELY ENABLE NOVALIDATE
    , CONSTRAINT sales_channel_delta_fk
    FOREIGN KEY (channel_id)
      REFERENCES channels RELY ENABLE NOVALIDATE
    , CONSTRAINT sales_promo_delta_fk
    FOREIGN KEY (promo_id)
      REFERENCES promotions RELY ENABLE NOVALIDATE
   ) ;


1.2 Add the New Data to the Fact Table

The next task in performing a rolling window operation is to add the newly loaded and indexed data to the fact table. To do this, you perform the following steps:

1.

Create a new partition, if one does not already exist.

2.

Exchange the partition. This a data dictionary operation only and does not touch any data

3. Select from the partition to control the success.

1. Create a New Partition

You need to create a new, empty partition. You can either create the new partition with a distinct upper boundary or by choosing the keyword MAXVALUE. The latter one ensures that records violating the potential upper boundary condition won’t be rejected and the INSERT operation succeeds.

In our business scenario, we will issue a SPLIT PARTITION after the loading operation to identify any potential violations. All records violating our upper boundary will be ‘separated’ into an extra partition.

You need to create a new, empty partition. To do this, you perform the following step:

1.

From a SQL*Plus session logged on to the SH schema, execute the following query:

@create_partition_for_sales.sql

Rem *****
Rem current partitions
Rem *****
                 
select partition_name, high_value 
from   user_tab_partitions 
where  table_name='SALES'
order by partition_position;

Rem *****
Rem create ADDITIONAL PARTITION on sales
Rem *****

ALTER TABLE sales 
ADD PARTITION sales_q1_2001
VALUES LESS THAN (MAXVALUE);

Rem *****
Rem what is in the partition now?
Rem empty
Rem *****

SELECT COUNT(*) 
FROM   sales PARTITION (sales_q1_2001);


2. Exchange the Partition

You are now adding the newly loaded and indexed data to the real sales fact table by performing a PARTITION EXCHANGE command. Note that this is only a DDL command, which doesn't touch the actual data at all.. To do this, you perform the following step:

1.

From a SQL*Plus session logged on to the SH schema, execute the following query:

@exchange_partition_wo_gim.sql

set echo on

Rem *****
Rem EXCHANGE IT
Rem *****

ALTER TABLE sales EXCHANGE PARTITION sales_q1_2001 
   WITH TABLE sales_delta INCLUDING INDEXES;


3. Select from the Partition

Now you can select from the newly added and exchanged partition to experience how fast you inserted thousands of rows...

Note that, the more data you have to add to your partitioned fact table, the more time you’re saving with this metadata-only operation, and the more you will experience the benefit of minimal to zero user impact.

You need a logical partitioning such as RANGE Partitioning; HASH Partitioning cannot be used for the very common Rolling Window Operation.

Note that all indexes of the sales table are maintained and usable.

1.

From a SQL*Plus session logged on to the SH schema, execute the following queries. This will show you the number of rows in the exchanged partition and - now empty - standalone table:

SELECT COUNT(*) 
FROM   sales PARTITION (sales_q1_2001);

SELECT COUNT(*) 
FROM   sales_delta;          

 

Note that all local indexes of table sales are valid.

@show_sales_idx_status.sql

SELECT   ui.index_name,
         DECODE(uip.status,null,ui.status,uip.status) status, 
         count(*) num_of_part
FROM     user_ind_partitions uip, user_indexes ui
WHERE    ui.index_name=uip.index_name(+)
AND      ui.table_name='SALES'
GROUP BY ui.index_name, 
         DECODE(uip.status,null,ui.status,uip.status);

You can also use the ‘WITHOUT VALIDATION’ clause as part of the PARTITION EXCHANGE command. Oracle will then suppress the validity checking of the table which is going to be exchanged; otherwise Oracle guarantees that all values of the partition key match within the partition boundaries.

SPLIT THE MOST RECENT PARTITION TO ENSURE (BUSINESS) DATA INTEGRITY

As mentioned before, you decided to load the data into a partition with no fix upper boundary to avoid any potential errors. To identify any potential violation, you will the most recent partition, thus creating two partitions, one with a fix upper boundary (reflecting our business rule).

Prior to Oracle9i Release 2, the SPLIT PARTITION operation would have caused physical data movement and the creation of two new segments. With Oracle9i Release 2 introduces an enhanced fast split operation, where the RDBMS detects whether or not one of the two new partitions will be empty. If this is the case, Oracle will not create two new segments; it will only create one empty one, reflecting the new empty partition.

This enhancement is absolutely transparent. It improves the runtime of a SPLIT operation, saves system resources, and does not invalidate any existing local index structures.

1.

From a SQL*Plus session logged on to the SH schema, execute the following queries, @fast_split_sales.sql:

@fast_split_sales.sql

ALTER TABLE sales 
   SPLIT PARTITION sales_q1_2001 
      AT (TO_DATE('01-APR-2001', 'DD-MON-YYYY')) 
   INTO (PARTITION sales_q1_2001, 
         PARTITION sales_beyond_q1_2001);

PROMPT you should recognize that is was fast.
PROMPT Let's control the count in the most recent partition. Empty

SELECT COUNT(*) 
FROM   sales PARTITION (sales_beyond_q1_2001);

PROMPT now you can drop it

ALTER TABLE sales DROP PARTITION sales_beyond_q1_2001;

2.

Note that all local indexes of table sales are still valid.

@show_sales_idx_status.sql

SELECT   ui.index_name,
         DECODE(uip.status,null,ui.status,uip.status) status, 
         count(*) num_of_part
FROM     user_ind_partitions uip, user_indexes ui
WHERE    ui.index_name=uip.index_name(+)
AND      ui.table_name='SALES'
GROUP BY ui.index_name, 
         DECODE(uip.status,null,ui.status,uip.status);

 

1.3 Delete Old Data from the Fact Table

The next task to performing a Rolling Window Operation is to delete old data from the fact table. You only want to analyze the most recent data of the last 3 years. Therefore, since you added Q1-2001, you have to delete the data of Q1-1998.

Without Range Partitioning, you have to perform a DML operation against the table. With partitioning, you can leverage the PARTITION EXCHANGE command again to remove the data from the fact table. Like for the adding of new data, Hash Partitioning does not help you here either.

Note that you're not DELETING the data. Instead you are exchanging (logically replacing) the partition containing this data from the Sales fact table with an empty standalone table with the same logical structure. You can then archive this data or drop the exchanged partition, depending on your business needs. Alternatively, with Oracle9i Release 2, you can use the new feature table compression to reduce the consumed space for older, mostly read-only used, partitions. You will do this in the next step

1.

Create an empty standalone table.

2.

Create bitmap indexes for this table.

3. Create constraints for this table.
4. Show the data in the partition before the exchange.
5. Exchange the empty new table with the existing Q1-1998 partition.
6. Show the data in the partition after the exchange.

 

1. Create an Empty Standalone Table

You need to create an empty table in which to store the old 1998 data.

1.

From a SQL*Plus session logged on to the SH schema, execute the following statements:

DROP TABLE sales_old_q1_1998;
create table sales_old_q1_1998 nologging as 
   select * 
   from   sales 
   where  1=0;


2. Create Bitmap Indexes for This Table

Now create the local indexes.

1.

From a SQL*Plus session logged on to the SH schema, execute the following query:

@create_ndx.sql

set echo on

Rem *****
Rem CREATE LOCAL INDEXES
Rem *****

CREATE BITMAP INDEX sales_prod_old_bix
   ON sales_old_q1_1998 (prod_id)
   NOLOGGING COMPUTE STATISTICS ;

 CREATE BITMAP INDEX sales_cust_old_bix
   ON sales_old_q1_1998 (cust_id)
   NOLOGGING COMPUTE STATISTICS ;

CREATE BITMAP INDEX sales_time_old_bix
   ON sales_old_q1_1998 (time_id)
   NOLOGGING COMPUTE STATISTICS ;

CREATE BITMAP INDEX sales_channel_old_bix
   ON sales_old_q1_1998 (channel_id)
   NOLOGGING COMPUTE STATISTICS ;

CREATE BITMAP INDEX sales_promo_old_bix
   ON sales_old_q1_1998 (promo_id)
   NOLOGGING COMPUTE STATISTICS ;


3. Create Constraints for This Table

Now create the constraints.

1.

From a SQL*Plus session logged on to the SH schema, execute the following query:

@create_constraints_old.sql

set echo on 

Rem *****
Rem CREATE CONSTRAINTS
Rem *****

ALTER TABLE sales_old_q1_1998
  ADD ( CONSTRAINT sales_product_old_fk
        FOREIGN KEY (prod_id) 
        REFERENCES products RELY ENABLE NOVALIDATE
      , CONSTRAINT sales_customer_old_fk 
        FOREIGN KEY (cust_id)
        REFERENCES customers RELY ENABLE NOVALIDATE 
      , CONSTRAINT sales_time_old_fk
        FOREIGN KEY (time_id)
        REFERENCES times RELY ENABLE NOVALIDATE
      , CONSTRAINT sales_channel_old_fk
        FOREIGN KEY (channel_id)
        REFERENCES channels RELY ENABLE NOVALIDATE
      , CONSTRAINT sales_promo_old_fk
        FOREIGN KEY (promo_id)
        REFERENCES promotions RELY ENABLE NOVALIDATE
      ) ;


4. Show the Data in the Partition Before the Exchange

Before you perform the exchange, you will want to take a look at the 1998 Q1 data that will be aged out of the partition.

1.

From a SQL*Plus session logged on to the SH schema, execute the following query:

@show_partition.sql

set echo on

Rem *****
Rem show the actual content of the partition to be aged out 
Rem BEFORE exchange
Rem *****
  
SELECT COUNT(*) 
FROM   sales PARTITION (sales_q1_1998);


5. Exchange the Partition

You now need to exchange the empty table with the existing Q1-1998 partition. To do this, you perform the following step:

1.

From a SQL*Plus session logged on to the SH schema, execute the following SQL statement:

@exchange_old_partition.sql

ALTER TABLE sales 
   EXCHANGE PARTITION sales_q1_1998 
   WITH TABLE sales_old_q1_1998
   INCLUDING INDEXES;

Note that we could have used a DROP PARTITION instead. Table sales_old_q1_1998 now stores all the data of the first quarter of 1998. We could drop this table to remove the data entirely from the system.


6. Show the Data in the Partition After the Exchange

After you perform the exchange, you will want to take a look at the data in the partition.

1.

From a SQL*Plus session logged on to the SH schema, execute the following queries:

SELECT COUNT(*) 
FROM   sales PARTITION (sales_q1_1998);

SELECT COUNT(*) 
FROM   sales_old_q1_1998;
       

Unlike before the EXCHANGE command, the standalone table now stores thousands of rows whereas the equivalent partition of SALES is empty.

2.

Local Indexes are not affected.

@show_sales_idx_status.sql

SELECT   ui.index_name,
         DECODE(uip.status,null,ui.status,uip.status) status, 
         count(*) num_of_part
FROM     user_ind_partitions uip, user_indexes ui
WHERE    ui.index_name=uip.index_name(+)
AND      ui.table_name='SALES'
GROUP BY ui.index_name, 
         DECODE(uip.status,null,ui.status,uip.status);


Exchange Back

Because you are going to perform the same partition maintenance operations again, you need to reset the environment back to its original state. You do this by simply issuing the same partition exchange commands again:

1.

From a SQL*Plus session logged on to the SH schema, execute the following query:

@exchange_back_to_start.sql
       
set echo on 

Rem *****
Rem let's bring the data back in original state
Rem *****

Rem added partition data
      
SELECT COUNT(*) 
FROM   sales PARTITION (sales_q1_2001);
       
ALTER TABLE sales 
   EXCHANGE PARTITION sales_q1_2001 
   WITH TABLE sales_delta INCLUDING INDEXES;
       
SELECT COUNT(*) 
FROM   sales PARTITION (sales_q1_2001);
       
Rem deleted partition data
       
SELECT COUNT(*) 
FROM   sales PARTITION (sales_q1_1998);
       
ALTER TABLE sales 
   EXCHANGE PARTITION sales_q1_1998 
   WITH TABLE sales_old_q1_1998 
   INCLUDING INDEXES;
       
SELECT COUNT(*) 
FROM   sales PARTITION (sales_q1_1998);


1.4 SAVE SPACE AND COMPRESS OLD DATA IN THE FACT TABLE INSTEAD OF DELETING IT

Data stored in relational databases keep growing as a result of businesses requirements for more information. A big portion of the cost of keeping large amounts of data is in the cost of disk systems, and the resources utilized in managing that data. Oracle9i Release 2 Enterprise Edition introduces a unique way to deal with this cost by compressing data stored in relational tables with virtually no negative impact on query time against that data, thereby enabling substantial cost savings.

Commercially available relational database systems have not heavily utilized compression techniques on data stored in relational tables. One reason is that the trade-off between time and space for compression is not always attractive for relational databases. A typical compression technique may offer space savings, but only at a cost of much increased query time against the data. Furthermore, many of the standard techniques do not even guarantee that data size does not increase after compression.

Oracle9i Release 2 Enterprise Edition introduces a unique compression technique that is very attractive for large data warehouses. It is unique in many ways. Its reduction of disk space can be significantly higher than standard compression algorithms, because it is optimized for relational data. It has virtually no negative impact on the performance of queries against compressed data; in fact, it may have a significant positive impact on queries accessing large amounts of data, as well as on data management operations like backup and recovery. It ensures that compressed data is never larger than uncompressed data.

CREATE A STANDALONE TABLE USING THE TABLE COMPRESSION FEATURE

You will now run the SQL to create the table:

1.

You will create a compressed table for the first six months of the 1998 data. To set up the table including its constraints, you can run create_comp_h1_1998_1.sql.

@create_comp_h1_1998_1.sql

DROP TABLE sales_1_1998_compress;

CREATE TABLE sales_1_1998_compress COMPRESS
AS SELECT * 
   FROM   sales 
   WHERE  time_id >= TO_DATE('01-JAN-1998','dd-mon-yyyy') 
   AND    time_id < TO_DATE('01-JUN-1998','dd-mon-yyyy');
   
exec dbms_stats.gather_table_stats('SH','sales_1_1998_compress', 
                                   estimate_percent=>10);

Let’s now compare the compression ratio for this newly created compressed table:

1.

From a SQL*Plus session logged on to the SH schema, execute the following statements:

@compression_rate1.sql

 select segment_name||
        decode(partition_name,
               null,
               ': compressed table',
               ': partitions SALES_Q1_1998,SALES_Q2_1998') what, 
        sum(bytes) 
 from   user_segments 
 where  segment_name ='SALES_1_1998_COMPRESS' 
 or     ( segment_name='SALES' and 
          partition_name in ('SALES_Q1_1998','SALES_Q2_1998'))
 and    segment_type in('TABLE','TABLE PARTITION')
 group by segment_name||decode(partition_name,
                        null,
                        ': compressed table',
                        ': partitions SALES_Q1_1998,SALES_Q2_1998');

Most likely the compression ratio with real world data will be higher than the one experienced with the sales history schema. The data in the fact table SALES is artificially generated and does not show the typical ‘natural sorting’ you will find in any data warehouse environment where the data was cleansed, consolidated, or even aggregated prior to its INSERT.

CREATE CONSTRAINTS FOR THIS TABLE

You will now create the same constraints than the ones for table SALES.

1.

From a SQL*Plus session logged on to the SH schema, execute the following statement:

@create_compress_constr.sql

ALTER TABLE sales_1_1998_compress
   ADD ( CONSTRAINT sales_product_fk_compress
         FOREIGN KEY (prod_id)
         REFERENCES products RELY ENABLE NOVALIDATE
        , CONSTRAINT sales_customer_fk_compress
          FOREIGN KEY (cust_id)
          REFERENCES customers RELY ENABLE NOVALIDATE
        , CONSTRAINT sales_time_fk_compress
          FOREIGN KEY (time_id)
          REFERENCES times RELY ENABLE NOVALIDATE
        , CONSTRAINT sales_channel_fk_compress
          FOREIGN KEY (channel_id)
          REFERENCES channels RELY ENABLE NOVALIDATE
        , CONSTRAINT sales_promo_fk_compress
          FOREIGN KEY (promo_id)
          REFERENCES promotions RELY ENABLE NOVALIDATE
        ) ;

Since we merged two current partitions when creating the compressed table sales_1_1998_compress, we have to drop one of them prior to the EXCHANGE command to extend the range window for the existing partition to macth the range window of the compressed standalone table.

1.

From a SQL*Plus session logged on to the SH schema, execute the following statement:

@prep_exchange_comp1.sql

PROMPT CURRENT PARTITIONS
Rem *****

SELECT   partition_name, high_value
FROM     user_tab_partitions
WHERE    table_name='SALES'
ORDER BY partition_position;

REM since we compressed data and merged two partitions, we have
REM to drop partition SALES_Q1_1998; this will bring down the 
REM lower boundary of the 'new' sales_q2_1998 to the upper 
REM boundary of sales_2_1997

ALTER TABLE sales DROP PARTITION sales_q1_1998;

PROMPT CURRENT PARTITIONS

SELECT   partition_name, high_value
FROM     user_tab_partitions
WHERE    table_name='SALES'
ORDER BY partition_position;

PREPARE A PARTITIONED TABLE FOR GETTING ITS FIRST COMPRESSED PARTITION

When storing some or all partitions of a partitioned table in a compressed format, you have to have at least one compressed partition prior to the creation of bitmap index structures. If you have already existent bitmap index structures, you have to mark them unusable – or drop them - prior to adding a compressed partition and to rebuild them afterwards. This is only true for the very first time when a compressed partition is added to an existing partitioned table that has bitmap indexes. If you plan to store some or all partitions in a compressed manner, you should create the partitioned table already with data compression enabled.

The steps for adding a compressed partition to our existing uncompressed partitioned table with bitmap indexes are:

1.

Create the new compressed standalone table containing data

2. Drop all existing bitmap indexes of the target partitioned table
3. Issue your partition maintenance operation that involves a compressed partition
1. Create the new compressed standalone table containing data, this step has been completed, sales_data.
2.

Drop the existing bitmap index structures. From a SQL*Plus session logged on to the SH schema, execute the following statement:

@drop_bix_for_sales.sql

Rem usable bitmap indexes on a partitioned table before we can
Rem add a compressed partition
Rem ORA-14646: Specified alter table operation involving 
Rem compression cannot be performed in the presence of usable
Rem bitmap indexes

DROP INDEX sales_channel_bix ;
DROP INDEX sales_cust_bix ;
DROP INDEX sales_prod_bix ;
DROP INDEX sales_promo_bix ;
DROP INDEX sales_time_bix ;

3.

You can now exchange the compressed table with the existing uncompressed partition.

In this case, exchange the former uncompressed partition with the compressed standalone table, having the same logical structure. This could also be an ALTER TABLE MOVE PARTITION command. We will demonstrate an ALTER TABLE MERGE command later

From a SQL*Plus session logged on to the SH schema, execute the following statement:

@exchange_comp_part.sql

REM now exchange the newly created and compressed 
REM partition back ...

ALTER TABLE sales 
   EXCHANGE PARTITION sales_q2_1998 
   WITH TABLE sales_1_1998_compress;

REM rename it

ALTER TABLE sales RENAME 
   PARTITION sales_q2_1998 
   TO sales_H1_1998_compress;

REM  drop exchanged table

DROP TABLE sales_1_1998_compress;

The data dictionary shows the partitions having the COMPRESS attribute set:

1.

From a SQL*Plus session logged on to the SH schema, execute the following statement:

@show_comp_parts.sql

PROMPT control the compress attribute for the partitioned table

SELECT partition_name, compression 
FROM   user_tab_partitions 
WHERE  table_name='SALES' 
ORDER BY partition_position;

Let’s now recreate the bitmap indexes for Sales:

1.

From a SQL*Plus session logged on to the SH schema, execute the following statement:

@recreate_bix_for_sales.sql

PROMPT recreate indexes for table SALES
CREATE BITMAP INDEX sales_prod_bix
   ON sales (prod_id) LOCAL
   NOLOGGING COMPUTE STATISTICS ;

CREATE BITMAP INDEX sales_cust_bix
   ON sales (cust_id) LOCAL
   NOLOGGING COMPUTE STATISTICS ;