High-Speed Data Loading and Rolling Window Operations with Partitioning
In this tutorial, you learn how to use Oracle10g for high-speed data loading and how to leverage Oracle Partitioning for a rolling window operation.
Approximately 2 hours
This tutorial covers the following topics:
Most of the time, OLTP (Source) systems, which feed the Data Warehouse, are not directly connected to the data warehousing system for extracting new data. Commonly, those OLTP systems send data feeds in the form of external files. This data must be loaded into the Data Warehouse, preferably in parallel, thus leveraging the existing resources.
For example, due to the business needs and disk space constraints of the sample company MyCompany, only the data of the last three years are relevant for the analysis needs. This means that with the insertion of new data, disk space has to be freed, either by purging the old data or by leveraging Oracle's table compression. The maintenance of this so-called rolling window operation is done with Oracle Partitioning.
Before starting this tutorial, you should:
1. Perform the Installing Oracle Database 10g on Windows tutorial.
2. Download and unzip etl.zip into your working directory (for example, c:\wkdir).
To load the external files into their data warehouse, MyCompany uses the Oracle10g external table feature, which allows external data, such as flat files, to be exposed within the database just like a regular database table. External tables can be accessed by SQL, so external files can be queried directly and in parallel using the full power of SQL, PL/SQL, and Java. External tables are often used in the Extraction, Transformation, and Loading (ETL) process to combine data transformations (through SQL) with data loading in a single step. They are a very powerful feature with many possible applications in ETL and other database environments where flat files are processed, and they are an alternative to using SQL*Loader.
Parallel execution dramatically reduces response time for data-intensive operations on large databases typically associated with decision support systems (DSS) and data warehouses. You can also implement parallel execution on certain types of online transaction processing (OLTP) and hybrid systems. Simply expressed, parallelism is the idea of breaking down a task so that, instead of one process doing all of the work in a query, many processes do parts of the work at the same time. For example, parallel execution can be used when four processes handle four different quarters in a year instead of one process handling all four quarters by itself.
A very important task in the back office of a data warehouse is to keep the data synchronized with the various changes that are taking place in the OLTP (source) systems. In addition, the life span of the data from an analysis perspective is very often limited, so that older data must be purged from the target system while new data is loaded; this operation is often called a rolling window operation. This should be done as fast as possible without any implication for the concurrent online access of the data warehousing system.
Before starting the tasks for this OBE, you need to implement some changes on the existing Sales History schema. Additional objects are necessary, and additional system privileges must be granted to the user SH. The SQL file for applying those changes is modifySH_10gR2.sql. To use the setup files for the Data Warehousing tutorials, perform the following steps:
1. Start a SQL*Plus session. Select Start > Programs > Oracle-OraDB10g_home > Application Development > SQL Plus. (Note: This tutorial assumes you have a c:\wkdir folder. If you do not, you need to create one and unzip the contents of etl.zip into it. Full paths are specified when executing the scripts.)
2. Log in as the SH user. Enter SH as the User Name and SH as the Password. Then click OK.
3. Run the modifySH_10gR2.sql script from your SQL*Plus session:
@c:\wkdir\modifySH_10gR2.sql
The bottom of your output should match the image below.
In this section of the tutorial, you load data into the Data Warehouse using External Tables.
To show you how external tables can be created and used, perform the following steps:
1. Create the necessary directory objects.
2. Create the external table.
3. Select from the external table.
4. Provide transparent high-speed parallel access of external tables.
5. Review Oracle's parallel insert capabilities.
6. Execute the parallel insert.
1. Create the Necessary Directory Objects
Before you can create the external table, you need to create a directory object in the database that points to the directory on the file system where the data files will reside. Optionally, you can separate the location for the log, bad and discard files from the location of the data files. To create the directory, perform the following steps:
From a SQL*Plus session logged on to the SH schema, run create_directory.sql, or copy the following SQL statements into your SQL*Plus session:
@create_directory.sql
DROP DIRECTORY data_dir;
DROP DIRECTORY log_dir;
CREATE DIRECTORY data_dir AS 'c:\wkdir';
Please note: The scripts are set up for a Windows system and assume that the Hands-On workshop was extracted on drive C:\.
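The create_directory.sql listing above is truncated after the data_dir creation. A sketch of the complete setup, assuming log_dir also points at the working directory (the log_dir location is an assumption, not shown in the listing), would be:

```sql
-- Drop and recreate the directory objects used by the external table.
DROP DIRECTORY data_dir;
DROP DIRECTORY log_dir;

CREATE DIRECTORY data_dir AS 'c:\wkdir';
-- Assumed: the log, bad, and discard files live in the same folder.
CREATE DIRECTORY log_dir  AS 'c:\wkdir';
```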
When creating an external table, you define two pieces of information:
1. The metadata for the table representation inside the database
2. The access parameters that define how to extract the data from the external file
After the creation of this meta information, the external data can be accessed from within the database, without the necessity of an initial load.
To create the external table, you perform the following steps:
From a SQL*Plus session logged on to the SH schema, run create_external_table.sql. The results are as follows:
@c:\wkdir\create_external_table
REM 10gR1
CREATE TABLE sales_delta_XT
(
PROD_ID NUMBER,
CUST_ID NUMBER,
TIME_ID DATE,
CHANNEL_ID CHAR(2),
PROMO_ID NUMBER,
QUANTITY_SOLD NUMBER(3),
AMOUNT_SOLD NUMBER(10,2)
)
ORGANIZATION external
(
TYPE oracle_loader
DEFAULT DIRECTORY data_dir
ACCESS PARAMETERS
(
RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII
BADFILE log_dir:'sh_sales.bad'
LOGFILE log_dir:'sh_sales.log_xt'
FIELDS TERMINATED BY "|" LDRTRIM
(prod_id, cust_id,
time_id CHAR(11) DATE_FORMAT DATE MASK "DD-MON-YYYY",
channel_id, promo_id, quantity_sold, amount_sold
)
)
location
(
'salesDec01.dat'
)
)
REJECT LIMIT UNLIMITED
NOPARALLEL;
You can view information about external tables through the following data dictionary views:
- [USER | ALL | DBA]_EXTERNAL_TABLES
- [ALL | DBA]_DIRECTORIES
- [USER | ALL | DBA]_EXTERNAL_LOCATIONS
|
3. Select From the External Table
You can now access the data in the external file without any further action, as shown with the following SQL command:
From a SQL*Plus session logged on to the SH schema, execute the following queries. (Note that you can run the select_et.sql script file.)
@c:\wkdir\select_et.sql
SELECT COUNT(*) FROM sales_delta_xt;
SELECT MAX(time_id) FROM sales_delta_xt;
If you copied the files correctly, the maximum TIME_ID is the last day of December, 2001.
4. Provide Transparent High-Speed Parallel Access of External Tables
Unlike SQL*Loader, external tables can be accessed in parallel, independent of the number of external files. SQL*Loader can only operate on a per-file basis, which means you have to split large source files manually if you want to parallelize. With external tables, the degree of parallelism is controlled in exactly the same way as for a normal table. In this case, the external table was defined NOPARALLEL, so the following section shows you how to control the degree of parallelism at the statement level by using a hint.
1. The script parallel_select_from_ET.sql contains the SQL statements for the next three steps. Run the following query, from a SQL*Plus session logged on to the SH schema, to see the current parallel session statistics:
@c:\wkdir\parallel_select_from_ET.sql
SELECT *
FROM v$pq_sesstat
WHERE statistic in ('Queries Parallelized',
'Allocation Height');
|
2. Run the same query you used before to access the external table with a parallel degree of 4, controlled with a hint. The select statement is:
@c:\wkdir\parallel_select_from_ET_2.sql
SELECT /*+ parallel(a,4) */ COUNT(*) FROM sales_delta_XT a;
You are selecting from the external table in parallel, although the external table points to only one input source file. Alternatively, you could change the PARALLEL property of the external table with an ALTER TABLE command:
rem ALTER TABLE sales_delta_XT PARALLEL 4;
|
3. View the session statistics again to see the differences. The parallel session statistics have changed: they show that the last query was parallelized, as well as the degree of parallelism.
@c:\wkdir\parallel_select_from_ET.sql
SELECT *
FROM v$pq_sesstat
WHERE statistic in
('Queries Parallelized', 'Allocation Height');
|
5. Review Oracle's Parallel Insert Capabilities
Oracle10g provides unlimited parallel direct path INSERT capabilities within each partition. The execution plan can be used to determine whether or not the INSERT will be done in parallel. Alternatively, you can retrieve the execution plan of a statement straight from the SQL cache after it has run, without the need for an EXPLAIN PLAN command at all.
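The cache-based alternative can be sketched with DBMS_XPLAN.DISPLAY_CURSOR, available since Oracle Database 10g (this snippet is an illustration, not part of the tutorial scripts):

```sql
-- Run the statement of interest first ...
SELECT /*+ parallel(a,4) */ COUNT(*) FROM sales_delta_xt a;

-- ... then fetch its plan straight from the SQL cache.
-- NULL, NULL means "the last cursor executed by this session".
SELECT * FROM TABLE(dbms_xplan.display_cursor(NULL, NULL));
```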
Below, examine what a serial plan looks like. Since none of the objects is defined in parallel, you automatically have serial execution unless you either change the default degree of parallelism of one of the objects or use a hint.
1. To show the execution plan for SERIAL INSERT behavior, run show_serial_exec_plan.sql, or copy the following SQL statement into your SQL*Plus session:
@c:\wkdir\show_serial_exec_plan.sql
Rem 10gR1
EXPLAIN PLAN FOR
INSERT /*+ APPEND */ INTO sales
(
PROD_ID,
CUST_ID,
TIME_ID,
CHANNEL_ID,
PROMO_ID,
QUANTITY_SOLD,
AMOUNT_SOLD
)
SELECT
PROD_ID,
CUST_ID,
TIME_ID,
case CHANNEL_ID
when 'S' then 3
when 'T' then 9
when 'C' then 5
when 'I' then 4
when 'P' then 2
else 99
end,
PROMO_ID,
sum(QUANTITY_SOLD),
sum(AMOUNT_SOLD)
FROM SALES_DELTA_XT
GROUP BY 1, prod_id,time_id,cust_id,channel_id,promo_id;
set linesize 140
set pagesize 40
SELECT * FROM TABLE(dbms_xplan.display);
2. To show the PARALLEL INSERT execution plan, you need to run show_parallel_exec_plan.sql logged in to the SH schema. A parallel DML statement must always be the first statement of a transaction. Furthermore, a parallel DML operation cannot run in the presence of enabled primary key-foreign key constraints, so you have to disable the constraints prior to the parallel DML operation:
@c:\wkdir\show_parallel_exec_plan.sql
Rem 10gR1
COMMIT;
ALTER SESSION ENABLE PARALLEL DML;
Rem *****
PROMPT SHOW PARALLEL EXECUTION PLAN
Rem *****
EXPLAIN PLAN FOR
INSERT /*+ APPEND PARALLEL(SALES,4) */ INTO sales
(
PROD_ID,
CUST_ID,
TIME_ID,
CHANNEL_ID,
PROMO_ID,
QUANTITY_SOLD,
AMOUNT_SOLD
)
SELECT /*+ parallel (sales_delta_XT,4) */
PROD_ID,
CUST_ID,
TIME_ID,
case CHANNEL_ID
when 'S' then 3
when 'T' then 9
when 'C' then 5
when 'I' then 4
when 'P' then 2
else 99
end,
PROMO_ID,
sum(QUANTITY_SOLD),
sum(AMOUNT_SOLD)
FROM SALES_DELTA_XT
GROUP BY 1, prod_id,time_id,cust_id,channel_id,promo_id;
set linesize 140
set pagesize 40
SELECT * FROM TABLE(dbms_xplan.display);
|
In this step of the tutorial, you execute the parallel insert discussed before. Note that you not only select the data from the external table but also perform an aggregation as part of the select prior to the insertion: you are combining a transformation with the actual loading process. This cannot be accomplished with SQL*Loader alone.
1. Run SQL to perform a parallel insert. Run parallel_insert_file.sql or copy the following statements into your SQL*Plus session:
set timing on
@c:\wkdir\parallel_insert_file.sql
Rem 10gR1
ALTER SESSION ENABLE PARALLEL DML;
Rem *****
Write down the execution time of this statement and compare it to the total amount of time you need with SQL*Loader and a subsequent insertion. Note that you do not see the full benefit of parallelizing the external table access and combining the transformation with the loading, because you are accessing a very small amount of data in parallel on a single-CPU machine using one disk.
|
2. Roll back the insert. In the next tutorial, the same data is inserted using SQL*Loader.
ROLLBACK;
3. After issuing the rollback, you need to enable the constraints again:
@c:\wkdir\enable_cons.sql
ALTER TABLE sales
MODIFY CONSTRAINT sales_product_fk ENABLE NOVALIDATE;
ALTER TABLE sales
MODIFY CONSTRAINT sales_customer_fk ENABLE NOVALIDATE;
ALTER TABLE sales
MODIFY CONSTRAINT sales_time_fk ENABLE NOVALIDATE;
ALTER TABLE sales
MODIFY CONSTRAINT sales_channel_fk ENABLE NOVALIDATE;
ALTER TABLE sales
MODIFY CONSTRAINT sales_promo_fk ENABLE NOVALIDATE;
|
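To confirm that the foreign key constraints were re-enabled, a quick check against the USER_CONSTRAINTS dictionary view can help (this query is a sketch, not one of the tutorial scripts):

```sql
-- Constraint type 'R' = referential (foreign key) constraints.
SELECT constraint_name, status, validated
FROM   user_constraints
WHERE  table_name = 'SALES'
AND    constraint_type = 'R';
```

After the ENABLE NOVALIDATE statements, each row should show STATUS = ENABLED and VALIDATED = NOT VALIDATED.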
Before Oracle9i, you did the same operation by using SQL*Loader. The external table method you previously performed is the preferred method of data loading and transformation. However, to demonstrate the benefit of using external tables, you can perform the tasks to load and transform the data by using SQL*Loader.
To show how to load and transform data by using SQL*Loader, perform the following steps:
1. Create the staging table.
2. Load the data into the staging table with SQL*Loader.
3. Load the staging table into the target database.
4. Drop the staging table.
You need a staging table to load the data into so that you can transform it within the database in a second step.
Run the SQL script create_stage.sql to create a staging table:
@c:\wkdir\create_stage.sql
CREATE TABLE sales_dec01 AS
Please note: The scripts are set up for a Windows system and assume that the Hands-On workshop was extracted on drive C:\.
Load the data file from the sales_dec01.ctl file into the staging table by performing the following steps:
1. Execute the following commands from the OS command line:
cd c:\wkdir
sqlldr sh/sh control=sales_dec01.ctl direct=true
Note: You may need to specify your database alias when connecting using SQL*Loader. In that case, start SQL*Loader with the statement:
sqlldr sh/sh@<database alias> control=sales_dec01.ctl direct=true
2. Note that you cannot parallelize this task. Check the SQL*Loader log file sales_dec01.log and write down the execution time for the loading process. You can check the sales_dec01.log file using any editor; the file is located in the c:\wkdir folder.
Unlike with an external table, space is consumed in the database to make the data accessible from within the database. The space consumed by the staging table is linearly dependent on the amount of data to be loaded for further transformation. Also note that it is not reasonably possible to parallelize the loading with SQL*Loader without having several external files. You can use the SKIP option for several SQL*Loader processes accessing the same file; however, this forces every SQL*Loader process to scan the whole external file, which is detrimental to overall performance. The information about the space usage of an object can be accessed through the following data dictionary views:
[USER | ALL | DBA]_SEGMENTS
[USER | ALL | DBA]_EXTENTS
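For example, the space allocated by the staging table can be checked with a query like the following (a sketch against USER_SEGMENTS; not one of the tutorial scripts):

```sql
-- Space allocated by the SQL*Loader staging table, in MB.
SELECT segment_name,
       bytes/1024/1024 AS size_mb,
       extents
FROM   user_segments
WHERE  segment_name = 'SALES_DEC01';
```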
After loading the external data, making it accessible to the database, you can perform your transformation.
|
Run SQL to transform and insert the external data, which is already staged in the database, into the SALES fact table by executing the load_stage_table.sql script:
@c:\wkdir\load_stage_table.sql
REM 10gR1
|
You can now drop or truncate the staging table to free its consumed space.
Drop the staging table by executing the drop_sales_dec01.sql script:
@c:\wkdir\drop_sales_dec01.sql
DROP TABLE sales_dec01;
Processing this simple loading and transformation process with external tables enabled you to combine the loading with the transformation, which simplifies and speeds up the process. Furthermore, an initial staging of the data in the database is not necessary with external tables, thus saving space. The larger the external data volume, the more you save in staging space and processing time with external tables compared to SQL*Loader.
After successfully loading the December data into the Q4 partition of your SALES fact table, this partition will encounter no or minimal further DML operations. This makes the partition an optimal candidate for being stored using Oracle's table compression functionality, which was introduced in Oracle9i Release 2. Data stored in relational databases keeps growing as businesses require more information. A big portion of the cost of keeping large amounts of data lies in the disk systems and in the resources utilized in managing that data. The Oracle database enables a unique way to deal with this cost by compressing data stored in relational tables with virtually no negative impact on query time against that data, which results in substantial cost savings.
Commercially available relational database systems have not heavily utilized compression techniques on data stored in relational tables. One reason is that the trade-off between time and space for compression is not always attractive for relational databases. A typical compression technique may offer space savings, but only at a cost of much increased query time against the data. Furthermore, many of the standard techniques do not even guarantee that data size does not increase after compression.
Oracle Database 10g Enterprise Edition offers a unique compression technique that is very attractive for large data warehouses. It is unique in many ways. Its reduction of disk space can be significantly higher than standard compression algorithms, because it is optimized for relational data. It has virtually no negative impact on the performance of queries against compressed data; in fact, it may have a significant positive impact on queries accessing large amounts of data, as well as on data management operations like backup and recovery. It ensures that compressed data is never larger than uncompressed data.
To measure the tremendous benefits of table compression, first verify that the most recent partition does not have compression enabled and determine how large it is.
1. Run the part_before_compression.sql script, or copy the following SQL into your SQL*Plus session:
@c:\wkdir\part_before_compression.sql
PROMPT Space consumption before compression
COLUMN partition_name FORMAT a50
2. Now compress the partition and transparently maintain all existing indexes. All local and global indexes will be maintained as part of this SQL statement; the functionality of online index maintenance for partition maintenance operations is discussed later in this tutorial. Note that compressing a partition is not an in-place compression: you create a new compressed segment and remove the old uncompressed one at the end of the operation. From a SQL*Plus session logged on to the SH schema, execute the following statements:
@c:\wkdir\compress_salesQ4_2001.sql
PROMPT now compress the partition including index maintenance (details later)
|
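The body of compress_salesQ4_2001.sql is not shown above. Based on the description, that a new compressed segment replaces the old one while all indexes are maintained, it presumably contains a statement along these lines (the partition name and the UPDATE INDEXES clause are inferred, not confirmed by the listing):

```sql
-- MOVE rewrites the partition into a new, compressed segment;
-- UPDATE INDEXES maintains local and global indexes in the same operation.
ALTER TABLE sales
  MOVE PARTITION sales_q4_2001
  COMPRESS
  UPDATE INDEXES;
```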
3. See how much space the new compressed partition is allocating and compare it to the size of the uncompressed partition:
@c:\wkdir\part_after_compression.sql
PROMPT Space consumption AFTER compression
Most likely, the compression ratio with real-world data will be higher than the one experienced with the Sales History schema. Compression ratios reported by customers average a factor of 2 to 5. This means that a table that formerly allocated 500 GB of disk space is reduced to 100 GB with a compression factor of five.
Many data warehouses maintain a rolling window of data. For example, the data warehouse stores the data from the most recent 12 months of sales. Just as a new partition can be added to the SALES table, an old partition can be quickly (and independently) removed from the SALES table. Partitioning provides the ideal framework for those operations. The two benefits (reduced resource utilization and minimal end-user impact) are just as pertinent to removing a partition as they are to adding one.
To perform the rolling window operation, perform the following steps:
1. Create and load a stand-alone table with the new data.
2. Add the newly loaded data to the fact table.
3. Delete old data from the fact table.
To perform the rolling window operation, you need to create and load a stand-alone table with the new data by performing the following steps. Note that you are going to use the external table you already defined, but point it to a different external file:
1.1 Modify the LOCATION attribute of the external table.
1.2 Create an empty stand-alone table.
1.3 Load this table.
1.4 Create the index structure on the stand-alone table.
1.5 Create the constraints on the stand-alone table.
In this section, you use the external table you already defined. However, this time you use a different external file, salesQ1.dat. So, you have to modify the LOCATION attribute of the external table to point to the new data file.
1. First, check the number of rows in the current external table. (Note that you can run the select_et.sql script file.)
@c:\wkdir\select_et.sql
SELECT COUNT(*) FROM sales_delta_xt;
SELECT MAX(time_id) FROM sales_delta_xt;
The current file contains all sales transactions for December 2001, so MAX(time_id) shows the last day of December, 2001. The number of rows and the MAX(time_id) will be different after changing the external file at the OS level.
|
2. Change the LOCATION attribute. Run alter_loc_attrib.sql to change the LOCATION attribute:
@c:\wkdir\alter_loc_attrib.sql
ALTER TABLE sales_delta_xt location ( 'salesQ1.dat' );
Then check the new data. Run select_et.sql:
@c:\wkdir\select_et.sql
SELECT COUNT(*) FROM sales_delta_xt;
SELECT MAX(time_id) FROM sales_delta_xt;
The number of rows as well as the maximum TIME_ID have changed.
|
You create an empty table for the new sales Q1 data. This table will be added to the already existing partitioned SALES table later.
Run the script create_stage_table.sql to create the table:
@c:\wkdir\create_stage_table.sql
To load this table, you perform the following steps:
1. From a SQL*Plus session logged on to the SH schema, run the script load_stage_table2.sql to load the table:
@c:\wkdir\load_stage_table2.sql
INSERT /*+ APPEND */ INTO sales_delta
After loading the SALES_DELTA table, gather statistics for this newly created table.
|
2. From a SQL*Plus session logged on to the SH schema, run the script gather_stat_stage_table.sql to gather statistics about the table:
@c:\wkdir\gather_stat_stage_table.sql
Rem gather statistics for the table
|
Because you are going to exchange this stand-alone table with an empty partition of the SALES table at a later point in time, you have to build exactly the same index structure as the existing SALES table to keep the local index structures of this particular table in a usable state after the exchange.
1. Before creating any bitmap indexes, you need to modify the newly created table into a compressed table without actually compressing any data. This operation is necessary to create bitmap indexes that are valid to be exchanged into the partitioned table, which already contains compressed partitions. Run the script alter_sales_delta.sql to modify the table:
@c:\wkdir\alter_sales_delta.sql
ALTER TABLE sales_delta COMPRESS;
ALTER TABLE sales_delta NOCOMPRESS;
2. From a SQL*Plus session logged on to the SH schema, run the script create_static_bitmap_index.sql to create the bitmap indexes on the SALES_DELTA table:
@c:\wkdir\create_static_bitmap_index.sql
CREATE BITMAP INDEX sales_prod_local_bix
ON sales_delta (prod_id)
NOLOGGING COMPUTE STATISTICS;
CREATE BITMAP INDEX sales_cust_local_bix
ON sales_delta (cust_id)
NOLOGGING COMPUTE STATISTICS;
CREATE BITMAP INDEX sales_time_local_bix
ON sales_delta (time_id)
NOLOGGING COMPUTE STATISTICS;
CREATE BITMAP INDEX sales_channel_local_bix
ON sales_delta (channel_id)
NOLOGGING COMPUTE STATISTICS;
CREATE BITMAP INDEX sales_promo_local_bix
ON sales_delta (promo_id)
NOLOGGING COMPUTE STATISTICS;
Note that the statistics for those indexes are created as part of the index creation.
|
The same is true for the existing constraints of the SALES table.
|
From a SQL*Plus session logged on to the SH schema, run the script create_constraints.sql to modify the constraints on the SALES table:
@c:\wkdir\create_constraints.sql
set echo on
Rem *****
ALTER TABLE channels MODIFY CONSTRAINT CHANNELS_PK RELY;
|
The next task in performing a rolling window operation is to add the newly loaded and indexed data to the fact table. To do this, you perform the following steps:
You need to create a new, empty partition. You can either create the new partition with a distinct upper boundary or by choosing the keyword MAXVALUE. The latter option ensures that records violating the potential upper boundary condition are not rejected and the INSERT operation succeeds.
In this business scenario, you issue a SPLIT PARTITION after the loading operation to identify any potential violations. All records violating the upper boundary will be "separated" into an extra partition.
You need to create a new, empty partition. To do this, perform the following step:
|
From a SQL*Plus session logged on to the SH schema, run the script create_partition_for_sales_etl.sql to add a partition to the SALES table. @c:\wkdir\create_partition_for_sales_etl.sql
Rem *****
Rem current partitions
Rem *****
COLUMN partition_name FORMAT a20
select partition_name, high_value
from user_tab_partitions
where table_name='SALES'
order by partition_position;
Rem *****
Rem create ADDITIONAL PARTITION on sales
Rem *****
ALTER TABLE sales
ADD PARTITION sales_q1_2002
VALUES LESS THAN (MAXVALUE);
Rem *****
Rem what is in the partition now?
Rem empty
Rem *****
SELECT COUNT(*)
FROM sales PARTITION (sales_q1_2002);
|
You are now ready to add the newly loaded and indexed data to the real SALES fact table by performing a PARTITION EXCHANGE command. Note that this is only a DDL command, which does not touch the actual data at all. To do this, you perform the following step:
|
From a SQL*Plus session logged on to the SH schema, run the script exchange_partition_wo_gim.sql to alter the SALES table, enabling the partition to be exchanged:
@c:\wkdir\exchange_partition_wo_gim.sql
Rem *****
Rem EXCHANGE IT
Rem *****
ALTER TABLE sales EXCHANGE PARTITION sales_q1_2002
|
Now you can select from the newly added and exchanged partition to experience how fast you inserted thousands of rows.
Note that the more data you have to add to your partitioned fact table, the more time you are saving with this metadata-only operation, and the more you will experience the benefit of minimal to zero user impact.
Note that a rolling window operation requires a logical partitioning method such as range partitioning; hash partitioning cannot be used for this very common operation.
All indexes of the SALES table are maintained and usable.
1. From a SQL*Plus session logged on to the SH schema, execute the following queries. This shows you the number of rows in the exchanged partition and in the stand-alone table (which is now empty). Run the select_count.sql script:
@c:\wkdir\select_count.sql
SELECT COUNT(*)
Have you ever inserted that many rows that quickly?
|
2. Note that all local indexes of the SALES table are valid. Run the script show_sales_idx_status.sql to view the status of the indexes:
@c:\wkdir\show_sales_idx_status.sql
PROMPT INDEXES ARE MAINTAINED
You can also use the WITHOUT VALIDATION clause as part of the PARTITION EXCHANGE command. This causes the Oracle database to suppress the validity checking of the table that is going to be exchanged. Otherwise, the Oracle database guarantees that all values of the partition key match within the partition boundaries. |
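A sketch of the clause in context (whether the tutorial's exchange script actually uses it is not shown above):

```sql
-- WITHOUT VALIDATION skips the check that every row of SALES_DELTA falls
-- within the bounds of partition SALES_Q1_2002; use it only when the data
-- is known to fit the partition boundaries.
ALTER TABLE sales
  EXCHANGE PARTITION sales_q1_2002
  WITH TABLE sales_delta
  INCLUDING INDEXES
  WITHOUT VALIDATION;
```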
As mentioned before, you decided to load the data into a partition with no fixed upper boundary to avoid any potential errors. To identify any potential violation, you split the most recent partition, thus creating two partitions, one with a fixed upper boundary.
Beginning with Oracle9i Release 2, Oracle uses an enhanced fast-split operation: the RDBMS detects whether one of the two new partitions resulting from a SPLIT operation will be empty. If this is the case, the Oracle database does not create two new segments; it creates only the one new segment for the empty partition and uses the existing segment as the new partition containing all the data.
This optimization is absolutely transparent. It improves the run time of a SPLIT operation, saves system resources, and does not require any index maintenance.
1. From a SQL*Plus session logged on to the SH schema, run the script fast_split_sales.sql to alter the SALES table and view the index status:
@c:\wkdir\fast_split_sales.sql
ALTER TABLE sales SPLIT PARTITION sales_q1_2002
PROMPT Let's control the count in the most recent partition. should be empty
SELECT COUNT(*) FROM sales PARTITION (sales_beyond_q1_2002);
PROMPT INDEXES ARE MAINTAINED
Rem since no data is moved with the SPLIT operation, the indexes are still valid
SELECT ui.index_name, DECODE(uip.status,null,ui.status,uip.status)
FROM user_ind_partitions uip, user_indexes ui
WHERE ui.index_name=uip.index_name(+)
AND ui.table_name='SALES'
GROUP BY ui.index_name, DECODE(uip.status,null,ui.status,uip.status);
PROMPT Now you can drop the empty overflow partition
ALTER TABLE sales DROP PARTITION sales_beyond_q1_2002;
|
2. Note that all local indexes of the SALES table are still valid. Run the script show_sales_idx_status.sql to view the status of the local indexes on the SALES table:
@c:\wkdir\show_sales_idx_status.sql
|
The next task to perform in a Rolling Window Operation is to delete the old data from the fact table. You only want to analyze the most recent data of the last three years. Therefore, because you added Q1-2002, you have to delete the data of Q1-1998.
Without range partitioning, you would have to perform a DML operation against the table. With partitioning, you can leverage the PARTITION EXCHANGE command again to remove the data from the fact table. As with the adding of new data, hash partitioning does not help you here either.
Note that you are not deleting the data. Instead you are exchanging (logically replacing) the partition containing this data from the SALES fact table with an empty stand-alone table with the same logical structure. You can then archive this data or drop the exchanged partition, depending on your business needs.
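Conceptually, the age-out exchange is the mirror image of the load-side exchange. The statement presumably issued by exchange_old_partition.sql (whose listing is truncated later in this section) looks like this sketch:

```sql
-- A metadata-only operation: afterwards, SALES_OLD_Q1_1998 holds the
-- Q1-1998 rows and the SALES_Q1_1998 partition is empty.
ALTER TABLE sales
  EXCHANGE PARTITION sales_q1_1998
  WITH TABLE sales_old_q1_1998
  INCLUDING INDEXES;
```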
You need to create an empty table in which to store the old 1998 data.
|
From a SQL*Plus session logged on to the SH schema, run the script create_empty_sat.sql to create an empty table that will hold the old 1998 data:
@c:\wkdir\create_empty_sat.sql
DROP TABLE sales_old_q1_1998;
CREATE TABLE sales_old_q1_1998 NOLOGGING COMPRESS
|
Now create the local indexes.
|
From a SQL*Plus session logged on to the SH schema, run the script create_ndx.sql to create the indexes:
@c:\wkdir\create_ndx.sql
set echo on
Rem *****
|
Now create the constraints.
|
From a SQL*Plus session logged on to the SH schema, run the script create_constraints_old.sql to modify and create the constraints:
@c:\wkdir\create_constraints_old.sql
Rem ***10g***
|
Before you perform the exchange, look at the 1998 Q1 data that will be aged out of the partition.
|
From a SQL*Plus session logged on to the SH schema, run the script show_partition.sql to view the data that will be aged out of the partition:
@c:\wkdir\show_partition.sql
set echo on
Rem *****
Rem show the actual content of the partition to be aged out
Rem BEFORE exchange
Rem *****
SELECT COUNT(*) FROM sales PARTITION (sales_q1_1998);
|
You now need to exchange the empty table with the existing Q1-1998 partition. To do this, you perform the following step:
|
From a SQL*Plus session logged on to the SH schema, run the script exchange_old_partition.sql to exchange the partition:
@c:\wkdir\exchange_old_partition.sql
Rem *****
Note that you could have used a DROP PARTITION statement instead. The table SALES_OLD_Q1_1998 now stores all the data of the first quarter of 1998. You could drop this table to remove the data entirely from the system. |
After you perform the exchange, you will want to take a look at the data in the partition.
| 1. |
From a SQL*Plus session logged on to the SH schema, run the script count_sales.sql to view the data in the partition:
@c:\wkdir\count_sales.sql
PROMPT show the actual content of the partition to be aged out AFTER exchange
Unlike before the EXCHANGE command, the stand-alone table now stores thousands of rows whereas the equivalent partition of SALES is empty.
|
| 2. |
Local indexes are not affected. Run the script show_sales_idx_status.sql to view the index information:
@c:\wkdir\show_sales_idx_status.sql
PROMPT INDEXES ARE MAINTAINED
SELECT ui.index_name, DECODE(uip.status,null,ui.status,uip.status)
|
To learn about the local index maintenance enhancements in Oracle Database 10g, you will split the most recent quarter partition into monthly partitions with online local index maintenance, a new functionality in Oracle Database 10g. Then you will use the Global Index Maintenance feature, which was introduced in the Oracle9i release.
| 1. |
Utilize Oracle Database 10g Enhancements for Local Index Maintenance |
| 2. |
Enhancements for Local Index Maintenance:
Beginning with Oracle Database 10g, all partition maintenance operations can be executed without any impact on table availability. Local index maintenance keeps the local indexes of a partitioned table up to date as part of any atomic partition maintenance operation, so no index is left unusable and index usage is unaffected while the maintenance operation takes place.
Oracle extended the SQL syntax for partition maintenance operations to control the physical attributes, such as index placement, for all affected local index structures.
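As an illustration of that extended syntax, the following sketch splits a quarterly partition while explicitly placing the resulting partitions of one local index. The use of SALES_TIME_BIX and the EXAMPLE tablespace in the UPDATE INDEXES clause is an illustrative assumption, not taken from the tutorial scripts:

```sql
Rem Split one quarter into two partitions. The local index partitions
Rem stay usable throughout, and their placement is controlled explicitly
Rem via the extended UPDATE INDEXES clause.
ALTER TABLE sales
SPLIT PARTITION sales_q1_2002
AT (TO_DATE('01-MAR-2002','DD-MON-YYYY'))
INTO (PARTITION sales_1_2_2002, PARTITION sales_mar_2002)
UPDATE INDEXES
  (sales_time_bix (PARTITION sales_1_2_2002 TABLESPACE example,
                   PARTITION sales_mar_2002 TABLESPACE example));
```

If no placement is specified, as in the tutorial's own scripts, the new index partitions are colocated with their table partitions and inherit the partition names.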
Steps:
| 1.1 |
Split the most recent partition by using the default placement rules. |
| 1.2 |
Split a partition by using the extended SQL syntax for local index maintenance. |
| 1.3 | Clean up. |
Examine this scenario: After successfully loading the data for the first quarter of 2002, you recognize that due to changing business requirements the query pattern has changed. Instead of being focused mostly on quarterly analysis, many
business users have started to rely on monthly reporting and analysis.
To address this changed business requirement and to optimize query performance, you can leverage Oracle Partitioning to split the most recent quarter into monthly partitions. This must be accomplished without any impact on online availability.
The online availability for local index maintenance will not be demonstrated explicitly. Online availability is demonstrated for global index maintenance and works in exactly the same manner for local indexes.
You need to create a new, empty partition. You can either create the new partition with a distinct upper boundary or choose the keyword MAXVALUE. The latter option ensures that records violating the potential upper boundary condition are not rejected and the INSERT operation succeeds.
In this business scenario, a SPLIT PARTITION is issued after the loading operation to identify any potential violations. All records violating the upper boundary will be "separated" into an extra partition.
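That combination of an open-ended partition and a subsequent split can be sketched as follows. The split boundary and the name of the overflow partition are illustrative assumptions:

```sql
Rem Create the new quarter's partition open-ended, so rows beyond the
Rem expected upper boundary do not cause the load to fail.
ALTER TABLE sales
ADD PARTITION sales_q1_2002 VALUES LESS THAN (MAXVALUE);

Rem ... load the Q1-2002 data ...

Rem After loading, restore the intended boundary and separate any
Rem boundary-violating rows into their own partition for inspection.
ALTER TABLE sales
SPLIT PARTITION sales_q1_2002
AT (TO_DATE('01-APR-2002','DD-MON-YYYY'))
INTO (PARTITION sales_q1_2002, PARTITION sales_beyond_q1_2002)
UPDATE INDEXES;
```

Reusing the original partition name for the first resulting partition keeps the naming scheme stable while the overflow partition isolates the violations.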
To do this, perform the following steps:
| 1. |
From a SQL*Plus session logged on to the SH schema, execute the following SQL statement to split off the most recent month (March 2002) from the quarterly partition, including local index maintenance. You can run the script split1_10g.sql to accomplish this:
@c:\wkdir\split1_10g.sql
PROMPT Now leverage the new functionality and break down the existing
Rem note that an index location is not specified, so they will be colocated with the partitions
Rem the name will be inherited by the partition name
ALTER TABLE sales SPLIT PARTITION sales_q1_2002
AT (TO_DATE('01-MAR-2002','DD-MON-YYYY'))
INTO (PARTITION sales_1_2_2002 TABLESPACE example,
PARTITION sales_MAR_2002 TABLESPACE example NOCOMPRESS)
UPDATE INDEXES;
|
| 2. |
You can see that the new index partitions are colocated with the table partitions and that the index partition naming is inherited from the partition naming. Run the script see_split.sql to view the partition information:
@c:\wkdir\see_split.sql
COL segment_name format a25
COL partition_name format a25
COL tablespace_name format a25
Rem you will see that the newly created index segments
Rem Also note the automatically derived naming
SELECT segment_name, partition_name, tablespace_name
FROM user_segments
WHERE segment_type='INDEX PARTITION'
AND segment_name IN
(SELECT index_name
FROM user_indexes
WHERE table_name='SALES');
...
|
Split the remainder of the former quarter partition into a January and February partition. For demonstration purposes, create one of the new partitions in the SYSAUX tablespace and name some of the indexes explicitly.
| 1. |
From a SQL*Plus session logged on to the SH schema, execute the following SQL statement to split the remainder partition, including local index maintenance. You can run the split2_10g.sql script to accomplish this:
@c:\wkdir\split2_10g.sql
ALTER TABLE sales SPLIT PARTITION sales_1_2_2002
|
| 2. |
You can see that the new index partitions are colocated with the table partitions and that the index partition naming is inherited from the partition naming. Run the see_split2.sql script to view the partition and segment information:
@c:\wkdir\see_split2.sql
SELECT segment_name, partition_name, tablespace_name
|
Perform cleanup operations. Move the partition out of the SYSAUX tablespace back into the EXAMPLE tablespace, and restore the standard naming conventions.
|
From a SQL*Plus session logged on to the SH schema, run the cleanup_split_10g.sql script to move the partition and update the indexes:
@c:\wkdir\cleanup_split_10g.sql
PROMPT bring them back in shape (into tablespace EXAMPLE)
PROMPT Let's move the last cowboy back to EXAMPLE ..
ALTER INDEX sales_time_bix REBUILD PARTITION feb_02 TABLESPACE example;
PROMPT no index structures outside EXAMPLE anymore
SELECT segment_name, partition_name, tablespace_name
FROM user_segments
WHERE segment_type='INDEX PARTITION'
AND segment_name IN (SELECT index_name
FROM user_indexes
WHERE table_name='SALES')
AND tablespace_name <> 'EXAMPLE';
|
Global Index Maintenance enables you to keep the global indexes of a partitioned table up to date as part of any atomic partition maintenance operation, so no global index is left unusable and index usage is unaffected while the maintenance operation takes place.
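The mechanism comes down to a single clause on the partition maintenance command. A minimal sketch using the table names from this tutorial (the tutorial's actual exchange script appears in the steps that follow):

```sql
Rem With UPDATE GLOBAL INDEXES, the global index SALES_PK is maintained
Rem as part of the atomic exchange and remains USABLE throughout.
Rem Without the clause, the exchange would mark it UNUSABLE.
ALTER TABLE sales
EXCHANGE PARTITION sales_mar_2002
WITH TABLE sales_mar_2002_temp
INCLUDING INDEXES
UPDATE GLOBAL INDEXES;
```

The trade-off is that the DDL takes longer, because index maintenance happens inside the same atomic operation rather than being deferred to a rebuild.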
Steps:
Exchange the March data into the partitioned table in the presence of a global index. First, you have to build the necessary infrastructure:
|
From a SQL*Plus session logged on to the SH schema, run the prep4_global_index.sql script to prepare for global index maintenance:
@c:\wkdir\prep4_global_index.sql
ALTER TABLE sales TRUNCATE PARTITION sales_MAR_2002;
Rem control ...
Rem is empty
SELECT COUNT(*)
FROM sales PARTITION (sales_MAR_2002);
ALTER TABLE sales_mar_2002_temp COMPRESS;
ALTER TABLE sales_mar_2002_temp NOCOMPRESS;
CREATE BITMAP INDEX sales_prod_mar_2002_bix
ON sales_mar_2002_temp (prod_id)
NOLOGGING COMPUTE STATISTICS ;
CREATE BITMAP INDEX sales_cust_mar_2002_bix
ON sales_mar_2002_temp (cust_id)
NOLOGGING COMPUTE STATISTICS ;
CREATE BITMAP INDEX sales_time_mar_2002_bix
ON sales_mar_2002_temp (time_id)
NOLOGGING COMPUTE STATISTICS ;
CREATE BITMAP INDEX sales_channel_mar_2002_bix
ON sales_mar_2002_temp (channel_id)
NOLOGGING COMPUTE STATISTICS ;
CREATE BITMAP INDEX sales_promo_mar_2002_bix
ON sales_mar_2002_temp (promo_id)
NOLOGGING COMPUTE STATISTICS ;
|
To demonstrate the global index maintenance functionality, you first need to create a global index. To do this, perform the following steps:
| 1. |
From a SQL*Plus session logged on to the SH schema, run the create_global_index.sql script to create a concatenated unique index on the SALES table:
@c:\wkdir\create_global_index.sql
CREATE UNIQUE INDEX sales_pk
ON sales (prod_id, cust_id, promo_id, channel_id, time_id)
NOLOGGING COMPUTE STATISTICS;
This may take up to a minute.
|
| 2. |
Build a constraint leveraging this index by running the add_sales_pk.sql script: @c:\wkdir\add_sales_pk.sql ALTER TABLE sales ADD CONSTRAINT sales_pk PRIMARY KEY (prod_id, cust_id, promo_id, channel_id, time_id) USING INDEX; |
| 3. |
Note that if a constraint is defined using the global index, the same constraint must be defined for the table to be exchanged as well! Run the add_salestemp_pk.sql script to accomplish this: @c:\wkdir\add_salestemp_pk ALTER TABLE sales_mar_2002_temp |
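The script excerpt above is truncated; a matching constraint on the stand-alone table would presumably look like the following sketch. The constraint name is a hypothetical choice, and the column list mirrors the SALES_PK definition from the previous steps:

```sql
Rem Hypothetical constraint name; the key columns must match SALES_PK
Rem exactly, or the EXCHANGE PARTITION validation will fail.
ALTER TABLE sales_mar_2002_temp
ADD CONSTRAINT sales_mar_2002_temp_pk
PRIMARY KEY (prod_id, cust_id, promo_id, channel_id, time_id)
USING INDEX;
```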
To demonstrate the impact of a partition maintenance operation on concurrent online access, you need TWO sessions and therefore TWO windows. Please read the following section CAREFULLY before you begin.
In Window One, you perform the following steps:
|
From a SQL*Plus session logged on to the SH schema, execute the following query. You can run the use_global_index.sql script to create an explain plan and view the information:
@c:\wkdir\use_global_index.sql
Rem STATEMENT WHICH USES GLOBAL INDEX - Window 1
EXPLAIN PLAN FOR
SELECT /*+ INDEX(sales, sales_pk) */ count(*)
FROM sales
WHERE prod_id BETWEEN 100 AND 500;
set linesize 140
SELECT * FROM TABLE(dbms_xplan.display);
After verifying from the plan that you are using the global index, process this statement over and over again. You can use the SQL*Plus command "r" to rerun the last statement, or you can run the run_select.sql file:
@c:\wkdir\run_select.sql
SELECT /*+ INDEX(sales, sales_pk) */ count(*)
FROM sales
WHERE prod_id BETWEEN 100 AND 500;
While the query is running, perform the steps below in Window Two. You will see that there is no impact on concurrent query access using a global index when the partition maintenance operation takes place: the query will not fail. Note that the query result changes as soon as the partition exchange command succeeds. The Oracle database guarantees READ CONSISTENCY in this situation as well, providing the most efficient partition table and index maintenance operations without restricting online usage. |
In Window Two, perform the following steps:
| 1. |
From a SQL*Plus session logged on to the SH schema, execute the following query, or run the script exchange_partition_w_gim.sql:
@c:\wkdir\exchange_partition_w_gim.sql
set echo on
Rem *****
Rem EXCHANGE IT
Rem - with global index maintenance
Rem demonstrate influence of index on other queries in
Rem second window
Rem *****
ALTER TABLE sales EXCHANGE PARTITION sales_mar_2002
Although it is a DDL command, it might take some time because the global indexes are maintained as part of the atomic PARTITION EXCHANGE command.
|
| 2. | You will see that all indexes are still valid after the partition maintenance operation. Run the script show_sales_idx_status.sql: @c:\wkdir\show_sales_idx_status.sql
SELECT ui.index_name,
DECODE(uip.status,null,ui.status,uip.status) status,
count(*) num_of_part
FROM user_ind_partitions uip, user_indexes ui
WHERE ui.index_name=uip.index_name(+)
AND ui.table_name='SALES'
GROUP BY ui.index_name,
DECODE(uip.status,null,ui.status,uip.status);
|
| 3. | View the information in the exchanged partition and the stand-alone table. Run the script count_mar_sales.sql:
@c:\wkdir\count_mar_sales.sql
SELECT COUNT(*) FROM sales PARTITION (sales_mar_2002);
SELECT COUNT(*) FROM sales_mar_2002_temp;
Thousands of rows were added to the partitioned table with this command and the stand-alone table is empty now. |
The new sales data for March 2002 is exchanged again. The global index was maintained as part of the PARTITION EXCHANGE command, so online usage was not affected.
Next, investigate the old behavior prior to Oracle9i without global index maintenance.
To demonstrate this functionality, you will need TWO windows. Please read the following section CAREFULLY before trying it yourself.
In Window One, you perform the following steps:
From a SQL*Plus session logged on to the SH schema, execute the following query. Run the script use_global_index.sql:
@c:\wkdir\use_global_index.sql
Rem STATEMENT WHICH USES GLOBAL INDEX - Window 1
explain plan for
SELECT /*+ INDEX(sales, sales_pk) */ count(*)
FROM sales
WHERE prod_id BETWEEN 100 AND 500;
set linesize 140
SELECT * FROM table(dbms_xplan.display);
SELECT /*+ INDEX(sales, sales_pk) */ count(*)
FROM sales
WHERE prod_id BETWEEN 100 AND 500;
Perform the steps in Window Two below and then execute the query above again to see the difference. It will fail as soon as the partition maintenance command is processed.
|
In Window Two, perform the following steps:
| 1. |
From a SQL*Plus session logged on to the SH schema, execute the following query. Run the script exchange_partition_wo_gim2.sql: @c:\wkdir\exchange_partition_wo_gim2.sql
Rem EXCHANGE IT
Rem - without global index maintenance
Rem demonstrate influence of index on other queries in
Rem second window
Rem *****
ALTER TABLE sales
EXCHANGE PARTITION sales_mar_2002
WITH TABLE sales_mar_2002_temp
|
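Once a global index has been invalidated by an exchange without UPDATE GLOBAL INDEXES, it must be rebuilt before index-driven queries can use it again. A minimal sketch, assuming the SALES_PK index created in the earlier steps:

```sql
Rem Confirm the index was marked UNUSABLE by the exchange ...
SELECT index_name, status
FROM user_indexes
WHERE table_name = 'SALES'
AND index_name = 'SALES_PK';

Rem ... and rebuild it to make it usable again.
ALTER INDEX sales_pk REBUILD NOLOGGING;
```

This rebuild is exactly the maintenance cost that the UPDATE GLOBAL INDEXES clause, demonstrated earlier, lets you avoid.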
| 2. | The global index is now marked unusable. Run the script show_sales_idx_status.sql to view this information. @c:\wkdir\show_sales_idx_status.sql
SELECT ui.index_name,
DECODE(uip.status,null,ui.status,uip.status) status,
count(*) num_of_part
FROM user_ind_partitions uip, user_indexes ui
WHERE ui.index_name=uip.index_name(+)
AND ui.table_name='SALES'
GROUP BY ui.index_name, DECODE(uip.status,null,ui.status,uip.status);
|
To clean up your environment, perform the following step:
|
From a SQL*Plus session logged on to the SH schema, execute the following statements to clean up the module-specific modifications. Run the script cleanup_mod1.sql:
@c:\wkdir\cleanup_mod1.sql
Rem drop unique index sales_pk. you do not need it, and it
Rem Cleanup - bring data back
Rem Necessary - that way you do not have to check for it at the end - top goal is no
Rem interference with the original SH schema ...
Rem - data for 1998
Rem - no data for 2001
ALTER TABLE sales EXCHANGE PARTITION sales_q1_1998 WITH TABLE sales_old_q1_1998 INCLUDING INDEXES;
ALTER TABLE sales DROP PARTITION sales_jan_2002;
ALTER TABLE sales DROP PARTITION sales_feb_2002;
ALTER TABLE sales DROP PARTITION sales_mar_2002;
DROP TABLE sales_mar_2002_temp;
DROP TABLE sales_delta;
DROP TABLE sales_old_q1_1998;
PROMPT original situation again
Rem SELECT COUNT(*) FROM sales PARTITION (sales_q1_2001);
Rem SELECT COUNT(*) FROM sales PARTITION (sales_q1_1998);
PROMPT just to be safe ... - cleanup module
set serveroutput on
exec dw_handsOn.cleanup_modules
PROMPT control whether the correction script was applied properly ..
SELECT * FROM TABLE(dw_handsOn.verify_env)
Please ensure that the environment is properly reset. An environment that is not "clean" may affect the usage or run-time behavior of other tutorials. |
In this tutorial, you learned how to:
| Load data using external tables |
| Compare the usage of SQL*Loader to external tables |
| Perform table compression to save disk space |
| Perform a rolling window operation using Oracle Partitioning |