Part 4: Data Warehousing and Integration Features
New features for more efficiently managing materialized views, Query Rewrite, Transportable Tablespace, and table partitions make your data warehouse an even more powerful, and less resource-intensive, asset.
Partition Change Tracking Without MV Logs

To understand this enhancement, you first have to understand the concept of partition pruning during a materialized view (MV) refresh process.
Suppose the table ACCOUNTS has been partitioned on the column ACC_MGR_ID, with one partition per value of ACC_MGR_ID. You have created an MV called ACC_VIEW based on ACCOUNTS, which is also partitioned on the column ACC_MGR_ID, with one partition per value. First, confirm that the table ACCOUNTS has no MV log:
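The setup just described can be sketched as follows. This is a hedged illustration only: the column list, partition names, and use of list partitioning are assumptions, not the article's actual DDL.

```sql
-- Hypothetical sketch: a list-partitioned table with one partition
-- per ACC_MGR_ID value, and a matching partitioned MV
create table accounts (
   acc_no     number,
   acc_mgr_id number,
   last_name  varchar2(30)
)
partition by list (acc_mgr_id) (
   partition p1   values (1),
   partition p2   values (2),
   partition p3   values (3),
   -- ... p4 through p10 defined similarly ...
   partition pmax values (default)
);

create materialized view acc_view
partition by list (acc_mgr_id) (
   partition p1   values (1),
   partition p2   values (2),
   partition p3   values (3),
   partition pmax values (default)
)
refresh force
enable query rewrite
as
select acc_mgr_id, acc_no, last_name
from accounts;
```

Because the MV is partitioned on the same key as the base table, a refresh can use Partition Change Tracking to touch only the affected partitions.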
SQL> select *
  2  from dba_mview_logs
  3  where master = 'ACCOUNTS';

no rows selected

Now, update a record in the table.
update accounts
set last_name = '...'
where acc_mgr_id = 3;

The record is in partition P3. Now you are ready to refresh the MV. But first, record the segment-level statistics on all segments of the table ACCOUNTS. You will use these stats later to see which segments were used.
select SUBOBJECT_NAME, value
from v$segment_statistics
where owner = 'ARUP'
and OBJECT_NAME = 'ACCOUNTS'
and STATISTIC_NAME = 'logical reads'
order by SUBOBJECT_NAME
/

SUBOBJECT_NAME                      VALUE
------------------------------ ----------
P1                                   8320
P10                                  8624
P2                                  12112
P3                                  11856
P4                                   8800
P5                                   7904
P6                                   8256
P7                                   8016
P8                                   8272
P9                                   7840
PMAX                                  256

11 rows selected.

Refresh the materialized view ACC_VIEW using fast refresh.
execute dbms_mview.refresh('ACC_VIEW','F')

The 'F' parameter indicates a fast refresh. But will it work without an MV log on the table? After the refresh is complete, check the segment statistics of the table ACCOUNTS again. The results are shown below:
SUBOBJECT_NAME                      VALUE
------------------------------ ----------
P1                                   8320
P10                                  8624
P2                                  12112
P3                                  14656
P4                                   8800
P5                                   7904
P6                                   8256
P7                                   8016
P8                                   8272
P9                                   7840
PMAX                                  256

The segment statistics show which segments were touched by logical reads. Because these statistics are cumulative, you have to look at the change in the value rather than the absolute value. If you examine the values above, you can see that only the value for partition P3 changed. So only partition P3 was selected in the refresh process, not the entire table, confirming that Partition Change Tracking (PCT) kicked in even without an MV log on the table.
The ability to fast-refresh an MV even when the base tables have no MV logs is a powerful and useful feature, allowing you to do fast refreshes of partitioned MVs without the performance overhead of maintaining MV logs. This gets my vote as the most useful data warehousing enhancement in Oracle Database 10g Release 2.
Query Rewrite with Multiple MVs

The Query Rewrite feature introduced in Oracle8i was an instant hit with data warehouse developers and DBAs. Essentially, it rewrites user queries to select from MVs instead of base tables in order to take advantage of summaries already in place. For example, consider these three tables in the database of a major hotel chain.
SQL> DESC HOTELS
 Name                                      Null?    Type
 ----------------------------------------- -------- -------------
 HOTEL_ID                                  NOT NULL NUMBER(10)
 CITY                                               VARCHAR2(20)
 STATE                                              CHAR(2)
 MANAGER_NAME                                       VARCHAR2(20)
 RATE_CLASS                                         CHAR(2)

SQL> DESC RESERVATIONS
 Name                                      Null?    Type
 ----------------------------------------- -------- -------------
 RESV_ID                                   NOT NULL NUMBER(10)
 HOTEL_ID                                           NUMBER(10)
 CUST_NAME                                          VARCHAR2(20)
 START_DATE                                         DATE
 END_DATE                                           DATE
 RATE                                               NUMBER(10)

SQL> DESC TRANS
 Name                                      Null?    Type
 ----------------------------------------- -------- -------------
 TRANS_ID                                  NOT NULL NUMBER(10)
 RESV_ID                                   NOT NULL NUMBER(10)
 TRANS_DATE                                         DATE
 ACTUAL_RATE                                        NUMBER(10)

The table HOTELS holds the information on the hotels. When a customer makes a reservation, a record is created in the table RESERVATIONS, which includes the quoted room rate. When the customer checks out of the hotel, the money transactions are recorded in a different table, TRANS. However, prior to check-out, the hotel may decide to offer a different rate to the customer based on room availability, upgrades, incentives, and so on. Hence the final room rate could differ from the rate quoted during reservation, and may even vary from day to day. To record these fluctuations correctly, the table TRANS holds a row containing rate information for each day of the stay. To enhance query response times, you may decide to build MVs based on the different queries issued by users, such as
create materialized view mv_hotel_resv
refresh complete
enable query rewrite
as
select city, resv_id, cust_name
from hotels h, reservations r
where r.hotel_id = h.hotel_id;

and
create materialized view mv_actual_sales
refresh complete
enable query rewrite
as
select resv_id, sum(actual_rate)
from trans
group by resv_id;

Thus, a query such as
select city, cust_name
from hotels h, reservations r
where r.hotel_id = h.hotel_id;

will be rewritten to
select city, cust_name
from mv_hotel_resv;

provided certain parameters are set, such as query_rewrite_enabled = true. You can confirm the rewrite by running the query with autotrace enabled.
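As a quick reference, the session settings that allow the rewrite to happen look like the following sketch; the choice of TRUSTED integrity here is an illustrative assumption, not the article's configuration.

```sql
-- Enable query rewrite for the session
alter session set query_rewrite_enabled = true;

-- Control how strictly rewrites must honor data freshness:
-- ENFORCED (default), TRUSTED, or STALE_TOLERATED
alter session set query_rewrite_integrity = trusted;
```

With ENFORCED integrity, Oracle rewrites only when it can guarantee the MV data is fresh; the looser settings trade that guarantee for more rewrite opportunities.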
SQL> set autot traceonly explain
SQL> select city, cust_name
  2> from hotels h, reservations r
  3> where r.hotel_id = h.hotel_id;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=3 Card=80 Bytes=2480)
   1   0    MAT_VIEW ACCESS (FULL) OF 'MV_HOTEL_RESV' (MAT_VIEW) (Cost=3 Card=80 Bytes=2480)

Note how the query selected from the materialized view MV_HOTEL_RESV instead of the tables HOTELS and RESERVATIONS. This is exactly what you wanted. Similarly, when you write a query summarizing the actual rates for each reservation number, the materialized view MV_ACTUAL_SALES will be used, not the table TRANS. Now take a different query: to find the actual sales in each city, you will issue
select city, sum(actual_rate)
from hotels h, reservations r, trans t
where t.resv_id = r.resv_id
and r.hotel_id = h.hotel_id
group by city;

Note the query structure: from MV_ACTUAL_SALES you can get the RESV_ID and the total sales for each reservation, and from MV_HOTEL_RESV you can get the CITY and RESV_ID. Can you join the two MVs? Sure you can, but prior to Oracle Database 10g Release 2, the Query Rewrite mechanism would rewrite the query using only one of the MVs, not both of them.
Here is the execution plan output from Oracle9i Database. As you can see, only MV_HOTEL_RESV and the full table scan of TRANS are used.
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=8 Card=6 Bytes=120)
   1   0    SORT (GROUP BY) (Cost=8 Card=6 Bytes=120)
   2   1      HASH JOIN (Cost=7 Card=516 Bytes=10320)
   3   2        MAT_VIEW REWRITE ACCESS (FULL) OF 'MV_HOTEL_RESV' (MAT_VIEW REWRITE) (Cost=3 Card=80 Bytes=1040)
   4   2        TABLE ACCESS (FULL) OF 'TRANS' (TABLE) (Cost=3 Card=516 Bytes=3612)

This approach results in a sub-optimal execution plan, even though suitable MVs are available. The only recourse was to create yet another MV joining all three tables, but that approach leads to a proliferation of MVs, significantly increasing the time required to refresh them. In Oracle Database 10g Release 2, this problem disappears: now the above query is rewritten to use both MVs, as shown in the execution plan.
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=8 Card=6 Bytes=120)
   1   0    SORT (GROUP BY) (Cost=8 Card=6 Bytes=120)
   2   1      HASH JOIN (Cost=7 Card=80 Bytes=1600)
   3   2        MAT_VIEW REWRITE ACCESS (FULL) OF 'MV_ACTUAL_SALES' (MAT_VIEW REWRITE) (Cost=3 Card=80 Bytes=560)
   4   2        MAT_VIEW REWRITE ACCESS (FULL) OF 'MV_HOTEL_RESV' (MAT_VIEW REWRITE) (Cost=3 Card=80 Bytes=1040)

Note how only the MVs are used, with no access to the base tables. This enhancement has significant advantages in data warehouse environments, because you don't have to create and refresh an MV for each possible query. Instead, you can strategically create a few MVs without too many joins and aggregations, and Oracle will combine them to rewrite queries.
Transportable Tablespace from Backup

Transportable tablespaces, introduced in Oracle8i, provided much-needed support for faster data transfer across databases. Using this feature, you can export just the metadata of the tablespace, transfer the datafiles and the export dump file to the target database host, and import the metadata to "plug" the tablespace into the target database. The data in the tablespace is then instantly available in the target database. This approach solved what was then one of the thorniest issues in data warehousing: moving data across databases quickly and efficiently. The catch is that the tablespace must be made read-only while its datafiles are copied. In an OLTP database, however, making a tablespace read-only is almost always impossible, and thus so is transporting tablespaces. If the OLTP database is the data source for the data warehouse, then you may never be able to use Transportable Tablespace to load it.

In Oracle Database 10g Release 2, you can transport a tablespace and plug it in from another source: your backups. For example, if you want to transport the tablespace ACCDATA, you can issue the RMAN command
RMAN> transport tablespace accdata
2> TABLESPACE DESTINATION = '/home/oracle'
3> auxiliary destination = '/home/oracle';

which creates an auxiliary instance at the location /home/oracle and restores the files from the backup there. The auxiliary instance name is generated randomly. After creating the instance, the process creates a directory object on the directory and restores the files of the tablespace ACCDATA (the one being transported), all by itself, without a single command from you!

The directory /home/oracle will have all the datafiles of the tablespace ACCDATA, the dump file containing the metadata of the tablespace, and, most important, a script named impscrpt.sql. This script contains all the necessary commands to plug the tablespace into the target database. The tablespace is not plugged in by the impdp command, but rather through a call to the dbms_streams_tablespace_adm.attach_tablespaces procedure; all the necessary commands can be found in the script.

But what if something goes wrong, you may ask? In that case, making a diagnosis is easy. First, the auxiliary instance creates its alert log file in the location $ORACLE_HOME/rdbms/log, so you can examine the log for potential problems. Second, when starting RMAN, you can redirect the commands and the output to a log file by issuing the command
rman target=/ log=tts.log

which places all the output in the file tts.log. You can then examine the file for the exact cause of the failure. Finally, the files are restored into the directory TSPITR_<SourceSID>_<AuxSID> in /home/oracle. For instance, if the SID of the main database is ACCT and the SID of the auxiliary instance created by RMAN is KYED, the directory name is TSPITR_ACCT_KYED. The directory also has two subdirectories: datafile (for datafiles) and onlinelog (for redo logs). While the process is running, you can examine the directory to see which files have been restored. (These files are deleted at the end of the process.)

DBAs have been waiting a long time for the ability to create a transportable tablespace from RMAN backups. But bear in mind that you are plugging in the tablespace as of the backup, not as of the current online tablespace, so it won't be current.
Quick Partition Split for Partitioned, Index-Organized Tables

Consider this situation: you have a partitioned table, the end of the month arrives, and you have forgotten to define a partition for the next month. What are your options now? Your only recourse is to split the maxvalue partition into two parts: the partition for the new month and the new maxvalue partition. However, there is a problem when you take that approach for partitioned, index-organized tables: the physical partition is created first, and rows are moved there from the maxvalue partition, thereby consuming I/O as well as CPU cycles.

In Oracle Database 10g Release 2, this process is simplified considerably. Say you have defined partitions up until May, with a PMAX partition defined as a catch-all. Because there is no specific partition for June, the June data goes into the PMAX partition. When you then split PMAX, all of its rows fall into just one of the two resulting partitions, so Oracle can perform a quick split and avoid the expensive row movement.
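The split described above might look like the following sketch; the table name, partition names, and boundary date are assumptions for illustration.

```sql
-- Hypothetical sketch: carve a June 2005 partition out of the
-- catch-all PMAX partition of a range-partitioned IOT
alter table accounts_iot
   split partition pmax
   at (to_date('07/01/2005','mm/dd/yyyy'))
   into (partition y05m06, partition pmax);
```

Because every row currently in PMAX carries a June date, all rows land in Y05M06 and the new PMAX is empty, which is exactly the case the quick-split optimization handles.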
LONG to LOB Conversion via Online Redef

If your data warehouse database has been in place for a while and you work with large textual data, you probably have a lot of columns with the datatype LONG. And, needless to say, the LONG datatype is useless in most cases of data manipulation, such as searching via SUBSTR. You definitely want to convert them to LOB columns. You can do that online, using the DBMS_REDEFINITION package. However, prior to Oracle Database 10g Release 2, there was a big limitation: if the table was partitioned, the conversion ran in parallel across partitions, but if the table was non-partitioned, the process became serial and could take a long time. Thankfully, in Oracle Database 10g Release 2, online conversion from LONG to LOB through the DBMS_REDEFINITION package can run in parallel even if the table is non-partitioned.

Let's see how it works with an example. Here is a table for holding email messages sent to customers. Because the body of the message, stored in MESG_TEXT, is typically long textual data, the column has been defined as LONG.
SQL> desc acc_mesg
 Name                                      Null?    Type
 ----------------------------------------- -------- ---------
 ACC_NO                                    NOT NULL NUMBER
 MESG_DT                                   NOT NULL DATE
 MESG_TEXT                                          LONG

You want to convert this column to CLOB. First, create an empty interim table with an identical structure, except the last column, which is defined as CLOB.
create table ACC_MESG_INT (
   acc_no    number,
   mesg_dt   date,
   mesg_text clob
);

Now start the redefinition process.
  1  begin
  2     dbms_redefinition.start_redef_table (
  3        UNAME       => 'ARUP',
  4        ORIG_TABLE  => 'ACC_MESG',
  5        INT_TABLE   => 'ACC_MESG_INT',
  6        COL_MAPPING => 'acc_no acc_no, mesg_dt mesg_dt, to_lob(MESG_TEXT) MESG_TEXT'
  7     );
  8* end;

Note line 6, where the columns have been mapped. The first two columns are left the same, but the third column, MESG_TEXT, is mapped so that the destination table's column is populated by applying the function TO_LOB to the source table's column.

If the table to be redefined is large, you will need to synchronize the data between the source and interim tables periodically; this makes the final sync-up faster.
begin
   dbms_redefinition.sync_interim_table(
      uname      => 'ARUP',
      orig_table => 'ACC_MESG',
      int_table  => 'ACC_MESG_INT'
   );
end;
/

You may have to issue the above command a few times, depending on the size of the table. Finally, complete the redefinition process with
begin
   dbms_redefinition.finish_redef_table (
      UNAME      => 'ARUP',
      ORIG_TABLE => 'ACC_MESG',
      INT_TABLE  => 'ACC_MESG_INT'
   );
end;
/

The table ACC_MESG has changed:
SQL> desc acc_mesg
 Name                                      Null?    Type
 ----------------------------------------- -------- ---------
 ACC_NO                                    NOT NULL NUMBER
 MESG_DT                                   NOT NULL DATE
 MESG_TEXT                                          CLOB

Note that the column MESG_TEXT is now CLOB instead of LONG. This feature is very useful for converting incorrectly defined or legacy data structures into more manageable datatypes.
Online Reorg of a Single Partition

Suppose you have a table TRANS that contains the history of transactions, partitioned on TRANS_DATE with each quarter as a partition. During the normal course of business, the most recent partitions are updated frequently. After a quarter is complete, there may not be much activity on that partition, and it can be moved to a different location. However, the move itself would require a lock on the table, denying public access to the partition. How can you move the partition with no impact on its availability?

In Oracle Database 10g Release 2, you can use online redefinition on a single partition. You perform this task just as you would for the entire table, using the DBMS_REDEFINITION package, but the underlying mechanism is different: whereas regular tables are redefined by creating a materialized view on the source table, a single partition is redefined through an exchange-partition method. Let's see how it works. Here is the structure of the TRANS table:
SQL> desc trans
 Name                              Null?    Type
 --------------------------------- -------- -------------------------
 TRANS_ID                                   NUMBER
 TRANS_DATE                                 DATE
 TXN_TYPE                                   VARCHAR2(1)
 ACC_NO                                     NUMBER
 TX_AMT                                     NUMBER(12,2)
 STATUS                                     VARCHAR2(1)

The table has been partitioned as follows:
partition by range (trans_date) (
   partition y03q1 values less than (to_date('04/01/2003','mm/dd/yyyy')),
   partition y03q2 values less than (to_date('07/01/2003','mm/dd/yyyy')),
   partition y03q3 values less than (to_date('10/01/2003','mm/dd/yyyy')),
   partition y03q4 values less than (to_date('01/01/2004','mm/dd/yyyy')),
   partition y04q1 values less than (to_date('04/01/2004','mm/dd/yyyy')),
   partition y04q2 values less than (to_date('07/01/2004','mm/dd/yyyy')),
   partition y04q3 values less than (to_date('10/01/2004','mm/dd/yyyy')),
   partition y04q4 values less than (to_date('01/01/2005','mm/dd/yyyy')),
   partition y05q1 values less than (to_date('04/01/2005','mm/dd/yyyy')),
   partition y05q2 values less than (to_date('07/01/2005','mm/dd/yyyy'))
)

At some point, you decide to move the partition Y03Q2 to a different tablespace (TRANSY03Q2), which may be on a different type of disk, one that is a little slower and cheaper. To do that, first confirm that you can redefine the table online:
begin
   dbms_redefinition.can_redef_table(
      uname        => 'ARUP',
      tname        => 'TRANS',
      options_flag => dbms_redefinition.cons_use_rowid,
      part_name    => 'Y03Q2');
end;
/

There is no output, so you have your confirmation. Next, create a temporary table to hold the data for that partition:
create table trans_temp (
   trans_id   number,
   trans_date date,
   txn_type   varchar2(1),
   acc_no     number,
   tx_amt     number(12,2),
   status     varchar2(1)
)
tablespace transy03q2
/

Note that although the table TRANS is range-partitioned, TRANS_TEMP is defined as non-partitioned, and it's created in the desired tablespace, TRANSY03Q2. If the table TRANS had any local indexes, you would also create those indexes (as non-partitioned, of course) on TRANS_TEMP. Now you are ready to start the redefinition process:
begin
   dbms_redefinition.start_redef_table(
      uname        => 'ARUP',
      orig_table   => 'TRANS',
      int_table    => 'TRANS_TEMP',
      col_mapping  => NULL,
      options_flag => dbms_redefinition.cons_use_rowid,
      part_name    => 'Y03Q2');
end;
/

Note a few things about this call. First, the parameter col_mapping is set to NULL; in a single-partition redefinition, that parameter is meaningless. Second, a new parameter, part_name, specifies the partition to be redefined. Third, note the absence of the COPY_TABLE_DEPENDENTS parameter, which is also meaningless here because the table itself is not changed in any way; only the partition is moved. If the table is large, the operation may take a long time, so sync it midway:
begin
   dbms_redefinition.sync_interim_table(
      uname      => 'ARUP',
      orig_table => 'TRANS',
      int_table  => 'TRANS_TEMP',
      part_name  => 'Y03Q2');
end;
/

Finally, finish the process with
begin
   dbms_redefinition.finish_redef_table(
      uname      => 'ARUP',
      orig_table => 'TRANS',
      int_table  => 'TRANS_TEMP',
      part_name  => 'Y03Q2');
end;
/

At this point, the partition Y03Q2 is in the tablespace TRANSY03Q2. If you had any global indexes on the table, they would be marked UNUSABLE and must be rebuilt. Single-partition redefinition is useful for moving partitions across tablespaces, a common information lifecycle management task. There are a few restrictions, however; for example, you can't change the partitioning method (say, from range to hash) or change the structure of the table during the redefinition process.
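If global indexes exist on TRANS, the post-redefinition cleanup would look like the following sketch; the index name IN_TRANS_GLOBAL is hypothetical.

```sql
-- Find any indexes left UNUSABLE by the partition redefinition ...
select index_name
from user_indexes
where table_name = 'TRANS'
and status = 'UNUSABLE';

-- ... and rebuild each one (IN_TRANS_GLOBAL is a hypothetical name)
alter index in_trans_global rebuild;
```

Until the rebuild completes, queries that would use an UNUSABLE index either fail or fall back to other access paths, so schedule the rebuilds right after the redefinition.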
Drop a Table in Chunks

Have you ever noticed how much time it takes to drop a partitioned table? That's because each partition is a segment that has to be dropped. In Oracle Database 10g Release 2, when you drop a partitioned table, partitions are dropped one by one. Because each partition is dropped individually, fewer resources are required than when the table is dropped as a whole. To demonstrate this new behavior, you can trace the session with a 10046 trace.
alter session set events '10046 trace name context forever, level 12';

Then drop the table. If you examine the trace file, you'll see how a partitioned-table drop looks:
delete from tabpart$ where bo# = :1
delete from partobj$ where obj#=:1
delete from partcol$ where obj#=:1
delete from subpartcol$ where obj#=:1

Note that the partitions are deleted in serial fashion. This approach minimizes resource utilization during the drop and enhances performance.
In Part 5, I'll cover backup and availability features, including Oracle Secure Backup.