TECHNOLOGY: Data Warehousing
Room with a Better View
By Arup Nanda
Get a good, fast look at your data with materialized views.
Users at Acme Bank were complaining that typical and repeated queries were taking too long to execute. Because these queries are executed over and over by many users, any improvement in response time is bound to help.
The DBAs at Acme Bank have tuned the most commonly used queries, all the needed indexes are present, and no further SQL tuning is going to make any difference to the performance of these queries.
The Acme Bank DBAs' solution: Use materialized views (MVs).
This article discusses how to plan for MVs, how to set up and confirm different MV capabilities, how to automatically generate the scripts to create MVs, how to make query rewrite (QR) available, and how to make sure that QR gets used.
From Queries to Views
The following is a common query at Acme Bank:
SELECT acc_type, SUM(cleared_bal) totbal FROM accounts GROUP BY acc_type;
And the following is an MV, mv_bal , for this query:
CREATE OR REPLACE MATERIALIZED VIEW mv_bal REFRESH ON DEMAND AS SELECT acc_type, SUM(cleared_bal) totbal FROM accounts GROUP BY acc_type;
MVs are segments similar to tables, in which the output of queries is stored in the database.
Now suppose a user wants to get the total of all account balances for the account type 'C' and issues the following query:
SELECT SUM(cleared_bal) FROM accounts WHERE acc_type = 'C';
Because the mv_bal MV already contains the totals by account type, the user could have gotten this information directly from the MV, by issuing the following:
SELECT totbal FROM mv_bal WHERE acc_type = 'C';
This query against the mv_bal MV would have returned results much more quickly than the query against the accounts table. Running a query against the MV will be faster than running the original query, because querying the MV does not query the source tables.
But there is another difference between querying the source tables and the MV. When a user executes a query against the source tables, the results return the current data. Querying the MV, however, often returns the data as of the time the MV was created . And because the MV is not updated when the data in the source tables is, it is bound to get out of sync.
To keep the data in sync, the MV is refreshed from time to time, either manually or automatically. There are two ways to refresh data in MVs. In one of them, the MV is completely wiped clean and then repopulated with data from the source tables—a process known as complete refresh . In some cases, however, when the source tables may have changed very little, it is possible to refresh the MV only for changed records on the source tables—a process known as fast refresh . To use fast refresh, however, you must have created the MV as fast-refreshable. Because it updates only changed records, fast refresh is faster than complete refresh. (See the Oracle Database Data Warehousing Guide for more information on refreshing MVs.)
To make sure that users will query MVs at Acme Bank, even if they don't know anything about the MVs, the bank uses the QR feature. With QR, the database engine can rewrite the original query from the user to use an MV, automatically , so that the user need not be aware of all the MVs in place. In some cases, such as in an OLTP system, it may not be desirable to query from an MV that may not be current with the data in the source tables. QR can be disabled in such cases—either databasewide, for a specific session, or just for specific MVs.
The DBA can enable QR for a session or for the entire system, by setting the QUERY_REWRITE_INTEGRITY parameter to true. For any MV to be used in a QR, the MV must have been built to include the ENABLE QUERY REWRITE clause. For example, the following creates the acc_mgr_view MV with the QR feature from the complex Acme Bank query shown in Listing 1:
CREATE OR REPLACE MATERIALIZED VIEW acc_mgr_view ENABLE QUERY REWRITE AS <the query shown in Listing 1> /
Code Listing 1: Original query
select acc_mgr_id, acc_type_desc, decode (a.sub_acc_type,null,'?', sub_acc_type_desc) sub_acc_type_desc, sum(cleared_bal) tot_cleared_bal, sum(uncleared_bal) tot_uncleared_bal, avg(cleared_bal) avg_cleared_bal, avg(uncleared_bal) avg_uncleared_bal, sum(cleared_bal+uncleared_bal) tot_total_bal, avg(cleared_bal+uncleared_bal) avg_total_bal, min(cleared_bal+uncleared_bal) min_total_bal from balances b, accounts a, acc_types at, sub_acc_types sat where a.acc_no = b.acc_no and at.acc_type = a.acc_type and sat.sub_acc_type (+) = a.sub_acc_type group by acc_mgr_id, acc_type_desc, decode (a.sub_acc_type,null,'?', sub_acc_type_desc)
Converting the query in Listing 1 to an MV is easy, but there are several questions to answer before creating the MV, including how much space it will occupy and how many rows it will contain.
The DBMS_MVIEW package in Oracle Database 10g will help get these answers. To estimate the size of the proposed MV, the script in Listing 2 calls DBMS_MVIEW.ESTIMATE_MVIEW_SIZE.
Code Listing 2: Estimating MV size
set serveroutput on size 999999 declare l_num_rows number; l_num_bytes number; l_stmt varchar2(2000); begin l_stmt := 'select acc_mgr_id, <the query shown in Listing 1> (a.sub_acc_type,null,''?'', sub_acc_type_desc)'; dbms_mview.estimate_mview_size ( stmt_id => 'Est1', select_clause => l_stmt, num_rows => l_num_rows, num_bytes => l_num_bytes ); dbms_output.put_line('Number of rows = '||l_num_rows); dbms_output.put_line('Size (bytes) = '||l_num_bytes); end; / Number of rows = 2829000 Size (bytes) = 667644000
The output in Listing 2 estimates that the MV will contain 2,829,000 rows and that it's going to be about 667MB in size. These are approximations made from optimizer statistics gathered on the source tables earlier, and the exact values may be different. But it helps plan for space and determine in which tablespace to place this MV.
The DBMS_MVIEW.EXPLAIN_MVIEW procedure checks the features and capabilities of an MV before it is created and writes the results to a table named MV_CAPABILITIES_TABLE . First, the DBA creates this table, by running the utlxmv.sql script in the rdbms/admin directory under Oracle Home.
Listing 3 uses the DBMS_MVIEW.EXPLAIN_MVIEW procedure to see what type of operations the proposed MV can be used for.
After the first part of Listing 3 populates the results in MV_CAPABILITIES_TABLE , the second part (after the -- Now check the capabilities comment) goes on to select from that table, as shown in the SELECT ROWNUM, CAPABILITY_NAME, ... query. To aid in the explanation of the output, the query uses ROWNUM to denote line numbers.
Code Listing 3: Checking capabilities
declare l_stmt varchar2(2000); begin l_stmt := 'select acc_mgr_id, <the query shown in Listing 1> (a.sub_acc_type,null,''?'', sub_acc_type_desc)'; dbms_mview.explain_mview ( stmt_id => 'MV_Tune1', mv => l_stmt ); end; / -- -- Now check the capabilities -- SELECT ROWNUM, CAPABILITY_NAME, POSSIBLE, MSGTXT, RELATED_TEXT FROM mv_capabilities_table WHERE STATEMENT_ID = 'Est1' AND CAPABILITY_NAME LIKE 'REFRESH%' ORDER BY SEQ / -- Output -- LN CAPABILITY_NAME P MSGTXT RELATED_TEXT --- ------------------------------ -- ----------------------- --------------- 1 REFRESH_COMPLETE Y 2 REFRESH_FAST N 3 REFRESH_FAST_AFTER_INSERT N agg(expr) requires AVG_CLEARED_BAL correspondingCOUNT(expr) function 4 REFRESH_FAST_AFTER_INSERT N one or more joins present in mv 5 REFRESH_FAST_AFTER_INSERT N GROUP BY clause in mv 6 REFRESH_FAST_AFTER_INSERT N aggregate function in mv 7 REFRESH_FAST_AFTER_INSERT N the detail table does not ARUP.SUB_ACC_TYPES not have a materialized view log 8 REFRESH_FAST_AFTER_ONETAB_DML N SUM(expr) without TOT_TOTAL_BAL COUNT(expr) 9 REFRESH_FAST_AFTER_ONETAB_DML N SUM(expr) without TOT_UNCLEARED_BAL COUNT(expr) 10 REFRESH_FAST_AFTER_ONETAB_DML N SUM(expr) without TOT_CLEARED_BAL COUNT(expr) 11 REFRESH_FAST_AFTER_ONETAB_DML N see the reason why REFRESH_FAST_AFTER_INSERT is disabled 12 REFRESH_FAST_AFTER_ANY_DML N see the reason why REFRESH_FAST_AFTER_ONETAB_DML is disabled 13 REFRESH_FAST_PCT N PCT is not possible on any of the detail tables in the materialized view
The results in Listing 3 tell the story. The CAPABILITY_NAME column lists the different refresh capabilities, and the column POSSIBLE (P) shows via a simple Y (yes) or N (no) whether that capability is possible in this MV. The MV is capable of being completely refreshed, as shown in line 1, but it's not capable of being fast-refreshed, as shown in line 2. Why not? The DBA goes down the list to see why the refresh-related features are not possible. Line 3 indicates that the fast refresh is not possible because the MV does not have a COUNT() expression in the query where aggregation functions such as SUM() are used. If an MV uses aggregation functions, then COUNT() must be in the query to make it fast-refreshable. The column RELATED_TEXT in the output shows which columns in the MV are being referred to in the MSGTXT column. The result agg(expr) —short for aggregation(expression)—in MSGTXT in line 3 refers to the column AVG_CLEARED_BAL in the MV. Because the AVG(CLEARED_BAL) clause is used in the query to build the MV, there must additionally be a COUNT(CLEARED_BAL) or COUNT(*) expression to make the MV fast-refreshable.
Additionally, fast refreshes require the creation of materialized view logs on the base table. These logs record the changes occurring in the base tables. During fast refresh, these logs are read by the refresh process to determine the changes to apply to the MV. In line 7, the MSGTXT column shows the detail table does not have a materialized view log and the RELATED_TEXT column shows the table on which MV logs are not present— ARUP.SUB_ACC_TYPES.
After the Acme DBA uses the information in Listing 3 to make the planned MV capable of fast refresh, it's time to generate the file that will generate the MV. The procedure in Listing 4 specifies a task name and passes it and the SQL query (in Listing 1) to the DBMS_ADVISOR.TUNE_MVIEW procedure to generate all necessary SQL statements to create the MV. After the DBA runs the script in Listing 4, running the following script generates the recommendations of the DBMS Advisor:
CREATE DIRECTORY TMP_DIR AS '/tmp' / BEGIN DBMS_ADVISOR.CREATE_FILE ( DBMS_ADVISOR.GET_TASK_SCRIPT ('MV_Tune_Task1'), 'TMP_DIR', 'mv_tune_task1.sql' ); END;
Code Listing 4: Generating scripts
declare l_stmt varchar2(2000); l_task_name varchar2(30); begin l_task_name := 'MV_Tune_Task1'; l_stmt := 'create materialized view acc_mgr_view enable query rewrite as ... <the query shown in Listing 1>...'; dbms_advisor.tune_mview ( task_name => l_task_name, mv_create_stmt => l_stmt ); end; /
This creates a file named mv_tune_task1.sql in the /tmp directory that contains all the SQL statements needed to create the MV logs on all the necessary base tables and create the MV with all the appropriate parameters to make it fast-refreshable. Running the mv_tune_task1.sql file in SQL*Plus will create all the necessary objects. The mv_tune_task1.sql file is available as Listing 8.
Confirming Query Rewrite
Listing 5 contains another popular user query at Acme Bank; it uses column, table, and function information similar to that of the query in the acc_mgr_view MV. Listing 5 also uses the AUTOTRACE feature of SQL*Plus to provide information on whether QR occurs when the query runs. The AUTOTRACE output clearly shows that the query uses all the source tables, not the materialized view, acc_mgr_view . Why?
Code Listing 5: Confirming query rewrite
SQL> alter session set query_rewrite_enabled = true; Session altered. SQL> set autotrace traceonly explain SQL> select acc_type_desc, sum(cleared_bal) 2 from balances b, accounts a, acc_types at 3 where a.acc_no = b.acc_no 4 and at.acc_type = a.acc_type 5 group by acc_type_desc; Execution Plan ------------------------------------------------------------------------- 0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=533 Card=4 Bytes=112) 1 0 SORT (GROUP BY) (Cost=533 Card=4 Bytes=112) 2 1 HASH JOIN (Cost=523 Card=50000 Bytes=1400000) 3 2 HASH JOIN (Cost=146 Card=50000 Bytes=850000) 4 3 TABLE ACCESS (FULL) OF 'ACC_TYPES' (TABLE) (Cost=3 Card=4 Bytes=40) 5 3 TABLE ACCESS (FULL) OF 'ACCOUNTS' (TABLE) (Cost=140 Card=100000 Bytes=700000) 6 2 TABLE ACCESS (FULL) OF 'BALANCES' (TABLE) (Cost=106 Card=100000 Bytes=1100000)
Running the utlxrw.sql script in $ORACLE_HOME/rdbms/admin sets up a table named REWRITE_TABLE , whose results will help determine why the optimizer did not use QR on the query in Listing 5. Running the DBMS_VIEW.EXPLAIN_REWRITE procedure, shown in Listing 6, populates the REWRITE_TABLE table with the findings. Querying the MESSAGE column in REWRITE_TABLE provides the reason why:
SQL> SELECT MESSAGE FROM REWRITE_TABLE; QSM-01110: a lossy join in MV, ACC_MGR_VIEW, between tables, ACCOUNTS and SUB_ACC_TYPES, not found in query
Code Listing 6: Checking for query rewrite
truncate table rewrite_table / declare l_stmt varchar2(2000); l_task_name varchar2(30); begin l_stmt := 'select acc_type_desc, sub_acc_type, sum(cleared_bal) from balances b, accounts a, acc_types at where a.acc_no = b.acc_no and at.acc_type = a.acc_type group by acc_type_desc, sub_acc_type'; dbms_mview.explain_rewrite ( query => l_stmt, mv => 'ACC_MGR_VIEW', statement_id => 'MV_Explain_RW1' ); end; / commit / select message from rewrite_table /
The result shows that QR will not occur because the joins used in the MV and in the query are different. In the MV, four tables— ACCOUNTS, BALANCES, ACC_TYPES , and ACC_SUB_TYPES —are joined. However, the new query joins only three tables, leaving SUB_ACC_TYPES , which is reported in the MESSAGE column. The absence of this join would have meant inaccurate results if QR were used, so the optimizer decided not to use it.
Rewriting the query in Listing 5 to include another predicate, AND SAT.ACC_SUB_TYPE = A.ACC_SUB_TYPE , and place the table ACC_SUB_TYPE SAT in the FROM clause should make QR possible. This revised query is in Listing 7. After the DBA passes the revised query to the procedure in Listing 6, examining the REWRITE_TABLE provides the good news:
SQL> SELECT MESSAGE FROM REWRITE_TABLE; QSM-01033: query rewritten with materialized view, ACC_MGR_VIEW
Code Listing 7: Rewrite of Listing 5, confirming query rewrite
SQL> set autotrace traceonly explain SQL> select acc_type_desc, 2 sum(cleared_bal) 3 from balances b, 4 accounts a, 5 acc_types at, 6 sub_acc_types sat, 7 where a.acc_no = b.acc_no 8 and at.acc_type = a.acc_type 9 and sat.sub_acc_type = a.sub_acc_type 10 group by acc_type_desc SQL> / Execution Plan ---------------------------------------------------------------------- 0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=6 Card=4 Bytes=64) 1 0 SORT (GROUP BY) (Cost=6 Card=4 Bytes=64) 2 1 MAT_VIEW REWRITE ACCESS (FULL) OF 'ACC_MGR_VIEW'
Running the revised query in Listing 7 confirms that the optimizer rewrites the query and generates the execution plan, also included in Listing 7. This plan shows that the ACC_MGR_VIEW MV is used instead of the base tables, even though the user specified them in the query. QR is indeed happening.
Rem SQL Access Advisor: Version 10.1.0.1 - Production Rem Rem Username: ARUP Rem Task: MV_Tune_Task1 Rem Execution date: Rem set feedback 1 set linesize 80 set trimspool on set tab off set pagesize 60 whenever sqlerror CONTINUE CREATE MATERIALIZED VIEW LOG ON "ARUP"."ACCOUNTS" WITH ROWID, SEQUENCE("ACC_NO","ACC_TYPE","SUB_ACC_TYPE","ACC_MGR_ID") INCLUDING NEW VALUES; ALTER MATERIALIZED VIEW LOG FORCE ON "ARUP"."ACCOUNTS" ADD ROWID, SEQUENCE("ACC_NO","ACC_TYPE","SUB_ACC_TYPE","ACC_MGR_ID") INCLUDING NEW VALUES; CREATE MATERIALIZED VIEW LOG ON "ARUP"."BALANCES" WITH ROWID, SEQUENCE("ACC_NO","CLEARED_BAL","UNCLEARED_BAL") INCLUDING NEW VALUES; ALTER MATERIALIZED VIEW LOG FORCE ON "ARUP"."BALANCES" ADD ROWID, SEQUENCE("ACC_NO","CLEARED_BAL","UNCLEARED_BAL") INCLUDING NEW VALUES; CREATE MATERIALIZED VIEW LOG ON "ARUP"."ACC_TYPES" WITH ROWID, SEQUENCE("ACC_TYPE","ACC_TYPE_DESC") INCLUDING NEW VALUES; ALTER MATERIALIZED VIEW LOG FORCE ON "ARUP"."ACC_TYPES" ADD ROWID, SEQUENCE("ACC_TYPE","ACC_TYPE_DESC") INCLUDING NEW VALUES; CREATE MATERIALIZED VIEW LOG ON "ARUP"."SUB_ACC_TYPES" WITH ROWID, SEQUENCE("SUB_ACC_TYPE","SUB_ACC_TYPE_DESC") INCLUDING NEW VALUES; ALTER MATERIALIZED VIEW LOG FORCE ON "ARUP"."SUB_ACC_TYPES" ADD ROWID, SEQUENCE("SUB_ACC_TYPE","SUB_ACC_TYPE_DESC") INCLUDING NEW VALUES; CREATE MATERIALIZED VIEW ARUP.ACC_MGR_VIEW REFRESH FAST WITH ROWID ENABLE QUERY REWRITE AS SELECT ARUP.SUB_ACC_TYPES.SUB_ACC_TYPE_DESC C1, ARUP.ACC_TYPES.ACC_TYPE_DESC C2, ARUP.ACCOUNTS.ACC_MGR_ID C3, ARUP.ACCOUNTS.SUB_ACC_TYPE C4, SUM("ARUP"."BALANCES"."CLEARED_BAL") M1, COUNT("ARUP"."BALANCES"."CLEARED_BAL") M2, SUM("ARUP"."BALANCES"."UNCLEARED_BAL") M3, COUNT("ARUP"."BALANCES"."UNCLEARED_BAL") M4, SUM(("ARUP"."BALANCES"."UNCLEARED_BAL" + "ARUP"."BALANCES"."CLEARED_BAL")) M5, COUNT(("ARUP"."BALANCES"."UNCLEARED_BAL" + "ARUP"."BALANCES"."CLEARED_BAL")) M6, COUNT(*) M7 FROM ARUP.SUB_ACC_TYPES, ARUP.ACC_TYPES, ARUP.BALANCES, ARUP.ACCOUNTS WHERE ARUP.ACCOUNTS.SUB_ACC_TYPE = ARUP.SUB_ACC_TYPES.SUB_ACC_TYPE AND ARUP.ACCOUNTS.ACC_TYPE = ARUP.ACC_TYPES.ACC_TYPE AND ARUP.ACCOUNTS.ACC_NO = ARUP.BALANCES.ACC_NO GROUP BY ARUP.SUB_ACC_TYPES.SUB_ACC_TYPE_DESC, ARUP.ACC_TYPES.ACC_TYPE_DESC, ARUP.ACCOUNTS.ACC_MGR_ID, ARUP.ACCOUNTS.SUB_ACC_TYPE; whenever sqlerror EXIT SQL.SQLCODE begin dbms_advisor.mark_recommendation('MV_Tune_Task1',1,'IMPLEMENTED'); end; /
Preventing Runaway Queries
As demonstrated in Listing 5, if the user writes the query a little differently, the optimizer does not rewrite the query to select from the MV and the query selects from the base tables directly. The user may think that the query is rewritten and that the results are coming from the MV, but in fact, the query is not rewritten and the results are coming from the source tables, extending the response time significantly.
The REWRITE_OR_ERROR hint in the SQL query will make sure the query is rewritten to select from the MV or at least inform the user if that does not happen. The following adds the REWRITE_OR_ERROR hint to the SQL query in which QR did not occur (Listing 5):
SELECT /*+ REWRITE_OR_ERROR */ SELECT acc_type_desc, sub_acc_type_desc, ...
The query returns the following error message if it does not rewrite:
ORA-30393: a query block in the statement did not rewrite
Oracle provides tools for creating, managing, and tuning MVs in the DBMS_MVIEW package. Acme Bank is seeing increased query performance, using QR whenever possible and using error messages from queries that do not rewrite to further optimize its use of MVs and QR.
Arup Nanda (firstname.lastname@example.org) is the manager of Database Systems at Starwood Hotels and Resorts in White Plains, New York. He is the recipient of Oracle Magazine's 2003 DBA of the Year award and the coauthor of Oracle Privacy Security Auditing, from Rampant Press (rampant-books.com), 2003.