As Published In
Oracle Magazine
July/August 2014

TECHNOLOGY: Ask Tom

  

On Table Updates and SQL Plan Baselines

By Tom Kyte Oracle Employee ACE

 

Our technologist updates tables quickly via CTAS and optimizes queries for evolution.

Due to an application upgrade, we have to change all the records in one of our tables, based on new data supplied in a temporary table. (That table exists just for the upgrade; it is not a global temporary table.) What is the best approach for accomplishing this? The table has millions of records, and we need to complete the upgrade as quickly as possible.

This is a frequently asked question with an answer that might surprise you. The best way to update this table is not to update it but, rather, to re-create it! Because this is part of an application upgrade and the application is scheduled to be offline during the upgrade, re-creating the table will almost certainly be the most efficient way to accomplish this.

Oracle Learning Library

 

I’d like to remind everyone out there about a great resource on Oracle Technology Network: the Oracle Learning Library, at oracle.com/goto/oll. Here you will find tons of tutorials for new features and old alike. One of the nooks in the Oracle Learning Library I recommend is a video series recorded by Andrew Holdsworth that covers much of the material we cover in the Oracle Real-World Performance seminars. See bit.ly/1q6jDAN for a direct link to the Oracle Real-World Performance material on the Oracle Learning Library site, or search for Real-World Performance on the Oracle Learning Library home page.

 

Rather than updating every row, which would necessarily generate UNDO for every modification and generate REDO for the table modification as well as more UNDO, simply use CREATE TABLE NEW_TABLE AS SELECT (CTAS). Using this approach, you can definitely skip all generated UNDO, and if you choose, all REDO as well. Additionally, instead of trying to maintain any and all indexes on these modified columns during the application update, simply create fresh indexes after the CREATE TABLE process is complete. If you were to update every row, the index on these modified columns would likely be at least twice as large as it needs to be, so you would almost certainly want to rebuild these indexes anyway.

So, your update steps are as follows:
  1. CREATE TABLE NEW_TABLE AS SELECT <join of old_table with the data to be updated>. This may involve an outer join if most, but not all, of the rows are to be updated. (There might not be a row in the stage table for every row in the existing table.)
  2. DROP TABLE OLD_TABLE. Get rid of the existing table with the old data.
  3. ALTER TABLE NEW_TABLE RENAME TO OLD_TABLE.
  4. CREATE INDEX. Re-create the indexes.
  5. GRANT the appropriate privileges on the table.
If you are in archivelog mode during the application upgrade, consider using the NOLOGGING clause to prevent REDO from being generated. (But make sure to disable NOLOGGING and back up the database after the upgrade.) If you are in noarchivelog mode, the CREATE TABLE NEW_TABLE AS SELECT and CREATE INDEX commands will not generate any REDO. (And remember, they never create UNDO for the CREATE TABLE or CREATE INDEX commands.) You’ll not only find this CREATE TABLE AS SELECT approach—using data definition language (DDL) instead of data manipulation language (DML)—to be faster for large objects but, in addition, the resulting segments will not be in need of a reorganization, as they probably would be after a MERGE/UPDATE operation.


To see this in action against a large data set, I recommend that you look at the four-part video series at bit.ly/1fXdD8L, bit.ly/1idIpdK, bit.ly/1mO3dwL, and bit.ly/1lIg7yO. These videos show the loading, validating, and transforming (you are interested in the transforming portion most of all) of a large set of data with DDL and compare that approach with trying to use DML to do the same.

If You Can Hint It, You Can Baseline It

I recently had a problem with two SQL statements performing semantically equivalent queries but with very different runtimes. I think I narrowed the problem down to a simple case where the optimizer is wildly off in cardinality estimates. The statistics are up to date and represent the actual data.

When I run the query with a subquery (returning a single constant value) in the WHERE clause, the query is excruciatingly slow—about 350 seconds. If I evaluate the subquery by hand (which takes some milliseconds) and pass the return value as a constant to the query, the query will run in some milliseconds—with a completely different plan.

The subquery is a simple MAX(...) query, so why can’t the optimizer use the same plan or predict that a single value will be returned and optimize the query?

[Editor's note: Listing 1 shows the example table, a slow query and plan, and a faster query and plan.]

What is the best way to achieve this faster query plan without having to rewrite the original query?

Code Listing 1: Example table, slow query and plan, and faster query and plan

Here is the setup for the example table:

CREATE TABLE t AS SELECT * FROM all_objects;
ALTER TABLE t ADD CONSTRAINT t_id_pk PRIMARY KEY (object_id);
CREATE INDEX t_idx_id ON t(object_id);
CREATE INDEX t_idx_type ON t(object_type);
exec DBMS_STATS.GATHER_TABLE_STATS(user, 'T' );

 

Here is the slow query I would like to tune, along with its plan:

 

SQL> SELECT *
  2    FROM t t1
  3   WHERE t1.object_type = 'TABLE'
  4     AND t1.object_id > (SELECT MAX(t2.object_id) - 500000 FROM t t2);
6115 rows selected.

Execution Plan
———————————————————————————————————————————————————————————————————————
| Id  | Operation                        | Name       | Rows  | Bytes |
———————————————————————————————————————————————————————————————————————
|   0 | SELECT STATEMENT                 |            |   413 | 39235 |
|   1 |  TABLE ACCESS BY INDEX ROWID     | T          |   413 | 39235 |
|   2 |   BITMAP CONVERSION TO ROWIDS    |            |       |       |
|   3 |    BITMAP AND                    |            |       |       |
|   4 |     BITMAP CONVERSION FROM ROWIDS|            |       |       |
|   5 |      SORT ORDER BY               |            |       |       |
|*  6 |       INDEX RANGE SCAN           | T_IDX_ID   |  8252 |       |
|   7 |        SORT AGGREGATE            |            |     1 |     5 |
|   8 |         INDEX FULL SCAN (MIN/MAX)| T_IDX_ID   |     1 |     5 |
|   9 |     BITMAP CONVERSION FROM ROWIDS|            |       |       |
|* 10 |      INDEX RANGE SCAN            | T_IDX_TYPE |  8252 |       |
———————————————————————————————————————————————————————————————————————

Predicate Information (identified by operation id):
———————————————————————————————————————————————————————————————————————

   6 - access("T1"."OBJECT_ID"> (SELECT MAX("T2"."OBJECT_ID")-500000 FROM
              "T" "T2"))
       filter("T1"."OBJECT_ID"> (SELECT MAX("T2"."OBJECT_ID")-500000 FROM
              "T" "T2"))
  10 - access("T1"."OBJECT_TYPE"='TABLE')

 
Here is the query and the faster plan I would like it to use:
 


SQL> SELECT MAX(t2.object_id) - 500000 FROM t t2;

MAX(T2.OBJECT_ID)-500000
————————————————————————————————————
                   19975

SQL> SELECT *
  2    FROM t t1
  3   WHERE t1.object_type = 'TABLE' AND t1.object_id > 19975;
6115 rows selected.

Execution Plan
—————————————————————————————————————————————————————————————————
| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)|
———————————————————————————————————————————————————————————————————————————————
|   0 | SELECT STATEMENT            |            |  7198 |   667K|   102   (2)|
|*  1 |  TABLE ACCESS BY INDEX ROWID| T          |  7198 |   667K|   102   (2)|
|*  2 |   INDEX RANGE SCAN          | T_IDX_TYPE |  8553 |       |     6   (0)|
———————————————————————————————————————————————————————————————————————————————

Predicate Information (identified by operation id):
————————————————————————————————————————————————————————————————————————————

   1 - filter("T1"."OBJECT_ID">19975)
   2 – access("T1"."OBJECT_TYPE"='TABLE')


Although these two queries are semantically equivalent, they present very different challenges to the optimizer. In the first case, the optimizer will have to “guess” at the value returned by the subquery, because it has not yet executed the subquery. That is, the query

SQL> SELECT *
  2    FROM t t1
  3   WHERE t1.object_type = 'TABLE'
  4     AND t1.object_id > (SELECT
  5         MAX(t2.object_id) - 500000
  6         FROM t t2);


is very much like

SQL> SELECT *
  2    FROM t t1
  3   WHERE t1.object_type = 'TABLE'
  4     AND t1.object_id > ???;


The optimizer cannot see what the constraint against OBJECT_ID will be, and it cannot see what value will ultimately be plugged in there when the query executes. Therefore, the optimizer will guess at the estimated cardinality.

However, when you query with the construct

SQL> SELECT *
  2    FROM t t1
  3   WHERE t1.object_type = 'TABLE'
  4     AND t1.object_id > 19975;


the optimizer has a lot more information to work with. It knows approximately how many rows will come back, due to the predicate with constant values. So what you are dealing with is a query in which the optimizer will have a really hard time obtaining correct cardinality estimates at hard parse time and therefore stands a good chance of not selecting the optimal plan—at least for the first execution. The question is, How can you fix this without rewriting the code to use two queries and, optimally, without using any hints in the production code? The answer is to temporarily use hints to get the plan you desire but then use the SQL Plan Management feature of Oracle Database and query plan baselines to implement the fix, removing the need for hints in the production environment and ensuring that you get the plan you want, unless and until the optimizer discovers a better plan.

The reason you do not want hints in your production code is that it is very difficult to correctly hint a query. To properly hint a query, you must fully, verbosely hint it. You might discover in your testing environment that using hints A, B, and C achieves the plan you want but that in production, after statistics are gathered and the data has changed, or after a database upgrade, hints A, B, and C may no longer do the job. Maybe you didn’t know you also needed hint D to ensure that the optimizer would choose the plan you wanted, even though the query was working well “by accident” before. Or, maybe after a database upgrade, the optimizer introduces a new query transformation/rewrite that invalidates your hint.

I’ll demonstrate the shortest list of hints you would need to ensure that the above query uses the plan you want, and you’ll see that this list of hints is much longer than any list of hints you would have coded yourself.

Another reason to avoid putting hints into the query and to use SQL Plan Management is that you enable the optimizer to evolve your query plan over time. That is, you won’t freeze your query plan, and you’ll have the opportunity to use a better plan in the future. If you fully hinted the query, you’d be stuck with whatever plan you had today. See “Baselines and Better Plans” at for a discussion of this query plan evolution.

Now let’s see how to get the plan you want and then how to use SQL Plan Management to put that plan in place and have it be used transparently by your query. In a testing environment, I discovered that by using either the FIRST_ROWS(1) or the OPT_PARAM( ‘_b_tree_bitmap_plans’, ‘FALSE’ ) hint to disable the use of B-tree bitmap plans for this SQL statement, I get the plan I want, as shown in Listing 2.

Code Listing 2: Getting a plan by using hints

SQL> SELECT /*+ first_rows(1) */ *
  2    FROM t t1
  3   WHERE t1.object_type = 'TABLE'
  4     AND t1.object_id > (SELECT MAX(t2.object_id) - 500000 FROM t t2);

Execution Plan
—————————————————————————————————————————————————————————————————————————————
Plan hash value: 1289158178

——————————————————————————————————————————————————————————————————————————————
| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)
——————————————————————————————————————————————————————————————————————————————
|   0 | SELECT STATEMENT            |            |     1 |    97 |     4   (0)
|*  1 |  TABLE ACCESS BY INDEX ROWID| T          |     1 |    97 |     2   (0)
|*  2 |   INDEX RANGE SCAN          | T_IDX_TYPE |  1871 |       |     1   (0)
|   3 |   SORT AGGREGATE            |            |     1 |     5 |
|   4 |    INDEX FULL SCAN (MIN/MAX)| T_ID_PK    |     1 |     5 |     2   (0)
——————————————————————————————————————————————————————————————————————————————

SQL> select /*+ opt_param( '_b_tree_bitmap_plans', 'FALSE' ) */ *
  2    FROM t t1
  3   WHERE t1.object_type = 'TABLE'
  4     AND t1.object_id > (SELECT MAX(t2.object_id) - 500000 FROM t t2);

Execution Plan
—————————————————————————————————————————————————————————————————————————————
Plan hash value: 1289158178

——————————————————————————————————————————————————————————————————————————————
| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)
——————————————————————————————————————————————————————————————————————————————
|   0 | SELECT STATEMENT            |            |    94 |  9118 |    85   (0)
|*  1 |  TABLE ACCESS BY INDEX ROWID| T          |    94 |  9118 |    83   (0)
|*  2 |   INDEX RANGE SCAN          | T_IDX_TYPE |  1871 |       |     6   (0)
|   3 |   SORT AGGREGATE            |            |     1 |     5 |
|   4 |    INDEX FULL SCAN (MIN/MAX)| T_ID_PK    |     1 |     5 |     2   (0)


Now, if I wanted to ensure that the plan in Listing 2 is, in fact, the plan used by default, I would need more than just a FIRST_ROWS(1) or OPT_PARAM hint. In fact, when I use DBMS_XPLAN to display not only the plans but also the full set of hints necessary to reproduce this plan reliably, I see the plan and the outline data in Listing 3.

Code Listing 3: Determining plan and outline info needed to make a better hinted plan

SQL> select *
  2    from table(dbms_xplan.display_cursor (sql_id=>'4rg4kcxr83kup',
  3                cursor_child_no => 0, format=>'+outline'));

PLAN_TABLE_OUTPUT
——————————————————————————————————————————
SQL_ID  4rg4kcxr83kup, child number 0
———————————————————————————————————————————————————————
select /*+ opt_param( '_b_tree_bitmap_plans', 'FALSE' )
gather_plan_statistics */ *   FROM t t1  WHERE t1.object_type = 'TABLE'
   AND t1.object_id > (SELECT MAX(t2.object_id) - 500000 FROM t t2)

Plan hash value: 1289158178

——————————————————————————————————————————————————————————————————————————————————————
| Id  | Operation                   | Name      | Rows | Bytes| Cost (%CPU)| Time    |
——————————————————————————————————————————————————————————————————————————————————————
|   0 | SELECT STATEMENT            |           |      |      |    83 (100)|         |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T         |    91|  8827|    81   (0)| 00:00:01|
|*  2 |   INDEX RANGE SCAN          | T_IDX_TYPE|  1823|      |     6   (0)| 00:00:01|
|   3 |   SORT AGGREGATE            |           |     1|     5|            |         |
|   4 |    INDEX FULL SCAN (MIN/MAX)| T_ID_PK   |     1|     5|     2   (0)| 00:00:01|
——————————————————————————————————————————————————————————————————————————————————————

Outline Data
———————————————————

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.2')
      DB_VERSION('11.2.0.3')
      OPT_PARAM('_b_tree_bitmap_plans' 'false')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$2")
      OUTLINE_LEAF(@"SEL$1")
      INDEX_RS_ASC(@"SEL$1" "T1"@"SEL$1" ("T"."OBJECT_TYPE"))
      PUSH_SUBQ(@"SEL$2")
      INDEX(@"SEL$2" "T2"@"SEL$2" ("T"."OBJECT_ID"))
      END_OUTLINE_DATA
  */

Next Steps


 ASK Tom
Tom Kyte answers your most difficult technology questions. Highlights from that forum appear in this column.

 READ more Tom

 DOWNLOAD Oracle Database 12c 

LEARN more about
 Oracle Database 12c
 Baselines
Baselines and Better Plans

 FOLLOW Tom on Twitter

FOLLOW Oracle Database
 on Twitter
 on Facebook

As you can see in Listing 3, the full list of hints in the outline data that you need is quite extensive—far beyond just the OPT_PARAM hint. I can assure you that properly hinting a query to repeatedly get the same plan is much harder than it looks. Also, I don’t really want to freeze the plan forever, because future releases of Oracle Database will likely be able to rewrite the query or perform some optimization that just doesn’t exist yet, resulting in an even better plan. I’d like to be able to take advantage of future features, but if I’ve hinted the query, I won’t be able to.

So, what would the steps be to get the optimizer to choose the desired plan until a provably better plan comes along?

First, I would capture a SQL plan baseline for the existing query—the query with the “bad” plan. Why would I want to capture the bad plan? To establish a SQL plan baseline for this query (into which I will load the desired plan), I need the plan baseline to exist before I can load the plan into it. I execute the code in Listing 4 to create the baseline.

Code Listing 4: Creating the baseline

SQL> begin
  2      dbms_output.put_line(
  3      dbms_spm.load_plans_from_cursor_cache
  4         ( sql_id => 'crvk9z6mx9n4d' )
  5      );
  6  end;
  7  /
1

SQL> select sql_handle,
  2         substr(sql_text,1,10)||'...'||
  3               substr(sql_text,length(sql_text)-10) stext,
  4         plan_name, enabled
  5    from dba_sql_plan_baselines
  6   where sql_text like
  7     'SELECT%FROM t t1%(SELECT MAX(t2.object_id) - 500000 FROM t t2)';

SQL_HANDLE           STEXT            PLAN_NAME                      ENA
———————————————————— ———————————————  —————————————————————————————— ———
SQL_e738c19a5191e8fd SELECT *         SQL_PLAN_fff61m98t3u7xda64b1bb YES
                      ... FROM t t2)

That is my baseline, and that plan will be used. But it isn’t what I want. The next step is to disable that plan:

SQL> begin
  2    dbms_output.put_line(
  3    dbms_spm.alter_sql_plan_baseline
  4    ( sql_handle => 
        'SQL_e738c19a5191e8fd',
  5      attribute_name => 'enabled',
  6      attribute_value => 'NO' )
  7    );
  8  end;
  9  /
1


The database will still use that plan, but only because it is the only game in town right now. So let’s create something better: the first query plan in Listing 2—the plan for the SELECT /*+ first_rows(1) */ * query.

That is the plan I want. All I need to do is load that plan in place of the bad one, as shown in Listing 5.

Code Listing 5: Loading the new—good—plan

SQL> set autotrace off
SQL> begin
2 dbms_output.put_line(
3 dbms_spm.load_plans_from_cursor_cache
4 ( sql_id => '5mn39tz7fpjnu',
5 plan_hash_value => 1289158178,
6 sql_handle => 'SQL_e738c19a5191e8fd' )
7 );
8 end;
9 /
1
SQL> select sql_handle,
2 substr(sql_text,1,10)||'...'||
3 substr(sql_text,length(sql_text)-10) stext,
4 plan_name, enabled
5 from dba_sql_plan_baselines
6 where sql_text like
7 'SELECT%FROM t t1%(SELECT MAX(t2.object_id) - 500000 FROM t t2)';
SQL_HANDLE           STEXT    PLAN_NAME                      ENA
———————————————————— ———————— —————————————————————————————— ———
SQL_e738c19a5191e8fd SELECT * SQL_PLAN_fff61m98t3u7x971f1a3f YES
... FROM t t2)
SQL_e738c19a5191e8fd SELECT * SQL_PLAN_fff61m98t3u7xda64b1bb NO
... FROM t t2)


There are now two plans loaded for that SQL statement: one—the bad plan—is disabled; the other—the good plan (even though it is for a “different query,” it can work with this query)—is enabled. When I run the query again, as shown in Listing 6, I see that it used the query plan baseline for my good plan.

Code Listing 6: Confirming use of the good plan

SQL> set autotrace traceonly
SQL> SELECT *
2 FROM t t1
3 WHERE t1.object_type = 'TABLE'
4 AND t1.object_id > (SELECT MAX(t2.object_id) - 500000 FROM t t2);
Execution Plan
———————————————————————————————————————————————————
Plan hash value: 1289158178
—————————————————————————————————————————————————————
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)
—————————————————————————————————————————————————————
| 0 | SELECT STATEMENT | | 1 | 97 | 4 (0)
|* 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 97 | 2 (0)
|* 2 | INDEX RANGE SCAN | T_IDX_TYPE | 1871 | | 1 (0)
| 3 | SORT AGGREGATE | | 1 | 5 |
| 4 | INDEX FULL SCAN (MIN/MAX)| T_ID_PK | 1 | 5 | 2 (0)
—————————————————————————————————————————————————————————
Predicate Information (identified by operation id):
———————————————————————————————————————————————————————————————————————————
1 - filter("T1"."OBJECT_ID"> (SELECT /*+ PUSH_SUBQ INDEX ("T2" "T_ID_PK") *
MAX("T2"."OBJECT_ID")-500000 FROM "T" "T2"))
2 - access("T1"."OBJECT_TYPE"='TABLE')
Note
———————
- SQL plan baseline "SQL_PLAN_fff61m98t3u7x971f1a3f" used for this statement


With this baseline solution, there is no permanent hinting—and permanent hinting is much harder to set up and use than you think it is! Many hinted queries out there in production today may be ticking time bombs.

And there are no upgrade issues with this baseline solution. (You may have to completely change all your hints after a database upgrade as you add new query rewrites, access paths, and so on. With this baseline solution, you won’t.)

And, finally, if a better plan comes along, you’ll be able to use it if you are using query plan evolution.


Tom Kyte Headshot


Tom Kyte is a database evangelist in Oracle’s Server Technologies division and has worked for Oracle since 1993. He is the author of Expert Oracle Database Architecture (Apress, 2005, 2010) and Effective Oracle by Design (Oracle Press, 2003), among other books.

 




Send us your comments