Our technologist improves partitions, adapts query plans, and optimizes statistics gathering.
Usually I take three or four user-submitted questions from the past two months and answer them here in each Ask Tom column. The last two and next two columns, however, take a look at some key Oracle Database 12c features. These features are all part of the “12 Things About Oracle Database 12c” presentation I gave at Oracle OpenWorld 2012 in San Francisco. (You can find the slides for that presentation on asktom.oracle.com on the Files tab). The first three Oracle Database 12c features I looked at in the initial article were improved defaults, bigger datatypes, and top-n queries. In the last issue, I discussed a new row-pattern-matching clause and how undo for temporary tables has changed in Oracle Database 12c. In this issue, I cover some partitioning improvements, adaptive execution plans, and enhanced statistics.
Partitioning was first introduced in Oracle8 Database, back in 1997. In the last 16 years, many new partitioning capabilities have been added, and Oracle Database 12c adds quite a few more. I’ll take a look at five new Oracle Database 12c partitioning capabilities here. If you are unfamiliar with partitioning and the terms associated with it, I suggest that you first read “Partition Decisions,” by Arup Nanda. That article provides a great overview of partitioning for those who do not work with it day to day.
The first new Oracle Database 12c partitioning capability is asynchronous global index maintenance. A global index is an index on a partitioned table that is partitioned with different rules than the table it is defined on. For example, suppose you have an ORDERS table that is range-partitioned by a date column, ORDER_DATE. You might want an index on a column in that ORDERS table, such as CUSTOMER_NAME, and the CUSTOMER_NAME index would likely be a global index, either on a single partition or partitioned by range or hash on CUSTOMER_NAME. If you dropped the oldest ORDERS table partition in Oracle Database 11g Release 2 and before, the global index either became immediately unusable (and required a rebuild before it could be used) or had to be maintained during execution of the DROP partition command. But if you maintained the index, execution of the DROP partition command could take hours instead of completing immediately.
In Oracle Database 12c, you get the best of both worlds: the DROP partition command executes immediately—the index will not be maintained during the DROP operation (the index entries pointing to the dropped partition will remain in the global index) but will remain usable. After the DROP partition command finishes executing, all scans of the global index will ignore the entries that point to the dropped (or truncated) partition. Later on, as a separate transaction, the index can be cleaned up, the deleted “orphaned” index entries can be removed, and the space can be reclaimed. This process can either be manually invoked by the DBA or can be automated to run as a job during a normal maintenance window. In any case, the dropping or truncation of a partition completes immediately, the index is always available for use, and the maintenance of the index is deferred.
Another new partitioning capability is the use of reference partitioning on tables that employ interval partitioning. Both reference and interval partitioning were introduced in Oracle Database 11g (see “More Partitioning Choices,” by Arup Nanda, for a description of interval and reference partitioning), but you could not use them simultaneously. Oracle Database 12c removes that restriction.
For example, let’s say you have an ORDERS table such as
SQL> create table orders 2 ( 3 order# number primary key, 4 order_date date, 5 data varchar2(3) 6 ) 7 enable row movement 8 partition by range(order_date) 9 interval (numtodsinterval(1,'day')) 10 (partition p0 values less than 11 (to_date('01-jan-2013', 'dd-mon-yyyy')) 12 ) 13 / Table created.
In Oracle Database 12c, you can now create a child table (LINE_ITEMS) that references the partitioning scheme of the parent table (ORDERS) successfully:
SQL> create table line_items 2 ( order# number not null, 3 line# number, 4 data varchar2(3), 5 constraint c1_pk primary key(order#,line#), 6 constraint c1_fk_p foreign key(order#) references orders 7 on delete cascade 8 ) 9 enable row movement 10 partition by reference(c1_fk_p) 11 / Table created.
The parent/child connection leads me to another new partitioning capability: the ability to do CASCADE DROP and TRUNCATE operations on parent/child tables.
In the past, if you truncated a partition in (or droppred it from) the ORDERS table, you had to truncate the corresponding partition in (or drop it from) the child table first, meaning that you had to truncate/drop the table partitions of all child tables and work your way up the chain of referential integrity constraints, all the way up to the parent. This involved multiple independent SQL statements, and each statement committed itself. That made the truncation/dropping of partitions somewhat error-prone in a parent/child hierarchy—and unnecessarily hard. (It also introduced short periods of logical inconsistency.)
In Oracle Database 12c, you can accomplish a CASCADE DROP or TRUNCATE in a single command. For example, I add some data to the ORDERS and LINE_ITEMS tables and look up the parent table’s partition:
SQL> insert into orders(order#,order_date,data) 2 values ( 1, to_date( '15-mar-2013' ), 'xxx' ); 1 row created. SQL> insert into line_items(order#,line#,data) 2 values ( 1, 1, 'yyy' ); 1 row created. SQL> commit; Commit complete. SQL> select partition_name 2 from user_tab_partitions 3 where table_name = 'ORDERS' 4 and partition_name like 'S%'; PARTITION_NAME ———————————————————— SYS_P853
You can see that there is a link between the parent and child partitions when I attempt to truncate the parent table partition:
SQL> alter table orders truncate partition SYS_P853; alter table orders truncate partition SYS_P853 * ERROR at line 1: ORA-02266: unique/primary keys in table referenced by enabled foreign keys
The truncation fails, because there is some child data referencing the data that is about to be truncated. If I apply the new CASCADE option—
SQL> alter table orders truncate partition SYS_P853 cascade; Table truncated.
—the truncate will succeed and, in fact, will truncate not only the parent but the child partition as well. (Note that to ensure that data is not removed unintentionally, the cascading functionality requires the foreign key constraint to be defined as ON DELETE CASCADE.) A similar option exists for exchanging partitions. You can now exchange partitions in a full parent/child hierarchy of table partitions in a single command. For example, I load some staging tables with data, as shown in Listing 1.
Code Listing 1: Loading staging tables in preparation for partition exchange
SQL> create table orders_tmp 2 ( 3 order# number primary key, 4 order_date date, 5 data varchar2(3) 6 ); Table created. SQL> create table line_items_tmp 2 ( order# number not null, 3 line# number, 4 data varchar2(3), 5 constraint c1_tmp_pk primary key(order#,line#), 6 constraint c1_tmp_fk_p foreign key(order#) references orders_tmp 7 on delete cascade 8 ); Table created. SQL> insert into orders_tmp (order#,order_date,data) 2 values (100,to_date(‘31-dec-2012','dd-mon-yyyy'),'abc'); 1 row created. SQL> insert into line_items_tmp (order#,line#,data) 2 values (100,1,'def'); 1 row created. SQL> commit; Commit complete.
I can now swap the ORDERS and ORDERS_TMP data and LINE_ITEMS and LINE_ITEMS_TMP data in a single command, using the new CASCADE option:
SQL> alter table orders 2 exchange partition p0 3 with table orders_tmp 4 cascade; Table altered. SQL> select * from orders; ORDER# ORDER_DAT DAT ————————— ————————— ———— 100 31-DEC-12 abc SQL> select * from line_items; ORDER# LINE# DAT ————————— ————————— ———— 100 1 def
So, instead of having to use multiple independent data definition language (DDL) statements, you can now exchange partitions in a single atomic DDL statement.
Next in the area of partitioning improvements in Oracle Database 12c is the ability to perform multiple partition operations in a single DDL statement. This enables you to split a partition into multiple partitions or merge many partitions into a single partition in a single DDL statement. Previously, if you wanted to split a single partition into four partitions, you had to execute three split statements. That involved a lot of reading and rereading, writing, and rewriting of data. Now those multiple partition operations can be achieved in a single atomic DDL statement.
Last in the area of partitioning improvements is the ability to move a partition in an online fashion with the new ALTER TABLE MOVE PARTITION command. This command also transparently maintains global indexes, providing 100 percent data availability without any downtime. For detailed information on these new capabilities, see Oracle Database VLDB and Partitioning Guide 12c Release 1 (12.1).
Adaptive query plans in Oracle Database 12c are plans that can “change their mind” while they are executing. For example, the optimizer might believe that using a nested loops join is the best way to join two tables, based on the number of rows it estimates the tables will produce. But runtime reveals that the driving table chosen by the optimizer returns many more rows than anticipated. In Oracle Database 12c, the query plan can now adapt, change, and switch from a nested loops join to a hash join.
Oracle’s Maria Colgan, known to some as the “optimizer lady,” wrote up a nice post on the Oracle Optimizer blog—“What’s New in 12c: Adaptive Joins.” The “Adaptive Joins” sidebar presented here is an excerpt from that post.
For more information on adaptive query plans, you can watch a video that describes and then demonstrates it. You can also read about it in Oracle Database SQL Tuning Guide 12c Release 1 (12.1).
There are many enhancements to statistics in Oracle Database 12c. Two I’ll be talking about here are statistics generation during data loads (online statistics gathering) and session private statistics for global temporary tables.
Online statistics gathering. Since Oracle Database 10g, statistics have been automatically generated on indexes whenever you create or rebuild them. (There is no need to gather index statistics in your reporting/warehouse database if you rebuild or create indexes immediately after a data load!) You can easily observe this behavior in the following example (the STAGE table is just a copy of the ALL_OBJECTS view):
SQL> create table t 2 as 3 select * 4 from stage 5 where 1=0; Table created. SQL> create index t_idx on t(object_id); Index created. SQL> alter index t_idx unusable; Index altered. SQL> insert /*+ append */ into t 2 select * 3 from stage; 87814 rows created. SQL> alter index t_idx rebuild; Index altered. SQL> select num_rows, last_analyzed 2 from user_indexes 3 where index_name = 'T_IDX'; NUM_ROWS LAST_ANALYZED ————————— ——————————————————— 87814 26-SEP-13
So the query against USER_INDEXES shows that the index has valid statistics. What is new in Oracle Database 12c is that statistics will be automatically gathered on the table as well, if the segment being loaded into is empty (just created or truncated) to begin with. In this example, my table was empty during my data load, so the table itself will now have basic statistics gathered:
SQL> select num_rows, last_analyzed 2 from user_tables 3 where table_name = 'T'; NUM_ROWS LAST_ANALYZED —————————— ——————————————————— 87814 26-SEP-13
Furthermore, if I look at the generated query plan for the load, I see a new step in Oracle Database 12c:
Row Source Operation ————————————————————————————————————— LOAD AS SELECT … OPTIMIZER STATISTICS GATHERING … TABLE ACCESS FULL STAGE …
The new step—OPTIMIZER STATISTICS GATHERING—shows that statistics were generated during this load. I have all the basic statistics on this table, but I won’t have nondefault statistics such as histograms. This is because the creation of a histogram requires additional scans of the data and online statistics gathering was implemented to have minimal impact, if any, on the data load. Fortunately, I can easily gather just the histograms without having to regenerate the basic statistics I’ve already generated. I’ll start by priming the column usage information. (See the “Why Does My Plan Change?” section of “On Joins and Query Plans” for details on what column usage information is.) I can do this by running a few representative queries that reference columns in the predicate I anticipate using in my real queries. Here’s one example:
SQL> select count(*) 2 from t 3 where owner = 'SCOTT'; COUNT(*) —————————————— 11
Now the database knows that I use the OWNER column in WHERE clauses, so the next time it gathers statistics by using the default METHOD_OPT setting, it will inspect the OWNER column to see if it is a candidate for new histogram generation. In this case, it will be, because the OWNER column contains skewed data values: the SYS user owns many objects, whereas SCOTT owns very few. I’ll start the process of gathering statistics by verifying that there are just basic column statistics on the OWNER column:
SQL> select count(*) 2 from user_tab_histograms 3 where table_name = 'T' 4 and column_name = 'OWNER'; COUNT(*) —————————————— 2
And then I’ll gather statistics by using the GATHER AUTO option. This will fill in only the missing bits of statistics and will not generate the statistics I already have:
SQL> begin 2 dbms_stats.gather_table_stats 3 ( user, 'T', 4 OPTIONS => 'GATHER AUTO' ); 5 end; 6 / PL/SQL procedure successfully completed.
And I can now verify that histograms have been generated for the OWNER column:
SQL> select count(*) 2 from user_tab_histograms 3 where table_name = 'T' 4 and column_name = 'OWNER'; COUNT(*) —————————————— 22
Now that you know about online statistics gathering for tables as well as indexes in Oracle Database 12c, you will want to review your statistics gathering methods as you upgrade. You might not have to gather statistics anymore on some tables!
FOLLOW Tom on Twitter
DOWNLOAD Oracle Database 12c
LEARN more about
For more details on online statistics gathering in Oracle Database 12c, you can watch the Oracle Database 12c: Enhanced Optimizer Statistics with Tom Kyte video.
Session private statistics for global temporary tables. The last new capability I want to discuss in this issue’s column is the ability to have statistics on global temporary tables that exist only for the session that generates them. Maintaining representative statistics on global temporary tables has been difficult, to say the least, because every session—even every transaction, in many cases—sees a completely different set of data from every other session or transaction, and getting statistics that convey to the optimizer what the data looks like is problematic. Every session and transaction is different.
Enter session private statistics for these temporary tables. By default, each session in Oracle Database 12c will have its own set of statistics on global temporary tables, enabling the optimizer to correctly estimate cardinalities and derive the correct plan.
To demonstrate this, I’ll generate a global temporary table that will subsequently be used in an “in” clause in a query. In this example, the application running against the database will put some number of rows into the global temporary table and then use that data in a WHERE clause to retrieve data from some other table. The issue that commonly arises is that sometimes a session will put in a few rows and some other session will put a large number of rows into the same global temporary table. I would like to have two different plans for these different cases, but if the optimizer sees only one set of statistics—one that represents only one or the other case above—I’ll get just one plan. This is the problem that session private statistics is trying to solve.
I start by creating a global temporary table and a table to query against:
SQL> create 2 global temporary 3 table gtt 4 ( x int ) 5 on commit delete rows 6 / Table created. SQL> create table t 2 as 3 select * 4 from stage; Table created. SQL> alter table t 2 add constraint 3 t_pk primary key 4 (object_id); Table altered.
Now I load some data into my global temporary table and see what statistics exist:
SQL> insert into gtt 2 select object_id 3 from t 4 where rownum <= 5; 5 rows created. SQL> select scope, num_rows 2 from user_tab_statistics 3 where table_name = 'GTT'; SCOPE NUM_ROWS ——————— —————————— SHARED
There are no statistics yet, so I gather them:
SQL> begin 2 dbms_stats.gather_table_stats 3 ( user, 'GTT' ); 4 end; 5 / PL/SQL procedure successfully completed. SQL> select scope, num_rows 2 from user_tab_statistics 3 where table_name = 'GTT'; SCOPE NUM_ROWS ——————— ————————————— SHARED SESSION 5
Now I can see that there are session statistics for this global temporary table. They are not “shared”; they are private to this session, and I can see that they accurately represent the data I loaded into the table.
I use my global temporary table in a query as shown in Listing 2, and the optimizer comes up with the best plan, based on valid, representative statistics.
Code Listing 2: Using global temporary table and session private statistics
SQL> select * 2 from t 3 where object_id in 4 (select x from gtt); ——————————————————————————————————————————————————————————————————————————————— | Id | Operation | Name | Rows| Bytes|Cost (%CPU)| Time | ——————————————————————————————————————————————————————————————————————————————— | 0 | SELECT STATEMENT | | 5| 550| 5 (0)| 00:00:01| | 1 | NESTED LOOPS | | | | | | | 2 | NESTED LOOPS | | 5| 550| 5 (0)| 00:00:01| | 3 | SORT UNIQUE | | 5| 15| 2 (0)| 00:00:01| | 4 | TABLE ACCESS FULL | GTT | 5| 15| 2 (0)| 00:00:01| |* 5 | INDEX UNIQUE SCAN | T_PK | 1| | 0 (0)| 00:00:01| | 6 | TABLE ACCESS BY INDEX ROWID| T | 1| 107| 1 (0)| 00:00:01| ——————————————————————————————————————————————————————————————————————————————— Predicate Information (identified by operation id): ————————————————————————————————————————————————————————————————————————————— 5 - access("OBJECT_ID"="X") Note ——————— - this is an adaptive plan - Global temporary table session private statistics used
I go into another session, load the global temporary table with entirely different data, run a query, and see the result in Listing 3.
Code Listing 3: In another session, loading different data and using another plan
SQL> connect / Connected. SQL> insert into gtt 2 select object_id 3 from stage; 87813 rows created. SQL> select * 2 from t 3 where object_id in 4 (select x from gtt); ——————————————————————————————————————————————————————————————————————————— | Id | Operation | Name | Rows | Bytes |Cost (%CPU)| Time | ——————————————————————————————————————————————————————————————————————————— | 0 | SELECT STATEMENT | | 87813 | 10M| 924 (1)| 00:00:01 | |* 1 | HASH JOIN | | 87813 | 10M| 924 (1)| 00:00:01 | | 2 | SORT UNIQUE | | 104K| 1331K| 40 (3)| 00:00:01 | | 3 | TABLE ACCESS FULL| GTT | 104K| 1331K| 40 (3)| 00:00:01 | | 4 | TABLE ACCESS FULL | T | 87813 | 9175K| 384 (1)| 00:00:01 | ——————————————————————————————————————————————————————————————————————————— Predicate Information (identified by operation id): ——————————————————————————————————————————————————————————————————————————— 1 - access("OBJECT_ID"="X") Note ——————— - dynamic statistics used: dynamic sampling (level=2) - this is an adaptive plan
In this new session, the optimizer found no statistics and hence used dynamic sampling to fill them in. Because there were many more than five rows in the temporary table this time, a different, better plan was chosen. This also demonstrated that the statistics from the first session “disappeared”—they were private to that session.
So, you’ve seen two methods—gathering statistics and dynamic sampling—that provide the optimizer with representative statistics. A third approach is to set the statistics directly. Suppose a developer just did an insert and knows how much data was loaded. That person could just give the optimizer this information. For example, I just loaded 300 rows into the global temporary table and set the statistics as shown in Listing 4. The optimizer uses the statistics I added to optimize the query, resulting in a different plan once again.
Code Listing 4: Setting statistics directly and seeing session private statistics used
SQL> begin 2 dbms_stats.set_table_stats 3 ( user, 'GTT', numrows => 300 ); 4 end; 5 / PL/SQL procedure successfully completed. SQL> set autotrace traceonly explain SQL> select * 2 from t 3 where object_id in 4 (select x from gtt); —————————————————————————————————————————————————————————————————————————————— | Id | Operation | Name| Rows| Bytes|Cost (%CPU)| Time | —————————————————————————————————————————————————————————————————————————————— | 0 | SELECT STATEMENT | | 9| 1080| 38 (0)| 00:00:01| | 1 | NESTED LOOPS | | | | | | | 2 | NESTED LOOPS | | 9| 1080| 38 (0)| 00:00:01| | 3 | SORT UNIQUE | | 300| 3900| 29 (0)| 00:00:01| | 4 | TABLE ACCESS FULL | GTT | 300| 3900| 29 (0)| 00:00:01| |* 5 | INDEX UNIQUE SCAN | T_PK| 1| | 0 (0)| 00:00:01| | 6 | TABLE ACCESS BY INDEX ROWID| T | 1| 107| 1 (0)| 00:00:01| —————————————————————————————————————————————————————————————————————————————— Predicate Information (identified by operation id): ————————————————————————————————————————————————————————————————————————————— 5 - access("OBJECT_ID"="X") Note ——————— - this is an adaptive plan - Global temporary table session private statistics used
For more information on all the new optimizer features and more, see Oracle Database New Features Guide 12c Release 1 (12.1).
Excerpted from “What’s New in 12c: Adaptive Joins,” by Maria Colgan. The goal of adaptive plans is to avoid catastrophic behavior of bad plans on first execution. If we detect during execution that the optimizer’s cardinality estimates were wrong, some plan choices can be changed on the fly to better options. Although we can’t completely change a plan during execution, there are certain local decisions, such as the join method, that can be changed. In this post, I’ll introduce the concepts and terminology related to adaptive plans and then go through an example in detail.
Concepts and Terminology
An adaptive plan enables certain decisions in a plan to be postponed until runtime, in case runtime conditions are different from optimizer assumptions. For the purposes of explaining the concepts, I will consider a plan for a simple two-table join, where the join method is adapted. The diagram below shows the two options for this plan.
An adaptive plan consists of a default plan, which is the plan that the optimizer picks, based on the current statistics, as well as alternatives to various portions of the default plan. In my example join, let’s assume that the nested loops plan is the default and that the alternative is the hash join. Each alternative portion of a plan is referred to as a subplan. A subplan is a set of related operations in a plan. In the diagram, the subplan on the left consists of the nested loops operation and the index scan; the alternative subplan on the right consists of the hash join and the table scan. For each decision that can be adapted, the plan contains two or more alternative subplans. During execution, one of those alternatives is chosen, in a process called adaptive plan resolution. Adaptive plan resolution occurs on the first execution of a plan; once the plan is resolved, future executions will use the same plan.
To resolve the plan, statistics are collected at various points during execution. The statistics collected during one part of execution are used to resolve parts of the plan that run later. For instance, statistics can be collected during the scan of table T1, and based on those statistics, we can choose the right join method for the join between T1 and T2. The statistics are collected with a “statistics collector.” Because the join of T1 to T2 and the scan of T1 would typically be pipelined, buffering is required in order to collect the statistics, resolve the choice of join method, and then perform the join. Some plan decisions can be adapted without row buffering, but adaptive joins require a buffering statistics collector.
The optimizer determines what statistics are to be collected and how the plan should be resolved for different values of the statistics. The optimizer computes an inflection point, which is the value of the statistics where the two plan choices are equally good. For instance, if the nested loops join is optimal when the scan of T1 produces fewer than 10 rows and the hash join is optimal when the scan of T1 produces more than 10 rows, the inflection point for these two plans is 10. The optimizer computes this value and configures a buffering statistics collector to buffer and count as many as 10 rows. If at least 10 rows are produced by the scan, the join method is resolved to the hash join; otherwise, it is resolved to the nested loops join.
The plan that is chosen by resolution is referred to as the final plan. The plan the optimizer expects to be chosen (based on its estimates) is the default plan. The physical plan, which actually contains all the operations from all the subplan options, is referred to as the full plan. As the plan is resolved, the plan hash value changes to indicate the new choice of plan. The plan displayed by our plan display APIs (in DBMS_XPLAN) changes as the plan is resolved. At any given point, some plan decisions may have been resolved whereas others have not. For the unresolved plan choices, the plan display APIs show the plan that is expected by the optimizer (based on its estimates).
When EXPLAIN PLAN generates a query plan, none of the adaptive subplans have been resolved, so we see the default plan when displaying the plan through DBMS_XPLAN.DISPLAY. For example, if the optimizer thinks the nested loops join plan is best, EXPLAIN PLAN and DBMS_XPLAN.DISPLAY will display the nested loops join plan as shown below.
—————————————————————————————————————————————— | Id | Operation | Name | Rows | Bytes | —————————————————————————————————————————————— | 0 | SELECT STATEMENT | | 20 | 280 | | 1 | NESTED LOOPS | | 20 | 280 | | 2 | TABLE ACCESS FULL| T1 | 10000| 107K | | *3| INDEX UNIQUE SCAN| T2_PK| 1 | 3 | ——————————————————————————————————————————————
——————————————————————————————————————————————— | Id | Operation | Name | Rows | Bytes | ——————————————————————————————————————————————— | 0 | SELECT STATEMENT | | | | | *1| HASH JOIN | | 20 | 280 | | 2 | TABLE ACCESS FULL | T1 | 10000| 107K | | 3 | TABLE ACCESS FULL | T2 | 20 | 60 | ———————————————————————————————————————————————