As Published In
Oracle Magazine
May/June 2009

TECHNOLOGY: Ask Tom


On Constraints, Metadata, and Truth

By Tom Kyte

Our technologist uses constraints to improve query performance.

It is interesting to note how important constraints are for query optimization. Many people think of constraints as a data integrity thing, and it’s true—they are. But constraints are used by the optimizer as well when determining the optimal execution plan. The optimizer takes as inputs

 

  • The query to optimize
  • All available database object statistics
  • System statistics, if available (CPU speed, single-block I/O speed, and so on—metrics about the physical hardware)
  • Initialization parameters
  • Constraints


And the optimizer uses them all to determine the best approach. Something I’ve noticed over time is that people tend to skip constraints in a data warehouse/reporting system. The argument is, “The data is good; we did a data cleansing; we don’t need data integrity constraints.” They might not need constraints for data integrity (and they might be unpleasantly surprised if they did enable them), but they do need integrity constraints in order to achieve the best execution plans. In a data warehouse, a bad query plan might be one that takes hours or days to execute—not just a couple of extra seconds or minutes. It is a data warehouse, therefore, that truly needs constraints—for performance reasons.
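If revalidating already-cleansed warehouse data is the concern, note that Oracle Database lets you declare a constraint without paying the cost of enforcing or validating it. A minimal sketch, with illustrative table and column names:

```sql
-- Declare the relationship for the optimizer's benefit only.
-- DISABLE:    the constraint is not enforced on DML.
-- NOVALIDATE: existing rows are not checked at creation time.
-- RELY:       the optimizer may trust the constraint for rewrites
--             when QUERY_REWRITE_INTEGRITY is TRUSTED or STALE_TOLERATED.
alter table fact_sales
  add constraint fact_sales_fk_dim_date
  foreign key (date_id) references dim_date (date_id)
  rely disable novalidate;
```

With that, the optimizer gets the metadata it needs without the load-time overhead, though the burden of keeping the data truly clean shifts back to you.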

Let’s look at some examples (these were all executed in Oracle Database 11g Release 1, 11.1.0.7). The first is an example of partition view elimination—a feature of Oracle Database since Release 7.3. If we describe the data contained in tables, Oracle Database will frequently be able to eliminate tables from consideration in the execution plan.

In Listing 1, we set up two tables that contain mutually exclusive data (using the OBJECT_TYPE attribute) and a view that uses UNION ALL to pull them together.

Code Listing 1: Creating tables with mutually exclusive data and a view

 

SQL> create table t1
  2    as
  3    select * from all_objects
  4    where object_type in ( 'TABLE', 'VIEW' );
Table created.

SQL> alter table t1 modify object_type not null;
Table altered.

SQL> alter table t1 add constraint t1_check_otype 
  2    check (object_type in ('TABLE', 'VIEW'));
Table altered.

SQL> create table t2
  2    as
  3    select * from all_objects
  4    where object_type in ( 'SYNONYM', 'PROCEDURE' );
Table created.

SQL> alter table t2 modify object_type not null;
Table altered.

SQL> alter table t2 add constraint t2_check_otype 
  2    check (object_type in ('SYNONYM', 'PROCEDURE'));
Table altered.

SQL> create or replace view v
  2    as
  3    select * from t1
  4    union all
  5    select * from t2;
View created.


Now in Listing 2, we query this view (V) and use OBJECT_TYPE in the WHERE clause to see how the optimizer can optimize a table away.

Code Listing 2: Optimizing a table away

 

SQL> select * from v where object_type = 'TABLE';

Execution Plan
----------------------------------------------------------------------------
Plan hash value: 3982894595

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |    40 |  6320 |   151   (1)| 00:00:02 |
|   1 |  VIEW                | V    |    40 |  6320 |         (1)| 00:00:02 |
|   2 |   UNION-ALL          |      |       |       |            |          |
|*  3 |    TABLE ACCESS FULL | T1   |  3083 |   475K|    31   (0)| 00:00:01 |
|*  4 |    FILTER            |      |       |       |            |          |
|*  5 |     TABLE ACCESS FULL| T2   |     5 |   790 |   12    (1)| 00:00:02 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
-------------------------------

   3 - filter("OBJECT_TYPE"='TABLE')
   4 - filter(NULL IS NOT NULL)
   5 - filter("OBJECT_TYPE"='TABLE')


At first, the execution plan in Listing 2 looks as if it did not eliminate the access to table T2—which it could have, because T2 is known to contain only SYNONYMs and PROCEDUREs, not TABLEs. But on closer inspection, we see a FILTER operation at step 4, and the filter is

 

NULL IS NOT NULL


That is interesting. We never wrote that predicate—the optimizer added it for us. Because NULL IS NOT NULL is always FALSE, that branch of the execution plan will never be executed. (You can verify that with TKPROF if you like: no I/O will be performed against table T2.)
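Short of a full TKPROF report, a before-and-after comparison with AUTOTRACE statistics is enough to see the effect. A sketch (the exact numbers will vary with your data):

```sql
SQL> set autotrace traceonly statistics
SQL> select * from v where object_type = 'TABLE';
-- Note the consistent gets reported here, then:
SQL> alter table t2 drop constraint t2_check_otype;
SQL> select * from v where object_type = 'TABLE';
-- Without the check constraint, the optimizer can no longer prove
-- T2 is irrelevant, and consistent gets rise by T2's blocks.
```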

As a second example, we’ll take a look at NOT NULL constraints and see why they are extremely important—in particular, for index utilization. We’ll create a table and index a nullable column:

 

SQL> create table t
  2    as
  3    select * from all_objects;
Table created.

SQL> create index t_idx on t(object_type);
Index created.

SQL> exec 
dbms_stats.gather_table_stats( user, 'T' );
PL/SQL procedure successfully completed.


Now, if we attempt to count the rows in this table, the optimizer has basically only one approach, as shown in Listing 3.

Code Listing 3: One way to count rows

 

SQL> set autotrace traceonly explain
SQL> select count(*) from t;

Execution Plan
----------------------------------------
Plan hash value: 2966233522

-------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |   283   (1)| 00:00:04 |
|   1 |  SORT AGGREGATE    |      |     1 |            |          |
|   2 |   TABLE ACCESS FULL| T    | 68437 |   283   (1)| 00:00:04 |
-------------------------------------------------------------------


Because OBJECT_TYPE is nullable and indexes do not contain entirely null key entries (if all columns in an index key are null, no entry will be made in the index), we cannot count the rows in the table via the index—we have to use a full table scan. If we tell the database, “OBJECT_TYPE IS NOT NULL,” the plan will immediately change (for the better), as shown in Listing 4.

Code Listing 4: A better way to count rows

 

SQL> alter table t modify object_type NOT NULL;
Table altered.

SQL> set autotrace traceonly explain
SQL> select count(*) from t;

Execution Plan
------------------------------------------
Plan hash value: 1058879072

------------------------------------------------------------------------
| Id  | Operation             | Name  | Rows   | Cost (%CPU)| Time     |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     1  |    54   (2)| 00:00:01 |
|   1 |  SORT AGGREGATE       |       |     1  |            |          |
|   2 |   INDEX FAST FULL SCAN| T_IDX | 68437  |    54   (2)| 00:00:01 |
------------------------------------------------------------------------


But what if OBJECT_TYPE were to permit nulls? Is there something else we could do? Yes—all we’d need to do is add at least one attribute that is known to be not null to the index. Then the index could sometimes be used in place of the table for a full scan, and it could be used to answer predicates of the form “WHERE OBJECT_TYPE IS NULL.” That last bit might surprise many people, because they falsely believe that “IS NULL” predicates cannot use an index. I believe this myth has arisen because people think nulls are not indexed. The fact is that nulls are indexed in many cases; it is only entirely null index keys that are not. So, if at least one attribute of the index is not null, every row will, in fact, appear in the index. Consider the plan in Listing 5.

Code Listing 5: Plan using full table scan when OBJECT_TYPE is nullable

 

SQL> alter table t modify object_type NULL;
Table altered.

SQL> set autotrace traceonly explain
SQL> select * from t where object_type is null;

Execution Plan
-----------------------------
Plan hash value: 1601196873

----------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU) | Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |   101 |   284   (1) | 00:00:04 |
|*  1 |  TABLE ACCESS FULL | T    |     1 |   101 |   284   (1) | 00:00:04 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
-------------------------------------
   1 - filter("OBJECT_TYPE" IS NULL)


That shows that if OBJECT_TYPE is nullable, the optimizer will not (in fact, cannot) use the index to satisfy “OBJECT_TYPE IS NULL.” If we add a not-null attribute to the index, however, the plan will change. Here I’ll add the constant zero to the index. (Any not-null expression would do; zero is simply small and known to be not null.)

 

SQL> drop index t_idx;
Index dropped.

SQL> create index t_idx 
on t (object_type, 0);
Index created.


Now, Listing 6 shows that the plan can and will use the index.

Code Listing 6: Plan using an index when the index has a non-null attribute

 

SQL> select * from t where object_type is null;

Execution Plan
-----------------------------
Plan hash value: 470836197

---------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |     1 |   101 |     1   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T     |     1 |   101 |     1   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T_IDX |     1 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("OBJECT_TYPE" IS NULL)


Constraints, Primary Keys, and Foreign Keys

Now let’s take a look at primary and foreign keys and how they might affect the optimizer. In the following example, we use copies of the SCOTT.EMP and SCOTT.DEPT tables—using DBMS_STATS.SET_TABLE_STATS to make the optimizer believe they are “big”—and we create the EMP_DEPT view, as shown in Listing 7.

Code Listing 7: Creating "big" EMP and DEPT tables and EMP_DEPT view

 

SQL> create table emp
  2    as
  3    select *
  4    from scott.emp;
Table created.

SQL> create table dept
  2    as
  3    select *
  4    from scott.dept;
Table created.

SQL> create or replace view emp_dept
  2    as
  3    select emp.ename, dept.dname
  4      from emp, dept
  5     where emp.deptno = dept.deptno; 
View created.

SQL> begin
  2       dbms_stats.set_table_stats
  3           ( user, 'EMP', numrows=>1000000, numblks=>100000 );
  4       dbms_stats.set_table_stats
  5           ( user, 'DEPT', numrows=>100000, numblks=>10000 );
  6    end; 
  7    /
PL/SQL procedure successfully completed.


Let’s also suppose that the EMP_DEPT view is used to query the EMP table, the DEPT table, and the result of joining EMP to DEPT. When the view is used to retrieve data from just the EMP table, we observe that the database accesses both the EMP and DEPT tables in the execution plan, as shown in Listing 8.

Code Listing 8: Query on EMP_DEPT view accessing both EMP and DEPT

 

SQL> select ename from emp_dept;

Execution Plan
-----------------------------
Plan hash value: 615168685

------------------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |  1000K|    31M|       | 31515   (1)| 00:06:19 |
|*  1 |  HASH JOIN          |      |  1000K|    31M|  2448K| 31515   (1)| 00:06:19 |
|   2 |   TABLE ACCESS FULL | DEPT |   100K|  1269K|       |  2716   (1)| 00:00:33 |
|   3 |   TABLE ACCESS FULL | EMP  |  1000K|    19M|       | 27151   (1)| 00:05:26 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("EMP"."DEPTNO"="DEPT"."DEPTNO")


Now, we know that access to the DEPT table is not really necessary, because DEPTNO is the primary key of DEPT, and DEPTNO in the EMP table is, in fact, a foreign key to the DEPT table. That is, when we join EMP to DEPT by DEPTNO, every row in the EMP table that has a non-null DEPTNO value will have exactly one matching record in the DEPT table. We know that it will have at least one match because of the foreign key constraint on EMP and at most one match because of the primary key constraint on DEPT. If we tell Oracle Database about these facts—

 

SQL> alter table dept add constraint 
dept_pk primary key(deptno);
Table altered.

SQL> alter table emp add constraint 
emp_fk_dept foreign key(deptno)
  2    references dept(deptno);
Table altered.


—we’ll see a query plan like the one in Listing 9.

Code Listing 9: Query on EMP_DEPT view, with DEPT access removed

 

SQL> select ename from emp_dept;

Execution Plan
------------------------------
Plan hash value: 3956160932

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 50000 |   976K| 27152   (1)| 00:05:26 |
|*  1 |  TABLE ACCESS FULL| EMP  | 50000 |   976K| 27152   (1)| 00:05:26 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("EMP"."DEPTNO" IS NOT NULL)


See how the optimizer has removed the DEPT table from consideration: the HASH JOIN is gone, and a predicate has been added: DEPTNO IS NOT NULL. The optimizer has recognized that, with the foreign key and the primary key in place, the query SELECT ENAME FROM EMP WHERE DEPTNO IS NOT NULL is equivalent to the query contained in the view. This style of optimization, unnecessary table elimination, can have a very positive effect on response time, especially when you are using “generic views” that join many tables together. When end users query these generic views and do not need information from all of the joined tables, the optimizer will sometimes be able to eliminate the unnecessary joined tables, on a case-by-case basis.

Next Steps

 ASK Tom
Tom Kyte answers your most difficult technology questions. Highlights from that forum appear in this column.

 READ more Tom
tkyte.blogspot.com
Expert Oracle Database Architecture: Oracle Database 9i, 10g, and 11g Programming Techniques and Solutions, Second Edition
Oracle Database Concepts 11g Release 2 (11.2)

 READ more about Oracle Database 11g

 DOWNLOAD Oracle Database 11g Release 2

As a side note, this example also demonstrates why SELECT * queries should not be used in real life. If your queries are all in the form “SELECT * FROM . . .”, you won’t benefit from this sort of optimization, and the optimizer will always have to access the DEPT table, because it appears that you want data from it. Always reference only the columns you need in your queries.
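The effect is easy to see against the EMP_DEPT view from Listing 7, with the constraints from Listing 9 in place:

```sql
-- Only EMP columns are referenced, so DEPT can be eliminated:
select ename from emp_dept;

-- SELECT * asks for DEPT.DNAME too, so the join must be performed
-- and both tables appear in the plan:
select * from emp_dept;
```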

Constraints and Materialized Views

Another example of this query rewrite magic involves primary key constraints, foreign key constraints, not-null constraints, and materialized views. I often refer to materialized views as the “indexes of your data warehouse.” The goal, in general, of a materialized view is to “preanswer” complex or long-running queries against detail tables, to save these answers in a persistent table, and to later return these saved answers transparently when end users run ad hoc queries against the detail tables. Much as an index is used in a transactional system to speed up queries, materialized views are used to speed up queries in data warehouse/reporting systems.

Using the EMP and DEPT tables once again from the above example, we’ll remove the constraints:

 

SQL> alter table emp drop constraint emp_fk_dept;
Table altered.

SQL> alter table dept drop constraint dept_pk;
Table altered.


So, we have just the EMP and DEPT tables now, and the optimizer believes they are large (based on our telling it that by using DBMS_STATS). We’ll create a materialized view that joins EMP and DEPT together and computes the count of employees by department, as shown in Listing 10.

Code Listing 10: Materialized view joining EMP and DEPT and computing count

 

SQL> create materialized view mv enable query rewrite
  2    as
  3    select dept.deptno, dept.dname, count (*) from emp, dept
  4     where emp.deptno = dept.deptno
  5     group by dept.deptno, dept.dname;
Materialized view created.

SQL> begin
  2      dbms_stats.set_table_stats
  3      ( user, 'MV', numrows=>100000, numblks=>10000 );
  4    end; 
  5    /
PL/SQL procedure successfully completed.


Now, if we issue a query of the form in Listing 11, we can see that the optimizer avoids the detail tables altogether and accesses the materialized view. Rather than joining EMP to DEPT for the SALES department and then counting the rows, the optimizer recognized that the materialized view already did that and used this view to quickly answer the query.

Code Listing 11: One query using the materialized view

 

SQL> select dept.dname, count (*) from emp, dept
  2      where emp.deptno = dept.deptno and dept.dname = 'SALES'
  3      group by dept.dname; 

Execution Plan
------------------------------
Plan hash value: 1703036361

---------------------------------------------------------------------------------------
| Id  | Operation                     | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |      |  1000 | 22000 |  2716   (1)| 00:00:33 |
|   1 |  SORT GROUP BY NOSORT         |      |  1000 | 22000 |  2716   (1)| 00:00:33 |
|*  2 |   MAT_VIEW REWRITE ACCESS FULL| MV   |  1000 | 22000 |  2716   (1)| 00:00:33 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("MV"."DNAME"='SALES')


However, if we ask a slightly different question, as shown in Listing 12, we’ll see that the optimizer does not rewrite the query to use the materialized view.

Code Listing 12: One query not using the materialized view



 

SQL> select count(*) from emp;

COUNT(*)
--------
      14

SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
-----------------------
SQL_ID  g59vz2u4cu404, child number 1
-----------------------
select count(*) from emp

Plan hash value: 2083865914

-------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       | 27142 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |            |          |
|   2 |   TABLE ACCESS FULL| EMP  |  1000K| 27142   (1)| 00:05:26 |
-------------------------------------------------------------------
14 rows selected.


As you see, the optimizer chooses to do a full table scan of the EMP table, count the rows, and return the results. Now, you know and I know that it could have chosen a full scan of the much smaller materialized view and summed up the precomputed count. That is, instead of counting individual employee records in EMP, the query could have summed up the DEPTNO-based count in the materialized view—in much less time. We know that this could have been done because we know that

 

  • DEPTNO in DEPT is a primary key
  • DEPTNO in EMP is a foreign key
  • DEPTNO in EMP is a NOT NULL column


We know that, but the database does not. If we tell the database that—

 

SQL> alter table dept add constraint 
dept_pk primary key(deptno);
Table altered.

SQL> alter table emp add constraint 
emp_fk_dept foreign key(deptno)
  2    references dept(deptno);
Table altered.

SQL> alter table emp modify deptno 
NOT NULL;
Table altered.


—then the next time we ask that same question, we’ll see that the optimizer rewrote the query to access the smaller materialized view (instead of the large EMP table), as shown in Listing 13.

Code Listing 13: Postconstraints, query now using materialized view

 

SQL> select count(*) from emp;

COUNT(*)
-------
     14

SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
-----------------------
SQL_ID  g59vz2u4cu404, child number 2
-----------------------
select count (*) from emp

Plan hash value: 1747602359

---------------------------------------------------------------------------------------
| Id  | Operation                     | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |      |       |       |  2716 (100)|          |
|   1 |  SORT AGGREGATE               |      |     1 |    13 |            |          |
|   2 |   MAT_VIEW REWRITE ACCESS FULL| MV   |   100K|  1269K|  2716   (1)| 00:00:33 |
---------------------------------------------------------------------------------------
14 rows selected.
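The rewrite in Listing 13 is equivalent to summing the materialized view’s precomputed per-department counts by hand. A sketch (the quoted column name is what Oracle assigns to the unaliased COUNT(*) in Listing 10):

```sql
-- Sum the per-department counts instead of scanning EMP.
-- The unaliased count(*) in the MV definition produces a column
-- literally named COUNT(*), which must be double-quoted to reference.
select sum("COUNT(*)") from mv;
```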


 




Tom Kyte is a database evangelist in Oracle’s Server Technologies division and has worked for Oracle since 1993. He is the author of Expert Oracle Database Architecture (Apress, 2005, 2010) and Effective Oracle by Design (Oracle Press, 2003), among other books.

 


